python 如何从curve_fit获取置信区间

2fjabf4q  于 2024-01-05  发布在  Python
关注(0)|答案(2)|浏览(220)

我的问题涉及统计和python,我是这两个方面的初学者。我正在运行一个模拟,对于自变量(X)的每个值,我为因变量(Y)生成1000个值。我所做的是,我计算了X的每个值的Y的平均值,并使用scipy.optimize.curve_fit拟合这些平均值。曲线拟合得很好,但是我也想画置信区间。我不确定我所做的是否正确,或者我想做的是否可以做到,但我的问题是我如何从curve_fit产生的协方差矩阵中获得置信区间。代码首先从文件中读取平均值,然后简单地使用curve_fit。

  1. import numpy as np
  2. import matplotlib.pyplot as plt
  3. from scipy.optimize import curve_fit
  4. def readTDvsTx(L, B, P, fileformat):
  5. # L should be '_Fixed_' or '_'
  6. TD = []
  7. infile = open(fileformat.format(L, B, P), 'r')
  8. infile.readline() # To remove header
  9. for line in infile:
  10. l = line.split() # each line contains TxR followed by CD followed by TD
  11. if eval(l[0]) >= 70 and eval(l[0]) <=190:
  12. td = eval(l[2])
  13. TD.append(td)
  14. infile.close()
  15. tdArray = np.array(TD)
  16. return tdArray
  17. def rec(x, a, b):
  18. return a * (1 / (x**2)) + b
  19. fileformat = 'Densities_file{}BS{}_PRNTS{}.txt'
  20. txR = np.array(range(70, 200, 20))
  21. parents = np.array(range(1,6))
  22. disc_p1 = readTDvsTx('_Fixed_', 5, 1, fileformat)
  23. popt, pcov = curve_fit(rec, txR, disc_p1)
  24. plt.plot(txR, rec(txR, popt[0], popt[1]), 'r-')
  25. plt.plot(txR, disc_p1, '.')
  26. print(popt)
  27. plt.show()

字符串
下面是结果拟合:

bfnvny8b

bfnvny8b1#

这里有一个快速而错误的答案:您可以将ab参数的协方差矩阵的误差近似为其对角线的平方根:np.sqrt(np.diagonal(pcov))。然后可以使用参数不确定性来绘制置信区间。
答案是错误的,因为在将数据拟合到模型之前,您需要估计disc_p1平均点的误差。在求平均值时,您丢失了有关总体分散度的信息,导致curve_fit认为您输入的y点是绝对的和无可争议的。这可能会导致低估参数误差。
为了估计平均Y值的不确定性,您需要估计它们的离差度量,并将其沿着给curve_fit,同时说明您的误差是绝对的。下面是一个如何对随机数据集执行此操作的示例,其中每个点都由从正态分布中提取的1000个样本组成。

  1. from scipy.optimize import curve_fit
  2. import matplotlib.pylab as plt
  3. import numpy as np
  4. # model function
  5. func = lambda x, a, b: a * (1 / (x**2)) + b
  6. # approximating OP points
  7. n_ypoints = 7
  8. x_data = np.linspace(70, 190, n_ypoints)
  9. # approximating the original scatter in Y-data
  10. n_nested_points = 1000
  11. point_errors = 50
  12. y_data = [func(x, 4e6, -100) + np.random.normal(x, point_errors,
  13. n_nested_points) for x in x_data]
  14. # averages and dispersion of data
  15. y_means = np.array(y_data).mean(axis = 1)
  16. y_spread = np.array(y_data).std(axis = 1)
  17. best_fit_ab, covar = curve_fit(func, x_data, y_means,
  18. sigma = y_spread,
  19. absolute_sigma = True)
  20. sigma_ab = np.sqrt(np.diagonal(covar))
  21. from uncertainties import ufloat
  22. a = ufloat(best_fit_ab[0], sigma_ab[0])
  23. b = ufloat(best_fit_ab[1], sigma_ab[1])
  24. text_res = "Best fit parameters:\na = {}\nb = {}".format(a, b)
  25. print(text_res)
  26. # plotting the unaveraged data
  27. flier_kwargs = dict(marker = 'o', markerfacecolor = 'silver',
  28. markersize = 3, alpha=0.7)
  29. line_kwargs = dict(color = 'k', linewidth = 1)
  30. bp = plt.boxplot(y_data, positions = x_data,
  31. capprops = line_kwargs,
  32. boxprops = line_kwargs,
  33. whiskerprops = line_kwargs,
  34. medianprops = line_kwargs,
  35. flierprops = flier_kwargs,
  36. widths = 5,
  37. manage_ticks = False)
  38. # plotting the averaged data with calculated dispersion
  39. #plt.scatter(x_data, y_means, facecolor = 'silver', alpha = 1)
  40. #plt.errorbar(x_data, y_means, y_spread, fmt = 'none', ecolor = 'black')
  41. # plotting the model
  42. hires_x = np.linspace(50, 190, 100)
  43. plt.plot(hires_x, func(hires_x, *best_fit_ab), 'black')
  44. bound_upper = func(hires_x, *(best_fit_ab + sigma_ab))
  45. bound_lower = func(hires_x, *(best_fit_ab - sigma_ab))
  46. # plotting the confidence intervals
  47. plt.fill_between(hires_x, bound_lower, bound_upper,
  48. color = 'black', alpha = 0.15)
  49. plt.text(140, 800, text_res)
  50. plt.xlim(40, 200)
  51. plt.ylim(0, 1000)
  52. plt.show()

字符串


的数据

**编辑:**如果你没有考虑数据点的内在误差,你可能可以使用我之前提到的“快而错”的情况。协方差矩阵对角项的平方根可以用来计算你的置信区间。但是,注意,置信区间 * 已经缩小了 *,因为我们已经去掉了不确定性:

  1. from scipy.optimize import curve_fit
  2. import matplotlib.pylab as plt
  3. import numpy as np
  4. func = lambda x, a, b: a * (1 / (x**2)) + b
  5. n_ypoints = 7
  6. x_data = np.linspace(70, 190, n_ypoints)
  7. y_data = np.array([786.31, 487.27, 341.78, 265.49,
  8. 224.76, 208.04, 200.22])
  9. best_fit_ab, covar = curve_fit(func, x_data, y_data)
  10. sigma_ab = np.sqrt(np.diagonal(covar))
  11. # an easy way to properly format parameter errors
  12. from uncertainties import ufloat
  13. a = ufloat(best_fit_ab[0], sigma_ab[0])
  14. b = ufloat(best_fit_ab[1], sigma_ab[1])
  15. text_res = "Best fit parameters:\na = {}\nb = {}".format(a, b)
  16. print(text_res)
  17. plt.scatter(x_data, y_data, facecolor = 'silver',
  18. edgecolor = 'k', s = 10, alpha = 1)
  19. # plotting the model
  20. hires_x = np.linspace(50, 200, 100)
  21. plt.plot(hires_x, func(hires_x, *best_fit_ab), 'black')
  22. bound_upper = func(hires_x, *(best_fit_ab + sigma_ab))
  23. bound_lower = func(hires_x, *(best_fit_ab - sigma_ab))
  24. # plotting the confidence intervals
  25. plt.fill_between(hires_x, bound_lower, bound_upper,
  26. color = 'black', alpha = 0.15)
  27. plt.text(140, 630, text_res)
  28. plt.xlim(60, 200)
  29. plt.ylim(0, 800)
  30. plt.show()



如果你不确定是否要包含绝对误差,或者在你的案例中如何估计它们,你最好在Cross Validated上寻求建议,因为Stack Overflow主要是讨论回归方法的实现,而不是讨论底层的统计数据。

展开查看全部
uttx8gqw

uttx8gqw2#

这里是我写的一些Python笔记本和Python脚本的链接,这些脚本展示了如何使用scipy.optimize.curve_fit或lmfit的最佳参数和协方差矩阵的输出来使用delta方法计算置信区间和预测区间:
https://github.com/gjpelletier/delta_method
Here is an example of what the confidence intervals and prediction intervals from the delta method look like

相关问题