SciPy: How can I improve this curve fitting? Finding the right function

Posted by qoefvg9y on 2023-06-06 in Other


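I'm trying to find the relationship between two variables (pv_ratio, battery_ratio) and a third variable, "value". Both ratios range from 0 to 5 in steps of 0.0625 (81 x 81 = 6561 points), and "value" falls within [0, 1].
The csv can be found here and looks like this: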

   battery_ratio  pv_ratio  value
0         0.0000         0      1
1         0.0625         0      1
2         0.1250         0      1
3         0.1875         0      1
4         0.2500         0      1
5         0.3125         0      1
6         0.3750         0      1
7         0.4375         0      1
8         0.5000         0      1
9         0.5625         0      1

These plots show the relationship between my variables:

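Below is my code to fit the curve, using scipy.optimize.curve_fit and looking for an exponential relationship. This snippet reads the csv into a pandas df, finds the optimal parameters for the function f, plots the results and gives a score for the fit.
I've been working iteratively, trying many formulas for f and improving the score little by little.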

from scipy.optimize import curve_fit
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# IPython/Jupyter magic; drop this line when running as a plain script
%matplotlib inline
plt.rcParams['figure.figsize'] = (14.0, 8.0)

def f(X, a, b, c, d, e):
    # the function I came up with after some trials, and which I'm trying to improve
    bt_r = X[:,0]  #battery_ratio
    pv_r = X[:,1] #pv_ratio
    return  (1 - a * np.exp(- e * pv_r ** b)) * np.exp(- (d ** bt_r) * c * pv_r)

def fit():
    # find optimal parameters and score the fit (sum of squared residuals)
    X = df[variables].values
    y = df.value.values
    popt, pcov = curve_fit(f, X, y)
    y_real, y_fit = df['value'], f(df[variables].values, *popt)
    score = np.round(np.sum(((y_fit - y_real)**2)),1)
    return popt, score
        
def check_fit(values):
    #Plot (y_fit, y) for all subsets
    def plot_subset(ax, variable, r_value):
        """Scatter plot (y_fit and y) against 'variable' with the other variable set at ratio
        - variable : string ['pv_ratio', 'battery_ratio']
        - r_value : float 
        """
        # ratio stands for the second variable which is fixed
        ratio = list(set(variables) - set([variable]))[0]
        df_ = df.query("{} == {}".format(ratio, r_value))

        # plot y and y fit
        y_real, y_fit = df_['value'], f(df_[variables].values, *popt)
        for y, c in zip([y_real, y_fit], ['b', 'r']):        
            ax.scatter(df_[variable], y, color=c, s=10, alpha=0.95)
        ax.set_title('{} = {}'.format(ratio, r_value))

    fig, ax = plt.subplots(nrows=2, ncols=len(values), sharex=True, sharey=True)
    for icol, r_value in enumerate(values):
        plot_subset(ax[0, icol], 'pv_ratio', r_value)
        plot_subset(ax[1, icol], 'battery_ratio', r_value)
        
    fig.tight_layout()
    print('Score: {}'.format(score))
    

df = pd.read_csv('data.csv', index_col=0)
variables = ['battery_ratio', 'pv_ratio']
popt, score = fit()
check_fit([0,3,5]) #plot y_real and y_fit for these ratios

The code above produces the picture below (blue: real, red: fit) and gives a score for the fit.

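The best score I could get (= sum((y_real - y_fit)²)/len(y)) is 9.3e-4, which still doesn't work very well in practice, especially in the ramp-up phase.
I'm now stuck at a point where the trial-and-error process is showing its limits. How should I go about designing my fitting function faster and more efficiently? Can I get a better score than 6.1 (the unnormalized sum of squared residuals that the code above prints)?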
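
The two figures quoted above (9.3e-4 and 6.1) appear to be the same fit scored with and without dividing by the number of points. A minimal sketch of both measures, using made-up arrays rather than the real data; the helper names `sse` and `mse` are hypothetical and not part of the original code:

```python
import numpy as np

def sse(y_real, y_fit):
    """Sum of squared residuals (what fit() computes before rounding)."""
    return float(np.sum((np.asarray(y_fit) - np.asarray(y_real)) ** 2))

def mse(y_real, y_fit):
    """Sum of squared residuals divided by the number of points."""
    return sse(y_real, y_fit) / len(y_real)

# Made-up values for illustration only, not the data from the question.
y_real = np.array([1.0, 0.8, 0.5, 0.2])
y_fit = np.array([0.9, 0.8, 0.6, 0.2])
print(sse(y_real, y_fit), mse(y_real, y_fit))
```

With 6561 points, a normalized score of 9.3e-4 corresponds to a sum of roughly 6.1, which matches the two numbers in the question.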

scipy

Source: https://stackoverflow.com/questions/35723204/scipy-how-can-i-improve-this-curve-fitting-finding-the-right-function

2 Answers


#1 f8rj6qna

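This is not related to Python; you want to fit your data to a curved surface.
Invert the data: take 1/x of the values and fit trend lines, row by row. Then you're done.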
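
One possible reading of this suggestion, sketched under assumptions: for each fixed battery_ratio, fit a straight line to 1/value against pv_ratio. The helper name `fit_inverse_trend_lines`, the choice of a degree-1 `np.polyfit`, and the synthetic surface are all assumptions made for illustration; the real data would also need any rows with value == 0 handled before inverting.

```python
import numpy as np
import pandas as pd

def fit_inverse_trend_lines(df, row_var="battery_ratio", x_var="pv_ratio"):
    """For each fixed row_var, fit a straight line to 1/value against x_var."""
    rows = []
    for r, group in df.groupby(row_var):
        slope, intercept = np.polyfit(group[x_var], 1.0 / group["value"], deg=1)
        rows.append({row_var: r, "slope": slope, "intercept": intercept})
    return pd.DataFrame(rows)

# Synthetic stand-in for the CSV (NOT the real data):
# value = 1 / (1 + k * pv_ratio), with k depending on battery_ratio.
grid = np.linspace(0, 5, 81)
bt, pv = np.meshgrid(grid, grid, indexing="ij")
k = 1.0 / (1.0 + bt)
df_demo = pd.DataFrame({
    "battery_ratio": bt.ravel(),
    "pv_ratio": pv.ravel(),
    "value": (1.0 / (1.0 + k * pv)).ravel(),
})

lines = fit_inverse_trend_lines(df_demo)
print(lines.head())
```

If the slopes and intercepts vary smoothly with battery_ratio, they can themselves be fitted in a second step, which gives a candidate closed form for the whole surface.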


#2 gjmwrych

As suggested by @jon-custer, I tried an n-th order polynomial fit. My code is a slightly modified version of this SO answer.

import itertools
import numpy as np
import matplotlib.pyplot as plt

def polyfit2d(data, order=3):
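    # work on plain float arrays so the powers and in-place sums below behave predictably
    x = data.pv_ratio.to_numpy(dtype=float)
    y = data.battery_ratio.to_numpy(dtype=float)
    z = data.value.to_numpy(dtype=float)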
    
    ncols = (order + 1)**2
    G = np.zeros((x.size, ncols))
    
    ij = itertools.product(range(order+1), range(order+1))
    for k, (i,j) in enumerate(ij):
        G[:,k] = x**i * y**j
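    # least-squares solve for the polynomial coefficients
    m, _, _, _ = np.linalg.lstsq(G, z, rcond=None)
    
    # evaluate the fitted surface at the data points
    y_fit = polyval2d(x, y, m)
    return m, y_fit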

def polyval2d(x, y, m):
    order = int(np.sqrt(len(m))) - 1
    ij = itertools.product(range(order+1), range(order+1))
    z = np.zeros_like(x)
    for a, (i,j) in zip(m, ij):
        z += a * x**i * y**j  
    return z

m, y_fit = polyfit2d(df, 7)

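The figure above shows the maximum residual and the normalized score. The best result I got was with a 7th-order polynomial. My score drops to ~6.4e-5 and the residual never exceeds 5.5%, which is an accuracy I'm fine with.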
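
The normalized score and maximum residual mentioned above could be computed along these lines. This is a sketch, not the answer's own code: it swaps the explicit loop for NumPy's `polyvander2d` and `polyval2d`, assumes "normalized score" means the mean squared residual and "maximum residual" the largest absolute residual, and uses a synthetic surface in place of the real CSV. The helper name `fit_metrics` is hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_metrics(x, y, z, order):
    """Least-squares 2D polynomial fit; return (mean squared residual, max |residual|)."""
    V = P.polyvander2d(x, y, [order, order])        # columns are x**i * y**j
    coef, *_ = np.linalg.lstsq(V, z, rcond=None)
    z_fit = P.polyval2d(x, y, coef.reshape(order + 1, order + 1))
    resid = z_fit - z
    return float(np.mean(resid ** 2)), float(np.max(np.abs(resid)))

# Synthetic surface on the same 81 x 81 grid (NOT the real data).
grid = np.linspace(0, 5, 81)
x, y = (a.ravel() for a in np.meshgrid(grid, grid))
z = np.exp(-0.5 * x) * (1 - 0.5 * np.exp(-y))

for order in (3, 5, 7):
    score, max_resid = fit_metrics(x, y, z, order)
    print(order, score, max_resid)
```

Comparing these two numbers across orders is one way to pick the lowest order that meets a given accuracy target. High orders on an unscaled [0, 5] range make the design matrix poorly conditioned, so rescaling the inputs may be worth considering.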
