How do I fit a distribution defined with scipy.stats.rv_continuous?

Posted by wpcxdonn on 2022-11-10 in Other


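I want to fit data with a combination of distributions in Python, and the most sensible way seems to be through scipy.stats.rv_continuous. With this class I can define a new distribution and fit some artificial data, but the fit returns 2 more values than the distribution has free parameters, and I do not know how to interpret them. The fit is also very slow, so any suggestions on how to speed it up would be greatly appreciated.
Here is a minimal reproducible example (for the purposes of this question I use a combination of a normal and a lognormal distribution):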

import numpy as np
import scipy.stats as stats

# Create the new distribution combining a normal and lognormal distr

def lognorm(x,s,loc,scale):
    return(stats.lognorm.pdf(x, s = s, loc = loc, scale = scale))
def norm(x,loc,scale):
    return(stats.norm.pdf(x, loc = loc, scale = scale))

class combo_dist_gen(stats.rv_continuous):
    "Gaussian and lognormal combination"
    def _pdf(self, x,  s1, loc1, scale1, loc2, scale2):
        return (lognorm(x, s1, loc1, scale1) + norm(x, loc2, scale2))

combo_dist = combo_dist_gen(name='combo_dist')

# Generate some artificial data

gen_data = np.append(stats.norm.rvs(loc=0.2, scale=0.1, size=5000),\
    stats.lognorm.rvs(size=5000, s=0.1, loc=0.2, scale=0.5))

# Fit the data with the new distribution

# I provide initial values not too far from the original distribution

Fit_results = combo_dist.fit(gen_data, 0.15, 0.15, 0.6, 0.25, 0.05)

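Apart from being very slow, the fit seems to work. However, it returns 7 values, whereas the original distribution has only 5 free parameters: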

print(Fit_results)
(0.0608036989522803, 0.030858042734341062, 0.9475658421131599, 0.4083398045761335, 0.11227588564167855, -0.15941656336149485, 0.8806248445561231)

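I do not understand what these two extra values are or how they enter the definition of the distribution.
If I generate a new pdf from the fit results, I can reproduce the original distribution well, but only when I use all 7 values: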

import matplotlib.pyplot as plt

xvals = np.linspace(-1,3, 1000)
# Unnormalized reference curve built from the true parameters
gen_data_pdf = (lognorm(xvals,0.1, 0.2, 0.5)+norm(xvals, 0.2,0.1))
ydata1 = combo_dist.pdf(xvals,*Fit_results)
ydata2 = combo_dist.pdf(xvals,*Fit_results[:5])

plt.figure()
plt.plot(xvals, gen_data_pdf, label = 'Original distribution')
plt.plot(xvals, ydata1, label = 'Fitted distribution, all parameters')
plt.plot(xvals, ydata2, label = 'Fitted distribution, only first 5 parameters')

plt.legend()

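P.S. 1: I find the official documentation somewhat obscure, and it does not seem to provide any useful examples. Some answers on SO offer explanations (such as here and here), but none of them seems to address my problem.
P.S. 2: I know the pdf of the combined distribution is not normalized to 1. In my original implementation I divided the pdf by 2, but for some reason the fit did not work with the extra division (RuntimeError, no convergence).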

scipy

Source: https://stackoverflow.com/questions/71573674/how-to-fit-a-distribution-defined-with-scipy-stats-rv-continuous

1 answer


slwdgvem1#

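These two values are the loc and scale parameters, which, per the documentation, shift and scale the distribution. Every rv_continuous subclass has them in addition to its own shape parameters, and fit returns them last. Simply fix their values with: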

Fit_results = combo_dist.fit(gen_data, 0.15, 0.15, 0.6, 0.25, 0.05,
                             floc=0, fscale=1)
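
To make the role of the two extra values concrete, here is a minimal sketch. It assumes the same combo_dist class as in the question and reuses the seven fitted numbers printed there. It checks that the generic pdf method evaluates _pdf at (x - loc) / scale and divides the result by scale, which is why dropping the last two values changes the curve. The names shapes, manual, folded, eff_norm_loc and eff_norm_scale are introduced here for illustration only.

```python
import numpy as np
import scipy.stats as stats


class combo_dist_gen(stats.rv_continuous):
    "Gaussian and lognormal combination (unnormalized, as in the question)"
    def _pdf(self, x, s1, loc1, scale1, loc2, scale2):
        return (stats.lognorm.pdf(x, s=s1, loc=loc1, scale=scale1)
                + stats.norm.pdf(x, loc=loc2, scale=scale2))


combo_dist = combo_dist_gen(name='combo_dist')

# Fitted values copied from the question: 5 shape parameters, then loc, scale
fit_results = (0.0608036989522803, 0.030858042734341062, 0.9475658421131599,
               0.4083398045761335, 0.11227588564167855,
               -0.15941656336149485, 0.8806248445561231)
*shapes, loc, scale = fit_results
s1, loc1, scale1, loc2, scale2 = shapes

xvals = np.linspace(-1, 3, 1000)

# What combo_dist.pdf computes when given all 7 values
full = combo_dist.pdf(xvals, *fit_results)

# The same thing written out by hand: shift, rescale, divide by scale
manual = combo_dist._pdf((xvals - loc) / scale, *shapes) / scale

# Equivalent form: fold loc and scale into each component's own loc and scale
eff_norm_loc = loc + scale * loc2
eff_norm_scale = scale * scale2
folded = (stats.lognorm.pdf(xvals, s=s1, loc=loc + scale * loc1,
                            scale=scale * scale1)
          + stats.norm.pdf(xvals, loc=eff_norm_loc, scale=eff_norm_scale))

print(np.allclose(full, manual), np.allclose(full, folded))
print(eff_norm_loc, eff_norm_scale)
```

With these numbers the effective location and scale of the normal component come out near 0.20 and 0.10, close to the values used to generate the data. This suggests that loc and scale are redundant with the component parameters in this model. Passing floc=0 and fscale=1 to fit removes that redundancy and leaves only the five shape parameters free.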

2022-11-10
