Better understanding the log-normal distribution with SciPy

Posted by 2mbi3lxu on 2023-08-05 in Other


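I know there are many questions about the log-normal distribution in scipy, such as this, this, this, and this, but I still have doubts.
I am trying to reproduce this example with SciPy, since I can follow its steps, but I cannot get the same results.
The data are: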

from scipy.stats import lognorm, norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

x = [20, 22, 25, 30, 60]
fig,ax = plt.subplots(1,1)
sns.kdeplot(x, color='blue',fill=False,ax=ax)

I want to fit a log-normal distribution:

shape_x, loc_x, scale_x = lognorm.fit(x,floc=0)
print(f'Estimated parameters for log-normal distribution of parameter x:')
print(f'Shape (s) of x: {shape_x}')
print(f'Location (loc) of x: {loc_x}')
print(f'Scale (scale) of x: {scale_x}')

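According to other StackOverflow questions and the scipy documentation, the mean and standard deviation of the underlying normal distribution should be: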

mu_x = np.log(scale_x)
sigma_x = shape_x
print(f'Mean (μ) of x: {mu_x}')
print(f'Standard deviation (σ) of x: {sigma_x}')

Next, I tried to create synthetic data from these parameters as a check:

synthetic_data_B = np.random.lognormal(mean=mu_x, sigma=sigma_x, size=len(x))
pdf_x = lognorm.pdf(x, s = shape_x, loc=loc_x, scale=scale_x)

fig,ax = plt.subplots(1,1)
sns.kdeplot(x, color='blue',fill=False,ax=ax)
sns.kdeplot(synthetic_data_B, color='red',fill=False,ax=ax)
ax.plot(x,pdf_x,color='green')

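I realized that:

The median in the article is scipy's scale parameter.
The μ in the article is my mu_x = np.log(scale_x), but σ is different: the article gives 0.437, while scipy gives 0.391.
If I compute the mean with lognorm.mean(shape_x, loc_x, scale_x), it is very close to the article's value.
If I compute the standard deviation with lognorm.std(shape_x, loc_x, scale_x), it gives a different result.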
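
The relationships in the list above can be checked numerically. The following is a minimal sketch, assuming `loc=0` and the maximum-likelihood parameters of the five data points; the names `mu`, `sigma`, and `dist` are illustrative and do not come from the original post. It compares SciPy's `median`, `mean`, and `std` with the standard closed-form log-normal expressions.

```python
import numpy as np
from scipy.stats import lognorm

x = np.array([20, 22, 25, 30, 60], dtype=float)

# Parameters of the underlying normal distribution of ln(x)
# (maximum-likelihood estimates, i.e. divisor n, no Bessel correction).
mu = np.log(x).mean()
sigma = np.log(x).std(ddof=0)

# SciPy parametrization with loc=0: s = sigma, scale = exp(mu).
dist = lognorm(s=sigma, loc=0, scale=np.exp(mu))

# Closed-form median, mean, and standard deviation of a log-normal variable.
median_formula = np.exp(mu)
mean_formula = np.exp(mu + sigma**2 / 2)
std_formula = np.sqrt((np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2))

print(dist.median(), median_formula)
print(dist.mean(), mean_formula)
print(dist.std(), std_formula)
```

Note that `lognorm.std` is the standard deviation of x itself, not of ln(x), so it is not expected to equal σ.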

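My questions are:

Why is σ different?
The synthetic data generated from the fitted parameters do not match the original data. Why?
If I try the reverse, recovering the distribution of x from the fitted parameters, I only get something that is roughly in the right ballpark.
How can I generate synthetic data that represent the real x?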

scipy

Source: https://stackoverflow.com/questions/76811839/better-understanding-log-normal-with-scipy

1 answer


rhfm7lfc1#

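The issue is that the value returned by scipy's fit does not include Bessel's correction: it divides by n rather than n - 1.
This is very easy to check: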

import numpy as np
from scipy.stats import lognorm

x = [20, 22, 25, 30, 60]
shape_x, loc_x, scale_x = lognorm.fit(x,floc=0)
print(f'Estimated parameters for log-normal distribution of parameter x:')
print(f'Shape (s) of x: {shape_x}')
print(f'Location (loc) of x: {loc_x}')
print(f'Scale (scale) of x: {scale_x}')

mu_x = np.log(scale_x)
sigma_x = shape_x
print(f'Mean (μ) of x: {mu_x}')
print(f'Standard deviation (σ) of x: {sigma_x}')

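lnx = np.log(x)
q = lnx - mu_x  # deviations of ln(x) from the fitted mean

# Root-mean-square deviation with divisor n (no Bessel correction)
t = np.sqrt(np.sum(q*q)/len(q))
print(t)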

The last line prints 0.3913832002383578, which is the same value that scipy's fit returns.
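
To make the effect of Bessel's correction concrete, here is a minimal sketch that computes both estimates of σ directly from ln(x) with NumPy. The variable names `sigma_mle` and `sigma_bessel` are illustrative. The approximate values in the comments correspond to the 0.391 reported by scipy and the 0.437 quoted from the article in the question.

```python
import numpy as np

x = np.array([20, 22, 25, 30, 60], dtype=float)
lnx = np.log(x)

# Divisor n: the maximum-likelihood estimate, matching lognorm.fit(x, floc=0).
sigma_mle = lnx.std(ddof=0)

# Divisor n - 1: the Bessel-corrected sample standard deviation.
sigma_bessel = lnx.std(ddof=1)

print(sigma_mle)     # approximately 0.3914
print(sigma_bessel)  # approximately 0.4376
```

The two estimates differ by the factor sqrt(n / (n - 1)), which is sizeable for n = 5 and shrinks as the sample grows.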
You can easily double-check with an artificial sample:

# Draw a large sample from the fitted distribution and refit it
r = lognorm.rvs(sigma_x, loc=0.0, scale=scale_x, size=10000)
shape_x, loc_x, scale_x = lognorm.fit(r, floc=0)
print(shape_x, loc_x, scale_x)

For me it prints

0.3912912820809421 0 28.8544068486573

These are the same values as before (well, modulo statistical noise).
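
On the question of generating synthetic data, one likely contributor to the mismatch is the sample size: with `size=len(x)` only five values are drawn, so their density estimate varies a lot from run to run. The following is a minimal sketch, assuming the Bessel-corrected σ is the one wanted and using a seeded generator for reproducibility. The names `rng` and `synthetic` and the seed value are illustrative choices, not part of the original answer.

```python
import numpy as np

x = np.array([20, 22, 25, 30, 60], dtype=float)
lnx = np.log(x)

mu = lnx.mean()
sigma = lnx.std(ddof=1)  # assumption: use the Bessel-corrected estimate

# Draw a large synthetic sample so that sampling noise is small.
rng = np.random.default_rng(0)
synthetic = rng.lognormal(mean=mu, sigma=sigma, size=10_000)

# The log of the synthetic sample should reproduce mu and sigma closely.
ln_syn = np.log(synthetic)
print(mu, ln_syn.mean())
print(sigma, ln_syn.std(ddof=1))
```

With only five observations, the fitted parameters themselves are uncertain, so a close match between the synthetic sample and the original data should not be expected.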

