I am trying to implement the Polya-Aeppli (geometric Poisson) distribution in Python by subclassing scipy.stats.rv_discrete and specifying the PMF.
My implementation is as follows:
from scipy.stats import rv_discrete
from scipy.special import comb
import numpy as np
import math


class PolyaAeppli(rv_discrete):
    def _pmf(self, k: np.ndarray, lambda_, theta_) -> np.ndarray:
        """
        Probability mass function for Polya-Aeppli distribution. Extension of Poisson distribution for
        modelling group arrivals.
        :param k: internal parameter of rv_discrete
        :param lambda_: arrival rate param of Poisson dist: [0, inf). The higher the lambda, the more arrivals
        :param theta_: probability param of Geometric dist: [0, 1]. The LOWER the theta, the more arrivals
        :return: probability values for each k
        """
        # depending on the situation, each parameter is either a number or a
        # list/array repeating the same number, so reduce it to a scalar
        if isinstance(lambda_, (np.ndarray, list)):
            lambda_ = lambda_[0]
        if isinstance(theta_, (np.ndarray, list)):
            theta_ = theta_[0]
        k = np.asarray(k, dtype=int)
        res = np.zeros(len(k))
        for ix, k_ in enumerate(k):  # sorry, cannot vectorize
            if k_ == 0:
                # no group arrives at all
                res[ix] = np.exp(-lambda_)
            else:
                # sum over the number of groups i that make up k_ arrivals
                res[ix] = np.exp(-lambda_) * np.sum([(np.power(lambda_, i) / math.factorial(i)) *
                                                     ((1 - theta_) ** (k_ - i)) * (theta_ ** i) *
                                                     (comb(k_ - 1, i - 1)) for i in range(1, k_ + 1)])
        return res
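This is a bit hacky, because k is an np.ndarray when it is fed to _pmf, and that does not work with range(). The math formula seems to be correct, but when I plot samples drawn with different parameters, the results look wrong: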
import matplotlib.pyplot as plt

PA = PolyaAeppli(name='polya_aeppli')
# sample from distribution
resa = PA.rvs(lambda_=0.5, theta_=0.5, size=1000)
resb = PA.rvs(lambda_=0.5, theta_=0.9, size=1000)
resc = PA.rvs(lambda_=0.9, theta_=0.5, size=1000)
resd = PA.rvs(lambda_=0.9, theta_=0.9, size=1000)
# plot histogram
plt.hist(resa, label='a', alpha=0.5)
plt.hist(resb, label='b', alpha=0.5)
plt.hist(resc, label='c', alpha=0.5)
plt.hist(resd, label='d', alpha=0.5)
plt.legend()
plt.show()
[Figure: Polya-Aeppli results]
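I expected d to have the highest mean number of arrivals, since both the group size and the arrival rate are highest there. But that does not seem to be the case.
Is there something wrong with my code, or am I misunderstanding the distribution/results?
Thanks!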
1 Answer
#1 (dzjeubhm)
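Okay, I think the formula is actually correct! I was just confused about the theta parameter. It turns out that the expected group size per arrival grows as theta decreases. So everything is fine! I am just leaving this here in case someone needs to implement the Polya-Aeppli PMF in Python in the future.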
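To make the point about theta concrete: with group sizes that are geometric on {1, 2, ...} with success probability theta, the mean group size is 1/theta, so the mean number of arrivals is lambda/theta. For the four parameter sets in the question that gives 1.0 (a), about 0.556 (b), 1.8 (c) and 1.0 (d), so c should come out highest, not d. Below is a minimal sketch that checks this by simulation without going through rv_discrete. The helper name sample_polya_aeppli is my own and not part of SciPy, and it assumes lambda > 0 and 0 < theta < 1.

```python
import numpy as np


def sample_polya_aeppli(lam, theta, size, rng):
    """Draw Polya-Aeppli samples as a Poisson number of geometric group sizes.

    Hypothetical helper for illustration; assumes lam > 0 and 0 < theta < 1.
    """
    n_groups = rng.poisson(lam, size=size)  # number of groups arriving per draw
    totals = np.zeros(size, dtype=np.int64)
    has_groups = n_groups > 0
    # A sum of n geometric sizes on {1, 2, ...} equals n plus a negative
    # binomial count of failures before the n-th success.
    totals[has_groups] = n_groups[has_groups] + rng.negative_binomial(
        n_groups[has_groups], theta
    )
    return totals


rng = np.random.default_rng(0)
params = {"a": (0.5, 0.5), "b": (0.5, 0.9), "c": (0.9, 0.5), "d": (0.9, 0.9)}
for label, (lam, theta) in params.items():
    sample_mean = sample_polya_aeppli(lam, theta, 200_000, rng).mean()
    print(f"{label}: theory {lam / theta:.3f}, simulated {sample_mean:.3f}")
```

The simulated means should land close to lambda/theta for each parameter set, which matches the ordering seen in the histograms.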
(And if anyone can improve this very hacky code, that would be great.)
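One possible way to tidy up the loop, offered as a sketch rather than a definitive implementation: evaluate the same sum on a 2-D grid of counts k against group numbers i, and work with logarithms via scipy.special.gammaln for the factorial and binomial terms. The function name polya_aeppli_pmf is my own, and the sketch assumes scalar lam > 0 and scalar 0 < theta < 1 (theta = 1 would need separate handling because of log(1 - theta)).

```python
import numpy as np
from scipy.special import gammaln


def polya_aeppli_pmf(k, lam, theta):
    """Polya-Aeppli PMF for an array of counts k.

    Hypothetical helper; assumes scalar lam > 0 and scalar 0 < theta < 1.
    """
    k = np.atleast_1d(np.asarray(k, dtype=float))
    k_max = int(k.max()) if k.size else 0
    # Column of counts k against a row of group numbers i = 1..k_max.
    kk = k[:, None]
    i = np.arange(1, max(k_max, 1) + 1, dtype=float)[None, :]
    valid = i <= kk  # terms with i > k do not contribute
    # Where a term is masked out, substitute k = i so gammaln stays finite.
    kk_safe = np.where(valid, kk, i)
    log_term = (
        i * np.log(lam) - gammaln(i + 1)       # lam^i / i!
        + (kk_safe - i) * np.log1p(-theta)     # (1 - theta)^(k - i)
        + i * np.log(theta)                    # theta^i
        + gammaln(kk_safe) - gammaln(i) - gammaln(kk_safe - i + 1)  # C(k-1, i-1)
    )
    total = np.where(valid, np.exp(log_term), 0.0).sum(axis=1)
    # k = 0 has no valid terms; its probability is exp(-lam) (no group arrives).
    return np.exp(-lam) * np.where(k == 0, 1.0, total)


ks = np.arange(0, 200)
p = polya_aeppli_pmf(ks, lam=0.9, theta=0.5)
print(p.sum())         # should be very close to 1
print((ks * p).sum())  # should be close to lam / theta = 1.8
```

The grid costs memory proportional to len(k) * max(k), which is fine for the small counts in the question but would need chunking for very large k. To use it inside the _pmf method, the parameters would still have to be reduced to scalars first, as the original code does.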