scipy: How to implement the Polya-Aeppli distribution in Python using rv_discrete?

qnzebej0  posted on 2023-05-17  in  Python

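I am trying to implement the Polya-Aeppli (geometric Poisson) distribution in Python by subclassing scipy.stats.rv_discrete and specifying the PMF.
This is my implementation so far: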

from scipy.stats import rv_discrete
from scipy.special import comb  # binomial coefficient used in _pmf
import numpy as np
import math

class PolyaAeppli(rv_discrete):

    def _pmf(self, k: np.ndarray, lambda_, theta_) -> np.ndarray:
        """
        Probability mass function for Polya-Aeppli distribution. Extension of Poisson distribution for
        modeling group arrivals.

        :param k: internal parameter of rv_discrete
        :param lambda_: arrival rate param of Poisson dist: [0, inf). The higher the lambda, the more arrivals
        :param theta_: probability param of Geometric dist: [0, 1]. The LOWER the theta, the more arrivals
        :return: probability values for each k
        """

        # depending on the situation, each parameter is either a number or an
        # array/list of the same number, so reduce it to a single value
        lambda_ = np.ravel(lambda_)[0]
        theta_ = np.ravel(theta_)[0]
        k = np.asarray(k, dtype=int)
        res = np.zeros(len(k))
        for ix, k_ in enumerate(k):  # sorry, cannot vectorize
            if k_ == 0:
                res[ix] = np.exp(-lambda_)
            else:
                res[ix] = np.exp(-lambda_) * np.sum([(np.power(lambda_, i) / math.factorial(i)) *
                                                     ((1 - theta_) ** (k_ - i)) * (theta_ ** i) *
                                                     (comb(k_ - 1, i - 1)) for i in range(1, k_ + 1)])
        return res
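
For reference, the expression the loop above evaluates can be written out as follows, with \lambda standing for lambda_ and \theta for theta_. This is only a transcription of the code, not taken from an external source:

```latex
P(X = 0) = e^{-\lambda}, \qquad
P(X = k) = e^{-\lambda} \sum_{i=1}^{k} \frac{\lambda^{i}}{i!}
           \binom{k-1}{i-1}\, \theta^{i} (1-\theta)^{k-i}, \quad k \ge 1 .
```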

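This is a bit hacky, because k arrives in _pmf as an np.ndarray, which does not work with range(). The math looks right, but when I plot samples drawn with different parameters, the results look wrong: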

import matplotlib.pyplot as plt

PA = PolyaAeppli(name='polya_aeppli')

# sample from distribution
resa = PA.rvs(lambda_=0.5, theta_=0.5, size=1000)
resb = PA.rvs(lambda_=0.5, theta_=0.9, size=1000)
resc = PA.rvs(lambda_=0.9, theta_=0.5, size=1000)
resd = PA.rvs(lambda_=0.9, theta_=0.9, size=1000)

# plot histogram
plt.hist(resa, label='a', alpha=0.5)
plt.hist(resb, label='b', alpha=0.5)
plt.hist(resc, label='c', alpha=0.5)
plt.hist(resd, label='d', alpha=0.5)
plt.legend()
plt.show()

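[Image: Polya-Aeppli results]
I expected d to have the highest average number of arrivals, since both the group size and the arrival rate are highest there. But that does not seem to be the case.
Is something wrong with my code, or am I misunderstanding the distribution or the results?
Thanks!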

dzjeubhm  #1

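OK, I think the formula is actually correct! I was just confused about the theta parameter. It turns out that the expected group size per arrival grows as theta decreases, so everything is fine. I'll leave this here in case someone needs to implement the Polya-Aeppli PMF in Python in the future.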
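
A quick way to convince yourself of this is to sample the compound process directly and compare the sample mean with lambda_ / theta_. The sketch below assumes group sizes are geometric on 1, 2, ... with success probability theta_, which is what the PMF in the question implies. `sample_polya_aeppli` is a hypothetical helper written for this check, not part of scipy.

```python
import numpy as np

def sample_polya_aeppli(lambda_, theta_, size, rng):
    """Hypothetical helper: simulate the compound Poisson process directly."""
    # Number of groups that arrive in each trial.
    n_groups = rng.poisson(lambda_, size=size)
    totals = np.zeros(size, dtype=np.int64)
    has_groups = n_groups > 0
    # A sum of n geometric group sizes on {1, 2, ...} equals n plus a
    # negative binomial count of failures before the n-th success.
    totals[has_groups] = n_groups[has_groups] + rng.negative_binomial(
        n_groups[has_groups], theta_
    )
    return totals

rng = np.random.default_rng(0)
# Same four parameter pairs (lambda_, theta_) as in the question.
params = {"a": (0.5, 0.5), "b": (0.5, 0.9), "c": (0.9, 0.5), "d": (0.9, 0.9)}
sample_means = {}
for label, (lam, th) in params.items():
    draws = sample_polya_aeppli(lam, th, 100_000, rng)
    sample_means[label] = draws.mean()
    print(f"{label}: sample mean {draws.mean():.3f}, lambda/theta {lam / th:.3f}")
```

Under that assumption the mean is lambda_ / theta_, so c (0.9 / 0.5 = 1.8) should come out highest, while a and d both sit at 1.0.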
(And if anyone can improve this rather hacky code, that would be great.)
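
One possible cleanup, offered as a sketch rather than a definitive version: broadcast k against the summation index so the Python loop disappears. The class name `PolyaAeppliVec` and its local variable names are my own. The formula is the same as in the question, and only calls that exist in numpy and scipy.special are used.

```python
import numpy as np
from scipy.special import comb, gammaln
from scipy.stats import rv_discrete

class PolyaAeppliVec(rv_discrete):
    """Sketch of a loop-free PMF; same formula as in the question."""

    def _pmf(self, k, lambda_, theta_):
        # rv_discrete may pass scalars or arrays, so broadcast everything first.
        k, lambda_, theta_ = np.broadcast_arrays(k, lambda_, theta_)
        k = k.astype(int)
        lam = lambda_.astype(float)[..., None]
        th = theta_.astype(float)[..., None]
        kk = k[..., None]

        # Summation index i = 1..max(k) on a new trailing axis.
        kmax = int(k.max()) if k.size else 0
        i = np.arange(1, kmax + 1)
        valid = i <= kk  # terms with i > k do not belong to the sum

        # Clamp masked-out entries so no invalid powers or binomials are evaluated.
        expo = np.where(valid, kk - i, 0)
        n_top = np.where(valid, kk - 1, 0)
        inv_fact = np.exp(-gammaln(i + 1.0))  # 1 / i!
        terms = lam ** i * inv_fact * (1 - th) ** expo * th ** i * comb(n_top, i - 1)
        total = np.where(valid, terms, 0.0).sum(axis=-1)

        # For k == 0 the sum is empty and the PMF is exp(-lambda_).
        return np.exp(-lam[..., 0]) * np.where(k == 0, 1.0, total)

dist = PolyaAeppliVec(name="polya_aeppli_vec")
ks = np.arange(0, 150)
probs = dist.pmf(ks, 0.9, 0.5)
print(probs[:3], probs.sum())
```

The trade-off is memory: the intermediate array has len(k) * max(k) entries, so this suits moderate k. Sampling with rvs still goes through scipy's generic inverse-CDF search, so it does not become fast just because the PMF is vectorized.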
