Vectorizing multivariate normal distribution computations with SciPy

kqhtkvqz · asked 2023-03-30

I have n points in 3D space, each with a corresponding guess and an uncertainty. I want to evaluate the multivariate normal density for each point given its guess and uncertainty. Currently I use an iterative approach with SciPy's scipy.stats.multivariate_normal, as shown in the snippet below:

import numpy as np
from scipy.stats import multivariate_normal

n = 10

def create_points(n):
    return np.random.randint(0, 1000, size=(n, 3))

real = create_points(n)
guess = create_points(n)
uncertainties = np.random.randint(1, 100, size=n)

def iterative_scoring_scipy(real, guess, uncertainties):
    score = 0.0
    # one isotropic covariance matrix (uncertainty**2 * I) per point
    covariances = [
        np.diag([uncertainties[i]**2]*3)
        for i in range(len(real))
    ]
    for i in range(len(real)):
        score += multivariate_normal.pdf(real[i], mean=guess[i], cov=covariances[i])

    return score

print(iterative_scoring_scipy(real, guess, uncertainties))

And here is an attempt that uses plain NumPy instead of SciPy:

def iterative_scoring_numpy(real, guess, uncertainties):
    score = 0.0
    for i in range(len(real)):
        # calculate the covariance matrix
        cov = np.diag([uncertainties[i]**2]*3)

        # calculate the determinant and inverse of the covariance matrix
        det = np.linalg.det(cov)
        inv = np.linalg.inv(cov)

        # calculate the difference between the real and guess points
        diff = real[i] - guess[i]

        # calculate the exponent
        exponent = -0.5 * np.dot(np.dot(diff, inv), diff)

        # calculate the constant factor
        const = 1 / np.sqrt((2 * np.pi)**3 * det)

        # calculate the probability density function value
        pdf = const * np.exp(exponent)

        # add the probability density function value to the score
        score += pdf

    return score

print(iterative_scoring_numpy(real, guess, uncertainties))

However, both approaches are slow, and I am looking for a way to speed them up using vectorization. How can I vectorize either snippet to make it faster?

oxosxuxt · answer #1

multivariate_normal.pdf does not appear to be vectorized over a batch of distributions. Even if it were, its code is very generic and therefore inefficient: it performs many redundant checks, calls functions that support N-D arrays, and handles corner cases that cannot occur with your inputs. Your second attempt is a good one, but the NumPy functions operate on small vectors/matrices when they could operate on the whole dataset at once. In short, the entire loop can be vectorized. I chose to read the SciPy code and write my own vectorized implementation of multivariate_normal.pdf that only supports your kind of input (i.e. 3D points). Here is the result:

# [...] same as in the question

# Vectorized equivalent to multivariate_normal.pdf
def mvn_pdf(x, mean, cov):
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)

    assert x.ndim == 2 and x.shape[1] == 3
    assert mean.ndim == 2 and mean.shape[1] == 3
    assert cov.ndim == 3 and cov.shape[1:] == (3, 3)

    s, u = np.linalg.eigh(cov)
    eps = 2.22e-10 * np.max(np.abs(s), axis=1)

    # Each covariance matrix must be symmetric positive definite
    assert np.all(abs(s) > eps[:, None])

    # The rest of the code is unsafe if this does not hold;
    # it also enables further optimizations
    assert np.allclose(u, np.identity(3))

    log_pdet = np.sum(np.log(s), axis=1)

    maha = np.sum((x - mean)**2 / s, axis=1)
    log_2pi = np.log(2 * np.pi)
    logpdf = -0.5 * (3 * log_2pi + log_pdet + maha)

    return np.exp(logpdf)

def fast_iterative_scoring_scipy(real, guess, uncertainties):
    covariances = (np.identity(3) * (uncertainties ** 2)[:,None,None])
    score = np.sum(mvn_pdf(np.array(real), np.array(guess), covariances))
    return score

print(fast_iterative_scoring_scipy(real, guess, uncertainties))

On my machine (an i5-9600KF processor, NumPy 1.22.4 and SciPy 1.6.3), this is roughly 10 times faster for n=10 and roughly 50 times faster for n=100. The larger the input, the better the speedup.
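As a rough sanity check (not part of the original answer), the comparison can be reproduced with timeit using the functions defined above; absolute timings will of course differ between machines:

import timeit

for n in (10, 100):
    real, guess = create_points(n), create_points(n)
    uncertainties = np.random.randint(1, 100, size=n)
    # time the original per-point SciPy loop against the vectorized version
    t_slow = timeit.timeit(lambda: iterative_scoring_scipy(real, guess, uncertainties), number=100)
    t_fast = timeit.timeit(lambda: fast_iterative_scoring_scipy(real, guess, uncertainties), number=100)
    print(f"n={n}: iterative {t_slow:.4f}s, vectorized {t_fast:.4f}s, ~{t_slow / t_fast:.1f}x faster")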
Note that if you are sure the asserts always hold for your inputs, you can safely remove them. Without the asserts, this should be roughly twice as fast; a minimal sketch of such an assert-free variant follows.
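For example, because every covariance in the question is the isotropic matrix sigma**2 * I, the density factorizes and the eigendecomposition can be skipped entirely. This is only a sketch under that assumption, and mvn_pdf_isotropic / isotropic_scoring are illustrative names, not part of the answer above:

def mvn_pdf_isotropic(x, mean, sigma):
    # x, mean: (n, 3) arrays; sigma: (n,) standard deviations
    var = np.asarray(sigma, dtype=float) ** 2
    # squared Mahalanobis distance for an isotropic covariance
    maha = np.sum((np.asarray(x) - np.asarray(mean)) ** 2, axis=1) / var
    # log normalizing constant: -0.5 * (3*log(2*pi) + log(det(sigma**2 * I)))
    log_norm = -0.5 * (3 * np.log(2 * np.pi) + 3 * np.log(var))
    return np.exp(log_norm - 0.5 * maha)

def isotropic_scoring(real, guess, uncertainties):
    return np.sum(mvn_pdf_isotropic(real, guess, uncertainties))

print(isotropic_scoring(real, guess, uncertainties))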
