Vectorizing multivariate normal distribution computations with SciPy

kqhtkvqz · asked 2023-03-30

I have n points in 3D space, each with a corresponding guess and an uncertainty. I want to evaluate the multivariate normal density for each point given its guess and uncertainty. Currently I use an iterative approach with SciPy's scipy.stats.multivariate_normal, as shown in the snippet below:

import numpy as np
from scipy.stats import multivariate_normal

n = 10

def create_points(n):
    return np.random.randint(0, 1000, size=(n, 3))

real = create_points(n)
guess = create_points(n)
uncertainties = np.random.randint(1, 100, size=n)

def iterative_scoring_scipy(real, guess, uncertainties):
    score = 0.0
    # one isotropic covariance matrix (uncertainty**2 * I) per point
    covariances = [
        np.diag([uncertainties[i]**2]*3)
        for i in range(len(real))
    ]
    for i in range(len(real)):
        score += multivariate_normal.pdf(real[i], mean=guess[i], cov=covariances[i])

    return score

print(iterative_scoring_scipy(real, guess, uncertainties))

And here is an attempt that uses plain NumPy instead of SciPy:

def iterative_scoring_numpy(real, guess, uncertainties):
    score = 0.0
    for i in range(len(real)):
        # calculate the covariance matrix
        cov = np.diag([uncertainties[i]**2]*3)

        # calculate the determinant and inverse of the covariance matrix
        det = np.linalg.det(cov)
        inv = np.linalg.inv(cov)

        # calculate the difference between the real and guess points
        diff = real[i] - guess[i]

        # calculate the exponent
        exponent = -0.5 * np.dot(np.dot(diff, inv), diff)

        # calculate the constant factor
        const = 1 / np.sqrt((2 * np.pi)**3 * det)

        # calculate the probability density function value
        pdf = const * np.exp(exponent)

        # add the probability density function value to the score
        score += pdf

    return score

print(iterative_scoring_numpy(real, guess, uncertainties))

However, both approaches are slow, and I am looking for a way to speed them up using vectorization. How can I vectorize either snippet to make it faster?

oxosxuxt · answer #1

multivariate_normal.pdf does not appear to be vectorized over a batch of distributions. Even if it were, its code is very generic and therefore inefficient: it performs many redundant checks, calls functions that support N-D arrays, and handles corner cases that cannot occur with your inputs. Your second attempt is a good one, but the NumPy functions operate on small vectors/matrices when they could operate on the whole dataset at once. In short, the entire loop can be vectorized. I chose to read the SciPy code and write my own vectorized implementation of multivariate_normal.pdf that only supports your kind of input (i.e. 3D points). Here is the result:

# [...] same as in the question

# Vectorized equivalent to multivariate_normal.pdf
def mvn_pdf(x, mean, cov):
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)

    assert x.ndim == 2 and x.shape[1] == 3
    assert mean.ndim == 2 and mean.shape[1] == 3
    assert cov.ndim == 3 and cov.shape[1:] == (3, 3)

    s, u = np.linalg.eigh(cov)
    eps = 2.22e-10 * np.max(np.abs(s), axis=1)

    # Each covariance matrix must be symmetric positive definite
    assert np.all(abs(s) > eps[:, None])

    # The rest of the code is unsafe if this does not hold;
    # it also enables further optimizations
    assert np.allclose(u, np.identity(3))

    log_pdet = np.sum(np.log(s), axis=1)

    maha = np.sum((x - mean)**2 / s, axis=1)
    log_2pi = np.log(2 * np.pi)
    logpdf = -0.5 * (3 * log_2pi + log_pdet + maha)

    return np.exp(logpdf)

def fast_iterative_scoring_scipy(real, guess, uncertainties):
    covariances = (np.identity(3) * (uncertainties ** 2)[:,None,None])
    score = np.sum(mvn_pdf(np.array(real), np.array(guess), covariances))
    return score

print(fast_iterative_scoring_scipy(real, guess, uncertainties))

On my machine (an i5-9600KF processor, NumPy 1.22.4 and SciPy 1.6.3), this is roughly 10 times faster for n=10 and roughly 50 times faster for n=100. The larger the input, the better the speedup.
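As a rough sanity check (not part of the original answer), the comparison can be reproduced with timeit using the functions defined above; absolute timings will of course differ between machines:

import timeit

for n in (10, 100):
    real, guess = create_points(n), create_points(n)
    uncertainties = np.random.randint(1, 100, size=n)
    # time the original per-point SciPy loop against the vectorized version
    t_slow = timeit.timeit(lambda: iterative_scoring_scipy(real, guess, uncertainties), number=100)
    t_fast = timeit.timeit(lambda: fast_iterative_scoring_scipy(real, guess, uncertainties), number=100)
    print(f"n={n}: iterative {t_slow:.4f}s, vectorized {t_fast:.4f}s, ~{t_slow / t_fast:.1f}x faster")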
Note that if you are sure the asserts always hold for your inputs, you can safely remove them. Without the asserts, this should be roughly twice as fast; a minimal sketch of such an assert-free variant follows.
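For example, because every covariance in the question is the isotropic matrix sigma**2 * I, the density factorizes and the eigendecomposition can be skipped entirely. This is only a sketch under that assumption, and mvn_pdf_isotropic / isotropic_scoring are illustrative names, not part of the answer above:

def mvn_pdf_isotropic(x, mean, sigma):
    # x, mean: (n, 3) arrays; sigma: (n,) standard deviations
    var = np.asarray(sigma, dtype=float) ** 2
    # squared Mahalanobis distance for an isotropic covariance
    maha = np.sum((np.asarray(x) - np.asarray(mean)) ** 2, axis=1) / var
    # log normalizing constant: -0.5 * (3*log(2*pi) + log(det(sigma**2 * I)))
    log_norm = -0.5 * (3 * np.log(2 * np.pi) + 3 * np.log(var))
    return np.exp(log_norm - 0.5 * maha)

def isotropic_scoring(real, guess, uncertainties):
    return np.sum(mvn_pdf_isotropic(real, guess, uncertainties))

print(isotropic_scoring(real, guess, uncertainties))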
