scipy: Gaussian sum filter for irregularly spaced points

jm2pwxwz · asked 2022-11-23

I have a set of points (x, y) given as two vectors x and y, for example:

from pylab import *
x = sorted(random(30))
y = random(30)
plot(x,y, 'o-')

Now I would like to smooth the data with a Gaussian and evaluate it only at certain (regularly spaced) points on the x-axis:

x_eval = linspace(0,1,11)

I was given the hint that this method is called a "Gaussian sum filter", but so far I have not found any implementation of it in numpy/scipy, even though at first glance this seems like a standard problem. Since the x values are not equally spaced, I cannot use scipy.ndimage.gaussian_filter1d.
Usually this kind of smoothing is done by going to Fourier space and multiplying by the kernel, but I don't know whether that is possible with irregularly spaced data.
Thanks for any suggestions.


xzv2uavs1#

This would be cumbersome for very large data sets, but the correct computation you are asking for is done as follows:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0) # for repeatability
x = np.random.rand(30)
x.sort()
y = np.random.rand(30)

x_eval = np.linspace(0, 1, 11)
sigma = 0.1

delta_x = x_eval[:, None] - x
weights = np.exp(-delta_x*delta_x / (2*sigma*sigma)) / (np.sqrt(2*np.pi) * sigma)
weights /= np.sum(weights, axis=1, keepdims=True)
y_eval = np.dot(weights, y)

plt.plot(x, y, 'bo-')
plt.plot(x_eval, y_eval, 'ro-')
plt.show()
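Regarding the caveat about very large data sets: the quadratic cost comes from computing every pairwise distance. One way around it is to truncate the Gaussian, summing only over data points within a few sigma of each evaluation point. A minimal sketch (the function name and the cutoff heuristic are mine, not from the answer above; assumes x is sorted):

```python
import numpy as np

def gaussian_sum_truncated(x, y, x_eval, sigma, cutoff=4.0):
    """Gaussian sum filter summing only over data points within
    cutoff * sigma of each evaluation point (x must be sorted)."""
    x = np.asarray(x)
    y = np.asarray(y)
    out = np.empty(len(x_eval))
    for i, xe in enumerate(x_eval):
        # Indices of the data points inside the truncation window
        lo = np.searchsorted(x, xe - cutoff * sigma)
        hi = np.searchsorted(x, xe + cutoff * sigma)
        w = np.exp(-0.5 * ((xe - x[lo:hi]) / sigma) ** 2)
        total = w.sum()
        # NaN when no data point falls inside the window
        out[i] = np.dot(w, y[lo:hi]) / total if total > 0 else np.nan
    return out
```

At cutoff=4 the neglected weights are below exp(-8), so the result is essentially identical to the full sum while each evaluation point only touches a local slice of the data.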


ctrmrzij2#

Before answering, I would say that this is a DSP question rather than a programming question...
...that said, there is a simple two-step solution to your problem.

Step 1: resample the data

To illustrate, we can create a random data set with unequal sampling:

import numpy as np
x = np.cumsum(np.random.randint(0,100,100))
y = np.random.normal(0,1,size=100)

This gives something like the following (plot not shown):

We can resample this data using simple linear interpolation:

nx = np.arange(x.max()) # choose new x axis sampling
ny = np.interp(nx,x,y) # generate y values for each x

This converts our data to a regularly sampled series (plot not shown).

Step 2: apply the filter

At this stage you can use some of the tools available in scipy to apply a Gaussian filter to the data with a given sigma value:

from scipy import ndimage  # scipy.ndimage.filters is deprecated
fx = ndimage.gaussian_filter1d(ny, sigma=100)

Plotting this against the original data (plot not shown):

The choice of the sigma value determines the width of the filter.
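One detail worth making explicit: gaussian_filter1d takes sigma in samples, not in x-units. In the snippet above the resampling grid has unit spacing, so the two coincide, but with a different resampling step you need to convert. A minimal sketch of both steps together (sigma_x and dx are illustrative values I chose, not from the answer):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1000, 200))   # irregular sample positions
y = rng.normal(size=200)                 # noisy values

sigma_x = 50.0   # desired smoothing width in x-units (illustrative)
dx = 0.5         # resampling step in x-units (illustrative)

nx = np.arange(x.min(), x.max(), dx)     # regular grid
ny = np.interp(nx, x, y)                 # step 1: linear resampling

# step 2: gaussian_filter1d expects sigma in samples, so convert
sigma_samples = sigma_x / dx
fx = ndimage.gaussian_filter1d(ny, sigma=sigma_samples)
```

Halving dx doubles the number of samples per x-unit, so sigma_samples must double to keep the same physical smoothing width.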


w46czmvw3#

Based on @Jaime's answer, I wrote a function that implements this with some additional documentation and the ability to discard estimates far from the data points.
I think confidence intervals could be obtained on this estimate by bootstrapping, but I haven't done this yet.

import numpy as np

def gaussian_sum_smooth(xdata, ydata, xeval, sigma, null_thresh=0.6):
    """Apply gaussian sum filter to data.
    
    xdata, ydata : array
        Arrays of x- and y-coordinates of data. 
        Must be 1d and have the same length.
    
    xeval : array
        Array of x-coordinates at which to evaluate the smoothed result
    
    sigma : float
        Standard deviation of the Gaussian to apply to each data point.
        Larger values yield a smoother curve.
    
    null_thresh : float
        For evaluation points far from data points, the estimate will be
        based on very little data. If the total weight is below this threshold,
        return np.nan at this location. Zero means always return an estimate.
        The default of 0.6 corresponds to approximately one sigma away 
        from the nearest datapoint.
    """
    # Distance between every combination of xdata and xeval
    # each row corresponds to a value in xeval
    # each col corresponds to a value in xdata
    delta_x = xeval[:, None] - xdata

    # Calculate weight of every value in delta_x using Gaussian
    # Maximum weight is 1.0 where delta_x is 0
    weights = np.exp(-0.5 * ((delta_x / sigma) ** 2))

    # Multiply each weight by every data point, and sum over data points
    smoothed = np.dot(weights, ydata)

    # Nullify the result when the total weight is below threshold
    # This happens at evaluation points far from any data
    # 1-sigma away from a data point has a weight of ~0.6
    nan_mask = weights.sum(1) < null_thresh
    smoothed[nan_mask] = np.nan
    
    # Normalize by dividing by the total weight at each evaluation point
    # Nullification above avoids a divide-by-zero warning here
    smoothed = smoothed / weights.sum(1)

    return smoothed
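To tie this back to the original question, here is a minimal usage sketch on data matching the question's setup; the core of the function is repeated in condensed form (docstring omitted) so the snippet runs standalone:

```python
import numpy as np

def gaussian_sum_smooth(xdata, ydata, xeval, sigma, null_thresh=0.6):
    # Condensed version of the function above
    delta_x = xeval[:, None] - xdata
    weights = np.exp(-0.5 * ((delta_x / sigma) ** 2))
    smoothed = np.dot(weights, ydata)
    smoothed[weights.sum(1) < null_thresh] = np.nan
    return smoothed / weights.sum(1)

# Data as in the question: 30 random points with irregular x spacing
rng = np.random.default_rng(0)
x = np.sort(rng.random(30))
y = rng.random(30)
x_eval = np.linspace(0, 1, 11)

y_eval = gaussian_sum_smooth(x, y, x_eval, sigma=0.1)
```

Each finite output value is a convex combination of the y data, so it stays within the range of y; entries of x_eval with total weight below null_thresh come back as NaN.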
