scipy python中具有缺失数据的二维卷积

j5fpnvbx  于 2022-11-09  发布在  Python
关注(0)|答案(3)|浏览(217)

我知道有scipy.signal.convolve2d函数可以处理2d numpy数组的二维卷积,还有numpy.ma模块可以处理丢失的数据,但这两种方法似乎并不兼容(这意味着即使你在numpy中屏蔽了一个二维数组,convolve2d中的进程不会受到影响)。有没有办法只使用numpy和scipy包来处理卷积中缺少的值?
例如:
所需的卷积结果(数组、内核、边界='wrap'):

-1  - -1 -1 4
               -1 -1  - -1 4
    Result =   -1 -1 -1  - 5
                - -1 -1  4 4
                1 -1 -1 -1 -

感谢Aguy的建议,这是一个很好的方法来帮助计算卷积后的结果。现在让我们假设我们可以从Array.mask中得到Array的掩码,这将给予的结果为

False True  False False False                       
                   False False True  False False
    Array.mask ==  False False False True  False
                   True  False False False False
                   False False False False True

如何使用此掩码将卷积后的结果转换为掩码数组?

kxxlusnw

kxxlusnw1#

我不认为用0来代替是正确的方法,你是在把卷积值推向0。这些缺失应该被视为“缺失”。因为它们代表了缺失的信息片段,没有理由假设它们可能是0,而且它们根本不应该参与任何计算。
我试着将缺失值设置为numpy.nan,然后进行卷积,结果表明,内核和任何缺失值之间的任何重叠都会在结果中给出nan,即使重叠是在内核中的0,所以在结果中会得到一个扩大的缺失孔。根据您的应用程序,这可能是您想要的结果。
但是在某些情况下,您不想因为一个单一的缺失而丢弃这么多的信息(也许〈= 50%的缺失仍然是可以接受的)。在这种情况下,我发现了另一个模块 astropy,它有更好的实现:numpy.nan s被忽略(或替换为插值?)。
因此,使用astropy时,您将执行以下操作:

from astropy.convolution import convolve
inarray=numpy.where(inarray.mask,numpy.nan,inarray) # masking still doesn't work, has to set to numpy.nan
result=convolve(inarray,kernel)

但是,您仍然无法控制允许多少丢失。为了实现这一点,我创建了一个函数,它使用scipy.ndimage.convolve()作为初始卷积,但每当涉及丢失(numpy.nan)时,都要手动重新计算值:

def convolve2d(slab,kernel,max_missing=0.5,verbose=True):
    '''2D convolution with missings ignored

    <slab>: 2d array. Input array to convolve. Can have numpy.nan or masked values.
    <kernel>: 2d array, convolution kernel, must have sizes as odd numbers.
    <max_missing>: float in (0,1), max percentage of missing in each convolution
                   window is tolerated before a missing is placed in the result.

    Return <result>: 2d array, convolution result. Missings are represented as
                     numpy.nans if they are in <slab>, or masked if they are masked
                     in <slab>.

    '''

    from scipy.ndimage import convolve as sciconvolve

    assert numpy.ndim(slab)==2, "<slab> needs to be 2D."
    assert numpy.ndim(kernel)==2, "<kernel> needs to be 2D."
    assert kernel.shape[0]%2==1 and kernel.shape[1]%2==1, "<kernel> shape needs to be an odd number."
    assert max_missing > 0 and max_missing < 1, "<max_missing> needs to be a float in (0,1)."

    #--------------Get mask for missings--------------
    if not hasattr(slab,'mask') and numpy.any(numpy.isnan(slab))==False:
        has_missing=False
        slab2=slab.copy()

    elif not hasattr(slab,'mask') and numpy.any(numpy.isnan(slab)):
        has_missing=True
        slabmask=numpy.where(numpy.isnan(slab),1,0)
        slab2=slab.copy()
        missing_as='nan'

    elif (slab.mask.size==1 and slab.mask==False) or numpy.any(slab.mask)==False:
        has_missing=False
        slab2=slab.copy()

    elif not (slab.mask.size==1 and slab.mask==False) and numpy.any(slab.mask):
        has_missing=True
        slabmask=numpy.where(slab.mask,1,0)
        slab2=numpy.where(slabmask==1,numpy.nan,slab)
        missing_as='mask'

    else:
        has_missing=False
        slab2=slab.copy()

    #--------------------No missing--------------------
    if not has_missing:
        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
    else:
        H,W=slab.shape
        hh=int((kernel.shape[0]-1)/2)  # half height
        hw=int((kernel.shape[1]-1)/2)  # half width
        min_valid=(1-max_missing)*kernel.shape[0]*kernel.shape[1]

        # dont forget to flip the kernel
        kernel_flip=kernel[::-1,::-1]

        result=sciconvolve(slab2,kernel,mode='constant',cval=0.)
        slab2=numpy.where(slabmask==1,0,slab2)

        #------------------Get nan holes------------------
        miss_idx=zip(*numpy.where(slabmask==1))

        if missing_as=='mask':
            mask=numpy.zeros([H,W])

        for yii,xii in miss_idx:

            #-------Recompute at each new nan in result-------
            hole_ys=range(max(0,yii-hh),min(H,yii+hh+1))
            hole_xs=range(max(0,xii-hw),min(W,xii+hw+1))

            for hi in hole_ys:
                for hj in hole_xs:
                    hi1=max(0,hi-hh)
                    hi2=min(H,hi+hh+1)
                    hj1=max(0,hj-hw)
                    hj2=min(W,hj+hw+1)

                    slab_window=slab2[hi1:hi2,hj1:hj2]
                    mask_window=slabmask[hi1:hi2,hj1:hj2]
                    kernel_ij=kernel_flip[max(0,hh-hi):min(hh*2+1,hh+H-hi), 
                                     max(0,hw-hj):min(hw*2+1,hw+W-hj)]
                    kernel_ij=numpy.where(mask_window==1,0,kernel_ij)

                    #----Fill with missing if not enough valid data----
                    ksum=numpy.sum(kernel_ij)
                    if ksum<min_valid:
                        if missing_as=='nan':
                            result[hi,hj]=numpy.nan
                        elif missing_as=='mask':
                            result[hi,hj]=0.
                            mask[hi,hj]=True
                    else:
                        result[hi,hj]=numpy.sum(slab_window*kernel_ij)

        if missing_as=='mask':
            result=numpy.ma.array(result)
            result.mask=mask

    return result

下图展示了输出结果。左边是一个30x30的随机Map,上面有3个numpy.nan的洞,大小为:
1.一对一

  1. 3 × 3
  2. 5 × 5

右边是卷积输出,采用5x5内核(全1),容差水平为50%(max_missing=0.5)。
因此,前2个较小的空洞使用附近的值填充,而在最后一个空洞中,因为缺失的数量〉0.5x5x5 = 12.5,所以放置numpy.nan来表示缺失的信息。

6gpjuf90

6gpjuf902#

我发现了一个黑客。而不是nan使用虚数(在那里是nan改变它为1i)运行卷积和设置,无论虚数值是高于一个阈值,它是nan。无论何时,它是以下只是采取真实的值。这里是一个代码片段:

frames_complex = np.zeros_like(frames_, dtype=np.complex64)
frames_complex[np.isnan(frames_)] = np.array((1j))
frames_complex[np.bitwise_not(np.isnan(frames_))] =                         
frames_[np.bitwise_not(np.isnan(frames_))]
convolution = signal.convolve(frames_complex, gaussian_window, 'valid')
convolution[np.imag(convolution)>0.2] = np.nan
convolution = convolution.astype(np.float32)
5jvtdoz2

5jvtdoz23#

基于the idea from Ilan Schvartzman in a previous answer的改进版本。此外,它可以补偿设置为0的缺失值(在真实的空间中),并支持归一化为np.sum(in2)。这两个值都可以分别通过参数correct_missingnorm进行调整。对于一维版本,只需将scipy.signal.convolve2d替换为scipy.signal.convolve即可。

import scipy.signal
import numpy as np

def masked_convolve2d(in1, in2, correct_missing=True, norm=True, valid_ratio=1./3., *args,**kwargs):
    """A workaround for np.ma.MaskedArray in scipy.signal.convolve. 
    It converts the masked values to complex values=1j. The complex space allows to set a limit
    for the imaginary convolution. The function use a ratio `valid_ratio` of np.sum(in2) to 
    set a lower limit on the imaginary part to mask the values.
    I.e. in1=[[1.,1.,--,--]] in2=[[1.,1.]] -> imaginary_part/sum(in2): [[1., 1., .5, 0.]]
    -> valid_ratio=.5 -> out:[[1., 1., .5, --]].
    PARAMETERS
    ---------
    in1 : array_like
        First input.
    in2 : array_like
        Second input. Should have the same number of dimensions as `in1`.
    correct_missing : bool, optional
        correct the value of the convolution as a sum over valid data only, 
        as masked values account 0 in the real space of the convolution.
    norm : bool, optional
        if the output should be normalized to np.sum(in2).
    valid_ratio: float, optional
        the upper limit of the imaginary convolution to mask values. Defined by the ratio of np.sum(in2).
    *args,**kwargs: optional
        parsed to scipy.signal.convolve(..., *args,**kwargs)
    """
    if not isinstance(in1, np.ma.MaskedArray):
        in1 = np.ma.array(in1)

    # np.complex128 -> stores real as np.float64
    con = scipy.signal.convolve2d(in1.astype(np.complex128).filled(fill_value=1j), 
                                  in2.astype(np.complex128), 
                                  *args,**kwargs
                                 )

    # split complex128 to two float64s
    con_imag = con.imag
    con = con.real
    mask = np.abs(con_imag/np.sum(in2)) > valid_ratio

    # con_east.real / (1. - con_east.imag): correction, to get the mean over all valid values
    # con_east.imag > percent: how many percent of the single convolution value have to be from valid values
    if correct_missing:
        correction = np.sum(in2) - con_imag
        con[correction!=0] *= np.sum(in2)/correction[correction!=0]

    if norm:
        con /= np.sum(in2)

    return np.ma.array(con, mask=mask)

示例

显示correct_missing与输入的差异的示例:

in1 = np.ones((1, 6))
in1[:, 4:] = 0
in1 = np.ma.masked_equal(in1, 0)

in2 = np.ones((1, 4))

in1
>>> array([[1.0, 1.0, 1.0, 1.0, --, --]])

correct_missing的掩码卷积为:

masked_convolve2d(in1, in2, correct_missing=True, mode='valid', norm=True)
>>> masked_array(data=[[1.0, 1.0, --]],
                 mask=[[False, False,  True]])

在不进行校正的情况下,如果要使用np.nan填充屏蔽值:

b = masked_convolve2d(in1, in2, correct_missing=False, mode='valid', norm=True)
b.filled(np.nan)
>>> array([[1.  , 0.75,  nan]]))

性能测试

我根据以下输入,对照Jason's codeconvolve2d)测试了我的版本(masked_convolve2d):

in1 = np.ones((10,6))
in1[:, 4:] = 0
in1 = np.ma.masked_equal(in1, 0)

in2 = np.ones((3,3))  # <kernel> shape needs to be an odd number for Jason's code

在我的计算机上有以下结果:

%timeit -n 10 -r 10 convolve2d(in1, in2)
>>> 2.44 ms ± 117 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

%timeit -n 10 -r 10 masked_convolve2d(in1, in2)
>>> 131 µs ± 19.1 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

相关问题