scipy: Efficiently find indices of nearest points on non-rectangular 2D grid

Posted by xwmevbvl on 2022-12-23 in Other


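I have an irregular (non-rectangular) lon/lat grid and a set of points in lon/lat coordinates. Each point should correspond to a point on the grid, though it may be slightly off for numerical reasons. I need the grid indices of the corresponding lon/lat points.
I have written a function that does this, but it is really slow.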

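import numpy as np

def find_indices(lon,lat,x,y):
    # Stack lon and lat into an array of shape (rows, cols, 2)
    lonlat = np.dstack([lon,lat])
    delta = np.abs(lonlat-[x,y])
    # Flat index of the grid point with the smallest Euclidean distance to (x, y)
    ij_1d = np.linalg.norm(delta,axis=2).argmin()
    i,j = np.unravel_index(ij_1d,lon.shape)
    return i,j

ind = [find_indices(lon,lat,*p) for p in points]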

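I'm pretty sure numpy/scipy offers a better (faster) solution. I have googled quite a lot, but so far I have not found the answer.
Any suggestions on how to efficiently find the indices of the corresponding (nearest) points?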

**PS:** This question arose from another one.

scipy

Source: https://stackoverflow.com/questions/32909087/efficiently-find-indices-of-nearest-points-on-non-rectangular-2d-grid

3 Answers


mwg9r5ms1#

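If the points are sufficiently localized, you can try scipy.spatial's cKDTree implementation directly, as I discussed in another post. That post was about interpolation, but you can ignore that and use only the query part.
TL;DR version:
Read the documentation for scipy.spatial.cKDTree. Create the tree by passing a NumPy ndarray of shape (n, m) to the initializer; the tree is built from the n m-dimensional coordinates.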

tree = scipy.spatial.cKDTree(array_of_coordinates)

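Then use tree.query() to retrieve the k nearest neighbors (optionally with approximation and parallelization, see the documentation), or use tree.query_ball_point() to find all neighbors within a given distance tolerance.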
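As a minimal sketch of both query styles, assuming a small made-up sheared grid (the grid, the perturbation size, and the tolerance `r` are illustrative choices, not values from the question):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical non-rectangular grid: a regular 4x5 grid sheared in both directions
jj, ii = np.meshgrid(np.arange(5.0), np.arange(4.0))  # both have shape (4, 5)
lon = jj + 0.3 * ii
lat = ii + 0.1 * jj

# One row per grid node, in row-major order: shape (20, 2)
coords = np.column_stack((lon.ravel(), lat.ravel()))
tree = cKDTree(coords)

# Query points: two grid nodes nudged slightly to mimic numerical noise
points = np.array([[lon[2, 3] + 1e-6, lat[2, 3] - 1e-6],
                   [lon[0, 1] - 1e-6, lat[0, 1] + 1e-6]])

# Nearest node for every point; flat_idx indexes into the raveled grid
dist, flat_idx = tree.query(points, k=1)
rows, cols = np.unravel_index(flat_idx, lon.shape)

# Alternatively, all nodes within a distance tolerance of each point
near = tree.query_ball_point(points, r=1e-3)
```

Here `rows` and `cols` recover the original grid indices of the perturbed nodes, and `near` holds one list of flat indices per query point. Because the tree is built from `ravel()` output, `np.unravel_index` must be given the same grid shape.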
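If the points are not well localized, and spherical curvature or non-trivial topology comes into play, you can try breaking the manifold into multiple parts, each small enough to be considered local.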
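A different technique from the splitting suggested above, and not part of the original answer, is to map lon/lat to 3D Cartesian coordinates on the unit sphere before building the tree. Straight-line (chord) distance grows monotonically with great-circle distance, so the nearest neighbor in 3D is also the nearest on the sphere. The sketch below assumes coordinates in degrees; the helper name `lonlat_to_xyz` and the sample grid are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def lonlat_to_xyz(lon_deg, lat_deg):
    """Map 1-D arrays of lon/lat in degrees to points on the unit sphere."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

# Hypothetical grid straddling the antimeridian (lon jumps from 179 to -180)
lon = np.array([[178.0, 179.0, -180.0, -179.0],
                [178.0, 179.0, -180.0, -179.0]])
lat = np.array([[10.0, 10.0, 10.0, 10.0],
                [11.0, 11.0, 11.0, 11.0]])

tree = cKDTree(lonlat_to_xyz(lon.ravel(), lat.ravel()))

# lon = 179.99 is 0.01 degrees from the node at lon = -180 on the sphere,
# although the two values are far apart as plain numbers
query_xyz = lonlat_to_xyz(np.array([179.99]), np.array([10.0]))
_, flat_idx = tree.query(query_xyz, k=1)
i, j = np.unravel_index(flat_idx, lon.shape)
```

With this mapping the query lands on the node at lon = -180, whereas a tree built on raw lon/lat values would pick the node at lon = 179.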


ttcibm8c2#

Here's a generic vectorized approach using scipy.spatial.distance.cdist:

import numpy as np
import scipy.spatial.distance

# Stack lon and lat arrays as columns to form an Nx2 array, where N is the number of grid points
lonlat = np.column_stack((lon.ravel(),lat.ravel()))

# Get the distances and take the argmin over all N grid points for each query point
idx = scipy.spatial.distance.cdist(lonlat,points).argmin(0)

# Convert the flat indices to indices matching the grid's shape as the final output
ind = np.column_stack((np.unravel_index(idx,lon.shape))).tolist()

Sample run:

In [161]: lon
Out[161]: 
array([[-11.   ,  -7.82 ,  -4.52 ,  -1.18 ,   2.19 ],
       [-12.   ,  -8.65 ,  -5.21 ,  -1.71 ,   1.81 ],
       [-13.   ,  -9.53 ,  -5.94 ,  -2.29 ,   1.41 ],
       [-14.1  ,  -0.04 ,  -6.74 ,  -2.91 ,   0.976]])

In [162]: lat
Out[162]: 
array([[-11.2  ,  -7.82 ,  -4.51 ,  -1.18 ,   2.19 ],
       [-12.   ,  -8.63 ,  -5.27 ,  -1.71 ,   1.81 ],
       [-13.2  ,  -9.52 ,  -5.96 ,  -2.29 ,   1.41 ],
       [-14.3  ,  -0.06 ,  -6.75 ,  -2.91 ,   0.973]])

In [163]: lonlat = np.column_stack((lon.ravel(),lat.ravel()))

In [164]: idx = scipy.spatial.distance.cdist(lonlat,points).argmin(0)

In [165]: np.column_stack((np.unravel_index(idx,lon.shape))).tolist()
Out[165]: [[0, 4], [0, 4], [0, 4], [0, 4], [0, 4], [0, 4], [3, 3]]

Runtime test
Function definitions:

def find_indices(lon,lat,x,y):
    lonlat = np.dstack([lon,lat])
    delta = np.abs(lonlat-[x,y])
    ij_1d = np.linalg.norm(delta,axis=2).argmin()
    i,j = np.unravel_index(ij_1d,lon.shape)
    return i,j

def loopy_app(lon,lat,pts):
    return [find_indices(lon,lat,pts[i,0],pts[i,1]) for i in range(pts.shape[0])]

def vectorized_app(lon,lat,points):
    lonlat = np.column_stack((lon.ravel(),lat.ravel()))
    idx = scipy.spatial.distance.cdist(lonlat,points).argmin(0)
    return np.column_stack((np.unravel_index(idx,lon.shape))).tolist()

Timings:

In [179]: lon = np.random.rand(100,100)

In [180]: lat = np.random.rand(100,100)

In [181]: points = np.random.rand(50,2)

In [182]: %timeit loopy_app(lon,lat,points)
10 loops, best of 3: 47 ms per loop

In [183]: %timeit vectorized_app(lon,lat,points)
10 loops, best of 3: 16.6 ms per loop

To squeeze out more performance, np.concatenate could be used in place of np.column_stack.
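A minimal sketch of that substitution, assuming the same random 100x100 inputs as in the timings. It only checks that both calls produce the same array; whether `np.concatenate` is actually faster on a given machine would need to be measured, for example with `%timeit`.

```python
import numpy as np

lon = np.random.rand(100, 100)
lat = np.random.rand(100, 100)

# Original formulation: stack two 1-D arrays as columns
stacked = np.column_stack((lon.ravel(), lat.ravel()))

# Equivalent formulation: reshape each array to a column, then join along axis 1
concatenated = np.concatenate((lon.reshape(-1, 1), lat.reshape(-1, 1)), axis=1)

same = np.array_equal(stacked, concatenated)
```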


j8yoct9x3#

Based on @Cong Ma's answer, I found the following solution:

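import numpy as np
import scipy as sp
import scipy.spatial

def find_indices(points,lon,lat,tree=None):
    if tree is None:
        # Build the tree on the fly from the transposed grid
        lon,lat = lon.T,lat.T
        lonlat = np.column_stack((lon.ravel(),lat.ravel()))
        tree = sp.spatial.cKDTree(lonlat)
    dist,idx = tree.query(points,k=1)
    # With a prebuilt tree, lon is used only for its shape, which must match
    # the layout of the array the tree was built from
    ind = np.column_stack(np.unravel_index(idx,lon.shape))
    return [(i,j) for i,j in ind]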

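To put this solution and the one from Divakar's answer in perspective, here are some timings of the function in which I use find_indices (and in which it is the speed bottleneck):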

spatial_contour_frequency/pil0                :   331.9553
spatial_contour_frequency/pil1                :   104.5771
spatial_contour_frequency/pil2                :     2.3629
spatial_contour_frequency/pil3                :     0.3287

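pil0 is my initial approach, pil1 is Divakar's, and pil2/pil3 are the final solution above. In pil2 the tree is created on the fly (that is, on every iteration of the loop that calls find_indices), while in pil3 it is created only once (see the other thread for details). Divakar's refinement of my initial approach already gives a 3x speedup, but cKDTree takes this to a whole new level with another 50x speedup. Moving the creation of the tree out of the function makes things faster still.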
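The build-once pattern behind pil3 can be sketched as follows. This is an illustrative variant, not the code that produced the timings: the helper `build_tree`, the `grid_shape` argument, and the random test data are assumptions, and the grid is flattened in plain row-major order (without the transpose used above) so that the tree layout and the shape passed to `np.unravel_index` always agree.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_tree(lon, lat):
    """Build a KD-tree over the grid nodes, flattened in row-major order."""
    return cKDTree(np.column_stack((lon.ravel(), lat.ravel())))

def find_indices(points, tree, grid_shape):
    """Return the (i, j) grid indices of the node nearest to each point."""
    _, flat_idx = tree.query(points, k=1)
    i, j = np.unravel_index(flat_idx, grid_shape)
    return list(zip(i.tolist(), j.tolist()))

rng = np.random.default_rng(0)
lon = rng.random((100, 100))
lat = rng.random((100, 100))

# Build the tree once, outside the loop that performs the lookups
tree = build_tree(lon, lat)

results = []
expected = []
for _ in range(3):
    # Pick 50 known grid nodes and look them up by their coordinates
    ij = rng.integers(0, 100, size=(50, 2))
    points = np.column_stack((lon[ij[:, 0], ij[:, 1]], lat[ij[:, 0], ij[:, 1]]))
    results.append(find_indices(points, tree, lon.shape))
    expected.append([tuple(row) for row in ij.tolist()])
```

Passing the tree and the grid shape explicitly keeps the expensive construction step out of the per-call path, which is the difference between pil2 and pil3 in the timings.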
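This answer was posted as an edit to the question "Efficiently find indices of nearest points on non-rectangular 2D grid" by the OP flotzilla under CC BY-SA 3.0.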
