The best algorithm to get the index within a specific range in a pandas DataFrame

Posted by zbdgwd5y on 2023-09-29 in Other

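I need to determine whether a driver is speeding. GPS positions are uploaded every second from a GPS device installed in the driver's vehicle, in the following form.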

[(37.165224, 127.2354123), ... ,(37.123456, 127.123456)]

In addition, speed limit information for each road is available in a grid format, as follows.

(MinX, MaxX, MinY, MaxY, Speed Limit)
[37.123456, 37.123458, 127.123456, 127.123458, 80]
[37.123457, 37.123458, 127.123457, 127.123459, 70]
...

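The grid cells overlap in places, which is caused by overpasses.
Therefore, if the driver's position falls inside more than one grid cell, the highest of the corresponding speed limits is used as the speed limit.
For example, with the data above, a driver at (37.123457, 127.123457) falls inside both cells, so the limit is either 80 or 70; under the highest-limit rule, the speed limit is taken to be 80.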
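The highest-limit rule can be sketched in plain Python using the two sample cells from the question. The function name `max_speed_limit` and the `default=-1` sentinel are illustrative, and the cells are assumed to be half-open (min <= value < max), which matches the comparisons in the question's own code.

```python
# The two sample cells from the question: (MinX, MaxX, MinY, MaxY, speed limit).
GRID = [
    (37.123456, 37.123458, 127.123456, 127.123458, 80),
    (37.123457, 37.123458, 127.123457, 127.123459, 70),
]


def max_speed_limit(x, y, grid=GRID, default=-1):
    """Return the highest limit among the cells containing (x, y).

    Assumption: cells are half-open, i.e. min <= value < max.
    Returns `default` when no cell contains the point.
    """
    limits = [
        limit
        for min_x, max_x, min_y, max_y, limit in grid
        if min_x <= x < max_x and min_y <= y < max_y
    ]
    return max(limits, default=default)


print(max_speed_limit(37.123457, 127.123457))  # point inside both cells
print(max_speed_limit(37.0, 127.0))            # point outside every cell
```

This is a linear scan, so it only illustrates the rule; it is not meant to be fast on 10 million cells.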
I decided to implement this as a daily batch job and wrote it with a pandas DataFrame, as follows.

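import pickle

# Load the grid master data from a pickle file.
with open("/path/file.pickle", "rb") as f:
  masters: list = pickle.load(f)

def func_max_spd_iterrow(lon, lat, master):
  """Return the highest limit among the grid cells containing the point, or -1."""
  limit_spd = -1
  # Points outside the covered latitude band have no grid cell.
  if not (33.0 <= lat <= 39.0):
    return limit_spd
  # Boolean mask over every row: cells are half-open (MIN <= value < MAX).
  locs = master.loc[(master['MINY'] <= lat) & (master['MAXY'] > lat)
                    & (master['MINX'] <= lon) & (master['MAXX'] > lon)]
  if locs.size != 0:
    limit_spd = locs['LIMIT_SPD'].max()
  return limit_spd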

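As shown above, the grid information is loaded with pickle and contains about 10 million rows. As a result, analyzing just one driver's 2,000 seconds of driving records takes about 12 seconds.
This makes analyzing 3,000 trips extremely time-consuming: if each trip is of similar length, that is roughly 36,000 seconds, or about 10 hours.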

Is there a suitable algorithm for this problem, or a pandas DataFrame method that can speed up the lookup?

pandas

Source: https://stackoverflow.com/questions/77109105/the-best-algorithm-to-get-index-with-specific-range-at-pandas-dataframe

1 Answer

bgtovc5b1#

# Convert your grid information into a GeoDataFrame:
import geopandas as gpd
from shapely.geometry import Polygon

# grid_info is assumed to be a list of (minX, maxX, minY, maxY, limit) rows.
# Build one rectangular polygon per row and keep the limit as a column.
gdf = gpd.GeoDataFrame(
    {'Speed Limit': [row[4] for row in grid_info]},
    geometry=gpd.GeoSeries([Polygon([(minX, minY), (maxX, minY), (maxX, maxY), (minX, maxY)])
                            for minX, maxX, minY, maxY, _ in grid_info]))

Then,

#Create a spatial index for the GeoDataFrame:
gdf.sindex
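
To illustrate why a spatial index helps, here is a minimal standard-library sketch of a much simpler scheme than the tree-based index geopandas provides: a uniform bucket grid. Everything in it is hypothetical (the names `build_index` and `lookup`, and the bucket size `CELL`, which is a tuning assumption). The idea is that each lookup only tests the few rectangles registered in the point's bucket instead of scanning all rows.

```python
import math
from collections import defaultdict

CELL = 0.001  # bucket size in degrees; a tuning assumption, not from the source


def _bucket(value):
    """Integer bucket coordinate for a coordinate value."""
    return math.floor(value / CELL)


def build_index(grid):
    """Map each bucket (ix, iy) to the grid rows whose rectangle overlaps it."""
    index = defaultdict(list)
    for row in grid:
        min_x, max_x, min_y, max_y, _limit = row
        for ix in range(_bucket(min_x), _bucket(max_x) + 1):
            for iy in range(_bucket(min_y), _bucket(max_y) + 1):
                index[(ix, iy)].append(row)
    return index


def lookup(index, x, y, default=-1):
    """Highest limit among indexed cells containing (x, y), else `default`."""
    # Only rows registered in the point's bucket can contain the point.
    candidates = index.get((_bucket(x), _bucket(y)), [])
    limits = [
        limit
        for min_x, max_x, min_y, max_y, limit in candidates
        if min_x <= x < max_x and min_y <= y < max_y  # half-open cells
    ]
    return max(limits, default=default)


# The two sample cells from the question.
grid = [
    (37.123456, 37.123458, 127.123456, 127.123458, 80),
    (37.123457, 37.123458, 127.123457, 127.123459, 70),
]
index = build_index(grid)
print(lookup(index, 37.123457, 127.123457))
```

The index is built once per batch and reused for every GPS point, so the build cost is spread across all lookups. A bucket size much smaller than the typical rectangle would register each rectangle in many buckets and inflate memory use.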

Then,

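# Iterate through your GPS points and perform spatial queries:
from shapely.geometry import Point

def find_speed_limit(lon, lat):
    # The arguments are matched against the grid's X and Y ranges, in that order.
    point = Point(lon, lat)
    # The index returns candidates by bounding box; each is then checked exactly.
    possible_matches_index = list(gdf.sindex.intersection(point.bounds))
    possible_matches = gdf.iloc[possible_matches_index]
    limits = [row['Speed Limit']
              for idx, row in possible_matches.iterrows()
              if point.within(row['geometry'])]
    # Overlapping cells: the highest limit wins, as the question requires.
    return max(limits, default=-1)  # -1: no matching grid cell found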

Example usage:

speed_limit = find_speed_limit(37.165224, 127.2354123)
print(speed_limit)

Answered 2023-09-29
