Python: problem with groupby() and apply() when applying a user-defined haversine function

olmpazwi  posted on 2021-08-20  in Java

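I am trying to compute distances for the dataset below with a haversine function I defined. The function works fine on other data. On this particular dataset, however, when I use groupby(df.index) I get an error:
cannot convert the series to <class 'float'>
I have used groupby() and apply() before without problems. I don't understand what is happening here or how to fix it.
Here is the data: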

latitude    longitude   datetime
356a192b7913b04c54574d18c28d46e6395428ab    57.723610   11.925191   2021-06-13 14:22:11.682
356a192b7913b04c54574d18c28d46e6395428ab    57.723614   11.925187   2021-06-13 14:22:13.562
356a192b7913b04c54574d18c28d46e6395428ab    57.723610   11.925172   2021-06-13 14:22:28.635
da4b9237bacccdf19c0760cab7aec4a8359010b0    57.723637   11.925056   2021-06-13 14:22:59.336
da4b9237bacccdf19c0760cab7aec4a8359010b0    57.724075   11.923708   2021-06-13 14:23:44.905
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723610   11.925191   2021-06-13 14:22:04.000
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723614   11.925178   2021-06-13 14:22:44.170
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723827   11.924635   2021-06-13 14:23:14.479
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723866   11.924005   2021-06-13 14:23:29.605

Here is the code:

df2 = pd.concat([df.add_suffix('_pre').shift(), trips], axis=1)
df2

>>

                                           latitude_pre longitude_pre   datetime_pre    latitude    longitude   datetime
356a192b7913b04c54574d18c28d46e6395428ab            NaN         NaN                  NaT    57.723610   11.925191   2021-06-13 14:22:11.682
356a192b7913b04c54574d18c28d46e6395428ab    57.723610   11.925191   2021-06-13 14:22:11.682 57.723614   11.925187   2021-06-13 14:22:13.562
356a192b7913b04c54574d18c28d46e6395428ab    57.723614   11.925187   2021-06-13 14:22:13.562 57.723610   11.925172   2021-06-13 14:22:28.635
da4b9237bacccdf19c0760cab7aec4a8359010b0    57.723610   11.925172   2021-06-13 14:22:28.635 57.723637   11.925056   2021-06-13 14:22:59.336
da4b9237bacccdf19c0760cab7aec4a8359010b0    57.723637   11.925056   2021-06-13 14:22:59.336 57.724075   11.923708   2021-06-13 14:23:44.905
77de68daecd823babbb58edb1c8e14d7106e83bb    57.724075   11.923708   2021-06-13 14:23:44.905 57.723610   11.925191   2021-06-13 14:22:04.000
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723610   11.925191   2021-06-13 14:22:04.000 57.723614   11.925178   2021-06-13 14:22:44.170
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723614   11.925178   2021-06-13 14:22:44.170 57.723827   11.924635   2021-06-13 14:23:14.479
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723827   11.924635   2021-06-13 14:23:14.479 57.723866   11.924005   2021-06-13 14:23:29.605

df2.groupby(df2.index).apply(lambda x: haversine(x['latitude_pre'], x['longitude_pre'], x['latitude'], x['longitude']))

>>
cannot convert the series to <class 'float'>

In case it is needed, here is haversine():

import math
from math import radians

def haversine(lat1, lon1, lat2, lon2):
    R = 6373.0 * 1000 # Earth's radius (in m)

    dlon = radians(lon2) - radians(lon1)
    dlat = radians(lat2) - radians(lat1)

    a = math.sin(dlat / 2)**2 + math.cos(radians(lat1)) * math.cos(radians(lat2)) * math.sin(dlon / 2)**2
    return R *2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

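The _pre columns are needed because I am iterating over point coordinates that live in the same columns. The shift is applied because the first point has no previous point to measure a distance from.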
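Note that in the df2 output above, a plain shift() carries the last point of one id into the first row of the next id. A minimal sketch of shifting within each id instead, assuming the index holds the id as in the data shown; the small frame and its labels "a" and "b" are made up for illustration:

```python
import pandas as pd

# Hypothetical miniature of the data: the index is the id of each track.
df = pd.DataFrame(
    {
        "latitude": [57.723610, 57.723614, 57.723637, 57.724075],
        "longitude": [11.925191, 11.925187, 11.925056, 11.923708],
    },
    index=["a", "a", "b", "b"],
)

# Shift inside each id, so the first row of every id gets NaN
# rather than the last point of the previous id.
groups = df.groupby(level=0)
df2 = df.copy()
# to_numpy() sidesteps label alignment, which is fragile with a duplicated index.
df2["latitude_pre"] = groups["latitude"].shift().to_numpy()
df2["longitude_pre"] = groups["longitude"].shift().to_numpy()

print(df2)
```

With this layout, every id starts with a missing previous point, so no distance is computed across two different ids.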
Edit:
I tried converting the datetime column from datetime to epoch, but the error persists. At this point all columns are of type float.
To convert it to epoch I used:

import datetime as dt

df['datetime'] = (df['datetime'] - dt.datetime(1970,1,1)).dt.total_seconds()

I also tried:

shift(fill_value=0)

and got the same error.

Answer 1 (oxalkeyp)

If you add print(lat1) to the haversine function, it prints the following:

356a192b7913b04c54574d18c28d46e6395428ab          NaN
356a192b7913b04c54574d18c28d46e6395428ab    57.723610
356a192b7913b04c54574d18c28d46e6395428ab    57.723614
Name: latitude_pre, dtype: float64
356a192b7913b04c54574d18c28d46e6395428ab          NaN
356a192b7913b04c54574d18c28d46e6395428ab    57.723610
356a192b7913b04c54574d18c28d46e6395428ab    57.723614
Name: latitude_pre, dtype: float64
356a192b7913b04c54574d18c28d46e6395428ab          NaN
356a192b7913b04c54574d18c28d46e6395428ab    57.723610
356a192b7913b04c54574d18c28d46e6395428ab    57.723614
Name: latitude_pre, dtype: float64
356a192b7913b04c54574d18c28d46e6395428ab          NaN
356a192b7913b04c54574d18c28d46e6395428ab    57.723610
356a192b7913b04c54574d18c28d46e6395428ab    57.723614
Name: latitude_pre, dtype: float64

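The "value" of lat1 is a Series, not a single value. Is that what you intended? It isn't clear what result you are after, but I believe that is where the error comes from: radians() and the other math functions accept only a single number, so passing them a whole Series fails.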
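One possible way around this, sketched here under assumptions rather than taken from the question, is to replace the math calls with their NumPy counterparts, which operate element-wise on a Series. The name haversine_np and the small df2 frame are hypothetical; the Earth radius is the one used in the question.

```python
import math

import numpy as np
import pandas as pd


def haversine_np(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; accepts scalars, arrays or Series."""
    R = 6373.0 * 1000  # Earth's radius in m, same value as in the question
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return R * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


# Hypothetical miniature of df2 (first id only), for illustration.
df2 = pd.DataFrame(
    {
        "latitude_pre": [np.nan, 57.723610, 57.723614],
        "longitude_pre": [np.nan, 11.925191, 11.925187],
        "latitude": [57.723610, 57.723614, 57.723610],
        "longitude": [11.925191, 11.925187, 11.925172],
    },
    index=["356a"] * 3,
)

# The math module works on one number at a time, so a Series is rejected.
err = None
try:
    math.radians(df2["latitude"])
except TypeError as exc:
    err = exc

# The NumPy version handles whole columns; no groupby/apply is needed
# once the _pre columns are in place. Rows with NaN inputs stay NaN.
df2["distance_m"] = haversine_np(
    df2["latitude_pre"], df2["longitude_pre"], df2["latitude"], df2["longitude"]
)
print(df2["distance_m"])
```

If a per-row call to the original haversine() is preferred, df2.apply(..., axis=1) would pass scalars instead of Series, at the cost of being slower on large frames.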
