How to resolve "ValueError: cannot reindex on an axis with duplicate labels" when working with time series data in pandas

Posted by falq053o on 2023-04-04 in Other


                         cube  timestamp                 temp
timestamp
2022-08-01 00:15:05.135  A1    2022-08-01 00:15:05.135    NaN
2022-08-01 00:15:37.255  A1    2022-08-01 00:15:37.255  23.17
2022-08-01 00:23:05.139  A1    2022-08-01 00:23:05.139    NaN
2022-08-01 00:23:15.137  A1    2022-08-01 00:23:15.137    NaN
2022-08-11 11:33:20.738  P19   2022-08-11 00:15:05.135    NaN

I am trying to interpolate the NaN values of temp based on the timestamp, separately for each cube, using the code below:

idata.set_index(idata['timestamp'], inplace=True)

idata['temp'] = idata.groupby('cube')['temp'].apply(lambda x: x.interpolate(method="time", limit_direction="both"))

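When I run this code, I get "ValueError: cannot reindex on an axis with duplicate labels". I cannot drop the duplicate labels (timestamps), because they may belong to different cubes. Please suggest an alternative way to handle this situation.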

pandas

Source: https://stackoverflow.com/questions/73324491/how-to-resolve-valueerror-cannot-reindex-on-an-axis-with-duplicate-labels-err

2 answers


Answer #1 (8dtrkrch)

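The column you want to use as the index probably contains duplicate values. Index labels must be unique for pandas to reindex on them.
You can find the duplicates with df['timestamp'].duplicated().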
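A small sketch of that check, run on a hypothetical frame where two cubes share one timestamp (the data are illustrative, not taken from the question). Passing `keep=False` flags every copy of a repeated timestamp, whereas the default `keep='first'` leaves the first occurrence unflagged. Checking the `(cube, timestamp)` pair shows whether any timestamp is actually repeated within a single cube.

```python
import pandas as pd

# Hypothetical frame: the same timestamp appears for two different cubes.
df = pd.DataFrame({
    "cube": ["A1", "P19", "A1"],
    "timestamp": pd.to_datetime(
        ["2022-08-01 00:15:05.135"] * 2 + ["2022-08-01 00:15:37.255"]
    ),
    "temp": [float("nan"), 21.0, 23.17],
})

# keep=False marks every row whose timestamp occurs more than once.
dup_ts = df["timestamp"].duplicated(keep=False)
print(df[dup_ts])

# Duplicates within one cube: the same (cube, timestamp) pair repeated.
dup_pair = df.duplicated(subset=["cube", "timestamp"], keep=False)
print(dup_pair.any())
```

If `dup_pair.any()` is False, the timestamps are only duplicated across cubes, which is the situation the question describes.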

2023-04-04

Answer #2 (mwg9r5ms)

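I think the problem comes from setting the duplicated index first and only then doing the groupby. Instead, I suggest grouping by cube first and then interpolating within each group: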

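import pandas as pd

def interp_group(g):
    # Index this cube's rows by timestamp (unique within a single cube)
    g = g.set_index('timestamp')
    g['temp'] = g['temp'].interpolate(method="time", limit_direction="both")
    return g

# df has 'timestamp' as an ordinary column here, not yet as the index
cubes = df.groupby('cube')
interpolated = cubes.apply(interp_group)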

You get a DataFrame with a MultiIndex: the group (cube) as the first level and the timestamp as the second. The temp column is interpolated as you wanted:

In [36]: interpolated
Out[36]:
                              cube   temp
cube timestamp
  A1 2022-08-01 00:15:05.135    A1  23.17
     2022-08-01 00:15:37.255    A1  23.17
     2022-08-01 00:23:05.139    A1  23.17
     2022-08-01 00:23:15.137    A1  23.17
 P19 2022-08-11 00:15:05.135   P19    NaN
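A hedged, self-contained run of the `interp_group` approach on a small hypothetical frame built from the question's rows (the sample values are assumptions). It should reproduce the temp column shown above: A1 is filled with 23.17 in both directions, and P19 stays NaN because its group has no valid value to interpolate from.

```python
import numpy as np
import pandas as pd

# Hypothetical sample modeled on the question; 'timestamp' is a regular column.
df = pd.DataFrame({
    "cube": ["A1", "A1", "A1", "A1", "P19"],
    "timestamp": pd.to_datetime([
        "2022-08-01 00:15:05.135",
        "2022-08-01 00:15:37.255",
        "2022-08-01 00:23:05.139",
        "2022-08-01 00:23:15.137",
        "2022-08-11 00:15:05.135",
    ]),
    "temp": [np.nan, 23.17, np.nan, np.nan, np.nan],
})

def interp_group(g):
    # Index one cube's rows by time, then interpolate along that time axis.
    g = g.set_index("timestamp")
    g["temp"] = g["temp"].interpolate(method="time", limit_direction="both")
    return g

interpolated = df.groupby("cube").apply(interp_group)
print(interpolated)
```

Rebinding `g` to the result of `set_index` (rather than using `inplace=True`) avoids mutating the group object that `apply` passes in. Newer pandas versions may warn that `apply` operated on the grouping column; the interpolated temp values are unaffected.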

Or, if you prefer a less readable one-liner:

df.groupby('cube').apply(lambda g: g.set_index('timestamp').temp.interpolate(method="time", limit_direction="both"))
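A different option that neither answer uses, sketched here under the assumption that timestamps are unique within each cube: `groupby(...).transform` returns a Series with the same length, row order and index as `idata`, so the question's one-line assignment can keep the duplicate timestamp index. The sample frame is hypothetical; two cubes deliberately share the same timestamps.

```python
import numpy as np
import pandas as pd

# Hypothetical data: cubes A1 and P19 share timestamps, so the index has duplicates.
ts = pd.to_datetime([
    "2022-08-01 00:15:05.135",
    "2022-08-01 00:15:37.255",
    "2022-08-01 00:23:05.139",
])
idata = pd.DataFrame({
    "cube": ["A1"] * 3 + ["P19"] * 3,
    "timestamp": list(ts) * 2,
    "temp": [np.nan, 23.17, np.nan, 20.0, np.nan, 22.0],
})
idata = idata.set_index(idata["timestamp"])  # same indexing as in the question

# transform keeps idata's row order and index, so the assignment
# should not need to reindex on the duplicate labels.
idata["temp"] = idata.groupby("cube")["temp"].transform(
    lambda x: x.interpolate(method="time", limit_direction="both")
)
print(idata)
```

The result stays flat (no MultiIndex), which may be convenient if the rest of the code expects the original shape.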

2023-04-04
