Pandas/matplotlib newbie: aggregating time series data with differing indices?

Posted by hpxqektj on 2023-03-09 in Other

I'm getting to grips with pandas/matplotlib, and looking to aggregate multiple data series with (marginally) differing indices. For example:

Series 1

seconds_since_start	Value
0.0	35
0.8	41
1.1	48

Series 2

seconds_since_start	Value
0.0	31
0.7	37
1.1	41

At present, I'm plotting both series as 2 separate line graphs. Ultimately, I'm looking to create a single line that shows, for any given x value, the mean y of both series. The values between specified points can be assumed to be linear.
I assume this is a common task, but the ways I'm trying involve a lot more complexity than I suspect is necessary.
In short: is there a straightforward way to plot the mean of series that have differing index values?
Notes:

While the only immediate need is graphing, ideally the aggregation would be calculated in pandas, not matplotlib
The solution will aggregate >100 different series, not just 2

Source: https://stackoverflow.com/questions/75658389/pandas-matplotlib-newbie-aggregating-time-series-data-with-differing-indices

2 Answers

c6ubokkw1#

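One solution is to take the union of the series' indices and interpolate any missing values. The series can then be concatenated and the mean computed at each index value. The code below assumes the series are in a list named series.
First, get the union of the indices: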

from functools import reduce

import numpy as np

# Sorted union of the index values of every series
new_index = reduce(np.union1d, [s.index.values for s in series])

For the example data, new_index will be array([0. , 0.7, 0.8, 1.1]).
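To try this step in isolation, the series list can be built from the question's sample data. The sketch below is only an illustration: the names s1 and s2 are made up here, and it assumes each series is a pandas Series indexed by seconds_since_start.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Sample data copied from the question; s1 and s2 are illustrative names.
s1 = pd.Series(
    [35, 41, 48],
    index=pd.Index([0.0, 0.8, 1.1], name="seconds_since_start"),
    name="Value",
)
s2 = pd.Series(
    [31, 37, 41],
    index=pd.Index([0.0, 0.7, 1.1], name="seconds_since_start"),
    name="Value",
)
series = [s1, s2]

# np.union1d returns the sorted, de-duplicated union of two arrays;
# reduce folds it over the index arrays of all series.
new_index = reduce(np.union1d, [s.index.values for s in series])
print(new_index)
```

With more than 100 series the same call works unchanged, since reduce simply keeps merging one index at a time.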
Now reindex the series and concat them together:

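import pandas as pd

# Reindex every series onto the union index and place them side by side as columns
df = pd.concat([s.reindex(new_index).rename(f'Value_{i}') for i, s in enumerate(series)], axis=1)
df = df.interpolate('linear')  # fills the NaNs; 'linear' treats rows as equally spaced
df['Avg'] = df.mean(axis=1)  # row-wise mean across all series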

Result:

                     Value_0  Value_1   Avg
seconds_since_start                        
0.0                     35.0     31.0  33.0
0.7                     38.0     37.0  37.5
0.8                     41.0     39.0  40.0
1.1                     48.0     41.0  44.5
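One caveat about the table above: pandas documents interpolate's 'linear' method as ignoring the index and treating the rows as equally spaced, which is why Value_0 at 0.7 comes out as 38.0, the midpoint of 35 and 41. If the interpolation should follow the actual seconds_since_start spacing, as the question's "assumed to be linear" wording suggests, method='index' is one option. The following is a sketch under that assumption, using the question's sample data with illustrative variable names:

```python
import numpy as np
import pandas as pd

# Sample data from the question; s1 and s2 are illustrative names.
s1 = pd.Series([35, 41, 48], index=[0.0, 0.8, 1.1])
s2 = pd.Series([31, 37, 41], index=[0.0, 0.7, 1.1])
series = [s1, s2]

# Sorted union of every index value.
new_index = np.unique(np.concatenate([s.index.values for s in series]))

df = pd.concat(
    [s.reindex(new_index).rename(f"Value_{i}") for i, s in enumerate(series)],
    axis=1,
)

# method="index" uses the numeric index values as x-coordinates,
# whereas method="linear" treats the rows as equally spaced.
df = df.interpolate(method="index")
df["Avg"] = df.mean(axis=1)
print(df)
```

With this variant Value_0 at 0.7 should be about 40.25 (35 plus 0.7/0.8 of the rise to 41) and Value_1 at 0.8 about 38.0, so the averages at those two points differ from the table above.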

Posted 2023-03-09

7vhp5slm2#

You can use pd.concat to aggregate the 100+ series, then group by seconds_since_start before computing the mean:

import pandas as pd

dfs = [df1, df2]  # all your data here
# Stack the frames row-wise, then average the values that share a timestamp
df = pd.concat(dfs, axis=0).groupby('seconds_since_start', as_index=False)['Value'].mean()
df.plot(x='seconds_since_start', y='Value', marker='o')

Output:

>>> df
   seconds_since_start  Value
0                  0.0   33.0
1                  0.7   37.0
2                  0.8   41.0
3                  1.1   44.5
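Note that grouping averages only the values that share an identical timestamp, so in the output above the rows for 0.7 and 0.8 each reflect a single series (37.0 and 41.0) rather than a mean of both. If every series should contribute at every x value, one possibility is to interpolate each series onto a common grid with np.interp before averaging. This is a rough sketch, not part of the answer above; it assumes each frame is sorted by seconds_since_start, and df1 and df2 are filled in here with the question's sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical input: one DataFrame per series, shaped like the question's data.
df1 = pd.DataFrame({"seconds_since_start": [0.0, 0.8, 1.1], "Value": [35, 41, 48]})
df2 = pd.DataFrame({"seconds_since_start": [0.0, 0.7, 1.1], "Value": [31, 37, 41]})
dfs = [df1, df2]

# Common x grid: every timestamp that occurs in any series.
grid = np.unique(np.concatenate([d["seconds_since_start"].to_numpy() for d in dfs]))

# np.interp interpolates linearly in x and requires increasing x values.
# Each row of `stacked` is one series evaluated on the common grid.
stacked = np.vstack([
    np.interp(grid, d["seconds_since_start"].to_numpy(), d["Value"].to_numpy())
    for d in dfs
])

# Column-wise mean gives the average of all series at each grid point.
avg = pd.Series(
    stacked.mean(axis=0),
    index=pd.Index(grid, name="seconds_since_start"),
    name="Value",
)
print(avg)
```

Calling avg.plot(marker='o') would then draw the single mean line. Be aware that np.interp holds the first or last value constant for grid points outside a series' own range, which may or may not be what is wanted when the series start or end at different times.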

Posted 2023-03-09
