I'm getting to grips with pandas/matplotlib, and looking to aggregate multiple data series with (marginally) differing indices. For example:
Series 1

| seconds_since_start | Value |
|---|---|
| 0.0 | 35 |
| 0.8 | 41 |
| 1.1 | 48 |

Series 2

| seconds_since_start | Value |
|---|---|
| 0.0 | 31 |
| 0.7 | 37 |
| 1.1 | 41 |

At present, I'm plotting both series as 2 separate line graphs. Ultimately, I'm looking to create a single line that shows, for any given x value, the mean y of both series. The values between specified points can be assumed to be linear.
I assume this is a common task, but the ways I'm trying involve a lot more complexity than I suspect is necessary.
In short: is there a straightforward way to plot the mean of series that have differing index values?
Notes:
- While the only immediate need is graphing, ideally the aggregation would be calculated in pandas, not matplotlib
- The solution will aggregate >100 different series, not just 2
2 Answers

c6ubokkw #1
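One solution is to take the union of the series' indices and interpolate any missing values. The series can then be concatenated and the mean computed at each index value. This assumes the series are in a list named `series`. First, get the union of the indices; in the example, `new_index` will be `array([0. , 0.7, 0.8, 1.1])`. Then `reindex` each series onto `new_index`, interpolate, and `concat` them together before taking the mean across series.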
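The steps above can be sketched roughly as follows. This is a minimal reconstruction, not necessarily the answer's original code: the sample data comes from the question, the names `aligned` and `mean_series` are made up for illustration, and `interpolate(method="index")` is my assumption for honoring the "linear between points" requirement (the default `method="linear"` ignores the index spacing).

```python
import numpy as np
import pandas as pd

# Sample data from the question; `series` is the list name the answer assumes.
s1 = pd.Series([35, 41, 48], index=[0.0, 0.8, 1.1], dtype=float)
s2 = pd.Series([31, 37, 41], index=[0.0, 0.7, 1.1], dtype=float)
series = [s1, s2]

# Union of all index values, sorted and de-duplicated.
new_index = np.unique(np.concatenate([s.index.to_numpy() for s in series]))

# Reindex each series onto the union (missing points become NaN), then fill
# them by interpolating linearly along the index values (assumed choice).
aligned = pd.concat(
    [s.reindex(new_index).interpolate(method="index") for s in series],
    axis=1,
)

# Row-wise mean: one value per x position, averaged across all series.
mean_series = aligned.mean(axis=1)
print(mean_series)
```

For the two sample series this should give means of roughly 33.0, 38.625, 39.5 and 44.5 at x = 0.0, 0.7, 0.8 and 1.1. `mean_series.plot()` would then draw the single averaged line. If some series start later than others, check how you want the points outside their range handled before averaging.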
7vhp5slm #2

You can use `pd.concat` to combine the 100+ series, then group by `seconds_since_start` before computing the mean.
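A minimal sketch of this approach, assuming each series is a DataFrame with `seconds_since_start` and `Value` columns as in the question; the names `frames` and `mean_by_time` are hypothetical. Note that this version does not interpolate, so a timestamp that appears in only one series gets that series' value alone.

```python
import pandas as pd

# Sample data from the question, one DataFrame per series.
df1 = pd.DataFrame({"seconds_since_start": [0.0, 0.8, 1.1], "Value": [35, 41, 48]})
df2 = pd.DataFrame({"seconds_since_start": [0.0, 0.7, 1.1], "Value": [31, 37, 41]})
frames = [df1, df2]  # hypothetical name; in practice this holds 100+ frames

# Stack all rows, then average the values that share a timestamp.
mean_by_time = (
    pd.concat(frames, ignore_index=True)
    .groupby("seconds_since_start")["Value"]
    .mean()
)
print(mean_by_time)
```

With the sample data the grouped means are 33.0, 37.0, 41.0 and 44.5 at x = 0.0, 0.7, 0.8 and 1.1. The values at 0.7 and 0.8 each come from a single series, which may or may not be acceptable depending on how closely the timestamps line up across your real data.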