This question already has answers here:
Aggregation in Pandas (2 answers)
Closed yesterday.
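I have a DataFrame indexed by irregularly spaced datetime objects. I have seen similar examples where values are averaged per day, but not averaged per day for each id. I could create a new DataFrame for each selection_id, but I think there is a better way that I just can't find online.
My DataFrame is: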
| | selection_id | price |
| ----------------------- | ------------- | --------------- |
|2023-05-13 05:57:07.554 | 1 | 1.50 |
|2023-05-13 06:08:59.193 | 1 | 1.56 |
|2023-05-13 06:08:59.085 | 1 | 1.61 |
|2023-05-13 06:08:59.085 | 1 | 1.50 |
|2023-05-13 06:08:59.085 | 1 | 1.51 |
|2023-05-13 06:08:59.085 | 45 | 3.12 |
|2023-05-13 05:57:07.554 | 45 | 3.16 |
|2023-05-13 06:08:59.193 | 45 | 3.12 |
|2023-05-13 06:08:59.085 | 45 | 3.16 |
|2023-05-13 06:08:59.085 | 45 | 3.12 |
|2023-05-13 06:08:59.085 | 98 | 7.05 |
|2023-05-13 06:08:59.085 | 98 | 7.52 |
|2023-05-13 05:57:07.554 | 98 | 7.11 |
|2023-05-13 06:08:59.193 | 98 | 7.99 |
|2023-05-13 06:08:59.085 | 98 | 7.50 |
|2023-05-13 06:08:59.085 | 98 | 7.20 |
|2023-05-13 06:08:59.085 | 98 | 7.65 |
|2023-05-13 06:08:59.085 | 98 | 7.45 |
|2023-05-14 05:57:07.554 | 1 | 2.50 |
|2023-05-14 06:08:59.193 | 1 | 2.56 |
|2023-05-14 06:08:59.085 | 1 | 2.61 |
|2023-05-14 06:08:59.085 | 1 | 2.50 |
|2023-05-14 06:08:59.085 | 1 | 2.51 |
|2023-05-14 06:08:59.085 | 45 | 2.12 |
|2023-05-14 05:57:07.554 | 45 | 2.16 |
|2023-05-14 06:08:59.193 | 45 | 2.12 |
|2023-05-14 06:08:59.085 | 45 | 2.16 |
|2023-05-14 06:08:59.085 | 45 | 2.12 |
|2023-05-14 06:08:59.085 | 98 | 7.05 |
|2023-05-14 06:08:59.085 | 98 | 7.52 |
|2023-05-14 05:57:07.554 | 98 | 7.11 |
|2023-05-14 06:08:59.193 | 98 | 7.99 |
|2023-05-14 06:08:59.085 | 98 | 7.50 |
|2023-05-14 06:08:59.085 | 98 | 7.20 |
|2023-05-14 06:08:59.085 | 98 | 7.65 |
|2023-05-14 06:08:59.085 | 98 | 7.45 |
I would like to summarize the DataFrame as:
| | selection_id | price |
| ----------- | -------------- | ---------------- |
|2023-05-13 | 1 | 1.536 |
|2023-05-13 | 45 | 3.136 |
|2023-05-13 | 98 | 7.434 |
|2023-05-14 | 1 | 2.536 |
|2023-05-14 | 45 | 2.136 |
|2023-05-14 | 98 | 7.434 |
2 Answers
zyfwsgd61#
You can use:
Output:
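As a sketch of one call that fits the question (not necessarily the snippet this answer originally contained), assuming the timestamps form a `DatetimeIndex`: group on the index floored to midnight together with `selection_id`, then take the mean of `price`. The small `df` below reuses a few rows from the question, and the name `daily` is only illustrative.

```python
import pandas as pd

# A few rows taken from the question; the timestamps are the index.
idx = pd.to_datetime([
    "2023-05-13 05:57:07.554",
    "2023-05-13 06:08:59.193",
    "2023-05-13 06:08:59.085",
    "2023-05-14 05:57:07.554",
    "2023-05-14 06:08:59.193",
])
df = pd.DataFrame(
    {"selection_id": [1, 1, 45, 1, 45],
     "price": [1.50, 1.56, 3.12, 2.50, 2.12]},
    index=idx,
)

# normalize() floors every timestamp to midnight, so rows from the same
# calendar day share one key; selection_id is the second grouping key.
daily = df.groupby([df.index.normalize(), "selection_id"])["price"].mean()
print(daily)
```

The result is a Series with a two-level index (day, `selection_id`); calling `daily.reset_index()` would turn it back into a flat table.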
wwtsj6pe2#
To get the date without the time, you can use `dt.date`. Then use `groupby` to group the data by `date` and `selection_id`, and apply `agg` to compute the mean of each group.
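The steps above can be sketched as follows. This assumes the timestamps sit in a column, given the hypothetical name `timestamp` here; with a `DatetimeIndex` as in the question, `df.index.date` plays the same role, or `df.reset_index()` can be called first. The sample rows are taken from the question.

```python
import pandas as pd

# Sample rows from the question; "timestamp" is an assumed column name.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime([
            "2023-05-13 05:57:07.554",
            "2023-05-13 06:08:59.193",
            "2023-05-13 06:08:59.085",
            "2023-05-13 06:08:59.085",
            "2023-05-13 06:08:59.085",
            "2023-05-14 06:08:59.085",
            "2023-05-14 05:57:07.554",
        ]),
        "selection_id": [1, 1, 1, 1, 1, 45, 45],
        "price": [1.50, 1.56, 1.61, 1.50, 1.51, 2.12, 2.16],
    }
)

# Step 1: keep only the calendar date of each timestamp.
df["date"] = df["timestamp"].dt.date

# Steps 2 and 3: group by date and selection_id, then average the price.
summary = (
    df.groupby(["date", "selection_id"])
      .agg(price=("price", "mean"))
      .reset_index()
)
print(summary)
```

For `selection_id` 1 on 2023-05-13 this gives a mean of about 1.536, matching the value requested in the question.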