Daily weighted average of inconsistent seconds data for each id in a pandas DataFrame [duplicate]

Posted by bq3bfh9z on 2023-05-21 in Other

This question already has answers here:

Aggregation in Pandas (2 answers)
Closed yesterday.
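I have a DataFrame indexed by irregularly spaced datetime objects. I have seen similar examples where the values are averaged per day, but none that compute the daily average for each id. I could create a new DataFrame for each selection_id, but I think there is a better way that I just cannot find online.
My DataFrame is: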

|                         | selection_id | price |
| ----------------------- | ------------ | ----- |
| 2023-05-13 05:57:07.554 | 1            | 1.50  |
| 2023-05-13 06:08:59.193 | 1            | 1.56  |
| 2023-05-13 06:08:59.085 | 1            | 1.61  |
| 2023-05-13 06:08:59.085 | 1            | 1.50  |
| 2023-05-13 06:08:59.085 | 1            | 1.51  |
| 2023-05-13 06:08:59.085 | 45           | 3.12  |
| 2023-05-13 05:57:07.554 | 45           | 3.16  |
| 2023-05-13 06:08:59.193 | 45           | 3.12  |
| 2023-05-13 06:08:59.085 | 45           | 3.16  |
| 2023-05-13 06:08:59.085 | 45           | 3.12  |
| 2023-05-13 06:08:59.085 | 98           | 7.05  |
| 2023-05-13 06:08:59.085 | 98           | 7.52  |
| 2023-05-13 05:57:07.554 | 98           | 7.11  |
| 2023-05-13 06:08:59.193 | 98           | 7.99  |
| 2023-05-13 06:08:59.085 | 98           | 7.50  |
| 2023-05-13 06:08:59.085 | 98           | 7.20  |
| 2023-05-13 06:08:59.085 | 98           | 7.65  |
| 2023-05-13 06:08:59.085 | 98           | 7.45  |
| 2023-05-14 05:57:07.554 | 1            | 2.50  |
| 2023-05-14 06:08:59.193 | 1            | 2.56  |
| 2023-05-14 06:08:59.085 | 1            | 2.61  |
| 2023-05-14 06:08:59.085 | 1            | 2.50  |
| 2023-05-14 06:08:59.085 | 1            | 2.51  |
| 2023-05-14 06:08:59.085 | 45           | 2.12  |
| 2023-05-14 05:57:07.554 | 45           | 2.16  |
| 2023-05-14 06:08:59.193 | 45           | 2.12  |
| 2023-05-14 06:08:59.085 | 45           | 2.16  |
| 2023-05-14 06:08:59.085 | 45           | 2.12  |
| 2023-05-14 06:08:59.085 | 98           | 7.05  |
| 2023-05-14 06:08:59.085 | 98           | 7.52  |
| 2023-05-14 05:57:07.554 | 98           | 7.11  |
| 2023-05-14 06:08:59.193 | 98           | 7.99  |
| 2023-05-14 06:08:59.085 | 98           | 7.50  |
| 2023-05-14 06:08:59.085 | 98           | 7.20  |
| 2023-05-14 06:08:59.085 | 98           | 7.65  |
| 2023-05-14 06:08:59.085 | 98           | 7.45  |

I would like to summarize the DataFrame as:

|            | selection_id | price |
| ---------- | ------------ | ----- |
| 2023-05-13 | 1            | 1.536 |
| 2023-05-13 | 45           | 3.136 |
| 2023-05-13 | 98           | 7.434 |
| 2023-05-14 | 1            | 2.536 |
| 2023-05-14 | 45           | 2.136 |
| 2023-05-14 | 98           | 7.434 |
zyfwsgd6 #1

You can use:

out = (
    # use df.index.date directly if the index is already a DatetimeIndex
    df.groupby([pd.to_datetime(df.index).date, "selection_id"])["price"]
    .mean()
    .reset_index(level=1)
)

Output:

print(out)

           selection_id  price
2023-05-13            1   1.54
2023-05-13           45   3.14
2023-05-13           98   7.43
2023-05-14            1   2.54
2023-05-14           45   2.14
2023-05-14           98   7.43
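
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds a subset of the rows from the question (the name `rows` and the temporary column name `ts` are illustrative choices, not from the original post) and applies the same groupby on the index date and `selection_id`.

```python
import pandas as pd

# Subset of the question's rows: (timestamp, selection_id, price).
rows = [
    ("2023-05-13 05:57:07.554", 1, 1.50),
    ("2023-05-13 06:08:59.193", 1, 1.56),
    ("2023-05-13 06:08:59.085", 1, 1.61),
    ("2023-05-13 06:08:59.085", 1, 1.50),
    ("2023-05-13 06:08:59.085", 1, 1.51),
    ("2023-05-13 06:08:59.085", 45, 3.12),
    ("2023-05-13 05:57:07.554", 45, 3.16),
    ("2023-05-13 06:08:59.193", 45, 3.12),
    ("2023-05-13 06:08:59.085", 45, 3.16),
    ("2023-05-13 06:08:59.085", 45, 3.12),
    ("2023-05-14 05:57:07.554", 1, 2.50),
    ("2023-05-14 06:08:59.193", 1, 2.56),
    ("2023-05-14 06:08:59.085", 1, 2.61),
    ("2023-05-14 06:08:59.085", 1, 2.50),
    ("2023-05-14 06:08:59.085", 1, 2.51),
]

# "ts" is a hypothetical column name used only to build the datetime index.
df = pd.DataFrame(rows, columns=["ts", "selection_id", "price"])
df["ts"] = pd.to_datetime(df["ts"])
df = df.set_index("ts")
df.index.name = None

# Group by the calendar date of the index and by selection_id, then average.
out = (
    df.groupby([df.index.date, "selection_id"])["price"]
    .mean()
    .reset_index(level=1)  # move selection_id back into a column
)

print(out)
```

The unrounded means for this subset are 1.536, 3.136 and 2.536, matching the values requested in the question; the two-decimal figures in the output above appear to be rounded for display.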
wwtsj6pe #2

To get the date without the time, you can use dt.date. Then use groupby to group the data by date and selection_id, and apply agg to compute the mean of each group.

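import pandas as pd

# `data` is assumed to hold the question's rows, with the timestamps in a 'datetime' column
df = pd.DataFrame(data)
df['datetime'] = pd.to_datetime(df['datetime'])

# group by calendar date and selection_id, then average the price
summary_df = df.groupby([df['datetime'].dt.date, 'selection_id']).agg({'price': 'mean'}).reset_index()

summary_df.head()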
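
The snippet above relies on a `data` object that is not shown. As a rough sketch, assuming the timestamps live in a `datetime` column, the following builds a few of the question's rows (the `rows` list is an illustrative stand-in for `data`) and runs the same aggregation. It also shows a `pd.Grouper` variant, which is not part of the answer; it keeps the day as a normalized `Timestamp` instead of a Python `date` object.

```python
import pandas as pd

# A few rows from the question, two per (day, selection_id) pair.
rows = [
    ("2023-05-13 05:57:07.554", 1, 1.50),
    ("2023-05-13 06:08:59.193", 1, 1.56),
    ("2023-05-13 06:08:59.085", 45, 3.12),
    ("2023-05-13 05:57:07.554", 45, 3.16),
    ("2023-05-14 05:57:07.554", 1, 2.50),
    ("2023-05-14 06:08:59.193", 1, 2.56),
    ("2023-05-14 06:08:59.085", 45, 2.12),
    ("2023-05-14 05:57:07.554", 45, 2.16),
]

df = pd.DataFrame(rows, columns=["datetime", "selection_id", "price"])
df["datetime"] = pd.to_datetime(df["datetime"])

# Same approach as the answer: group by calendar date and selection_id.
summary_df = (
    df.groupby([df["datetime"].dt.date, "selection_id"])
    .agg({"price": "mean"})
    .reset_index()
)

# Alternative (not from the answer): bin the timestamps into daily buckets.
alt = (
    df.groupby([pd.Grouper(key="datetime", freq="D"), "selection_id"])["price"]
    .mean()
    .reset_index()
)

print(summary_df)
print(alt)
```

Both results contain one row per day and `selection_id`. The `pd.Grouper` form may be more convenient if the result is later resampled or plotted on a time axis, since its `datetime` column keeps a datetime dtype.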
