pandas按类别分组和计数

hts6caw3  于 2023-04-04  发布在  其他
关注(0)|答案(4)|浏览(146)

我有一个数据框如下:

Date            User    Tradetype  Cost
2022-01-01      User1    Sell       $1
2022-01-01      User1    Buy        $2
2022-01-01      User1    Sell       $0
2022-01-01      User2    Buy        $1
2022-01-01     User2     Sell       $1

我希望以这样一种方式聚合它,得到以下输出

User      Date         Tradetype      Count    TotalCost
User1    2022-01-01     Buy             2       $2
User1    2022-01-01     Sell            2       $2
User2    2022-01-01     Buy             1       $1
User2    2022-01-01     Sell            1       $1

我已经尝试了df.groupby('User'),但不知道如何进一步进行,因为Tradetype也必须聚合。任何关于如何在pandas中实现这一点的输入?谢谢!

kxkpmulp

kxkpmulp1#

这应该行得通:

df['Cost'] = df.Cost.str.lstrip('$').astype(int)
pd.merge(
    df.rename(columns={'Cost':'TotalCost'}).groupby(by=['Date', 'User', 'Tradetype']).sum(),
    df.rename(columns={'Cost':'Count'}).groupby(by=['Date', 'User', 'Tradetype']).count(),
    left_index=True,
    right_index=True,
).reset_index()

结果:

Date   User Tradetype  TotalCost  Count
0  2022-01-01  User1       Buy          2      1
1  2022-01-01  User1      Sell          1      2
2  2022-01-01  User2       Buy          1      1
3  2022-01-01  User2      Sell          1      1
5ssjco0h

5ssjco0h2#

df.groupby(['User','Tradetype']).agg(['sum','count'])
可以对user和tradetype列进行分组,并应用count和sum函数

gorkyyrv

gorkyyrv3#

使用您提供的 Dataframe :

import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["2022-01-01", "2022-01-01", "2022-01-01", "2022-01-01", "2022-01-01"],
        "User": ["User1", "User1", "User1", "User2", "User2"],
        "Tradetype": ["Sell", "Buy", "Sell", "Buy", "Sell"],
        "Cost": ["$1", "$2", "$0", "$1", "$1"],
    }
)

下面是一种使用Pandas pipeassign以及Python的strip和f-strings的方法:

df = (
    df.groupby(["User", "Date", "Tradetype"])
    .agg(list)
    .pipe(
        lambda df_: df_.assign(
            Count=df_["Cost"].apply(len),
            TotalCost=df_["Cost"].apply(
                lambda x: f"${sum([int(c.strip('$')) for c in x])}"
            ),
        )
    )
    .drop(columns="Cost")
    .reset_index()
)

然后:

print(df)
# Output

    User        Date Tradetype  Count TotalCost
0  User1  2022-01-01       Buy      1        $2
1  User1  2022-01-01      Sell      2        $1
2  User2  2022-01-01       Buy      1        $1
3  User2  2022-01-01      Sell      1        $1
iecba09b

iecba09b4#

(df.groupby(['Date','User','Tradetype '])
['Cost'].agg(['count','sum']).reset_index()

相关问题