Efficient way to divide a pandas column by a groupby DataFrame [duplicate]

Posted by disbfnqx on 2023-05-12 in Other

This question already has answers here:

Pandas percentage of total with groupby (16 answers)
Closed 4 days ago.
This is easier to explain with an example. Say I have a sample DataFrame with the columns year, cc_rating, and number_x:

import pandas as pd

df = pd.DataFrame({"year":{"0":2005,"1":2005,"2":2005,"3":2006,"4":2006,"5":2006,"6":2007,"7":2007,"8":2007},"cc_rating":{"0":"2","1":"2a","2":"2b","3":"2","4":"2a","5":"2b","6":"2","7":"2a","8":"2b"},"number_x":{"0":9368,"1":21643,"2":107577,"3":10069,"4":21486,"5":110326,"6":10834,"7":21566,"8":111082}})

df 

   year cc_rating  number_x
0  2005         2      9368
1  2005        2a     21643
2  2005        2b    107577
3  2006         2     10069
4  2006        2a     21486
5  2006        2b    110326
6  2007         2     10834
7  2007        2a     21566
8  2007        2b    111082

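Question

How can I get number_x as a percentage of each year's total? That is, each row's number_x divided by the sum of number_x over all rows with the same year.

Direct division doesn't work, because year can't be set as the index of the original df: it is not unique.
Right now I'm doing the following, but it is inefficient and I'm sure there is a better way.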

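# Sum only number_x per year and name the result number_y, so the merge adds exactly one totals column
totals = df.groupby('year')['number_x'].sum().rename('number_y')
df = pd.merge(df, totals, left_on='year', right_index=True)
df['%'] = round((df['number_x'] / df['number_y']) * 100, 2)
df = df.drop('number_y', axis=1)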

Thanks!


7eumitmz #1

A possible solution:

(df.assign(
    perc = (100*df.number_x.div(df.groupby('year').number_x.transform('sum')))
    .round(2)))

Output:

   year cc_rating  number_x   perc
0  2005         2      9368   6.76
1  2005        2a     21643  15.62
2  2005        2b    107577  77.62
3  2006         2     10069   7.10
4  2006        2a     21486  15.14
5  2006        2b    110326  77.76
6  2007         2     10834   7.55
7  2007        2a     21566  15.03
8  2007        2b    111082  77.42
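
To see why this avoids the merge, it may help to compare the two groupby results side by side. The sketch below is illustrative only: it uses a shortened copy of the sample data, and the names totals_by_year and totals_per_row are made up for the example. The point is that sum() yields one value per year, while transform('sum') repeats each year's total on every row of that year, so the result lines up with the original index and can be divided element-wise.

```python
import pandas as pd

# Shortened copy of the sample data (two years only), for illustration.
df = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006],
    "cc_rating": ["2", "2a", "2b", "2", "2a", "2b"],
    "number_x": [9368, 21643, 107577, 10069, 21486, 110326],
})

# One total per group, indexed by year (2 values here).
totals_by_year = df.groupby("year")["number_x"].sum()

# The same totals broadcast back to the rows (6 values here),
# sharing df's index, so no merge or re-indexing is needed.
totals_per_row = df.groupby("year")["number_x"].transform("sum")

df["perc"] = (100 * df["number_x"] / totals_per_row).round(2)
print(df)
```

Because totals_per_row has the same index as df, the division is a plain vectorized operation, and no temporary column has to be added and dropped afterwards.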
