基于Pandas中另一列的总和对组内的行进行排名

ttp71kqs  于 2023-05-05  发布在  其他
关注(0)|答案(1)|浏览(122)

我有一个Pandas dataframe下面。我需要创建一个RANKING列,其中排名基于UNIT级别的STAFF_ID的SOLD_NO的SUM。这意味着,对于每个UNIT,STAFF_ID按照SOLD_NO的SUM的降序排列。
我试过这个代码,但它是不正确的,主要是不是所有的单位有一个排名1.
df['RANKING'] = df.groupby(['UNIT'])['SOLD_NO'].transform(np.sum).rank(ascending=False, method='dense')
有什么我需要改变的吗?

szqfcxe2

szqfcxe21#

import pandas as pd

# Your dataframe
data = {
    'UNIT': ['AA', 'AA', 'AA', 'AA','AA', 'BB', 'BB', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC'],
    'STAFF_ID': [111, 111, 222, 222, 222, 333, 333, 444, 444, 444, 444, 555, 555, 555, 555, 555, 666],
    'SOLD_NO': [2, 5, 1, 0, 10, 1, 1, 10, 4, 2, 6, 1, 1, 4, 1, 1, 8]
}

df = pd.DataFrame(data)

# Calculate the sum of SOLD_NO for each STAFF_ID in each UNIT
df_sum = df.groupby(['UNIT', 'STAFF_ID'])['SOLD_NO'].sum().reset_index()

# Calculate the ranking within each UNIT based on the sum of SOLD_NO
df_sum['RANKING'] = df_sum.groupby(['UNIT'])['SOLD_NO'].rank(ascending=False, method='dense')

# Merge the ranking back to the original dataframe
df = df.merge(df_sum[['UNIT', 'STAFF_ID', 'RANKING']], on=['UNIT', 'STAFF_ID'])

print(df)

结果:

UNIT  STAFF_ID  SOLD_NO  RANKING
0    AA       111        2      2.0
1    AA       111        5      2.0
2    AA       222        1      1.0
3    AA       222        0      1.0
4    AA       222       10      1.0
5    BB       333        1      1.0
6    BB       333        1      1.0
7    CC       444       10      1.0
8    CC       444        4      1.0
9    CC       444        2      1.0
10   CC       444        6      1.0
11   CC       555        1      2.0
12   CC       555        1      2.0
13   CC       555        4      2.0
14   CC       555        1      2.0
15   CC       555        1      2.0
16   CC       666        8      2.0

相关问题