I have a DataFrame:
   Col1
0   1.0
1   1.0
2   1.0
3   2.0
4   3.0
5   4.0
6   NaN
How can I rank only the duplicated values (ignoring NaN)? Unfortunately, my current output also ranks the unique values:
   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  1.0
4   3.0  1.0
5   4.0  1.0
6   NaN  NaN
The output I need is:
   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  NaN
4   3.0  NaN
5   4.0  NaN
6   NaN  NaN
Code example:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1],
                   [1],
                   [1],
                   [2],
                   [3],
                   [4],
                   [np.nan]], columns=['Col1'])  # np.nan: the np.NaN alias was removed in NumPy 2.0
print(df)
# Adding row_number for each pair:
df['Rn'] = df[df['Col1'].notnull()].groupby('Col1')['Col1'].rank(method="first", ascending=True)
print(df)
# I managed to select only necessary rows for mask, but how can I apply it along with groupby?:
m = df.dropna().loc[df['Col1'].duplicated(keep=False)]
print(m)
Thank you!
3 Answers
c7rzv4ha1#
Try:
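A minimal sketch of one way to get the requested output; this is an illustration, not necessarily the code this answer used. It assumes the question's single-column frame, counts how often each value occurs with value_counts, and keeps the per-group rank only where that count is greater than one. The names `counts`, `valid`, and `rank` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [1, 1, 1, 2, 3, 4, np.nan]})

# Occurrences of each value, mapped back onto the rows.
# value_counts() skips NaN, so the NaN row maps to NaN.
counts = df["Col1"].map(df["Col1"].value_counts())

# Rank within each group of equal values, using non-NaN rows only
# (the same filter the question applies).
valid = df[df["Col1"].notna()]
rank = valid.groupby("Col1")["Col1"].rank(method="first")

# Keep the rank only where the value occurs more than once.
# Rows dropped by the filter become NaN again on assignment (index alignment).
df["Rn"] = rank.where(counts > 1)
print(df)
```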
Prints:
6vl6ewon2#
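You can identify the duplicated values and compute the rank only for those values.
Note that if you want an incrementing count of the duplicates, you can use groupby.cumcount.
Output: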
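The approach described in this answer can be sketched roughly as follows. This is a reconstruction under assumptions, not the answer's original code: the mask name `m` and the extra column `Rn_count` are illustrative, and the `notna()` guard is added because `duplicated` treats repeated NaN values as duplicates of each other.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [1, 1, 1, 2, 3, 4, np.nan]})

# True for every non-NaN value that occurs more than once.
m = df["Col1"].duplicated(keep=False) & df["Col1"].notna()

# Rank only the masked rows; the remaining rows stay NaN
# because the assignment aligns on the index.
df["Rn"] = df.loc[m].groupby("Col1")["Col1"].rank(method="first")

# Alternative: a running count within each group of duplicates.
# cumcount() starts at 0, so add 1 to start at 1.
df["Rn_count"] = df.loc[m].groupby("Col1").cumcount().add(1)
print(df)
```

With method="first" both columns agree on this data. The difference matters only if a different rank method, such as "dense" or "min", is used, since all values inside a group are equal.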
nukf8bse3#
Until we get an answer from the experts, one way could be:
Using
sort_values()