pandas: how to rank only the duplicated rows, ignoring NaN?

31moq8wy  posted on 2023-06-20  in  Other

I have a DataFrame:

   Col1
0   1.0
1   1.0
2   1.0
3   2.0
4   3.0
5   4.0
6   NaN

How can I rank only the duplicated values (ignoring NaN)? Unfortunately, my current output also assigns a rank to the unique values:

   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  1.0
4   3.0  1.0
5   4.0  1.0
6   NaN  NaN

The output I need is:

   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  NaN
4   3.0  NaN
5   4.0  NaN
6   NaN  NaN

Code example:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1],
                   [1],
                   [1],
                   [2],
                   [3],
                   [4],
                   [np.nan]], columns=['Col1'])
print(df)

# Adding row_number for each pair:
df['Rn'] = df[df['Col1'].notnull()].groupby('Col1')['Col1'].rank(method="first", ascending=True)
print(df)

# I managed to select only necessary rows for mask, but how can I apply it along with groupby?:
m = df.dropna().loc[df['Col1'].duplicated(keep=False)]
print(m)

Thank you!

c7rzv4ha  #1

Try:

m = df['Col1'].duplicated(keep=False)
df['Rn'] = df[m].groupby('Col1')['Col1'].rank(method="first", ascending=True)
print(df)

Prints:

   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  NaN
4   3.0  NaN
5   4.0  NaN
6   NaN  NaN
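
The mask-then-groupby idea above can be checked on data with several duplicated groups and more than one NaN. This is only a minimal sketch: rank_duplicates is a hypothetical helper name, and the explicit notna() guard is an added assumption, included because duplicated() treats repeated NaN values as duplicates of each other.

```python
import numpy as np
import pandas as pd


def rank_duplicates(df, col):
    """Hypothetical helper: 1-based position inside each duplicated group, NaN elsewhere."""
    s = df[col]
    # keep only values that occur more than once and are not NaN
    mask = s.duplicated(keep=False) & s.notna()
    ranked = df[mask].groupby(col)[col].rank(method="first", ascending=True)
    # reindex so that rows outside the mask show up as NaN
    return ranked.reindex(df.index)


demo = pd.DataFrame({"Col1": [1, 1, 2, 3, 3, 3, np.nan, np.nan]})
demo["Rn"] = rank_duplicates(demo, "Col1")
print(demo)
```

Here the rows holding 1 and 3 are numbered within their own group, while the unique value 2 and both NaN rows get NaN in Rn.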

6vl6ewon  #2

You can identify the duplicated values and compute the rank only for those:

# identify duplicated rows
m = df['Col1'].duplicated(keep=False)

# compute the rank only for those
df['Rn'] = df.loc[m, 'Col1'].rank(method='first', ascending=True)

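Note that rank() above is applied to all duplicated rows together; if you want a counter that restarts for each duplicated value, you can use groupby.cumcount: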

m = df['Col1'].duplicated(keep=False)

df['Rn'] = df.loc[m, ['Col1']].groupby('Col1').cumcount().add(1)

Output:

   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  NaN
4   3.0  NaN
5   4.0  NaN
6   NaN  NaN
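
The two variants in this answer give the same result on the question's data because only one value is duplicated there. A small sketch on made-up data (the demo frame and the column names global_rank and per_group are illustrative assumptions, not part of the question) shows where they differ:

```python
import pandas as pd

# two duplicated groups (1.0 and 3.0) plus one unique value (2.0)
demo = pd.DataFrame({"Col1": [1.0, 1.0, 2.0, 3.0, 3.0]})
m = demo["Col1"].duplicated(keep=False)

# rank() over all duplicated rows at once: numbering continues across groups
demo["global_rank"] = demo.loc[m, "Col1"].rank(method="first", ascending=True)

# cumcount() per group: numbering restarts at 1 for each duplicated value
demo["per_group"] = demo.loc[m, ["Col1"]].groupby("Col1").cumcount().add(1)

print(demo)
```

With this input, global_rank runs 1, 2, 3, 4 over the four duplicated rows, whereas per_group gives 1, 2 for each of the two groups; the unique row is NaN in both columns.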

nukf8bse  #3

While we already have answers from the experts, one more way could be
using sort_values():

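m = df['Col1'].duplicated(keep=False)

# sort the duplicated rows (a stable sort keeps ties in their original order)
dup = df[m].sort_values(['Col1'], ascending=True, kind='stable')
# number them 1..n and align on the original index; all other rows stay NaN
df['Rn'] = pd.Series(range(1, len(dup) + 1), index=dup.index)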

   Col1   Rn
0   1.0  1.0
1   1.0  2.0
2   1.0  3.0
3   2.0  NaN
4   3.0  NaN
5   4.0  NaN
6   NaN  NaN
