pandas: computing rank values by different criteria

Posted by bsxbgnwa on 2023-05-05 in Other

I have a car rental dataset with the columns Car_code, customer_rent_code, and Rent_Date, as shown below:

import pandas as pd
import numpy as np

df = pd.DataFrame([['x','iii-2019-10-16','18/04/2019'],
                   ['x','iii-2019-10-16','21/07/2019'],
                   ['x','iii-2019-10-16','12/09/2019'],
                   ['x','zzz-2020-10-25','12/04/2022'],
                   ['y','qqq-2018-05-28','10/12/2017'], 
                   ['y','qqq-2018-05-28','15/02/2018'],
                   ['y','ooo-2019-11-22','30/05/2019'],
                   ['y','rrr-16-12-2020','16/12/2020'],
                   ['z','ppt-2019-12-03','07/02/2018'],
                   ['z','ttt-2019-12-03','28/05/2019'],
                   ['z','ttt-2019-12-03','09/09/2019'],
                   ['z','ttt-2019-12-03','30/09/2019']
                  
                  ],
                  columns=['Car_code','customer_rent_code','Rent_Date'])
df

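I need to sort the table and create two new columns:
1- Sort the table by 'Car_code', 'customer_rent_code', and 'Rent_Date'.
2- The first new column is 'Rent_rank', the rank of each customer rent code within each car. For car x, for example, the first 3 rentals get rank 1 and the fourth gets rank 2.
3- The second new column is 'Rent_rank_per_customer', the rank of each rental within the same 'customer_rent_code', so for car x the values would be (1,2,3,1).
The expected result is as follows: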


sg3maiej1#

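This takes two groupby/transform calls:

# Position of each row within its (car, customer rent code) group, in row order
df['Rent_rank_per_customer'] = (df.groupby(['Car_code', 'customer_rent_code'])['customer_rent_code']
                                .transform(lambda x: x.rank(method='first')))
# Number the customer rent codes of each car in order of first appearance
df['Rent_rank'] = df.groupby('Car_code')['customer_rent_code'].transform(lambda x: x.factorize()[0] + 1)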
Output:

   Car_code customer_rent_code   Rent_Date  Rent_rank_per_customer  Rent_rank
0         x     iii-2019-10-16  18/04/2019                     1.0          1
1         x     iii-2019-10-16  21/07/2019                     2.0          1
2         x     iii-2019-10-16  12/09/2019                     3.0          1
3         x     zzz-2020-10-25  12/04/2022                     1.0          2
4         y     qqq-2018-05-28  10/12/2017                     1.0          1
5         y     qqq-2018-05-28  15/02/2018                     2.0          1
6         y     ooo-2019-11-22  30/05/2019                     1.0          2
7         y     rrr-16-12-2020  16/12/2020                     1.0          3
8         z     ppt-2019-12-03  07/02/2018                     1.0          1
9         z     ttt-2019-12-03  28/05/2019                     1.0          2
10        z     ttt-2019-12-03  09/09/2019                     2.0          2
11        z     ttt-2019-12-03  30/09/2019                     3.0          2
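
Note that `rank` returns floats (1.0, 2.0, ...), as the output above shows. If integer ranks are preferred, one option is `cumcount`, which numbers the rows of each group starting at 0. The following is only a minimal sketch on the car x rows of the question's data, and it assumes the rows are already in chronological order within each group:

```python
import pandas as pd

# Car x rows from the question, already in date order (an assumption of this sketch)
df = pd.DataFrame(
    [['x', 'iii-2019-10-16', '18/04/2019'],
     ['x', 'iii-2019-10-16', '21/07/2019'],
     ['x', 'iii-2019-10-16', '12/09/2019'],
     ['x', 'zzz-2020-10-25', '12/04/2022']],
    columns=['Car_code', 'customer_rent_code', 'Rent_Date'])

# cumcount numbers rows 0, 1, 2, ... within each group and returns integers
df['Rent_rank_per_customer'] = (
    df.groupby(['Car_code', 'customer_rent_code']).cumcount() + 1)

print(df['Rent_rank_per_customer'].tolist())  # [1, 2, 3, 1]
```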

2lpgd9682#

You can try this:

import pandas as pd

df = pd.DataFrame([['x','iii-2019-10-16','18/04/2019'],
                   ['x','iii-2019-10-16','21/07/2019'],
                   ['x','iii-2019-10-16','12/09/2019'],
                   ['x','zzz-2020-10-25','12/04/2022'],
                   ['y','qqq-2018-05-28','10/12/2017'], 
                   ['y','qqq-2018-05-28','15/02/2018'],
                   ['y','ooo-2019-11-22','30/05/2019'],
                   ['y','rrr-16-12-2020','16/12/2020'],
                   ['z','ppt-2019-12-03','07/02/2018'],
                   ['z','ttt-2019-12-03','28/05/2019'],
                   ['z','ttt-2019-12-03','09/09/2019'],
                   ['z','ttt-2019-12-03','30/09/2019']
                  ],
                  columns=['Car_code','customer_rent_code','Rent_Date'])

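# Parse the dd/mm/yyyy strings so that sorting is chronological, not lexicographic
df['Rent_Date'] = pd.to_datetime(df['Rent_Date'], format='%d/%m/%Y')
df = df.sort_values(['Car_code', 'Rent_Date'])
# Position of each rental within its (car, customer rent code) group: 1, 2, 3, 1 for car x
df['Rent_rank_per_customer'] = df.groupby(['Car_code', 'customer_rent_code']).cumcount() + 1
# Number the customer rent codes of each car in order of first appearance: 1, 1, 1, 2 for car x
df['Rent_rank'] = df.groupby('Car_code')['customer_rent_code'].transform(lambda s: pd.factorize(s)[0] + 1)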

print(df)
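
The type of `Rent_Date` matters when sorting. While the column holds dd/mm/yyyy strings, `sort_values` compares them character by character, so the day of the month decides the order. A small sketch using three dates from the question (the variable names are illustrative only):

```python
import pandas as pd

dates = pd.Series(['18/04/2019', '21/07/2019', '12/09/2019'])

# Sorting the raw strings compares characters left to right, so the day decides
as_text = dates.sort_values().tolist()

# Parsing with an explicit day-first format gives a chronological sort
parsed = pd.to_datetime(dates, format='%d/%m/%Y')
chronological = dates.loc[parsed.sort_values().index].tolist()

print(as_text)        # ['12/09/2019', '18/04/2019', '21/07/2019']
print(chronological)  # ['18/04/2019', '21/07/2019', '12/09/2019']
```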

sz81bmfz3#

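# Parse the dd/mm/yyyy strings so that sorting is chronological
df["Date"] = pd.to_datetime(df["Rent_Date"],dayfirst=True)

df = df.sort_values(by=['Car_code','Date'],ascending=[True,True])

# Dense rank of the rental date within each (car, customer rent code) group
df['Rent_rank_per_customer'] = df.groupby(["Car_code","customer_rent_code"]).Date.rank(method='dense')

# Unique (car, customer rent code) pairs in sorted row order, numbered 1..n across all cars
items = df.drop_duplicates(subset=['Car_code','customer_rent_code'],keep='first').set_index(['Car_code','customer_rent_code']).index

mapping_dict = dict(zip(items, range(1, len(items) + 1)))
df['Rent_rank'] = df.set_index(['Car_code','customer_rent_code']).index.map(mapping_dict)
# Subtract each car's smallest number so that the ranks restart at 1 for every car
df['Rent_rank'] = df['Rent_rank'] - df.groupby('Car_code')['Rent_rank'].transform('min') + 1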
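
The numbering step above (unique pairs, a mapping dict, then a per-car offset) could possibly be shortened with `ngroup`. This is only a sketch under two assumptions: the rows are already sorted by car and then by date, and `sort=False` is passed so that groups are numbered in order of first appearance. The sample rows are taken from cars x and y of the question, and `group_id` is a name introduced here for illustration:

```python
import pandas as pd

# Rows assumed to be already sorted by car and rental date
df = pd.DataFrame(
    [['x', 'iii-2019-10-16'],
     ['x', 'zzz-2020-10-25'],
     ['y', 'qqq-2018-05-28'],
     ['y', 'qqq-2018-05-28'],
     ['y', 'ooo-2019-11-22'],
     ['y', 'rrr-16-12-2020']],
    columns=['Car_code', 'customer_rent_code'])

# Number each (car, customer rent code) pair 0, 1, 2, ... in order of first appearance
group_id = df.groupby(['Car_code', 'customer_rent_code'], sort=False).ngroup()

# Subtract each car's smallest id so that the ranks restart at 1 for every car
df['Rent_rank'] = group_id - group_id.groupby(df['Car_code']).transform('min') + 1

print(df['Rent_rank'].tolist())  # [1, 2, 1, 1, 2, 3]
```

Without `sort=False`, the pairs would be numbered in sorted key order, which would rank 'ooo-2019-11-22' ahead of 'qqq-2018-05-28' for car y.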
