pandas: refining the partitioning of a DataFrame

Posted by yws3nbqq on 2023-05-15 in Other

I have a DataFrame with three columns (model, lot_1, and class):

import pandas as pd

df = pd.DataFrame({'model':['A','A','A','A', 'A','A','A','A','B','B','B','B','B','B'],
                  'lot_1':[0,0,0,0,0,1,2,2,0,0,1,1,2,2],
                  'class':['0','3','3','2','2','1','5','5','0','0','0','1','1','1']})

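I want to subdivide lot_1 by class within each model and store the result in a new column, lot_2.
The lot_1 numbering must stay in order; the class values do not imply any order or magnitude (they only serve as labels).
Even within the same lot_1 value, a different class must get the next number, following the lot_1 order.
When the model changes, the numbering starts again from zero.
The result I want is shown below.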


Answer #1 (m3eecexj)

Compare both columns with their DataFrame.shift-ed values and take a cumsum to number the consecutive runs, then pass the result to GroupBy.rank:

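# Flag rows where lot_1 or class differs from the previous row, then
# take a running sum so that each consecutive run gets its own id
s = df[['lot_1','class']].ne(df[['lot_1','class']].shift()).any(axis=1).cumsum()

# Dense-rank the run ids within each model and subtract 1 to start at 0
df['lot_2'] = s.groupby(df['model']).rank(method='dense').astype(int).sub(1)
print(df)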
   model  lot_1 class  lot_2
0      A      0     0      0
1      A      0     3      1
2      A      0     3      1
3      A      0     2      2
4      A      0     2      2
5      A      1     1      3
6      A      2     5      4
7      A      2     5      4
8      B      0     0      0
9      B      0     0      0
10     B      1     0      1
11     B      1     1      2
12     B      2     1      3
13     B      2     1      3
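
To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps on the sample data from the question. The names `cols`, `changed`, `run_id` and `lot_2` are illustrative only and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({'model': ['A','A','A','A','A','A','A','A','B','B','B','B','B','B'],
                   'lot_1': [0,0,0,0,0,1,2,2,0,0,1,1,2,2],
                   'class': ['0','3','3','2','2','1','5','5','0','0','0','1','1','1']})

cols = df[['lot_1', 'class']]

# Step 1: True where lot_1 or class differs from the previous row
# (the first row is compared with NaN, so it is always True)
changed = cols.ne(cols.shift()).any(axis=1)

# Step 2: a running sum of the flags gives every consecutive run its own id;
# the ids keep counting across models (1..5 for A, 6..9 for B)
run_id = changed.cumsum()

# Step 3: a dense rank within each model renumbers the ids from 1,
# and subtracting 1 makes them start at 0
lot_2 = run_id.groupby(df['model']).rank(method='dense').astype(int).sub(1)

print(pd.DataFrame({'changed': changed, 'run_id': run_id, 'lot_2': lot_2}))
```

Because the ranking is done per model, the numbering restarts at 0 for model B even though `run_id` itself does not reset.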

An alternative solution using factorize:

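# Same run ids as in the first solution
s = df[['lot_1','class']].ne(df[['lot_1','class']].shift()).any(axis=1).cumsum()

# factorize numbers the run ids 0, 1, 2, ... in order of first appearance within each model
df['lot_2'] = s.groupby(df['model']).transform(lambda x: pd.factorize(x)[0])
print(df)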
   model  lot_1 class  lot_2
0      A      0     0      0
1      A      0     3      1
2      A      0     3      1
3      A      0     2      2
4      A      0     2      2
5      A      1     1      3
6      A      2     5      4
7      A      2     5      4
8      B      0     0      0
9      B      0     0      0
10     B      1     0      1
11     B      1     1      2
12     B      2     1      3
13     B      2     1      3
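
As a sanity check, the sketch below compares both solutions with a plain Python loop that follows the rules stated in the question literally. The helper `lot_2_loop` is hypothetical and written only for this comparison; it assumes that the rows of each model are contiguous, as they are in the sample data.

```python
import pandas as pd

df = pd.DataFrame({'model': ['A','A','A','A','A','A','A','A','B','B','B','B','B','B'],
                   'lot_1': [0,0,0,0,0,1,2,2,0,0,1,1,2,2],
                   'class': ['0','3','3','2','2','1','5','5','0','0','0','1','1','1']})

run_id = df[['lot_1', 'class']].ne(df[['lot_1', 'class']].shift()).any(axis=1).cumsum()

# The two solutions from the answer
by_rank = run_id.groupby(df['model']).rank(method='dense').astype(int).sub(1)
by_factorize = run_id.groupby(df['model']).transform(lambda x: pd.factorize(x)[0])


def lot_2_loop(frame):
    """Hypothetical reference: restart at 0 for each new model and
    add 1 whenever the (lot_1, class) pair changes within a model."""
    out = []
    prev_model, prev_key, counter = None, None, 0
    for model, lot, cls in zip(frame['model'], frame['lot_1'], frame['class']):
        key = (lot, cls)
        if model != prev_model:
            counter = 0          # new model: numbering starts again
        elif key != prev_key:
            counter += 1         # same model, new (lot_1, class) run
        out.append(counter)
        prev_model, prev_key = model, key
    return out


by_loop = lot_2_loop(df)
print(by_rank.tolist() == by_factorize.tolist() == by_loop)
```

On this data all three agree. The vectorized versions are usually preferable on large frames, while the loop is mainly useful as a readable statement of the intended rule.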
