Pandas: scaling a column within groupby

ssm49v7z  posted on 2022-12-16  in Other

I want to scale the column "amount" to the range [0,1] within each group defined by two key columns.

key1  key2 amount
0   a   1   10  
1   a   1   20  
2   a   1   30  
3   a   2   10  
4   a   2   40
5   a   2   100 
6   b   1   30  
7   b   1   150 
8   b   1   150 
9   b   2   0   
10  b   2   100 
11  b   2   1000

should become

key1  key2 amount  amount_scaled
0   a   1   10      0
1   a   1   20      0.5
2   a   1   30      1
3   a   2   10      0
4   a   2   40      0.333333
5   a   2   100     1
6   b   1   30      0
7   b   1   150     1
8   b   1   150     1
9   b   2   0       0
10  b   2   100     0.1
11  b   2   1000    1

I tried

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
df = pd.DataFrame({"key1":['a','a','a','a','a','a','b','b','b','b','b','b'],"key2":[1,1,1,2,2,2,1,1,1,2,2,2],"amount":[10,20,30,10,40,100,30,150,150,0,100,1000]})
# Raises ValueError: fit_transform expects a 2-D array, but each group x is a 1-D Series
df.groupby(['key1','key2'])['amount'].apply(lambda x: MinMaxScaler().fit_transform(x))

That did not work. Any suggestions?

afdcj2ne #1

You can use pure pandas:

g = df.groupby(['key1', 'key2'])['amount']
MIN = g.transform('min')
df['amount_scaled'] = df['amount'].sub(MIN).div(g.transform('max').sub(MIN))

Or use a function:

def scale(g):
    MIN = g.min()
    return g.sub(MIN).div(g.max()-MIN)

df['amount_scaled'] = df.groupby(['key1', 'key2'])['amount'].transform(scale)
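
If you would rather keep MinMaxScaler from the question, here is a minimal sketch of how that might look. It assumes scikit-learn is installed, and the helper name minmax_sklearn is made up for illustration. The original attempt fails because fit_transform expects 2-D input of shape (n_samples, n_features), while each group arrives as a 1-D Series, so the sketch reshapes each group into a single column and flattens the result. It should give the same amount_scaled values as the pandas-only versions:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "key1": list("aaaaaabbbbbb"),
    "key2": [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    "amount": [10, 20, 30, 10, 40, 100, 30, 150, 150, 0, 100, 1000],
})


def minmax_sklearn(x):
    # Hypothetical helper: reshape the 1-D group into one column (2-D),
    # scale it, then flatten back to 1-D so it aligns with the group.
    scaled = MinMaxScaler().fit_transform(x.to_numpy().reshape(-1, 1))
    return scaled.ravel()


# transform keeps the original row order and index
df["amount_scaled"] = (
    df.groupby(["key1", "key2"])["amount"].transform(minmax_sklearn)
)
```

Fitting one scaler per group is slower than the vectorized pandas version, so this is mainly useful if you want to stay within the scikit-learn API.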

Output:

key1  key2  amount  amount_scaled
0     a     1      10       0.000000
1     a     1      20       0.500000
2     a     1      30       1.000000
3     a     2      10       0.000000
4     a     2      40       0.333333
5     a     2     100       1.000000
6     b     1      30       0.000000
7     b     1     150       1.000000
8     b     1     150       1.000000
9     b     2       0       0.000000
10    b     2     100       0.100000
11    b     2    1000       1.000000
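
One caveat with the pandas-only approach: if every value in a group is identical, max - min is 0 and the division yields NaN for that group. The sketch below shows this on small made-up data and one possible guard. Mapping constant groups to 0.0 is an assumed convention, not something the question specifies, and the names lo and span are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" is constant, so max - min is 0 there.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "amount": [5, 5, 0, 10, 40],
})

g = df.groupby("key")["amount"]
lo = g.transform("min")
span = g.transform("max").sub(lo)

# Naive version: 0 / 0 gives NaN for the constant group.
naive = df["amount"].sub(lo).div(span)

# Guarded version: mask zero spans, then map the resulting NaN to 0.0.
# Note: fillna would also fill NaN coming from missing input values.
df["amount_scaled"] = (
    df["amount"].sub(lo).div(span.replace(0, np.nan)).fillna(0.0)
)
```

If the input can contain missing values that should stay missing, a mask on span == 0 would be a safer choice than a blanket fillna.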
