pandas: Assign the last value of a DataFrame group to all entries of that group

Posted by qyuhtwio on 2023-11-15 in Other

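In Python pandas, I have a DataFrame. I group this DataFrame by one column and want to assign the last value of another column within each group to all rows of that group, as a new column.
I know that I can select the last row of each group with the following command: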

import pandas as pd

df = pd.DataFrame({'a': (1,1,2,3,3), 'b':(20,21,30,40,41)})
print(df)
print("-")
result = df.groupby('a').nth(-1)
print(result)

How can I assign the result of this operation back to the original DataFrame, so that I end up with something like this:

   a   b  b_new
0  1  20     21
1  1  21     21
2  2  30     30
3  3  40     41
4  3  41     41

pandas

Source: https://stackoverflow.com/questions/47924400/assign-last-value-of-dataframe-group-to-all-entries-of-that-group

3 Answers

baubqpgj1#

Use transform with last:

df['b_new'] = df.groupby('a')['b'].transform('last')

Alternative:

df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])

print(df)
   a   b  b_new
0  1  20     21
1  1  21     21
2  2  30     30
3  3  40     41
4  3  41     41
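
The two variants above agree on the question's data, but they can differ when a group ends in a missing value. The following is a minimal sketch, assuming pandas' default behaviour in which the 'last' aggregation skips missing values; the NaN in the sample data and the column names b_last and b_last_row are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data, but group 1 ends with a missing value.
df = pd.DataFrame({'a': [1, 1, 2, 3, 3], 'b': [20, np.nan, 30, 40, 41]})

# 'last' returns the last non-null value of each group (default behaviour).
df['b_last'] = df.groupby('a')['b'].transform('last')

# iat[-1] returns whatever sits in the last row of each group, NaN included.
df['b_last_row'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])

print(df)
```

If the last row of each group is what matters regardless of missing values, the positional variant is probably the one to use, at the cost of the slower Python-level lambda.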

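Solution using nth and join (this relies on nth returning a result indexed by the group key, as it did in pandas versions before 2.0; in pandas 2.0 and later nth keeps the original row index, so last() is needed in its place):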

df = df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')
print(df)
   a   b  b_new
0  1  20     21
1  1  21     21
2  2  30     30
3  3  40     41
4  3  41     41
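
As a hedged sketch of the same join idea that should not depend on how nth indexes its result, last() can build the per-group lookup, since it returns one value per group indexed by the group key. The intermediate names last_b and out are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# One row per group, indexed by the group key 'a', renamed to the target column.
last_b = df.groupby('a')['b'].last().rename('b_new')

# Join the lookup onto the original rows by matching column 'a' to its index.
out = df.join(last_b, on='a')
print(out)
```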

Timings:

import numpy as np

N = 10000

df = pd.DataFrame({'a':np.random.randint(1000,size=N),
                   'b':np.random.randint(10000,size=N)})

#print (df)

def f(df):
    return df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')

#cᴏʟᴅsᴘᴇᴇᴅ1
In [211]: %timeit df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))
100 loops, best of 3: 3.57 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2
In [212]: %timeit df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))
10 loops, best of 3: 71.3 ms per loop

#jezrael1
In [213]: %timeit df['b_new'] = df.groupby('a')['b'].transform('last')
1000 loops, best of 3: 1.82 ms per loop

#jezrael2
In [214]: %timeit df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
10 loops, best of 3: 178 ms per loop
    
#jezrael3
In [219]: %timeit f(df)
100 loops, best of 3: 3.63 ms per loop

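Caveat

The results do not account for how performance changes with the number of groups, which will strongly affect the timings of some of these solutions.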
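
One way to explore that caveat is to rerun a few of the solutions while varying the number of groups. This is only a rough sketch: the helper name time_solutions, the chosen sizes, and the use of the standard timeit module outside IPython are assumptions, and the numbers it prints will differ between machines and pandas versions.

```python
import timeit

import numpy as np
import pandas as pd


def time_solutions(n_rows, n_groups, repeat=3):
    """Return the best wall-clock time in seconds for each candidate."""
    rng = np.random.default_rng(0)
    df = pd.DataFrame({'a': rng.integers(n_groups, size=n_rows),
                       'b': rng.integers(10000, size=n_rows)})
    candidates = {
        'transform_last': lambda: df.groupby('a')['b'].transform('last'),
        'transform_iat': lambda: df.groupby('a')['b'].transform(lambda x: x.iat[-1]),
        'map_last': lambda: df['a'].map(df.groupby('a')['b'].last()),
    }
    # Take the minimum over a few single runs to reduce noise.
    return {name: min(timeit.repeat(fn, number=1, repeat=repeat))
            for name, fn in candidates.items()}


# Same number of rows, few groups versus many groups.
for n_groups in (10, 1000):
    print(n_groups, time_solutions(10_000, n_groups))
```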

a11xaf1n2#

Two possibilities: groupby + nth + map, or replace.

df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))

Or,

df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))

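You can also replace nth(-1) with last() (in fact, doing so happens to make this a little faster), but nth gives you more flexibility over which item to pick from each group in b.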

df

   a   b  b_new
0  1  20     21
1  1  21     21
2  2  30     30
3  3  40     41
4  3  41     41
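
To make the lookup step explicit, here is a small sketch of the map approach using the last() variant mentioned above. It assumes nothing beyond the question's data; the name last_per_group is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# One value per group, indexed by the group key 'a'.
last_per_group = df.groupby('a')['b'].last()

# map looks up each row's key in that Series and returns the matching value.
df['b_new'] = df['a'].map(last_per_group)
print(df)
```

Because the lookup Series is keyed by the group label, map only needs the key column of the original frame, and the result keeps the original row order.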


7gcisfzg3#

I think this should be fast:

df.merge(df.drop_duplicates('a',keep='last'),on='a',how='left')
Out[797]: 
   a  b_x  b_y
0  1   20   21
1  1   21   21
2  2   30   30
3  3   40   41
4  3   41   41
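
The merge above yields suffixed columns b_x and b_y. As a sketch of one way to arrive at the column names from the question, the deduplicated frame can be renamed before merging; the variable names last_rows and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# Keep only the last row of each 'a' value and rename its 'b' to the target name.
last_rows = df.drop_duplicates('a', keep='last').rename(columns={'b': 'b_new'})

# A left merge on 'a' attaches the per-group value to every original row.
out = df.merge(last_rows, on='a', how='left')
print(out)
```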
