pandas 如何用以前的值填充列直到满足条件

k3bvogb1  于 2023-09-29  发布在  其他
关注(0)|答案(3)|浏览(94)

背景我有一个由数百万个值组成的大型数据框。它看起来像下面

import pandas as pd
df = pd.DataFrame({'level':[1,1,1,2,3,2,2,1,2,1,2,3,3,4,5,2,3,4,5,1,1],
                   'type':['','','com','','','','','','','','com','','','','','','','com','','','']})
print(df)
     level  type
0       1     
1       1     
2       1   com
3       2     
4       3     
5       2     
6       2     
7       1     
8       2     
9       1     
10      2   com
11      3     
12      3     
13      4     
14      5     
15      2     
16      3     
17      4   com
18      5     
19      1     
20      1

我想把我的画框洗成这样。将type=com后的前一个值填入空白处,直到level等于/小于带有com的起始行的level。有什么想法吗?非常感谢。

level  type   type_new
0       1              
1       1              
2       1  com      com
3       2           com
4       3           com
5       2           com
6       2           com
7       1              
8       2              
9       1              
10      2  com      com
11      3           com
12      3           com
13      4           com
14      5           com
15      2              
16      3              
17      4  com      com
18      5           com
19      1              
20      1
wn9m85ua

wn9m85ua1#

一个选择:

# identify non-empty
m = df['type'].ne('')

# mask and forward fill
s = df['level'].where(m).ffill()

# set up "com" on values above threshold
df['type_new'] = ''
df.loc[(m|s.lt(df['level'])).groupby(m.cumsum()).cummin(), 'type_new'] = 'com'

另一种方法:

# identify non-empty
m = df['type'].ne('')

# forward-fill
tmp = df.where(m).ffill()

# identify values that return to initial value
m2 = df['level'].mask(m).groupby(m.cumsum()).cummin().gt(tmp['level'])

# assign
df['type_new'] = df['type'].mask(m2).ffill()

输出量:

level type type_new
0       1              
1       1              
2       1  com      com
3       2           com
4       3           com
5       2           com
6       2           com
7       1              
8       2              
9       1              
10      2  com      com
11      3           com
12      3           com
13      4           com
14      5           com
15      2              
16      3              
17      4  com      com
18      5           com
19      1              
20      1
iswrvxsc

iswrvxsc2#

cond1 = df['type'].eq('com')
grp = cond1.cumsum()
cond2 = df['level'] <= df.groupby(grp)['level'].transform('first')
df['type_new'] = df['type'].mask(grp.gt(0) & cond2.groupby(grp).cumsum().eq(1), 'com')

df

level   type type_new
0   1       
1   1       
2   1   com com
3   2       com
4   3       com
5   2       com
6   2       com
7   1       
8   2       
9   1       
10  2   com com
11  3       com
12  3       com
13  4       com
14  5       com
15  2       
16  3       
17  4   com com
18  5       com
19  1       
20  1
v09wglhw

v09wglhw3#

用途:

#forward fill type
s = df['type'].replace('', np.nan).ffill()
  
#test com    
m = df['type'].eq('com')

#test if level is less or equal by forward filled first value if com
mask = df['level'].le(df['level'].where(m).ffill()) & ~m

 #per groups by com set empty string
df['type_new'] = s.mask(mask.groupby(m.cumsum()).cummax(), '').fillna('')
print(df)
    level type type_new
0       1              
1       1              
2       1  com      com
3       2           com
4       3           com
5       2           com
6       2           com
7       1              
8       2              
9       1              
10      2  com      com
11      3           com
12      3           com
13      4           com
14      5           com
15      2              
16      3              
17      4  com      com
18      5           com
19      1              
20      1

相关问题