Show which strings are contained in the columns

thtygnil posted on 2022-10-23 in Other

How can I create a column named "reason" that shows, for each row, which strings from match_d are found in each column?

match_d = {"col_a":["green", "purple"], "col_b":["weak", "stro", "strong"],...}
df
    fruit   col_a     col_b
0   apple   yellow     NaN
1   pear    blue       NaN
2   banana  green      strong
3   cherry  green      heavy
4   grapes  brown      light
...

Expected output

    fruit   col_a     col_b      reason
0   apple   yellow     NaN        NaN
1   pear    blue       NaN        NaN
2   banana  green      strong     col_a:["green"], col_b:["stro", "strong"]
3   cherry  green      heavy      col_a:["green"]
4   grapes  brown      light      NaN

Answer #1 (cl25kdpy)

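Use a nested list comprehension to comma-join the values from match_d that occur as substrings of each cell (missing values are skipped). Then, for each row, prefix every non-empty result with its column name and join them with ";"; rows with no match at all get NaN: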

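import numpy as np
import pandas as pd

match_d = {"col_a":["green", "purple"], "col_b":["weak", "stro", "strong"]}

cols = list(match_d.keys())
# For each column x and each cell y in it, comma-join the patterns z that
# occur as substrings of y ('' if nothing matches or y is missing)
L = [[','.join(z for z in match_d[x] if pd.notna(y) and z in y)
      for y in df[x]] for x in cols]

# zip(*L) yields one tuple per row; keep only the non-empty column results
df['reason'] = [np.nan if ''.join(x) == '' else ';'.join(f'{a}:[{b}]'
                for a, b in zip(cols, x) if b != '')
                for x in zip(*L)]
print(df)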
    fruit   col_a   col_b                             reason
0   apple  yellow     NaN                                NaN
1    pear    blue     NaN                                NaN
2  banana   green  strong  col_a:[green];col_b:[stro,strong]
3  cherry   green   heavy                      col_a:[green]
4  grapes   brown   light                                NaN
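If the nested comprehension is hard to follow, here is a minimal sketch of the same substring matching done one row at a time with DataFrame.apply(axis=1). It rebuilds only the five sample rows shown in the question, and the helper name match_reason is hypothetical, not part of the answer:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the five rows shown in the question.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "banana", "cherry", "grapes"],
    "col_a": ["yellow", "blue", "green", "green", "brown"],
    "col_b": [np.nan, np.nan, "strong", "heavy", "light"],
})
match_d = {"col_a": ["green", "purple"], "col_b": ["weak", "stro", "strong"]}


def match_reason(row, patterns):
    """Hypothetical helper: build the 'reason' string for one row.

    For every column in `patterns`, collect the patterns that occur as
    substrings of that row's value; missing values never match.
    """
    parts = []
    for col, subs in patterns.items():
        value = row[col]
        if pd.isna(value):
            continue
        hits = [s for s in subs if s in value]
        if hits:
            parts.append(f"{col}:[{','.join(hits)}]")
    # No match in any column -> NaN, like the list-comprehension version.
    return ";".join(parts) if parts else np.nan


# Extra keyword arguments to DataFrame.apply are forwarded to the function.
df["reason"] = df.apply(match_reason, axis=1, patterns=match_d)
print(df)
```

Row-wise apply is usually slower than the list comprehensions on large frames, but it keeps the matching rule in one named function that is easy to change.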

Alternative solution with DataFrame.apply:

match_d = {"col_a":["green", "purple"], "col_b":["weak", "stro", "strong"]}

cols = list(match_d.keys())
# apply passes each column as a Series x; x.name picks that column's patterns
df1 = df[cols].apply(lambda x: [','.join(z for z in match_d[x.name]
                                         if pd.notna(y) and z in y) for y in x])

# Each row of df1 holds the per-column match strings for one row of df
df['reason'] = [np.nan if ''.join(x) == '' else ';'.join(f'{a}:[{b}]'
                for a, b in zip(cols, x) if b != '')
                for x in df1.to_numpy()]
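Another option swaps in pandas' Series.str.contains for the `z in y` check: regex=False keeps the patterns literal and na=False makes missing cells count as no match, so no explicit pd.notna test is needed. This is a sketch, not the answerer's method, assuming only the five sample rows from the question:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the five rows shown in the question.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "banana", "cherry", "grapes"],
    "col_a": ["yellow", "blue", "green", "green", "brown"],
    "col_b": [np.nan, np.nan, "strong", "heavy", "light"],
})
match_d = {"col_a": ["green", "purple"], "col_b": ["weak", "stro", "strong"]}

parts = []
for col, pats in match_d.items():
    # Boolean frame, one column per pattern: True where the pattern is a
    # literal substring of the cell; missing cells become False.
    mask = pd.DataFrame(
        {p: df[col].str.contains(p, regex=False, na=False) for p in pats}
    )
    # Comma-join the names of the matching patterns per row ("" if none).
    joined = mask.apply(lambda r: ",".join(p for p, hit in r.items() if hit), axis=1)
    # Prefix the column name, but keep "" for rows without a match.
    parts.append(joined.where(joined == "", col + ":[" + joined + "]"))

# Join the non-empty per-column pieces with ";" and turn "" into NaN.
reason = pd.concat(parts, axis=1).apply(lambda r: ";".join(s for s in r if s), axis=1)
df["reason"] = reason.where(reason != "")
print(df)
```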

Edit: to prepend a substring to the non-missing values of the reason column, use:

m = df['reason'].notna()
df.loc[m, 'reason'] = 'fruit[not_empty];' + df.loc[m, 'reason']
print(df)
    fruit   col_a   col_b                                             reason
0   apple  yellow     NaN                                                NaN
1    pear    blue     NaN                                                NaN
2  banana   green  strong  fruit[not_empty];col_a:[green];col_b:[stro,str...
3  cherry   green   heavy                     fruit[not_empty];col_a:[green]
4  grapes   brown   light                                                NaN
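The answer's output format (col_a:[green];col_b:[stro,strong]) differs slightly from the question's expected output (col_a:["green"], col_b:["stro", "strong"]). If the quoted, comma-separated format is needed literally, a minimal sketch could look like this; it assumes the five sample rows from the question, and the helper name reason_like_question is hypothetical:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the five rows shown in the question.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "banana", "cherry", "grapes"],
    "col_a": ["yellow", "blue", "green", "green", "brown"],
    "col_b": [np.nan, np.nan, "strong", "heavy", "light"],
})
match_d = {"col_a": ["green", "purple"], "col_b": ["weak", "stro", "strong"]}


def reason_like_question(row):
    """Hypothetical helper: format matches as col:["a", "b"], joined by ", "."""
    parts = []
    for col, subs in match_d.items():
        value = row[col]
        # Substring test, skipping missing values.
        hits = [s for s in subs if pd.notna(value) and s in value]
        if hits:
            quoted = ", ".join(f'"{s}"' for s in hits)
            parts.append(f"{col}:[{quoted}]")
    return ", ".join(parts) if parts else np.nan


df["reason"] = df.apply(reason_like_question, axis=1)
print(df)
```

Only the separators and quoting change; the matching rule (substring test, missing values skipped) is the same as in the answer.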
