Pandas: match multiple conditions and join the results to create a new column

2ekbmq32  posted on 2023-01-15  in  Other

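Below is my code. As you can see, I use np.select to check whether the string in a column contains any of the codes, and I create a reference column holding the description that matches.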

import numpy as np

# Create the reference column
col         = 'codes_desc'
conditions  = [(df_merged[col].str.contains('R27', case=False)),
               (df_merged[col].str.contains('R38', case=False)),
               (df_merged[col].str.contains('R52', case=False)),
               (df_merged[col].str.contains('R62', case=False)),
               (df_merged[col].str.contains('R21', case=False)),
               (df_merged[col].str.contains('R22', case=False)),
               (df_merged[col].str.contains('R23', case=False)),
               (df_merged[col].str.contains('R57', case=False)),
               (df_merged[col].str.contains('R82', case=False)),
               (df_merged[col].str.contains('R86', case=False)),
               (df_merged[col].str.contains('R20', case=False)), 
               (df_merged[col].str.contains('R98', case=False)) 
              ]

choices     = [ 
 'The person is a Ninja',
 'The person is a Pirate',
 'The person is a Doctor',
 'The person is a Samurai',
 'The person is a Admiral',
 'The person is a Police',
 'The person is a Teacher',
 'The person is a Singer',
 'The person is a Guitarist',
 'The person is a Chef',
 'The person is a Runner',
 'The person is a Wizard'
]
df_merged["reference"] = np.select(conditions, choices, default= 'Reason Unknown')

However, I found rows in the DataFrame where the column "codes_desc" contains two codes, for example:

codes_desc

The selected codes are R27, R22.

In this case, I would like the "reference" column to contain something like:

1. 'The person is a Ninja'
2. 'The person is a Police'

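However, np.select works like a case statement: for each row it returns only one description (that of the first condition in the list that matches). How can I get all matching descriptions?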
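
For reference, a small sketch of how np.select resolves a row that matches several conditions. The frame `df_demo` is a hypothetical one-row stand-in for `df_merged`, and only two of the twelve conditions are reproduced.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame mirroring the example row in the question.
df_demo = pd.DataFrame({"codes_desc": ["The selected codes are R27, R22."]})

conditions = [
    df_demo["codes_desc"].str.contains("R27", case=False),
    df_demo["codes_desc"].str.contains("R22", case=False),
]
choices = ["The person is a Ninja", "The person is a Police"]

# For each row, np.select uses the choice of the first condition that is True,
# so only one description is kept even though both codes are present.
df_demo["reference"] = np.select(conditions, choices, default="Reason Unknown")
print(df_demo["reference"].tolist())
```

Both conditions are true for this row, yet the result holds a single description, which is why a different approach is needed to collect every match.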


8zzbczxx  #1

Setup

print(df)
   codes_desc
0  fo bar R20
1   R98 grok 
2    R98, R21
3         R82

Solution

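Extract all matching codes with str.extractall, map each code to its description using a dictionary, then group by the original row index and aggregate with join.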

d = {'R27': 'The person is a Ninja',
     'R38': 'The person is a Pirate',
     'R52': 'The person is a Doctor',
     'R62': 'The person is a Samurai',
     'R21': 'The person is a Admiral',
     'R22': 'The person is a Police',
     'R23': 'The person is a Teacher',
     'R57': 'The person is a Singer',
     'R82': 'The person is a Guitarist',
     'R86': 'The person is a Chef',
     'R20': 'The person is a Runner',
     'R98': 'The person is a Wizard'}

pat = r'\b(%s)\b' % '|'.join(d)
codes = df['codes_desc'].str.extractall(pat)[0]
df['reference'] = codes.map(d).groupby(level=0).agg(', '.join)

Result

   codes_desc                                        reference
0  fo bar R20                           The person is a Runner
1   R98 grok                            The person is a Wizard
2    R98, R21  The person is a Wizard, The person is a Admiral
3         R82                        The person is a Guitarist
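
The question used case=False and a default of 'Reason Unknown', while the approach above matches case-sensitively and leaves NaN in rows without any known code. The following is a minimal sketch of one way to cover both points, not part of the original answer. The last two rows of the sample frame are hypothetical additions used to exercise those cases.

```python
import re
import pandas as pd

# Rows 0-3 come from the setup above; rows 4-5 are hypothetical extras
# (a lower-case code and a row with no known code).
df = pd.DataFrame({"codes_desc": ["fo bar R20", "R98 grok ", "R98, R21", "R82",
                                  "codes r27, R22.", "no code"]})

d = {'R27': 'The person is a Ninja',
     'R38': 'The person is a Pirate',
     'R52': 'The person is a Doctor',
     'R62': 'The person is a Samurai',
     'R21': 'The person is a Admiral',
     'R22': 'The person is a Police',
     'R23': 'The person is a Teacher',
     'R57': 'The person is a Singer',
     'R82': 'The person is a Guitarist',
     'R86': 'The person is a Chef',
     'R20': 'The person is a Runner',
     'R98': 'The person is a Wizard'}

pat = r'\b(%s)\b' % '|'.join(d)

# Case-insensitive extraction; one output row per match, indexed by (row, match).
codes = df['codes_desc'].str.extractall(pat, flags=re.IGNORECASE)[0]

# Upper-case the matches so that 'r27' still finds the key 'R27' in d.
joined = codes.str.upper().map(d).groupby(level=0).agg(', '.join)

# Rows without any match are absent from `joined`; reindex and fill them.
df['reference'] = joined.reindex(df.index).fillna('Reason Unknown')
print(df)
```

Upper-casing before the lookup assumes every key in `d` is upper-case, as it is here.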
