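Below is my code. As you can see, I use np.select to check whether the string in a column contains any of the codes, and to create a reference column holding a description based on that logic.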
import numpy as np

# Creating the reference column
col = 'codes_desc'
conditions = [
    df_merged[col].str.contains('R27', case=False),
    df_merged[col].str.contains('R38', case=False),
    df_merged[col].str.contains('R52', case=False),
    df_merged[col].str.contains('R62', case=False),
    df_merged[col].str.contains('R21', case=False),
    df_merged[col].str.contains('R22', case=False),
    df_merged[col].str.contains('R23', case=False),
    df_merged[col].str.contains('R57', case=False),
    df_merged[col].str.contains('R82', case=False),
    df_merged[col].str.contains('R86', case=False),
    df_merged[col].str.contains('R20', case=False),
    df_merged[col].str.contains('R98', case=False),
]
choices = [
    'The person is a Ninja',
    'The person is a Pirate',
    'The person is a Doctor',
    'The person is a Samurai',
    'The person is a Admiral',
    'The person is a Police',
    'The person is a Teacher',
    'The person is a Singer',
    'The person is a Guitarist',
    'The person is a Chef',
    'The person is a Runner',
    'The person is a Wizard',
]
df_merged["reference"] = np.select(conditions, choices, default='Reason Unknown')
However, I found cases in the DataFrame where the column "codes_desc" contains two codes, for example:
codes_desc
The selected codes are R27, R22.
In this case, I would like the output in the "reference" column to look something like:
1. 'The person is a Ninja'
2. 'The person is a Police'
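But np.select works like a case statement: for each row it returns only the choice for the first condition that matches, so only one code description is kept. How can I get all of them?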
1 Answer
Setup
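A minimal sketch of what such a setup might look like, assuming a small sample df_merged and a mapping dictionary named code_map (both hypothetical, built from the codes and descriptions in the question):

```python
import pandas as pd

# Hypothetical sample data; the second row mirrors the two-code example
# from the question.
df_merged = pd.DataFrame({
    "codes_desc": [
        "The selected code is R38.",
        "The selected codes are R27, R22.",
        "No code selected.",
    ]
})

# Codes and descriptions, in the same order as in the question.
codes = ['R27', 'R38', 'R52', 'R62', 'R21', 'R22',
         'R23', 'R57', 'R82', 'R86', 'R20', 'R98']
choices = [
    'The person is a Ninja',
    'The person is a Pirate',
    'The person is a Doctor',
    'The person is a Samurai',
    'The person is a Admiral',
    'The person is a Police',
    'The person is a Teacher',
    'The person is a Singer',
    'The person is a Guitarist',
    'The person is a Chef',
    'The person is a Runner',
    'The person is a Wizard',
]

# Mapping dictionary: code -> description (code_map is an assumed name).
code_map = dict(zip(codes, choices))
```

Pairing the two lists with zip keeps each code next to its description, which is harder to get out of sync than two parallel lists.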
Solution
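Extract all matching codes, then map the codes to their corresponding values using the mapping dictionary, then groupby the original row and aggregate the results using join.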
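The steps above can be sketched as follows. This is one possible reading, not necessarily the exact code of the answer: it assumes Series.str.extractall for the extraction, a newline as the join separator, and the hypothetical names code_map, pattern, and matches. The sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the second row is the two-code case from the question.
df_merged = pd.DataFrame({
    "codes_desc": [
        "The selected code is R38.",
        "The selected codes are R27, R22.",
        "No code selected.",
    ]
})

# Mapping dictionary (assumed name), built from the question's codes and descriptions.
code_map = {
    'R27': 'The person is a Ninja',
    'R38': 'The person is a Pirate',
    'R52': 'The person is a Doctor',
    'R62': 'The person is a Samurai',
    'R21': 'The person is a Admiral',
    'R22': 'The person is a Police',
    'R23': 'The person is a Teacher',
    'R57': 'The person is a Singer',
    'R82': 'The person is a Guitarist',
    'R86': 'The person is a Chef',
    'R20': 'The person is a Runner',
    'R98': 'The person is a Wizard',
}

# Case-insensitive pattern with one capture group that matches any known code.
pattern = r"(?i)\b(" + "|".join(code_map) + r")\b"

# extractall returns one row per match, indexed by (original row, match number).
# Column 0 holds the captured code; upper-case it so it matches the dict keys.
matches = df_merged["codes_desc"].str.extractall(pattern)[0].str.upper()

# Map each code to its description, then join all descriptions per original row.
joined = matches.map(code_map).groupby(level=0).agg("\n".join)

# Rows without any match are missing from `joined`; give them the default text.
df_merged["reference"] = joined.reindex(df_merged.index).fillna("Reason Unknown")

print(df_merged["reference"].tolist())
```

Descriptions appear in the order the codes occur in the string, not in the order of the conditions list. To get a different layout, such as a comma-separated list, swap the "\n" separator for ", ".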