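This is a follow-up to the question about searching for certain keywords in a Pandas DataFrame for classification.
I have a list of keywords that I want to use to classify job descriptions. Below are the input file, some sample keywords, and my code: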
```
job_description
Managing engineer is responsible for
This job entails assisting to
Engineer is required the execute
Pilot should be able to control
Customer specialist advices
Different cases brought by human resources department
```
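```python
# Assumes df is a pandas DataFrame with a "job_description" column.
cat_dict = {
    "manager": ["manager", "president", "management", "managing"],
    "assistant": ["assistant", "assisting", "customer specialist"],
    "engineer": ["engineer", "engineering", "scientist", "architect"],
    "HR": ["human resources"]
}

def classify(desc):
    # Return the first category whose keyword list has a substring match;
    # falls through and returns None if no keyword matches.
    for cat, lst in cat_dict.items():
        if any(x in desc.lower() for x in lst):
            return cat

df['classification'] = df["job_description"].apply(classify)
```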
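To reproduce the question end to end, here is a minimal self-contained sketch. It assumes the input file has been loaded into a DataFrame; instead of reading the (unnamed) file, it builds `df` directly from the six sample rows shown above, and it keeps the question's `cat_dict` and first-match logic unchanged apart from an explicit `return None`.

```python
import pandas as pd

# Sample rows copied from the question (the real input file is not read here).
df = pd.DataFrame({"job_description": [
    "Managing engineer is responsible for",
    "This job entails assisting to",
    "Engineer is required the execute",
    "Pilot should be able to control",
    "Customer specialist advices",
    "Different cases brought by human resources department",
]})

cat_dict = {
    "manager": ["manager", "president", "management", "managing"],
    "assistant": ["assistant", "assisting", "customer specialist"],
    "engineer": ["engineer", "engineering", "scientist", "architect"],
    "HR": ["human resources"],
}

def classify(desc):
    """Return the first category (in dict insertion order) that has a keyword
    appearing as a substring of desc, or None if nothing matches."""
    text = desc.lower()
    for cat, keywords in cat_dict.items():
        if any(kw in text for kw in keywords):
            return cat
    return None

df["classification"] = df["job_description"].apply(classify)
print(df)
```

With these rows, the multi-word keywords do match: "Customer specialist advices" is labeled assistant and the human resources row is labeled HR. Note that "Managing engineer is responsible for" becomes manager rather than engineer, because categories are checked in dictionary order and the first hit wins, and "Pilot should be able to control" gets no category at all.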
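The code works well when a keyword is a single word, such as "manager" or "assistant", but it fails to recognize two-word keywords such as "customer specialist" or "human resources".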
1 answer
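I think your cat_dict dictionary is missing a comma. I tried your example, and it successfully classified "customer specialist" as assistant and "human resources" as HR.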
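The answer does not show exactly where the comma was missing. As a minimal sketch, assuming it was dropped between two items of a keyword list (the `broken` list below is hypothetical), this shows why the typo silently breaks multi-word matching: Python joins adjacent string literals into one string, so the list ends up with a single merged keyword that never occurs in any description.

```python
# Hypothetical typo: the comma after "assisting" is missing.
broken = ["assistant", "assisting" "customer specialist"]
fixed = ["assistant", "assisting", "customer specialist"]

# Adjacent string literals are concatenated at compile time.
print(broken)                   # ['assistant', 'assistingcustomer specialist']
print(len(broken), len(fixed))  # 2 3

desc = "Customer specialist advices".lower()
print(any(x in desc for x in broken))  # False: the merged keyword never matches
print(any(x in desc for x in fixed))   # True: "customer specialist" is a substring
```

No error is raised for the missing comma, which is why the bug looks like a problem with two-word keywords rather than a syntax issue.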