Searching for and classifying keyword combinations in a Pandas DataFrame

kg7wmglp  posted on 2022-11-20 in Other

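This is a follow-up to the question "Searching for certain keywords in a Pandas DataFrame for classification".
I have a list of keywords that I want to use to classify job descriptions. The input file, sample keywords, and code are below: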

job_description
Managing engineer is responsible for
This job entails assisting to
Engineer is required the execute
Pilot should be able to control
Customer specialist advices
Different cases brought by human resources department

cat_dict = {
    "manager": ["manager", "president", "management", "managing"],
    "assistant": ["assistant", "assisting", "customer specialist"],
    "engineer": ["engineer", "engineering", "scientist", "architect"],
    "HR": ["human resources"]
}

def classify(desc):
    for cat, lst in cat_dict.items():
        if any(x in desc.lower() for x in lst):
            return cat

df['classification'] = df["job_description"].apply(classify)

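The code works when a keyword is a single word, e.g. "manager" or "assistant", but it fails to recognise keywords made of two words, e.g. "customer specialist" or "human resources".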

sr4lhrrt #1

I think your cat_dict dictionary was missing a comma. I tried your example:

import pandas as pd

cat_dict = {
    "manager": ["manager", "president", "management", "managing"],
    "assistant": ["assistant", "assisting", "customer specialist"],
    "engineer": ["engineer", "engineering", "scientist", "architect"],
    "HR": ["human resources"]
}

def classify(desc):
    for cat, lst in cat_dict.items():
        if any(x in desc.lower() for x in lst):
            return cat

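# The input from the question; the first line is the column header.
text = """job_description
Managing engineer is responsible for
This job entails assisting to
Engineer is required the execute
Pilot should be able to control
Customer specialist advices
Different cases brought by human resources department"""

# Drop the header line, then classify each description.
text_df = pd.Series(text.split('\n')[1:])
text_df.apply(classify)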

Result:

0      manager
1    assistant
2     engineer
3         None
4    assistant
5           HR
dtype: object

"Customer specialist" is correctly classified as assistant, and "human resources" as HR.
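
For reference, a missing comma inside a Python list does not raise an error: adjacent string literals are joined into a single string, so a keyword can silently disappear. Below is a minimal sketch of that effect. The broken dictionary is not shown in the question, so the position of the missing comma (just before "customer specialist") is only an assumption for illustration.

```python
# Assumption: the comma before "customer specialist" was the one missing;
# the question does not show the broken dictionary, so this is a guess.
broken = ["assistant", "assisting" "customer specialist"]   # no comma
fixed = ["assistant", "assisting", "customer specialist"]

# Adjacent string literals are concatenated into one string.
print(broken)  # ['assistant', 'assistingcustomer specialist']

desc = "Customer specialist advices".lower()
print(any(x in desc for x in broken))  # False
print(any(x in desc for x in fixed))   # True
```

If something like this happened, the substring test itself was never the problem: `x in desc` handles keywords containing spaces just as well as single words.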
