pandas: keep the string in a column that has the highest dictionary value

stszievb  posted on 2022-11-27  in Other

For each row, I want to keep only the string whose value in the dictionary is the largest. Any suggestions?

import pandas as pd

fruit_dict = {
  "Apple": 10,
  "Watermelon": 20,
  "Cherry": 30
}

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon",
            "Cherry, Watermelon",
            "Apple",
            "Cherry, Apple",
            "Cherry",
        ],
    }
)

   ID                name
0   1   Apple, Watermelon
1   2  Cherry, Watermelon
2   3               Apple
3   4       Cherry, Apple
4   5              Cherry

Expected output:

   ID        name
0   1  Watermelon
1   2      Cherry
2   3       Apple
3   4      Cherry
4   5      Cherry

Answer #1 (c3frrgcw)

One approach is to use apply with max, passing fruit_dict.get as the key:

new_df = (df.assign(name=df['name'].str.split(', ')
            .apply(lambda l: max(l, key=fruit_dict.get)))
          )

Or, if you expect some names to be missing from the dictionary:

new_df = (df.assign(name=df['name'].str.split(', ')
            .apply(lambda l: max(l, key=lambda x: fruit_dict.get(x, float('-inf')))))
          )

Output:

   ID        name
0   1  Watermelon
1   2      Cherry
2   3       Apple
3   4      Cherry
4   5      Cherry
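
To see why the default matters, here is a small sketch in plain Python. The name "Mango" is a hypothetical entry chosen for illustration; it is not in fruit_dict. Without a default, fruit_dict.get returns None for the unknown name and max fails when it compares None with a number. With float('-inf') as the default, an unknown name can never win against a known one.

```python
fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}

# "Mango" is a hypothetical name that is absent from fruit_dict.
names = ["Apple", "Mango"]

# Without a default, the key function returns None for "Mango",
# and comparing None with an int raises TypeError.
try:
    max(names, key=fruit_dict.get)
    failed = False
except TypeError:
    failed = True

# With -inf as the default, unknown names rank below every known one.
best = max(names, key=lambda x: fruit_dict.get(x, float("-inf")))
print(failed, best)  # True Apple
```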

Answer #2 (pcrecxhr)

Use:

df = (df.assign(name= df['name'].str.split(', '))
        .explode('name')
       .assign(new = lambda x: x['name'].map(fruit_dict))
        .sort_values(['ID', 'new'], ascending=[True, False])
        .drop_duplicates('ID')
       )
print (df)
   ID        name  new
0   1  Watermelon   20
1   2      Cherry   30
2   3       Apple   10
3   4      Cherry   30
4   5      Cherry   30
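
The result above still carries the helper column new, which the expected output does not have. A possible variant, sketched here with comments on each step, drops that column at the end. It assigns to a separate variable (out, a name chosen for this example) so the original df is left untouched.

```python
import pandas as pd

fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon",
            "Cherry, Watermelon",
            "Apple",
            "Cherry, Apple",
            "Cherry",
        ],
    }
)

out = (
    df.assign(name=df["name"].str.split(", "))              # strings -> lists
      .explode("name")                                      # one row per fruit
      .assign(new=lambda x: x["name"].map(fruit_dict))      # look up each value
      .sort_values(["ID", "new"], ascending=[True, False])  # highest value first per ID
      .drop_duplicates("ID")                                # keep the top row per ID
      .drop(columns="new")                                  # remove the helper column
)
print(out)
```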

Or:

df['new'] = df['name'].apply(lambda x: max(x.split(', '), key=fruit_dict.get))
print (df)
   ID                name         new
0   1   Apple, Watermelon  Watermelon
1   2  Cherry, Watermelon      Cherry
2   3               Apple       Apple
3   4       Cherry, Apple      Cherry
4   5              Cherry      Cherry

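EDIT: If none of the names in a row match the dictionary, you can either return the first value in that row, or return NaN if you prefer a missing value.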
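
The two fallbacks mentioned in the EDIT could look roughly like the sketch below. This is an illustration under assumptions, not necessarily the answerer's code: the helper names best_or_first and best_or_nan, the column names, and the sample rows containing "Mango" and "Kiwi" are all hypothetical.

```python
import numpy as np
import pandas as pd

fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}

# "Mango" and "Kiwi" are hypothetical names that are not in fruit_dict.
df = pd.DataFrame(
    {"ID": [1, 2, 3], "name": ["Apple, Watermelon", "Mango, Kiwi", "Mango, Cherry"]}
)

def best_or_first(names):
    """Highest-valued name; if nothing matches, all keys tie at -inf
    and max returns the first name."""
    return max(names.split(", "), key=lambda x: fruit_dict.get(x, float("-inf")))

def best_or_nan(names):
    """Highest-valued known name, or NaN when no name is in fruit_dict."""
    known = [n for n in names.split(", ") if n in fruit_dict]
    return max(known, key=fruit_dict.get) if known else np.nan

df["first_if_none"] = df["name"].apply(best_or_first)
df["nan_if_none"] = df["name"].apply(best_or_nan)
print(df)
```

The first helper relies on max returning the earliest element when several share the maximum key, which is how Python's built-in max behaves.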


Answer #3 (zz2j4svz)

fruit_dict = {
    "Apple": 10,
    "Watermelon": 20,
    "Cherry": 30
}

# Split on ', ' (comma plus space) so the names match the dictionary keys exactly
df.assign(name=df.name.str.split(', ')).name.map(lambda x: pd.Series(fruit_dict)[x].nlargest().index.values[0])

0    Watermelon
1        Cherry
2         Apple
3        Cherry
4        Cherry
Name: name, dtype: object
