pandas: create columns from the strings in a list

Posted by 1mrurvl1 on 2022-11-20 in Other

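I am trying to create a set of columns from a list of search terms, where each term is looked up in a string held in another column.
I found a temporary solution in this post, but it creates only one column. For example, suppose String1 contains "I have a dog and a cat".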

In [7]: df["animal"] = df["String1"].map(lambda s: next((animal for animal in search_list if animal in s), "other"))
   ...:

In [8]: df
Out[8]:
   weight                  String1 animal
0      70        Labrador is a dog    dog
1      10      Abyssinian is a cat    cat
2      65  German Shepard is a dog    dog
3       1         pigeon is a bird  other

How can I create two columns, such as ['animal_1'] and ['animal_2'], so that ['animal_1'] contains "dog" and ['animal_2'] contains "cat"?
The desired output looks like this:

   weight                  String1 animal_1 animal_2
0      70        Labrador is a dog      dog
1      10      Abyssinian is a cat      cat
2      65  German Shepard is a dog      dog
3       1         pigeon is a bird    other
4      30   I have a dog and a cat      dog      cat

f0brbegy #1

You can use:

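animals = ['dog', 'cat']
regex = '|'.join(animals)  # alternation pattern: 'dog|cat'

# extractall yields one row per match; unstack turns the match number into columns
out = (df.join(
         df['String1'].str.extractall(fr'\b({regex})\b')[0].unstack()
           .rename(columns=lambda x: f'animal_{x+1}')
        )
          .fillna({'animal_1': 'other'})  # rows with no match at all get 'other'
     )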

Output:

   weight                  String1 animal_1 animal_2
0      70        Labrador is a dog      dog      NaN
1      10      Abyssinian is a cat      cat      NaN
2      65  German Shepard is a dog      dog      NaN
3       1         pigeon is a bird    other      NaN
4      30   I have a dog and a cat      dog      cat
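For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The sample DataFrame is rebuilt from the question's output, and the `re.escape` call is an addition not present in the answer; it only matters if a search term contains regex metacharacters.

```python
import re
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "weight": [70, 10, 65, 1, 30],
    "String1": [
        "Labrador is a dog",
        "Abyssinian is a cat",
        "German Shepard is a dog",
        "pigeon is a bird",
        "I have a dog and a cat",
    ],
})

animals = ["dog", "cat"]
# Assumption: escaping the terms is a precaution added here, not part of the answer
regex = "|".join(re.escape(a) for a in animals)

# One row per match, indexed by (original row, match number)
matches = (
    df["String1"].str.extractall(rf"\b({regex})\b")[0]
    .unstack()  # match number becomes the column axis
    .rename(columns=lambda i: f"animal_{i + 1}")
)

# Rows with no match are absent from `matches`, so the join leaves NaN there
out = df.join(matches).fillna({"animal_1": "other"})
print(out)
```

Only `animal_1` is filled with "other"; later columns stay NaN when a row has fewer matches, as in the output shown above.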

ltqd579y #2

It is better to compile the regular expression once up front and then reuse the compiled pattern inside the loop.

import re
import pandas as pd

ANIMALS = {"dog", "cat"}
PATTERN = re.compile("|".join(rf"\b{x}\b" for x in ANIMALS))

data = {"String1": ["Labrador is a dog", "Abyssinian is a cat", "German Shepard is a dog", "pigeon is a bird", "I have a dog and a cat"]}
df = pd.DataFrame(data)

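# Write each match into its own column: animal_1, animal_2, ...
for ix, item in df["String1"].items():
    for i, animal in enumerate(PATTERN.findall(item)):
        df.loc[ix, f"animal_{i+1}"] = animal
# Rows without any match get "other" in the first column
df.fillna({"animal_1": "other"}, inplace=True)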
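As a possible variation on the loop above, the matches can be collected per row first and then turned into columns in one step, which avoids assigning cell by cell with `df.loc`. This is only a sketch: the helper name `add_animal_columns` and its `source` parameter are hypothetical, and it assumes at least one row contains a match.

```python
import re
import pandas as pd

ANIMALS = ["dog", "cat"]
# Compiled once, reused for every row
PATTERN = re.compile("|".join(rf"\b{re.escape(a)}\b" for a in ANIMALS))


def add_animal_columns(frame, source="String1"):
    """Hypothetical helper: return a copy of `frame` with animal_N columns."""
    # One list of matches per row, in order of appearance in the string
    found = [PATTERN.findall(text) for text in frame[source]]
    # Ragged lists are padded with missing values when building the frame
    wide = pd.DataFrame(found, index=frame.index)
    wide.columns = [f"animal_{i + 1}" for i in wide.columns]
    # Rows without any match get "other" in the first column only
    return frame.join(wide).fillna({"animal_1": "other"})


df = pd.DataFrame({"String1": [
    "Labrador is a dog",
    "Abyssinian is a cat",
    "German Shepard is a dog",
    "pigeon is a bird",
    "I have a dog and a cat",
]})
result = add_animal_columns(df)
print(result)
```

The helper returns a new DataFrame rather than modifying `df` in place, so the original data stays untouched.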
