pandas: check in Python whether any string in one column's list matches the list of strings in another column

ckx4rj1h  posted on 2022-12-28  in Python

df

CAR1                        CAR2
['ford','hyundai']         ['ford','hyundai']
['ford','hyundai']         ['hyundai','nissan']
['ford','hyundai']         ['bmw', 'audi']

Expected output:

CAR1                        CAR2                   Flag
['ford','hyundai']         ['ford','hyundai']        1
['ford','hyundai']         ['hyundai','nissan']      1
['ford','hyundai']         ['bmw', 'audi']           0

If any element/string in CAR1 matches an element in CAR2, set Flag to 1; otherwise set Flag to 0.
My attempt is:

df[[x in y for x,y in zip(df['CAR1'], df['CAR2'])]

jfgube3f 1#

EDIT: If the columns hold string representations of lists, first convert them to real lists:

import ast

cols = ['CAR1','CAR2']
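# literal_eval parses one string at a time, so map it over each column's cells
df[cols] = df[cols].apply(lambda col: col.map(ast.literal_eval))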

Use set.intersection in a list comprehension, then convert the result to a boolean and to an integer to map True/False to 1/0:

df['Flag'] = [int(bool(set(x).intersection(y))) for x,y in zip(df['CAR1'], df['CAR2'])]

Alternative solution:

df['Flag'] = [1 if set(x).intersection(y) else 0 for x,y in zip(df['CAR1'], df['CAR2'])]

print (df)
              CAR1               CAR2  Flag
0  [ford, hyundai]    [ford, hyundai]     1
1  [ford, hyundai]  [hyundai, nissan]     1
2  [ford, hyundai]        [bmw, audi]     0
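
For reference, the steps above can be run end to end as in the minimal sketch below. It rebuilds the question's data and assumes the cells start out as string representations of lists (as they might after reading a CSV file); that starting point is an assumption, not something the question states.

```python
import ast
import pandas as pd

# Assumption: each cell is a string such as "['ford','hyundai']",
# not a real Python list.
df = pd.DataFrame({
    'CAR1': ["['ford','hyundai']"] * 3,
    'CAR2': ["['ford','hyundai']", "['hyundai','nissan']", "['bmw', 'audi']"],
})

cols = ['CAR1', 'CAR2']
# Parse every cell into a real list, one column at a time.
df[cols] = df[cols].apply(lambda col: col.map(ast.literal_eval))

# A non-empty intersection is truthy: bool() gives True/False, int() gives 1/0.
df['Flag'] = [int(bool(set(x).intersection(y)))
              for x, y in zip(df['CAR1'], df['CAR2'])]

print(df['Flag'].tolist())  # [1, 1, 0]
```

If the columns already contain lists, the conversion step can be skipped.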

vc9ivgsu 2#

You can use a set operation in a list comprehension (isdisjoint returns False if the sets overlap; 1-x inverts that result and converts it to an integer):

df['Flag'] = [1-set(s1).isdisjoint(s2) for s1, s2 in zip(df['CAR1'], df['CAR2'])]
Note: isdisjoint is fast because it does not need to read both collections in full; it returns False as soon as it finds a common item.

Output:

              CAR1               CAR2  Flag
0  [ford, hyundai]    [ford, hyundai]     1
1  [ford, hyundai]  [hyundai, nissan]     1
2  [ford, hyundai]        [bmw, audi]     0

If the lists are stored as strings:

from ast import literal_eval

df['Flag'] = [1-set(s1).isdisjoint(s2) for s1, s2 in
               zip(df['CAR1'].apply(literal_eval),
                   df['CAR2'].apply(literal_eval))]
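
The note above about isdisjoint stopping early can be checked with a small sketch. The `tracked` generator and the `consumed` list are hypothetical helpers introduced only for this illustration; the early exit shown is what CPython does when the argument is a plain iterable, and other implementations may differ.

```python
# Record every item that isdisjoint actually pulls from the iterable.
consumed = []

def tracked(items):
    for item in items:
        consumed.append(item)
        yield item

car1 = {'ford', 'hyundai'}
# 'ford' is the first item and is already in car1, so the scan can stop there.
result = car1.isdisjoint(tracked(['ford', 'bmw', 'audi']))

print(result)    # False
print(consumed)  # ['ford'] on CPython
```

Only the first item is consumed before the method returns, which is why 1-set(s1).isdisjoint(s2) avoids building the full intersection.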
