Python: applying a function to find elements that are not in a list

ct2axkht  posted on 2022-12-17  in Python

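I want to apply a function that returns the elements of a reference list that are not found in each row. The result I am after is shown below.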

import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

data = [['ACTIVE BODY', ['Chive & Garlic', 'The Big Smoke'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['AG VALLEY FOODS', ['Chive & Garlic', 'Spinach & Artichoke'], ['The Big Smoke', 'Jalapeno & Lemon']],
        ['ALIM MICHEL HALLORAN', ['The Big Smoke', 'Chive & Garlic'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['ALIMENTATION IAN DES', ['The Big Smoke', 'Jalapeno & Lemon'],['Chive & Garlic', 'Spinach & Artichoke']]]

df = pd.DataFrame(data, columns=['store', 'products', 'missing_products'])

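Here missing_products is a list of the products from product_list that are not found in the array in the products column.
I tried the following function, but it does not work as expected: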

def gap(row):
    for item in product_list:
        if item not in row:
            return item

Note that each value in the products column is an array, not a list of strings. I am not sure whether that makes a difference.

[['ACADEMIE DU GOURMET ACADEMY INC', array([nan], dtype=object)],
 ['ACTIVE BODY',
  array(['Chive & Garlic', 'Garlic Tzatziki', 'The Big Smoke'], dtype=object)],
 ['AG VALLEY FOODS',
  array(['Chive & Garlic', 'Spinach & Artichoke'], dtype=object)],
 ['ALIM MICHEL HALLORAN',
  array(['The Meadow', 'The Big Smoke', 'Chive & Garlic',
         'Jalapeno & Lemon', 'Dill & Truffle'], dtype=object)],
 ['ALIMENTATION IAN DES',
  array(['The Big Smoke', 'Jalapeno & Lemon'], dtype=object)]]

Thanks in advance for your help!

t5zmwmid #1

Create a helper list and append every item that is not found in the row:

def gap(row):
    out = []
    for item in product_list:
        if item not in row:
            out.append(item)
    return out
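
A minimal sketch of why the question's version falls short, assuming the problem is the early `return`: it leaves the loop at the first missing item, whereas the helper list collects all of them. The name `gap_first_only` is hypothetical and used only for contrast.

```python
product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

def gap_first_only(row):
    # Version from the question: `return` exits at the first missing item.
    for item in product_list:
        if item not in row:
            return item

def gap(row):
    # Helper-list version: keeps looping and collects every missing item.
    out = []
    for item in product_list:
        if item not in row:
            out.append(item)
    return out

row = ['Chive & Garlic', 'The Big Smoke']
print(gap_first_only(row))  # Jalapeno & Lemon
print(gap(row))             # ['Jalapeno & Lemon', 'Spinach & Artichoke']
```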

An alternative with a list comprehension:

def gap(row):
    return [item for item in product_list if item not in row]

df['missing_products1'] = df['products'].apply(gap)

A solution that uses only list comprehensions:

df['missing_products1'] = [[item for item in product_list if item not in row] for row in df['products']]
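
On the question's concern that the products values are arrays rather than lists, here is a hedged sketch using a few rows in the array-valued shape shown in the question, including the `[nan]` row. Converting each array to a plain list before the membership test is an assumption added here to sidestep NumPy's elementwise comparison; the version without the conversion may well behave the same.

```python
import numpy as np
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

# Three rows copied from the array-valued sample in the question.
df = pd.DataFrame({
    'store': ['ACADEMIE DU GOURMET ACADEMY INC', 'ACTIVE BODY', 'AG VALLEY FOODS'],
    'products': [
        np.array([np.nan], dtype=object),
        np.array(['Chive & Garlic', 'Garlic Tzatziki', 'The Big Smoke'], dtype=object),
        np.array(['Chive & Garlic', 'Spinach & Artichoke'], dtype=object),
    ],
})

def gap(row):
    # Assumption: a plain list makes `in` compare Python objects one by one.
    present = list(row)
    return [item for item in product_list if item not in present]

df['missing_products1'] = df['products'].apply(gap)
print(df['missing_products1'].tolist())
```

A store whose array holds only `nan` ends up with every product listed as missing, and extra products that are not in product_list (such as 'Garlic Tzatziki') are simply ignored.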

lx0bsm1f #2

I suggest using set operations, which should be the most efficient approach:

S = set(product_list)

df['missing_products'] = [list(S.difference(x)) for x in df['products']]

Output:

store                               products  \
0           ACTIVE BODY        [Chive & Garlic, The Big Smoke]   
1       AG VALLEY FOODS  [Chive & Garlic, Spinach & Artichoke]   
2  ALIM MICHEL HALLORAN        [The Big Smoke, Chive & Garlic]   
3  ALIMENTATION IAN DES      [The Big Smoke, Jalapeno & Lemon]   

                          missing_products  
0  [Spinach & Artichoke, Jalapeno & Lemon]  
1        [Jalapeno & Lemon, The Big Smoke]  
2  [Spinach & Artichoke, Jalapeno & Lemon]  
3    [Spinach & Artichoke, Chive & Garlic]
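
Sets are unordered, which is why the missing products in the output above do not follow the order of product_list. If that order matters, one possible variation is to sort the difference by each item's position in product_list. The function name `missing_in_order` is hypothetical.

```python
product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']
S = set(product_list)

def missing_in_order(products):
    # The set difference finds the missing items; sorting by position in
    # product_list restores the reference-list order.
    return sorted(S.difference(products), key=product_list.index)

print(missing_in_order(['Chive & Garlic', 'The Big Smoke']))
# ['Jalapeno & Lemon', 'Spinach & Artichoke']
```

It can be used in the same way as the comprehension above, for example `[missing_in_order(x) for x in df['products']]`.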

41zrol4v #3

You could instead build a binary DataFrame that holds 1 where a store carries a product and 0 where it does not.
That is more general than keeping lists inside DataFrame cells.
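
The answer gives no code, so the following is only one possible sketch of the idea, using the sample data from the question. The names `binary`, `missing`, and `stores_lacking` are hypothetical.

```python
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

data = [['ACTIVE BODY', ['Chive & Garlic', 'The Big Smoke']],
        ['AG VALLEY FOODS', ['Chive & Garlic', 'Spinach & Artichoke']],
        ['ALIM MICHEL HALLORAN', ['The Big Smoke', 'Chive & Garlic']],
        ['ALIMENTATION IAN DES', ['The Big Smoke', 'Jalapeno & Lemon']]]
df = pd.DataFrame(data, columns=['store', 'products'])

# One row per store, one column per reference product: 1 = carried, 0 = not.
binary = pd.DataFrame(
    [[int(p in set(items)) for p in product_list] for items in df['products']],
    index=df['store'],
    columns=product_list,
)

# Missing products per store, read back from the zeros in each row.
missing = {store: [p for p in product_list if row[p] == 0]
           for store, row in binary.iterrows()}

# The same table also answers column-wise questions, e.g. how many
# stores lack each product.
stores_lacking = (binary == 0).sum()

print(binary)
print(missing['ACTIVE BODY'])
```

The binary layout makes it straightforward to aggregate across stores or products without unpacking lists stored in cells.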
