pandas: filter data based on conditions from another list

Posted by cgfeq70w on 2022-12-21 in Other

I have a list of names:

import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]

name = pd.DataFrame(name)

I also have a pandas.DataFrame() called df:

df = pd.DataFrame(
    {'Name':['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber', 'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois'],
    'ID':[23,22,21,24]})

Now I want to filter df so that, after filtering, it only keeps the rows that contain a name appearing in name.
I tried this, but it didn't work:

df = df[~df.index.isin(name.index)]
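The attempt compares index labels, not names: df.index is 0..3 and name.index is 0..4, so every row label of df is "in" name.index, and the ~ then inverts that to all False. A minimal check, assuming the question's data, shows that the mask never looks at the Name column at all:

```python
import pandas as pd

name = pd.DataFrame(["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"])
df = pd.DataFrame(
    {'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber',
              'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois'],
     'ID': [23, 22, 21, 24]})

# compares row labels (0..3) against row labels (0..4); the names are never used
mask = df.index.isin(name.index)
print(mask)

# negating an all-True mask drops every row, whatever the names are
result = df[~mask]
print(len(result))
```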

6pp0gazn1#

You can use apply like this:

import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
df = {'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber', 'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois', 'hh,bb'], 'ID': [23, 22, 21, 24, 28]}
# dict to DataFrame
df = pd.DataFrame(df)
print(df)

# the names to look for; a set makes each membership test fast
all_names = set(name)

def filter_names(row):
    # split the comma-separated cell and strip the space that may follow a comma
    names = [n.strip() for n in row.split(',')]
    # keep the row if at least one of its names is in the list
    return any(n in all_names for n in names)

df_filtered = df[df['Name'].apply(filter_names)]
print(df_filtered)

Result:

                                  Name  ID
0               Karan Singh,John Lewis  23
1  Michael Armstrong, Fabian Schreiber  22
2                        Roy Dalhuisen  21
3         Arya Yildirim,Gregory Dubois  24
4                                hh,bb  28

                                  Name  ID
0               Karan Singh,John Lewis  23
1  Michael Armstrong, Fabian Schreiber  22
3         Arya Yildirim,Gregory Dubois  24
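The same row test can also be written as a plain list comprehension that builds the boolean mask directly; this is a matter of taste, and on large frames it is often a little faster than apply. A sketch, assuming the same name list and df as in this answer:

```python
import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
df = pd.DataFrame({'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber',
                            'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois', 'hh,bb'],
                   'ID': [23, 22, 21, 24, 28]})

wanted = set(name)
# one boolean per row: True if any comma-separated name (spaces stripped) is wanted
mask = [any(part.strip() in wanted for part in cell.split(',')) for cell in df['Name']]
df_filtered = df[mask]
print(df_filtered)
```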

wlp8pajw2#

You can use it like this:


um6iljoc3#

Example

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
data = {'Name':['Karan Singh,John Lewis','Michael Armstrong, Fabian Schreiber','Roy Dalhuisen','Arya Yildirim,Gregory Dubois'],'ID':[23,22,21,24]}
df = pd.DataFrame(data)

df

Name                                ID
0   Karan Singh,John Lewis              23
1   Michael Armstrong, Fabian Schreiber 22
2   Roy Dalhuisen                       21
3   Arya Yildirim,Gregory Dubois        24

I'm not sure exactly which output you want.

Code 1

out = (df.assign(Name=df['Name'].str.split(','))
       .explode('Name')
       .assign(Name=lambda x: x['Name'].str.strip())  # drop spaces after commas
       [lambda x: x['Name'].isin(name)])

out

Name                ID
0   John Lewis          23
1   Michael Armstrong   22
3   Gregory Dubois      24
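Code 1 returns one row per matching name. If the original, unsplit rows are wanted instead, the exploded matches can be collapsed back onto the original index with a groupby on level 0. A sketch, assuming the data from the example above:

```python
import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
df = pd.DataFrame({'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber',
                            'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois'],
                   'ID': [23, 22, 21, 24]})

# one row per individual name; explode repeats the original index label
exploded = df.assign(Name=df['Name'].str.split(',')).explode('Name')
hit = exploded['Name'].str.strip().isin(name)

# collapse back to one flag per original row: True if any of its names matched
mask = hit.groupby(level=0).any()
out = df[mask]
print(out)
```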

Code 2

out = df[df['Name'].str.contains('|'.join(name))]

out

Name                                ID
0   Karan Singh,John Lewis              23
1   Michael Armstrong, Fabian Schreiber 22
3   Arya Yildirim,Gregory Dubois        24
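Code 2 treats the joined names as a regular expression, so it also matches substrings (a hypothetical "John Lewisham" would match "John Lewis"), and any regex metacharacters in a name would be interpreted. A stricter sketch escapes each name and requires it to fill a whole comma-separated field; the extra "John Lewisham" row is made up purely to show the difference:

```python
import re
import pandas as pd

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
df = pd.DataFrame({'Name': ['Karan Singh,John Lewis', 'Michael Armstrong, Fabian Schreiber',
                            'Roy Dalhuisen', 'Arya Yildirim,Gregory Dubois',
                            'John Lewisham'],  # hypothetical row, not from the question
                   'ID': [23, 22, 21, 24, 25]})

# escape each name so characters like '.' or '(' are matched literally
alternatives = '|'.join(map(re.escape, name))
# a name must occupy a whole comma-separated field, optionally padded by spaces
pattern = rf'(?:^|,)\s*(?:{alternatives})\s*(?:,|$)'

out = df[df['Name'].str.contains(pattern, regex=True)]
print(out)
```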

Update

name = ["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"]
name = pd.DataFrame(name)

df = pd.DataFrame(
    {'Name':[['Karan Singh','John Lewis'], ['Michael Armstrong', 'Fabian Schreiber'], ['Roy Dalhuisen'], ['Arya Yildirim', 'Gregory Dubois']],
    'ID':[23,22,21,24]})

name

0
0   John Lewis
1   Michael Armstrong
2   Kurt Abela
3   Brian Watson
4   Gregory Dubois

df

Name                                    ID
0   [Karan Singh, John Lewis]               23
1   [Michael Armstrong, Fabian Schreiber]   22
2   [Roy Dalhuisen]                         21
3   [Arya Yildirim, Gregory Dubois]         24

Code

out = df[df['Name'].apply(lambda x: name[0].isin(x).any())]

out

Name                                    ID
0   [Karan Singh, John Lewis]               23
1   [Michael Armstrong, Fabian Schreiber]   22
3   [Arya Yildirim, Gregory Dubois]         24
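When Name already holds lists, the per-row test is just "do these two collections share an element", which a Python set answers with isdisjoint without building a Series for every row. A sketch, assuming the list-valued df and the one-column name DataFrame from this update:

```python
import pandas as pd

name = pd.DataFrame(["John Lewis", "Michael Armstrong", "Kurt Abela", "Brian Watson", "Gregory Dubois"])
df = pd.DataFrame(
    {'Name': [['Karan Singh', 'John Lewis'], ['Michael Armstrong', 'Fabian Schreiber'],
              ['Roy Dalhuisen'], ['Arya Yildirim', 'Gregory Dubois']],
     'ID': [23, 22, 21, 24]})

wanted = set(name[0])
# keep a row if its list shares at least one element with the wanted names
out = df[df['Name'].map(lambda names: not wanted.isdisjoint(names))]
print(out)
```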
