Pandas join on a column containing lists: match on any value

Posted by 6vl6ewon on 2023-02-14 in Other

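I have two DataFrames.
I want to join them on a column, where that column in one of the DataFrames holds lists.
Two rows should be joined if any value in the list matches.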

df1 = 

| index | col_1 |
| ----- | ----- |
| 1     | 'a'   |
| 2     | 'b'   |

df2 = 

| index_2 | col_1            |
| ------- | -----            |
| A       | ['a', 'c']       |
| B       | ['a', 'd', 'e']  |

I am looking for something like this (hypothetical syntax):
df1.join(df2, on='col_1', type_=any, type='left')

| index | col_1_x | index_2 | col_1_y         |
| ----- | ------- | ------- | --------------- |
| 1     | 'a'     | A       | ['a', 'c']      |
| 1     | 'a'     | B       | ['a', 'd', 'e'] |

Answer 1 (km0tfn4u)

You can do the following:

import pandas as pd

df1 = pd.DataFrame({'index': [1, 2], 'col_1': ['a', 'b']})
df2 = pd.DataFrame({'index_2': ['A', 'B'], 'col_1': [['a', 'c'], ['a', 'd', 'e']]})

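# Check whether a scalar value appears in a list-like cell
def any_match(value, candidates):
    # A missing value or a non-list cell never matches
    if value is None or not isinstance(candidates, (list, tuple, set)):
        return False
    # Test membership directly; iterating over `value` would split a string into characters
    return value in candidates

# Build every (df1 row, df2 row) pair, then keep only the pairs that match
# (how='cross' requires pandas 1.2 or later)
result = pd.merge(df1, df2, how='cross')
result = result[result.apply(lambda x: any_match(x['col_1_x'], x['col_1_y']), axis=1)]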

print(result[['index', 'col_1_x', 'index_2', 'col_1_y']])

This prints:

   index col_1_x index_2    col_1_y
0      1       a       A     [a, c]
1      1       a       B  [a, d, e]
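
A row-wise `apply` can be slow on larger frames. As a minimal sketch under the same assumptions as the answer above (`col_1` is a scalar in `df1` and a list in `df2`), the filter over the cross join can also be written as a plain list comprehension over the two columns. The names `pairs` and `mask` are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'index': [1, 2], 'col_1': ['a', 'b']})
df2 = pd.DataFrame({'index_2': ['A', 'B'], 'col_1': [['a', 'c'], ['a', 'd', 'e']]})

# Every (df1 row, df2 row) pair; the shared column name gets _x / _y suffixes
pairs = df1.merge(df2, how='cross')

# True where the scalar from df1 is an element of the list from df2
mask = [value in candidates
        for value, candidates in zip(pairs['col_1_x'], pairs['col_1_y'])]

result = pairs.loc[mask]
print(result[['index', 'col_1_x', 'index_2', 'col_1_y']])
```

The cross join still materializes `len(df1) * len(df2)` rows before filtering, so this only removes the per-row `apply` overhead, not the memory cost.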

Answer 2 (wkyowqbh)

You can use explode followed by merge, like this:

import pandas as pd

# Create the input dataframes
df1 = pd.DataFrame({'index': [1, 2], 'col_1': ['a', 'b']})
df2 = pd.DataFrame({'index_2': ['A', 'B'], 'col_1': [['a', 'c'], ['a', 'd', 'e']]})

# Explode the list column in df2 to multiple rows
df2_exploded = df2.explode('col_1')

# Perform a regular join on the common column
result = df1.merge(df2_exploded, on='col_1', how='left')

# Re-attach the original (un-exploded) list column, then drop rows of df1 that matched nothing
result = result.merge(df2, on='index_2', how='left').dropna()

df2_exploded looks like this:

  index_2 col_1
0       A     a
0       A     c
1       B     a
1       B     d
1       B     e

The final result looks like this:

   index col_1_x index_2    col_1_y
0      1       a       A     [a, c]
1      1       a       B  [a, d, e]
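
Because of the final `dropna()`, the result above behaves like an inner join: the `'b'` row of `df1` disappears. If true left-join semantics are wanted, as the `type='left'` in the question suggests, one possible sketch is to wrap the explode-and-merge steps in a helper and skip the `dropna()`. The function name `join_any` and the output column `col_1_list` are hypothetical, chosen only for this illustration.

```python
import pandas as pd

def join_any(left, right, on, right_key):
    """Left-join `left` to `right` where left[on] is an element of the list in right[on]."""
    # One row per (right_key, list element); duplicates are dropped so a value
    # repeated inside one list does not produce repeated matches
    lookup = right[[right_key, on]].explode(on).drop_duplicates()
    # Ordinary left merge on the exploded values; unmatched left rows get NaN
    matched = left.merge(lookup, on=on, how='left')
    # Re-attach the original list column under a separate name
    lists = right[[right_key, on]].rename(columns={on: f'{on}_list'})
    return matched.merge(lists, on=right_key, how='left')

df1 = pd.DataFrame({'index': [1, 2], 'col_1': ['a', 'b']})
df2 = pd.DataFrame({'index_2': ['A', 'B'], 'col_1': [['a', 'c'], ['a', 'd', 'e']]})

out = join_any(df1, df2, on='col_1', right_key='index_2')
print(out)
```

Here the `'b'` row is kept with missing values in `index_2` and `col_1_list`. The sketch assumes `right_key` uniquely identifies the rows of `right`.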
