pandas: create a boolean mask by matching complete rows of two DataFrames

Posted by 9o685dep on 2023-01-04 in Other

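I have two DataFrames, each with two columns: US states and towns. I want to create a new column in the first DataFrame holding boolean values that indicate whether each town, paired with its state, is also present in the second DataFrame.
Example (using countries and capitals in place of states and towns):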

import pandas as pd

df = pd.DataFrame({'countries':['france', 'germany', 'spain', 'uk', 'norway', 'italy'], 
                   'capitals':['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})

df2 = pd.DataFrame({'countries':['france', 'spain', 'uk', 'italy'], 
                    'capitals':['paris', 'madrid', 'london', 'rome']})

df

  countries capitals
0    france    paris
1   germany   berlin
2     spain   madrid
3        uk   london
4    norway     oslo
5     italy     rome

df2

  countries capitals
0    france    paris
1     spain   madrid
2        uk   london
3     italy     rome

What I want to get is:

df> countries  capitals  bool
    france     paris     True
    germany    berlin    False
    spain      madrid    True
    uk         london    True
    norway     oslo      False
    italy      rome      True

Thanks, everyone!

50pmv0ei #1

Perform a full outer join with an indicator.

u = df.merge(df2, how='outer', indicator='bool')
u['bool'] = u['bool'] == 'both'
u

  countries capitals   bool
0    france    paris   True
1   germany   berlin  False
2     spain   madrid   True
3        uk   london   True
4    norway     oslo  False
5     italy     rome   True

The intermediate step gives:

df.merge(df2, how='outer', indicator='bool')

  countries capitals       bool
0    france    paris       both
1   germany   berlin  left_only
2     spain   madrid       both
3        uk   london       both
4    norway     oslo  left_only
5     italy     rome       both

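indicator records where each row was found: left_only, right_only, or both. We then flag every row whose bool value is "both", which gives the output you want. Note that an outer join also keeps rows that exist only in df2 (marked right_only) and, depending on the pandas version, may sort the result by key; with how='left' only the rows of df are kept, in their original order.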
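As a variation on the merge approach above, here is a minimal sketch using a left join instead of an outer join. It assumes the two frames share the column names countries and capitals; the helper column name _source and the variables right, merged and result are made up for illustration. Deduplicating the right-hand side first keeps the join from multiplying rows of df.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
                   'capitals': ['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})
df2 = pd.DataFrame({'countries': ['france', 'spain', 'uk', 'italy'],
                    'capitals': ['paris', 'madrid', 'london', 'rome']})

keys = ['countries', 'capitals']

# Deduplicate the right side so the left join cannot multiply rows of df.
right = df2[keys].drop_duplicates()

# how='left' keeps exactly the rows of df, in df's order; the indicator
# column (hypothetical name '_source') is then 'both' or 'left_only'.
merged = df.merge(right, how='left', on=keys, indicator='_source')

# Assign positionally so the original index of df does not matter.
result = df.copy()
result['bool'] = (merged['_source'] == 'both').to_numpy()
print(result)
```

Because the join is on both columns, a row is flagged only when the whole (country, capital) pair appears in df2.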

58wvjzkj #2

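The isin method can do this when the country alone identifies a row. Note that it compares only the countries column, so a country that appears in df2 with a different capital would still be marked True:

>>> df['bool'] = df['countries'].isin(df2['countries'].values)
>>> df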
  countries capitals   bool
0    france    paris   True
1   germany   berlin  False
2     spain   madrid   True
3        uk   london   True
4    norway     oslo  False
5     italy     rome   True
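To compare whole rows rather than a single column, one option is to turn the key columns into a MultiIndex and call isin on that. The sketch below is only an illustration: df2 is deliberately altered (spain is paired with a hypothetical capital 'barcelona') to show where the single-column check and the pair check disagree, and the names cols, pair_mask and country_mask are invented here.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
                   'capitals': ['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})

# Hypothetical df2 for illustration: 'spain' has a non-matching capital.
df2 = pd.DataFrame({'countries': ['france', 'spain', 'uk', 'italy'],
                    'capitals': ['paris', 'barcelona', 'london', 'rome']})

cols = ['countries', 'capitals']

# Each row of df becomes one (country, capital) entry of a MultiIndex.
left_keys = pd.MultiIndex.from_frame(df[cols])

# Rows of df2 as plain tuples, which MultiIndex.isin accepts.
right_pairs = list(df2[cols].itertuples(index=False, name=None))

pair_mask = left_keys.isin(right_pairs)                 # matches whole rows
country_mask = df['countries'].isin(df2['countries'])   # matches country only

df['bool'] = pair_mask
print(df)
```

With this altered df2, the country-only mask still marks spain as True, while the pair mask marks it False.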

0mkxixxg #3

import pandas as pd

df = pd.DataFrame({'countries':['france', 'germany', 'spain', 'uk', 'norway', 'italy'], 
                   'capitals':['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})

df2 = pd.DataFrame({'countries':['france', 'spain', 'uk', 'italy'], 
                   'capitals':['paris', 'madrid', 'london', 'rome']})

df['bool'] = False

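# Loop over the rows with iterrows (simple, but slow on large frames).
# Note: only the country is compared here, not the (country, capital) pair.
for idx, row in df.iterrows():
    if row.countries in df2.countries.values:
        df.loc[idx, 'bool'] = True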

print(df)
  countries capitals   bool
0    france    paris   True
1   germany   berlin  False
2     spain   madrid   True
3        uk   london   True
4    norway     oslo  False
5     italy     rome   True
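If an explicit, loop-style check is preferred, a set of (country, capital) tuples avoids both iterrows and the repeated scan of df2, and it compares the full pair. This is a sketch under the assumption that both frames use the column names from the question; cols and known_pairs are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
                   'capitals': ['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})
df2 = pd.DataFrame({'countries': ['france', 'spain', 'uk', 'italy'],
                    'capitals': ['paris', 'madrid', 'london', 'rome']})

cols = ['countries', 'capitals']

# Build the lookup set once; membership tests on a set are O(1) on average.
known_pairs = set(df2[cols].itertuples(index=False, name=None))

# itertuples(name=None) yields plain tuples in row order, so the resulting
# list lines up positionally with the rows of df.
df['bool'] = [pair in known_pairs
              for pair in df[cols].itertuples(index=False, name=None)]
print(df)
```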
