How do I join pandas DataFrames on multiple columns?

Posted by a2mppw5e on 2024-01-04 in Other

I have two DataFrames like the ones below:

level                        title
Level 0                      Effective
Level 1                      Evaluation
Level 1                      Ice Breaker
Level 1                      Fire
Level 2                      Introduction
Level 2                      Understanding
Level 3                      Connect


level                        title
Level 0                      Effective
Level 1                      Evaluation
Level 1                      Comedy
Level 2                      Introduction
Level 2                      Understanding
Level 4                      Connect


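I want to join them on both columns. Matching entries should sit next to each other on the same row, and entries missing from one side should be Null on that side. The desired output looks like this: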

level               title               level               title
Level 0             Effective           Level 0             Effective
Level 1             Evaluation          Level 1             Evaluation
Level 1             Ice Breaker 
                                        Level 1             Comedy
Level 1             Fire    
Level 2             Introduction        Level 2             Introduction
Level 2             Understanding       Level 2             Understanding
Level 3             Connect             
                                        Level 4             Connect

xzv2uavs #1

You can use merge, but first make the left and right column names distinct (for example, with add_suffix):

out = df1.merge(df2.add_suffix('_'), how='outer',
                left_on=['level', 'title'],
                right_on=['level_', 'title_'])

Output:

     level          title   level_         title_
0  Level 0      Effective  Level 0      Effective
1  Level 1     Evaluation  Level 1     Evaluation
2  Level 1    Ice Breaker      NaN            NaN
3  Level 1           Fire      NaN            NaN
4  Level 2   Introduction  Level 2   Introduction
5  Level 2  Understanding  Level 2  Understanding
6  Level 3        Connect      NaN            NaN
7      NaN            NaN  Level 1         Comedy
8      NaN            NaN  Level 4        Connect
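
To try this end to end, here is a minimal self-contained sketch that rebuilds the two frames from the question and runs the same merge. The `indicator=True` argument is an addition not used in the answer above: it appends a `_merge` column that marks each row as `both`, `left_only`, or `right_only`, which can make the unmatched rows easier to spot. Row order of an outer merge may differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "level": ["Level 0", "Level 1", "Level 1", "Level 1",
              "Level 2", "Level 2", "Level 3"],
    "title": ["Effective", "Evaluation", "Ice Breaker", "Fire",
              "Introduction", "Understanding", "Connect"],
})
df2 = pd.DataFrame({
    "level": ["Level 0", "Level 1", "Level 1",
              "Level 2", "Level 2", "Level 4"],
    "title": ["Effective", "Evaluation", "Comedy",
              "Introduction", "Understanding", "Connect"],
})

# Same merge as above; indicator=True (an addition) records each row's origin
out = df1.merge(df2.add_suffix("_"), how="outer",
                left_on=["level", "title"],
                right_on=["level_", "title_"],
                indicator=True)

# Count how many rows matched on both sides versus only one side
print(out["_merge"].value_counts())
```

Rows present in only one frame keep NaN in the other frame's columns, so filtering on `_merge` is equivalent to checking those columns for NaN.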


If you want to sort on the merged keys, an alternative is:

out = (df1.merge(df2, how='outer',
                 left_on=[df1['level'], df1['title']],
                 right_on=['level', 'title'])
          .sort_values(by=['level'])
          #.drop(columns=['level', 'title']) # uncomment to drop merged keys
      )


Output:

     level          title  level_x        title_x  level_y        title_y
0  Level 0      Effective  Level 0      Effective  Level 0      Effective
1  Level 1     Evaluation  Level 1     Evaluation  Level 1     Evaluation
2  Level 1    Ice Breaker  Level 1    Ice Breaker      NaN            NaN
3  Level 1           Fire  Level 1           Fire      NaN            NaN
7  Level 1         Comedy      NaN            NaN  Level 1         Comedy
4  Level 2   Introduction  Level 2   Introduction  Level 2   Introduction
5  Level 2  Understanding  Level 2  Understanding  Level 2  Understanding
6  Level 3        Connect  Level 3        Connect      NaN            NaN
8  Level 4        Connect      NaN            NaN  Level 4        Connect
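
If you would rather keep only the four suffixed columns and still get the sorted order, one possible variation (a sketch, not part of the answer above) is to coalesce the two `level` columns into a temporary sort key with `fillna`, sort on it, and then drop it. The `_key` column name is a hypothetical helper chosen for this example.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "level": ["Level 0", "Level 1", "Level 1", "Level 1",
              "Level 2", "Level 2", "Level 3"],
    "title": ["Effective", "Evaluation", "Ice Breaker", "Fire",
              "Introduction", "Understanding", "Connect"],
})
df2 = pd.DataFrame({
    "level": ["Level 0", "Level 1", "Level 1",
              "Level 2", "Level 2", "Level 4"],
    "title": ["Effective", "Evaluation", "Comedy",
              "Introduction", "Understanding", "Connect"],
})

merged = df1.merge(df2.add_suffix("_"), how="outer",
                   left_on=["level", "title"],
                   right_on=["level_", "title_"])

# Take the left level where present, otherwise the right one
sort_key = merged["level"].fillna(merged["level_"])

out = (merged.assign(_key=sort_key)              # hypothetical helper column
             .sort_values("_key", kind="stable")  # stable: ties keep merge order
             .drop(columns="_key")
             .reset_index(drop=True))
print(out)
```

The sort is lexical on the strings, so a value such as "Level 10" would sort before "Level 2"; with more than ten levels you would need to extract the number and sort on that instead. This also does not reproduce the exact interleaving of unmatched rows shown in the question, only the grouping by level.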
