pandas 将不匹配的 Dataframe 划分为两个 Dataframe

dly7yett  于 2023-05-12  发布在  其他
关注(0)|答案(1)|浏览(105)

下面是我的代码,其中我看到生成了一个不匹配的 Dataframe ,并且不匹配的记录一个在另一个下面显示。我希望一个表的不匹配和另一个表的不匹配被隔离并显示在两个 Dataframe 中。表1的一个 Dataframe 不匹配记录,另一个表的一个 Dataframe 不匹配记录。

import pandas as pd

def merge_dataframes(left_dataframe, left_dataframe_suffix, right_dataframe, right_dataframe_suffix, how, key_columns, indicator):
    """
    Joins both the dataframes on the key columns.

    :param left_dataframe: Dataframe on the left side of the join
    :param left_dataframe_suffix: Suffix for the left_dataframe columns after join
    :param right_dataframe: Dataframe on the right side of the join
    :param right_dataframe_suffix: Suffix for the right_dataframe columns after join
    :param how: Type of join Valid values: {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default value: ‘inner’
    :param key_columns: Columns based on which join should happen
    :param indicator:  Boolean. If True, adds a '_merge' column specifying whether a record belonged to 'both', or the 'left_only' dataframe or 'right_only dataframe'
    :return: Joint dataframes
    """
    comparison_df = left_dataframe.merge(right_dataframe,
                                       indicator=indicator,
                                       how=how,
                                       on=key_columns,
                                       suffixes=(left_dataframe_suffix, right_dataframe_suffix))
    return comparison_df

raw = pd.DataFrame({'EntityID': ['Appple', 'Banana', 'Mango'],
                    'Date': [10, 20, 30], 'Val': [10, 30, 15]})

# create data_frame2 by creating a dictionary
# in which values are stored as list
bob = pd.DataFrame({'EntityID': ['Appple', 'Banana', 'Mango'],
                    'Date': [10, 20, 30], 'Val': [100, 200, 15]})

match = raw.merge(bob, how='inner', indicator=False)

#print(match.to_string(index=False))
mismatch=pd.concat([raw, bob]).drop_duplicates(keep=False).to_string(index=False)
print(mismatch)

所示的输出如下。

EntityID  Date  Val
  Appple    10   10
  Banana    20   30
  Appple    10  100
  Banana    20  200

我的期望输出是

Mismatch raw          Mismatch bob
EntityID Date Val  EntityID Date Val
Apple    10   10   Apple    10   100
Banana   20   30   Banana   20   200

Pandas有办法做到这一点吗?

nbysray5

nbysray51#

其中一个选项是使用concat,然后使用drop_duplicateswithout merging):

mismatch = (pd.concat([raw, bob], keys=["Mismatch raw", "Mismatch bob"])
                .drop_duplicates(keep=False).unstack(0)
                .swaplevel(axis=1).sort_index(axis=1, ascending=False)
                .to_string(index=False)
           )

输出:

print(mismatch)
​
Mismatch raw               Mismatch bob              
         Val EntityID Date          Val EntityID Date
          10   Appple   10          100   Appple   10
          30   Banana   20          200   Banana   20

相关问题