下面是我的代码,其中我看到生成了一个不匹配的 Dataframe ,并且不匹配的记录一个在另一个下面显示。我希望一个表的不匹配和另一个表的不匹配被隔离并显示在两个 Dataframe 中。表1的一个 Dataframe 不匹配记录,另一个表的一个 Dataframe 不匹配记录。
import pandas as pd
def merge_dataframes(left_dataframe, left_dataframe_suffix, right_dataframe, right_dataframe_suffix, how, key_columns, indicator):
"""
Joins both the dataframes on the key columns.
:param left_dataframe: Dataframe on the left side of the join
:param left_dataframe_suffix: Suffix for the left_dataframe columns after join
:param right_dataframe: Dataframe on the right side of the join
:param right_dataframe_suffix: Suffix for the right_dataframe columns after join
:param how: Type of join Valid values: {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default value: ‘inner’
:param key_columns: Columns based on which join should happen
:param indicator: Boolean. If True, adds a '_merge' column specifying whether a record belonged to 'both', or the 'left_only' dataframe or 'right_only dataframe'
:return: Joint dataframes
"""
comparison_df = left_dataframe.merge(right_dataframe,
indicator=indicator,
how=how,
on=key_columns,
suffixes=(left_dataframe_suffix, right_dataframe_suffix))
return comparison_df
raw = pd.DataFrame({'EntityID': ['Appple', 'Banana', 'Mango'],
'Date': [10, 20, 30], 'Val': [10, 30, 15]})
# create data_frame2 by creating a dictionary
# in which values are stored as list
bob = pd.DataFrame({'EntityID': ['Appple', 'Banana', 'Mango'],
'Date': [10, 20, 30], 'Val': [100, 200, 15]})
match = raw.merge(bob, how='inner', indicator=False)
#print(match.to_string(index=False))
mismatch=pd.concat([raw, bob]).drop_duplicates(keep=False).to_string(index=False)
print(mismatch)
所示的输出如下。
EntityID Date Val
Appple 10 10
Banana 20 30
Appple 10 100
Banana 20 200
我的期望输出是
Mismatch raw Mismatch bob
EntityID Date Val EntityID Date Val
Apple 10 10 Apple 10 100
Banana 20 30 Banana 20 200
Pandas有办法做到这一点吗?
1条答案
按热度按时间nbysray51#
其中一个选项是使用
concat
,然后使用drop_duplicates
(without merging):输出: