pandas DataFrame: compare three columns and find the mismatched rows

Posted by mwngjboj on 2024-01-04 in Other


I have the following code:

merged_df = pd.merge(df_old, df_new, on=['column1','column2','column3'], how='right', indicator=True)

# Keep only rows where the indicator column is 'right_only' (present in df_new only)
mismatched_rows = merged_df[merged_df['_merge'] == 'right_only'] #.isin(['left_only', 'right_only'])]

# Drop the indicator column
mismatched_rows = mismatched_rows.drop('_merge', axis=1)

# Write the mismatched rows to a CSV file
#print(mismatched_rows)
mismatched_rows.to_csv(file_name + "_output_mismatched.csv",index=False)

The code above seems to match on the index as well. For example:

in dataframe 1, 
column1 column2 column3
x         y       z
a         b       c

in dataframe 2, 
column1 column2 column3
a         b       c
x         y       z

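In the example above, the column2 value y and the column3 value z go with the column1 value x in both dataframes; the only difference is that this row is in the second position in dataframe 2. Still, with the code above, both rows are reported as not matching.
How can we make both rows match?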

pandas

Source: https://stackoverflow.com/questions/77564480/dataframe-compare-three-columns-and-find-mismatched

2 Answers


#1 uurity8g

import pandas as pd

df_old = pd.DataFrame({'column1': ['x', 'a'], 
                      'column2': ['y', 'b'], 
                      'column3': ['z', 'c'], })
df_new = pd.DataFrame({'column1': ['a', 'x'], 
                      'column2': ['b', 'y'], 
                      'column3': ['c', 'z'], })

merged_df = pd.merge(df_old, df_new, on=['column1','column2','column3'], how='right', indicator=True)

# Keep only rows where the indicator column is 'right_only' (present in df_new only)
mismatched_rows = merged_df[merged_df['_merge'] == 'right_only']

# Drop the indicator column
mismatched_rows = mismatched_rows.drop('_merge', axis=1)

# Display the mismatched rows
print(mismatched_rows)

This results in:

Empty DataFrame
Columns: [column1, column2, column3]
Index: []

This was run with pandas 2.0.1.
Check your pandas version.
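To compare your environment with the one used above, the installed version can be read from the `__version__` attribute. This is only a minimal sketch; the variable name `installed_version` is chosen here for illustration.

```python
import pandas as pd

# Read the installed pandas version so it can be compared with 2.0.1
installed_version = pd.__version__
print(installed_version)
```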

Answered 2024-01-04

#2 ocebsuys

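It looks like your first line of code already does what you are asking for, but there may be a slight misunderstanding about what your merge returns.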

First, reproducing the code

Starting from the same input data:

import pandas as pd

df_1 = pd.DataFrame(columns = ['column1','column2','column3'],
                    data =   [['x',      'y',      'z'],
                              ['a',      'b',      'c']])
df_2 = pd.DataFrame(columns = ['column1','column2','column3'],
                    data =   [['a',      'b',      'c'],
                              ['x',      'y',      'z']])

Output of the first line:

merged_df = pd.merge(df_2, df_1, 
                     on=['column1','column2','column3'], 
                     how='right', 
                     indicator=True)

  column1 column2 column3 _merge
0       x       y       z   both
1       a       b       c   both

You then filter on 'right_only', but both rows are marked 'both', meaning they matched, so the filter returns nothing.
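If the goal is to list rows that exist in only one of the two dataframes, one option is an outer merge followed by a filter on both 'left_only' and 'right_only', as the commented-out `.isin(...)` in the question hints. The sketch below is an illustration, not the asker's confirmed requirement: the extra rows `p q r` and `m n o` are made-up data added so that there is something to report.

```python
import pandas as pd

# Hypothetical data: the first two rows exist in both frames (in a
# different order); the third row of each frame has no counterpart.
df_old = pd.DataFrame({"column1": ["x", "a", "p"],
                       "column2": ["y", "b", "q"],
                       "column3": ["z", "c", "r"]})
df_new = pd.DataFrame({"column1": ["a", "x", "m"],
                       "column2": ["b", "y", "n"],
                       "column3": ["c", "z", "o"]})

keys = ["column1", "column2", "column3"]

# how="outer" keeps rows from both sides, so '_merge' can take the
# values 'left_only', 'right_only', or 'both'.
merged = pd.merge(df_old, df_new, on=keys, how="outer", indicator=True)

# Keep the rows that are present in only one of the two frames.
mismatched = merged[merged["_merge"].isin(["left_only", "right_only"])]
print(mismatched)
```

With `how='right'`, as in the question, 'left_only' can never appear, which is why only 'right_only' was worth filtering on there.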

Explanation:

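To quote you: "The code above matches on the index as well."
However, it does not. See the documentation for pandas.DataFrame.merge:
"The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored."
You requested a join on columns, so the index is ignored.
So when you ask "how can we make both rows match", you already have that. Both rows are matched, and it is not clear what else you are looking for.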
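The point about the index being ignored can be checked with a small sketch. The index labels `[10, 20]` and `[7, 8]` are arbitrary values chosen for illustration; they differ between the two frames on purpose.

```python
import pandas as pd

keys = ["column1", "column2", "column3"]

# Same rows in a different order, with deliberately unrelated index labels
df_1 = pd.DataFrame([["x", "y", "z"], ["a", "b", "c"]],
                    columns=keys, index=[10, 20])
df_2 = pd.DataFrame([["a", "b", "c"], ["x", "y", "z"]],
                    columns=keys, index=[7, 8])

# Joining on columns: only the values in the key columns are compared
merged = pd.merge(df_2, df_1, on=keys, how="right", indicator=True)
print(merged)

# Check whether every row found a partner despite the differing indexes
all_matched = (merged["_merge"] == "both").all()
print(all_matched)
```

Under these assumptions every row should come back as 'both', regardless of row position or index labels.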
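For further help:

Please provide a complete MRE (minimal reproducible example), including the desired output, since your question is not 100% clear.
Some reading: Pandas Merging 101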

Answered 2024-01-04
