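I am comparing two CSV files with Python's pandas. Both files contain exactly the same data, so the script should report something like "the two files are identical". However, one column, titled "Error", is empty because no error values were recorded.
When I run the comparison, the script flags the "Error" column as True (that is, a difference was found).
Can anyone help? How do I keep empty cells from being reported as differences? I also have another data set in which a column holds the value "None" in both files, and it behaves the same way: file a and file b have "None" in the same positions, yet the comparison reports differences.
My code: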
import pandas as pd
import numpy as np  # Import numpy for NaN values

# List of file paths
file_paths = ['test_file_1.csv', 'test_file_2.csv']

# Create a list to store DataFrames
dataframes = []

# Load all CSV files into DataFrames
for file_path in file_paths:
    df = pd.read_csv(file_path)
    dataframes.append(df)

# Initialize a dictionary to store differences
differences = {}

# Compare each pair of DataFrames
for i in range(len(dataframes)):
    for j in range(i + 1, len(dataframes)):
        df1 = dataframes[i]
        df2 = dataframes[j]
        # Check if either DataFrame is None or has errors
        if df1 is None or df2 is None:
            continue
        # Fill empty cells with NaN
        df1 = df1.fillna(np.nan)
        df2 = df2.fillna(np.nan)
        # Compare the DataFrames cell by cell
        comparison_df = df1 != df2  # Use != to create a boolean DataFrame where differences are True
        print("BreakPoint")
        # Find the row and column indices where differences occur
        diff_locations = comparison_df.stack().reset_index()
        diff_locations.columns = ['Row', 'Column', 'Different']
        # Filter rows where differences are True
        diff_locations = diff_locations[diff_locations['Different']]
        # Store differences in the dictionary
        key = f'({file_paths[i]}) vs ({file_paths[j]})'
        differences[key] = diff_locations
print("break point")

# Output the differences
for key, diff_locations in differences.items():
    if diff_locations.empty:
        print(f"{key}: The two CSV files are identical.")
    else:
        print(f"{key}: The two CSV files have differences at the following locations:")
        print(diff_locations)
1 Answer
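NaN is never equal to itself, so NaN != NaN evaluates to True and every cell that is empty in both files is flagged as a difference. Calling df1.fillna(np.nan) at the start is pointless: the columns already contain NaN. I suggest you use:
Or, better: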
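The snippets this answer pointed to are not reproduced here, so the following is only a minimal sketch of one NaN-aware comparison, not necessarily the answerer's code. It assumes both frames have identical row and column labels (otherwise != raises a ValueError). The sample frames and the find_differences helper are hypothetical names made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two CSV files; "Error" is empty in both.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Value": ["a", "b", "c"],
    "Error": [np.nan, np.nan, np.nan],
})
df2 = df1.copy()

# NaN is never equal to itself, so a plain != flags every empty cell.
naive = df1 != df2
print(naive["Error"].all())        # True

# Count a cell as different only when the values differ
# and the cell is not NaN in both frames.
both_nan = df1.isna() & df2.isna()
comparison_df = (df1 != df2) & ~both_nan
print(comparison_df.any().any())   # False

# DataFrame.equals treats NaNs in the same location as equal
# (it also requires matching dtypes).
print(df1.equals(df2))             # True


def find_differences(left, right):
    """Return the (Row, Column) labels of cells that really differ.

    Hypothetical helper; assumes identically labelled frames.
    """
    mask = (left != right) & ~(left.isna() & right.isna())
    rows, cols = np.where(mask.to_numpy())
    return pd.DataFrame({
        "Row": left.index.to_numpy()[rows],
        "Column": left.columns.to_numpy()[cols],
    })


# A third frame with one genuine change, to show what gets reported.
df3 = df1.copy()
df3.loc[1, "Value"] = "x"
print(find_differences(df1, df2).empty)  # True
print(find_differences(df1, df3))
```

In the question's loop, the masked comparison_df line could replace the plain df1 != df2. If only a yes/no answer is needed, df1.equals(df2) is shorter, with the caveat that it also compares dtypes.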