Why can't pandas process more than 550 DataFrames in this case [duplicate]

Posted by 2ul0zpep on 2023-08-01 in Other

This question already has answers here:

"Anti-merge" in pandas (Python) (2 answers)
Closed 10 days ago.

import pandas as pd

file1 = 'Total.xlsx'
df1 = pd.read_excel(file1)

file2 = 'Recent.xlsx'
df2 = pd.read_excel(file2)

non_matching_rows = []

for index1, row1 in df1.iterrows():
    row_matches = False
    
    for index2, row2 in df2.iterrows():
        if row1.equals(row2):
            row_matches = True
            break
    
    if not row_matches:
        non_matching_rows.append(row1)

non_matching_df = pd.DataFrame(non_matching_rows)

display(non_matching_df)
print(non_matching_df.count())

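Total.xlsx contains nearly 40k records and Recent.xlsx contains nearly 36k. I need to find the remaining 4k records that exist only in Total.xlsx. I tried the code above, but it does not work on the full Excel files. When I cut down the number of records in both files, it runs and produces accurate results [only when reduced to 550 records]. Anything larger than 500 records does not work (I also tried chunksize, with no luck). Any suggestions?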

pandas

Source: https://stackoverflow.com/questions/76742328/why-pandas-cant-process-more-than-550-dataframes-in-this-case

1 Answer


1zmg4dgp1#

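The code takes too long on large files because its time complexity is O(n^2): with about 40k rows in one file and 36k in the other, the nested loops can perform up to roughly 40,000 × 36,000 ≈ 1.4 billion row comparisons. Use the merge() method instead of iterating over the rows of the DataFrames. Here is an example: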

merged_df = pd.merge(df1, df2, how='outer', indicator=True)
non_matching_df = merged_df[merged_df['_merge'] == 'left_only'].drop('_merge', axis=1)

display(non_matching_df)
print(non_matching_df.count())

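The indicator=True parameter adds a _merge column to the merged DataFrame, indicating whether each row exists in both DataFrames (both), only in the left DataFrame (left_only), or only in the right DataFrame (right_only).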
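
To see how the _merge column behaves without the Excel files, here is a minimal sketch on two tiny in-memory DataFrames. The column names and values are made up for illustration and are not taken from Total.xlsx or Recent.xlsx. It also uses how='left' instead of how='outer', which is a variation on the code above: since only the left_only rows are wanted, the right_only rows never need to be built. Calling drop_duplicates() on the right side is a precaution, on the assumption that repeated rows there should not multiply the left rows.

```python
import pandas as pd

# Hypothetical stand-ins for Total.xlsx and Recent.xlsx (columns are assumptions).
total = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]})
recent = pd.DataFrame({"id": [1, 3], "name": ["a", "c"]})

# With no `on` argument, merge joins on all columns the two frames share.
# A left merge keeps every row of `total`; `_merge` records whether a
# matching row was found in `recent`.
merged = total.merge(recent.drop_duplicates(), how="left", indicator=True)

# Keep only the rows with no match in `recent`, then drop the helper column.
only_in_total = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(only_in_total)
```

In this sketch the rows with id 2 and 4 are the ones that remain. Note that merge compares values, so the same column read as different dtypes in the two files (for example, numbers in one and text in the other) may need to be aligned first.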
