Add rows from one DataFrame to another based on missing values in a given column (pandas)

Posted by 0s7z1bwu on 2023-04-19 in Other


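I have been looking for an answer for a long time without finding one. I have two DataFrames, target and backup, with the same columns. What I want is to look at a given column and add to target every row of backup whose value in that column does not already appear in target. The most straightforward solution is: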

import pandas as pd
import numpy as np

target = pd.DataFrame({
         "key1": ["K1", "K2", "K3", "K5"],
         "A": ["A1", "A2", "A3", np.nan],
         "B": ["B1", "B2", "B3", "B5"],
     })

backup = pd.DataFrame({
         "key1": ["K1", "K2", "K3", "K4", "K5"],
         "A": ["A1", "A", "A3", "A4", "A5"],
         "B": ["B1", "B2", "B3", "B4", "B5"],
     })

merged = target.copy()

for item in backup.key1.unique():
    if item not in target.key1.unique():
        merged = pd.concat([merged, backup.loc[backup.key1 == item]])

merged.reset_index(drop=True, inplace=True)

which gives

  key1    A   B
0   K1   A1  B1
1   K2   A2  B2
2   K3   A3  B3
3   K5  NaN  B5
4   K4   A4  B4

I have tried several pandas-only approaches, and none of them worked.

pandas concat

# Does not work: it creates duplicate rows, and drop_duplicates() only removes exact duplicates, so rows that differ between target and backup survive (compare the K2 rows with A2 / A and the K5 rows with NaN / A5)

pd.concat([target, backup]).drop_duplicates()

  key1  A   B
0   K1  A1  B1
1   K2  A2  B2
2   K3  A3  B3
3   K5  NaN B5
1   K2  A   B2
3   K4  A4  B4
4   K5  A5  B5

pandas merge

# Does not work: the result takes its values from backup, so target data is overwritten, including the NaN in the K5 row

pd.merge(target, backup, how="right")

  key1  A   B
0   K1  A1  B1
1   K2  A   B2
2   K3  A3  B3
3   K4  A4  B4
4   K5  A5  B5

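1. Importantly, this is not a duplicate of this post, because I do not want a new column and, more importantly, the values are not NaN in target; they simply are not there. Moreover, if I used the column merging suggested there, the NaN in target would be replaced by the value from backup, which is not wanted.
2. It is not a duplicate of this post, which uses pandas combine_first, because in that case the NaN is filled with a value from backup, which is wrong: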

target.combine_first(backup)

   key1 A   B
0   K1  A1  B1
1   K2  A2  B2
2   K3  A3  B3
3   K5  A4  B5
4   K5  A5  B5

3. Finally,

target.join(backup, on=["key1"])

gives me an annoying

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

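which I really do not understand, since both columns are pure strings, and the proposed solution does not work.
So I would like to ask: what am I missing? How can I do this with pandas methods? Thank you very much.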
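A note on the join error above: DataFrame.join matches the calling frame's `on` column against the index of the other frame, not against a column with the same name. Here backup still has its default integer index, so the string column key1 is compared with int64 labels. The sketch below shows one way to avoid the error by indexing backup on key1 first; the `_backup` suffix is an arbitrary choice for illustration. Even then, join is a left join by default, so it does not add the missing K4 row and is not a solution to the question by itself.

```python
import numpy as np
import pandas as pd

target = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K5"],
    "A": ["A1", "A2", "A3", np.nan],
    "B": ["B1", "B2", "B3", "B5"],
})
backup = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K4", "K5"],
    "A": ["A1", "A", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
})

# join() looks up target["key1"] in backup's index, which is still
# the default integer index at this point.
print(backup.index.dtype)

# Indexing backup on key1 makes the lookup string-to-string. The two
# frames share the columns A and B, so a suffix is needed for backup's.
joined = target.join(backup.set_index("key1"), on="key1", rsuffix="_backup")
print(joined)
```

The result has one row per row of target, with backup's values alongside in A_backup and B_backup.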

pandas

Source: https://stackoverflow.com/questions/65774788/add-rows-from-one-dataframe-to-another-based-on-missing-values-in-a-given-column

2 Answers


58wvjzkj1#

Use concat together with the rows of backup whose key1 does not appear in target.key1, selected with Series.isin and boolean indexing:

merged = pd.concat([target, backup[~backup.key1.isin(target.key1)]])
print(merged)
  key1    A   B
0   K1   A1  B1
1   K2   A2  B2
2   K3   A3  B3
3   K5  NaN  B5
3   K4   A4  B4
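The printed result keeps the original index labels (note the repeated 3), while the question's expected output has a fresh 0 to 4 index. A minimal sketch that wraps the same isin filter in a helper and renumbers the rows with concat's ignore_index=True; the function name append_missing_rows is made up for illustration and is not part of pandas:

```python
import numpy as np
import pandas as pd


def append_missing_rows(target, backup, key):
    """Return target plus the rows of backup whose key is absent from target."""
    # Boolean mask: True for backup rows whose key does not occur in target
    missing = backup[~backup[key].isin(target[key])]
    # ignore_index=True renumbers the combined rows 0..n-1
    return pd.concat([target, missing], ignore_index=True)


target = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K5"],
    "A": ["A1", "A2", "A3", np.nan],
    "B": ["B1", "B2", "B3", "B5"],
})
backup = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K4", "K5"],
    "A": ["A1", "A", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
})

merged = append_missing_rows(target, backup, "key1")
print(merged)
```

Rows already in target are left untouched, so the A2 value for K2 and the NaN for K5 are preserved.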


ldioqlga2#

Maybe you could try the 'subset' parameter of df.drop_duplicates()?

pd.concat([target, backup]).drop_duplicates(subset = "key1")

which gives the output:

  key1    A   B
0   K1   A1  B1
1   K2   A2  B2
2   K3   A3  B3
3   K5  NaN  B5
3   K4   A4  B4
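One difference from the isin approach is worth checking before choosing between the two: drop_duplicates(subset=...) keeps only the first row per key, so it also collapses repeated keys inside target itself. The sketch below uses a hypothetical variant of the data, target_dup, in which K1 appears twice; it is not from the question and only illustrates the behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical target in which the key K1 occurs twice
target_dup = pd.DataFrame({
    "key1": ["K1", "K1", "K5"],
    "A": ["A1", "A1b", np.nan],
    "B": ["B1", "B1b", "B5"],
})
backup = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K4", "K5"],
    "A": ["A1", "A", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
})

# Keeps the first row per key, so the second K1 row of target_dup is lost
via_dedup = pd.concat([target_dup, backup]).drop_duplicates(subset="key1")

# Only filters backup, so every row of target_dup is kept
via_isin = pd.concat([target_dup, backup[~backup.key1.isin(target_dup.key1)]])

print(len(via_dedup), len(via_isin))
```

When the key is unique in target, as in the question's data, both approaches give the same rows.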

