How to get the index of the duplicates dropped from a pandas DataFrame

Posted by bqujaahr on 2021-08-25 in Java

I am using df = df.drop_duplicates(["col1", "col2"]) on my pandas DataFrame, but I need to know the index labels of the rows that were dropped. How can I get them?

python DataFrame pandas python-3.x indexing

Source: https://stackoverflow.com/questions/68302606/how-to-get-dropped-duplicates-index-of-pandas-dataframe

2 answers

50pmv0ei #1

Use boolean indexing on the index alone, with a mask built by DataFrame.duplicated:

import pandas as pd

df = pd.DataFrame({'col1':[1] * 4, 'col2':[2,2,3,2]})
print (df)
   col1  col2
0     1     2
1     1     2
2     1     3
3     1     2

print (df.drop_duplicates(["col1","col2"]))
   col1  col2
0     1     2
2     1     3

mask = df.duplicated(["col1","col2"])
idx = df.index[mask]
print (idx)
Int64Index([1, 3], dtype='int64')
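
If drop_duplicates is called with a non-default keep argument, the same keep value probably needs to be passed to duplicated so that the mask marks exactly the rows that get dropped. A minimal sketch reusing the example data above; the helper name dropped_index is hypothetical and not part of pandas:

```python
import pandas as pd

def dropped_index(df, subset, keep="first"):
    """Return the index labels that df.drop_duplicates(subset, keep=keep) would remove.

    Hypothetical helper for illustration; it only wraps DataFrame.duplicated.
    """
    mask = df.duplicated(subset=subset, keep=keep)
    return df.index[mask]

df = pd.DataFrame({"col1": [1] * 4, "col2": [2, 2, 3, 2]})
cols = ["col1", "col2"]

first = dropped_index(df, cols)               # first occurrence is kept
last = dropped_index(df, cols, keep="last")   # last occurrence is kept
none = dropped_index(df, cols, keep=False)    # every duplicated row is dropped

print(first.tolist())  # [1, 3]
print(last.tolist())   # [0, 1]
print(none.tolist())   # [0, 1, 3]
```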

Or use Index.difference if the duplicates have already been removed:

df1 = df.drop_duplicates(["col1","col2"])
idx = df.index.difference(df1.index)
print (idx)
Int64Index([1, 3], dtype='int64')
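
One caveat worth checking before relying on Index.difference: it is a set operation on labels, so it seems safe only when the index is unique. The sketch below assumes a non-unique string index (made up for illustration) and compares it with the mask approach:

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1] * 4, "col2": [2, 2, 3, 2]},
    index=["a", "a", "b", "a"],  # non-unique labels, assumed for illustration
)
df1 = df.drop_duplicates(["col1", "col2"])

# Set difference of labels: "a" still appears in df1.index, so nothing is reported.
by_difference = df.index.difference(df1.index)

# Mask approach: one label per dropped row, repeats included.
by_mask = df.index[df.duplicated(["col1", "col2"])]

print(by_difference.tolist())  # []
print(by_mask.tolist())        # ['a', 'a']
```

With a unique index, such as the default RangeIndex in the example above, both approaches return the same labels.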

efzxgjgh #2

You can use duplicated:

dups = df.duplicated(["col1", "col2"])
dups[dups].index

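The first line returns a boolean Series that marks whether each row is a duplicate. The second line applies boolean indexing to that Series itself to keep only the True entries, and then takes their index.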
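
The two lines above can be run end to end as follows. This is a minimal sketch that reuses the example frame from the first answer; the names dropped_idx, dropped_rows and kept are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1] * 4, "col2": [2, 2, 3, 2]})

# Boolean Series aligned with df.index; True marks a repeated (col1, col2) pair.
dups = df.duplicated(["col1", "col2"])

dropped_idx = dups[dups].index      # labels of the rows that would be dropped
dropped_rows = df.loc[dropped_idx]  # the dropped rows themselves
kept = df[~dups]                    # the rows that drop_duplicates keeps

print(dropped_idx.tolist())  # [1, 3]
print(kept.equals(df.drop_duplicates(["col1", "col2"])))  # True
```

Computing the mask once and reusing it gives both the dropped labels and the deduplicated frame without calling drop_duplicates separately.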
