This question already has answers here:
drop duplicates on multiple columns irrespective of the order (a/b == b/a) [duplicate] (1 answer)
Efficient way in Pandas for removing columns with duplicate values in different columns (1 answer)
Closed 19 hours ago.
I have the following dataframe:
import pandas as pd

data = {
    'person1_name': ['John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne', 'Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne'],
    'family1_name': ['Wayne', 'Wayne', 'Wayne', 'Wayne', 'Wayne', 'Wayne'],
    'person2_name': ['Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne', 'John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne'],
    'family2_name': ['Wayne', 'Wayne', 'Wayne', 'Wayne', 'Wayne', 'Wayne']
}
df = pd.DataFrame(data)
person1_name family1_name person2_name family2_name
John_Ethan_Wayne Wayne Michael_Wayne Wayne
John_Ethan_Wayne Wayne Patrick_Wayne Wayne
Michael_Wayne Wayne Patrick_Wayne Wayne
Michael_Wayne Wayne John_Ethan_Wayne Wayne
Patrick_Wayne Wayne John_Ethan_Wayne Wayne
Patrick_Wayne Wayne Michael_Wayne Wayne
I want to drop rows that duplicate a (person1_name, family1_name) and (person2_name, family2_name) pair, ignoring the direction of the relationship.
The final result should be:
person1_name family1_name person2_name family2_name
John_Ethan_Wayne Wayne Michael_Wayne Wayne
Michael_Wayne Wayne Patrick_Wayne Wayne
Patrick_Wayne Wayne John_Ethan_Wayne Wayne
3 Answers
nmpmafwu #1
In the example you gave, the following is sufficient:
This works because, given the two rows:
A, B
B, A
it drops B, A, since B < A evaluates to False.
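The code for this answer is not present in the text. A minimal sketch consistent with the explanation, assuming the filter is a plain string comparison between `person1_name` and `person2_name` (the choice of columns is an assumption), might look like this:

```python
import pandas as pd

data = {
    'person1_name': ['John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne',
                     'Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne'],
    'family1_name': ['Wayne'] * 6,
    'person2_name': ['Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne',
                     'John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne'],
    'family2_name': ['Wayne'] * 6,
}
df = pd.DataFrame(data)

# Keep only rows where person1_name sorts before person2_name.
# For a mirrored pair (A, B) / (B, A), exactly one of the two rows passes.
result = df[df['person1_name'] < df['person2_name']].reset_index(drop=True)
print(result)
```

This only works when every pair appears in both directions and the names alone identify a person: a pair present only as (B, A) would be dropped entirely, and the family columns are ignored. It also keeps each pair in alphabetical orientation, which may differ from the orientation shown in the question's expected output.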
zyfwsgd6 #2
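The code for this answer is not present in the text, so what follows is not the answerer's method. As a hedged sketch of one common way to handle the general case, each row can be given an order-independent key built from both (name, family) pairs, and rows whose key has already been seen are dropped. The variable names `p1`, `p2`, and `key` are illustrative only:

```python
import pandas as pd

data = {
    'person1_name': ['John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne',
                     'Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne'],
    'family1_name': ['Wayne'] * 6,
    'person2_name': ['Michael_Wayne', 'Patrick_Wayne', 'Patrick_Wayne',
                     'John_Ethan_Wayne', 'John_Ethan_Wayne', 'Michael_Wayne'],
    'family2_name': ['Wayne'] * 6,
}
df = pd.DataFrame(data)

# Pair each name with its family name, for both sides of the relationship.
p1 = zip(df['person1_name'], df['family1_name'])
p2 = zip(df['person2_name'], df['family2_name'])

# Sorting the two pairs makes (A, B) and (B, A) produce the same key.
key = pd.Series([tuple(sorted(pair)) for pair in zip(p1, p2)], index=df.index)

# Keep the first row seen for each unordered pair.
result = df[~key.duplicated()].reset_index(drop=True)
print(result)
```

Unlike a plain `<` filter, this keeps a pair that appears in only one direction and takes the family columns into account. It keeps the first occurrence of each pair, so the surviving rows depend on row order.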
35g0bw71 #3