import numpy as np

df['cycles'] = df['cycles'].map(np.sort)
df['cycles_str'] = [','.join(map(str, c)) for c in df['cycles']]
# Here we check if matches are >1, because each string will match itself once!
df['is_subset'] = [df['cycles_str'].str.contains(c_str, regex=False).sum() > 1 for c_str in df['cycles_str']]
df = df.loc[~df['is_subset']]
df = df.drop(['cycles_str', 'is_subset'], axis=1)
cycles members
0 [3, 4, 5, 9] 4
2 [2, 3, 4] 3
EDIT: the above does not work for [1,2,4] and [1,2,3,4], because the string "1,2,4" is not a substring of "1,2,3,4".
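A quick way to see the failure is to compare the two checks directly on that pair. This minimal sketch uses only plain Python: the substring test fails because 3 sits between 2 and 4 in the larger sorted string, while a set test ignores order and gaps.

```python
# Sorted lists joined into strings, as in the first approach.
small = ",".join(map(str, sorted([1, 2, 4])))     # "1,2,4"
large = ",".join(map(str, sorted([1, 2, 3, 4])))  # "1,2,3,4"

# Substring test: misses the subset because the elements are not contiguous.
substring_match = small in large

# Set test: order and gaps do not matter.
set_match = set([1, 2, 4]).issubset([1, 2, 3, 4])

print(substring_match, set_match)  # False True
```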
Rewrote the code. It now uses two nested loops in a list comprehension and set to check for subsets:
# Flag a row if its set is contained in more than one row's set
# (the count is always at least 1, because every set contains itself).
df['is_subset'] = [
    sum(set(y).issubset(x) for x in df['cycles']) > 1
    for y in df['cycles']
]
df = df.loc[~df['is_subset']]
df = df.drop('is_subset', axis=1)
print(df)
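One side effect of the "> 1" trick: if two rows hold exactly the same set, each one counts the other as a superset, so both rows are dropped. Below is a minimal sketch, assuming you want to drop only proper subsets and keep the first of any exact duplicates. The helper name drop_subset_rows and the sample data are hypothetical, not from the question. It uses Python's frozenset and the < operator, which is the proper-subset test.

```python
import pandas as pd


def drop_subset_rows(df: pd.DataFrame, col: str = "cycles") -> pd.DataFrame:
    """Hypothetical helper: keep rows whose set in `col` is not a proper
    subset of another row's set; exact duplicates keep only their first row."""
    sets = [frozenset(c) for c in df[col]]
    keep = []
    seen = set()
    for s in sets:
        # `s < other` is True only when s is a proper subset of other,
        # so a row is never flagged because of itself or an identical copy.
        is_proper_subset = any(s < other for other in sets)
        keep.append(not is_proper_subset and s not in seen)
        seen.add(s)
    return df.loc[keep]


# Hypothetical sample data: row 1 is a proper subset of row 0,
# and rows 2 and 3 hold the same set in a different order.
df = pd.DataFrame({"cycles": [[9, 5, 4, 3], [4, 5], [2, 4, 3], [2, 3, 4]]})
df["members"] = df["cycles"].map(len)

result = drop_subset_rows(df)
print(list(result.index))  # [0, 2]
```

On this sample, the two-loop version above would drop both row 2 and row 3, whereas the sketch keeps row 2. It still performs O(n²) set comparisons, like the original.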
cycles members
0 [9, 5, 4, 3] 4
2 [2, 4, 3] 3
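First, sort each list (the elements are numbers, so they sort naturally) and convert it to a string. Then, for each string, check whether it is a substring of any other row's string; if it is, that row is a subset. Because everything is sorted, the original order of the numbers cannot affect this step.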
Finally, keep only the rows that were not flagged as subsets.
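Besides the non-contiguous case from the edit note, the plain substring check can also produce false positives at number boundaries: "1,2" is a substring of "1,20", even though [1, 2] is not a subset of [1, 20]. A minimal sketch, assuming the same sort-and-join keys; the helper as_key is hypothetical. Padding each key with commas on both ends removes this particular false positive, but it does not fix the non-contiguous case, which still needs a set comparison.

```python
def as_key(cycle, wrap=False):
    """Hypothetical helper: sort numerically and join with commas,
    optionally padding the result with a comma on each end."""
    s = ",".join(map(str, sorted(cycle)))
    return f",{s}," if wrap else s


# Plain keys: "1,2" in "1,20" is True, a false positive.
plain = as_key([1, 2]) in as_key([1, 20])

# Padded keys: ",1,2," in ",1,20," is False, as it should be.
wrapped = as_key([1, 2], wrap=True) in as_key([1, 20], wrap=True)

print(plain, wrapped)  # True False
```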