I have a DataFrame that looks like this:
page  reference  ids                         subject  word
1     apple      ['aaaa', 'bbbbb', 'cccc']   name     app
1     apple      ['bndv', 'asasa', 'swdsd']  fruit    is
1     apple      ['bsnm', 'dfsd', 'dgdf']    fruit    text
1     bat        ['asas', 'ddfgd', 'ff']     thing    sport
1     cat        ['sds', 'dffd', 'gdg']      fruit    color
1     bat        ['sds', 'fsss', 'ssfd']     thing    was
1     bat        ['fsf', 'sff', 'fss']       place    that
2     dog        ['fffds', 'gd', 'sdg']      name     mud
2     egg        ['dfff', 'sdf', 'vcv']      place    gun
2     dog        ['dsfd', 'fds', 'gfdg']     thing    kit
2     egg        ['ddd', 'fg', 'dfg']        place    hut
I want to group by the reference and subject columns. The output should look like this:
output:
page  reference  ids                                                     subject  word
1     apple      [['bndv', 'asasa', 'swdsd'], ['bsnm', 'dfsd', 'dgdf']]  fruit    [[is], [text]]
1     apple      ['aaaa', 'bbbbb', 'cccc']                               name     [app]
1     bat        [['asas', 'ddfgd', 'ff'], ['sds', 'fsss', 'ssfd']]      thing    [[sport], [was]]
1     bat        ['fsf', 'sff', 'fss']                                   place    [that]
1     cat        ['sds', 'dffd', 'gdg']                                  fruit    [color]
2     dog        ['fffds', 'gd', 'sdg']                                  name     [mud]
2     dog        ['dsfd', 'fds', 'gfdg']                                 thing    [kit]
2     egg        [['dfff', 'sdf', 'vcv'], ['ddd', 'fg', 'dfg']]          place    [[gun], [hut]]
1 Answer
First, group by the two key columns and aggregate the remaining fields:
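A minimal sketch of that step, assuming pandas and the sample data from the question retyped by hand. The variable names `df` and `grouped` are illustrative, and keeping the smallest `page` per group is an assumption, not something the question specifies.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [
        (1, "apple", ["aaaa", "bbbbb", "cccc"], "name", "app"),
        (1, "apple", ["bndv", "asasa", "swdsd"], "fruit", "is"),
        (1, "apple", ["bsnm", "dfsd", "dgdf"], "fruit", "text"),
        (1, "bat", ["asas", "ddfgd", "ff"], "thing", "sport"),
        (1, "cat", ["sds", "dffd", "gdg"], "fruit", "color"),
        (1, "bat", ["sds", "fsss", "ssfd"], "thing", "was"),
        (1, "bat", ["fsf", "sff", "fss"], "place", "that"),
        (2, "dog", ["fffds", "gd", "sdg"], "name", "mud"),
        (2, "egg", ["dfff", "sdf", "vcv"], "place", "gun"),
        (2, "dog", ["dsfd", "fds", "gfdg"], "thing", "kit"),
        (2, "egg", ["ddd", "fg", "dfg"], "place", "hut"),
    ],
    columns=["page", "reference", "ids", "subject", "word"],
)

grouped = (
    df.groupby(["reference", "subject"])
    .agg(
        {
            "page": "min",  # assumption: keep the smallest page in each group
            "ids": list,  # collect each row's id list into an outer list
            "word": lambda s: [[w] for w in s],  # wrap every word in its own list
        }
    )
    .reset_index()
)

# Restore the column order used in the question.
grouped = grouped[["page", "reference", "ids", "subject", "word"]]
print(grouped)
```

By default `groupby` sorts the result by the group keys, so the row order can differ slightly from the order shown in the question.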
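Note that this also wraps each `word` value in a list, as in your desired output. I have also simply assumed that the smallest `page` value in each group should be kept, since you did not state a rule for that column; you can change `min` in the `agg` call to whatever you see fit. Then, wherever the resulting list has length 1, you can unwrap it: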
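One way to do the unwrapping is sketched below. To stay self-contained it starts from a hand-built two-row frame shaped like the grouped result (one merged group, one single-row group); `unwrap_single` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# Two rows shaped like the grouped result: a merged group and a single-row group.
grouped = pd.DataFrame(
    {
        "page": [1, 1],
        "reference": ["apple", "apple"],
        "ids": [
            [["bndv", "asasa", "swdsd"], ["bsnm", "dfsd", "dgdf"]],
            [["aaaa", "bbbbb", "cccc"]],
        ],
        "subject": ["fruit", "name"],
        "word": [[["is"], ["text"]], [["app"]]],
    }
)


def unwrap_single(value):
    """Return the only element of a one-item list; leave anything else unchanged."""
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


# Apply the helper to both list-valued columns.
for col in ["ids", "word"]:
    grouped[col] = grouped[col].apply(unwrap_single)

print(grouped)
```

Groups that merged several rows keep their nested lists, while single-row groups end up with one flat list, matching the shape requested in the question.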