Suppose I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo",
          "bar", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two",
          "one", "one", "two", "two", "two"],
    "C": ["small", "large", "large", "small",
          "small", "large", "small", "small",
          "large", "large"],
    "D": [1, 2, 3, 4, 5, 6, 7, 8, 9, 99999],
})
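If rows share the same values in "A", "B" and "C", I want to join (concatenate? or merge?) their values in column "D". In other words, I want to end up with this DataFrame: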
     A    B      C        D
0  foo  one  small        1
1  foo  one  large      2,3
2  foo  two  small      4,5
3  bar  one  large        6
4  bar  one  small        7
5  bar  two  small        8
6  bar  two  large  9,99999
There are aggregation functions such as min, max and sum, but I could not come up with a solution for this.
1 Answer
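Convert column D to strings so that its values can be aggregated with join in GroupBy.agg. Alternatively, do the string conversion inside a lambda function passed to agg.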
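The code that originally accompanied this answer is not preserved here, so the following is only a sketch of what the two variants might look like. The names df1 and df2 and the use of sort=False (to keep groups in order of first appearance, as in the desired output) are assumptions of this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 5,
    "B": ["one", "one", "one", "two", "two",
          "one", "one", "two", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large", "large"],
    "D": [1, 2, 3, 4, 5, 6, 7, 8, 9, 99999],
})

# Variant 1: cast D to str first, then join the strings of each (A, B, C) group.
# sort=False (an assumption here) keeps groups in order of first appearance.
df1 = (
    df.assign(D=df["D"].astype(str))
      .groupby(["A", "B", "C"], sort=False)["D"]
      .agg(",".join)
      .reset_index()
)

# Variant 2: leave D numeric in df and cast inside a lambda instead.
df2 = (
    df.groupby(["A", "B", "C"], sort=False)["D"]
      .agg(lambda s: ",".join(s.astype(str)))
      .reset_index()
)

print(df1)
```

Both variants should give one row per distinct (A, B, C) combination, with D holding a comma-separated string. Note that D is no longer numeric after this step.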
If the values of D can repeat within a group and only unique values are wanted, add DataFrame.drop_duplicates before grouping, or call Series.unique inside the aggregation.
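As a rough illustration of the two deduplication options, here is a sketch on a small made-up frame in which one group contains a repeated D value. The sample data and the names u1 and u2 are hypothetical and are not taken from the question.

```python
import pandas as pd

# Hypothetical data: the (foo, one, large) group contains D == 2 twice.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": ["one", "one", "one", "two"],
    "C": ["large", "large", "large", "small"],
    "D": [2, 3, 2, 8],
})

# Option 1: drop fully duplicated (A, B, C, D) rows before grouping.
u1 = (
    df.drop_duplicates(["A", "B", "C", "D"])
      .assign(D=lambda d: d["D"].astype(str))
      .groupby(["A", "B", "C"], sort=False)["D"]
      .agg(",".join)
      .reset_index()
)

# Option 2: keep all rows and deduplicate per group with Series.unique,
# which preserves the order of first appearance.
u2 = (
    df.groupby(["A", "B", "C"], sort=False)["D"]
      .agg(lambda s: ",".join(s.astype(str).unique()))
      .reset_index()
)

print(u1)
```

Both options keep the first occurrence of each value, so the joined strings come out in the same order either way.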