Joining the unique values of a column based on the intersection of other columns in a Pandas DataFrame

Posted by n3ipq98p on 2023-01-04 in Other

Suppose I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar","bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two","two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 3, 4, 5, 6, 7, 8, 9,99999]})

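I want to join (concatenate? merge?) the values in column "D" wherever the values in "A", "B" and "C" intersect. By intersection I mean that I want to end up with this DataFrame: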

     A    B      C        D
0  foo  one  small        1
1  foo  one  large      2,3
2  foo  two  small      4,5
3  bar  one  large        6
4  bar  one  small        7
5  bar  two  small        8
6  bar  two  large  9,99999

There are aggregation functions such as min, max and sum, but I could not come up with a solution.

ffscu2ro #1

Convert column D to strings so that it can be aggregated with join in GroupBy.agg:

df1 = (df.assign(D = df.D.astype(str))
        .groupby(['A','B','C'], sort=False)['D']
        .agg(','.join)
        .reset_index())
print (df1)
     A    B      C        D
0  foo  one  small        1
1  foo  one  large      2,3
2  foo  two  small      4,5
3  bar  one  large        6
4  bar  one  small        7
5  bar  two  small        8
6  bar  two  large  9,99999
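
The string conversion matters because Python's str.join only accepts strings, and column D holds integers. A minimal plain-Python sketch of that behaviour (the variable names `values`, `error` and `joined` are illustrative only and not part of the answer):

```python
values = [2, 3]  # integers, like the D values of one group

error = None
try:
    ",".join(values)           # str.join rejects non-string items
except TypeError as exc:
    error = str(exc)

# Converting each item to str first makes the join work
joined = ",".join(str(v) for v in values)
print(joined)
```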

Or use a lambda function:

df1 = (df.groupby(['A','B','C'], sort=False)['D']
        .agg(lambda x: ','.join(x.astype(str)))
        .reset_index())
print (df1)
     A    B      C        D
0  foo  one  small        1
1  foo  one  large      2,3
2  foo  two  small      4,5
3  bar  one  large        6
4  bar  one  small        7
5  bar  two  small        8
6  bar  two  large  9,99999

If the values of D can repeat within a group and only unique values are wanted, add DataFrame.drop_duplicates or Series.unique.
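
The original code for this step did not survive on this page, so the following is only a sketch of how the two options might look, not the answerer's exact code. The small DataFrame with a repeated D value is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the value 2 repeats within the foo/one/large group
df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": ["one", "one", "one", "two"],
                   "C": ["large", "large", "large", "small"],
                   "D": [2, 3, 2, 8]})

# Option 1: drop duplicate rows before grouping
df1 = (df.drop_duplicates(subset=['A', 'B', 'C', 'D'])
         .assign(D=lambda x: x['D'].astype(str))
         .groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(','.join)
         .reset_index())

# Option 2: keep only the unique values inside each group
df2 = (df.groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(lambda x: ','.join(x.astype(str).unique()))
         .reset_index())

print(df1)
print(df2)
```

Both variants keep the first occurrence of each value, so the order of the joined values follows the order of the rows in the original DataFrame.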
