Concatenating rows in a pandas DataFrame

Posted by vq8itlhq on 2023-04-28 in Other


I have the following table.

I need to transform this input into the form shown in the output example below:

import pandas as pd

# Define the input data
data = {
    'ID': [500, 500, 500, 500, 500, 500, 500, 500, 400, 400, 400, 400, 400, 300, 200],
    'item': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'A', 'B', 'A', 'C', 'E', 'D', 'E'],
    'Counter': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 1, 2, 1, 1, 1],
    'C': ['XX', 'XX', 'XX', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX']
}

# Convert the input data to a Pandas DataFrame
df = pd.DataFrame(data)

If you have any ideas, please share. Thank you very much!

pandas

Source: https://stackoverflow.com/questions/76073715/rows-concatenate-in-pandas-dataframe

3 answers


x6yk4ghg1#

Your input and output don't match, but (*based on your input screenshot*) you can try the following:

out = (df.groupby(["ID", df["priority"].eq(1).cumsum(), "C"], as_index=False, sort=False)
          ["item"].agg("-".join).assign(ideal= lambda x: x.pop("item").str.cat(x.pop("C"), sep="-"))
      )

Output:

print(out)

    ID         ideal
0  500  A-B-C-D-E-XX
1  500    A-B-C-E-YY
2  400        A-B-XX
3  400        A-C-YY
4  400          E-YY
5  300          D-XX
6  200          E-XX
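
The snippet above groups on a `priority` column, which the `data` dict posted in the question does not contain. Assuming `priority` plays the role of `Counter` there (an assumption, not something the answer states), a minimal sketch adapted to the posted DataFrame might look like this. The `block` name is made up for illustration.

```python
import pandas as pd

# Input data as posted in the question
data = {
    'ID': [500, 500, 500, 500, 500, 500, 500, 500, 400, 400, 400, 400, 400, 300, 200],
    'item': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'A', 'B', 'A', 'C', 'E', 'D', 'E'],
    'Counter': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 1, 2, 1, 1, 1],
    'C': ['XX', 'XX', 'XX', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX']
}
df = pd.DataFrame(data)

# Assumption: a Counter value of 1 marks the start of a new block.
# "block" is a hypothetical helper name, not part of the original answer.
block = df["Counter"].eq(1).cumsum().rename("block")

out = (
    df.groupby(["ID", block, "C"], sort=False)["item"]
      .agg("-".join)      # join the items of each block with hyphens
      .reset_index()      # turn the group keys back into ordinary columns
)
out["ideal"] = out["item"] + "-" + out["C"]   # append the C value as a suffix
out = out[["ID", "ideal"]]
print(out)
```

With the posted data the second row comes out as A-B-C-YY rather than A-B-C-E-YY, which is consistent with the remark that the question's input and output do not match.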

Answered 2023-04-28

f4t66c6m2#

Use cumsum on the block-start condition to identify the blocks, then use groupby and agg:

out = (df.groupby(['ID',df['Counter'].eq(1).cumsum()], sort=False, as_index=False)
         .agg({'item':'-'.join, 'C':'first'})
         .assign(ideal=lambda x: x['item']+'-'+x['C'])
      )
print(out)

Output:

    ID       item   C         ideal
0  500  A-B-C-D-E  XX  A-B-C-D-E-XX
1  500      A-B-C  YY      A-B-C-YY
2  400        A-B  XX        A-B-XX
3  400        A-C  YY        A-C-YY
4  400          E  YY          E-YY
5  300          D  XX          D-XX
6  200          E  XX          E-XX
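
To see why the cumsum trick identifies the blocks, it can help to look at the intermediate labels. The following is a small sketch using the question's data; the `block` column name is hypothetical and only there for display.

```python
import pandas as pd

# Input data as posted in the question
data = {
    'ID': [500, 500, 500, 500, 500, 500, 500, 500, 400, 400, 400, 400, 400, 300, 200],
    'item': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'A', 'B', 'A', 'C', 'E', 'D', 'E'],
    'Counter': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 1, 2, 1, 1, 1],
    'C': ['XX', 'XX', 'XX', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX']
}
df = pd.DataFrame(data)

# eq(1) is True wherever a block starts (Counter == 1); the running sum of
# those flags gives every row the number of the block it belongs to.
labelled = df.assign(block=df["Counter"].eq(1).cumsum())
print(labelled[["ID", "item", "Counter", "block"]])

block_ids = labelled["block"].tolist()
```

Rows that share a block number are the ones the groupby then joins into a single string.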

Answered 2023-04-28

ddhy6vgd3#

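You just need to group by the ID and "C" columns and use the eq(1).cumsum() trick from the other answers. It neatly creates the common groups, because every Counter value of 1 starts a new group. Then aggregate the strings by joining them with a hyphen separator. Finally, reset the index so the result falls back to a flat DataFrame, and sort by ID in descending order to match your output.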

out = df.groupby(['ID', 'C', df.Counter.eq(1).cumsum()])["item"].agg(lambda x : '-'.join(x)).reset_index().sort_values(by="ID", ascending=False)

Output:

    ID   C  Counter       item
5  500  XX        1  A-B-C-D-E
6  500  YY        2      A-B-C
2  400  XX        3        A-B
3  400  YY        4        A-C
4  400  YY        5          E
1  300  XX        6          D
0  200  XX        7          E

Then:

out["ideal"] = out["item"] + "-" + out["C"]

    ID   C  Counter       item         ideal
5  500  XX        1  A-B-C-D-E  A-B-C-D-E-XX
6  500  YY        2      A-B-C      A-B-C-YY
2  400  XX        3        A-B        A-B-XX
3  400  YY        4        A-C        A-C-YY
4  400  YY        5          E          E-YY
1  300  XX        6          D          D-XX
0  200  XX        7          E          E-XX
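
One caveat about the sort step: `sort_values` defaults to `kind="quicksort"`, which is not a stable algorithm, so the relative order of rows that share an `ID` is not guaranteed in general. A hedged variant of the same pipeline, assuming the within-ID order produced by the groupby should be preserved, passes `kind="stable"`:

```python
import pandas as pd

# Input data as posted in the question
data = {
    'ID': [500, 500, 500, 500, 500, 500, 500, 500, 400, 400, 400, 400, 400, 300, 200],
    'item': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'A', 'B', 'A', 'C', 'E', 'D', 'E'],
    'Counter': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 1, 2, 1, 1, 1],
    'C': ['XX', 'XX', 'XX', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX']
}
df = pd.DataFrame(data)

out = (
    df.groupby(["ID", "C", df["Counter"].eq(1).cumsum()])["item"]
      .agg("-".join)      # join the items of each block with hyphens
      .reset_index()      # flatten the MultiIndex into columns
      # kind="stable" keeps rows with equal ID in their existing order
      .sort_values(by="ID", ascending=False, kind="stable")
)
out["ideal"] = out["item"] + "-" + out["C"]
print(out)
```

On a frame this small the default sort happens to give the order shown above; the explicit `kind` only matters if you rely on that order.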

Answered 2023-04-28
