How to aggregate 3 columns in a DataFrame to get the count and distribution of values in separate columns in Python Pandas?

Posted by hfsqlsce on 2022-12-16 in Python


I have a Pandas DataFrame like the one below.
Data types:

ID - int
TIME - int
TG - int

| ID  | TIME     | TG |
|-----|----------|----|
| 111 | 20210101 | 0  |
| 111 | 20210201 | 0  |
| 111 | 20210301 | 1  |
| 222 | 20210101 | 0  |
| 222 | 20210201 | 1  |
| 333 | 20210201 | 1  |
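
I need to aggregate the DataFrame above to find out:

how many IDs there are for each value of TIME
how many TG values of 1 there are for each value of TIME
how many TG values of 0 there are for each value of TIME

So I need something like the following: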

TIME     | num_ID | num_1 | num_0
---------|--------|-------|--------
20210101 | 2      | 0     | 2
20210201 | 3      | 2     | 1
20210301 | 1      | 1     | 0

How can I do this in Python Pandas?

pandas

Source: https://stackoverflow.com/questions/74770208/how-to-aggregate-3-columns-in-dataframe-to-have-count-and-distribution-of-values

3 Answers


vktxenjb1#

Use GroupBy.size to count the rows for each TIME value, and crosstab to count the 0 and 1 values of TG:

df1 = (df.groupby('TIME').size().to_frame('num_ID')
         .join(pd.crosstab(df['TIME'], df['TG']).add_prefix('num_'))
         .reset_index())
print (df1)
       TIME  num_ID  num_0  num_1
0  20210101       2      2      0
1  20210201       3      1      2
2  20210301       1      0      1
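
The snippet above assumes `df` and `pd` already exist. A self-contained sketch follows, using the sample data from the question. The `reindex` step is an addition, not part of the original answer; it assumes you want both `num_0` and `num_1` columns to appear even when one of the two values is missing from the data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [111, 111, 111, 222, 222, 333],
    "TIME": [20210101, 20210201, 20210301, 20210101, 20210201, 20210201],
    "TG": [0, 0, 1, 0, 1, 1],
})

# Number of rows per TIME value
counts = df.groupby("TIME").size().to_frame("num_ID")

# Frequency table of TG per TIME; reindex (an added safeguard) keeps
# both the 0 and 1 columns even if one value never occurs
tg_counts = (
    pd.crosstab(df["TIME"], df["TG"])
    .reindex(columns=[0, 1], fill_value=0)
    .add_prefix("num_")
)

df1 = counts.join(tg_counts).reset_index()
print(df1)
```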

Another idea, if you only need to count the 0 and 1 values inside GroupBy.agg:

df1 = (df.assign(num_0 = df['TG'].eq(0),
                num_1 = df['TG'].eq(1))
        .groupby('TIME').agg(num_ID = ('TG','size'),
                             num_1=('num_1','sum'),
                             num_0=('num_0','sum'),
                             )
        .reset_index()
        )
print (df1)
       TIME  num_ID  num_1  num_0
0  20210101       2      0      2
1  20210201       3      2      1
2  20210301       1      1      0
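
The approach above works because `eq` returns a boolean Series and summing booleans counts the `True` entries. A minimal sketch of that step on its own, using the TG values from the question (the variable names are illustrative):

```python
import pandas as pd

# TG values from the question's sample data
tg = pd.Series([0, 0, 1, 0, 1, 1], name="TG")

# eq(1) marks each row where TG equals 1
is_one = tg.eq(1)
print(is_one.tolist())

# Summing a boolean Series treats True as 1 and False as 0,
# so the sum is the number of matching rows
num_ones = int(is_one.sum())
num_zeros = int(tg.eq(0).sum())
print(num_ones, num_zeros)
```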

2022-12-16

km0tfn4u2#

import pandas as pd

# Create the DataFrame
df = pd.DataFrame({
    'ID': [111, 111, 111, 222, 222, 333],
    'TIME': [20210101, 20210201, 20210301, 20210101, 20210201, 20210201],
    'TG': [0, 0, 1, 0, 1, 1]
})

# Group the DataFrame by the 'TIME' column
grouped_df = df.groupby('TIME')

# Aggregate the grouped DataFrame and create a new DataFrame
# that counts the number of IDs, number of 1s and number of 0s
# for each value in the 'TIME' column
result_df = grouped_df.agg({
    'ID': 'nunique',  # Count the number of unique IDs
    'TG': 'sum'       # Sum of TG equals the number of 1s (assumes TG holds only 0 or 1)
}).rename(columns={'ID': 'num_ID', 'TG': 'num_1'})

# Calculate the number of 0s in the 'TG' column
# by subtracting the number of 1s from the total number of entries
result_df['num_0'] = grouped_df['TG'].count() - result_df['num_1']

# Reorder the columns in the result DataFrame
result_df = result_df[['num_ID', 'num_1', 'num_0']]

# Print the result DataFrame
print(result_df)
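
Note that this answer counts distinct IDs with `nunique`, whereas the first answer counts rows with `size`. On the question's sample data both give the same numbers; they differ only when an ID repeats within one TIME value. A small sketch of the difference, with one hypothetical duplicate row added purely for illustration:

```python
import pandas as pd

# Sample data from the question plus one hypothetical duplicate row
# (ID 111 appearing a second time at TIME 20210101)
df = pd.DataFrame({
    "ID": [111, 111, 111, 222, 222, 333, 111],
    "TIME": [20210101, 20210201, 20210301, 20210101, 20210201, 20210201, 20210101],
    "TG": [0, 0, 1, 0, 1, 1, 0],
})

compare = df.groupby("TIME").agg(
    rows=("ID", "size"),             # every row counts
    distinct_ids=("ID", "nunique"),  # each ID counts once per TIME
)
print(compare)
```

Which of the two is right depends on whether "how many IDs" means rows or distinct identifiers.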

2022-12-16

irlmq6kh3#

dict1 = {'ID':pd.Series.nunique, 'TG': [lambda x: x.eq(1).sum(), lambda x: x.eq(0).sum()]}
col1 = ['num_id', 'num_1', 'num_0']
df.groupby('TIME').agg(dict1).set_axis(col1, axis=1).reset_index()

Result:

TIME        num_id  num_1   num_0
0   20210101    2       0       2
1   20210201    3       2       1
2   20210301    1       1       0
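
A possible variant of the same idea uses named aggregation, which assigns the output column names directly instead of relying on the order of names passed to `set_axis`. This is a sketch under the assumption that the sample data from the question is used, and it is not part of the original answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [111, 111, 111, 222, 222, 333],
    "TIME": [20210101, 20210201, 20210301, 20210101, 20210201, 20210201],
    "TG": [0, 0, 1, 0, 1, 1],
})

result = (
    df.groupby("TIME")
      .agg(
          num_id=("ID", "nunique"),               # distinct IDs per TIME
          num_1=("TG", lambda s: s.eq(1).sum()),  # rows where TG == 1
          num_0=("TG", lambda s: s.eq(0).sum()),  # rows where TG == 0
      )
      .reset_index()
)
print(result)
```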

2022-12-16
