Pandas pivot with multiple repeated fields

5sxhfpxr  posted on 2023-02-06 in Other

I have a DataFrame that looks like this:

id       Field_name  Field_value
1           consent          yes
1   _REACTION TIME_         5547
1              age            24
1           gender             X
1   _REACTION TIME_        45396
1         education          uni
1          language           EN
1   _REACTION TIME_       105187
2           consent          yes
2   _REACTION TIME_         3547
2              age            25
2           gender             F
2   _REACTION TIME_        42396
2         education          uni
2          language           EU
2   _REACTION TIME_       115427

I want one row per id, with each _REACTION TIME_ row going into its own column, for example:

id  consent  _REACTION TIME_1  age gender  _REACTION TIME_2  education language _REACTION TIME_3
1       yes              5547   24      X             45396        uni       EN           105187
2       yes              3547   25      F             42396        uni       EU           115427

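I have searched for an answer, but I could not find this particular case, where only some of the entries are repeated and those entries are repeated several times.
Thanks in advance!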
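
For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the frame shown above and calls `DataFrame.pivot` on it directly. Storing every `Field_value` as a string is an assumption, since the question does not state the column dtype. The call fails because the pair (`id`, `Field_name`) is not unique for the `_REACTION TIME_` rows.

```python
import pandas as pd

# Rebuild the example frame from the question (values kept as strings: an assumption).
rows = [
    (1, "consent", "yes"), (1, "_REACTION TIME_", "5547"),
    (1, "age", "24"), (1, "gender", "X"),
    (1, "_REACTION TIME_", "45396"), (1, "education", "uni"),
    (1, "language", "EN"), (1, "_REACTION TIME_", "105187"),
    (2, "consent", "yes"), (2, "_REACTION TIME_", "3547"),
    (2, "age", "25"), (2, "gender", "F"),
    (2, "_REACTION TIME_", "42396"), (2, "education", "uni"),
    (2, "language", "EU"), (2, "_REACTION TIME_", "115427"),
]
df = pd.DataFrame(rows, columns=["id", "Field_name", "Field_value"])

# A plain pivot cannot place three values into one (id, Field_name) cell,
# so pandas raises a ValueError about duplicate entries.
pivot_error = None
try:
    df.pivot(index="id", columns="Field_name", values="Field_value")
except ValueError as exc:
    pivot_error = exc

print(type(pivot_error).__name__, pivot_error)
```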

xvw2m8pv  #1

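Use GroupBy.cumcount only on the rows that DataFrame.duplicated flags as repeated, so that each repeat gets a numeric suffix. The names are then unique per id, which makes DataFrame.pivot possible. Finally, add DataFrame.reindex to restore the original column order: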

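# Mark every (id, Field_name) pair that occurs more than once
m = df.duplicated(['id','Field_name'], keep=False)
# Append a running counter (1, 2, 3, ...) to the repeated names only
df.loc[m, 'Field_name'] += df[m].groupby(['id','Field_name']).cumcount().add(1).astype(str)

# Remember the order of first appearance, since pivot sorts the columns
cols = df['Field_name'].unique()
df = df.pivot(index='id', columns='Field_name', values='Field_value').reindex(cols, axis=1)
print(df)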
Field_name consent _REACTION TIME_1 age gender _REACTION TIME_2 education  \
id                                                                          
1              yes             5547  24      X            45396       uni   
2              yes             3547  25      F            42396       uni   

Field_name language _REACTION TIME_3  
id                                    
1                EN           105187  
2                EU           115427
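
To make the renaming step easier to follow, here is a small sketch of the intermediate values. The shortened frame and the names `counter` and `labelled` are assumptions made for illustration and are not part of the answer above. The mask `m` is true only for the repeated rows, and `cumcount` restarts at each new (`id`, `Field_name`) group.

```python
import pandas as pd

# Shortened version of the question's data (illustrative assumption).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "Field_name": ["consent", "_REACTION TIME_", "age", "_REACTION TIME_"] * 2,
    "Field_value": ["yes", "5547", "24", "45396", "yes", "3547", "25", "42396"],
})

# True for every row whose (id, Field_name) pair occurs more than once.
m = df.duplicated(["id", "Field_name"], keep=False)

# Running position of each repeated row inside its (id, Field_name) group, starting at 1.
counter = df[m].groupby(["id", "Field_name"]).cumcount().add(1)

# Work on a copy so the original frame stays untouched.
labelled = df.copy()
labelled.loc[m, "Field_name"] += counter.astype(str)

print(m.tolist())
print(counter.tolist())
print(labelled["Field_name"].tolist())
```

Names that occur once per id, such as `consent` and `age`, are left alone because the mask excludes them.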

A similar solution that avoids overwriting the original DataFrame:

# Mark every (id, Field_name) pair that occurs more than once
m = df.duplicated(['id','Field_name'], keep=False)
# Build the suffixed names in a separate Series; unrepeated names are kept as they are
s = df['Field_name'].add(df.groupby(['id','Field_name']).cumcount().add(1)
                           .astype(str)).where(m, df['Field_name'])

df1 = (df.assign(Field_name=s)
        .pivot(index='id', columns='Field_name', values='Field_value')
        .reindex(s.unique(), axis=1))
print(df1)
Field_name consent _REACTION TIME_1 age gender _REACTION TIME_2 education  \
id                                                                          
1              yes             5547  24      X            45396       uni   
2              yes             3547  25      F            42396       uni   

Field_name language _REACTION TIME_3  
id                                    
1                EN           105187  
2                EU           115427

41zrol4v  #2

If you want to keep _REACTION TIME_ in the column headers rather than renaming it to _REACTION TIME_1 and so on, you can use groupby.apply:

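# Transpose each id's block so that the field names, repeats included, become the headers.
# This assumes every id lists the same fields in the same order.
out = (df.groupby('id').apply(lambda g: g.drop('id', axis=1).set_index('Field_name').T)
       .reset_index(level=0).reset_index(drop=True)
       .rename_axis('', axis=1))
print(out)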

   id consent _REACTION TIME_ age gender _REACTION TIME_ education language _REACTION TIME_
0   1     yes            5547  24      X           45396       uni       EN          105187
1   2     yes            3547  25      F           42396       uni       EU          115427
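
As an alternative, here is a minimal sketch that produces the same duplicate-header layout without groupby.apply, which can behave differently across pandas versions in how it passes the grouping column to the function. The shortened frame and the names `header`, `rows`, and `out` are assumptions made for illustration. It requires every id to list the same fields in the same order and raises an error otherwise.

```python
import pandas as pd

# Shortened version of the question's data (illustrative assumption).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "Field_name": ["consent", "_REACTION TIME_", "age", "_REACTION TIME_"] * 2,
    "Field_value": ["yes", "5547", "24", "45396", "yes", "3547", "25", "42396"],
})

header = None
rows = []
for key, group in df.groupby("id", sort=False):
    names = group["Field_name"].tolist()
    if header is None:
        header = names  # the first id defines the column layout
    elif names != header:
        # Duplicate headers only make sense if all ids share one layout.
        raise ValueError(f"id {key} has a different field layout")
    rows.append([key, *group["Field_value"]])

# Duplicate column names are allowed when a DataFrame is built this way.
out = pd.DataFrame(rows, columns=["id", *header])
print(out)
```

Keep in mind that selecting `out["_REACTION TIME_"]` on such a frame returns a DataFrame with all matching columns, not a single Series, so the numbered names from the first answer are usually easier to work with afterwards.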
