Tricky transformation in a dataframe using Pandas (condense and de-aggregate)

zd287kbt  posted on 2023-05-12  in Other

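I want to condense the quarter columns in my dataframe and de-aggregate the count values, so that each counted unit becomes its own row:

Data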

state   range   type    Q1 24   Q2 24   stat
NY      up      AA      2       2       grow
NY      up      AA      1       0       re
NY      up      BB      1       1       grow
NY      up      BB      0       0       re
NY      up      DD      2       3       grow
NY      up      DD      0       1       re
CA      low     AA      0       2       grow
CA      low     AA      1       0       re
CA      low     BB      0       1       grow
CA      low     BB      0       0       re
CA      low     DD      0       3       grow
CA      low     DD      1       0       re


dataframe:

import pandas as pd

data = {
    'state': ['NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA'],
    'range': ['up', 'up', 'up', 'up', 'up', 'up', 'low', 'low', 'low', 'low', 'low', 'low'],
    'type': ['AA', 'AA', 'BB', 'BB', 'DD', 'DD', 'AA', 'AA', 'BB', 'BB', 'DD', 'DD'],
    'Q1 24': [2, 1, 1, 0, 2, 0, 0, 1, 0, 0, 0, 1],
    'Q2 24': [2, 0, 1, 0, 3, 1, 2, 0, 1, 0, 3, 0],
    'stat': ['grow', 're', 'grow', 're', 'grow', 're', 'grow', 're', 'grow', 're', 'grow', 're']
}

df = pd.DataFrame(data)
print(df)

Desired output

state   qtr     type    range   stat    
NY      Q1 24   AA01    up      grow    
NY      Q1 24   AA02    up      grow    
NY      Q1 24   AA03    up      re  
NY      Q1 24   BB01    up      grow    
NY      Q1 24   DD01    up      grow    
NY      Q1 24   DD02    up      grow    
CA      Q1 24   AA01    low     re  
CA      Q1 24   DD01    low     re  
NY      Q2 24   AA01    up      grow    
NY      Q2 24   AA02    up      grow    
NY      Q2 24   BB01    up      grow    
NY      Q2 24   DD01    up      grow    
NY      Q2 24   DD02    up      grow    
NY      Q2 24   DD03    up      grow    
NY      Q2 24   DD04    up      re  
CA      Q2 24   AA01    low     grow    
CA      Q2 24   AA02    low     grow    
CA      Q2 24   BB01    low     grow    
CA      Q2 24   DD01    low     grow    
CA      Q2 24   DD02    low     grow    
CA      Q2 24   DD03    low     grow

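# Attempt so far; pivot_longer is provided by pyjanitor, not by pandas itself
import janitor

newdf = (df
.pivot_longer(
    index = slice('state', 'type'), 
    names_to = ("qtr", ".value"), 
    names_sep = " ")
)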

Updating the count

newdf=newdf.assign(count=newdf['type']+(newdf.groupby(['state','type'])['type'].cumcount()+1).astype(str))

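However, this is only a fragment, and I am still trying to figure out how to de-aggregate the counts. I am still researching this. Any suggestions are appreciated.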

pieyvz9o  #1

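You can melt the existing DataFrame on the Q1 24 and Q2 24 columns, then filter out all rows with a count of 0. Next, use index.repeat to create count copies of each row. Finally, use groupby to number the rows within each ('state', 'type', 'qtr') group, and use that number to build the new type values: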

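# Unpivot the quarter columns into (qtr, count) pairs
df2 = pd.melt(df, id_vars=['state', 'range', 'type', 'stat'], value_vars=['Q1 24', 'Q2 24'], var_name='qtr', value_name='count')
# Rows with a count of 0 contribute no output rows
df2 = df2[df2['count'] > 0]

# Repeat each row 'count' times
df2 = df2.loc[df2.index.repeat(df2['count'])]

# Number the rows within each (state, type, qtr) group, starting at 1
df2['count'] = df2.groupby(['state', 'type', 'qtr'])['type'].cumcount()+1

# Append the zero-padded number to the type, e.g. AA -> AA01
df2['type'] = df2[['type', 'count']].apply(lambda r:f"{r['type']}{r['count']:02d}", axis=1)

# Reset the index and drop the helper column
df2 = df2.reset_index(drop=True).drop('count', axis=1)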
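
As a side note, the row-wise apply used for the labelling step could probably be replaced by a vectorized string operation. The following is only a sketch of that idea on a small made-up frame (the toy data and the name seq are assumptions for illustration, not part of the answer above); it relies on str.zfill(2) giving the same padding as the :02d format for positive integers.

```python
import pandas as pd

# Toy frame standing in for df2 just before the labelling step (assumed values)
toy = pd.DataFrame({
    'state': ['NY', 'NY', 'NY', 'CA'],
    'type':  ['AA', 'AA', 'BB', 'AA'],
    'qtr':   ['Q1 24', 'Q1 24', 'Q1 24', 'Q1 24'],
})

# 1-based running number within each (state, type, qtr) group
seq = toy.groupby(['state', 'type', 'qtr']).cumcount() + 1

# Vectorized counterpart of the row-wise f-string: zero-pad to two digits
toy['type'] = toy['type'] + seq.astype(str).str.zfill(2)

print(toy['type'].tolist())
```

On a frame of this size the difference is negligible; the vectorized form mainly avoids calling a Python lambda once per row on larger inputs.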

Output:

   state range  type  stat    qtr
0     NY    up  AA01  grow  Q1 24
1     NY    up  AA02  grow  Q1 24
2     NY    up  AA03    re  Q1 24
3     NY    up  BB01  grow  Q1 24
4     NY    up  DD01  grow  Q1 24
5     NY    up  DD02  grow  Q1 24
6     CA   low  AA01    re  Q1 24
7     CA   low  DD01    re  Q1 24
8     NY    up  AA01  grow  Q2 24
9     NY    up  AA02  grow  Q2 24
10    NY    up  BB01  grow  Q2 24
11    NY    up  DD01  grow  Q2 24
12    NY    up  DD02  grow  Q2 24
13    NY    up  DD03  grow  Q2 24
14    NY    up  DD04    re  Q2 24
15    CA   low  AA01  grow  Q2 24
16    CA   low  AA02  grow  Q2 24
17    CA   low  BB01  grow  Q2 24
18    CA   low  DD01  grow  Q2 24
19    CA   low  DD02  grow  Q2 24
20    CA   low  DD03  grow  Q2 24
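
One way to sanity-check a result like this is to confirm that the number of output rows equals the sum of all the original counts. The sketch below wraps the same steps in a helper and runs that check on the question's data; the function name expand_counts and its quarter_cols parameter are hypothetical, introduced only for this illustration. It resets the index right after repeating the rows so that later steps work on unique index labels.

```python
import pandas as pd

def expand_counts(df, quarter_cols):
    """Hypothetical helper: one output row per counted unit."""
    id_cols = [c for c in df.columns if c not in quarter_cols]
    long = df.melt(id_vars=id_cols, value_vars=quarter_cols,
                   var_name='qtr', value_name='count')
    long = long[long['count'] > 0]                      # zero counts yield no rows
    long = long.loc[long.index.repeat(long['count'])]   # one copy per counted unit
    long = long.reset_index(drop=True)                  # make index labels unique again
    seq = long.groupby(['state', 'type', 'qtr']).cumcount() + 1
    long['type'] = long['type'] + seq.astype(str).str.zfill(2)
    return long.drop(columns='count')

df = pd.DataFrame({
    'state': ['NY'] * 6 + ['CA'] * 6,
    'range': ['up'] * 6 + ['low'] * 6,
    'type': ['AA', 'AA', 'BB', 'BB', 'DD', 'DD'] * 2,
    'Q1 24': [2, 1, 1, 0, 2, 0, 0, 1, 0, 0, 0, 1],
    'Q2 24': [2, 0, 1, 0, 3, 1, 2, 0, 1, 0, 3, 0],
    'stat': ['grow', 're'] * 6,
})

quarters = ['Q1 24', 'Q2 24']
out = expand_counts(df, quarters)

# Every counted unit should appear exactly once in the expanded frame
total = int(df[quarters].to_numpy().sum())
assert len(out) == total
print(len(out), total)
```

For the data in the question both numbers are 21, matching the 21 rows shown above.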
