Stacking multiple columns in a pandas DataFrame

6psbrbz9  posted 12 months ago in Other

I have a pandas DataFrame and want to stack 4 of its columns into 2 columns.

import pandas as pd

df = pd.DataFrame({'date': ['2023-12-01', '2023-12-05', '2023-12-07'],
                   'other_col': ['a', 'b', 'c'],
                   'right_count': [4, 7, 9], 'right_sum': [2, 3, 5],
                   'left_count': [1, 8, 5], 'left_sum': [0, 8, 4]})

I want to turn it into this:

         date other_col   side  count  sum
0  2023-12-01         a  right      4    2
1  2023-12-05         b  right      7    3
2  2023-12-07         c  right      9    5
3  2023-12-01         a   left      1    0
4  2023-12-05         b   left      8    8
5  2023-12-07         c   left      5    4

Answer 1 (bweufnob)

You can use a custom reshape with a temporary MultiIndex on the columns:

out = (df
   .set_index(['date', 'other_col'])
   .pipe(lambda x: x.set_axis(x.columns.str.split('_', expand=True), axis=1))
   .rename_axis(columns=['side', None])
   .stack('side').reset_index()
)
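To make the temporary MultiIndex easier to see, here is a step-by-step sketch of the same reshape. The intermediate name `wide` is only for illustration, and the row order of the stacked result may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2023-12-01', '2023-12-05', '2023-12-07'],
                   'other_col': ['a', 'b', 'c'],
                   'right_count': [4, 7, 9], 'right_sum': [2, 3, 5],
                   'left_count': [1, 8, 5], 'left_sum': [0, 8, 4]})

# Move the identifier columns into the index so only the value columns remain
wide = df.set_index(['date', 'other_col'])

# Split 'right_count' into ('right', 'count'), giving two column levels
wide.columns = wide.columns.str.split('_', expand=True)
wide.columns.names = ['side', None]
print(wide.columns.tolist())
# [('right', 'count'), ('right', 'sum'), ('left', 'count'), ('left', 'sum')]

# Stack the 'side' level into the rows, then restore a flat index
out = wide.stack('side').reset_index()
print(out)
```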

Or with melt + pivot:

tmp = df.melt(['date', 'other_col'], var_name='side')
tmp[['side', 'col']] = tmp['side'].str.split('_', n=1, expand=True)

out = (tmp.pivot(index=['date', 'other_col', 'side'],
                 columns='col', values='value')
          .reset_index().rename_axis(columns=None)
      )


Output:

         date other_col   side  count  sum
0  2023-12-01         a   left      1    0
1  2023-12-01         a  right      4    2
2  2023-12-05         b   left      8    8
3  2023-12-05         b  right      7    3
4  2023-12-07         c   left      5    4
5  2023-12-07         c  right      9    5
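The output above is sorted by date and side, whereas the question lists all `right` rows before the `left` rows. If that order matters, one possible sketch is to sort on an ordered categorical. The frame `out` below is rebuilt by hand from the output shown above, and the name `ordered` is only for illustration.

```python
import pandas as pd

# The reshaped result from above, rebuilt by hand for a self-contained example
out = pd.DataFrame({'date': ['2023-12-01', '2023-12-01', '2023-12-05',
                             '2023-12-05', '2023-12-07', '2023-12-07'],
                    'other_col': ['a', 'a', 'b', 'b', 'c', 'c'],
                    'side': ['left', 'right'] * 3,
                    'count': [1, 4, 8, 7, 5, 9],
                    'sum': [0, 2, 8, 3, 4, 5]})

# An ordered categorical lets 'right' sort before 'left'
side_order = pd.Categorical(out['side'], categories=['right', 'left'], ordered=True)

ordered = (out.assign(side=side_order)
              .sort_values(['side', 'date'], ignore_index=True)
              .astype({'side': str}))  # back to plain strings
print(ordered)
```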


Or, more simply, using the janitor library and its pivot_longer method:

# pip install pyjanitor
import janitor

out = df.pivot_longer(index=['date', 'other_col'],
                      names_to=('side', '.value'),
                      names_pattern=r'([^_]+)_([^_]+)')


Output:

         date other_col   side  count  sum
0  2023-12-01         a  right      4    2
1  2023-12-05         b  right      7    3
2  2023-12-07         c  right      9    5
3  2023-12-01         a   left      1    0
4  2023-12-05         b   left      8    8
5  2023-12-07         c   left      5    4
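If adding a dependency is not an option, a pandas-only alternative that this answer does not cover is `pd.wide_to_long`. It expects the shared stub (`count`, `sum`) to come first in the column name, so this sketch swaps the two halves of each name before reshaping. The helper `swap` and the name `long_df` are hypothetical, and no particular row order is assumed.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2023-12-01', '2023-12-05', '2023-12-07'],
                   'other_col': ['a', 'b', 'c'],
                   'right_count': [4, 7, 9], 'right_sum': [2, 3, 5],
                   'left_count': [1, 8, 5], 'left_sum': [0, 8, 4]})

def swap(name):
    # Hypothetical helper: 'right_count' -> 'count_right'; other names unchanged
    side, _, stat = name.partition('_')
    return f'{stat}_{side}' if side in ('right', 'left') else name

swapped = df.rename(columns=swap)

# Stub names become the value columns; the suffix becomes the 'side' column
long_df = pd.wide_to_long(swapped, stubnames=['count', 'sum'],
                          i=['date', 'other_col'], j='side',
                          sep='_', suffix=r'\w+').reset_index()
print(long_df)
```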

Answer 2 (qxgroojn)

This can be done with the pandas melt function to unpivot the data, followed by some manipulation of the result:

import pandas as pd

df = pd.DataFrame({'date': ['2023-12-01', '2023-12-05', '2023-12-07'],
                   'other_col': ['a', 'b', 'c'],
                   'right_count': [4, 7, 9], 'right_sum': [2, 3, 5],
                   'left_count': [1, 8, 5], 'left_sum': [0, 8, 4]})
print(df)

# Unpivot the DataFrame using melt
melted_df = pd.melt(df, id_vars=['date', 'other_col'],
                    var_name='side_count_sum', value_name='value')

# Split the melted column name into the side and the measure
melted_df[['side', 'count_sum']] = melted_df['side_count_sum'].str.split('_', expand=True)

# Pivot the measures back into columns. Each key is unique here, so take the
# value as-is rather than averaging (the default aggfunc is 'mean').
result_df = melted_df.pivot_table(index=['date', 'other_col', 'side'],
                                  columns='count_sum', values='value',
                                  aggfunc='first').reset_index()

# Drop the leftover name on the columns axis
result_df.columns.name = None

print(result_df)
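One difference from the pivot used in the first answer is worth keeping in mind: pivot_table aggregates duplicate keys (with the mean by default), while pivot raises an error on them. A minimal sketch on a made-up frame `dup`, which is not part of the question's data:

```python
import pandas as pd

# Hypothetical data with the same (key, col) pair appearing twice
dup = pd.DataFrame({'key': ['a', 'a'],
                    'col': ['count', 'count'],
                    'value': [1, 3]})

# pivot_table silently averages the duplicates: (1 + 3) / 2
agg = dup.pivot_table(index='key', columns='col', values='value')
print(agg)

# pivot refuses to reshape when the index/column pairs are not unique
try:
    dup.pivot(index='key', columns='col', values='value')
    raised = False
except ValueError as exc:
    raised = True
    print('pivot failed:', exc)
```

With the question's data every (date, other_col, side) combination is unique, so both functions return the same values.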
