将pandas df转换为多列

ee7vknir  于 2023-05-15  发布在  其他
关注(0)|答案(3)|浏览(191)

我很抱歉,如果这是一个可能的重复。我有一个这样的dataframe:

{'Country Name': {0: 'Argentina', 1: 'Argentina', 2: 'Argentina'},
 'Series Name': {0: 'CO2 emissions (metric tons per capita)',
  1: 'Electric power consumption (kWh per capita)',
  2: 'Energy use (kg of oil equivalent per capita)'},
 '2010': {0: '4.0998122679475', 1: '2877.65265331343', 2: '1928.65235658729'},
 '2011': {0: '4.28094332027273', 1: '2929.07502855568', 2: '1952.05105293095'},
 '2012': {0: '4.26422362148416', 1: '3000.60352326565', 2: '1936.80353979442'},
 '2013': {0: '4.34212454655109', 1: '2967.37655805218', 2: '1967.02167752077'},
 '2014': {0: '4.20905330505396', 1: '3074.70207056563', 2: '2029.92282543737'},
 '2015': {0: '4.30185120706067', 1: '..', 2: '..'},
 '2016': {0: '4.20180210453832', 1: '..', 2: '..'},
 '2017': {0: '4.07139674183186', 1: '..', 2: '..'},
 '2018': {0: '3.9756664767256', 1: '..', 2: '..'},
 '2019': {0: '3.74054556792816', 1: '..', 2: '..'},
 '2020': {0: '..', 1: '..', 2: '..'},
 '2021': {0: '..', 1: '..', 2: '..'},
 '2022': {0: '..', 1: '..', 2: '..'}}

我有一个Country NameSeries Name和所有年份的列。我想把它转换成一个单独的列,所有的年份和序列名称中的所有唯一值都作为不同的列,它们的值都在这个列中。(Series Name有大约10个类别,我在示例中只显示了3个以供参考)。
期望的df应该是这样的:

Country Name    Year    C02 emission    Electric power consumption   Energy use
...

我不知道我该怎么做,任何建议都将不胜感激。

xdyibdwo

xdyibdwo1#

您可以通过首先melt Dataframe 以获得年份作为列,然后pivot该结果以获得Series Name作为列来实现所需的结果:

m = df.melt(id_vars=['Country Name', 'Series Name'], var_name='Year')
out = m.pivot(columns=['Series Name'], index=['Country Name', 'Year'], values=['value'])

然后,您可以整理列索引和名称并重置索引:

out.columns = out.columns.droplevel().str.replace(r'\s+\(.*$', '', regex=True)
out = out.reset_index()

输出:

Country Name  Year     C02 emissions Electric power consumption        Energy use
0     Argentina  2010   4.0998122679475           2877.65265331343  1928.65235658729
1     Argentina  2011  4.28094332027273           2929.07502855568  1952.05105293095
2     Argentina  2012  4.26422362148416           3000.60352326565  1936.80353979442
3     Argentina  2013  4.34212454655109           2967.37655805218  1967.02167752077
4     Argentina  2014  4.20905330505396           3074.70207056563  2029.92282543737
5     Argentina  2015  4.30185120706067                         ..                ..
6     Argentina  2016  4.20180210453832                         ..                ..
7     Argentina  2017  4.07139674183186                         ..                ..
8     Argentina  2018   3.9756664767256                         ..                ..
9     Argentina  2019  3.74054556792816                         ..                ..
10    Argentina  2020                ..                         ..                ..
11    Argentina  2021                ..                         ..                ..
12    Argentina  2022                ..                         ..                ..
ubof19bj

ubof19bj2#

这里有一个选项:

out = (
    pd.DataFrame(data) #you can replace this line with `df`
      .melt(id_vars=["Country Name", "Series Name"], var_name="Year")
      .pivot(index=["Country Name", "Year"], columns="Series Name", values="value")
      .pipe(lambda x: x.set_axis(x.columns.str.extract(r"(.+)\s+\(.+\)", expand=False),
                                 axis=1)).reset_index().rename_axis(None, axis=1)
)

输出:

print(out)

   Country Name  Year     CO2 emissions Electric power consumption        Energy use
0     Argentina  2010   4.0998122679475           2877.65265331343  1928.65235658729
1     Argentina  2011  4.28094332027273           2929.07502855568  1952.05105293095
2     Argentina  2012  4.26422362148416           3000.60352326565  1936.80353979442
..          ...   ...               ...                        ...               ...
10    Argentina  2020                ..                         ..                ..
11    Argentina  2021                ..                         ..                ..
12    Argentina  2022                ..                         ..                ..

[13 rows x 5 columns]
lndjwyie

lndjwyie3#

图书馆需要:Pandas

import pandas as pd

准备数据:

d = {'Country Name': {0: 'Argentina', 1: 'Argentina', 2: 'Argentina'},
 'Series Name': {0: 'CO2 emissions (metric tons per capita)',
  1: 'Electric power consumption (kWh per capita)',
  2: 'Energy use (kg of oil equivalent per capita)'},
 '2010': {0: '4.0998122679475', 1: '2877.65265331343', 2: '1928.65235658729'},
 '2011': {0: '4.28094332027273', 1: '2929.07502855568', 2: '1952.05105293095'},
 '2012': {0: '4.26422362148416', 1: '3000.60352326565', 2: '1936.80353979442'},
 '2013': {0: '4.34212454655109', 1: '2967.37655805218', 2: '1967.02167752077'},
 '2014': {0: '4.20905330505396', 1: '3074.70207056563', 2: '2029.92282543737'},
 '2015': {0: '4.30185120706067', 1: '..', 2: '..'},
 '2016': {0: '4.20180210453832', 1: '..', 2: '..'},
 '2017': {0: '4.07139674183186', 1: '..', 2: '..'},
 '2018': {0: '3.9756664767256', 1: '..', 2: '..'},
 '2019': {0: '3.74054556792816', 1: '..', 2: '..'},
 '2020': {0: '..', 1: '..', 2: '..'},
 '2021': {0: '..', 1: '..', 2: '..'},
 '2022': {0: '..', 1: '..', 2: '..'}}

加工:

df = pd.DataFrame(data = d).T
df

输出:

相关问题