pandas 是否可以使用panda.melt()将列转换为列的元素?

zujrkrfu  于 2023-01-19  发布在  其他
关注(0)|答案(2)|浏览(107)

例如,创建3个DataFrame并将其连接起来:

df1 = pd.DataFrame({'1':np.random.rand(10), 
                    '2':np.random.rand(10), 
                    '3':np.random.rand(10), 
                    '4':np.random.rand(10)}).assign(Location = 'A')
df2 = pd.DataFrame({'1':np.random.rand(10), 
                    '2':np.random.rand(10), 
                    '3':np.random.rand(10), 
                    '4':np.random.rand(10)}).assign(Location = 'B')
df3 = pd.DataFrame({'1':np.random.rand(10), 
                    '2':np.random.rand(10), 
                    '3':np.random.rand(10), 
                    '4':np.random.rand(10)}).assign(Location = 'C')
cdf = pd.concat([df1, df2, df3])

它给我们:
我尝试用列[1, 2, 3, 4]交换Location的元素,即[A, B, C],结果应该是:

A         B         C   original_axis
0  0.088333  0.009939  0.117679        1
1  0.340463  0.537707  0.261598        1
2  0.243001  0.497268  0.098827        1
3  0.514245  0.133166  0.435097        1
4  0.715491  0.597498  0.635434        1
5  0.578538  0.017931  0.255325        1
6  0.854928  0.485360  0.954285        1
7  0.460750  0.959085  0.657368        1
8  0.573592  0.990451  0.605646        1
9  0.086094  0.206977  0.801752        1
0  0.549036  0.370826  0.387348        2
1  0.323326  0.891108  0.546695        2
2  0.695849  0.329717  0.089323        2
3  0.815171  0.125956  0.973977        2
4  0.123839  0.093963  0.094081        2
5  0.679839  0.093035  0.378317        2
6  0.144703  0.467749  0.146022        2
7  0.011454  0.226450  0.833976        2
8  0.527904  0.848700  0.457423        2
9  0.492924  0.027877  0.608478        2
0  0.723353  0.225212  0.411559        3
1  0.271650  0.042965  0.102511        3
2  0.089035  0.443239  0.222270        3
3  0.848120  0.325830  0.025566        3
4  0.991819  0.820941  0.824764        3
5  0.458587  0.998833  0.519268        3
6  0.615102  0.518176  0.878235        3
7  0.459854  0.920891  0.295850        3
8  0.913309  0.023715  0.871933        3
9  0.847045  0.635648  0.976897        3
0  0.017517  0.744492  0.626891        4
1  0.217091  0.480721  0.488773        4
2  0.261332  0.727520  0.851186        4
3  0.636652  0.198315  0.274114        4
4  0.674848  0.121301  0.979112        4
5  0.179912  0.923860  0.626627        4
6  0.838932  0.708422  0.007303        4
7  0.960088  0.221822  0.717819        4
8  0.388269  0.451810  0.379369        4
9  0.639050  0.323490  0.078356        4

为此,我使用mdf = pd.melt(cdf, id_vars='Location', var_name='time')将其融化

Location time     value
0          A    1  0.046463
1          A    1  0.282879
2          A    1  0.570865
3          A    1  0.719256
4          A    1  0.771334
..       ...  ...       ...
115        C    4  0.189733
116        C    4  0.244067
117        C    4  0.763828
118        C    4  0.268741
119        C    4  0.586709

并使用mdf.pivot(index = None, columns = 'Location', values = 'value'),但得到:

Location         A   B         C
0         0.046463 NaN       NaN
1         0.282879 NaN       NaN
2         0.570865 NaN       NaN
3         0.719256 NaN       NaN
4         0.771334 NaN       NaN
..             ...  ..       ...
115            NaN NaN  0.189733
116            NaN NaN  0.244067
117            NaN NaN  0.763828
118            NaN NaN  0.268741
119            NaN NaN  0.586709

然后,我使用mdf.pivot_table(columns = 'Location', values = 'value', index = None),但得到:

Location         A         B         C
value     0.497467  0.421472  0.474544

我不知道该怎么改正。
如何执行上述交换?

rjzwgtxy

rjzwgtxy1#

由于数据是随机的,所以输出可能会不同--尽管解决方案应该是适用的。其思想是long到wide和back到long的组合a-如果由于重复而遇到index error,则可以添加一个带有groupby的唯一计数器:

(cdf
.set_index('Location', append = True)
.unstack('Location')
.stack(level=0)
.rename_axis(index = [None, 'original_axis'], columns=None)
# depending on your end goal you may skip 
# the code below
.reset_index('original_axis')
.loc(axis=1)[['A', 'B', 'C', 'original_axis']]
.reset_index(drop=True)
) 
           A         B         C original_axis
0   0.243073  0.839207  0.204089             1
1   0.946684  0.154315  0.108707             2
2   0.197635  0.083025  0.022954             3
3   0.941140  0.085282  0.233555             4
4   0.564369  0.387318  0.620620             1
5   0.747247  0.693141  0.900607             2
6   0.784011  0.211551  0.882538             3
7   0.990973  0.714996  0.560776             4
8   0.211258  0.257663  0.077108             1
9   0.976495  0.413717  0.397158             2
10  0.039176  0.548559  0.592832             3
11  0.892454  0.283256  0.740506             4
12  0.112643  0.523662  0.422404             1
13  0.674075  0.861542  0.040622             2
14  0.158149  0.765760  0.180017             3
15  0.764700  0.983974  0.168166             4
16  0.603871  0.303765  0.893277             1
17  0.793998  0.774284  0.740311             2
18  0.460644  0.875710  0.128431             3
19  0.191124  0.473910  0.982510             4
20  0.050415  0.021572  0.242766             1
21  0.460084  0.937372  0.030185             2
22  0.791560  0.652112  0.074385             3
23  0.991015  0.924507  0.543565             4
24  0.215605  0.261787  0.740050             1
25  0.103512  0.004706  0.161741             2
26  0.362365  0.120351  0.182423             3
27  0.877847  0.863288  0.917792             4
28  0.508424  0.713536  0.360404             1
29  0.429722  0.741964  0.374892             2
30  0.148857  0.367541  0.773396             3
31  0.733401  0.097494  0.631226             4
32  0.073434  0.195457  0.495498             1
33  0.575820  0.845070  0.589751             2
34  0.382577  0.202422  0.619968             3
35  0.441545  0.393003  0.331093             4
36  0.242787  0.985934  0.552977             1
37  0.366286  0.167777  0.040000             2
38  0.283623  0.799820  0.657023             3
39  0.213178  0.578928  0.093788             4
kknvjkwl

kknvjkwl2#

如果可能,使用计数器,通过原始DataFrame cdf中的索引值将ignore_index=False添加到melt,转换为index并添加到旋转到index参数,最后将MultiIndex的第一级转换为列并添加一些数据清理:

cdf = pd.concat([df1, df2, df3])
mdf = (pd.melt(cdf, id_vars='Location', var_name='time', ignore_index=False)
         .reset_index()
         .pivot_table(index = ['index','time'], columns = 'Location', values = 'value')
         .reset_index(level=1)
         .sort_values('time')
         .reset_index(drop=True)
         .rename_axis(None, axis=1)
         .assign(original_axis = lambda x: x.pop('time'))
         )
print (mdf.head(20))
           A         B         C original_axis
0   0.321988  0.261495  0.834801             1
1   0.727275  0.721664  0.189777             1
2   0.890422  0.405724  0.549623             1
3   0.141341  0.078760  0.550035             1
4   0.524387  0.663287  0.742467             1
5   0.544935  0.218225  0.629431             1
6   0.588052  0.553420  0.021791             1
7   0.467896  0.972283  0.013206             1
8   0.022090  0.411311  0.715893             1
9   0.126596  0.625526  0.347060             1
10  0.390991  0.183550  0.115243             2
11  0.360875  0.110048  0.518665             2
12  0.180323  0.412962  0.709591             2
13  0.162077  0.912792  0.607409             2
14  0.337959  0.357006  0.474816             2
15  0.394469  0.863313  0.514885             2
16  0.501382  0.729779  0.627029             2
17  0.035648  0.585990  0.732794             2
18  0.456373  0.187173  0.362184             2
19  0.151172  0.391720  0.894907             2

或者:

cdf = pd.concat([df1, df2, df3])

mdf = (pd.melt(cdf.assign(g = cdf.groupby('Location').cumcount()), 
              id_vars=['Location','g'], var_name='time', ignore_index=False)
          .pivot_table(index = ['g','time'], columns = 'Location', values = 'value')
          .reset_index(level=1)
          .sort_values('time')
          .reset_index(drop=True)
          .rename_axis(None, axis=1)
          .assign(original_axis = lambda x: x.pop('time'))
         )
print (mdf.head(20))
           A         B         C original_axis
0   0.321988  0.261495  0.834801             1
1   0.727275  0.721664  0.189777             1
2   0.890422  0.405724  0.549623             1
3   0.141341  0.078760  0.550035             1
4   0.524387  0.663287  0.742467             1
5   0.544935  0.218225  0.629431             1
6   0.588052  0.553420  0.021791             1
7   0.467896  0.972283  0.013206             1
8   0.022090  0.411311  0.715893             1
9   0.126596  0.625526  0.347060             1
10  0.390991  0.183550  0.115243             2
11  0.360875  0.110048  0.518665             2
12  0.180323  0.412962  0.709591             2
13  0.162077  0.912792  0.607409             2
14  0.337959  0.357006  0.474816             2
15  0.394469  0.863313  0.514885             2
16  0.501382  0.729779  0.627029             2
17  0.035648  0.585990  0.732794             2
18  0.456373  0.187173  0.362184             2
19  0.151172  0.391720  0.894907             2

相关问题