Is there a way to reshape a single-index pandas DataFrame into a multi-index one to fit a time series?

1tu0hz3e  posted on 2023-02-11 in Other

Here is a sample DataFrame:

import pandas as pd

sample_dframe = pd.DataFrame.from_dict(
    {
        "id": [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
        "V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
        "V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
        "V3": [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
        "V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
    }
)

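The sample above has shape (22, 5), with the columns id and V1..V4. I need to convert it into a multi-index DataFrame (as a time series) where, for a given id, 5 values (time steps) from each of V1..V4 are grouped together.
That is, since there are 2 unique id values, it should give me a frame of shape (2, 4, 5).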

5jdjgkvh  #1

IIUC, you might want:

sample_dframe.set_index('id').stack()
Note: the output is a Series; for a DataFrame, append .to_frame(name='col_name').

Output:

id     
123  V1       2552
     V2       3434
     V3        382
     V4      32628
     V1        813
            ...   
456  V4    4088920
     V1    7279544
     V2       4453
     V3      58830
     V4    6323521
Length: 88, dtype: int64
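
To make the note about .to_frame concrete, here is a minimal sketch. It uses a shortened stand-in for sample_dframe (only four of the original rows), and the level name "variable" and column name "value" are my own choices, not something from the question.

```python
import pandas as pd

# Shortened stand-in for the question's sample_dframe (rows 0, 1, 10, 11 only)
sample_dframe = pd.DataFrame(
    {
        "id": [123, 123, 456, 456],
        "V1": [2552, 813, 433, 4696],
        "V2": [3434, 133, 433, 4696],
        "V3": [382, 161, 5403, 3745],
        "V4": [32628, 4471, 4447, 7376],
    }
)

# Same step as in the answer: id becomes the outer level, V1..V4 the inner one
stacked = sample_dframe.set_index("id").stack()

# Name both index levels ("variable" is an assumed name), then wrap the
# Series in a one-column DataFrame ("value" is an assumed name as well)
long_frame = stacked.rename_axis(["id", "variable"]).to_frame(name="value")
print(long_frame)
```

The result still has one row per (id, variable) observation; only the container changes from Series to DataFrame.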

Or, maybe:

(sample_dframe
 .assign(time=lambda d: d.groupby('id').cumcount())
 .set_index(['id', 'time']).stack()
 .swaplevel('time', -1)
 )

Output:

id       time
123  V1  0          2552
     V2  0          3434
     V3  0           382
     V4  0         32628
     V1  1           813
                  ...   
456  V4  10      4088920
     V1  11      7279544
     V2  11         4453
     V3  11        58830
     V4  11      6323521
Length: 88, dtype: int64
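
If the goal really is a block of shape (2, 4, 5), one possible reading is "keep the first five time steps per id", since id 123 has 10 rows and id 456 has 12, so the groups cannot be stacked into a regular array without some truncation or windowing. The sketch below follows that assumption; n_steps, the "variable" level name, and the choice of the first five rows are mine, not the asker's.

```python
import pandas as pd

sample_dframe = pd.DataFrame(
    {
        "id": [123] * 10 + [456] * 12,
        "V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4078952, 7279544],
        "V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
        "V3": [382, 161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
        "V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263, 642, 255, 2813, 4088920, 6323521],
    }
)

n_steps = 5  # assumed window length, taken from the "5 time steps" in the question

wide = (
    sample_dframe
    .groupby("id").head(n_steps)                        # first n_steps rows of each id
    .assign(time=lambda d: d.groupby("id").cumcount())  # 0..n_steps-1 within each id
    .set_index(["id", "time"])
    .rename_axis(columns="variable")                    # assumed level name for V1..V4
    .stack()
    .unstack("time")                                    # rows: (id, variable), columns: time
)

# 8 rows x 5 columns -> (number of ids, 4 variables, n_steps)
cube = wide.to_numpy().reshape(sample_dframe["id"].nunique(), 4, n_steps)
print(wide)
print(cube.shape)
```

Here wide is a regular DataFrame with a two-level row index, and cube is the plain NumPy view of the same numbers. If sliding or non-overlapping windows are wanted instead of just the first five steps, the head(n_steps) step is the part to change.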

zrfyljdw  #2

import pandas as pd

df = pd.DataFrame.from_dict(
    {
        "id": [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
        "V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
        "V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
        "V3": [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
        "V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
    }
)

print(df)

"""
     id       V1    V2     V3       V4
0   123     2552  3434    382    32628
1   123      813   133    161     4471
2   123      496   424   7237     4781
3   123      401   491   7503     1497
4   123     4078  8217    561    45104
5   123      952   915   6801     8657
6   123     7279  7179   1072    81074
7   123      544  5414   9660     1091
8   123      450   450  62107   370835
9   123      548   548   6233     2058
10  456      433   433   5403     4447
11  456     4696  4696   3745     7376
12  456      244   244   8613   302237
13  456     9735  9735   6302     6833
14  456     4263  4263    557    48348
15  456      642   642   4256     3545
16  456      255   255   9874     4263
17  456     2813  2813   3013      642
18  456      496   496   9352      255
19  456      401   401   4522     2813
20  456  4078952  4952   3232  4088920
21  456  7279544  4453  58830  6323521

"""

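# Stack V1..V4 row by row into one long column per id,
# then drop the level that holds the original column names
df = (
    df.set_index('id')
    .stack()
    .reset_index()
    .drop(columns='level_1')
    .rename(columns={0: 'V1_new'})
)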
print(df)
"""
    id   V1_new
0   123     2552
1   123     3434
2   123      382
3   123    32628
4   123      813
..  ...      ...
83  456  4088920
84  456  7279544
85  456     4453
86  456    58830
87  456  6323521
"""
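
Note that the chain above throws away which of V1..V4 each value came from. If that label is needed later, a variant with DataFrame.melt keeps it. This is only a sketch on a shortened stand-in frame, and the names "variable" and "value" are assumptions of mine.

```python
import pandas as pd

# Shortened stand-in for the sample frame (rows 0, 1, 10, 11 only)
df = pd.DataFrame(
    {
        "id": [123, 123, 456, 456],
        "V1": [2552, 813, 433, 4696],
        "V2": [3434, 133, 433, 4696],
        "V3": [382, 161, 5403, 3745],
        "V4": [32628, 4471, 4447, 7376],
    }
)

# melt keeps id as an identifier column and records the source column name
long_df = df.melt(id_vars="id", var_name="variable", value_name="value")
print(long_df)
```

Unlike stack, melt orders the rows column by column (all V1 values first, then V2, and so on), so sort afterwards if the row-by-row order matters.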
