Merging rows in Pandas into one long row

zynd9foi  posted on 2023-04-18 in Other

I have the following sample DataFrame:

import pandas as pd
import numpy as np

# Create a range of 100 consecutive hourly timestamps starting from the current hour
timestamps = pd.date_range(start=pd.Timestamp.now().floor('H'), periods=100, freq='H')

# Create a DataFrame with 100 rows and 3 columns
df = pd.DataFrame({'timestamp': timestamps,
                   # 'value1': np.random.randn(100),
                   # 'value2': np.random.randint(0, 10, 100)})
                   'value1': 'abc', 'value2': 'def'})

# Group the rows by day and apply a custom function that concatenates values
df = df.groupby(df['timestamp'].dt.date).agg({'timestamp': 'first',
                                               'value1': lambda x: ', '.join(x),
                                               'value2': lambda x: ', '.join(x),
                                              }).reset_index(drop=True)
print(df.head())

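For now I have commented out value1 and value2 while creating the DataFrame so that I can try it with strings first. Eventually I need it to work with floats. Right now it combines the strings into one long comma-separated string in a single column, which I think is a small step closer to what I want.
What I am trying to achieve is a DataFrame that merges all rows of one day into a single very long row and renames the columns, so the columns should look like this: timestamp, value1_00, value2_00, value1_01, value2_01, ..., value2_23
I have experimented a bit, but since I am new to pandas I am not sure where to start. What would be a good first step toward solving this? One thing I am considering is a small for loop that creates the new columns, since it can be a simple loop such as: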

for i in range(24):
    # Zero-pad the hour so the names run value1_00 ... value1_23
    new_columns = [f'value1_{i:02d}', f'value2_{i:02d}']
    df[new_columns] = np.nan

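But how do I put the right values into the right new columns? I am sure it could be done by writing a few functions, but there must be a more convenient way!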
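
As a side note on the naming part only: the target column names can be generated up front in the order listed in the question, without adding columns one by one. This is a minimal sketch; `wide_columns` and `wide` are hypothetical names used for illustration, and the frame is left empty on purpose.

```python
import pandas as pd

# Names in the order given in the question:
# value1_00, value2_00, value1_01, value2_01, ..., value2_23
wide_columns = [f'{name}_{hour:02d}'
                for hour in range(24)
                for name in ('value1', 'value2')]

# Hypothetical empty frame, only to show the resulting layout
wide = pd.DataFrame(columns=['timestamp'] + wide_columns)
print(len(wide.columns))  # 49
print(wide_columns[:4])
```

This only fixes the layout; filling in the values is a reshaping problem rather than a looping one.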

0lvr5msh1#

IIUC, you can use pivot:

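# Rebuild the sample frame (the question overwrote df with the grouped result)
df = pd.DataFrame({'timestamp': timestamps, 'value1': 'abc', 'value2': 'def'})

# One row per date, one column per (value, hour) pair
out = (df.assign(date= df['timestamp'].dt.date, hour= df['timestamp'].dt.hour)
            .pivot(index='date', columns='hour', values=['value1', 'value2']))

# Flatten the MultiIndex columns to names like value1_00
out.columns = [f'{col[0]}_{col[1]:02d}' for col in out.columns]

out = out.reset_index()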

Output:

print(out)

         date value1_00 value1_01  ... value2_21 value2_22 value2_23
0  2023-04-14       NaN       NaN  ...       NaN       NaN       def
1  2023-04-15       abc       abc  ...       def       def       def
2  2023-04-16       abc       abc  ...       def       def       def
3  2023-04-17       abc       abc  ...       def       def       def
4  2023-04-18       abc       abc  ...       def       def       def
5  2023-04-19       abc       abc  ...       NaN       NaN       NaN

[6 rows x 49 columns]
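
Since the question says the result eventually has to work with floats and lists the columns interleaved per hour (value1_00, value2_00, value1_01, ...), here is a sketch of the same pivot idea on numeric data with the columns reordered. The fixed start time and the seeded generator are assumptions made only so the example is reproducible; the question starts from the current hour.

```python
import numpy as np
import pandas as pd

# Fixed start (assumption). Lowercase 'h' is used because uppercase 'H'
# triggers a deprecation warning on pandas 2.2+.
timestamps = pd.date_range(start='2023-04-14 23:00', periods=100, freq='h')
rng = np.random.default_rng(0)
df = pd.DataFrame({'timestamp': timestamps,
                   'value1': rng.standard_normal(100),
                   'value2': rng.integers(0, 10, 100)})

out = (df.assign(date=df['timestamp'].dt.date, hour=df['timestamp'].dt.hour)
         .pivot(index='date', columns='hour', values=['value1', 'value2']))

# Swap the column levels to (hour, name) and sort, which interleaves
# value1/value2 for each hour before the names are flattened
out = out.swaplevel(axis=1).sort_index(axis=1)
out.columns = [f'{name}_{hour:02d}' for hour, name in out.columns]
out = out.reset_index()

print(out.columns[:5].tolist())
# ['date', 'value1_00', 'value2_00', 'value1_01', 'value2_01']
```

Note that `pivot` raises an error when the same (date, hour) pair occurs more than once; if that can happen in the real data, `pivot_table` with an aggregation function is one possible alternative.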

njthzxwz2#

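I think I have understood your question correctly. If so, you can use a groupby together with unstacking to get the desired result.
I can add an explanation after work today; I wanted to get this to you as soon as possible!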

import pandas as pd
import numpy as np

# Create a range of 100 consecutive hourly timestamps starting from the current hour
timestamps = pd.date_range(start=pd.Timestamp.now().floor('H'), periods=100, freq='H')

# Create a DataFrame with 100 rows and 3 columns
df = pd.DataFrame({'timestamp': timestamps,
                   'value1': np.random.randn(100),
                   'value2': np.random.randint(0, 10, 100)})

# SOLUTION BELOW
groups = []
for day, group in df.groupby(df['timestamp'].dt.date):
    # remove column from frame and reset index
    group = group.drop(columns='timestamp')
    group = group.reset_index(drop=True)

    # unstack to get into wide format
    test = group.unstack().to_frame()
    test = test.T
    test = test.reset_index(drop=True)
    # join multiindex and assign columns to joined list
    tups = list(test.columns)
    new_cols = [f'{i[0]}_{i[1]}' for i in tups]
    test.columns = new_cols
    # insert day
    test.insert(0, 'timestamp', day)
    groups.append(test)

new_frame = pd.concat(groups, axis=0)

    timestamp   value1_0   value1_1   value1_2   value1_3   value1_4   value1_5   value1_6   value1_7   value1_8  ...  value2_14  value2_15  value2_16  value2_17  value2_18  value2_19  value2_20  value2_21  value2_22  value2_23
0  2023-04-14  -0.385854  -0.756550   0.160178   0.188067   0.027170  -1.908315   0.261987   0.440461   0.221609  ...        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
0  2023-04-15  -1.699114   0.934643   0.268068   0.053558  -0.378112  -0.527642   1.015539   0.520676  -1.924164  ...        3.0        3.0        1.0        5.0        0.0        8.0        8.0        9.0        0.0        8.0
0  2023-04-16  -0.640399   0.039664   0.135044  -0.628916   0.247612  -1.727348  -1.342490  -1.361860  -0.128229  ...        3.0        6.0        6.0        9.0        7.0        6.0        8.0        8.0        3.0        6.0
0  2023-04-17  -1.441397  -1.779004   0.641108  -0.119969  -0.007153  -0.441358   2.292929   0.788280   1.296164  ...        4.0        9.0        8.0        0.0        8.0        2.0        3.0        7.0        9.0        8.0
0  2023-04-18  -0.109793  -0.378564  -1.037385  -1.667078  -0.453537   0.564961  -0.339580  -1.452894   0.138015  ...        2.0        1.0        2.0        8.0        1.0        NaN        NaN        NaN        NaN        NaN

Please print new_frame and let me know whether this works for you.
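
One thing to be aware of: because the loop resets the index within each group, the numeric suffix is the position of the row within its day, not the hour. On a partial first day, value1_0 therefore holds the first available hour rather than 00:00, which is why the first row of the output has values on the left and NaN on the right. If the suffix should be the actual hour, as the question's column names suggest, one possible variant is to index by (day, hour) and unstack in a single step. This is a sketch with an assumed fixed start time and seeded random data; `keyed` and `wide` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Fixed start in the middle of a day (assumption) to show the alignment
timestamps = pd.date_range(start='2023-04-14 15:00', periods=100, freq='h')
rng = np.random.default_rng(0)
df = pd.DataFrame({'timestamp': timestamps,
                   'value1': rng.standard_normal(100),
                   'value2': rng.integers(0, 10, 100)})

# Index by (day, hour) so that unstack places each value under its real hour
keyed = df.set_index([df['timestamp'].dt.date.rename('day'),
                      df['timestamp'].dt.hour.rename('hour')]).drop(columns='timestamp')
wide = keyed.unstack('hour')

# Flatten the (name, hour) MultiIndex into zero-padded names
wide.columns = [f'{name}_{hour:02d}' for name, hour in wide.columns]
new_frame = wide.reset_index().rename(columns={'day': 'timestamp'})

print(new_frame.iloc[:, :4])
```

With this start time the first day only covers 15:00 to 23:00, so value1_00 through value1_14 are NaN in the first row and value1_15 holds the first observation.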
