pandas 如何熔化和取消透视多标题 Dataframe ?

tjvv9vkg  于 2022-12-10  发布在  其他
关注(0)|答案(1)|浏览(121)

I have this data that I want to unpivot and melt into columns. The data is a multi-header table. I have a sample dictionary of the data.

Edit here___

I don't know how to convert a dictionary with multiple keys like I had shown previously into a df so let's restructure the dictionary like so...

data = {
    "id": {
        0: "month",
        1: "11/30/2021",
        2: "12/31/2021",
        3: "1/31/2022",
        4: "2/28/2022",
        5: "3/31/2022",
    },
    "A48": {0: "storage", 1: "0", 2: "29", 3: "35", 4: "33", 5: "30"},
    "A48.1": {0: "use", 1: "0", 2: "1", 3: "0", 4: "0", 5: "0"},
    "A62": {0: "direct", 1: "0", 2: "0", 3: "2", 4: "3", 5: "2"},
    "A62.1": {0: "storage", 1: "0", 2: "57", 3: "69", 4: "65", 5: "59"},
    "A62.2": {0: "use", 1: "0", 2: "1", 3: "0", 4: "0", 5: "0"},
}

Now let's get the Dataframe...

dfc = pd.DataFrame.from_dict(data)
dfc.columns=pd.MultiIndex.from_arrays([dfc.columns,dfc.iloc[0]])
dfc = dfc.iloc[2:].reset_index(drop=True)

Which looks like this:

id     A48 A48.1    A62   A62.1 A62.2
        month storage   use direct storage   use
0  12/31/2021      29     1      0      57     1
1   1/31/2022      35     0      2      69     0
2   2/28/2022      33     0      3      65     0
3   3/31/2022      30     0      2      59     0

What I am looking for is a table like this.
| month | id | direct | storage | use |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 11/30/2021 | A48 | NaN | 0 | 0 |
| 12/31/2021 | A48 | NaN | 29 | 1 |
| 1/31/2022 | A48 | NaN | 35 | 0 |
| 2/28/2022 | A48 | NaN | 33 | 0 |
| 3/31/2022 | A48 | NaN | 30 | 0 |
| 11/30/2021 | A62 | 0 | 0 | 0 |
| 12/31/2021 | A62 | 0 | 57 | 1 |
| 1/31/2022 | A62 | 2 | 69 | 0 |
| 2/28/2022 | A62 | 3 | 65 | 0 |
| 3/31/2022 | A62 | 2 | 59 | 0 |

6ojccjat

6ojccjat1#

为以后使用定义以下helper函数:

import pandas as pd

def helper(df):
    return df.pipe(
        lambda df_: df_.rename(columns={"col1": df_["col0"].unique()[0]})
        .drop(columns="col0")
        .reset_index(drop=True)
    )

然后,使用Pandasmeltconcatmerge方法:

# Setup
n = dfc.shape[0]

# Melt dataframe and cleanup
melted_dfc = dfc.melt()
melted_dfc.columns = ["id", "col0", "col1"]
melted_dfc["id"] = melted_dfc["id"].replace(r"[.]\d+", "", regex=True)

# Get intermediate dataframes
month_df = helper(melted_dfc.loc[: n - 1, :]).drop(columns="id")
sub_dfs = [
    pd.concat([month_df, helper(df)], axis=1)
    for df in [
        melted_dfc.loc[i : i + n - 1, :] for i in range(n, melted_dfc.shape[0], n)
    ]
]

# Merge intermediate dataframes
final_df = sub_dfs[0]
for sub_df in sub_dfs[1:]:
    final_df = pd.merge(
        left=final_df, right=sub_df, how="outer", on=["month", "id"]
    ).fillna(0)

# Cleanup temporary columns created during merge
columns_to_merge = set(
    col[:-2] for col in final_df.columns if col.endswith(("_x", "_y"))
)
for col in columns_to_merge:
    final_df[col] = final_df[f"{col}_x"].astype(int) + final_df[f"{col}_y"].astype(int)
    final_df = final_df.drop(columns=[f"{col}_x", f"{col}_y"])

最后道:

print(final_df)
# Output
        month   id direct  storage  use
0  12/31/2021  A48      0       29    1
1   1/31/2022  A48      0       35    0
2   2/28/2022  A48      0       33    0
3   3/31/2022  A48      0       30    0
4  12/31/2021  A62      0       57    1
5   1/31/2022  A62      2       69    0
6   2/28/2022  A62      3       65    0
7   3/31/2022  A62      2       59    0

相关问题