pandas: wrong sum for some columns after using pd.wide_to_long?

agxfikkp, posted on 2023-05-27 in Other

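My problem: after using pd.wide_to_long, the sum of the columns starting with Medium (5 + 7) does not come out as 12, and the same happens for the columns starting with Lower and Upper. Only the first sum, sum-Md, shows up.
I have the following dataframe: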

Date  Medium-Ab  Lower-B.c   Upper-Dd  Medium-Fb  Lower-Gc Upper-H.I  year
09/2022          5          3         10          7         4        12  2022
10/2022          8          4         12          9         6        14  2022
11/2022          9          6         14         10         9        16  2022
12/2022         15         14         20          5         4        18  2022
01/2023         17         13         25         13         8        12  2023 
    ...        ...        ...        ...        ...       ...       ...  ...
12/2023         16         11         24         16        12        19  2023
01/2024         27         23         35         33        28        42  2023 
    ...        ...        ...        ...        ...       ...       ...   ...
12/2024         10         11         14         16        12        19  2023
    ...        ...        ...        ...        ...       ...       ...  ...
12/2032        ...        ...        ...        ...       ...       ...  ...

What I want is:

Date  Medium-Ab  Lower-B.c  Upper-Dd  Medium-Fb  Lower-Gc  Upper-H.I  year  sum-Md sum-Lo  sum-Up
09/2022          5          3        10          7         4         12  2022      12      7      22 
10/2022          8          4        12          9         6         14  2022     ...    ...     ...
11/2022          9          6        14         10         9         16  2022     ...    ...     ...
    ...        ...        ...       ...        ...       ...        ...   ...     ...    ...     ...
11/2022        ...        ...       ...        ...       ...        ...   ...     ...    ...     ...

My attempt is:

df['sum-Md','sum-Lo','sum-Up'] = (
   pd.wide_to_long(
       df, stubnames=["Medium", 'Lower', 'Upper'],
       i=["Date",], j="zone",
       sep="-", suffix='\w+'
   )
   .query("year>=2022", engine="python")
   .groupby("Date")
   .Medium
   .sum()
   .array
)

wpx232ag1#

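If I understand your comment ("Yes, I want to sum Medium-Ab and Medium-Fb in a new column named sum-Md, the Lower columns in a new column named sum-Lo and the Upper columns in a new column named sum-Up, and then append the three new columns to the original dataframe.") correctly, then the most direct and explicit way is: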

df['sum-Md'] = df['Medium-Ab'] + df['Medium-Fb']
df['sum-Lo'] = df['Lower-B.c'] + df['Lower-Gc']
df['sum-Up'] = df['Upper-Dd'] + df['Upper-H.I']
# Part requested in the comments: express each column as a share of its group's sum
df['Medium-Ab'] = df['Medium-Ab'] / df['sum-Md']
df['Medium-Fb'] = df['Medium-Fb'] / df['sum-Md']
df['Lower-B.c'] = df['Lower-B.c'] / df['sum-Lo']
df['Lower-Gc'] = df['Lower-Gc'] / df['sum-Lo']
df['Upper-Dd'] = df['Upper-Dd'] / df['sum-Up']
df['Upper-H.I'] = df['Upper-H.I'] / df['sum-Up']
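
As a quick sanity check, here is a minimal sketch that runs the same additions on the first four rows of the question's table, typed in by hand. The name `shares` is only illustrative; it holds the normalized values in a separate frame so the original columns stay untouched.

```python
import pandas as pd

# First four rows of the table in the question, entered manually.
df = pd.DataFrame({
    "Date": ["09/2022", "10/2022", "11/2022", "12/2022"],
    "Medium-Ab": [5, 8, 9, 15],
    "Lower-B.c": [3, 4, 6, 14],
    "Upper-Dd": [10, 12, 14, 20],
    "Medium-Fb": [7, 9, 10, 5],
    "Lower-Gc": [4, 6, 9, 4],
    "Upper-H.I": [12, 14, 16, 18],
    "year": [2022, 2022, 2022, 2022],
})

# Row-wise sums per group, exactly as in the answer.
df["sum-Md"] = df["Medium-Ab"] + df["Medium-Fb"]
df["sum-Lo"] = df["Lower-B.c"] + df["Lower-Gc"]
df["sum-Up"] = df["Upper-Dd"] + df["Upper-H.I"]

# Shares of the Medium group, kept in a separate frame (illustrative name).
shares = df[["Medium-Ab", "Medium-Fb"]].div(df["sum-Md"], axis=0)

print(df[["Date", "sum-Md", "sum-Lo", "sum-Up"]])
print(shares)
```

For the 09/2022 row this gives 12, 7 and 22, matching the desired output in the question, and the two Medium shares in each row add up to 1.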

If the dataframe actually has more columns of this kind, you could try a more programmatic approach, such as

for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
    col, cols = f'sum-{short}', [c for c in df.columns if c.startswith(start)]
    df[col] = df[cols].sum(axis=1)
    # Part requested in the comments: divide each column of the group by the group's sum
    df[cols] = df[cols].div(df[col].values, axis=0)

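or, as an alternative to the loop above:

# Group the value columns by the part of the name before '-' and sum per row.
# The frame is transposed first because groupby(axis=1) is deprecated since pandas 2.1.
df[['sum-Md','sum-Lo','sum-Up']] = (
    df.drop(columns=['Date', 'year'])
    .T.groupby(lambda c: c.split('-')[0], sort=False)
    .sum()
    .T
)
# Part requested in the comments: divide each column by its group's sum
for start, short in ('Medium', 'Md'), ('Lower', 'Lo'), ('Upper', 'Up'):
    base = f'sum-{short}'
    for col in df.filter(regex=f'^{start}').columns:
        df[col] = df[col] / df[base]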
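
For completeness, here is a hedged sketch of how the pd.wide_to_long attempt from the question could be made to work. As far as I can tell, the original goes wrong for reasons unrelated to the arithmetic: `df['sum-Md','sum-Lo','sum-Up']` creates a single column keyed by a tuple rather than three columns; only `.Medium` is summed; `groupby("Date")` sorts the dates as text (so "01/2023" comes before "09/2022") and `.array` then drops the index, so the values land on the wrong rows; and `\w+` does not match suffixes containing a dot, such as `B.c` and `H.I`. The sample frame and the names `long_df`, `sums` and `out` are assumptions made for the illustration.

```python
import pandas as pd

# First five rows of the table in the question, entered manually.
df = pd.DataFrame({
    "Date": ["09/2022", "10/2022", "11/2022", "12/2022", "01/2023"],
    "Medium-Ab": [5, 8, 9, 15, 17],
    "Lower-B.c": [3, 4, 6, 14, 13],
    "Upper-Dd": [10, 12, 14, 20, 25],
    "Medium-Fb": [7, 9, 10, 5, 13],
    "Lower-Gc": [4, 6, 9, 4, 8],
    "Upper-H.I": [12, 14, 16, 18, 12],
    "year": [2022, 2022, 2022, 2022, 2023],
})

# suffix=r".+" also matches suffixes with a dot (B.c, H.I).
long_df = pd.wide_to_long(
    df, stubnames=["Medium", "Lower", "Upper"],
    i="Date", j="zone", sep="-", suffix=r".+",
)

# Sum all three stubs per Date; NaNs from non-matching zones are skipped.
sums = (
    long_df.groupby("Date")[["Medium", "Lower", "Upper"]]
    .sum()
    .rename(columns={"Medium": "sum-Md", "Lower": "sum-Lo", "Upper": "sum-Up"})
)

# Join on the Date label instead of assigning by position,
# so the sorted group order cannot misalign the rows.
out = df.join(sums, on="Date")
print(out)
```

The sums come back as floats because the reshaped frame contains NaNs. For this particular task the direct column additions above remain simpler; the reshape is mainly useful if the long format is needed for something else as well.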
