pandas: vectorized MultiIndex operations without any loops or hard-coding

bfhwhh0e  posted on 2023-02-28  in Other

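I'm not sure whether what I'm asking for is possible, but this is what I want to do:

  • Create a DataFrame whose columns are a MultiIndex with N top-level groups, each with two columns (X and Y). Then, for each group, add a Z column holding the sum of that group's X and Y columns.
  • All operations must be vectorized; I don't want any loops or hard-coding (which, for example, would not scale when N > 1000). The resulting number of columns should be N * 3 (x, y, z for each group).

Can this be done?
Below are two code examples, one using a list comprehension/loop and one using hard-coding; neither is what I want.
List comprehension: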

import pandas as pd
import numpy as np

# Define N as the number of multi indexes
N = 3

# Create a list of tuples for the multi index levels
levels = [('A', 'B', 'C'), ('x', 'y')]

# Use itertools.product to generate all combinations of levels
from itertools import product
columns = list(product(*levels))

# Create a dataframe with random integers from 0 to 9 and 10 rows
df = pd.DataFrame(np.random.randint(0, 10, size=(10, len(columns))), columns=pd.MultiIndex.from_tuples(columns))

# Sum x and y columns within each top-level group using groupby, and assign to the z columns via a list comprehension
df[[(level, 'z') for level in levels[0]]] = df.groupby(level=0, axis=1).sum()

# Print the dataframe
print(df)

Hard-coded:

import pandas as pd
import numpy as np

# Define N as the number of multi indexes
N = 3

# Create a list of tuples for the multi index levels
levels = [('A', 'B', 'C'), ('x', 'y')]

# Use itertools.product to generate all combinations of levels
from itertools import product
columns = list(product(*levels))

# Create a dataframe with random integers from 0 to 9 and 10 rows
df = pd.DataFrame(np.random.randint(0, 10, size=(10, len(columns))), columns=pd.MultiIndex.from_tuples(columns))

# Sum x and y columns for each multi index level using groupby and assign to z column
df[('A', 'z')] = df.groupby(level=0, axis=1).sum()[('A')]
df[('B', 'z')] = df.groupby(level=0, axis=1).sum()[('B')]
df[('C', 'z')] = df.groupby(level=0, axis=1).sum()[('C')]

# Print the dataframe
print(df)

Output:

   A     B     C      A   B   C
   x  y  x  y  x  y   z   z   z
0  8  5  9  5  9  9  13  14  18
1  7  4  6  6  0  2  11  12   2
2  4  1  5  1  5  8   5   6  13
3  5  3  5  6  2  0   8  11   2
4  4  3  5  9  3  0   7  14   3
5  9  4  8  3  3  4  13  11   7
6  0  5  7  3  6  1   5  10   7
7  2  9  2  8  0  9  11  10   9
8  5  2  7  5  1  9   7  12  10
9  7  3  9  2  5  5  10  11  10
tjvv9vkg (answer #1)

If I understand correctly, you can use pd.concat:

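# Sum x and y within each top-level group (column level 0); the result has columns A, B, C
# Note: newer pandas versions (2.1+) deprecate axis=1 in groupby
df2 = df.groupby(level=0, axis=1).sum()
# Relabel the sums as (group, 'z') so they fit the two-level column index
df2.columns = pd.MultiIndex.from_product([df2.columns, ['z']])

# Append all z columns to the original frame in one step
df = pd.concat([df, df2], axis=1)
print(df)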

Output:

   A     B     C      A   B   C
   x  y  x  y  x  y   z   z   z
0  1  9  6  8  5  3  10  14   8
1  1  3  5  7  7  9   4  12  16
2  2  9  8  1  8  7  11   9  15
3  0  0  0  3  6  5   0   3  11
4  9  7  7  4  7  7  16  11  14
5  6  7  8  1  2  0  13   9   2
6  1  3  5  3  4  4   4   8   8
7  7  9  0  3  3  0  16   3   3
8  6  0  4  7  7  9   6  11  16
9  4  4  0  8  4  8   8   8  12
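
For newer pandas versions, where `groupby(..., axis=1)` is deprecated (as of pandas 2.1, to my knowledge), the same idea can be sketched without a column-wise groupby. This is only a minimal sketch under a few assumptions: the helper name `add_z_columns`, the group labels `g0`, `g1`, ... and the value of `N` are made up for illustration, and the second column level is assumed to contain exactly `x` and `y`. It selects the `x` and `y` sub-frames with `DataFrame.xs`, adds them in one vectorized operation, and concatenates the result back.

```python
import numpy as np
import pandas as pd


def add_z_columns(df, new_label="z"):
    """Return a copy of df with a (group, new_label) column = x + y for every group."""
    # xs drops the selected level, so both frames have the group labels as columns
    x = df.xs("x", axis=1, level=1)
    y = df.xs("y", axis=1, level=1)
    # Column-aligned, vectorized addition across all groups at once
    z = x + y
    # Relabel the sums as (group, new_label) to match the two-level column index
    z.columns = pd.MultiIndex.from_product([z.columns, [new_label]])
    # Append the new columns, then sort so each group reads x, y, z
    return pd.concat([df, z], axis=1).sort_index(axis=1)


# Hypothetical example data: N groups, each with an x and a y column
N = 4
groups = [f"g{i}" for i in range(N)]
columns = pd.MultiIndex.from_product([groups, ["x", "y"]])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(10, len(columns))), columns=columns)

out = add_z_columns(df)
print(out)
```

The final `sort_index(axis=1)` is optional; it only reorders the columns so that each group's `z` sits next to its `x` and `y` instead of being appended at the end as in the output above.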
