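I'm not sure whether what I'm asking is feasible, but here is what I'd like to do:
- Create a DataFrame whose columns are a MultiIndex with N top-level groups, each holding 2 columns (X and Y). Then, for each group, I want to create a Z column that is the sum of that group's X and Y columns.
- Every operation must be vectorized; I don't want any loops or hardcoding (which would not scale to, say, N > 1000). The result should have N * 3 columns (x, y, z for each group).
Can this be done?
Below are two code examples, one using a list comprehension/loop and one using hardcoding, which is not what I want.
List comprehension: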
import pandas as pd
import numpy as np
# Define N as the number of multi indexes
N = 3
# Create a list of tuples for the multi index levels
levels = [('A', 'B', 'C'), ('x', 'y')]
# Use itertools.product to generate all combinations of levels
from itertools import product
columns = list(product(*levels))
# Create a dataframe with random integers from 0 to 9 and 10 rows
df = pd.DataFrame(np.random.randint(0, 10, size=(10, len(columns))), columns=pd.MultiIndex.from_tuples(columns))
# Sum x and y columns for each multi index level using groupby and assign to z columns using a list comprehension
df[[(level, 'z') for level in levels[0]]] = df.groupby(level=0, axis=1).sum()
# Print the dataframe
print(df)
Hardcoded:
import pandas as pd
import numpy as np
# Define N as the number of multi indexes
N = 3
# Create a list of tuples for the multi index levels
levels = [('A', 'B', 'C'), ('x', 'y')]
# Use itertools.product to generate all combinations of levels
from itertools import product
columns = list(product(*levels))
# Create a dataframe with random integers from 0 to 9 and 10 rows
df = pd.DataFrame(np.random.randint(0, 10, size=(10, len(columns))), columns=pd.MultiIndex.from_tuples(columns))
# Sum x and y columns for each multi index level using groupby and assign to z column
df[('A', 'z')] = df.groupby(level=0, axis=1).sum()[('A')]
df[('B', 'z')] = df.groupby(level=0, axis=1).sum()[('B')]
df[('C', 'z')] = df.groupby(level=0, axis=1).sum()[('C')]
# Print the dataframe
print(df)
Output:
   A     B     C      A   B   C
   x  y  x  y  x  y   z   z   z
0  8  5  9  5  9  9  13  14  18
1  7  4  6  6  0  2  11  12   2
2  4  1  5  1  5  8   5   6  13
3  5  3  5  6  2  0   8  11   2
4  4  3  5  9  3  0   7  14   3
5  9  4  8  3  3  4  13  11   7
6  0  5  7  3  6  1   5  10   7
7  2  9  2  8  0   9  11  10   9
8  5  2  7  5  1  9   7  12  10
9  7  3  9  2  5  5  10  11  10
1 Answer
If I understand correctly, you can use pd.concat.
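A minimal sketch of one way pd.concat might be used here, not necessarily the exact code this answer had in mind. It assumes the second column level is named exactly 'x' and 'y' as in the question; the group labels G0, G1, ... and the fixed random seed are made up for illustration. It takes the 'x' and 'y' cross-sections, adds them in one vectorized step, relabels the result as 'z', and concatenates it back, so there is no per-group loop and no groupby(axis=1), which I believe recent pandas versions deprecate.

```python
import numpy as np
import pandas as pd

# Number of top-level column groups (the question's N)
N = 3

# Hypothetical group labels; any unique labels would do
groups = [f"G{i}" for i in range(N)]
columns = pd.MultiIndex.from_product([groups, ["x", "y"]])

# Random integers from 0 to 9, 10 rows; the seed is only for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(10, len(columns))), columns=columns)

# Cross-sections on the second column level: one column per group each
x = df.xs("x", axis=1, level=1)
y = df.xs("y", axis=1, level=1)

# Vectorized sum across all groups at once
z = x + y

# Relabel the sums as (group, 'z') so they fit the original column MultiIndex
z.columns = pd.MultiIndex.from_product([z.columns, ["z"]])

# Append the z columns, then sort so each group reads x, y, z
out = pd.concat([df, z], axis=1).sort_index(axis=1)
print(out)
```

The sort_index call is optional: without it the z columns stay at the right-hand side, as in the question's output. Note that it sorts the labels as strings, so with more than ten groups G10 would land before G2.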