pandas: How to reshape a multi-index in a pandas DataFrame like an Excel pivot table

Posted by qgelzfjb on 2022-12-09 in Other


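I have a DataFrame with a 2- or 3-level multi-index, and I would like to reshape it like a typical Excel pivot table so that I can show 'inner' subtotals (see image).
I tried df.pivot_table() and a multi-index built with .groupby(), but couldn't get anywhere.
All I have is the DataFrame.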

Here is the code:

import pandas as pd

df = pd.DataFrame({'Products': ['Products A', 'Products A',
                                'Products A', 'Products B', 'Products B',
                                'Products A', 'Products B', 'Products A'],

                   'Sub Products': ['Phone A', 'Phone B',
                                    'Laptop B', 'Phone B', 'Laptop B',
                                    'Phone A', 'Phone B', 'Laptop A'],

                   'Color': ['Green', 'Blue', 'Red',
                             'Red', 'Red', 'Blue', 'Green', 'Blue']})

# .count() returns no columns here because every column is a grouping key;
# .size() returns the number of rows in each group instead
df.groupby(['Products', 'Sub Products', 'Color']).size()
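For comparison, here is a minimal sketch of what pivot_table() offers out of the box with margins=True, using the sample data above. The Count helper column (one unit per row) is an assumption added for illustration. As far as I can tell, margins=True only appends grand totals (an "All" row and an "All" column), not the per-product subtotals an Excel pivot table shows, which may be why pivot_table() alone didn't get there.

```python
import pandas as pd

# Sample data from the question, with the broken 'Laptop B' literal joined
df = pd.DataFrame({
    "Products": ["Products A", "Products A", "Products A", "Products B",
                 "Products B", "Products A", "Products B", "Products A"],
    "Sub Products": ["Phone A", "Phone B", "Laptop B", "Phone B",
                     "Laptop B", "Phone A", "Phone B", "Laptop A"],
    "Color": ["Green", "Blue", "Red", "Red", "Red", "Blue", "Green", "Blue"],
})
df["Count"] = 1  # hypothetical helper column: each row counts as one unit

# margins=True adds an "All" row and an "All" column holding grand totals only;
# there are no intermediate subtotal rows per product
table = df.pivot_table(
    index=["Products", "Sub Products"],
    columns="Color",
    values="Count",
    aggfunc="sum",
    fill_value=0,
    margins=True,
)
print(table)
```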

Any ideas would be super helpful! Thanks.

pandas

Source: https://stackoverflow.com/questions/74733421/how-to-reshape-multi-index-in-a-pandas-dataframe-like-an-excel-pivot-table

1 answer


xwbd5t1u (answer #1)

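In pandas you usually don't keep this aggregate information as part of the same grouped DataFrame; instead, you compute it afterwards with a separate command, for example: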
grand_total = df.sum()
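If you do want the subtotals inside the same table, the way an Excel pivot table shows them, one alternative (not the approach this answer takes) is to compute each subtotal level separately and concatenate it back as extra rows labelled "Total". A minimal sketch, using the edited sample data from this answer; the "Total" label and the names counts, sub, top and pivot_like are my own choices. It relies on sort_index() placing each subtotal after its group, which only holds because "Total" sorts after every color and sub-product name in this data.

```python
import pandas as pd

# Edited sample data from this answer (each row is one unit)
df = pd.DataFrame({
    "Products": ["Products A", "Products A", "Products A", "Products B",
                 "Products B", "Products A", "Products B", "Products A"],
    "Sub Products": ["Phone A", "Phone A", "Laptop A", "Phone B",
                     "Laptop  B", "Phone A", "Phone B", "Laptop A"],
    "Color": ["Green", "Blue", "Red", "Red", "Red", "Blue", "Green", "Blue"],
})
counts = df.groupby(["Products", "Sub Products", "Color"]).size().rename("Count")
names = counts.index.names

# Subtotal per (product, sub-product): the Color level becomes "Total"
sub = counts.groupby(level=[0, 1]).sum()
sub.index = pd.MultiIndex.from_tuples(
    [(p, s, "Total") for p, s in sub.index], names=names
)

# Total per product: Sub Products becomes "Total", Color is left blank
top = counts.groupby(level=0).sum()
top.index = pd.MultiIndex.from_tuples(
    [(p, "Total", "") for p in top.index], names=names
)

# Assumption: "Total" sorts after every real label, so each subtotal
# lands directly below the rows it summarizes once the index is sorted
pivot_like = pd.concat([counts, sub, top]).sort_index()
print(pivot_like.to_frame())
```

The result keeps a real three-level MultiIndex, so the subtotal rows can still be selected with .loc; the trade-off is that the ordering depends on the labels, so with other data you may need an explicit sort key.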
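Note that the data you provided in the question does not quite produce your image: the numbers differ, and some of the A/B labels are inconsistent. Below I have edited your code so that it reproduces something matching your image, assuming each row of your sample data is one "unit".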

import pandas as pd

df = pd.DataFrame(
    {
        "Products": [
            "Products A",
            "Products A",
            "Products A",
            "Products B",
            "Products B",
            "Products A",
            "Products B",
            "Products A",
        ],
        "Sub Products": [
            "Phone A",
            "Phone A",
            "Laptop A",
            "Phone B",
            "Laptop  B",
            "Phone A",
            "Phone B",
            "Laptop A",
        ],
        "Color": ["Green", "Blue", "Red", "Red", "Red", "Blue", "Green", "Blue"],
    }
)
df['Count'] = 1
df = df.groupby(['Products','Sub Products','Color' ]).sum()

# To view the totals at any particular level of the multi-index
# (display() is built into Jupyter/IPython; in a plain script use print())
display(df.groupby(level=0)['Count'].sum())
display(df.groupby(level=1)['Count'].sum())
display(df.groupby(level=2)['Count'].sum())
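Passing a list of levels to groupby(level=...) gives the intermediate subtotals (product plus sub-product) in the same way. A minimal sketch with the same edited data; here .size() stands in for the Count = 1 column plus .sum(), and the names counts, product_totals and sub_totals are chosen for illustration.

```python
import pandas as pd

# Same edited sample data as above (each row is one unit)
df = pd.DataFrame({
    "Products": ["Products A", "Products A", "Products A", "Products B",
                 "Products B", "Products A", "Products B", "Products A"],
    "Sub Products": ["Phone A", "Phone A", "Laptop A", "Phone B",
                     "Laptop  B", "Phone A", "Phone B", "Laptop A"],
    "Color": ["Green", "Blue", "Red", "Red", "Red", "Blue", "Green", "Blue"],
})
# size() counts the rows in each group: same result as Count = 1 then sum()
counts = df.groupby(["Products", "Sub Products", "Color"]).size()

# Totals per product (level 0 only)
product_totals = counts.groupby(level=0).sum()
# Subtotals per (product, sub-product) pair (levels 0 and 1 together)
sub_totals = counts.groupby(level=[0, 1]).sum()
# Grand total over everything
grand_total = counts.sum()

print(product_totals, sub_totals, grand_total, sep="\n\n")
```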

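This gives you the information you need. However, judging from the annotations in your image, it seems you just want a particular display format. The code below achieves that, but it loses the actual hierarchical structure of the multi-index DataFrame: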

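# Flatten the multi-index into a single "Product" column, writing each group's
# total on its own row directly above the rows nested inside it
out = pd.DataFrame(columns=['Product', 'Count'])
for n1, d1 in df.groupby(level=0):          # level 0: Products
    out = pd.concat([out, pd.DataFrame({"Product": n1, "Count": d1.sum().values})])
    d1 = d1.droplevel(0)
    for n2, d2 in d1.groupby(level=0):      # former level 1: Sub Products
        out = pd.concat([out, pd.DataFrame({"Product": n2, "Count": d2.sum().values})])
        d2 = d2.droplevel(0)
        for n3, d3 in d2.groupby(level=0):  # former level 2: Color
            out = pd.concat([out, pd.DataFrame({"Product": n3, "Count": d3.sum().values})])
display(out)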

Output:

   Product     Count
0  Products A  5
0  Laptop A    2
0  Blue        1
0  Red         1
0  Phone A     3
0  Blue        2
0  Green       1
0  Products B  3
0  Laptop  B   1
0  Red         1
0  Phone B     2
0  Green       1
0  Red         1

Better yet, here is a recursive version of the above:

# Recursive version for an arbitrarily deep multi-index
def traverse_multiindex(d, out):
    for n1, d1 in d.groupby(level=0):
        # Append this group's label and total before descending into it
        out = pd.concat([out, pd.DataFrame({"Product": n1, "Count": d1.sum().values})])
        if d1.index.nlevels > 1:
            # Drop the level just handled and recurse into the next one
            d2 = d1.droplevel(0)
            out = traverse_multiindex(d2, out)
    return out

# Start from an empty frame with the output columns
out = pd.DataFrame(columns=['Product', 'Count'])
out = traverse_multiindex(df, out)
display(out)
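A possible variant of the recursive function collects the rows in a plain list and builds the DataFrame once at the end, instead of calling pd.concat on every group (each concat copies the whole accumulated frame). It also records each row's depth and indents the label by it, so the hierarchy stays visible in the flat output, roughly like the indentation in an Excel pivot table. A minimal sketch; traverse_rows and the Level column are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd

# Edited sample data from this answer, grouped the same way as above
df = pd.DataFrame({
    "Products": ["Products A", "Products A", "Products A", "Products B",
                 "Products B", "Products A", "Products B", "Products A"],
    "Sub Products": ["Phone A", "Phone A", "Laptop A", "Phone B",
                     "Laptop  B", "Phone A", "Phone B", "Laptop A"],
    "Color": ["Green", "Blue", "Red", "Red", "Red", "Blue", "Green", "Blue"],
})
df["Count"] = 1
df = df.groupby(["Products", "Sub Products", "Color"]).sum()


def traverse_rows(d, depth=0, rows=None):
    """Walk the index level by level, emitting one (label, total) row per group."""
    if rows is None:
        rows = []
    for name, group in d.groupby(level=0):
        # Indent the label by depth so nesting is visible in one column
        rows.append({"Product": "  " * depth + str(name),
                     "Level": depth,
                     "Count": int(group["Count"].sum())})
        if group.index.nlevels > 1:
            # Drop the level just handled and recurse into the next one
            traverse_rows(group.droplevel(0), depth + 1, rows)
    return rows


out = pd.DataFrame(traverse_rows(df))
print(out.to_string(index=False))
```

Building the frame once also gives it a clean 0..n-1 index instead of the repeated 0 seen in the output above.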

Answered 2022-12-09
