matplotlib: How to create grouped and stacked bar charts

Posted by jmo0nnb3 on 2023-05-01 in Other


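I have a very large dataset in which many subsidiaries serve three customer groups in different countries, similar to this (in reality there are many more subsidiaries and dates):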

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'subsidiary': ['EU','EU','EU','EU','EU','EU','EU','EU','EU','US','US','US','US','US','US','US','US','US'],'date': ['2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05','2019-03','2019-04', '2019-05'],'business': ['RETAIL','RETAIL','RETAIL','CORP','CORP','CORP','PUBLIC','PUBLIC','PUBLIC','RETAIL','RETAIL','RETAIL','CORP','CORP','CORP','PUBLIC','PUBLIC','PUBLIC'],'value': [500.36,600.45,700.55,750.66,950.89,1300.13,100.05,120.00,150.01,800.79,900.55,1000,3500.79,5000.36,4500.25,50.17,75.25,90.33]})
print(df)

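I would like to analyse each subsidiary by making a stacked bar chart. To do this, I first define the x-axis as the unique months and then define one subset per business type in a country, like this: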

x=df['date'].drop_duplicates() 
EUCORP = df[(df['subsidiary']=='EU') & (df['business']=='CORP')] 
EURETAIL = df[(df['subsidiary']=='EU') & (df['business']=='RETAIL')] 
EUPUBLIC = df[(df['subsidiary']=='EU') & (df['business']=='PUBLIC')]

I can then make a bar chart for each business type:

plotEUCORP = plt.bar(x=x, height=EUCORP['value'], width=.35)
plotEURETAIL = plt.bar(x=x, height=EURETAIL['value'], width=.35)
plotEUPUBLIC = plt.bar(x=x, height=EUPUBLIC['value'], width=.35)

However, when I try to stack all three in one chart, it always fails:

plotEURETAIL = plt.bar(x=x, height=EURETAIL['value'], width=.35)
plotEUCORP = plt.bar(x=x, height=EUCORP['value'], width=.35, bottom=EURETAIL)
plotEUPUBLIC = plt.bar(x=x, height=EUPUBLIC['value'], width=.35, bottom=EURETAIL+EUCORP)
plt.show()

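I always get the following error messages:
ValueError: Missing category information for StrCategoryConverter; this might be caused by unintentionally mixing categorical and numeric data
ConversionError: Failed to convert value(s) to axis units: subsidiary date business value 0 EU 2019-03 RETAIL 500.36 1 EU 2019-04 RETAIL 600.45 2 EU 2019-05 RETAIL 700.55
I tried converting the months to a date format and/or using them as the index, but that only confused me more.
I would really appreciate help with any of the following, as I have already spent a lot of time on this problem (I am still a Python beginner, sorry):
1. How can I fix the error so that the stacked bar chart is created?
2. Assuming the error can be fixed, is this the most efficient way to create the bar chart (e.g. do I really need to create three sub-DataFrames for each subsidiary, or is there a better way)?
3. Is it possible to write iterative code that produces a stacked bar chart per country, so that I do not have to build one chart by hand for each subsidiary?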

matplotlib

Source: https://stackoverflow.com/questions/69242928/how-to-create-grouped-and-stacked-bars

1 Answer


6qqygrtg1#

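As a note, stacked bars are not the best option, because they make it difficult to compare bar values and are easily misinterpreted. The purpose of a visualization is to present data in an easily understood format; make sure the message is clear. Side-by-side bars are often a better option.
Side-by-side stacked bars are a laborious manual process to construct. It is better to use a figure-level method like seaborn.catplot, which creates a single, easy-to-read data visualization.
Bar plot ticks are located on a 0-indexed range (not datetimes) and the dates are just labels, so there is no need to convert them to a datetime dtype.
*Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2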
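
To illustrate the point about tick positions, here is a minimal sketch (the three dates and values are copied from the question's EU RETAIL rows; the variable names are only illustrative). It assumes matplotlib's usual handling of string x values as categories and inspects where the bars end up:

```python
import matplotlib.pyplot as plt

dates = ['2019-03', '2019-04', '2019-05']
values = [500.36, 600.45, 700.55]

fig, ax = plt.subplots()
bars = ax.bar(x=dates, height=values, width=0.35)

# The centre of each bar is a plain number on the axis;
# the date strings are only used as tick labels.
centers = [bar.get_x() + bar.get_width() / 2 for bar in bars]
print(centers)          # expected to be approximately 0, 1, 2
print(ax.get_xticks())  # the tick locations used for the three categories
```

Because the positions are ordinary numbers, they can be shifted by a fraction of the bar width, which is what the grouped layout further down relies on.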

`seaborn`

import seaborn as sns

sns.catplot(kind='bar', data=df, col='subsidiary', x='date', y='value', hue='business')

Create grouped and stacked bars

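See Stacked Bar Chart and Grouped bar chart with labels.
* The problem with creating the stacked bars in the OP is that bottom is set to the entire DataFrame for that group, instead of only the values that make up the bar heights.
* "Do I really need to create three sub-DataFrames for each subsidiary?" Yes, each group needs its own DataFrame, six in this case.
The data subsets can be created automatically by unpacking the .groupby object into a dict with a dict comprehension.
data = {''.join(k): v for k, v in df.groupby(['subsidiary', 'business'])} creates a dict of DataFrames.
Access the values like this: data['EUCORP'].value
Automating the plot creation is more difficult: as can be seen, x depends on how many groups of bars there are at each tick, and bottom depends on the values of each subsequent plot.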
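
For the single-subsidiary chart from the question, the smallest fix is probably to pass arrays of values to bottom rather than whole DataFrames. The sketch below uses only the EU rows from the question. Converting to NumPy arrays is a choice made here, not something the question requires: adding the pandas Series directly would align on the row index (0-2 for RETAIL, 3-5 for CORP) and produce NaN.

```python
import pandas as pd
import matplotlib.pyplot as plt

# EU-only sample taken from the question's data
df = pd.DataFrame({
    'date': ['2019-03', '2019-04', '2019-05'] * 3,
    'business': ['RETAIL'] * 3 + ['CORP'] * 3 + ['PUBLIC'] * 3,
    'value': [500.36, 600.45, 700.55, 750.66, 950.89, 1300.13,
              100.05, 120.00, 150.01],
})

x = df['date'].drop_duplicates().to_numpy()

# Plain arrays of bar heights, so that addition is positional
retail = df.loc[df['business'] == 'RETAIL', 'value'].to_numpy()
corp = df.loc[df['business'] == 'CORP', 'value'].to_numpy()
public = df.loc[df['business'] == 'PUBLIC', 'value'].to_numpy()

fig, ax = plt.subplots()
ax.bar(x=x, height=retail, width=0.35, label='RETAIL')
# Each later layer starts where the layers below it end
ax.bar(x=x, height=corp, width=0.35, bottom=retail, label='CORP')
ax.bar(x=x, height=public, width=0.35, bottom=retail + corp, label='PUBLIC')
ax.legend()
```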

import numpy as np
import matplotlib.pyplot as plt

labels=df['date'].drop_duplicates()  # set the dates as labels
x0 = np.arange(len(labels))  # create an array of values for the ticks that can perform arithmetic with width (w)

# create the data groups with a dict comprehension and groupby
data = {''.join(k): v for k, v in df.groupby(['subsidiary', 'business'])}

# build the plots
subs = df.subsidiary.unique()
stacks = len(subs)  # how many stacks in each group for a tick location
business = df.business.unique()

# set the width
w = 0.35

# this needs to be adjusted based on the number of stacks; each location needs to be split into the proper number of locations
x1 = [x0 - w/stacks, x0 + w/stacks]

fig, ax = plt.subplots()
for x, sub in zip(x1, subs):
    bottom = 0
    for bus in business:
        height = data[f'{sub}{bus}'].value.to_numpy()
        ax.bar(x=x, height=height, width=w, bottom=bottom)
        bottom += height
        
ax.set_xticks(x0)
_ = ax.set_xticklabels(labels)
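
The offsets in x1 above are written for exactly two stacks. One possible generalization to any number of subsidiaries is sketched below. The helper name grouped_stacked_bars and its group_width parameter are hypothetical, and the sketch assumes each (subsidiary, business, date) combination occurs at most once; missing combinations are drawn with height 0.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def grouped_stacked_bars(df, group_width=0.8):
    """Hypothetical helper: one stack per subsidiary, side by side at each date."""
    labels = df['date'].drop_duplicates().to_numpy()
    x0 = np.arange(len(labels))
    subs = df['subsidiary'].unique()
    businesses = df['business'].unique()
    w = group_width / len(subs)  # width of a single stack

    # Bar heights per (subsidiary, business), ordered like the date labels
    heights = {
        key: grp.set_index('date')['value'].reindex(labels).fillna(0).to_numpy()
        for key, grp in df.groupby(['subsidiary', 'business'])
    }
    # One fixed colour per business type so the stacks are comparable
    cycle = plt.rcParams['axes.prop_cycle'].by_key()['color']
    colors = dict(zip(businesses, cycle))

    fig, ax = plt.subplots()
    for i, sub in enumerate(subs):
        # Centre the stacks of one tick around the tick position
        x = x0 + (i - (len(subs) - 1) / 2) * w
        bottom = np.zeros(len(labels))
        for bus in businesses:
            height = heights.get((sub, bus), np.zeros(len(labels)))
            # Only label the first subsidiary's bars to avoid duplicate legend entries
            label = bus if i == 0 else '_nolegend_'
            ax.bar(x, height, width=w, bottom=bottom, color=colors[bus], label=label)
            bottom = bottom + height
    ax.set_xticks(x0)
    ax.set_xticklabels(labels)
    ax.legend()
    return fig, ax


dates = ['2019-03', '2019-04', '2019-05']
df = pd.DataFrame({
    'subsidiary': ['EU'] * 9 + ['US'] * 9,
    'date': dates * 6,
    'business': (['RETAIL'] * 3 + ['CORP'] * 3 + ['PUBLIC'] * 3) * 2,
    'value': [500.36, 600.45, 700.55, 750.66, 950.89, 1300.13,
              100.05, 120.00, 150.01, 800.79, 900.55, 1000,
              3500.79, 5000.36, 4500.25, 50.17, 75.25, 90.33],
})
fig, ax = grouped_stacked_bars(df)
```

With two subsidiaries and group_width=0.7 this reproduces the offsets of ±0.175 used above. The stacks are not annotated with the subsidiary name, so that would still need to be added, for example with text labels.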

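As you can see, small values are difficult to discern, and using ax.set_yscale('log') does not work as expected with stacked bars (e.g. it does not make small values more readable).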

Create only stacked bars

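As mentioned by @r-beginners, use .pivot or .pivot_table to reshape the dataframe to a wide form, creating stacked bars where the x-axis is a tuple ('date', 'subsidiary').
Use .pivot if there are no repeated values for each category.
Use .pivot_table if there are repeated values that must be combined with aggfunc (e.g. 'sum', 'mean', etc.).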

# reshape the dataframe
dfp = df.pivot(index=['date', 'subsidiary'], columns=['business'], values='value')

# plot stacked bars
dfp.plot(kind='bar', stacked=True, rot=0, figsize=(10, 4))
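
The data in the question has no repeated combinations, so .pivot is enough there. As a sketch of the .pivot_table case, the made-up frame below (hypothetical values, not from the question) contains two rows for the same date, subsidiary and business, which .pivot would reject with a ValueError. Passing fill_value=0 is an extra choice made here so that missing combinations are plotted as empty segments.

```python
import pandas as pd

# Hypothetical data: ('2019-03', 'EU', 'RETAIL') appears twice
dup = pd.DataFrame({
    'date': ['2019-03', '2019-03', '2019-03', '2019-04'],
    'subsidiary': ['EU', 'EU', 'EU', 'EU'],
    'business': ['RETAIL', 'RETAIL', 'CORP', 'RETAIL'],
    'value': [100.0, 50.0, 200.0, 75.0],
})

# Aggregate the duplicates by summing them while reshaping to wide form
dfp_sum = dup.pivot_table(index=['date', 'subsidiary'], columns='business',
                          values='value', aggfunc='sum', fill_value=0)

# Plot stacked bars exactly as with the .pivot result
ax = dfp_sum.plot(kind='bar', stacked=True, rot=0, figsize=(10, 4))
```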
