matplotlib: How to set the broken-bar order after grouping a DataFrame

Posted by uxh89sit on 2023-06-06 in Other


The code sample below draws a broken barh plot showing people who joined and left a music band over a period of time:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

result = pd.DataFrame([['Bill', 1972, 1974],
                       ['Bill', 1976, 1978],
                       ['Bill', 1967, 1971],
                       ['Danny', 1969, 1975],
                       ['Danny', 1976, 1977],
                       ['James', 1971, 1972],
                       ['Marshall', 1967, 1975]],
                      columns=['Person', 'Year_start', 'Year_left'])

fig, ax = plt.subplots()

names = sorted(result['Person'].unique())

colormap = plt.get_cmap('plasma')
slicedColorMap = colormap(np.linspace(0, 1, len(names)))

height = 0.5
for y, (name, g) in enumerate(result.groupby('Person')):
    ax.broken_barh(list(zip(g['Year_start'],
                            g['Year_left']-g['Year_start'])),
                   (y-height/2, height),
                   facecolors=slicedColorMap[y]
                   )

ax.set_ylim(0-height, len(names)-1+height)
ax.set_xlim(result['Year_start'].min()-1, result['Year_left'].max()+1)
ax.set_yticks(range(len(names)), names)

ax.grid(True)
plt.show()

The output looks like this:

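I need to sort the bars (the persons along the y-axis) by 'Year_start' and 'Year_left', both in ascending order.
I know how to aggregate and sort the values in a DataFrame after grouping, and that I should reset the index afterwards: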

sorted_result = result.groupby('Person').agg({'Year_start': min, 'Year_left': max})
sorted_result = sorted_result.sort_values(['Year_start', 'Year_left'], ascending=[True, True]).reset_index()
print(sorted_result)

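However, I am struggling to embed this sorting into the existing "for ... in" loop that draws ax.broken_barh (partly because I don't think "sort_values" can be combined with "groupby" and "agg" in a single iteration). Is this kind of sorting possible within this script, or should I rethink the script structure entirely? Many thanks!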

matplotlib

Source: https://stackoverflow.com/questions/76406816/how-to-set-broken-bar-order-after-grouping-the-dataframe

2 Answers


#1 kcrjzv8t

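IIUC, all you need to do is pass sort=False to groupby() and sort the DataFrame the way you want beforehand. The rest of the code can stay the same.
Edit: however, since this ordering is quite specific and not easily expressed with a single sort_values() call, I suggest working out the order in a separate DataFrame, then mapping that order back onto the original DataFrame and sorting by it: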

result = pd.DataFrame([['Bill', 1972, 1974],
                       ['Bill', 1976, 1978],
                       ['Bill', 1967, 1971],
                       ['Danny', 1969, 1975],
                       ['Danny', 1976, 1977],
                       ['James', 1971, 1972],
                       ['Marshall', 1967, 1975]],
                      columns=['Person', 'Year_start', 'Year_left'])

# Rank each person by earliest start year, then by latest leave year.
order = (result.groupby('Person')
               .agg({'Year_start': 'min', 'Year_left': 'max'})
               .sort_values(['Year_start', 'Year_left'], ascending=[True, True])
               .index)
sorter = {person: rank for rank, person in enumerate(order)}

result['sorter'] = result['Person'].map(sorter)
result = result.sort_values('sorter',ascending=True)

fig, ax = plt.subplots()

colormap = plt.get_cmap('plasma')
slicedColorMap = colormap(np.linspace(0, 1, result['Person'].nunique()))

height = 0.5
names = []
for y, (name, g) in enumerate(result.groupby('Person',sort=False)): #Here I'm using sort=False to avoid groupby from sorting it differently
    print(name)
    ax.broken_barh(list(zip(g['Year_start'],
                            g['Year_left']-g['Year_start'])),
                   (y-height/2, height),
                   facecolors=slicedColorMap[y]
                   )
    names.append(name)

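The rest of the code stays the same. The output:
I also made a small improvement: instead of defining names statically up front, the list is built as the loop runs, so the names always match the bars. That is why I use result['Person'].nunique() instead of len(names).
Edit: the code was updated following a discussion with the OP. [1]: https://i.stack.imgur.com/6epWQ.png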
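
To illustrate why sort=False matters in this answer, here is a minimal sketch on toy data (the small DataFrame and the variable names default_order and kept_order are made up for illustration, they are not part of the answer's code). By default groupby() returns the group keys in sorted order; with sort=False the groups come back in order of first appearance, so pre-sorting the rows decides the bar order.

```python
import pandas as pd

# Toy data (illustrative only), already arranged in the desired,
# non-alphabetical order.
df = pd.DataFrame({'Person': ['Marshall', 'Bill', 'Bill', 'Danny'],
                   'Year_start': [1967, 1967, 1972, 1969]})

# Default behaviour: group keys are returned in sorted order.
default_order = [name for name, _ in df.groupby('Person')]

# sort=False: group keys are returned in order of first appearance.
kept_order = [name for name, _ in df.groupby('Person', sort=False)]

print(default_order)  # ['Bill', 'Danny', 'Marshall']
print(kept_order)     # ['Marshall', 'Bill', 'Danny']
```

In the plotting loop, the position of each group in this iteration order is what becomes the y coordinate of the bar.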

2023-06-06

#2 disbfnqx

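You are almost there :-) You already have the names sorted by earliest start and earliest end. You just need to convert the Person column to a categorical using the order you obtained, and then draw the barh plot with sort_values('Person') added before the grouping. The updated code is below, with comments added to make it easier to follow. Hope this is what you are looking for...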
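A note on the tick labels: passing the labels as a second argument, as in set_yticks(range(len(names)), names), is only accepted by matplotlib 3.5 and later. Splitting the call into set_yticks() and set_yticklabels() works on older and newer versions alike, so that is what I did here. I also replaced names with sorted_result.Person.to_list() so that the labels line up correctly with the bars.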

result = pd.DataFrame([['Bill', 1972, 1974],
                       ['Bill', 1976, 1978],
                       ['Bill', 1967, 1971],
                       ['Danny', 1969, 1975],
                       ['Danny', 1976, 1977],
                       ['James', 1971, 1972],
                       ['Marshall', 1967, 1975]],
                      columns=['Person', 'Year_start', 'Year_left'])

fig, ax = plt.subplots()

names = sorted(result['Person'].unique())

colormap = plt.get_cmap('plasma')
slicedColorMap = colormap(np.linspace(0, 1, len(names)))

height = 0.5

## NEW ADDED CODE ##
## This is your code.. get the sorted_result
sorted_result = result.groupby('Person').agg({'Year_start': min, 'Year_left': max})
sorted_result = sorted_result.sort_values(['Year_start', 'Year_left'], ascending=[True, True]).reset_index()

## Change Person to categorical, so that, when you sort it, it will be in the order you need
## Notice that I am using sorted_result.Person.to_list(), basically sort order as you need
result['Person'] = pd.Categorical(
    result['Person'], 
    categories=sorted_result.Person.to_list(), 
    ordered=True
)

## Here, added sort_values('Person') before grouping...
for y, (name, g) in enumerate(result.sort_values('Person').groupby('Person')):
    ax.broken_barh(list(zip(g['Year_start'],
                            g['Year_left']-g['Year_start'])),
                   (y-height/2, height),
                   facecolors=slicedColorMap[y]
                   )

ax.set_ylim(0-height, len(names)-1+height)
ax.set_xlim(result['Year_start'].min()-1, result['Year_left'].max()+1)
ax.set_yticks(range(len(sorted_result.Person.to_list())))  ##Changed name
ax.set_yticklabels(sorted_result.Person.to_list())  ## Changed name

ax.grid(True)
plt.show()
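
As a minimal sketch of the categorical idea used in this answer (the Series s and the names wanted, alphabetical and custom are illustrative assumptions, not taken from the answer's code): an ordered Categorical sorts by the position of each value in its categories list rather than alphabetically. The wanted list below is the order that the aggregation above yields for the sample data.

```python
import pandas as pd

# Desired order; in the answer this comes from sorted_result.Person.to_list().
wanted = ['Marshall', 'Bill', 'Danny', 'James']

# Illustrative input in arbitrary order.
s = pd.Series(['James', 'Danny', 'Bill', 'Marshall', 'Bill'])

# Plain strings sort alphabetically.
alphabetical = s.sort_values().tolist()

# An ordered Categorical sorts by each value's position in `categories`.
cat = pd.Series(pd.Categorical(s, categories=wanted, ordered=True))
custom = cat.sort_values().tolist()

print(alphabetical)  # ['Bill', 'Bill', 'Danny', 'James', 'Marshall']
print(custom)        # ['Marshall', 'Bill', 'Bill', 'Danny', 'James']
```

Because groupby() orders its groups by sorted key, grouping on such a column walks through the people in the category order.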

2023-06-06
