matplotlib: create violin plots for different groups using two different y-axes

Posted by oug3syen on 2023-05-07 in Other

I currently have the following plot:

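The problem is that the short-run violins sit around -0.1 while the long-run violins sit around -0.5, so the chart is far less readable than it could be. I would therefore like to create a second y-axis tied to the short-run violins.
I want to create a violin plot with two different y-axes, while drawing several violins for several labels on the x-axis.
I am trying to create a violin plot. Specifically, for 3 different risk groups, I want to plot one violin each for the long-run and short-run elasticities (6 violins in total). Because the long-run and short-run elasticities differ in magnitude, I want to use different y-scales for them.
This is what I have so far: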

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(50)

# generate some random data
data1 = pd.DataFrame(np.random.normal(loc=0, scale=1, size=1000), columns=['Value'])
data2 = pd.DataFrame(np.random.normal(loc=5, scale=0.1, size=100), columns=['Value'])
data3 = pd.DataFrame(np.random.normal(loc=1, scale=1, size=1000), columns=['Value'])
data4 = pd.DataFrame(np.random.normal(loc=1, scale=0.1, size=100), columns=['Value'])
data5 = pd.DataFrame(np.random.normal(loc=2, scale=1, size=1000), columns=['Value'])
data6 = pd.DataFrame(np.random.normal(loc=2, scale=0.1, size=100), columns=['Value'])

# create the figure and the axes
fig, ax1 = plt.subplots()

# create the first set of violin plots on ax1
sns.violinplot(data=[data1['Value'], data3['Value'], data5['Value']], ax=ax1, palette=['tab:blue', 'tab:orange', 'tab:green'])

# set the label and the color of the left y-axis
ax1.set_ylabel('Data 1', color='tab:blue')
ax1.tick_params(axis='y', labelcolor='tab:blue')

# create the second axes sharing the x-axis with ax1
ax2 = ax1.twinx()

# create the second set of violin plots on ax2
sns.violinplot(data=[data2['Value'], data4['Value'], data6['Value']], ax=ax2, palette=['tab:red', 'tab:purple', 'tab:brown'])

# set the label and the color of the right y-axis
ax2.set_ylabel('Data 2', color='tab:red')
ax2.tick_params(axis='y', labelcolor='tab:red')

# set the x-axis tick locations and labels
ax1.set_xticks([0, 1, 2])
ax1.set_xticklabels(['No Risk', 'Double Risk', 'Expenditure Risk'])

# set the x-axis label and the title
ax1.set_xlabel('Risk Level')
ax1.set_title('Three Sets of Violin Plots with Different Y-Axes')

# adjust the position of the axes
ax2.set_position([0.13, 0.1, 0.775, 0.8])

# show the plot
plt.show()

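However, I want the two violins for each risk group to be placed next to each other rather than on top of each other. How can I fix this?
I tried the following earlier, but I don't know how to combine it with the seaborn package: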

import matplotlib.pyplot as plt
import numpy as np

# generate some random data
data1 = np.random.normal(loc=0, scale=1, size=1000)
data2 = np.random.normal(loc=0, scale=0.1, size=100)
data3 = np.random.normal(loc=1, scale=1, size=1000)
data4 = np.random.normal(loc=1, scale=0.1, size=100)

# create the figure and the axes
fig, ax1 = plt.subplots()

# create the first set of violin plots on ax1
vp1 = ax1.violinplot([data1, data3], positions=[0, 1], widths=0.5)
vp1['bodies'][0].set_facecolor('tab:blue')
vp1['bodies'][1].set_facecolor('tab:blue')

# set the label and the color of the left y-axis
ax1.set_ylabel('Data 1', color='tab:blue')
ax1.tick_params(axis='y', labelcolor='tab:blue')

# create the second axes sharing the x-axis with ax1
ax2 = ax1.twinx()

# create the second set of violin plots on ax2
vp2 = ax2.violinplot([data2, data4], positions=[0.5, 1.5], widths=0.5)
vp2['bodies'][0].set_facecolor('tab:red')
vp2['bodies'][1].set_facecolor('tab:red')

# set the label and the color of the right y-axis
ax2.set_ylabel('Data 2', color='tab:red')
ax2.tick_params(axis='y', labelcolor='tab:red')

# set the x-axis tick locations and labels
ax1.set_xticks([0.25, 1.25])
ax1.set_xticklabels(['No Risk', 'Double Risk'])

# set the x-axis label and the title
ax1.set_xlabel('Risk Level')
ax1.set_title('Two Sets of Violin Plots with Different Y-Axes')

# adjust the position of the axes
ax2.set_position([0.13, 0.1, 0.775, 0.8])

# show the plot
plt.show()

gzszwxb4 #1

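Here is an example of how to split the data set across two vertical ranges (twinning the x-axis with twinx) and how to customize violin plots. The code snippet at the end of your question already creates the two vertical ranges, so the purpose of this answer is to give some insight into customizing the violin plots alongside those two ranges.
This can easily be done without the seaborn package, using only matplotlib (see "customizing violin plots"). To illustrate, here is a small function that shows some customization; the matplotlib documentation shows how to extend it further.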

def custom_violin(ax, data, pos, fc='b', ec='k', alpha=0.7, percentiles=[25, 50, 75], side="both", scatter_kwargs={}, violin_kwargs={}):
    """Customized violin plot.
    ax: axes.Axes, The axes to plot to
    data: 1D array like, The data to plot
    pos: float, The position on the x-axis where the violin should be plotted
    fc: color, The facecolor of the violin
    ec: color, The edgecolor of the violin
    alpha: float, The transparency of the violin
    percentiles: array like, The percentiles to be marked on the violin
    side: string, Which side(s) of the violin to draw. Options: 'left', 'right', 'both'
    scatter_kwargs: dict, Keyword arguments for the scatterplot
    violin_kwargs: dict, Keyword arguments for the violinplot"""

    parts = ax.violinplot(data, positions=[pos], **violin_kwargs)
    for pc in parts['bodies']:
        m = np.mean(pc.get_paths()[0].vertices[:, 0])
        if side == "left":
            points_x = pos - 0.05
            pc.get_paths()[0].vertices[:, 0] = np.clip(pc.get_paths()[0].vertices[:, 0], -np.inf, m)
        elif side == "right":
            points_x = pos + 0.05
            pc.get_paths()[0].vertices[:, 0] = np.clip(pc.get_paths()[0].vertices[:, 0], m, np.inf)
        else:
            points_x = pos
        pc.set_facecolor(fc)
        pc.set_edgecolor(ec)
        pc.set_alpha(alpha)

    perc = np.percentile(data, percentiles)
    for p in perc:
        ax.scatter(points_x, p, color=ec, zorder=3, **scatter_kwargs)

Full example:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

# generate some random data
data1 = np.random.normal(loc=0, scale=1, size=1000)
data2 = np.random.normal(loc=0, scale=0.1, size=100)
data3 = np.random.normal(loc=1, scale=1, size=1000)
data4 = np.random.normal(loc=1, scale=0.1, size=100)

s_kwargs = {"s": 40, "marker": "_"}
v_kwargs = {"showextrema": False, "showmedians": False, "showmeans": False, "widths": 0.5}

# create the figure and the axes (left and right)
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# create the first set of violin plots for the no risk data
custom_violin(ax1, data1, 0, 'tab:blue', 'tab:blue', 0.6, scatter_kwargs=s_kwargs, violin_kwargs=v_kwargs)
custom_violin(ax2, data2, 0.5, 'tab:red', 'tab:red', 0.6, scatter_kwargs=s_kwargs, violin_kwargs=v_kwargs)
ax1.set_ylabel('Data 1', color='tab:blue')
ax1.tick_params(axis='y', labelcolor='tab:blue')

# create the second set of violin plots for the double risk data
custom_violin(ax1, data3, 1, 'tab:blue', 'tab:blue', 0.6,  scatter_kwargs=s_kwargs, violin_kwargs=v_kwargs)
custom_violin(ax2, data4, 1.5, 'tab:red', 'tab:red', 0.6, scatter_kwargs=s_kwargs, violin_kwargs=v_kwargs)
ax2.set_ylabel('Data 2', color='tab:red')
ax2.tick_params(axis='y', labelcolor='tab:red')

# set the x-axis tick locations and labels
ax1.set_xticks([0.25, 1.25])
ax1.set_xticklabels(['No Risk', 'Double Risk'])
ax1.set_xlabel('Risk Level')
ax1.set_title('Two Sets of Violin Plots with Different Y-Axes')

# adjust the position of the axes
ax2.set_position([0.13, 0.1, 0.775, 0.8])

# show the plot
plt.show()

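The function also lets you plot the data with asymmetric violins (see "half violin plot in matplotlib") by specifying the "side" keyword. To apply this to the example above, specify left and right and keep the position the same for both violins.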

# create the first set of violin plots for the no risk data
custom_violin(ax1, data1, 0, 'tab:blue', 'tab:blue', 0.6, side="left", scatter_kwargs=s_kwargs, violin_kwargs=v_kwargs)
custom_violin(ax2, data2, 0, 'tab:red', 'tab:red', 0.6, side="right", scatter_kwargs=s_kwargs, violin_kwargs=v_kwargs)
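
The positioning used in the full example (left-axis violins at 0 and 1, right-axis violins at 0.5 and 1.5, ticks at 0.25 and 1.25) should carry over to the three risk groups in the question. Below is a minimal sketch of that idea with plain matplotlib. The helper name `paired_positions`, its `gap` parameter, and the synthetic samples are assumptions made for illustration; they are not part of the original answer.

```python
import matplotlib.pyplot as plt
import numpy as np


def paired_positions(n_groups, gap=0.5):
    """Hypothetical helper: x positions for two violins per group.

    Left-axis violins go at 0, 1, 2, ...; right-axis violins are shifted
    by `gap`; each tick sits midway between the two violins of a group.
    """
    left = np.arange(n_groups, dtype=float)
    right = left + gap
    ticks = (left + right) / 2
    return left, right, ticks


rng = np.random.default_rng(50)
labels = ['No Risk', 'Double Risk', 'Expenditure Risk']
# synthetic stand-ins for the wide (left axis) and narrow (right axis) samples
wide = [rng.normal(loc=m, scale=1, size=1000) for m in (0, 1, 2)]
narrow = [rng.normal(loc=m, scale=0.1, size=100) for m in (0, 1, 2)]

left, right, ticks = paired_positions(len(labels))

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

vp1 = ax1.violinplot(wide, positions=left, widths=0.4)
vp2 = ax2.violinplot(narrow, positions=right, widths=0.4)
for body in vp1['bodies']:
    body.set_facecolor('tab:blue')
for body in vp2['bodies']:
    body.set_facecolor('tab:red')

ax1.set_xticks(ticks)
ax1.set_xticklabels(labels)
ax1.set_ylabel('Data 1', color='tab:blue')
ax2.set_ylabel('Data 2', color='tab:red')
# call plt.show() here to display the figure
```

Each axis autoscales to its own data, so the narrow violins are no longer flattened by the range of the wide ones.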

3b6akqbq #2

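  • One option is to use pd.concat to combine all the dataframes into a single dataframe with common columns, using .assign to add the appropriate identifying columns.
  • This is because seaborn works best with long-form dataframes.
  • sns.catplot with kind='violin' is the figure-level version of the axes-level sns.violinplot.
  • The point of this kind of visualization is to compare distributions on a similar scale. The desired plot may confuse that comparison, because the two sets are drawn against each other on different scales, which is not good practice.
  • A violin plot depicts summary statistics and the density of each variable.
  • Violin plots use kernel density estimation (KDE) to compute an empirical distribution of the sample, rather than showing counts of data points that fall into bins or order statistics.
  • seaborn is a high-level API for matplotlib. The API specifically makes it easier to compare groups of data. That said, there is only so much flexibility. For truly custom plots, use matplotlib directly.
  • matplotlib.axes.Axes.violinplot: the explicit API
  • matplotlib.pyplot.violinplot: the implicit API
  • Violin plot basics
  • Tested in python 3.11.2, pandas 2.0.0, matplotlib 3.7.1, seaborn 0.12.2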
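
To make the KDE point in the list above concrete, here is a rough NumPy-only sketch of a Gaussian kernel density estimate, the kind of curve a violin outline is built from. The function name `gaussian_kde_1d` and the bandwidth choice (Scott's rule) are assumptions of this sketch; it is not the code that matplotlib or seaborn actually run.

```python
import numpy as np


def gaussian_kde_1d(sample, grid):
    """Evaluate a Gaussian KDE of `sample` at the points in `grid`.

    Bandwidth uses Scott's rule, h = std * n**(-1/5). This is an
    assumption of the sketch; other bandwidth rules are possible.
    """
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = sample.size
    h = sample.std(ddof=1) * n ** (-1 / 5)
    # one Gaussian bump per data point, averaged over the sample
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


rng = np.random.default_rng(50)
sample = rng.normal(loc=0, scale=1, size=1000)
grid = np.linspace(sample.min() - 1, sample.max() + 1, 200)
density = gaussian_kde_1d(sample, grid)

# a violin is this curve mirrored around its x position and scaled so
# that the widest point matches the requested half width
half_width = 0.25 * density / density.max()

# Riemann-sum check that the estimate integrates to roughly 1
area = float(np.sum(density) * (grid[1] - grid[0]))
```

Because every violin is normalized to its own maximum width, the width of one violin says nothing about its sample size relative to another, which is one more reason to be careful when comparing violins drawn on different y-scales.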

Imports and data

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# ordered list of the risks
risks = ['None', 'Double', 'Expenditure']

# combine dataframes from OP into one dataframe
df = pd.concat([data1.assign(Risk_Level='None').assign(Data=1), data3.assign(Risk_Level='Double').assign(Data=1), data5.assign(Risk_Level='Expenditure').assign(Data=1),
                data2.assign(Risk_Level='None').assign(Data=2), data4.assign(Risk_Level='Double').assign(Data=2), data6.assign(Risk_Level='Expenditure').assign(Data=2)], ignore_index=True)
df.columns = df.columns.str.replace('_', ' ')
df['Risk Level'] = pd.Categorical(df['Risk Level'], risks, ordered=True)

# combine dataframes from OP into two dataframes
df1 = pd.concat([data1.assign(Risk_Level='None').assign(Data=1), data3.assign(Risk_Level='Double').assign(Data=1), data5.assign(Risk_Level='Expenditure').assign(Data=1)], ignore_index=True)
df2 = pd.concat([data2.assign(Risk_Level='None').assign(Data=2), data4.assign(Risk_Level='Double').assign(Data=2), data6.assign(Risk_Level='Expenditure').assign(Data=2)], ignore_index=True)
df1.columns = df1.columns.str.replace('_', ' ')
df2.columns = df2.columns.str.replace('_', ' ')
df1['Risk Level'] = pd.Categorical(df1['Risk Level'], risks, ordered=True)
df2['Risk Level'] = pd.Categorical(df2['Risk Level'], risks, ordered=True)

Option 1: seaborn and pandas

  • Using the single combined dataframe df and plotting against a single y-axis makes it easy to compare the two sets of data on the same scale.
# the figure level plot 
g = sns.catplot(data=df, x='Risk Level', y='Value', hue='Data', kind='violin', height=3.5, aspect=2.5)

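  • Adapting "this solution" to the data from the OP results in "this plot", which compresses the violins. However, calling the plot function once for each of the combined dataframes, df1 and df2, scales the violins much as if they had been drawn in separate plots.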
fig, ax0 = plt.subplots()
ax1 = ax0.twinx()

hue_order = df.Data.unique()

sns.violinplot(df1, x="Risk Level", y='Value', hue="Data", hue_order=hue_order, ax=ax0)
sns.violinplot(df2, x="Risk Level", y='Value', hue="Data", hue_order=hue_order, ax=ax1)

  • Show the two groups in separate subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5), tight_layout=True, sharey=True)

sns.violinplot(data=df.query('Data == 1'), x='Risk Level', y='Value', ax=ax1)
sns.violinplot(data=df.query('Data == 2'), x='Risk Level', y='Value', ax=ax2)

ax1.set(title='Data Set 1')
ax2.set(title='Data Set 2', ylabel='')
ax2.tick_params(left=False)

With sharey=False, each subplot gets its own y-axis scale.

  • If comparing densities is not important, use a box plot.
g = sns.catplot(data=df, x='Risk Level', y='Value', hue='Data', kind='box', height=3.5, aspect=2.5)

Option 2: matplotlib and pandas

  • Use .groupby to iterate over each group and plot a violin at a specific tick position on the x-axis.

Using the single dataframe df

fig, ax = plt.subplots(figsize=(10, 6))
for i, ((rl, d), dfg) in enumerate(df.groupby(['Risk Level', 'Data'])):
    ax.violinplot(dfg.Value, positions=[i])

# add figure customizations
_ = ax.set_xticks([0.5, 2.5, 4.5], risks)
plt.show()

Using the dataframes df1 and df2, one per time period, with twinx

# create the figure
fig, ax = plt.subplots(figsize=(10, 6))

# iterate through each group and plot the data on the even xticks
for i, ((rl, d), dfg) in zip(range(0, 7, 2), df1.groupby(['Risk Level', 'Data'])):
    vp1 = ax.violinplot(dfg.Value, positions=[i], showextrema=False)
    vp1['bodies'][0].set_facecolor('tab:blue')
    vp1['bodies'][0].set_edgecolor('k')

# add the secondary axes
ax2 = ax.twinx()

# iterate through each group and plot the data on the odd xticks
for i, ((rl, d), dfg) in zip(range(1, 7, 2), df2.groupby(['Risk Level', 'Data'])):
    vp2 = ax2.violinplot(dfg.Value, positions=[i], showextrema=False)
    vp2['bodies'][0].set_facecolor('tab:red')
    vp2['bodies'][0].set_edgecolor('k')

# add figure customizations
_ = ax.set_xticks([0.5, 2.5, 4.5], risks)

ax.set_xlabel('Risk Level', labelpad=10)
ax.set_title('Two Sets of Violin Plots with Different Y-Axes')

ax.set_ylabel('Data 1', color='tab:blue')
ax.tick_params(axis='y', labelcolor='tab:blue')

ax2.set_ylabel('Data 2', color='tab:red')
ax2.tick_params(axis='y', labelcolor='tab:red')

plt.show()
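
The twinx figure above identifies the two sets only through the colored axis labels. If a legend is wanted as well, one possible approach is to build it from proxy patches, since Axes.violinplot in the tested matplotlib 3.7.1 does not, as far as I know, take a label argument. The sketch below is an assumption-laden illustration with synthetic data, not part of the original answer.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

rng = np.random.default_rng(50)

fig, ax = plt.subplots()
ax2 = ax.twinx()  # secondary y-axis for the narrower data

# one violin per axis, using synthetic stand-in samples
vp1 = ax.violinplot(rng.normal(0, 1, 1000), positions=[0], showextrema=False)
vp2 = ax2.violinplot(rng.normal(0, 0.1, 100), positions=[1], showextrema=False)
vp1['bodies'][0].set_facecolor('tab:blue')
vp2['bodies'][0].set_facecolor('tab:red')

# proxy artists that only carry the color and label for the legend
handles = [
    Patch(facecolor='tab:blue', edgecolor='k', label='Data 1'),
    Patch(facecolor='tab:red', edgecolor='k', label='Data 2'),
]
# attach the legend to the twin axes, which is drawn on top
legend = ax2.legend(handles=handles, loc='upper left')
labels = [text.get_text() for text in legend.get_texts()]
# call plt.show() here to display the figure
```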

View of df

         Value   Risk Level  Data
0    -1.560352         None     1
1    -0.030978         None     1
2    -0.620928         None     1
3    -1.464580         None     1
4     1.411946         None     1
...
3295  2.013516  Expenditure     2
3296  2.085659  Expenditure     2
3297  1.998047  Expenditure     2
3298  2.055241  Expenditure     2
3299  2.080164  Expenditure     2

ahy6op9u #3

The other answers are rather complicated. I think a simpler way to achieve this with seaborn.violinplot is to call violinplot twice, using hue and hue_order to get the dodged effect:

import seaborn as sns, matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
f, ax0 = plt.subplots()
ax1 = ax0.twinx()
var_order = ["total_bill", "tip"]
for ax, var_name in zip([ax0, ax1], var_order):
    sns.violinplot(
        tips.assign(var=var_name),
        x="day", y=var_name, hue="var",
        hue_order=var_order, ax=ax
    )
