matplotlib: avoiding overlapping data points in a scatter/dot/beeswarm plot

Posted by vlf7wbxs on 2023-11-22 in Other


When drawing a dot plot with matplotlib, I would like to offset overlapping data points so that they all stay visible. For example, if I have:

CategoryA: 0,0,3,0,5  
CategoryB: 5,10,5,5,10

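I want each of the CategoryA "0" data points to be set side by side rather than on top of each other, while still remaining distinct from CategoryB.
In R (ggplot2) there is a "jitter" option that does this. Is there a similar option in matplotlib, or is there another approach that would lead to a similar result?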

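Edit: To clarify, the "beeswarm" plot in R is essentially what I have in mind, and pybeeswarm is an early but useful start at a matplotlib/Python version.
Edit: To add that Seaborn's swarmplot, introduced in version 0.7, is an excellent implementation of what I wanted.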

matplotlib

Source: https://stackoverflow.com/questions/8671808/avoiding-overlapping-datapoints-in-a-scatter-dot-beeswarm-plot

7 Answers


pxy2qtax1#

Extending the answer by @user2467675, here is how I did it:

import numpy as np
import matplotlib.pyplot as plt

def rand_jitter(arr):
    # Gaussian noise with a standard deviation of 1% of the data range
    stdev = .01 * (max(arr) - min(arr))
    return arr + np.random.randn(len(arr)) * stdev

def jitter(x, y, s=20, c='b', marker='o', cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, **kwargs):
    # Same arguments as plt.scatter, with jitter applied to both x and y
    return plt.scatter(rand_jitter(x), rand_jitter(y), s=s, c=c, marker=marker, cmap=cmap, norm=norm, vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths, **kwargs)

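The stdev variable makes sure that the jitter is large enough to be seen on different scales, but it assumes that the limits of the axes are zero and the maximum value.
You can then call jitter instead of scatter.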
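
One caveat that may be worth illustrating: because rand_jitter scales the noise by the data range, an array whose values are all identical (for example, every point of one category placed at the same x position) has a range of zero and receives no jitter at all. The sketch below shows this with the CategoryA values from the question and contrasts it with a fixed-width alternative. The names range_jitter, fixed_jitter, and half_width are illustrative assumptions, not part of the original answer or of matplotlib.

```python
import numpy as np

def range_jitter(arr, fraction=0.01, rng=None):
    """Gaussian jitter scaled to the data range, as in rand_jitter above."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr, dtype=float)
    stdev = fraction * (arr.max() - arr.min())
    return arr + rng.standard_normal(arr.shape) * stdev

def fixed_jitter(arr, half_width=0.1, rng=None):
    """Hypothetical alternative: uniform jitter of a fixed half-width."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr, dtype=float)
    return arr + rng.uniform(-half_width, half_width, size=arr.shape)

rng = np.random.default_rng(0)           # seeded so the layout is reproducible
category_a = [0, 0, 3, 0, 5]
x_const = np.full(len(category_a), 1.0)  # every point sits at x = 1

x_range = range_jitter(x_const, rng=rng)  # range is 0, so nothing moves
x_fixed = fixed_jitter(x_const, rng=rng)  # points spread within 1 +/- 0.1
```

The jittered x values can then be passed to plt.scatter together with the untouched category values as y, so that only the horizontal position is perturbed.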


eqfvzcg82#

Seaborn provides histogram-like categorical dot plots through sns.swarmplot() and jittered categorical dot plots through sns.stripplot():

import seaborn as sns

sns.set(style='ticks', context='talk')
iris = sns.load_dataset('iris')

# x and y are passed by keyword; recent seaborn releases no longer accept them positionally
sns.swarmplot(x='species', y='sepal_length', data=iris)
sns.despine()


sns.stripplot(x='species', y='sepal_length', data=iris, jitter=0.2)
sns.despine()


2q5ifsrm3#

I used numpy.random to "scatter/beeswarm" the data along the x-axis, around a fixed point for each category, and then basically called pyplot.scatter() once per category:

import matplotlib.pyplot as plt
import numpy as np

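# random data for categories A and B, with B "taller"
yA, yB = np.random.randn(100), 5.0 + np.random.randn(1000)

# x positions drawn around x=1 for A and x=3 for B;
# the parentheses let the tuple continue onto the next line
xA, xB = (np.random.normal(1, 0.1, len(yA)),
          np.random.normal(3, 0.1, len(yB)))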

plt.scatter(xA, yA)
plt.scatter(xB, yB)
plt.show()


fjnneemd4#

One way to approach the problem is to think of each "row" in your scatter/dot/beeswarm plot as a bin in a histogram:

import numpy as np
import matplotlib.pyplot as plt

data = np.random.randn(100)

width = 0.8     # the maximum width of each 'row' in the scatter plot
xpos = 0        # the centre position of the scatter plot in x

counts, edges = np.histogram(data, bins=20)

centres = (edges[:-1] + edges[1:]) / 2.
yvals = centres.repeat(counts)

max_offset = width / counts.max()
# np.hstack expects a sequence of arrays (a list here), not a generator
offsets = np.hstack([np.arange(cc) - 0.5 * (cc - 1) for cc in counts])
xvals = xpos + (offsets * max_offset)

fig, ax = plt.subplots(1, 1)
ax.scatter(xvals, yvals, s=30, c='b')

This obviously involves binning the data, so you may lose some precision. If you have discrete data, you could replace:

counts, edges = np.histogram(data, bins=20)
centres = (edges[:-1] + edges[1:]) / 2.

with:

centres, counts = np.unique(data, return_counts=True)

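As a worked example of the discrete case, the following sketch applies the np.unique variant to the CategoryA and CategoryB values from the question. The helper name stacked_offsets and its parameters are assumptions made for illustration; the layout logic is the same row-offset idea as in the histogram code above.

```python
import numpy as np

def stacked_offsets(values, width=0.8, xpos=0.0):
    """Return (x, y) with identical values laid out side by side.

    Note that the returned points are ordered by value, not by input order.
    """
    centres, counts = np.unique(values, return_counts=True)
    yvals = centres.repeat(counts)
    # spacing between neighbouring points in a row
    max_offset = width / counts.max()
    # symmetric offsets around zero for each row, e.g. [-1, 0, 1] for 3 points
    offsets = np.hstack([np.arange(cc) - 0.5 * (cc - 1) for cc in counts])
    xvals = xpos + offsets * max_offset
    return xvals, yvals

x_a, y_a = stacked_offsets([0, 0, 3, 0, 5], xpos=1.0)
x_b, y_b = stacked_offsets([5, 10, 5, 5, 10], xpos=2.0)
```

Here the three zeros of CategoryA end up at three distinct x positions centred on 1, while the single 3 and 5 stay exactly at x = 1. The pairs (x_a, y_a) and (x_b, y_b) can be handed to ax.scatter as in the code above.
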
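An alternative approach that preserves the exact y-coordinates, even for continuous data, is to use a kernel density estimate to scale the amplitude of random jitter along the x-axis: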

from scipy.stats import gaussian_kde

kde = gaussian_kde(data)
density = kde(data)     # estimate the local density at each datapoint

# generate some random jitter between -0.5 and 0.5
jitter = np.random.rand(*data.shape) - 0.5 

# scale the jitter by the KDE estimate and add it to the centre x-coordinate
xvals = 1 + (density * jitter * width * 2)

ax.scatter(xvals, data, s=30, c='g')
for sp in ['top', 'bottom', 'right']:
    ax.spines[sp].set_visible(False)
ax.tick_params(top=False, bottom=False, right=False)

ax.set_xticks([0, 1])
ax.set_xticklabels(['Histogram', 'KDE'], fontsize='x-large')
fig.tight_layout()

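This second method is loosely based on how violin plots work. It still cannot guarantee that none of the points overlap, but I find that in practice it tends to give quite nice-looking results as long as there is a decent number of points (>20) and the distribution can be reasonably well approximated by a sum of Gaussians.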
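
In the KDE snippet above, the horizontal spread depends on the absolute density values, which change with the units and spread of the data. One possible variation, sketched here as an assumption rather than as the answerer's method, divides the density by its maximum so that the densest region spans at most `width`. The helper name kde_jitter and its parameters are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_jitter(data, xpos=0.0, width=0.8, rng=None):
    """Return x positions whose spread follows the local density of data."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    density = gaussian_kde(data)(data)   # local density at each datapoint
    density = density / density.max()    # rescale so the peak equals 1
    jitter = rng.uniform(-0.5, 0.5, size=data.shape)
    # points in dense regions may move up to width / 2 away from xpos
    return xpos + density * jitter * width

rng = np.random.default_rng(0)
data = rng.standard_normal(100)
xvals = kde_jitter(data, xpos=1.0, width=0.8, rng=rng)
```

The y-coordinates are left untouched, so ax.scatter(xvals, data) keeps every value exact while the cloud stays within xpos +/- width / 2.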

ymzxtsji5#

Not knowing of a direct matplotlib alternative, here is a very rudimentary proposal:

from matplotlib import pyplot as plt
from itertools import groupby

CA = [0,4,0,3,0,5]  
CB = [0,0,4,4,2,2,2,2,3,0,5]  

x = []
y = []
for indx, klass in enumerate([CA, CB]):
    klass = groupby(sorted(klass))
    for item, objt in klass:
        objt = list(objt)
        points = len(objt)
        pos = 1 + indx + (1 - points) / 50.
        for item in objt:
            x.append(pos)
            y.append(item)
            pos += 0.04

plt.plot(x, y, 'o')
plt.xlim((0,3))

plt.show()


ws51t4hk6#

Seaborn's swarmplot seems like the best fit for what you have in mind, but you can also add jitter with Seaborn's regplot:

import seaborn as sns
iris = sns.load_dataset('iris')

sns.swarmplot(x='species', y='sepal_length', data=iris)

sns.regplot(x='sepal_length',
            y='sepal_width',
            data=iris,
            fit_reg=False,  # do not fit a regression line
            x_jitter=0.1,  # could also dynamically set this with range of data
            y_jitter=0.1,
            scatter_kws={'alpha': 0.5})  # set transparency to 50%


e5nszbig7#

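Extending the answer by @wordsforthewise (sorry, I cannot comment with my reputation): if you need both jitter and hue to color the points by some categorical variable (as I did), Seaborn's lmplot is a great choice instead of regplot: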

import seaborn as sns
iris = sns.load_dataset('iris')
sns.lmplot(x='sepal_length', y='sepal_width', hue='species', data=iris, fit_reg=False, x_jitter=0.1, y_jitter=0.1)
