Normalizing histograms in matplotlib

Posted by wgxvkvu9 on 2023-04-06 in Other


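Hi, I am plotting three different histograms that have different total frequencies, and I would like to normalize them so that their total frequencies are the same.

As the plot shows, the three sets have different total frequencies. I want to normalize them so that they all have the same total frequency, while preserving the relative frequency of each value on the x-axis.
Here is the code I use to plot the histograms: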

setA = [22.972972972972972, 0.0, 0.0, 27.5, 25.0, 18.64406779661017, 8.88888888888889, 20.512820512820515, 11.11111111111111, 15.151515151515152, 17.741935483870968, 13.333333333333334, 16.923076923076923, 12.820512820512821, 27.77777777777778, 4.0, 0.0, 15.625, 14.814814814814815, 7.142857142857143, 15.384615384615385, 14.545454545454545, 38.095238095238095, 17.647058823529413, 21.951219512195124, 21.428571428571427, 32.432432432432435, 10.526315789473685, 36.8421052631579, 13.114754098360656, 17.91044776119403, 12.64367816091954, 16.0, 22.727272727272727, 18.181818181818183, 9.523809523809524, 17.105263157894736, 11.904761904761905, 20.58823529411765, 10.714285714285714, 15.686274509803921, 27.5, 16.129032258064516, 21.333333333333332, 40.90909090909091, 11.904761904761905, 13.157894736842104]
setB = [1.492537313432836, 3.5714285714285716, 17.94871794871795, 11.363636363636363, 13.513513513513514, 14.285714285714286, 15.686274509803921, 17.94871794871795, 9.090909090909092, 41.07142857142857, 10.714285714285714, 25.0, 20.0, 40.0, 13.333333333333334, 13.793103448275861, 3.5714285714285716, 17.073170731707318, 25.675675675675677, 15.625, 17.46031746031746, 8.333333333333334, 18.64406779661017, 14.285714285714286, 0.0, 6.0606060606060606, 6.976744186046512, 18.181818181818183, 26.785714285714285, 22.80701754385965, 6.666666666666667, 12.5]
setC = [13.846153846153847, 23.076923076923077, 25.0, 10.714285714285714, 16.666666666666668, 9.75609756097561, 10.0, 10.0, 17.857142857142858, 20.0, 9.75609756097561, 26.470588235294116, 12.5, 13.333333333333334, 4.3478260869565215, 5.882352941176471, 14.545454545454545, 13.333333333333334, 8.571428571428571, 11.764705882352942, 0.0]

import matplotlib.pyplot as plt

plt.figure('sets')
n, bins, patches = plt.hist(setA, 20, alpha=0.40, label='setA')
n, bins, patches = plt.hist(setB, 20, alpha=0.40, label='setB')
n, bins, patches = plt.hist(setC, 20, alpha=0.40, label='setC')
plt.xlabel('Set')
plt.ylabel('Frequency')
plt.title('Different Sets that need to be normalised')

plt.legend()
plt.grid(True)
plt.show()

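As a bonus: since my goal is to compare the distributions of the three sets, is there a better visualization than histograms that I could use to compare them?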

matplotlib

Source: https://stackoverflow.com/questions/35482543/normalizing-histograms

3 Answers


mfpqipee1#

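You can normalize the histograms with the density=True option (called normed=True in older Matplotlib releases; that name has since been removed). Each histogram is then scaled so that its area equals 1.
You can also make the three histograms easier to compare by using the same fixed bins for all of them (pass the bins option to hist: bins = np.arange(0,48,2)).
Try this: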

import numpy as np

...

mybins = np.arange(0,48,2)

# density=True replaces the removed normed=True argument
n, bins, patches = plt.hist(setA, bins=mybins, alpha=0.40, label='setA', density=True)
n, bins, patches = plt.hist(setB, bins=mybins, alpha=0.40, label='setB', density=True)
n, bins, patches = plt.hist(setC, bins=mybins, alpha=0.40, label='setC', density=True)
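The normalization above makes the area of each histogram equal to 1, which is not the same as making the bar heights sum to 1. If the goal is for every set to have the same total of bar heights, one common approach is to pass per-sample weights. The sketch below is only an illustration: the helper name unit_sum_weights and the small sample are assumptions, not part of the original answer.

```python
import numpy as np


def unit_sum_weights(values):
    """Return per-sample weights so that the histogram bar heights sum to 1.

    Hypothetical helper for illustration; not a matplotlib or NumPy function.
    """
    values = np.asarray(values, dtype=float)
    return np.full(values.shape, 1.0 / values.size)


# Small illustrative sample (not the data from the question)
sample = [0.0, 1.0, 1.5, 3.0, 7.5, 9.0]
edges = np.arange(0, 12, 2)  # bin edges 0, 2, ..., 10

# density=True: bar height * bin width sums to 1
density, _ = np.histogram(sample, bins=edges, density=True)

# weights of 1/N per sample: bar heights themselves sum to 1
proportion, _ = np.histogram(sample, bins=edges, weights=unit_sum_weights(sample))

area = float(np.sum(density * np.diff(edges)))  # integral of the density histogram
total = float(np.sum(proportion))               # sum of the weighted bar heights
```

The same weights can be passed to plt.hist, for example plt.hist(setA, bins=mybins, weights=unit_sum_weights(setA), alpha=0.40, label='setA'), so that each set's bars add up to 1 regardless of how many samples it contains.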

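Another option is to plot all three histograms in a single call to plt.hist. In that case you can use the stacked=True option, which can tidy up the plot further.

Note: this method normalizes the three histograms together, so that the total integral of the stack is 1. It does not make each of the three histograms sum to the same value.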

n, bins, patches = plt.hist([setA, setB, setC], bins=mybins,
                            label=['setA', 'setB', 'setC'],
                            density=True, stacked=True)

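Finally, if stacked histograms are not to your taste, you can plot the bars next to each other instead, again drawing all three histograms in a single call but dropping the stacked=True option from the line above: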

n, bins, patches = plt.hist([setA, setB, setC], bins=mybins,
                            label=['setA', 'setB', 'setC'],
                            density=True)

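As discussed in the comments, when stacked=True is used, the density option (formerly normed) only guarantees that the three histograms together integrate to 1, so the individual histograms may not be normalized in the same way as with the other methods.
To work around this, we can compute the histograms with np.histogram and plot the results with plt.bar.
For example, using the same data sets as above: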

mybins = np.arange(0, 48, 2)

# density=True replaces the removed normed=True argument
nA, binsA = np.histogram(setA, bins=mybins, density=True)
nB, binsB = np.histogram(setB, bins=mybins, density=True)
nC, binsC = np.histogram(setC, bins=mybins, density=True)

# Each of these histograms has an area of 1, so divide by 3
# to give the stacked histogram a total area of 1.
nA /= 3.
nB /= 3.
nC /= 3.

# binsX[:-1] holds the left bin edges, so align='edge' is needed
# (plt.bar centers bars on x by default in current Matplotlib).
# Use bottom= to set where the bars should begin.
plt.bar(binsA[:-1], nA, width=2, align='edge', color='b', label='setA')
plt.bar(binsB[:-1], nB, width=2, align='edge', color='g', label='setB', bottom=nA)
plt.bar(binsC[:-1], nC, width=2, align='edge', color='r', label='setC', bottom=nA+nB)
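A similar stacked result can probably also be produced in a single plt.hist call by weighting the samples up front rather than rescaling the counts afterwards. The following is a minimal sketch under stated assumptions: the three small arrays stand in for setA, setB and setC, and the weight formula 1 / (k * N * width) is an illustration, not something taken from the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Illustrative stand-ins for setA, setB and setC (different sizes on purpose)
sets = [np.array([1.0, 3.0, 3.5, 9.0]),
        np.array([0.5, 5.0]),
        np.array([2.5, 6.0, 7.0])]
edges = np.arange(0, 12, 2)  # bin edges 0, 2, ..., 10
width = 2.0                  # constant bin width implied by edges

# Each sample in a set of size N gets weight 1 / (k * N * width), so every
# set contributes an area of 1/k and the stack of k sets has a total area of 1.
k = len(sets)
weights = [np.full(s.shape, 1.0 / (k * s.size * width)) for s in sets]

fig, ax = plt.subplots()
tops, _, _ = ax.hist(sets, bins=edges, weights=weights, stacked=True,
                     label=['setA', 'setB', 'setC'])
ax.legend()

# With stacked=True the returned heights are cumulative, so the last row
# is the top of the full stack.
stack_area = float(np.sum(tops[-1] * np.diff(edges)))
```

Unlike density=True combined with stacked=True, this gives every set the same share of the stack regardless of its sample count, which matches what the np.histogram and plt.bar version does.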


nmpmafwu2#

I personally like this function:

from typing import Optional

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def get_histogram(array: np.ndarray,
                  xlabel: str,
                  ylabel: str,
                  title: str,
                  dpi: int = 200,  # dots per inch
                  facecolor: str = 'white',
                  bins: Optional[int] = None,
                  show: bool = False,
                  tight_layout: bool = False,
                  linestyle: Optional[str] = '--',
                  alpha: float = 0.75,
                  edgecolor: str = "black",
                  stat: str = 'count',
                  color: Optional[str] = None,
                  ):
    """Plot a histogram of a 1-D array with seaborn.histplot."""
    # - check it's of size (N,)
    if isinstance(array, list):
        array = np.array(array)
    assert array.shape == (array.shape[0],)
    assert len(array.shape) == 1
    assert isinstance(array.shape[0], int)
    # -
    n: int = array.shape[0]
    if bins is None:
        # The original called an undefined helper, get_num_bins(n, option='square_root').
        # The square-root rule below is an assumed stand-in for it.
        bins = int(np.ceil(np.sqrt(n)))
    print(f'using this number of {bins=} and data size is {n=}')
    # -
    fig = plt.figure(dpi=dpi)
    fig.patch.set_facecolor(facecolor)

    # stat selects the normalization, e.g. 'count', 'probability' or 'density';
    # bins is passed through so the plot uses the number printed above
    p = sns.histplot(array, stat=stat, color=color, bins=bins)
    # n, bins, patches = plt.hist(array, bins=bins, facecolor='b', alpha=alpha, edgecolor=edgecolor, density=True)

    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    # plt.xlim(40, 160)
    # plt.ylim(0, 0.03)
    if linestyle:
        plt.grid(linestyle=linestyle)
    if tight_layout:
        plt.tight_layout()
    if show:
        plt.show()

Sample plot: (figure not included)


py49o6xq3#

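This can be done with seaborn.histplot, or with seaborn.displot and kind='hist'.
seaborn is a high-level API for matplotlib.
histplot is an axes-level function and displot is a figure-level function (see "Figure-level vs. axes-level functions" in the seaborn documentation).
Three main parameters are relevant to this question:
common_norm: if True and a normalized statistic is used, the normalization is applied over the full dataset. Otherwise, each histogram is normalized independently.
multiple: {'layer', 'dodge', 'stack', 'fill'} - how to display multiple groups of data.
stat: the aggregate statistic computed in each bin; the corresponding axis is labeled to match the selected stat.
'probability': normalize so that the bar heights sum to 1
'density': normalize so that the total area of the histogram equals 1
'count', 'frequency' and 'percent' are also available
*Tested in python 3.11.2, pandas 2.0.0, matplotlib 3.7.1, seaborn 0.12.2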
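To make the effect of common_norm concrete, the following sketch reproduces the arithmetic for stat='probability' with plain NumPy. It reflects my understanding of the seaborn documentation rather than seaborn's actual implementation, and the two small groups are illustrative data, not the sets from the question.

```python
import numpy as np

# Illustrative groups of different sizes (not the data from the question)
groups = {'A': np.array([1.0, 1.5, 3.0, 7.5]), 'B': np.array([0.5, 9.0])}
edges = np.arange(0, 12, 2)  # bin edges 0, 2, ..., 10

# Raw counts per bin for each group
counts = {name: np.histogram(values, bins=edges)[0] for name, values in groups.items()}
grand_total = sum(c.sum() for c in counts.values())

# common_norm=True with stat='probability': all bars together sum to 1,
# so a larger group still gets taller bars overall
common = {name: c / grand_total for name, c in counts.items()}

# common_norm=False with stat='probability': each group's bars sum to 1,
# so groups of different sizes become directly comparable
independent = {name: c / c.sum() for name, c in counts.items()}
```

For the question as asked (give every set the same total while keeping its shape), the independent variant is the relevant one, which corresponds to passing common_norm=False to sns.histplot or sns.displot.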

Imports and sample data

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# using the sample sets from the OP
data = {'A': setA, 'B': setB, 'C': setC}

# set some custom bins to compare against the other answer
bins=np.arange(0, 48, 2)

Plots

fig, ax = plt.subplots(figsize=(6.4, 4.3))
sns.histplot(data=data, stat='density', common_norm=True, multiple='dodge', bins=np.arange(0, 48, 2), ax=ax)

g = sns.displot(data=data, kind='hist', stat='density', common_norm=True, multiple='stack', bins=np.arange(0, 48, 2), height=4, aspect=1.25)
