Efficiently creating multi-page PDFs with matplotlib subplots in Python

gfttwv5a  posted on 2023-10-24 in Python

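I am new to Python and am trying to visualize a large amount of data in a single multi-page PDF output file, using matplotlib subplots and the matplotlib PdfPages backend. My problem is that I have hit a bottleneck that I don't know how to solve. Here is the code I have so far: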

import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

with PdfPages("myfigures.pdf") as pdf:
    for i in range(1000):
        f, axarr = plt.subplots(2, 3)
        plt.subplots(2, 3)
        axarr[0, 0].plot(x1, y1)
        axarr[1, 0].plot(x2, y2)

        pdf.savefig(f)
        plt.close("all")

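Creating a figure in every iteration of the for loop seems very time-consuming, but if I move it outside the loop, the previous plots are not cleared when the next ones are drawn. Other options I tried, such as clear() or clf(), did not work either, or ended up creating multiple separate figures (when what I need is an array of subplots collected and output to the PDF as a single figure). Does anyone have an idea how to implement this? And perhaps also make it faster?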

z9gpfhce  #1

Multipage PDF appending with matplotlib

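Create an m-rows × n-cols matrix of subplot axes per PDF page, and save (append) each page once its matrix of subplots is completely full → then create a new page, and repeat.

To hold a large number of subplots as multipage output inside a single PDF, start filling the first page with plots right away. Then create a new page whenever the subplot just added in the plotting loop has used up the last free slot in the current page's m-rows × n-cols layout (i.e. an m × n matrix of subplots).

Here is one way to do it, in which the dimensions (m × n) controlling the number of subplots per page can easily be changed: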

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages

matplotlib.rcParams.update({"font.size": 6})

# Dimensions for any m-rows × n-cols array of subplots per page.
m, n = 4, 5

# Don't forget to indent after the with statement
with PdfPages("auto_subplotting.pdf") as pdf:

    # Before beginning the iteration through all the data,
    # initialize the layout for the plots and create a
    # representation of the subplots that can be easily
    # iterated over for knowing when to create the next page
    # (and also for custom settings like partial axes labels).
    f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
    arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
    subplots = [axarr[index] for index in arr_ij]

    # To conserve needed plotting real estate,
    # only label the bottom row and leftmost subplots
    # as determined automatically using m and n
    splot_index = 0
    for s, splot in enumerate(subplots):
        splot.set_ylim(0, 0.15)
        splot.set_xlim(0, 50)
        last_row = m * n - s < n + 1
        first_in_row = s % n == 0
        if last_row:
            splot.set_xlabel("X-axis label")
        if first_in_row:
            splot.set_ylabel("Y-axis label")

    # Iterate through each sample in the data
    for sample in range(33):

        # As a stand-in for real data, let's just make numpy take 100 random draws
        # from a poisson distribution centered around say ~25 and then display
        # the outcome as a histogram
        scaled_y = np.random.randint(20, 30)
        random_data = np.random.poisson(scaled_y, 100)
        subplots[splot_index].hist(
            random_data,
            bins=12,
            density=True,  # `normed=True` is no longer accepted by newer matplotlib
            fc=(0, 0, 0, 0),
            lw=0.75,
            ec="b",
        )

        # Keep collecting subplots (into the mpl-created array;
        # see: [1]) through the samples in the data and increment
        # a counter each time. The page will be full once the count is equal
        # to the product of the user-set dimensions (i.e. m * n)
        splot_index += 1

        # Once m * n subplots have been collected you have a full
        # page's worth, and it's time to save that page to the pdf,
        # close it, and re-initialize for a possible new page.
        # We can basically repeat the same exact code block used for
        # the first layout initialization, with 3 additional lines:
        #  +2 for saving & closing the just-finished pdf page,
        #  +1 more to reset the subplot index (back to zero)
        if splot_index == m * n:
            pdf.savefig(f)
            plt.close(f)
            f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
            arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
            subplots = [axarr[index] for index in arr_ij]
            splot_index = 0
            for s, splot in enumerate(subplots):
                splot.set_ylim(0, 0.15)
                splot.set_xlim(0, 50)
                last_row = (m * n) - s < n + 1
                first_in_row = s % n == 0
                if last_row:
                    splot.set_xlabel("X-axis label")
                if first_in_row:
                    splot.set_ylabel("Y-axis label")

    # Done!
    # But don't forget to save to pdf after the last page. The page is
    # skipped if it is empty (sample count an exact multiple of m * n).
    if splot_index > 0:
        pdf.savefig(f)
    plt.close(f)

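For any m×n layout, just change the declarations of the values of m and n, respectively. With the code above (where "m, n = 4, 5"), a 4×5 matrix of subplots holding a total of 33 samples is produced as a two-page PDF output file.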

References

  1. Link to matplotlib subplots official docs.
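  • Note: On the last page of the multipage PDF there will be some blank subplots whenever the total number of samples is not an exact multiple of the m × n layout size. Their number is m × n minus the remainder left when the sample count is divided by m × n. For example, with m=3 and n=4 you get 3 rows of 4 subplots, i.e. 12 per page; with 20 samples, a two-page PDF with 24 subplots in total is created automatically, and the last 4 subplots on the second page (the whole bottom row in this hypothetical example) are empty.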
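
The page and blank-subplot arithmetic from the note can be sketched as a small helper. The name `page_layout` is hypothetical (it is not part of matplotlib or of the code above); it only does the integer arithmetic and plots nothing.

```python
def page_layout(num_samples, m, n):
    """Return (pages, blanks) when num_samples plots fill m x n pages."""
    per_page = m * n
    # Ceiling division: a partially filled page still counts as a page.
    pages = -(-num_samples // per_page)
    # Blank slots are whatever is left unused on the final page.
    blanks = pages * per_page - num_samples
    return pages, blanks


print(page_layout(20, 3, 4))  # the example from the note: (2, 4)
print(page_layout(33, 4, 5))  # the 4 x 5 layout with 33 samples: (2, 7)
```

When the sample count is an exact multiple of m × n, the helper reports zero blanks, which is also the case where no trailing partial page needs to be saved.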

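Using seaborn

For a more advanced (and more "pythonic") extension of the implementation above, see below:

The multipage handling should probably be simplified by creating a new_page function; it is better not to repeat code verbatim, especially once you start customizing the plots, in which case you won't want to mirror every change and type the same thing twice. A more customized aesthetic based on seaborn, using the available matplotlib parameters, might also be preferable.
Adding a new_page function and some customizations of the subplot style: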

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.backends.backend_pdf import PdfPages

# this erases labels for any blank plots on the last page
sns.set(font_scale=0.0)
m, n = 4, 6
datasize = 37
# 37 % (m*n) = 13, (m*n) - 13 = 24 - 13 = 11. Thus 11 blank subplots on final page

# custom colors scheme / palette
ctheme = [
    "k", "gray", "magenta", "fuchsia", "#be03fd", "#1e488f",
    (0.44313725490196076, 0.44313725490196076, 0.88627450980392153), "#75bbfd",
    "teal", "lime", "g", (0.6666674, 0.6666663, 0.29078014184397138), "y",
    "#f1da7a", "tan", "orange", "maroon", "r", ]  # pick whatever colors you wish
colors = sns.blend_palette(ctheme, datasize)
fz = 7  # labels fontsize


def new_page(m, n):
    """Create a fresh m × n page; return the figure and a flat list of its axes."""
    fig, axarr = plt.subplots(m, n, sharey="row")
    fig.subplots_adjust(hspace=0.5, wspace=0.15)
    arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
    subplots = [axarr[index] for index in arr_ij]
    for s, splot in enumerate(subplots):
        splot.grid(
            True,  # passed positionally: the keyword `b` was renamed in newer matplotlib
            which="major",
            color="gray",
            linestyle="-",
            alpha=0.25,
            zorder=1,
            lw=0.5,
        )
        splot.set_ylim(0, 0.15)
        splot.set_xlim(0, 50)
        last_row = m * n - s < n + 1
        first_in_row = s % n == 0
        if last_row:
            splot.set_xlabel("X-axis label", labelpad=8, fontsize=fz)
        if first_in_row:
            splot.set_ylabel("Y-axis label", labelpad=8, fontsize=fz)
    return fig, subplots


with PdfPages("auto_subplotting_colors.pdf") as pdf:

    fig, subplots = new_page(m, n)
    splot_index = 0  # next free slot on the current page

    for sample in range(datasize):
        splot = subplots[splot_index]
        splot_index += 1
        scaled_y = np.random.randint(20, 30)
        random_data = np.random.poisson(scaled_y, 100)
        splot.hist(
            random_data,
            bins=12,
            density=True,  # `normed=True` is no longer accepted by newer matplotlib
            zorder=2,
            alpha=0.99,
            fc="white",
            lw=0.75,
            ec=colors.pop(),
        )
        splot.set_title("Sample {}".format(sample + 1), fontsize=fz)
        # tick fontsize & spacing
        splot.xaxis.set_tick_params(pad=4, labelsize=6)
        splot.yaxis.set_tick_params(pad=4, labelsize=6)

        # make new page:
        if splot_index == m * n:
            pdf.savefig(fig)
            plt.close(fig)
            fig, subplots = new_page(m, n)
            splot_index = 0

    # save the final, partially filled page (if any), then clean up
    if splot_index > 0:
        pdf.savefig(fig)
    plt.close(fig)
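
Coming back to the cost of creating a figure per page that the question raises: one possible variation, shown here only as a rough sketch, is to build the figure once and clear its axes between pages instead of calling plt.subplots again. The function name `write_pages`, its parameters, and the `(x, y)` dataset format are all assumptions made for illustration, and the sketch deliberately avoids shared axes. Whether it is actually faster depends on the data and the matplotlib version, so it is worth timing before adopting it.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run, no GUI window needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def write_pages(path, datasets, m=2, n=3):
    """Plot (x, y) pairs onto m x n pages, reusing one figure throughout.

    Returns the number of pages written.
    """
    fig, axarr = plt.subplots(m, n, squeeze=False)  # always a 2-D array
    axes = list(axarr.flat)
    per_page = m * n
    pages = 0
    with PdfPages(path) as pdf:
        for start in range(0, len(datasets), per_page):
            chunk = datasets[start:start + per_page]
            # Wipe whatever the previous page drew and hide every slot.
            for ax in axes:
                ax.clear()
                ax.set_visible(False)
            # Show and fill only the slots this page actually needs.
            for ax, (x, y) in zip(axes, chunk):
                ax.set_visible(True)
                ax.plot(x, y)
            pdf.savefig(fig)
            pages += 1
    plt.close(fig)
    return pages


if __name__ == "__main__":
    x = np.linspace(0, 1, 50)
    demo = [(x, np.sin(2 * np.pi * k * x)) for k in range(1, 9)]
    print(write_pages("reused_figure.pdf", demo))  # 8 datasets on 2 x 3 pages
```

Hiding the unused axes also leaves the tail of the last page empty rather than showing blank frames, which is a design choice of this sketch and not something the code above does.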
