PostgreSQL内存使用调用OOM杀手

wd2eg0qa  于 2022-09-21  发布在  PostgreSQL
关注(0)|答案(1)|浏览(428)

此问题是来自堆栈溢出的migrated,因为它可以在数据库管理员堆栈交换上得到回答。D1c1c1天。

我在一台内存为512MB、磁盘为8 GB的设备上使用Debian9。我已经安装了扩展名为TimscaleDB 1.7.1的PostgreSQL 9.6.22。与PostgreSQL内存相关的参数如下:

shared_buffers = 128MB                  # min 128kB

# temp_buffers = 8MB                     # min 800kB

# work_mem = 4MB                         # min 64kB

# maintenance_work_mem = 64MB            # min 1MB

# autovacuum_work_mem = -1               # min 1MB, or -1 to use maintenance_work_mem

dynamic_shared_memory_type = posix      # the default is the first option
                                        # use none to disable dynamic shared memory
max_connections = 13                    # (change requires restart)

# superuser_reserved_connections = 3     # (change requires restart)

最近,OOM杀手出现了,看起来PostgreSQL占用了最多的内存:

postgres invoked oom-killer: gfp_mask=0x26084c0, order=0, oom_score_adj=0
 postgres cpuset=/ mems_allowed=0
 CPU: 0 PID: 11632 Comm: postgres Not tainted 4.4.0-cip #1
 Hardware name: Generic AM33XX (Flattened Device Tree)
 [<c00133d0>] (unwind_backtrace) from [<c0011cdc>] (show_stack+0x10/0x14)
 [<c0011cdc>] (show_stack) from [<c00dca94>] (dump_header+0x4c/0x180)
 [<c00dca94>] (dump_header) from [<c00a38e4>] (oom_kill_process+0x6c/0x39c)
 [<c00a38e4>] (oom_kill_process) from [<c00a3e54>] (out_of_memory+0x1d8/0x2fc)
 [<c00a3e54>] (out_of_memory) from [<c00a7ecc>] (__alloc_pages_nodemask+0x830/0x89c)
 [<c00a7ecc>] (__alloc_pages_nodemask) from [<c00c012c>] (__pte_alloc+0x20/0x1b0)
 [<c00c012c>] (__pte_alloc) from [<c00c209c>] (handle_mm_fault+0x224/0xc80)
 [<c00c209c>] (handle_mm_fault) from [<c04be4b0>] (do_page_fault+0x20c/0x35c)
 [<c04be4b0>] (do_page_fault) from [<c000923c>] (do_DataAbort+0x38/0xb8)
 [<c000923c>] (do_DataAbort) from [<c04bdebc>] (__dabt_usr+0x3c/0x40)
 Exception stack(0xdb95bfb0 to 0xdb95bff8)
 bfa0:                                     aaf9a000 00000000 00101002 aaf9a008
 bfc0: b6a5d7a4 00101000 00098dd6 b6a5d7d4 00100008 b6a5d000 000001ff b6a4185c
 bfe0: 000000c0 bed9f978 b6a052d7 b69c444c 40070030 ffffffff
 Mem-Info:
 active_anon:113550 inactive_anon:651 isolated_anon:0
                              active_file:69 inactive_file:126 isolated_file:0
                              unevictable:0 dirty:0 writeback:0 unstable:0
                              slab_reclaimable:1250 slab_unreclaimable:2180
                              mapped:34529 shmem:35279 pagetables:         free:8083 free_pcp:0 free_cma:2861
 Normal free:32332kB min:16384kB low:20480kB high:24576kB active_anon:454200kB inactive_anon:2604kB active_file:276kB inactive_file:504kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:524288kB managed:511816kB mlocked:0kB dirty:0kB writeback:0kB mapped:138116kB shmem:141116kB slab_reclaimable:5000kB slab_unreclaimable:8720kB kernel_stack:1136kB pagetables:4776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:11444kB writeback_tmp:0kB pages_scanned:4960 all_unreclaimable? yes
 lowmem_reserve[]: 0 0 0
 Normal: 989*4kB (UMEHC) 397*8kB (UMEH) 215*16kB (UMEHC) 138*32kB (UMEHC) 67*64kB (UEH) 12*128kB (UEC) 1*256kB (U) 0*512kB 1*1024kB (C) 1*2048kB (C) 2*4096kB (C) 0*8192kB = 32332kB
 35474 total pagecache pages
 0 pages in swap cache
 Swap cache stats: add 0, delete 0, find 0/0
 Free swap  = 0kB
 Total swap = 0kB
 131072 pages RAM
 0 pages HighMem/MovableOnly
 3118 pages reserved
 4096 pages cma reserved
 [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
 [  220]     0   220    36222      113      64       0        0             0 systemd-journal
 [  247]     0   247     3154      103       7       0        0         -1000 systemd-udevd
 [  352]  1000   352     4778       31       6       0        0             0 custom-process
 [  353]  1000   353     6572       35       5       0        0             0 custom-process2
 [  356]  1000   356     4778       46       5       0        0             0 custom-process3
 [  358]     0   358     1163       55       5       0        0             0 cron
 [  368]  1000   368     4778       32       7       0        0             0 custom-process4
 [  377]   107   377     1332      104       6       0        0          -900 dbus-daemon
 [  385]     0   385     1795      212       7       0        0             0 openvpn
 [  389]  1000   389     5477       55       7       0        0             0 custom-process5
 [  405]     0   405     1524      105       5       0        0             0 systemd-logind
 [  408]  1000   408     7084       40       8       0        0             0 custom-process6
 [  412]     0   412     1100       54       6       0        0             0 cgmanager
 [  416]  1000   416     6572       41       6       0        0             0 custom-process7
 [  422]     0   422     2488       33       3       0        0             0 custom-process8
 [  526]   109   526    47120     2045      22       0        0          -900 postgres
 [  530]   109   530    47145    32270      84       0        0             0 postgres
 [  531]   109   531    47120      538      18       0        0             0 postgres
 [  532]   109   532    47120     1343      19       0        0             0 postgres
 [  533]   109   533    47206      542      21       0        0             0 postgres
 [  534]   109   534    12158      325      16       0        0             0 postgres
 [  535]   109   535    47187      500      20       0        0             0 postgres
 [  537]   109   537    96668    51046     124       0        0             0 postgres
 [  564]     0   564     1315      174       6       0        0             0 dhclient
 [  641]     0   641     2125      134       7       0        0         -1000 sshd
 [  643]     0   643      923       29       5       0        0             0 agetty
 [  646]     0   646      868       29       5       0        0             0 agetty
 [  675]   105   675     1993      408       7       0        0             0 ntpd
 [29365]     0 29365    10566      333      15       0        0             0 packagekitd
 [29371]     0 29371     9412      435      14       0        0             0 polkitd
 [29655]     0 29655    10825      268      13       0        0             0 nginx
 [29656]    33 29656    10876      319      13       0        0             0 nginx
 [19701]  1000 19701    42241     7379     114       0        0             0 node
 [11625]   109 11625    47909    17171      85       0        0             0 postgres
 [11629]   109 11629    47775    15420      84       0        0             0 postgres
 [11631]   109 11631    50664     6973      58       0        0             0 postgres
 [11632]   109 11632    50987     6972      58       0        0             0 postgres
 [11633]   109 11633    50986     6982      58       0        0             0 postgres
 [11634]   109 11634    50985     6986      58       0        0             0 postgres
 [11635]   109 11635    50987     6981      58       0        0             0 postgres
 [11636]   109 11636    50987     6981      58       0        0             0 postgres
 Out of memory: Kill process 537 (postgres) score 399 or sacrifice child
 Killed process 537 (postgres) total-vm:386672kB, anon-rss:19679

以下是当时的PostgreSQL日志:

[526] LOG:  worker process: TimescaleDB Background Worker Scheduler (PID 537) was terminated by signal 9: Killed
[526] LOG:  terminating any other active server processes
[11629] postgres@db WARNING:  terminating connection because of crash of another server process
[11629] postgres@db DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit,
 because another server process exited abnormally and possibly corrupted shared memory.
[11629] postgres@db HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[11625] postgres@db WARNING:  terminating connection because of crash of another server process
[11625] postgres@db DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit,
 because another server process exited abnormally and possibly corrupted shared memory.
[11625] postgres@db HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[533] WARNING:  terminating connection because of crash of another server process
[533] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another
 server process exited abnormally and possibly corrupted shared memory.
[533] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[11632] postgres@db WARNING:  terminating connection because of crash of another server process
[11632] postgres@db DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit,
 because another server process exited abnormally and possibly corrupted shared memory.
[11632] postgres@db HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[11635] postgres@db WARNING:  terminating connection because of crash of another server process
[11635] postgres@db DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit,
 because another server process exited abnormally and possibly corrupted shared memory.
[11635] postgres@db HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[11631] postgres@db WARNING:  terminating connection because of crash of another server process
[11631] postgres@db DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit,
 because another server process exited abnormally and possibly corrupted shared memory.
[11631] postgres@db HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[11634] postgres@db WARNING:  terminating connection because of crash of another server process
[11634] postgres@db DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit,
 because another server process exited abnormally and possibly corrupted shared memory.
[11634] postgres@db HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[11633] postgres@db WARNING:  terminating connection because of crash of another server process
[11633] postgres@db DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit,
 because another server process exited abnormally and possibly corrupted shared memory.
[11633] postgres@db HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[11636] postgres@db WARNING:  terminating connection because of crash of another server process
[11636] postgres@db DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit,
 because another server process exited abnormally and possibly corrupted shared memory.
[11636] postgres@db HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[526] LOG:  all server processes terminated; reinitializing
[11637] LOG:  database system was interrupted; last known up at 2022-09-12 21:47:45 CEST
[11637] LOG:  database system was not properly shut down; automatic recovery in progress
[11637] LOG:  redo starts at 1C/32C6E978
[11637] LOG:  invalid record length at 1C/32D041F8: wanted 24, got 0
[11637] LOG:  redo done at 1C/32D041D0
[11637] LOG:  last completed transaction was at log time 2022-09-12 21:50:23.799706+02
[11637] LOG:  MultiXact member wraparound protections are now enabled
[11641] LOG:  autovacuum launcher started
[526] LOG:  database system is ready to accept connections
[11643] LOG:  TimescaleDB background worker launcher connected to shared catalogs

我知道有可用的过量使用调优,并且我的应用程序可能存在调用PostgreSQL的潜在问题,但我想了解PostgreSQL对内存的使用。我试图了解OOM Killer输出和与PostgreSQL内存相关的参数,但我有几个疑问:

  • PostgreSQL的最大进程数是多少?我认为它将对应于最大连接数(max_connections + superuser_reserved_connections)。但是,在本例中,当有16个postgres进程时,此限制加起来为13个。
  • 根据herehere描述的OOM杀手输出的解释,实际使用的RAM是rss值的总和。但是,在本例中,总和为187604页≈730MB,比设备中的总内存还要多。
  • PostgreSQL可以使用的最大内存是多少?根据this answermax RAM = shared_buffers + (temp_buffers + work_mem) * max_connections。在本例中,128+(8+4)*10=248MB。然而,OOM杀手的输出与这一假设不匹配。Postgres进程的总RSS为≈690MB,远远超过其应有的大小,甚至超过设备中的总RAM值。
  • 是否还有其他我应该考虑的PostgreSQL参数?
nmpmafwu

nmpmafwu1#

除了“用户”后端,PostgreSQL还使用了几个后台进程。像Wal作者、背景作者、检查器、档案员、Wal发送者、自动吸尘器工人等。这些(可能)不是问题所在,他们只是让报告变得复杂。如果进程名包含该信息就好了,它会在top和ps中包含该信息(或者可以包含,取决于配置),但显然不会包含在OOM报告中。查看数据库服务器日志文件,以了解被杀死的进程实际是什么。这不会是一份完整的报告,但比你目前所知的要好。

大多数RS可能是每个进程都接触到的Shared_Buffers部分,这是所有其他进程也接触到的相同部分。因此,这是一次又一次计算的相同记忆。仅仅总结这篇专栏文章并不能给你真正的答案。

如果您关闭了OverCommit,您将在数据库服务器的日志文件中获得一个很好的报告,说明该进程是什么,当它耗尽内存时它在做什么,以及内存的去向。但是,因为您将内核配置为只随机使用核弹,所以您不会得到这一点。不再存在的进程无法生成错误报告。

如果你安装了一个窗口系统,你不可能关闭Over Commit,因为它们会严重地过度分配它们永远不会使用的RAM。但它看起来不像你正在运行的其中之一,所以可能没有理由不按医生说的做。

相关问题