此问题是来自堆栈溢出的migrated,因为它可以在数据库管理员堆栈交换上得到回答。D1c1c1天。
我在一台内存为512MB、磁盘为8 GB的设备上使用Debian9。我已经安装了扩展名为TimscaleDB 1.7.1的PostgreSQL 9.6.22。与PostgreSQL内存相关的参数如下:
shared_buffers = 128MB # min 128kB
# temp_buffers = 8MB # min 800kB
# work_mem = 4MB # min 64kB
# maintenance_work_mem = 64MB # min 1MB
# autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
dynamic_shared_memory_type = posix # the default is the first option
# use none to disable dynamic shared memory
max_connections = 13 # (change requires restart)
# superuser_reserved_connections = 3 # (change requires restart)
最近,OOM杀手出现了,看起来PostgreSQL占用了最多的内存:
postgres invoked oom-killer: gfp_mask=0x26084c0, order=0, oom_score_adj=0
postgres cpuset=/ mems_allowed=0
CPU: 0 PID: 11632 Comm: postgres Not tainted 4.4.0-cip #1
Hardware name: Generic AM33XX (Flattened Device Tree)
[<c00133d0>] (unwind_backtrace) from [<c0011cdc>] (show_stack+0x10/0x14)
[<c0011cdc>] (show_stack) from [<c00dca94>] (dump_header+0x4c/0x180)
[<c00dca94>] (dump_header) from [<c00a38e4>] (oom_kill_process+0x6c/0x39c)
[<c00a38e4>] (oom_kill_process) from [<c00a3e54>] (out_of_memory+0x1d8/0x2fc)
[<c00a3e54>] (out_of_memory) from [<c00a7ecc>] (__alloc_pages_nodemask+0x830/0x89c)
[<c00a7ecc>] (__alloc_pages_nodemask) from [<c00c012c>] (__pte_alloc+0x20/0x1b0)
[<c00c012c>] (__pte_alloc) from [<c00c209c>] (handle_mm_fault+0x224/0xc80)
[<c00c209c>] (handle_mm_fault) from [<c04be4b0>] (do_page_fault+0x20c/0x35c)
[<c04be4b0>] (do_page_fault) from [<c000923c>] (do_DataAbort+0x38/0xb8)
[<c000923c>] (do_DataAbort) from [<c04bdebc>] (__dabt_usr+0x3c/0x40)
Exception stack(0xdb95bfb0 to 0xdb95bff8)
bfa0: aaf9a000 00000000 00101002 aaf9a008
bfc0: b6a5d7a4 00101000 00098dd6 b6a5d7d4 00100008 b6a5d000 000001ff b6a4185c
bfe0: 000000c0 bed9f978 b6a052d7 b69c444c 40070030 ffffffff
Mem-Info:
active_anon:113550 inactive_anon:651 isolated_anon:0
active_file:69 inactive_file:126 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:1250 slab_unreclaimable:2180
mapped:34529 shmem:35279 pagetables: free:8083 free_pcp:0 free_cma:2861
Normal free:32332kB min:16384kB low:20480kB high:24576kB active_anon:454200kB inactive_anon:2604kB active_file:276kB inactive_file:504kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:524288kB managed:511816kB mlocked:0kB dirty:0kB writeback:0kB mapped:138116kB shmem:141116kB slab_reclaimable:5000kB slab_unreclaimable:8720kB kernel_stack:1136kB pagetables:4776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:11444kB writeback_tmp:0kB pages_scanned:4960 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Normal: 989*4kB (UMEHC) 397*8kB (UMEH) 215*16kB (UMEHC) 138*32kB (UMEHC) 67*64kB (UEH) 12*128kB (UEC) 1*256kB (U) 0*512kB 1*1024kB (C) 1*2048kB (C) 2*4096kB (C) 0*8192kB = 32332kB
35474 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
131072 pages RAM
0 pages HighMem/MovableOnly
3118 pages reserved
4096 pages cma reserved
[ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 220] 0 220 36222 113 64 0 0 0 systemd-journal
[ 247] 0 247 3154 103 7 0 0 -1000 systemd-udevd
[ 352] 1000 352 4778 31 6 0 0 0 custom-process
[ 353] 1000 353 6572 35 5 0 0 0 custom-process2
[ 356] 1000 356 4778 46 5 0 0 0 custom-process3
[ 358] 0 358 1163 55 5 0 0 0 cron
[ 368] 1000 368 4778 32 7 0 0 0 custom-process4
[ 377] 107 377 1332 104 6 0 0 -900 dbus-daemon
[ 385] 0 385 1795 212 7 0 0 0 openvpn
[ 389] 1000 389 5477 55 7 0 0 0 custom-process5
[ 405] 0 405 1524 105 5 0 0 0 systemd-logind
[ 408] 1000 408 7084 40 8 0 0 0 custom-process6
[ 412] 0 412 1100 54 6 0 0 0 cgmanager
[ 416] 1000 416 6572 41 6 0 0 0 custom-process7
[ 422] 0 422 2488 33 3 0 0 0 custom-process8
[ 526] 109 526 47120 2045 22 0 0 -900 postgres
[ 530] 109 530 47145 32270 84 0 0 0 postgres
[ 531] 109 531 47120 538 18 0 0 0 postgres
[ 532] 109 532 47120 1343 19 0 0 0 postgres
[ 533] 109 533 47206 542 21 0 0 0 postgres
[ 534] 109 534 12158 325 16 0 0 0 postgres
[ 535] 109 535 47187 500 20 0 0 0 postgres
[ 537] 109 537 96668 51046 124 0 0 0 postgres
[ 564] 0 564 1315 174 6 0 0 0 dhclient
[ 641] 0 641 2125 134 7 0 0 -1000 sshd
[ 643] 0 643 923 29 5 0 0 0 agetty
[ 646] 0 646 868 29 5 0 0 0 agetty
[ 675] 105 675 1993 408 7 0 0 0 ntpd
[29365] 0 29365 10566 333 15 0 0 0 packagekitd
[29371] 0 29371 9412 435 14 0 0 0 polkitd
[29655] 0 29655 10825 268 13 0 0 0 nginx
[29656] 33 29656 10876 319 13 0 0 0 nginx
[19701] 1000 19701 42241 7379 114 0 0 0 node
[11625] 109 11625 47909 17171 85 0 0 0 postgres
[11629] 109 11629 47775 15420 84 0 0 0 postgres
[11631] 109 11631 50664 6973 58 0 0 0 postgres
[11632] 109 11632 50987 6972 58 0 0 0 postgres
[11633] 109 11633 50986 6982 58 0 0 0 postgres
[11634] 109 11634 50985 6986 58 0 0 0 postgres
[11635] 109 11635 50987 6981 58 0 0 0 postgres
[11636] 109 11636 50987 6981 58 0 0 0 postgres
Out of memory: Kill process 537 (postgres) score 399 or sacrifice child
Killed process 537 (postgres) total-vm:386672kB, anon-rss:19679
以下是当时的PostgreSQL日志:
[526] LOG: worker process: TimescaleDB Background Worker Scheduler (PID 537) was terminated by signal 9: Killed
[526] LOG: terminating any other active server processes
[11629] postgres@db WARNING: terminating connection because of crash of another server process
[11629] postgres@db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted shared memory.
[11629] postgres@db HINT: In a moment you should be able to reconnect to the database and repeat your command.
[11625] postgres@db WARNING: terminating connection because of crash of another server process
[11625] postgres@db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted shared memory.
[11625] postgres@db HINT: In a moment you should be able to reconnect to the database and repeat your command.
[533] WARNING: terminating connection because of crash of another server process
[533] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
[533] HINT: In a moment you should be able to reconnect to the database and repeat your command.
[11632] postgres@db WARNING: terminating connection because of crash of another server process
[11632] postgres@db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted shared memory.
[11632] postgres@db HINT: In a moment you should be able to reconnect to the database and repeat your command.
[11635] postgres@db WARNING: terminating connection because of crash of another server process
[11635] postgres@db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted shared memory.
[11635] postgres@db HINT: In a moment you should be able to reconnect to the database and repeat your command.
[11631] postgres@db WARNING: terminating connection because of crash of another server process
[11631] postgres@db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted shared memory.
[11631] postgres@db HINT: In a moment you should be able to reconnect to the database and repeat your command.
[11634] postgres@db WARNING: terminating connection because of crash of another server process
[11634] postgres@db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted shared memory.
[11634] postgres@db HINT: In a moment you should be able to reconnect to the database and repeat your command.
[11633] postgres@db WARNING: terminating connection because of crash of another server process
[11633] postgres@db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted shared memory.
[11633] postgres@db HINT: In a moment you should be able to reconnect to the database and repeat your command.
[11636] postgres@db WARNING: terminating connection because of crash of another server process
[11636] postgres@db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted shared memory.
[11636] postgres@db HINT: In a moment you should be able to reconnect to the database and repeat your command.
[526] LOG: all server processes terminated; reinitializing
[11637] LOG: database system was interrupted; last known up at 2022-09-12 21:47:45 CEST
[11637] LOG: database system was not properly shut down; automatic recovery in progress
[11637] LOG: redo starts at 1C/32C6E978
[11637] LOG: invalid record length at 1C/32D041F8: wanted 24, got 0
[11637] LOG: redo done at 1C/32D041D0
[11637] LOG: last completed transaction was at log time 2022-09-12 21:50:23.799706+02
[11637] LOG: MultiXact member wraparound protections are now enabled
[11641] LOG: autovacuum launcher started
[526] LOG: database system is ready to accept connections
[11643] LOG: TimescaleDB background worker launcher connected to shared catalogs
我知道有可用的过量使用调优,并且我的应用程序可能存在调用PostgreSQL的潜在问题,但我想了解PostgreSQL对内存的使用。我试图了解OOM Killer输出和与PostgreSQL内存相关的参数,但我有几个疑问:
- PostgreSQL的最大进程数是多少?我认为它将对应于最大连接数(
max_connections + superuser_reserved_connections
)。但是,在本例中,当有16个postgres进程时,此限制加起来为13个。 - 根据here和here描述的OOM杀手输出的解释,实际使用的RAM是
rss
值的总和。但是,在本例中,总和为187604页≈730MB,比设备中的总内存还要多。 - PostgreSQL可以使用的最大内存是多少?根据this answer、
max RAM = shared_buffers + (temp_buffers + work_mem) * max_connections
。在本例中,128+(8+4)*10=248MB。然而,OOM杀手的输出与这一假设不匹配。Postgres进程的总RSS为≈690MB,远远超过其应有的大小,甚至超过设备中的总RAM值。 - 是否还有其他我应该考虑的PostgreSQL参数?
1条答案
按热度按时间nmpmafwu1#
除了“用户”后端,PostgreSQL还使用了几个后台进程。像Wal作者、背景作者、检查器、档案员、Wal发送者、自动吸尘器工人等。这些(可能)不是问题所在,他们只是让报告变得复杂。如果进程名包含该信息就好了,它会在top和ps中包含该信息(或者可以包含,取决于配置),但显然不会包含在OOM报告中。查看数据库服务器日志文件,以了解被杀死的进程实际是什么。这不会是一份完整的报告,但比你目前所知的要好。
大多数RS可能是每个进程都接触到的Shared_Buffers部分,这是所有其他进程也接触到的相同部分。因此,这是一次又一次计算的相同记忆。仅仅总结这篇专栏文章并不能给你真正的答案。
如果您关闭了OverCommit,您将在数据库服务器的日志文件中获得一个很好的报告,说明该进程是什么,当它耗尽内存时它在做什么,以及内存的去向。但是,因为您将内核配置为只随机使用核弹,所以您不会得到这一点。不再存在的进程无法生成错误报告。
如果你安装了一个窗口系统,你不可能关闭Over Commit,因为它们会严重地过度分配它们永远不会使用的RAM。但它看起来不像你正在运行的其中之一,所以可能没有理由不按医生说的做。