Analyse van een OOM (Out-Of-Memory) melding
In de vorige post werd uitgelegd wat er exact schuilgaat achter een OOM melding van de Linux kernel. Nu gaan we zo een melding diepgaand analyseren.
[Sat Aug 10 12:30:10 2024] git invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), order=2, oom_score_adj=0
[Sat Aug 10 12:30:10 2024] CPU: 2 PID: 509789 Comm: git Kdump: loaded Not tainted 4.18.0-553.el8_10.x86_64 #1
[Sat Aug 10 12:30:10 2024] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
[Sat Aug 10 12:30:10 2024] Call Trace:
[Sat Aug 10 12:30:10 2024] dump_stack+0x41/0x60
[Sat Aug 10 12:30:10 2024] dump_header+0x4a/0x1df
[Sat Aug 10 12:30:10 2024] oom_kill_process.cold.33+0xb/0x10
[Sat Aug 10 12:30:10 2024] out_of_memory+0x1bd/0x4e0
[Sat Aug 10 12:30:10 2024] __alloc_pages_slowpath+0xbf0/0xcd0
[Sat Aug 10 12:30:10 2024] __alloc_pages_nodemask+0x2e2/0x330
[Sat Aug 10 12:30:10 2024] new_slab+0x3f4/0x4f0
[Sat Aug 10 12:30:10 2024] ___slab_alloc+0x3a3/0x950
[Sat Aug 10 12:30:10 2024] ? copy_process+0x215/0x1b50
[Sat Aug 10 12:30:10 2024] ? obj_cgroup_charge_pages+0xbe/0xe0
[Sat Aug 10 12:30:10 2024] ? copy_process+0x215/0x1b50
[Sat Aug 10 12:30:10 2024] kmem_cache_alloc_node+0x253/0x2b0
[Sat Aug 10 12:30:10 2024] copy_process+0x215/0x1b50
[Sat Aug 10 12:30:10 2024] _do_fork+0x8b/0x340
[Sat Aug 10 12:30:10 2024] do_syscall_64+0x5b/0x1b0
[Sat Aug 10 12:30:10 2024] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[Sat Aug 10 12:30:10 2024] RIP: 0033:0x7f89cf78d143
[Sat Aug 10 12:30:10 2024] Code: db 0f 85 28 01 00 00 64 4c 8b 0c 25 10 00 00 00 45 31 c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 b9 00 00 00 41 89 c5 85 c0 0f 85 c6 00 00
[Sat Aug 10 12:30:10 2024] RSP: 002b:00007ffe060bcae0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[Sat Aug 10 12:30:10 2024] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f89cf78d143
[Sat Aug 10 12:30:10 2024] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[Sat Aug 10 12:30:10 2024] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f89d02c6f00
[Sat Aug 10 12:30:10 2024] R10: 00007f89d02c71d0 R11: 0000000000000246 R12: 0000000000000001
[Sat Aug 10 12:30:10 2024] R13: 00007ffe060bcb50 R14: 0000000000000000 R15: 00007ffe060bcb70
[Sat Aug 10 12:30:10 2024] Mem-Info:
[Sat Aug 10 12:30:10 2024] active_anon:3681441 inactive_anon:417474 isolated_anon:0
active_file:30473 inactive_file:16140 isolated_file:0
unevictable:627 dirty:609 writeback:0
slab_reclaimable:1723903 slab_unreclaimable:322374
mapped:27873 shmem:699 pagetables:13466 bounce:0
free:181901 free_pcp:115 free_cma:0
[Sat Aug 10 12:30:10 2024] Node 0 active_anon:14725764kB inactive_anon:1669896kB active_file:121892kB inactive_file:64560kB unevictable:2508kB isolated(anon):0kB isolated(file):0kB mapped:111492kB dirty:2436kB writeback:0kB shmem:2796kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 202752kB writeback_tmp:0kB kernel_stack:19212kB pagetables:53864kB all_unreclaimable? no
[Sat Aug 10 12:30:10 2024] Node 0 DMA free:11264kB min:28kB low:40kB high:52kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Sat Aug 10 12:30:10 2024] lowmem_reserve[]: 0 2706 31835 31835 31835
[Sat Aug 10 12:30:10 2024] Node 0 DMA32 free:119364kB min:5740kB low:8508kB high:11276kB active_anon:274668kB inactive_anon:85252kB active_file:552kB inactive_file:208kB unevictable:0kB writepending:0kB present:3128732kB managed:2800988kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Sat Aug 10 12:30:10 2024] lowmem_reserve[]: 0 0 29128 29128 29128
[Sat Aug 10 12:30:10 2024] Node 0 Normal free:596976kB min:90480kB low:120308kB high:150136kB active_anon:14451096kB inactive_anon:1583872kB active_file:121340kB inactive_file:64352kB unevictable:2508kB writepending:2436kB present:30408704kB managed:29836568kB mlocked:2508kB bounce:0kB free_pcp:460kB local_pcp:12kB free_cma:0kB
[Sat Aug 10 12:30:10 2024] lowmem_reserve[]: 0 0 0 0 0
[Sat Aug 10 12:30:10 2024] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
[Sat Aug 10 12:30:10 2024] Node 0 DMA32: 1081*4kB (UME) 1258*8kB (UME) 1281*16kB (UME) 24*32kB (UME) 872*64kB (UE) 218*128kB (E) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 119364kB
[Sat Aug 10 12:30:10 2024] Node 0 Normal: 75403*4kB (UMEH) 36845*8kB (UEH) 2*16kB (H) 2*32kB (H) 3*64kB (H) 3*128kB (H) 1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 598324kB
[Sat Aug 10 12:30:10 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Sat Aug 10 12:30:10 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Sat Aug 10 12:30:10 2024] 67741 total pagecache pages
[Sat Aug 10 12:30:10 2024] 20408 pages in swap cache
[Sat Aug 10 12:30:10 2024] Swap cache stats: add 53729283, delete 53712173, find 38041573/56230750
[Sat Aug 10 12:30:10 2024] Free swap = 0kB
[Sat Aug 10 12:30:10 2024] Total swap = 4194300kB
[Sat Aug 10 12:30:10 2024] 8388358 pages RAM
[Sat Aug 10 12:30:10 2024] 0 pages HighMem/MovableOnly
[Sat Aug 10 12:30:10 2024] 225129 pages reserved
[Sat Aug 10 12:30:10 2024] 0 pages hwpoisoned
[Sat Aug 10 12:30:10 2024] Tasks state (memory values in pages):
[Sat Aug 10 12:30:10 2024] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Sat Aug 10 12:30:10 2024] [ 969] 0 969 55261 3165 319488 2536 0 systemd-journal
[Sat Aug 10 12:30:10 2024] [ 1034] 0 1034 26497 1198 225280 737 -1000 systemd-udevd
[Sat Aug 10 12:30:10 2024] [ 1476] 32 1476 16832 696 167936 163 0 rpcbind
[Sat Aug 10 12:30:10 2024] [ 1478] 0 1478 38521 801 184320 282 -1000 auditd
[Sat Aug 10 12:30:10 2024] [ 1480] 0 1480 23569 670 225280 11443 0 sedispatch
[Sat Aug 10 12:30:10 2024] [ 1481] 0 1481 2779 433 65536 62 0 audisp-syslog
[Sat Aug 10 12:30:10 2024] [ 1513] 0 1513 12625 624 143360 299 0 smartd
[Sat Aug 10 12:30:10 2024] [ 1514] 0 1514 31276 695 147456 131 0 irqbalance
[Sat Aug 10 12:30:10 2024] [ 1516] 0 1516 18821 1106 188416 430 0 VGAuthService
[Sat Aug 10 12:30:10 2024] [ 1517] 0 1517 90949 1487 339968 385 0 vmtoolsd
[Sat Aug 10 12:30:10 2024] [ 1518] 0 1518 94880 951 352256 539 0 abrtd
[Sat Aug 10 12:30:10 2024] [ 1526] 994 1526 35594 805 180224 142 0 chronyd
[Sat Aug 10 12:30:10 2024] [ 1530] 28 1530 208791 1347 241664 137 0 nscd
[Sat Aug 10 12:30:10 2024] [ 1537] 81 1537 17914 949 172032 119 -900 dbus-daemon
[Sat Aug 10 12:30:10 2024] [ 1544] 0 1544 95236 1597 376832 923 0 NetworkManager
[Sat Aug 10 12:30:10 2024] [ 1550] 0 1550 20312 1141 196608 522 0 systemd-logind
[Sat Aug 10 12:30:10 2024] [ 1553] 0 1553 419326 3105 253952 1476 0 ScanAssistant
[Sat Aug 10 12:30:10 2024] [ 1607] 0 1607 18651 1078 188416 192 -1000 sshd
[Sat Aug 10 12:30:10 2024] [ 1608] 0 1608 66019 4784 331776 26070 0 samhain
[Sat Aug 10 12:30:10 2024] [ 1609] 0 1609 28906 387 114688 112 0 rhsmcertd
[Sat Aug 10 12:30:10 2024] [ 1620] 0 1620 43220 496 200704 719 0 gssproxy
[Sat Aug 10 12:30:10 2024] [ 1623] 0 1623 102243 2184 409600 3455 0 tuned
[Sat Aug 10 12:30:10 2024] [ 1652] 0 1652 13977 490 135168 156 0 rpc.gssd
[Sat Aug 10 12:30:10 2024] [ 1664] 0 1664 6005 602 86016 162 0 crond
[Sat Aug 10 12:30:10 2024] [ 1677] 0 1677 6126 308 98304 0 0 atd
[Sat Aug 10 12:30:10 2024] [ 1711] 0 1711 1217 201 57344 0 0 agetty
[Sat Aug 10 12:30:10 2024] [ 1726] 530229 1726 11910 1511 126976 519 0 SgUx_LogD.pl
[Sat Aug 10 12:30:10 2024] [ 1757] 0 1757 399181 11373 675840 8753 0 metricbeat
[Sat Aug 10 12:30:10 2024] [ 1761] 0 1761 119583 565 380928 313 0 rsyslogd
[Sat Aug 10 12:30:10 2024] [ 1770] 993 1770 49548 1904 282624 322 0 freshclam
[Sat Aug 10 12:30:10 2024] [ 1879] 0 1879 44904 7919 303104 1131 0 adclient
[Sat Aug 10 12:30:10 2024] [ 1883] 0 1883 25483 345 110592 101 0 cdcwatch
[Sat Aug 10 12:30:10 2024] [ 1914] 0 1914 14974 1546 167936 2191 0 SgUx_AutoAction
[Sat Aug 10 12:30:10 2024] [ 1933] 530229 1933 17680 4845 184320 536 0 SgUx_LogScanD.p
[Sat Aug 10 12:30:10 2024] [ 1946] 0 1946 255409 2146 270336 565 0 dad
[Sat Aug 10 12:30:10 2024] [ 1978] 0 1978 130371 3561 225280 2292 0 BESClient
[Sat Aug 10 12:30:10 2024] [ 2166] 0 2166 32485 484 147456 181 0 cdawatch
[Sat Aug 10 12:30:10 2024] [ 2214] 0 2214 18510 288 159744 316 0 rscd
[Sat Aug 10 12:30:10 2024] [ 2228] 0 2228 18572 156 159744 305 0 rscd
[Sat Aug 10 12:30:10 2024] [ 2229] 0 2229 18510 612 163840 288 0 rscd
[Sat Aug 10 12:30:10 2024] [ 2363] 998 2363 502101 984 372736 1592 0 polkitd
[Sat Aug 10 12:30:10 2024] [ 3124] 0 3124 59862 397 380928 3356 0 linux_agent_exe
[Sat Aug 10 12:30:10 2024] [ 3225] 0 3225 99238 1484 499712 3746 0 linux_agent_exe
[Sat Aug 10 12:30:10 2024] [ 4213] 557223 4213 10198 908 118784 1048 0 ubrokerd
[Sat Aug 10 12:30:10 2024] [ 4530] 557223 4530 309311 2079 262144 3104 0 uagsrv
[Sat Aug 10 12:30:10 2024] [ 4532] 0 4532 7175 602 98304 140 0 uemsrv
[Sat Aug 10 12:30:10 2024] [ 7721] 0 7721 37059 16 90112 192 0 samhain
[Sat Aug 10 12:30:10 2024] [ 417367] 0 417367 438156 13290 745472 7381 0 filebeat
[Sat Aug 10 12:30:10 2024] [3549217] 29 3549217 11311 472 126976 115 0 rpc.statd
[Sat Aug 10 12:30:10 2024] [2193516] 737120 2193516 5426819 3917107 37040128 453875 0 java
[Sat Aug 10 12:30:10 2024] [2474009] 0 2474009 476279 44626 2158592 191968 0 cbdaemon
[Sat Aug 10 12:30:10 2024] [2474021] 0 2474021 147431 81 155648 774 0 ECStateEngine
[Sat Aug 10 12:30:10 2024] [2474024] 0 2474024 179458 9380 700416 16968 0 event_collector
[Sat Aug 10 12:30:10 2024] [3133255] 0 3133255 422240 53018 3121152 295662 0 clamd
[Sat Aug 10 12:30:10 2024] [3788871] 0 3788871 6058 373 81920 207 0 crond
[Sat Aug 10 12:30:10 2024] [3788874] 0 3788874 3234 691 65536 88 0 scb_cron.sh
[Sat Aug 10 12:30:10 2024] [3788989] 0 3788989 4079 814 81920 827 0 scb.sh
[Sat Aug 10 12:30:10 2024] [ 419852] 0 419852 4079 461 73728 827 0 scb.sh
[Sat Aug 10 12:30:10 2024] [ 419855] 0 419855 1958 396 53248 0 0 xargs
[Sat Aug 10 12:30:10 2024] [ 419856] 0 419856 5171 105 86016 0 0 sed
[Sat Aug 10 12:30:10 2024] [ 419877] 0 419877 7987 3564 102400 0 0 find
[Sat Aug 10 12:30:10 2024] [ 509763] 737120 509763 3683 787 77824 0 0 git
[Sat Aug 10 12:30:10 2024] [ 509774] 737120 509774 3683 789 73728 0 0 git
[Sat Aug 10 12:30:10 2024] [ 509775] 737120 509775 3683 779 73728 0 0 git
[Sat Aug 10 12:30:10 2024] [ 509776] 737120 509776 3683 784 73728 0 0 git
[Sat Aug 10 12:30:10 2024] [ 509778] 737120 509778 3683 793 69632 0 0 git
[Sat Aug 10 12:30:10 2024] [ 509779] 737120 509779 38022 2221 184320 0 0 git-remote-http
[Sat Aug 10 12:30:10 2024] [ 509780] 737120 509780 3683 784 73728 0 0 git
[Sat Aug 10 12:30:10 2024] [ 509781] 737120 509781 21585 2197 188416 0 0 git-remote-http
[Sat Aug 10 12:30:10 2024] [ 509784] 737120 509784 38022 2222 192512 0 0 git-remote-http
[Sat Aug 10 12:30:10 2024] [ 509789] 737120 509789 3683 784 69632 0 0 git
[Sat Aug 10 12:30:10 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/jenkins.service,task=java,pid=2193516,uid=737120
[Sat Aug 10 12:30:10 2024] Out of memory: Killed process 2193516 (java) total-vm:21707276kB, anon-rss:15668428kB, file-rss:0kB, shmem-rss:0kB, UID:737120 pgtables:36172kB oom_score_adj:0
1. Initiële OOM-trigger
[Sat Aug 10 12:30:10 2024] git invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), order=2, oom_score_adj=0
- Proces: Het
git
-proces veroorzaakte de OOM-killer. - gfp_mask: Dit is een combinatie van flags (
GFP_KERNEL|__GFP_COMP
) die aangeven hoe het geheugen is aangevraagd.GFP_KERNEL
betekent dat het een reguliere kerneltaak is, en__GFP_COMP
betekent dat de allocatie betrekking heeft op samengevoegde pagina’s, zoals bij slab-allocatie. - order=2: Dit duidt op een aanvraag voor een geheugengrootte van 2^2 = 4 pagina’s (4 * 4KB = 16KB).
- oom_score_adj=0: Dit betekent dat het
git
-proces geen aangepaste OOM-score had en een neutrale score kreeg toegewezen.
2. Systeeminformatie
[Sat Aug 10 12:30:10 2024] CPU: 2 PID: 509789 Comm: git Kdump: loaded Not tainted 4.18.0-553.el8_10.x86_64 #1
[Sat Aug 10 12:30:10 2024] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
- CPU, PID, Comm: Het
git
-proces draaide op CPU 2 met proces-ID 509789. - Kernelversie: De kernelversie is 4.18.0-553, wat een stabiele versie van de Linux-kernel is.
- Hardware: Dit draait op een VMware-virtuele machine, wat kan betekenen dat de hardwarelimieten (zoals RAM) strikt worden gecontroleerd door de hypervisor.
3. Stacktrace
[Sat Aug 10 12:30:10 2024] Call Trace:
[Sat Aug 10 12:30:10 2024] dump_stack+0x41/0x60
[Sat Aug 10 12:30:10 2024] dump_header+0x4a/0x1df
[Sat Aug 10 12:30:10 2024] oom_kill_process.cold.33+0xb/0x10
[Sat Aug 10 12:30:10 2024] out_of_memory+0x1bd/0x4e0
[Sat Aug 10 12:30:10 2024] __alloc_pages_slowpath+0xbf0/0xcd0
[Sat Aug 10 12:30:10 2024] __alloc_pages_nodemask+0x2e2/0x330
[Sat Aug 10 12:30:10 2024] new_slab+0x3f4/0x4f0
[Sat Aug 10 12:30:10 2024] ___slab_alloc+0x3a3/0x950
- Stacktrace: Dit toont de kernel-functies die werden aangeroepen. De belangrijke functie hier is
out_of_memory()
, die werd getriggerd door een mislukt geheugentoewijzingsverzoek in__alloc_pages_slowpath()
, wat betekent dat het systeem geen geheugenpagina’s kon toewijzen zoals gevraagd door hetgit
-proces.
4. Geheugeninformatie (Mem-Info)
[Sat Aug 10 12:30:10 2024] Mem-Info:
...
[Sat Aug 10 12:30:10 2024] Free swap = 0kB
[Sat Aug 10 12:30:10 2024] Total swap = 4194300kB
- Geheugenstatus: Het systeem heeft een aanzienlijke hoeveelheid RAM (ongeveer 32GB), maar alle swap-ruimte is uitgeput (Free swap = 0kB).
- Pagina’s in swap cache: Dit suggereert dat het systeem onder zware geheugendruk stond en intensief gebruikmaakte van swap, wat vaak gebeurt als het RAM-geheugen volledig is verbruikt.
- Belangrijke indicator: Het gebrek aan vrije swapruimte is een belangrijke trigger voor de OOM-killer, omdat het systeem geen virtueel geheugen meer kon gebruiken om geheugenaanvragen te vervullen.
5. OOM-killer besluitvorming
[Sat Aug 10 12:30:10 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/jenkins.service,task=java,pid=2193516,uid=737120
[Sat Aug 10 12:30:10 2024] Out of memory: Killed process 2193516 (java) total-vm:21707276kB, anon-rss:15668428kB, file-rss:0kB, shmem-rss:0kB, UID:737120 pgtables:36172kB oom_score_adj:0
- Besluit: De OOM-killer koos ervoor om het
java
-proces (PID 2193516) te beëindigen. Dit proces had een aanzienlijke geheugenvoetafdruk: ongeveer 21GB virtueel geheugen (total-vm
) en bijna 15,7GB anonieme resident set size (anon-rss
), wat meestal niet-geswapped, actief gebruikt geheugen is. - Reden: Het
java
-proces was de grootste geheugenconsument op dat moment en had een neutraleoom_score_adj
waarde (0). Het beëindigen van dit proces maakte een aanzienlijke hoeveelheid geheugen vrij, waardoor het systeem kon blijven functioneren.
Samenvatting
Het git
-proces trachtte geheugen toe te wijzen, maar het systeem had geen vrije geheugenpagina’s of swapruimte meer beschikbaar, waardoor de OOM-killer werd getriggerd. Na het evalueren van de lopende processen en hun geheugenverbruik, besloot de OOM-killer om het java
-proces (PID 2193516) te beëindigen, omdat dit verreweg het meeste geheugen verbruikte. Dit proces werd beëindigd om te voorkomen dat het systeem volledig vast zou lopen.
Het beëindigen van het java
-proces was een noodzakelijke actie om ervoor te zorgen dat het systeem kon blijven functioneren, ondanks de zware geheugendruk.
Word je zelf niet wild van deze hoeveelheid geekiness, wat uiteraard onbegrijpelijk is, maar ben je toch op zoek naar iemand om je Linux servers te beheren?