Analyse van een OOM (Out-Of-Memory) melding

Analyse van een OOM (Out-Of-Memory) melding

In de vorige post werd uitgelegd wat er exact schuilgaat achter een OOM melding van de Linux kernel. Nu gaan we zo een melding diepgaand analyseren.

[Sat Aug 10 12:30:10 2024] git invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), order=2, oom_score_adj=0
[Sat Aug 10 12:30:10 2024] CPU: 2 PID: 509789 Comm: git Kdump: loaded Not tainted 4.18.0-553.el8_10.x86_64 #1
[Sat Aug 10 12:30:10 2024] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
[Sat Aug 10 12:30:10 2024] Call Trace:
[Sat Aug 10 12:30:10 2024]  dump_stack+0x41/0x60
[Sat Aug 10 12:30:10 2024]  dump_header+0x4a/0x1df
[Sat Aug 10 12:30:10 2024]  oom_kill_process.cold.33+0xb/0x10
[Sat Aug 10 12:30:10 2024]  out_of_memory+0x1bd/0x4e0
[Sat Aug 10 12:30:10 2024]  __alloc_pages_slowpath+0xbf0/0xcd0
[Sat Aug 10 12:30:10 2024]  __alloc_pages_nodemask+0x2e2/0x330
[Sat Aug 10 12:30:10 2024]  new_slab+0x3f4/0x4f0
[Sat Aug 10 12:30:10 2024]  ___slab_alloc+0x3a3/0x950
[Sat Aug 10 12:30:10 2024]  ? copy_process+0x215/0x1b50
[Sat Aug 10 12:30:10 2024]  ? obj_cgroup_charge_pages+0xbe/0xe0
[Sat Aug 10 12:30:10 2024]  ? copy_process+0x215/0x1b50
[Sat Aug 10 12:30:10 2024]  kmem_cache_alloc_node+0x253/0x2b0
[Sat Aug 10 12:30:10 2024]  copy_process+0x215/0x1b50
[Sat Aug 10 12:30:10 2024]  _do_fork+0x8b/0x340
[Sat Aug 10 12:30:10 2024]  do_syscall_64+0x5b/0x1b0
[Sat Aug 10 12:30:10 2024]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[Sat Aug 10 12:30:10 2024] RIP: 0033:0x7f89cf78d143
[Sat Aug 10 12:30:10 2024] Code: db 0f 85 28 01 00 00 64 4c 8b 0c 25 10 00 00 00 45 31 c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 b9 00 00 00 41 89 c5 85 c0 0f 85 c6 00 00
[Sat Aug 10 12:30:10 2024] RSP: 002b:00007ffe060bcae0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[Sat Aug 10 12:30:10 2024] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f89cf78d143
[Sat Aug 10 12:30:10 2024] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[Sat Aug 10 12:30:10 2024] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f89d02c6f00
[Sat Aug 10 12:30:10 2024] R10: 00007f89d02c71d0 R11: 0000000000000246 R12: 0000000000000001
[Sat Aug 10 12:30:10 2024] R13: 00007ffe060bcb50 R14: 0000000000000000 R15: 00007ffe060bcb70
[Sat Aug 10 12:30:10 2024] Mem-Info:
[Sat Aug 10 12:30:10 2024] active_anon:3681441 inactive_anon:417474 isolated_anon:0
active_file:30473 inactive_file:16140 isolated_file:0
unevictable:627 dirty:609 writeback:0
slab_reclaimable:1723903 slab_unreclaimable:322374
mapped:27873 shmem:699 pagetables:13466 bounce:0
free:181901 free_pcp:115 free_cma:0
[Sat Aug 10 12:30:10 2024] Node 0 active_anon:14725764kB inactive_anon:1669896kB active_file:121892kB inactive_file:64560kB unevictable:2508kB isolated(anon):0kB isolated(file):0kB mapped:111492kB dirty:2436kB writeback:0kB shmem:2796kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 202752kB writeback_tmp:0kB kernel_stack:19212kB pagetables:53864kB all_unreclaimable? no
[Sat Aug 10 12:30:10 2024] Node 0 DMA free:11264kB min:28kB low:40kB high:52kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Sat Aug 10 12:30:10 2024] lowmem_reserve[]: 0 2706 31835 31835 31835
[Sat Aug 10 12:30:10 2024] Node 0 DMA32 free:119364kB min:5740kB low:8508kB high:11276kB active_anon:274668kB inactive_anon:85252kB active_file:552kB inactive_file:208kB unevictable:0kB writepending:0kB present:3128732kB managed:2800988kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Sat Aug 10 12:30:10 2024] lowmem_reserve[]: 0 0 29128 29128 29128
[Sat Aug 10 12:30:10 2024] Node 0 Normal free:596976kB min:90480kB low:120308kB high:150136kB active_anon:14451096kB inactive_anon:1583872kB active_file:121340kB inactive_file:64352kB unevictable:2508kB writepending:2436kB present:30408704kB managed:29836568kB mlocked:2508kB bounce:0kB free_pcp:460kB local_pcp:12kB free_cma:0kB
[Sat Aug 10 12:30:10 2024] lowmem_reserve[]: 0 0 0 0 0
[Sat Aug 10 12:30:10 2024] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
[Sat Aug 10 12:30:10 2024] Node 0 DMA32: 1081*4kB (UME) 1258*8kB (UME) 1281*16kB (UME) 24*32kB (UME) 872*64kB (UE) 218*128kB (E) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 119364kB
[Sat Aug 10 12:30:10 2024] Node 0 Normal: 75403*4kB (UMEH) 36845*8kB (UEH) 2*16kB (H) 2*32kB (H) 3*64kB (H) 3*128kB (H) 1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 598324kB
[Sat Aug 10 12:30:10 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Sat Aug 10 12:30:10 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Sat Aug 10 12:30:10 2024] 67741 total pagecache pages
[Sat Aug 10 12:30:10 2024] 20408 pages in swap cache
[Sat Aug 10 12:30:10 2024] Swap cache stats: add 53729283, delete 53712173, find 38041573/56230750
[Sat Aug 10 12:30:10 2024] Free swap  = 0kB
[Sat Aug 10 12:30:10 2024] Total swap = 4194300kB
[Sat Aug 10 12:30:10 2024] 8388358 pages RAM
[Sat Aug 10 12:30:10 2024] 0 pages HighMem/MovableOnly
[Sat Aug 10 12:30:10 2024] 225129 pages reserved
[Sat Aug 10 12:30:10 2024] 0 pages hwpoisoned
[Sat Aug 10 12:30:10 2024] Tasks state (memory values in pages):
[Sat Aug 10 12:30:10 2024] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Sat Aug 10 12:30:10 2024] [    969]     0   969    55261     3165   319488     2536             0 systemd-journal
[Sat Aug 10 12:30:10 2024] [   1034]     0  1034    26497     1198   225280      737         -1000 systemd-udevd
[Sat Aug 10 12:30:10 2024] [   1476]    32  1476    16832      696   167936      163             0 rpcbind
[Sat Aug 10 12:30:10 2024] [   1478]     0  1478    38521      801   184320      282         -1000 auditd
[Sat Aug 10 12:30:10 2024] [   1480]     0  1480    23569      670   225280    11443             0 sedispatch
[Sat Aug 10 12:30:10 2024] [   1481]     0  1481     2779      433    65536       62             0 audisp-syslog
[Sat Aug 10 12:30:10 2024] [   1513]     0  1513    12625      624   143360      299             0 smartd
[Sat Aug 10 12:30:10 2024] [   1514]     0  1514    31276      695   147456      131             0 irqbalance
[Sat Aug 10 12:30:10 2024] [   1516]     0  1516    18821     1106   188416      430             0 VGAuthService
[Sat Aug 10 12:30:10 2024] [   1517]     0  1517    90949     1487   339968      385             0 vmtoolsd
[Sat Aug 10 12:30:10 2024] [   1518]     0  1518    94880      951   352256      539             0 abrtd
[Sat Aug 10 12:30:10 2024] [   1526]   994  1526    35594      805   180224      142             0 chronyd
[Sat Aug 10 12:30:10 2024] [   1530]    28  1530   208791     1347   241664      137             0 nscd
[Sat Aug 10 12:30:10 2024] [   1537]    81  1537    17914      949   172032      119          -900 dbus-daemon
[Sat Aug 10 12:30:10 2024] [   1544]     0  1544    95236     1597   376832      923             0 NetworkManager
[Sat Aug 10 12:30:10 2024] [   1550]     0  1550    20312     1141   196608      522             0 systemd-logind
[Sat Aug 10 12:30:10 2024] [   1553]     0  1553   419326     3105   253952     1476             0 ScanAssistant
[Sat Aug 10 12:30:10 2024] [   1607]     0  1607    18651     1078   188416      192         -1000 sshd
[Sat Aug 10 12:30:10 2024] [   1608]     0  1608    66019     4784   331776    26070             0 samhain
[Sat Aug 10 12:30:10 2024] [   1609]     0  1609    28906      387   114688      112             0 rhsmcertd
[Sat Aug 10 12:30:10 2024] [   1620]     0  1620    43220      496   200704      719             0 gssproxy
[Sat Aug 10 12:30:10 2024] [   1623]     0  1623   102243     2184   409600     3455             0 tuned
[Sat Aug 10 12:30:10 2024] [   1652]     0  1652    13977      490   135168      156             0 rpc.gssd
[Sat Aug 10 12:30:10 2024] [   1664]     0  1664     6005      602    86016      162             0 crond
[Sat Aug 10 12:30:10 2024] [   1677]     0  1677     6126      308    98304        0             0 atd
[Sat Aug 10 12:30:10 2024] [   1711]     0  1711     1217      201    57344        0             0 agetty
[Sat Aug 10 12:30:10 2024] [   1726] 530229  1726    11910     1511   126976      519             0 SgUx_LogD.pl
[Sat Aug 10 12:30:10 2024] [   1757]     0  1757   399181    11373   675840     8753             0 metricbeat
[Sat Aug 10 12:30:10 2024] [   1761]     0  1761   119583      565   380928      313             0 rsyslogd
[Sat Aug 10 12:30:10 2024] [   1770]   993  1770    49548     1904   282624      322             0 freshclam
[Sat Aug 10 12:30:10 2024] [   1879]     0  1879    44904     7919   303104     1131             0 adclient
[Sat Aug 10 12:30:10 2024] [   1883]     0  1883    25483      345   110592      101             0 cdcwatch
[Sat Aug 10 12:30:10 2024] [   1914]     0  1914    14974     1546   167936     2191             0 SgUx_AutoAction
[Sat Aug 10 12:30:10 2024] [   1933] 530229  1933    17680     4845   184320      536             0 SgUx_LogScanD.p
[Sat Aug 10 12:30:10 2024] [   1946]     0  1946   255409     2146   270336      565             0 dad
[Sat Aug 10 12:30:10 2024] [   1978]     0  1978   130371     3561   225280     2292             0 BESClient
[Sat Aug 10 12:30:10 2024] [   2166]     0  2166    32485      484   147456      181             0 cdawatch
[Sat Aug 10 12:30:10 2024] [   2214]     0  2214    18510      288   159744      316             0 rscd
[Sat Aug 10 12:30:10 2024] [   2228]     0  2228    18572      156   159744      305             0 rscd
[Sat Aug 10 12:30:10 2024] [   2229]     0  2229    18510      612   163840      288             0 rscd
[Sat Aug 10 12:30:10 2024] [   2363]   998  2363   502101      984   372736     1592             0 polkitd
[Sat Aug 10 12:30:10 2024] [   3124]     0  3124    59862      397   380928     3356             0 linux_agent_exe
[Sat Aug 10 12:30:10 2024] [   3225]     0  3225    99238     1484   499712     3746             0 linux_agent_exe
[Sat Aug 10 12:30:10 2024] [   4213] 557223  4213    10198      908   118784     1048             0 ubrokerd
[Sat Aug 10 12:30:10 2024] [   4530] 557223  4530   309311     2079   262144     3104             0 uagsrv
[Sat Aug 10 12:30:10 2024] [   4532]     0  4532     7175      602    98304      140             0 uemsrv
[Sat Aug 10 12:30:10 2024] [   7721]     0  7721    37059       16    90112      192             0 samhain
[Sat Aug 10 12:30:10 2024] [ 417367]     0 417367   438156    13290   745472     7381             0 filebeat
[Sat Aug 10 12:30:10 2024] [3549217]    29 3549217    11311      472   126976      115             0 rpc.statd
[Sat Aug 10 12:30:10 2024] [2193516] 737120 2193516  5426819  3917107 37040128   453875             0 java
[Sat Aug 10 12:30:10 2024] [2474009]     0 2474009   476279    44626  2158592   191968             0 cbdaemon
[Sat Aug 10 12:30:10 2024] [2474021]     0 2474021   147431       81   155648      774             0 ECStateEngine
[Sat Aug 10 12:30:10 2024] [2474024]     0 2474024   179458     9380   700416    16968             0 event_collector
[Sat Aug 10 12:30:10 2024] [3133255]     0 3133255   422240    53018  3121152   295662             0 clamd
[Sat Aug 10 12:30:10 2024] [3788871]     0 3788871     6058      373    81920      207             0 crond
[Sat Aug 10 12:30:10 2024] [3788874]     0 3788874     3234      691    65536       88             0 scb_cron.sh
[Sat Aug 10 12:30:10 2024] [3788989]     0 3788989     4079      814    81920      827             0 scb.sh
[Sat Aug 10 12:30:10 2024] [ 419852]     0 419852     4079      461    73728      827             0 scb.sh
[Sat Aug 10 12:30:10 2024] [ 419855]     0 419855     1958      396    53248        0             0 xargs
[Sat Aug 10 12:30:10 2024] [ 419856]     0 419856     5171      105    86016        0             0 sed
[Sat Aug 10 12:30:10 2024] [ 419877]     0 419877     7987     3564   102400        0             0 find
[Sat Aug 10 12:30:10 2024] [ 509763] 737120 509763     3683      787    77824        0             0 git
[Sat Aug 10 12:30:10 2024] [ 509774] 737120 509774     3683      789    73728        0             0 git
[Sat Aug 10 12:30:10 2024] [ 509775] 737120 509775     3683      779    73728        0             0 git
[Sat Aug 10 12:30:10 2024] [ 509776] 737120 509776     3683      784    73728        0             0 git
[Sat Aug 10 12:30:10 2024] [ 509778] 737120 509778     3683      793    69632        0             0 git
[Sat Aug 10 12:30:10 2024] [ 509779] 737120 509779    38022     2221   184320        0             0 git-remote-http
[Sat Aug 10 12:30:10 2024] [ 509780] 737120 509780     3683      784    73728        0             0 git
[Sat Aug 10 12:30:10 2024] [ 509781] 737120 509781    21585     2197   188416        0             0 git-remote-http
[Sat Aug 10 12:30:10 2024] [ 509784] 737120 509784    38022     2222   192512        0             0 git-remote-http
[Sat Aug 10 12:30:10 2024] [ 509789] 737120 509789     3683      784    69632        0             0 git
[Sat Aug 10 12:30:10 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/jenkins.service,task=java,pid=2193516,uid=737120
[Sat Aug 10 12:30:10 2024] Out of memory: Killed process 2193516 (java) total-vm:21707276kB, anon-rss:15668428kB, file-rss:0kB, shmem-rss:0kB, UID:737120 pgtables:36172kB oom_score_adj:0

1. Initiële OOM-trigger

[Sat Aug 10 12:30:10 2024] git invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), order=2, oom_score_adj=0
  • Proces: Het git-proces veroorzaakte de OOM-killer.
  • gfp_mask: Dit is een combinatie van flags (GFP_KERNEL|__GFP_COMP) die aangeven hoe het geheugen is aangevraagd. GFP_KERNEL betekent dat het een reguliere kerneltaak is, en __GFP_COMP betekent dat de allocatie betrekking heeft op samengevoegde pagina’s, zoals bij slab-allocatie.
  • order=2: Dit duidt op een aanvraag voor een geheugengrootte van 2^2 = 4 pagina’s (4 * 4KB = 16KB).
  • oom_score_adj=0: Dit betekent dat het git-proces geen aangepaste OOM-score had en een neutrale score kreeg toegewezen.

2. Systeeminformatie

[Sat Aug 10 12:30:10 2024] CPU: 2 PID: 509789 Comm: git Kdump: loaded Not tainted 4.18.0-553.el8_10.x86_64 #1
[Sat Aug 10 12:30:10 2024] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023
  • CPU, PID, Comm: Het git-proces draaide op CPU 2 met proces-ID 509789.
  • Kernelversie: De kernelversie is 4.18.0-553, wat een stabiele versie van de Linux-kernel is.
  • Hardware: Dit draait op een VMware-virtuele machine, wat kan betekenen dat de hardwarelimieten (zoals RAM) strikt worden gecontroleerd door de hypervisor.

3. Stacktrace

[Sat Aug 10 12:30:10 2024] Call Trace:
[Sat Aug 10 12:30:10 2024]  dump_stack+0x41/0x60
[Sat Aug 10 12:30:10 2024]  dump_header+0x4a/0x1df
[Sat Aug 10 12:30:10 2024]  oom_kill_process.cold.33+0xb/0x10
[Sat Aug 10 12:30:10 2024]  out_of_memory+0x1bd/0x4e0
[Sat Aug 10 12:30:10 2024]  __alloc_pages_slowpath+0xbf0/0xcd0
[Sat Aug 10 12:30:10 2024]  __alloc_pages_nodemask+0x2e2/0x330
[Sat Aug 10 12:30:10 2024]  new_slab+0x3f4/0x4f0
[Sat Aug 10 12:30:10 2024]  ___slab_alloc+0x3a3/0x950
  • Stacktrace: Dit toont de kernel-functies die werden aangeroepen. De belangrijke functie hier is out_of_memory(), die werd getriggerd door een mislukt geheugentoewijzingsverzoek in __alloc_pages_slowpath(), wat betekent dat het systeem geen geheugenpagina’s kon toewijzen zoals gevraagd door het git-proces.

4. Geheugeninformatie (Mem-Info)

[Sat Aug 10 12:30:10 2024] Mem-Info:
...
[Sat Aug 10 12:30:10 2024] Free swap  = 0kB
[Sat Aug 10 12:30:10 2024] Total swap = 4194300kB
  • Geheugenstatus: Het systeem heeft een aanzienlijke hoeveelheid RAM (ongeveer 32GB), maar alle swap-ruimte is uitgeput (Free swap = 0kB).
  • Pagina’s in swap cache: Dit suggereert dat het systeem onder zware geheugendruk stond en intensief gebruikmaakte van swap, wat vaak gebeurt als het RAM-geheugen volledig is verbruikt.
  • Belangrijke indicator: Het gebrek aan vrije swapruimte is een belangrijke trigger voor de OOM-killer, omdat het systeem geen virtueel geheugen meer kon gebruiken om geheugenaanvragen te vervullen.

5. OOM-killer besluitvorming

[Sat Aug 10 12:30:10 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/jenkins.service,task=java,pid=2193516,uid=737120
[Sat Aug 10 12:30:10 2024] Out of memory: Killed process 2193516 (java) total-vm:21707276kB, anon-rss:15668428kB, file-rss:0kB, shmem-rss:0kB, UID:737120 pgtables:36172kB oom_score_adj:0
  • Besluit: De OOM-killer koos ervoor om het java-proces (PID 2193516) te beëindigen. Dit proces had een aanzienlijke geheugenvoetafdruk: ongeveer 21GB virtueel geheugen (total-vm) en bijna 15,7GB anonieme resident set size (anon-rss), wat meestal niet-geswapped, actief gebruikt geheugen is.
  • Reden: Het java-proces was de grootste geheugenconsument op dat moment en had een neutrale oom_score_adj waarde (0). Het beëindigen van dit proces maakte een aanzienlijke hoeveelheid geheugen vrij, waardoor het systeem kon blijven functioneren.

Samenvatting

Het git-proces trachtte geheugen toe te wijzen, maar het systeem had geen vrije geheugenpagina’s of swapruimte meer beschikbaar, waardoor de OOM-killer werd getriggerd. Na het evalueren van de lopende processen en hun geheugenverbruik, besloot de OOM-killer om het java-proces (PID 2193516) te beëindigen, omdat dit verreweg het meeste geheugen verbruikte. Dit proces werd beëindigd om te voorkomen dat het systeem volledig vast zou lopen.

Het beëindigen van het java-proces was een noodzakelijke actie om ervoor te zorgen dat het systeem kon blijven functioneren, ondanks de zware geheugendruk.

Word je zelf niet wild van deze hoeveelheid geekiness, wat uiteraard onbegrijpelijk is, maar ben je toch op zoek naar iemand om je Linux servers te beheren?

Jouw feedback

Chatten
1
💬 Meer info?
Hallo 👋🏼
Kan ik een vraag voor jou beantwoorden?