theone Posted October 19, 2012 Share Posted October 19, 2012 Hello, I have been experiencing "page allocaion failure" errors on my unRAID server when running a Virtual Machine on VirtualBox for unRAID (http://lime-technology.com/forum/index.php?topic=10978.0). I know that my problem is a result of VirtualBox because when the Virtual Machine is disabled/stopped the errors don't reoccur. The errors would be very frequent every few minutes. After implementing the solution described further down the problem seems to have gone (running without errors for ~2 days). I wanted to share my experience so others with such a problem (or similar not just because of VirtualBox) can get help. The errors I was getting were formatted as below: Oct 17 14:01:57 Tower kernel: swapper/2: page allocation failure: order:2, mode:0x4020 Oct 17 14:01:57 Tower kernel: Pid: 0, comm: swapper/2 Tainted: G W O 3.4.11-unRAID #1 Oct 17 14:01:57 Tower kernel: Call Trace: Oct 17 14:01:57 Tower kernel: [<c1062eb6>] warn_alloc_failed+0xbd/0xcf Oct 17 14:01:57 Tower kernel: [<c106372e>] __alloc_pages_nodemask+0x47c/0x4a5 Oct 17 14:01:57 Tower kernel: [<c12a12e0>] ? skb_copy+0x2e/0x83 Oct 17 14:01:57 Tower kernel: [<c10637ad>] __get_free_pages+0xf/0x21 Oct 17 14:01:57 Tower kernel: [<c10831e8>] __kmalloc_track_caller+0x2e/0xf2 Oct 17 14:01:57 Tower kernel: [<c12a0807>] ? __alloc_skb+0x2a/0xf5 Oct 17 14:01:57 Tower kernel: [<c12a082d>] __alloc_skb+0x50/0xf5 Oct 17 14:01:57 Tower kernel: [<c12a12e0>] skb_copy+0x2e/0x83 Oct 17 14:01:57 Tower kernel: [<f85ec612>] vboxNetFltLinuxPacketHandler+0x60/0x4e8 [vboxnetflt] Oct 17 14:01:57 Tower kernel: [<c104530f>] ? getnstimeofday+0x4d/0xd0 Oct 17 14:01:57 Tower kernel: [<c10453a4>] ? ktime_get_real+0x12/0x2e Oct 17 14:01:57 Tower kernel: [<c12a72f1>] dev_queue_xmit_nit+0x13b/0x144 Oct 17 14:01:57 Tower kernel: [<c12a76d8>] dev_hard_start_xmit+0x5c/0x31b Oct 17 14:01:57 Tower kernel: [<c12a7c10>] ? dev_queue_xmit+0x279/0x286 Oct 17 14:01:57 Tower kernel: [<c12b6184>] sch_direct_xmit+0x54/0x13f Oct 17 14:01:57 Tower kernel: [<c12a7a95>] dev_queue_xmit+0xfe/0x286 Oct 17 14:01:57 Tower kernel: [<c12ae7a1>] neigh_resolve_output+0x135/0x15b Oct 17 14:01:57 Tower kernel: [<c1062a0d>] ? free_pages_prepare+0xb4/0xb6 Oct 17 14:01:57 Tower kernel: [<c12b5685>] ? eth_rebuild_header+0x53/0x53 Oct 17 14:01:57 Tower kernel: [<c12c3504>] ip_finish_output+0x22d/0x258 Oct 17 14:01:57 Tower kernel: [<c12c35c2>] ip_output+0x93/0x9a Oct 17 14:01:57 Tower kernel: [<c12c2a5a>] ip_local_out+0x1b/0x1e Oct 17 14:01:57 Tower kernel: [<c12c2f3e>] ip_queue_xmit+0x2ad/0x2ee Oct 17 14:01:57 Tower kernel: [<c129e519>] ? __skb_clone+0x22/0xbf Oct 17 14:01:57 Tower kernel: [<c12d27b6>] tcp_transmit_skb+0x4d3/0x508 Oct 17 14:01:57 Tower kernel: [<c12d4993>] tcp_write_xmit+0x2f2/0x3ec Oct 17 14:01:57 Tower kernel: [<c12d4ad1>] __tcp_push_pending_frames+0x18/0x6f Oct 17 14:01:57 Tower kernel: [<c12d1481>] tcp_rcv_established+0xfa/0x575 Oct 17 14:01:57 Tower kernel: [<c12d6978>] tcp_v4_do_rcv+0x47/0x13a Oct 17 14:01:57 Tower kernel: [<c12d6e4e>] tcp_v4_rcv+0x3e3/0x665 Oct 17 14:01:57 Tower kernel: [<c12bf240>] ip_local_deliver_finish+0xba/0x192 Oct 17 14:01:57 Tower kernel: [<c12bf379>] ip_local_deliver+0x61/0x66 Oct 17 14:01:57 Tower kernel: [<c12bef01>] ip_rcv_finish+0x23d/0x253 Oct 17 14:01:57 Tower kernel: [<c12bf150>] ip_rcv+0x239/0x26f Oct 17 14:01:57 Tower kernel: [<c12a5189>] __netif_receive_skb+0x223/0x259 Oct 17 14:01:57 Tower kernel: [<c12a6767>] netif_receive_skb+0x66/0x6c Oct 17 14:01:57 Tower kernel: [<c12a682b>] napi_skb_finish+0x1e/0x34 Oct 17 14:01:57 Tower kernel: [<c12a6c98>] napi_gro_receive+0xe7/0xef Oct 17 14:01:57 Tower kernel: [<f84463ba>] e1000_receive_skb+0x3c/0x42 [e1000e] Oct 17 14:01:57 Tower kernel: [<f8446628>] e1000_clean_rx_irq+0x268/0x30e [e1000e] Oct 17 14:01:57 Tower kernel: [<f8448995>] e1000_clean+0x6a/0x211 [e1000e] Oct 17 14:01:57 Tower kernel: [<c12a6d6e>] net_rx_action+0x59/0x12c Oct 17 14:01:57 Tower kernel: [<c1027056>] __do_softirq+0x6b/0xe5 Oct 17 14:01:57 Tower kernel: [<c1026feb>] ? irq_enter+0x41/0x41 Oct 17 14:01:57 Tower kernel: <IRQ> [<c1026e9f>] ? irq_exit+0x32/0x58 Oct 17 14:01:57 Tower kernel: [<c1003506>] ? do_IRQ+0x7c/0x90 Oct 17 14:01:57 Tower kernel: [<c13208e9>] ? common_interrupt+0x29/0x30 Oct 17 14:01:57 Tower kernel: [<c10081ed>] ? mwait_idle+0x4a/0x50 Oct 17 14:01:57 Tower kernel: [<c10083e8>] ? cpu_idle+0x4b/0x65 Oct 17 14:01:57 Tower kernel: [<c131b5f9>] ? start_secondary+0x9a/0x9c Oct 17 14:01:57 Tower kernel: Mem-Info: Oct 17 14:01:57 Tower kernel: DMA per-cpu: Oct 17 14:01:57 Tower kernel: CPU 0: hi: 0, btch: 1 usd: 0 Oct 17 14:01:57 Tower kernel: CPU 1: hi: 0, btch: 1 usd: 0 Oct 17 14:01:57 Tower kernel: CPU 2: hi: 0, btch: 1 usd: 0 Oct 17 14:01:57 Tower kernel: CPU 3: hi: 0, btch: 1 usd: 0 Oct 17 14:01:57 Tower kernel: Normal per-cpu: Oct 17 14:01:57 Tower kernel: CPU 0: hi: 186, btch: 31 usd: 179 Oct 17 14:01:57 Tower kernel: CPU 1: hi: 186, btch: 31 usd: 171 Oct 17 14:01:57 Tower kernel: CPU 2: hi: 186, btch: 31 usd: 184 Oct 17 14:01:57 Tower kernel: CPU 3: hi: 186, btch: 31 usd: 143 Oct 17 14:01:57 Tower kernel: HighMem per-cpu: Oct 17 14:01:57 Tower kernel: CPU 0: hi: 186, btch: 31 usd: 112 Oct 17 14:01:57 Tower kernel: CPU 1: hi: 186, btch: 31 usd: 92 Oct 17 14:01:57 Tower kernel: CPU 2: hi: 186, btch: 31 usd: 137 Oct 17 14:01:57 Tower kernel: CPU 3: hi: 186, btch: 31 usd: 145 Oct 17 14:01:57 Tower kernel: active_anon:61918 inactive_anon:38 isolated_anon:0 Oct 17 14:01:57 Tower kernel: active_file:100718 inactive_file:184117 isolated_file:0 Oct 17 14:01:57 Tower kernel: unevictable:234794 dirty:176 writeback:0 unstable:0 Oct 17 14:01:57 Tower kernel: free:102653 slab_reclaimable:9203 slab_unreclaimable:115250 Oct 17 14:01:57 Tower kernel: mapped:228658 shmem:54 pagetables:1054 bounce:0 Oct 17 14:01:57 Tower kernel: DMA free:3500kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15780kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:176kB slab_unreclaimable:12144kB kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Oct 17 14:01:57 Tower kernel: lowmem_reserve[]: 0 869 4039 4039 Oct 17 14:01:57 Tower kernel: Normal free:248680kB min:3736kB low:4668kB high:5604kB active_anon:30716kB inactive_anon:4kB active_file:19120kB inactive_file:19188kB unevictable:9908kB isolated(anon):0kB isolated(file):0kB present:890008kB mlocked:0kB dirty:16kB writeback:0kB mapped:2544kB shmem:4kB slab_reclaimable:36636kB slab_unreclaimable:448856kB kernel_stack:1808kB pagetables:172kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:19174 all_unreclaimable? no Oct 17 14:01:57 Tower kernel: lowmem_reserve[]: 0 0 25363 25363 Oct 17 14:01:57 Tower kernel: HighMem free:158432kB min:512kB low:3920kB high:7328kB active_anon:216956kB inactive_anon:148kB active_file:383752kB inactive_file:717280kB unevictable:929268kB isolated(anon):0kB isolated(file):0kB present:3246564kB mlocked:0kB dirty:688kB writeback:0kB mapped:912088kB shmem:212kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:4044kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 17 14:01:57 Tower kernel: lowmem_reserve[]: 0 0 0 0 Oct 17 14:01:57 Tower kernel: DMA: 1*4kB 31*8kB 45*16kB 27*32kB 18*64kB 4*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3500kB Oct 17 14:01:57 Tower kernel: Normal: 21762*4kB 20168*8kB 1*16kB 4*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 248664kB Oct 17 14:01:57 Tower kernel: HighMem: 3568*4kB 1022*8kB 557*16kB 101*32kB 1689*64kB 113*128kB 5*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 158432kB Oct 17 14:01:57 Tower kernel: 519691 total pagecache pages Oct 17 14:01:57 Tower kernel: 0 pages in swap cache Oct 17 14:01:57 Tower kernel: Swap cache stats: add 0, delete 0, find 0/0 Oct 17 14:01:57 Tower kernel: Free swap = 0kB Oct 17 14:01:57 Tower kernel: Total swap = 0kB Oct 17 14:01:57 Tower kernel: 1310704 pages RAM Oct 17 14:01:57 Tower kernel: 1082370 pages HighMem Oct 17 14:01:57 Tower kernel: 490407 pages reserved Oct 17 14:01:57 Tower kernel: 304658 pages shared Oct 17 14:01:57 Tower kernel: 430686 pages non-shared After some investigation on the internet I came across the following site that describes the "page allocation failure" log output: http://www.linuxsmiths.com/blog/?p=527 Reading through it I found that the "trace" output is where correct information can be gathered about the source of the error, which in my case is "vboxNetFltLinuxPacketHandler" and the "e1000" intel ethernet driver. Further in the article it is mentioned that a fast (1GIG ethernet connection) can exhaust quite fast the memory resources (TX/RX buffer) and cause this error. The given answer was changes the minimum free memory that the OS always leaves free for fast allocation. This can be done by increasing /proc/sys/vm/min_free_kbytes. Again turning to the internet I found several mentions of "page allocation failure" related to "vboxNetFltLinuxPacketHandler": https://www.virtualbox.org/ticket/5260 https://www.virtualbox.org/ticket/10108 There also I found reference to vm.min_free_kbytes Searching the internet once again for min_free_kbytes I can accross the following link: http://serverfault.com/questions/236170/page-allocation-failure-am-i-running-out-of-memory #change value for this boot sysctl -w vm.min_free_kbytes=65536 #change value for subsequent boots echo "vm.min_free_kbytes=65536" >> /etc/sysctl.conf and This tells the kernel to try and keep 64MB of RAM free at all times. It’s useful in two main cases: * Swap-less machines, where you don’t want incoming network traffic to overwhelm the kernel and force an OOM before it has time to flush any buffers. * x86 machines, for the same reason: the x86 architecture only allows DMA transfers below approximately 900MB of RAM. So you can end up with the bizarre situation of an OOM error with tons of RAM free. I do not have a swap file. Looking at unRAID kernel settings I saw that the min_free_kbytes is only 3806 which is ~3.7MB - seems very low. I added the following to my go script (after array is up) and reboot: #Increase v.min_free_kbytes to 64MB (default was 3806=3.71MB) for better immunity to "page allocation failure" sysctl -w vm.min_free_kbytes=65536 Since then (2 days now - October 17 2012) I have had no "page allocation errors" as of yet. Hope this helps someone. Quote Link to comment
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.