From patchwork Fri Nov 18 12:40:09 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ding Tianhong
X-Patchwork-Id: 82880
From: Ding Tianhong
Subject: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic
To: , , , , , "linux-kernel@vger.kernel.org"
Message-ID: <635ca612-370c-b6e4-7f2a-cba702dd0c4a@huawei.com>
Date: Fri, 18 Nov 2016 20:40:09 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
introduced a new problem: when a huge number of abnormal IP packets
arrive, the kernel may run out of memory and crash, as shown below:

[ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
[ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
[ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
[ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
[ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
[ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
[ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
[ 100.067050] Call Trace:
[ 100.067057] [] dump_stack+0x19/0x1b
[ 100.067062] [] warn_alloc_failed+0x110/0x180
[ 100.067066] [] __alloc_pages_nodemask+0x9b6/0xba0
[ 100.067070] [] ? skb_add_rx_frag+0x90/0xb0
[ 100.067075] [] alloc_pages_current+0xaa/0x170
[ 100.067080] [] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
[ 100.067083] [] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
[ 100.067086] [] ? __netif_receive_skb+0x18/0x60
[ 100.067088] [] ? netif_receive_skb+0x40/0xc0
[ 100.067092] [] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
[ 100.067095] [] ? list_del+0xd/0x30
[ 100.067098] [] ? __napi_complete+0x1f/0x30
[ 100.067101] [] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
[ 100.067103] [] net_rx_action+0x152/0x240
[ 100.067107] [] __do_softirq+0xef/0x280
[ 100.067109] [] run_ksoftirqd+0x30/0x50
[ 100.067114] [] smpboot_thread_fn+0xff/0x1a0
[ 100.067117] [] ? schedule+0x29/0x70
[ 100.067120] [] ? lg_double_unlock+0x90/0x90
[ 100.067122] [] kthread+0xcf/0xe0
[ 100.067124] [] ? kthread_create_on_node+0x140/0x140
[ 100.067127] [] ret_from_fork+0x58/0x90
[ 100.067129] [] ? kthread_create_on_node+0x140/0x140

-- 
1.9.0

================================cut here=====================================

The reason is that the flood of abnormal IP packets is received by the
network stack and finally dropped via dst_release(), which relies on the
rcuos callback-offload kthread to free the packets. However,
cond_resched_rcu_qs() will call do_softirq(), which receives more and
more abnormal IP packets, and these are in turn queued as RCU callbacks
later. The number of packets received is much greater than the number
of packets freed, so memory is exhausted and the kernel hits OOM.
Not processing any pending softirqs in the rcuos callback-offload
kthread is therefore a more effective solution.

Fixes: bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
Signed-off-by: Ding Tianhong
---
 kernel/rcu/tree_plugin.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 85c5a88..760c3b5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
 			if (__rcu_reclaim(rdp->rsp->name, list))
 				cl++;
 			c++;
-			local_bh_enable();
-			cond_resched_rcu_qs();
+			_local_bh_enable();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
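
For illustration only, and not part of the patch: the imbalance described in the commit message (more packets queued per callback than freed) can be sketched with a toy user-space model. The function name simulate_backlog(), its parameters, and the rates used are all hypothetical; the code calls no kernel API and ignores packets received outside the kthread.

```c
#include <assert.h>

/*
 * Toy user-space model of the backlog described in the commit message.
 * Hypothetical names and made-up rates; nothing here is a kernel API.
 *
 * Each loop iteration stands for one RCU callback invoked by the
 * callback-offload kthread, which frees one queued object. If softirqs
 * are processed between callbacks, `arrivals_per_softirq` new objects
 * are queued before the next callback runs.
 */
long simulate_backlog(long initial, long callbacks,
		      long arrivals_per_softirq, int run_softirqs)
{
	long backlog = initial;
	long i;

	assert(initial >= 0 && callbacks >= 0 && arrivals_per_softirq >= 0);

	for (i = 0; i < callbacks && backlog > 0; i++) {
		backlog--;	/* one callback frees one object */
		if (run_softirqs)
			backlog += arrivals_per_softirq; /* RX queues more */
	}
	return backlog;
}
```

In this model, whenever arrivals_per_softirq is greater than 1 and softirqs run between callbacks, the backlog grows with every callback processed; with run_softirqs set to 0 the same backlog drains to zero.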