From patchwork Sat Jun 25 02:46:51 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ding Tianhong
X-Patchwork-Id: 70845
Delivered-To: patch@linaro.org
To: , , , Eric Dumazet , "David S. Miller" , Netdev
From: Ding Tianhong
Subject: [PATCH] notifier: Fix soft lockup for notifier_call_chain().
Message-ID: <576DF09B.6010406@huawei.com>
Date: Sat, 25 Jun 2016 10:46:51 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

The problem occurs on my system when many drivers register their own
handlers on the netdev_chain notifier call chain, 4095 VLAN devices are
then created on one NIC, and several IPv6 addresses are added to each of
them, like this:

for i in `seq 1 4095`; do ip link add link eth0 name eth0.$i type vlan id $i; done
for i in `seq 1 4095`; do ip -6 addr add 2001::$i dev eth0.$i; done
for i in `seq 1 4095`; do ip -6 addr add 2002::$i dev eth0.$i; done
for i in `seq 1 4095`; do ip -6 addr add 2003::$i dev eth0.$i; done

ifconfig eth0 up
ifconfig eth0 down

The system then stalls for several seconds and a soft lockup is reported:

<0>[ 7620.364058]NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ifconfig:19186]
<0>[ 7620.364592]Call trace:
<4>[ 7620.364599][] dump_backtrace+0x0/0x220
<4>[ 7620.364603][] show_stack+0x20/0x28
<4>[ 7620.364607][] dump_stack+0x90/0xb0
<4>[ 7620.364612][] watchdog_timer_fn+0x41c/0x460
<4>[ 7620.364617][] __run_hrtimer+0x98/0x2d8
<4>[ 7620.364620][] hrtimer_interrupt+0x110/0x288
<4>[ 7620.364624][] arch_timer_handler_phys+0x38/0x48
<4>[ 7620.364628][] handle_percpu_devid_irq+0x9c/0x190
<4>[ 7620.364632][] generic_handle_irq+0x40/0x58
<4>[ 7620.364635][] __handle_domain_irq+0x68/0xc0
<4>[ 7620.364638][] gic_handle_irq+0xc4/0x1c8
<4>[ 7620.364641]Exception stack(0xffffffc0309b3640 to 0xffffffc0309b3770)
<4>[ 7620.364644]3640: 0000000000001000 0000000000000000 ffffffc0309b37c0 ffffffbfa1019cf8
<4>[ 7620.364647]3660: 0000000080000145 ffffffc0309b3958 0000000000000000 ffffffbfa1013008
<4>[ 7620.364651]3680: 00000000000007f0 ffffffbfa131b770 ffffffd08aaadc40 ffffffbfa1019cf8
<4>[ 7620.364654]36a0: ffffffbfa1019cc4 ffffffd089c2b000 ffffffd08eff8000 ffffffc0309b3958
<4>[ 7620.364656]36c0: ffffffbfa101c5c0 0000000000000000 0000000000000000 ffffffbfa101c66c
<4>[ 7620.364659]36e0: 7f7f7f7f7f7f7f7f 0000000000000030 ffffffffffffffff ffff000000000000
<4>[ 7620.364662]3700: 0000000000000000 0000000000000000 ffffffc000393d58 0000007f794d67b0
<4>[ 7620.364665]3720: 0000007fe62215d0 ffffffc0309b3830 ffffffc00021d8e0 ffffffbfa1049b68
<4>[ 7620.364668]3740: ffffffc000697578 ffffffc0006974b8 ffffffc0309b3958 0000000000000000
<4>[ 7620.364670]3760: ffffffbfa1013008 00000000000007f0
<4>[ 7620.364673][] el1_irq+0x80/0x100
<4>[ 7620.364692][] fib6_walk+0x3c/0x70 [ipv6]
<4>[ 7620.364710][] fib6_clean_tree+0x68/0x90 [ipv6]
<4>[ 7620.364727][] __fib6_clean_all+0x88/0xc0 [ipv6]
<4>[ 7620.364746][] fib6_clean_all+0x28/0x30 [ipv6]
<4>[ 7620.364763][] rt6_ifdown+0x64/0x148 [ipv6]
<4>[ 7620.364781][] addrconf_ifdown+0x68/0x540 [ipv6]
<4>[ 7620.364798][] addrconf_notify+0xd0/0x8b8 [ipv6]
<4>[ 7620.364801][] notifier_call_chain+0x5c/0xa0
<4>[ 7620.364804][] raw_notifier_call_chain+0x20/0x28
<4>[ 7620.364809][] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7620.364812][] dev_close_many+0xd0/0x138
<4>[ 7620.364821][] vlan_device_event+0x4a8/0x6a0 [8021q]
<4>[ 7620.364824][] notifier_call_chain+0x5c/0xa0
<4>[ 7620.364827][] raw_notifier_call_chain+0x20/0x28
<4>[ 7620.364830][] call_netdevice_notifiers_info+0x4c/0x80
<4>[ 7620.364833][] __dev_notify_flags+0xb8/0xe0
<4>[ 7620.364836][] dev_change_flags+0x54/0x68
<4>[ 7620.364840][] devinet_ioctl+0x650/0x700
<4>[ 7620.364843][] inet_ioctl+0xa4/0xc8
<4>[ 7620.364847][] sock_do_ioctl+0x44/0x88
<4>[ 7620.364850][] sock_ioctl+0x23c/0x308
<4>[ 7620.364854][] do_vfs_ioctl+0x48c/0x620
<4>[ 7620.364857][] SyS_ioctl+0x94/0xa8

-- 
1.9.0

=================================cut here========================================

It looks like notifier_call_chain() has to deal with too many handlers
and does not feed the watchdog until it has finished all of the work.
Add cond_resched() in the loop to fix this problem, after which the
system does not panic again.

Signed-off-by: Ding Tianhong
---
 kernel/notifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index fd2c9ac..9c30411 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -92,6 +92,8 @@ static int notifier_call_chain(struct notifier_block **nl,
 #endif
 		ret = nb->notifier_call(nb, val, v);
 
+		cond_resched();
+
 		if (nr_calls)
 			(*nr_calls)++;
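
For readers who want to see the shape of the loop touched by the diff
above without building a kernel, here is a minimal userspace sketch. It
is not the kernel implementation: chain_register(), call_chain(),
count_event() and demo_count_calls() are hypothetical names chosen for
this illustration, and sched_yield() is only a userspace stand-in for
cond_resched(), assumed here to play the same role of offering the CPU
to other tasks after each handler returns.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Simplified userspace model of a notifier block (illustration only). */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long val, void *v);
	struct notifier_block *next;
	int priority;
};

/* Insert n into the chain, keeping higher priorities first. */
static int chain_register(struct notifier_block **nl,
			  struct notifier_block *n)
{
	while (*nl != NULL) {
		if (n->priority > (*nl)->priority)
			break;
		nl = &(*nl)->next;
	}
	n->next = *nl;
	*nl = n;
	return 0;
}

/* Walk the chain, yielding the CPU after every handler. */
static int call_chain(struct notifier_block **nl, unsigned long val,
		      void *v, int *nr_calls)
{
	int ret = NOTIFY_DONE;
	struct notifier_block *nb = *nl, *next_nb;

	while (nb) {
		next_nb = nb->next;
		ret = nb->notifier_call(nb, val, v);

		sched_yield();	/* stand-in for cond_resched() */

		if (nr_calls)
			(*nr_calls)++;
		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
			break;
		nb = next_nb;
	}
	return ret;
}

/* Hypothetical handler: counts how many times it was invoked. */
static int count_event(struct notifier_block *nb, unsigned long val,
		       void *v)
{
	(void)nb;
	(void)val;
	(*(int *)v)++;
	return NOTIFY_OK;
}

/* Register three handlers and return how many were called. */
int demo_count_calls(void)
{
	struct notifier_block *chain = NULL;
	struct notifier_block blocks[3] = {
		{ .notifier_call = count_event, .priority = 0 },
		{ .notifier_call = count_event, .priority = 5 },
		{ .notifier_call = count_event, .priority = 1 },
	};
	int seen = 0, nr_calls = 0;
	size_t i;

	for (i = 0; i < 3; i++)
		chain_register(&chain, &blocks[i]);

	call_chain(&chain, 1, &seen, &nr_calls);
	assert(seen == nr_calls);
	return nr_calls;
}
```

Calling demo_count_calls() from a main() of your own walks the whole
chain. The sketch only shows where the reschedule point sits relative to
the handler call and the nr_calls accounting; it says nothing about
which kernel contexts may safely sleep at that point.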