Message ID | 57761E04.8030202@huawei.com |
---|---|
State | New |
On 2016/7/1 15:57, Eric Dumazet wrote:
> On Fri, 2016-07-01 at 15:38 +0800, Ding Tianhong wrote:
> ...
>>  net/ipv6/addrconf.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index f555f4f..e294a3d 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -3284,6 +3284,12 @@ restart:
>>  		spin_unlock_bh(&addrconf_hash_lock);
>>  	}
>>
>> +	/*
>> +	 * It is safe here to schedule out to avoid softlocking if preempt
>> +	 * is disabled.
>> +	 */
>> +	cond_resched();
>> +
>>  	write_lock_bh(&idev->lock);
>>
>>  	addrconf_del_rs_timer(idev);
>
> Seeing you apparently cooked your patch against an old kernel (which
> one ?) ...
>
> I tried vanilla net-next kernel, and apparently I could not trigger the
> softlockup you mentioned.
>
> Are you sure current kernel has a bug to begin with ?
>

Have you disabled preemption? The problem disappears if you enable
voluntary or full preemption.

CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

I tested the 4.1 LTS kernel and found this problem, and I did not find
any patch since Linux 4.1 that fixes it, but I will try to test the 4.7
kernel.

Thanks
Ding

> Thanks.
>
> .
>
On 2016/7/1 16:23, Eric Dumazet wrote:
> On Fri, 2016-07-01 at 16:10 +0800, Ding Tianhong wrote:
>> On 2016/7/1 15:57, Eric Dumazet wrote:
>>> On Fri, 2016-07-01 at 15:38 +0800, Ding Tianhong wrote:
>>> ...
>>>>  net/ipv6/addrconf.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>>>> index f555f4f..e294a3d 100644
>>>> --- a/net/ipv6/addrconf.c
>>>> +++ b/net/ipv6/addrconf.c
>>>> @@ -3284,6 +3284,12 @@ restart:
>>>>  		spin_unlock_bh(&addrconf_hash_lock);
>>>>  	}
>>>>
>>>> +	/*
>>>> +	 * It is safe here to schedule out to avoid softlocking if preempt
>>>> +	 * is disabled.
>>>> +	 */
>>>> +	cond_resched();
>>>> +
>>>>  	write_lock_bh(&idev->lock);
>>>>
>>>>  	addrconf_del_rs_timer(idev);
>>>
>>> Seeing you apparently cooked your patch against an old kernel (which
>>> one ?) ...
>>>
>>> I tried vanilla net-next kernel, and apparently I could not trigger the
>>> softlockup you mentioned.
>>>
>>> Are you sure current kernel has a bug to begin with ?
>>>
>> Have you disabled preemption? The problem disappears if you enable
>> voluntary or full preemption.
>>
>> CONFIG_PREEMPT_NONE=y
>> # CONFIG_PREEMPT_VOLUNTARY is not set
>> # CONFIG_PREEMPT is not set
>>
>> I tested the 4.1 LTS kernel and found this problem, and I did not find
>> any patch since Linux 4.1 that fixes it, but I will try to test the 4.7
>> kernel.
>
> I usually do not have PREEMPT enabled in my kernels.
>
> $ grep PREEMPT .config
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set
>
> Also the whole script is quite fast on latest kernels. I am guessing you
> are chasing an already fixed problem.
>

Hi Eric:

I found that commit aaf92f (netfilter: conntrack: resched in
nf_ct_iterate_cleanup) solves the problem. It adds cond_resched() to
nf_ct_iterate_cleanup(), which is called from the net notifier chain
every time. When I revert this commit on kernel 4.7-rc4, the kernel
panics on a soft lockup, so I am not sure whether our patch is needed.
It looks like the problem still exists if I disable the netfilter
config option that registers nf_ct_iterate_cleanup as a notifier.

Thanks.
Ding

> .
>
On 2016/7/6 16:44, Eric Dumazet wrote:
> On Wed, 2016-07-06 at 16:15 +0800, Ding Tianhong wrote:
>> Hi Eric:
>>
>> I found that commit aaf92f (netfilter: conntrack: resched in
>> nf_ct_iterate_cleanup) solves the problem. It adds cond_resched() to
>> nf_ct_iterate_cleanup(), which is called from the net notifier chain
>> every time. When I revert this commit on kernel 4.7-rc4, the kernel
>> panics on a soft lockup, so I am not sure whether our patch is needed.
>> It looks like the problem still exists if I disable the netfilter
>> config option that registers nf_ct_iterate_cleanup as a notifier.
>
> Well, I do not have conntrack on my kernels, and I can not reproduce the
> issue.
>
> So I am guessing other patches also solved a scalability issue, between
> 4.1 and 4.7
>
> I am aware of something that David did for IPv4, but this might help as
> well for IPv6.
>
> commit fbd40ea0180a2d328c5adc61414dc8bab9335ce2
> ipv4: Don't do expensive useless work during inetdev destroy.
>

Hi Eric:

I checked this patch:

[root@localhost linux]# git name-rev fbd40ea0180a2d328c5adc61414dc8bab9335ce2
fbd40ea0180a2d328c5adc61414dc8bab9335ce2 tags/v4.6-rc1~91^2~63

So kernel 4.7-rc4 already contains this patch, but it has no effect if I
revert commit aaf92f (netfilter: conntrack: resched in
nf_ct_iterate_cleanup), so I don't think David's patch fixes this
problem.

Thanks
Ding

>
=================================cut here========================================

It looks like notifier_call_chain has to run too many handlers and does
not feed the watchdog until the work is finished. The chain calls
ipv6_dev_notf several times and holds the CPU for a long time, so add
cond_resched() in ipv6_dev_notf to schedule out in the network notifiers
and avoid the soft lockup.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 net/ipv6/addrconf.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f555f4f..e294a3d 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3284,6 +3284,12 @@ restart:
 		spin_unlock_bh(&addrconf_hash_lock);
 	}
 
+	/*
+	 * It is safe here to schedule out to avoid softlocking if preempt
+	 * is disabled.
+	 */
+	cond_resched();
+
 	write_lock_bh(&idev->lock);
 
 	addrconf_del_rs_timer(idev);
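
For readers who want to see the pattern outside the kernel, here is a minimal userspace sketch of what the patch does: a chain of handlers runs back to back, and the caller gives up the CPU between handlers instead of holding it until the whole chain is done. Everything in it is an assumption made for illustration. handler_fn, count_call and run_chain are hypothetical names, and sched_yield() only stands in for cond_resched(); the two are not equivalent, since cond_resched() reschedules only when a reschedule is actually pending.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical handler type: a userspace stand-in for a notifier callback. */
typedef int (*handler_fn)(void *data);

/* Stand-in for per-device teardown work; it only counts its invocations. */
static int count_call(void *data)
{
	int *calls = data;

	(*calls)++;
	return 0;
}

/*
 * Run every handler in order and yield the CPU after each one.
 * sched_yield() is an illustrative substitute for the kernel's
 * cond_resched(): it marks the point where the long-running caller
 * lets other runnable tasks make progress.
 * Returns the number of handlers that were run.
 */
static size_t run_chain(handler_fn *chain, size_t n, void *data)
{
	size_t i;

	assert(chain != NULL);

	for (i = 0; i < n; i++) {
		chain[i](data);
		sched_yield();
	}
	return i;
}
```

Yielding between handlers rather than inside them matches where the patch puts cond_resched(): after the hash lock has been dropped and before the next lock is taken, so nothing is held while the task is scheduled out.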