Message ID | 20201201214047.128948-1-Joseph.Huang@garmin.com |
---|---|
State | New |
Headers | show |
Series | bridge: Fix a deadlock when enabling multicast snooping | expand |
On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: > When enabling multicast snooping, bridge module deadlocks on multicast_lock > if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 > network. > > The deadlock was caused by the following sequence: While holding the lock, > br_multicast_open calls br_multicast_join_snoopers, which eventually causes > IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). > Since the destination Ethernet address is a multicast address, br_dev_xmit > feeds the packet back to the bridge via br_multicast_rcv, which in turn > calls br_multicast_add_group, which then deadlocks on multicast_lock. > > The fix is to move the call br_multicast_join_snoopers outside of the > critical section. This works since br_multicast_join_snoopers only deals > with IP and does not modify any multicast data structures of the bridge, > so there's no need to hold the lock. > > Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > > Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> Nik, Linus - how does this one look?
On 03/12/2020 20:28, Jakub Kicinski wrote: > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: >> When enabling multicast snooping, bridge module deadlocks on multicast_lock >> if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 >> network. >> >> The deadlock was caused by the following sequence: While holding the lock, >> br_multicast_open calls br_multicast_join_snoopers, which eventually causes >> IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). >> Since the destination Ethernet address is a multicast address, br_dev_xmit >> feeds the packet back to the bridge via br_multicast_rcv, which in turn >> calls br_multicast_add_group, which then deadlocks on multicast_lock. >> >> The fix is to move the call br_multicast_join_snoopers outside of the >> critical section. This works since br_multicast_join_snoopers only deals >> with IP and does not modify any multicast data structures of the bridge, >> so there's no need to hold the lock. >> >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") >> >> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> > > Nik, Linus - how does this one look? > Hi, Thanks, somehow I missed this one too. Need to check my email config. :) I believe I see how it can happen, although it's not straight-forward to follow. A selftest for this case would be great, and any traces (e.g. hung task) would help a lot as well. Correct me if I'm wrong but the sequence is something like: br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> igmp6_group_added -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode igmp6_join_group() -> igmp6_join_group -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge mcast processing code which uses the multicast_lock spinlock. Right? One question - shouldn't leaving have the same problem? I.e. br_multicast_toggle -> br_multicast_leave_snoopers -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec -> igmp6_group_dropped -> igmp6_leave_group -> MLDv1 mode && last reporter -> igmp6_send() ? I think it was saved by the fact that !br_opt_get(br, BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't be acquired in the br_dev_xmit path? If so, I'd appreciate a comment about that because it's not really trivial to find out. :) Anyhow, the patch is fine as-is too: Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Thanks, Nik
> From: Nikolay Aleksandrov <nikolay@nvidia.com> > Sent: Thursday, December 3, 2020 3:47 PM > To: Jakub Kicinski <kuba@kernel.org>; Huang, Joseph > <Joseph.Huang@garmin.com> > Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller > <davem@davemloft.net>; bridge@lists.linux-foundation.org; > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing > <linus.luessing@c0d3.blue> > Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast > snooping > > On 03/12/2020 20:28, Jakub Kicinski wrote: > > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: > >> When enabling multicast snooping, bridge module deadlocks on > >> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing > >> querier on the same L2 network. > >> > >> The deadlock was caused by the following sequence: While holding the > >> lock, br_multicast_open calls br_multicast_join_snoopers, which > >> eventually causes IP stack to (attempt to) send out a Listener Report (in > igmp6_join_group). > >> Since the destination Ethernet address is a multicast address, > >> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, > >> which in turn calls br_multicast_add_group, which then deadlocks on > multicast_lock. > >> > >> The fix is to move the call br_multicast_join_snoopers outside of the > >> critical section. This works since br_multicast_join_snoopers only > >> deals with IP and does not modify any multicast data structures of > >> the bridge, so there's no need to hold the lock. > >> > >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > >> > >> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> > > > > Nik, Linus - how does this one look? > > > > Hi, > Thanks, somehow I missed this one too. Need to check my email config. :) I > believe I see how it can happen, although it's not straight-forward to follow. > A selftest for this case would be great, and any traces (e.g. hung task) would > help a lot as well. > Correct me if I'm wrong but the sequence is something like: > br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> > igmp6_group_added > -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode > -> igmp6_join_group() -> igmp6_join_group > -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge > -> mcast processing code > which uses the multicast_lock spinlock. Right? That is correct. Here's a stack trace from a typical run: echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping [ 936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU [ 936.152534] rcu: 0-....: (5594 ticks this GP) idle=75a/1/0x4000000000000002 softirq=2787/2789 fqs=2625 [ 936.162026] (t=5253 jiffies g=4205 q=12) [ 936.166041] Task dump for CPU 0: [ 936.169272] sh R running task 0 1315 1295 0x00000002 [ 936.176332] Call trace: [ 936.178797] dump_backtrace+0x0/0x140 [ 936.182469] show_stack+0x14/0x20 [ 936.185793] sched_show_task+0x108/0x138 [ 936.189727] dump_cpu_task+0x40/0x50 [ 936.193313] rcu_dump_cpu_stacks+0x94/0xd0 [ 936.197420] rcu_sched_clock_irq+0x75c/0x9c0 [ 936.201698] update_process_times+0x2c/0x68 [ 936.205893] tick_sched_handle.isra.0+0x30/0x50 [ 936.210432] tick_sched_timer+0x48/0x98 [ 936.214272] __hrtimer_run_queues+0x110/0x1b0 [ 936.218635] hrtimer_interrupt+0xe4/0x240 [ 936.222656] arch_timer_handler_phys+0x30/0x40 [ 936.227106] handle_percpu_devid_irq+0x80/0x140 [ 936.231654] generic_handle_irq+0x24/0x38 [ 936.235669] __handle_domain_irq+0x60/0xb8 [ 936.239774] gic_handle_irq+0x5c/0x148 [ 936.243535] el1_irq+0xb8/0x180 [ 936.246689] queued_spin_lock_slowpath+0x118/0x3b0 [ 936.251495] _raw_spin_lock+0x5c/0x68 [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] [ 936.269689] dev_hard_start_xmit+0x94/0x158 [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 [ 936.277890] dev_queue_xmit+0x10/0x18 [ 936.281563] neigh_resolve_output+0xec/0x198 [ 936.285845] ip6_finish_output2+0x240/0x710 [ 936.290039] __ip6_finish_output+0x130/0x170 [ 936.294318] ip6_output+0x6c/0x1c8 [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 [ 936.301834] igmp6_send+0x358/0x558 [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 [ 936.309774] igmp6_group_added+0xfc/0x110 [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] > > One question - shouldn't leaving have the same problem? I.e. > br_multicast_toggle -> br_multicast_leave_snoopers > -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec -> > -> igmp6_group_dropped -> igmp6_leave_group -> > MLDv1 mode && last reporter -> igmp6_send() ? > > I think it was saved by the fact that !br_opt_get(br, > BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't be > acquired in the br_dev_xmit path? If so, I'd appreciate a comment about that > because it's not really trivial to find out. :) That's a really good point. Leave should have deadlocked as well, but when I tested the patch, I was able to turn on/off multicast snooping multiple times without any problem. Is it because this line in igmp6_leave_group? if (ma->mca_flags & MAF_LAST_REPORTER) igmp6_send(&ma->mca_addr, ma->idev->dev, ICMPV6_MGM_REDUCTION); Perhaps MAF_LAST_REPORTER was not set, so igmp6_send was not called? > > Anyhow, the patch is fine as-is too: > Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> > > Thanks, > Nik Thanks, Joseph ________________________________ CONFIDENTIALITY NOTICE: This email and any attachments are for the sole use of the intended recipient(s) and contain information that may be Garmin confidential and/or Garmin legally privileged. If you have received this email in error, please notify the sender by reply email and delete the message. Any disclosure, copying, distribution or use of this communication (including attachments) by someone other than the intended recipient is prohibited. Thank you.
> From: Huang, Joseph > Sent: Thursday, December 3, 2020 4:53 PM > To: Nikolay Aleksandrov <nikolay@nvidia.com>; Jakub Kicinski > <kuba@kernel.org> > Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller > <davem@davemloft.net>; bridge@lists.linux-foundation.org; > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing > <linus.luessing@c0d3.blue> > Subject: RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping > > > From: Nikolay Aleksandrov <nikolay@nvidia.com> > > Sent: Thursday, December 3, 2020 3:47 PM > > To: Jakub Kicinski <kuba@kernel.org>; Huang, Joseph > > <Joseph.Huang@garmin.com> > > Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller > > <davem@davemloft.net>; bridge@lists.linux-foundation.org; > > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing > > <linus.luessing@c0d3.blue> > > Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast > > snooping > > > > On 03/12/2020 20:28, Jakub Kicinski wrote: > > > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: > > >> When enabling multicast snooping, bridge module deadlocks on > > >> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing > > >> querier on the same L2 network. > > >> > > >> The deadlock was caused by the following sequence: While holding the > > >> lock, br_multicast_open calls br_multicast_join_snoopers, which > > >> eventually causes IP stack to (attempt to) send out a Listener Report (in > > igmp6_join_group). > > >> Since the destination Ethernet address is a multicast address, > > >> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, > > >> which in turn calls br_multicast_add_group, which then deadlocks on > > multicast_lock. > > >> > > >> The fix is to move the call br_multicast_join_snoopers outside of the > > >> critical section. This works since br_multicast_join_snoopers only > > >> deals with IP and does not modify any multicast data structures of > > >> the bridge, so there's no need to hold the lock. > > >> > > >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > > >> > > >> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> > > > > > > Nik, Linus - how does this one look? > > > > > > > Hi, > > Thanks, somehow I missed this one too. Need to check my email config. :) I > > believe I see how it can happen, although it's not straight-forward to follow. > > A selftest for this case would be great, and any traces (e.g. hung task) > would > > help a lot as well. > > Correct me if I'm wrong but the sequence is something like: > > br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> > > igmp6_group_added > > -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode > > -> igmp6_join_group() -> igmp6_join_group > > -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge > > -> mcast processing code > > which uses the multicast_lock spinlock. Right? > > That is correct. > > Here's a stack trace from a typical run: > > echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping > [ 936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU > [ 936.152534] rcu: 0-....: (5594 ticks this GP) > idle=75a/1/0x4000000000000002 softirq=2787/2789 fqs=2625 > [ 936.162026] (t=5253 jiffies g=4205 q=12) > [ 936.166041] Task dump for CPU 0: > [ 936.169272] sh R running task 0 1315 1295 0x00000002 > [ 936.176332] Call trace: > [ 936.178797] dump_backtrace+0x0/0x140 > [ 936.182469] show_stack+0x14/0x20 > [ 936.185793] sched_show_task+0x108/0x138 > [ 936.189727] dump_cpu_task+0x40/0x50 > [ 936.193313] rcu_dump_cpu_stacks+0x94/0xd0 > [ 936.197420] rcu_sched_clock_irq+0x75c/0x9c0 > [ 936.201698] update_process_times+0x2c/0x68 > [ 936.205893] tick_sched_handle.isra.0+0x30/0x50 > [ 936.210432] tick_sched_timer+0x48/0x98 > [ 936.214272] __hrtimer_run_queues+0x110/0x1b0 > [ 936.218635] hrtimer_interrupt+0xe4/0x240 > [ 936.222656] arch_timer_handler_phys+0x30/0x40 > [ 936.227106] handle_percpu_devid_irq+0x80/0x140 > [ 936.231654] generic_handle_irq+0x24/0x38 > [ 936.235669] __handle_domain_irq+0x60/0xb8 > [ 936.239774] gic_handle_irq+0x5c/0x148 > [ 936.243535] el1_irq+0xb8/0x180 > [ 936.246689] queued_spin_lock_slowpath+0x118/0x3b0 > [ 936.251495] _raw_spin_lock+0x5c/0x68 > [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] > [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] > [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] > [ 936.269689] dev_hard_start_xmit+0x94/0x158 > [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 > [ 936.277890] dev_queue_xmit+0x10/0x18 > [ 936.281563] neigh_resolve_output+0xec/0x198 > [ 936.285845] ip6_finish_output2+0x240/0x710 > [ 936.290039] __ip6_finish_output+0x130/0x170 > [ 936.294318] ip6_output+0x6c/0x1c8 > [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 > [ 936.301834] igmp6_send+0x358/0x558 > [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 > [ 936.309774] igmp6_group_added+0xfc/0x110 > [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 > [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 > [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] > [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] > > > > > > One question - shouldn't leaving have the same problem? I.e. > > br_multicast_toggle -> br_multicast_leave_snoopers > > -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec -> > > -> igmp6_group_dropped -> igmp6_leave_group -> > > MLDv1 mode && last reporter -> igmp6_send() ? > > > > I think it was saved by the fact that !br_opt_get(br, > > BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't > be > > acquired in the br_dev_xmit path? If so, I'd appreciate a comment about > that > > because it's not really trivial to find out. :) > > That's a really good point. Leave should have deadlocked as well, but when I > tested the patch, I was able to turn on/off multicast snooping multiple times > without any problem. > > Is it because this line in igmp6_leave_group? > > if (ma->mca_flags & MAF_LAST_REPORTER) > igmp6_send(&ma->mca_addr, ma->idev->dev, > ICMPV6_MGM_REDUCTION); > > Perhaps MAF_LAST_REPORTER was not set, so igmp6_send was not called? > > > > > Anyhow, the patch is fine as-is too: > > Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> > > > > Thanks, > > Nik > > Thanks, > Joseph Would it be advisable if we move br_multicast_leave_snoopers out of the critical section as well? Even though I can't really verify that if this is helpful since I haven't seen it deadlock when disabling multicast snooping. Thanks, Joseph ________________________________ CONFIDENTIALITY NOTICE: This email and any attachments are for the sole use of the intended recipient(s) and contain information that may be Garmin confidential and/or Garmin legally privileged. If you have received this email in error, please notify the sender by reply email and delete the message. Any disclosure, copying, distribution or use of this communication (including attachments) by someone other than the intended recipient is prohibited. Thank you.
On 04/12/2020 00:42, Huang, Joseph wrote: >> From: Huang, Joseph >> Sent: Thursday, December 3, 2020 4:53 PM >> To: Nikolay Aleksandrov <nikolay@nvidia.com>; Jakub Kicinski >> <kuba@kernel.org> >> Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller >> <davem@davemloft.net>; bridge@lists.linux-foundation.org; >> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing >> <linus.luessing@c0d3.blue> >> Subject: RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping >> >>> From: Nikolay Aleksandrov <nikolay@nvidia.com> >>> Sent: Thursday, December 3, 2020 3:47 PM >>> To: Jakub Kicinski <kuba@kernel.org>; Huang, Joseph >>> <Joseph.Huang@garmin.com> >>> Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller >>> <davem@davemloft.net>; bridge@lists.linux-foundation.org; >>> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing >>> <linus.luessing@c0d3.blue> >>> Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast >>> snooping >>> >>> On 03/12/2020 20:28, Jakub Kicinski wrote: >>>> On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: >>>>> When enabling multicast snooping, bridge module deadlocks on >>>>> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing >>>>> querier on the same L2 network. >>>>> >>>>> The deadlock was caused by the following sequence: While holding the >>>>> lock, br_multicast_open calls br_multicast_join_snoopers, which >>>>> eventually causes IP stack to (attempt to) send out a Listener Report (in >>> igmp6_join_group). >>>>> Since the destination Ethernet address is a multicast address, >>>>> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, >>>>> which in turn calls br_multicast_add_group, which then deadlocks on >>> multicast_lock. >>>>> >>>>> The fix is to move the call br_multicast_join_snoopers outside of the >>>>> critical section. This works since br_multicast_join_snoopers only >>>>> deals with IP and does not modify any multicast data structures of >>>>> the bridge, so there's no need to hold the lock. >>>>> >>>>> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") >>>>> >>>>> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> >>>> >>>> Nik, Linus - how does this one look? >>>> >>> >>> Hi, >>> Thanks, somehow I missed this one too. Need to check my email config. :) I >>> believe I see how it can happen, although it's not straight-forward to follow. >>> A selftest for this case would be great, and any traces (e.g. hung task) >> would >>> help a lot as well. >>> Correct me if I'm wrong but the sequence is something like: >>> br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> >>> igmp6_group_added >>> -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode >>> -> igmp6_join_group() -> igmp6_join_group >>> -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge >>> -> mcast processing code >>> which uses the multicast_lock spinlock. Right? >> >> That is correct. >> >> Here's a stack trace from a typical run: >> >> echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping >> [ 936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU >> [ 936.152534] rcu: 0-....: (5594 ticks this GP) >> idle=75a/1/0x4000000000000002 softirq=2787/2789 fqs=2625 >> [ 936.162026] (t=5253 jiffies g=4205 q=12) >> [ 936.166041] Task dump for CPU 0: >> [ 936.169272] sh R running task 0 1315 1295 0x00000002 >> [ 936.176332] Call trace: >> [ 936.178797] dump_backtrace+0x0/0x140 >> [ 936.182469] show_stack+0x14/0x20 >> [ 936.185793] sched_show_task+0x108/0x138 >> [ 936.189727] dump_cpu_task+0x40/0x50 >> [ 936.193313] rcu_dump_cpu_stacks+0x94/0xd0 >> [ 936.197420] rcu_sched_clock_irq+0x75c/0x9c0 >> [ 936.201698] update_process_times+0x2c/0x68 >> [ 936.205893] tick_sched_handle.isra.0+0x30/0x50 >> [ 936.210432] tick_sched_timer+0x48/0x98 >> [ 936.214272] __hrtimer_run_queues+0x110/0x1b0 >> [ 936.218635] hrtimer_interrupt+0xe4/0x240 >> [ 936.222656] arch_timer_handler_phys+0x30/0x40 >> [ 936.227106] handle_percpu_devid_irq+0x80/0x140 >> [ 936.231654] generic_handle_irq+0x24/0x38 >> [ 936.235669] __handle_domain_irq+0x60/0xb8 >> [ 936.239774] gic_handle_irq+0x5c/0x148 >> [ 936.243535] el1_irq+0xb8/0x180 >> [ 936.246689] queued_spin_lock_slowpath+0x118/0x3b0 >> [ 936.251495] _raw_spin_lock+0x5c/0x68 >> [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] >> [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] >> [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] >> [ 936.269689] dev_hard_start_xmit+0x94/0x158 >> [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 >> [ 936.277890] dev_queue_xmit+0x10/0x18 >> [ 936.281563] neigh_resolve_output+0xec/0x198 >> [ 936.285845] ip6_finish_output2+0x240/0x710 >> [ 936.290039] __ip6_finish_output+0x130/0x170 >> [ 936.294318] ip6_output+0x6c/0x1c8 >> [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 >> [ 936.301834] igmp6_send+0x358/0x558 >> [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 >> [ 936.309774] igmp6_group_added+0xfc/0x110 >> [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 >> [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 >> [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] >> [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] >> >> >>> >>> One question - shouldn't leaving have the same problem? I.e. >>> br_multicast_toggle -> br_multicast_leave_snoopers >>> -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec -> >>> -> igmp6_group_dropped -> igmp6_leave_group -> >>> MLDv1 mode && last reporter -> igmp6_send() ? >>> >>> I think it was saved by the fact that !br_opt_get(br, >>> BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't >> be >>> acquired in the br_dev_xmit path? If so, I'd appreciate a comment about >> that >>> because it's not really trivial to find out. :) >> >> That's a really good point. Leave should have deadlocked as well, but when I >> tested the patch, I was able to turn on/off multicast snooping multiple times >> without any problem. >> >> Is it because this line in igmp6_leave_group? >> >> if (ma->mca_flags & MAF_LAST_REPORTER) >> igmp6_send(&ma->mca_addr, ma->idev->dev, >> ICMPV6_MGM_REDUCTION); >> >> Perhaps MAF_LAST_REPORTER was not set, so igmp6_send was not called? >> >>> >>> Anyhow, the patch is fine as-is too: >>> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> >>> >>> Thanks, >>> Nik >> >> Thanks, >> Joseph > > Would it be advisable if we move br_multicast_leave_snoopers out of the critical > section as well? Even though I can't really verify that if this is helpful since I haven't > seen it deadlock when disabling multicast snooping. > > Thanks, > Joseph > The reason we're not seeing a deadlock is because the multicast snooping bit is not set and the lock is never acquired when sending the packet through br_multicast_rcv(). Anyway I'd move it out for symmetry and it would be less confusing as you'd just have a standard if/else in the end. By the way now that I looked again the patch is not entirely correct as you'll do multiple joins/leaves on every br_multicast_toggle(), i.e. with the patch you bypass the check for the same value and actually try to change state each time (the if (!!br_opt_get(br, BROPT_MULTICAST_ENABLED) == !!val) check). We also lose the symmetry between br_dev_open and stop, and expose otherwise private multicast code, so I'd pull out the snoopers leave for br_dev_stop() as well. Please add a comment why it's needed, so we won't wonder about it later. And also include the trace in the commit message so we'd have it. Edit: I just tried it and it's very easy to reproduce, steps: 1. sysctl net.ipv6.conf.all.force_mld_version=1 2. have another querier 3. ip link set dev bridge type bridge mcast_snooping 0 && ip link set dev bridge type bridge mcast_snooping 1 < deadlock > Thanks, Nik
On 04/12/2020 23:39, Joseph Huang wrote: > When enabling multicast snooping, bridge module deadlocks on multicast_lock > if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 > network. > > The deadlock was caused by the following sequence: While holding the lock, > br_multicast_open calls br_multicast_join_snoopers, which eventually causes > IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). > Since the destination Ethernet address is a multicast address, br_dev_xmit > feeds the packet back to the bridge via br_multicast_rcv, which in turn > calls br_multicast_add_group, which then deadlocks on multicast_lock. > > The fix is to move the call br_multicast_join_snoopers outside of the > critical section. This works since br_multicast_join_snoopers only deals > with IP and does not modify any multicast data structures of the bridge, > so there's no need to hold the lock. > > Steps to reproduce: > 1. sysctl net.ipv6.conf.all.force_mld_version=1 > 2. have another querier > 3. ip link set dev bridge type bridge mcast_snooping 0 && \ > ip link set dev bridge type bridge mcast_snooping 1 < deadlock > > > A typical call trace looks like the following: > > [ 936.251495] _raw_spin_lock+0x5c/0x68 > [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] > [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] > [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] > [ 936.269689] dev_hard_start_xmit+0x94/0x158 > [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 > [ 936.277890] dev_queue_xmit+0x10/0x18 > [ 936.281563] neigh_resolve_output+0xec/0x198 > [ 936.285845] ip6_finish_output2+0x240/0x710 > [ 936.290039] __ip6_finish_output+0x130/0x170 > [ 936.294318] ip6_output+0x6c/0x1c8 > [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 > [ 936.301834] igmp6_send+0x358/0x558 > [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 > [ 936.309774] igmp6_group_added+0xfc/0x110 > [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 > [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 > [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] > [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] > > Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> > --- Hi, Thank you for fixing it up, a few minor nits below. Overall the patch looks good. > net/bridge/br_device.c | 6 ++++++ > net/bridge/br_multicast.c | 33 ++++++++++++++++++++++++--------- > net/bridge/br_private.h | 10 ++++++++++ > 3 files changed, 40 insertions(+), 9 deletions(-) > > diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c > index 7730c8f3cb53..d3ea9d0779fb 100644 > --- a/net/bridge/br_device.c > +++ b/net/bridge/br_device.c > @@ -177,6 +177,9 @@ static int br_dev_open(struct net_device *dev) > br_stp_enable_bridge(br); > br_multicast_open(br); > > + if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) > + br_multicast_join_snoopers(br); > + > return 0; > } > > @@ -197,6 +200,9 @@ static int br_dev_stop(struct net_device *dev) > br_stp_disable_bridge(br); > br_multicast_stop(br); > > + if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) > + br_multicast_leave_snoopers(br); > + > netif_stop_queue(dev); > > return 0; > diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c > index eae898c3cff7..426fe00db708 100644 > --- a/net/bridge/br_multicast.c > +++ b/net/bridge/br_multicast.c > @@ -3286,7 +3286,7 @@ static inline void br_ip6_multicast_join_snoopers(struct net_bridge *br) > } > #endif > > -static void br_multicast_join_snoopers(struct net_bridge *br) > +void br_multicast_join_snoopers(struct net_bridge *br) > { > br_ip4_multicast_join_snoopers(br); > br_ip6_multicast_join_snoopers(br); > @@ -3317,7 +3317,7 @@ static inline void br_ip6_multicast_leave_snoopers(struct net_bridge *br) > } > #endif > > -static void br_multicast_leave_snoopers(struct net_bridge *br) > +void br_multicast_leave_snoopers(struct net_bridge *br) > { > br_ip4_multicast_leave_snoopers(br); > br_ip6_multicast_leave_snoopers(br); > @@ -3336,9 +3336,6 @@ static void __br_multicast_open(struct net_bridge *br, > > void br_multicast_open(struct net_bridge *br) > { > - if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) > - br_multicast_join_snoopers(br); > - > __br_multicast_open(br, &br->ip4_own_query); > #if IS_ENABLED(CONFIG_IPV6) > __br_multicast_open(br, &br->ip6_own_query); > @@ -3354,9 +3351,6 @@ void br_multicast_stop(struct net_bridge *br) > del_timer_sync(&br->ip6_other_query.timer); > del_timer_sync(&br->ip6_own_query.timer); > #endif > - > - if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) > - br_multicast_leave_snoopers(br); > } > > void br_multicast_dev_del(struct net_bridge *br) > @@ -3487,6 +3481,8 @@ static void br_multicast_start_querier(struct net_bridge *br, > int br_multicast_toggle(struct net_bridge *br, unsigned long val) > { > struct net_bridge_port *port; > + bool join_snoopers = false; > + bool leave_snoopers = false; > We use reverse xmas tree order, longest to shortest, so these two have to be swapped, but one more related thing further below.. > spin_lock_bh(&br->multicast_lock); > if (!!br_opt_get(br, BROPT_MULTICAST_ENABLED) == !!val) > @@ -3495,7 +3491,7 @@ int br_multicast_toggle(struct net_bridge *br, unsigned long val) > br_mc_disabled_update(br->dev, val); > br_opt_toggle(br, BROPT_MULTICAST_ENABLED, !!val); > if (!br_opt_get(br, BROPT_MULTICAST_ENABLED)) { > - br_multicast_leave_snoopers(br); > + leave_snoopers = true; > goto unlock; > } > > @@ -3506,9 +3502,28 @@ int br_multicast_toggle(struct net_bridge *br, unsigned long val) > list_for_each_entry(port, &br->port_list, list) > __br_multicast_enable_port(port); > > + join_snoopers = true; > + > unlock: > spin_unlock_bh(&br->multicast_lock); > > + /* br_multicast_join_snoopers has the potential to cause > + * an MLD Report/Leave to be delivered to br_multicast_rcv, > + * which would in turn call br_multicast_add_group, which would > + * attempt to acquire multicast_lock. This function should be > + * called after the lock has been released to avoid deadlocks on > + * multicast_lock. > + * > + * br_multicast_leave_snoopers does not have the problem since > + * br_multicast_rcv first checks BROPT_MULTICAST_ENABLED, and > + * returns without calling br_multicast_ipv4/6_rcv if it's not > + * enabled. Moved both functions out just for symmetry. > + */ Nice comment, thanks! > + if (join_snoopers) > + br_multicast_join_snoopers(br); > + else if (leave_snoopers) > + br_multicast_leave_snoopers(br); If I'm not missing anything this can be just 1 bool like "change_snoopers" or something which if set to true will check BROPT_MULTICAST_ENABLED and act accordingly, i.e. if (change_snoopers) { if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) br_multicast_join_snoopers(br); else br_multicast_leave_snoopers(br); } This is not really something critical, just an observation. Up to your preference if you decide to leave it with 2 bools. :-) Cheers, Nik > + > return 0; > } > > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h > index 345118e35c42..8424464186a6 100644 > --- a/net/bridge/br_private.h > +++ b/net/bridge/br_private.h > @@ -792,6 +792,8 @@ void br_multicast_del_port(struct net_bridge_port *port); > void br_multicast_enable_port(struct net_bridge_port *port); > void br_multicast_disable_port(struct net_bridge_port *port); > void br_multicast_init(struct net_bridge *br); > +void br_multicast_join_snoopers(struct net_bridge *br); > +void br_multicast_leave_snoopers(struct net_bridge *br); > void br_multicast_open(struct net_bridge *br); > void br_multicast_stop(struct net_bridge *br); > void br_multicast_dev_del(struct net_bridge *br); > @@ -969,6 +971,14 @@ static inline void br_multicast_init(struct net_bridge *br) > { > } > > +static inline void br_multicast_join_snoopers(struct net_bridge *br) > +{ > +} > + > +static inline void br_multicast_leave_snoopers(struct net_bridge *br) > +{ > +} > + > static inline void br_multicast_open(struct net_bridge *br) > { > } >
> > + if (join_snoopers) > > + br_multicast_join_snoopers(br); > > + else if (leave_snoopers) > > + br_multicast_leave_snoopers(br); > > If I'm not missing anything this can be just 1 bool like "change_snoopers" or > something which if set to true will check BROPT_MULTICAST_ENABLED and > act accordingly, i.e. > if (change_snoopers) { > if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) > br_multicast_join_snoopers(br); > else > br_multicast_leave_snoopers(br); } > > This is not really something critical, just an observation. Up to your > preference if you decide to leave it with 2 bools. :-) > > Cheers, > Nik I wasn't sure how expensive the call to br_opt_get is, so I used a bool for each. I just checked and it seems that br_opt_get is probably just as cheap as a bool, so I'll change it according to what you suggested here. Thanks for your comments!! Thanks, Joseph
On 05/12/2020 01:56, Joseph Huang wrote: > When enabling multicast snooping, bridge module deadlocks on multicast_lock > if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 > network. > > The deadlock was caused by the following sequence: While holding the lock, > br_multicast_open calls br_multicast_join_snoopers, which eventually causes > IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). > Since the destination Ethernet address is a multicast address, br_dev_xmit > feeds the packet back to the bridge via br_multicast_rcv, which in turn > calls br_multicast_add_group, which then deadlocks on multicast_lock. > > The fix is to move the call br_multicast_join_snoopers outside of the > critical section. This works since br_multicast_join_snoopers only deals > with IP and does not modify any multicast data structures of the bridge, > so there's no need to hold the lock. > > Steps to reproduce: > 1. sysctl net.ipv6.conf.all.force_mld_version=1 > 2. have another querier > 3. ip link set dev bridge type bridge mcast_snooping 0 && \ > ip link set dev bridge type bridge mcast_snooping 1 < deadlock > > > A typical call trace looks like the following: > > [ 936.251495] _raw_spin_lock+0x5c/0x68 > [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] > [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] > [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] > [ 936.269689] dev_hard_start_xmit+0x94/0x158 > [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 > [ 936.277890] dev_queue_xmit+0x10/0x18 > [ 936.281563] neigh_resolve_output+0xec/0x198 > [ 936.285845] ip6_finish_output2+0x240/0x710 > [ 936.290039] __ip6_finish_output+0x130/0x170 > [ 936.294318] ip6_output+0x6c/0x1c8 > [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 > [ 936.301834] igmp6_send+0x358/0x558 > [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 > [ 936.309774] igmp6_group_added+0xfc/0x110 > [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 > [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 > [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] > [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] > > Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> > --- > net/bridge/br_device.c | 6 ++++++ > net/bridge/br_multicast.c | 34 +++++++++++++++++++++++++--------- > net/bridge/br_private.h | 10 ++++++++++ > 3 files changed, 41 insertions(+), 9 deletions(-) > LGTM, thanks! Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
On Sat, 5 Dec 2020 10:56:45 +0200 Nikolay Aleksandrov wrote: > On 05/12/2020 01:56, Joseph Huang wrote: > > When enabling multicast snooping, bridge module deadlocks on multicast_lock > > if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 > > network. > > > > The deadlock was caused by the following sequence: While holding the lock, > > br_multicast_open calls br_multicast_join_snoopers, which eventually causes > > IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). > > Since the destination Ethernet address is a multicast address, br_dev_xmit > > feeds the packet back to the bridge via br_multicast_rcv, which in turn > > calls br_multicast_add_group, which then deadlocks on multicast_lock. > > > > The fix is to move the call br_multicast_join_snoopers outside of the > > critical section. This works since br_multicast_join_snoopers only deals > > with IP and does not modify any multicast data structures of the bridge, > > so there's no need to hold the lock. > > > > Steps to reproduce: > > 1. sysctl net.ipv6.conf.all.force_mld_version=1 > > 2. have another querier > > 3. ip link set dev bridge type bridge mcast_snooping 0 && \ > > ip link set dev bridge type bridge mcast_snooping 1 < deadlock > > > > > A typical call trace looks like the following: > > Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > > Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> > > LGTM, thanks! > Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Applied, thank you!
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c index 7730c8f3cb53..8d3ec29d3875 100644 --- a/net/bridge/br_device.c +++ b/net/bridge/br_device.c @@ -177,6 +177,9 @@ static int br_dev_open(struct net_device *dev) br_stp_enable_bridge(br); br_multicast_open(br); + if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) + br_multicast_join_snoopers(br); + return 0; } diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index eae898c3cff7..6c8d6b2eed91 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -3286,7 +3286,7 @@ static inline void br_ip6_multicast_join_snoopers(struct net_bridge *br) } #endif -static void br_multicast_join_snoopers(struct net_bridge *br) +void br_multicast_join_snoopers(struct net_bridge *br) { br_ip4_multicast_join_snoopers(br); br_ip6_multicast_join_snoopers(br); @@ -3336,9 +3336,6 @@ static void __br_multicast_open(struct net_bridge *br, void br_multicast_open(struct net_bridge *br) { - if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) - br_multicast_join_snoopers(br); - __br_multicast_open(br, &br->ip4_own_query); #if IS_ENABLED(CONFIG_IPV6) __br_multicast_open(br, &br->ip6_own_query); @@ -3509,6 +3506,9 @@ int br_multicast_toggle(struct net_bridge *br, unsigned long val) unlock: spin_unlock_bh(&br->multicast_lock); + if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) + br_multicast_join_snoopers(br); + return 0; } diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 345118e35c42..7085dd747130 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -792,6 +792,7 @@ void br_multicast_del_port(struct net_bridge_port *port); void br_multicast_enable_port(struct net_bridge_port *port); void br_multicast_disable_port(struct net_bridge_port *port); void br_multicast_init(struct net_bridge *br); +void br_multicast_join_snoopers(struct net_bridge *br); void br_multicast_open(struct net_bridge *br); void br_multicast_stop(struct net_bridge *br); void br_multicast_dev_del(struct net_bridge *br); @@ -969,6 +970,10 @@ static inline void br_multicast_init(struct net_bridge *br) { } +static inline void br_multicast_join_snoopers(struct net_bridge *br) +{ +} + static inline void br_multicast_open(struct net_bridge *br) { }
When enabling multicast snooping, bridge module deadlocks on multicast_lock if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 network. The deadlock was caused by the following sequence: While holding the lock, br_multicast_open calls br_multicast_join_snoopers, which eventually causes IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). Since the destination Ethernet address is a multicast address, br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, which in turn calls br_multicast_add_group, which then deadlocks on multicast_lock. The fix is to move the call br_multicast_join_snoopers outside of the critical section. This works since br_multicast_join_snoopers only deals with IP and does not modify any multicast data structures of the bridge, so there's no need to hold the lock. Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> --- net/bridge/br_device.c | 3 +++ net/bridge/br_multicast.c | 8 ++++---- net/bridge/br_private.h | 5 +++++ 3 files changed, 12 insertions(+), 4 deletions(-)