Message ID | 20210827180101.2330929-1-vladimir.oltean@nxp.com |
---|---|
State | New |
Headers | show |
Series | [net] net: dsa: mv88e6xxx: stop calling irq_domain_add_simple with the reg_lock held | expand |
On Fri, Aug 27, 2021 at 08:34:33PM +0200, Andrew Lunn wrote: > On Fri, Aug 27, 2021 at 09:01:01PM +0300, Vladimir Oltean wrote: > > The mv88e6xxx IRQ setup code has some pretty horrible locking patterns, > > and wrong. > > I agree about the patterns. But it has been lockdep clean, i spent a > while testing it, failed probes, unloads etc, and adding comments. > > I suspect it is now wrong because of core changes. It's true, it is lockdep-clean the way it is structured now, but I suspect that is purely by chance. I had to shift code around a bit to get lockdep to shout, my bad for not really mentioning it: I moved mv88e6xxx_mdios_register from mv88e6xxx_probe to mv88e6xxx_setup, all in all a relatively superficial change (I am trying to test something out), hence the reason why I did not believe it would make such a huge difference. I realize now it puts you in a bad light since it suggests you didn't test it with lockdep, and I apologize. > > Only hardware access should need the register lock, and this in itself > > is for the mv88e6xxx_smi_indirect_ops to work properly and nothing more, > > unless I'm misunderstanding something > > Historically, reg_lock has been used to serialize all access to the > hardware across entries points into the driver. Not everything takes > rtnl lock. Clearly, interrupts don't. I don't know if PTP takes it. In > the past there was been hwmon code, etc. The reg_lock is used to > serialize all this. The patterns of all entry points into the driver > taking the lock has in general worked well. Just interrupt code is a > pain. I empathize with working in the blind w.r.t. locking, when rtnl_mutex covers everything. As you point out, threaded interrupts do not the rtnl_lock, so that is a good opportunity to analyze what needs serialization, which I do not have on sja1105. Nonetheless, my experience is that hardware is a pretty parallel/reentrant beast, a "register lock" is almost always the wrong answer. IMHO, proper locking would need to find what are the sequences of SMI reads/writes that need to be indivisible, and only lock around those, and lock per topic if possible. > > Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.") > > As i said, i suspect this is the wrong commit. You need to look at > changes to the interrupt core. There is even a danger that if this > gets backported too far, it could add deadlocks. Ok, retarget to "net-next"?
On Fri, Aug 27, 2021 at 10:04:14PM +0200, Andrew Lunn wrote: > > Ok, retarget to "net-next"? > > I would prefer to wait until you have finished your testing and have > something which builds upon it. If its not broken, don't fix it... So I'm not actually sure why lockdep only catches it when I move that code around. Anyways, there might not be anything that builds upon it, but I see the change as an improvement to the consistency of the locking order anyway, regardless of whether an automated validator catches it or not? I mean, extrapolating a bit, would you take rtnl_lock while you already hold reg_lock, even if only at probe time where there is no practical possibility to deadlock since the rtnl_lock would have nothing to do with mv88e6xxx netdevs?
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 272b0535d946..c9631302df0f 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -244,16 +244,19 @@ static const struct irq_domain_ops mv88e6xxx_g1_irq_domain_ops = { .xlate = irq_domain_xlate_twocell, }; -/* To be called with reg_lock held */ static void mv88e6xxx_g1_irq_free_common(struct mv88e6xxx_chip *chip) { int irq, virq; u16 mask; + mv88e6xxx_reg_lock(chip); + mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask); mask &= ~GENMASK(chip->g1_irq.nirqs, 0); mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask); + mv88e6xxx_reg_unlock(chip); + for (irq = 0; irq < chip->g1_irq.nirqs; irq++) { virq = irq_find_mapping(chip->g1_irq.domain, irq); irq_dispose_mapping(virq); @@ -270,9 +273,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip) */ free_irq(chip->irq, chip); - mv88e6xxx_reg_lock(chip); mv88e6xxx_g1_irq_free_common(chip); - mv88e6xxx_reg_unlock(chip); } static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip) @@ -293,9 +294,11 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip) chip->g1_irq.chip = mv88e6xxx_g1_irq_chip; chip->g1_irq.masked = ~0; + mv88e6xxx_reg_lock(chip); + err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask); if (err) - goto out_mapping; + goto out_unlock; mask &= ~GENMASK(chip->g1_irq.nirqs, 0); @@ -308,13 +311,17 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip) if (err) goto out_disable; + mv88e6xxx_reg_unlock(chip); + return 0; out_disable: mask &= ~GENMASK(chip->g1_irq.nirqs, 0); mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask); -out_mapping: +out_unlock: + mv88e6xxx_reg_unlock(chip); + for (irq = 0; irq < 16; irq++) { virq = irq_find_mapping(chip->g1_irq.domain, irq); irq_dispose_mapping(virq); @@ -344,12 +351,10 @@ static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip) snprintf(chip->irq_name, sizeof(chip->irq_name), "mv88e6xxx-%s", dev_name(chip->dev)); - mv88e6xxx_reg_unlock(chip); err = request_threaded_irq(chip->irq, NULL, mv88e6xxx_g1_irq_thread_fn, IRQF_ONESHOT | IRQF_SHARED, chip->irq_name, chip); - mv88e6xxx_reg_lock(chip); if (err) mv88e6xxx_g1_irq_free_common(chip); @@ -393,9 +398,7 @@ static void mv88e6xxx_irq_poll_free(struct mv88e6xxx_chip *chip) kthread_cancel_delayed_work_sync(&chip->irq_poll_work); kthread_destroy_worker(chip->kworker); - mv88e6xxx_reg_lock(chip); mv88e6xxx_g1_irq_free_common(chip); - mv88e6xxx_reg_unlock(chip); } static int mv88e6xxx_port_config_interface(struct mv88e6xxx_chip *chip, @@ -6286,12 +6289,10 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev) * the PHYs will link their interrupts to these interrupt * controllers */ - mv88e6xxx_reg_lock(chip); if (chip->irq > 0) err = mv88e6xxx_g1_irq_setup(chip); else err = mv88e6xxx_irq_poll_setup(chip); - mv88e6xxx_reg_unlock(chip); if (err) goto out;
The mv88e6xxx IRQ setup code has some pretty horrible locking patterns, and wrong. Lockdep warns that mv88e6xxx_probe calls irq_domain_create_simple with reg_lock held, because irq_domain_create_simple takes the irq_domain_mutex through its call to __irq_domain_add. But there also exists the reverse locking scheme: this driver implements struct irq_chip :: irq_bus_lock as being the same old mv88e6xxx_reg_lock. So there are code paths in the IRQ core, like this one: mv88e6xxx_mdio_register -> of_mdiobus_register -> fwnode_mdiobus_register_phy -> of_irq_get -> irq_create_of_mapping -> irq_create_fwspec_mapping -> irq_create_mapping_affinity -> irq_domain_associate <- this takes the &irq_domain_mutex -> mv88e6xxx_g2_irq_domain_map -> irq_set_chip_and_handler_name -> __irq_set_handler -> irq_get_desc_buslock -> mv88e6xxx_g2_irq_bus_lock <- this takes the reg_lock So there are at least in theory the premises of a deadlock, but in practice just an ugly antipattern. I've no idea why the reg_lock is taken so broadly just to temporarily drop around request_threaded_irq() - and why that in itself was not enough of an indication that something is wrong with this scheme. Only hardware access should need the register lock, and this in itself is for the mv88e6xxx_smi_indirect_ops to work properly and nothing more, unless I'm misunderstanding something - but if that's the case, I don't know why it isn't put inside mv88e6xxx_smi_{read,write} and instead it is left to bloat the code so much, and then have other more specific locks on top, rather than a single, giant "register" lock. Anyway... This scheme also makes life harder when considering that the current convention for mv88e6xxx_g1_irq_free_common is for the caller to take the mutex. This is just because the mutex is taken top-level in one of its 3 (indirect) callers, which is mv88e6xxx_g1_irq_setup_common. But since this patch is to drop the reg_lock from being taken top-level when we call mv88e6xxx_g1_irq_setup_common (or its poll alternative) and instead just circle the hardware reads/writes with it, then we can drop the locking requirement from mv88e6xxx_g1_irq_free_common too, and follow the exact same pattern there too: locks around hw reads/writes. Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- drivers/net/dsa/mv88e6xxx/chip.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)