Message ID | 20211116093833.245542-1-shreeya.patel@collabora.com |
---|---|
State | Superseded |
Headers | show |
Series | [v4] gpio: Return EPROBE_DEFER if gc->to_irq is NULL | expand |
Shreeya Patel <shreeya.patel@collabora.com> writes: > We are racing the registering of .to_irq when probing the > i2c driver. This results in random failure of touchscreen > devices. > > Following errors could be seen in dmesg logs when gc->to_irq is NULL > > [2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ > [2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22 > > To avoid this situation, defer probing until to_irq is registered. > > This issue has been reported many times in past and people have been > using workarounds like changing the pinctrl_amd to built-in instead > of loading it as a module or by adding a softdep for pinctrl_amd into > the config file. > > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413 > Reviewed-by: Linus Walleij <linus.walleij@linaro.org> > Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> > Reported-by: kernel test robot <lkp@intel.com> > Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com> Hi guys, This seems to not have reached the Linus tree on 5.17. If I'm not mistaken, it also hasn't reached linux-next as of today. Is there anything I'm missing here? This is required to prevent spurious probe crashes of devices like this FocalTech touchscreen, FT3528, when using pinctrl-amd. We've been carrying it downstream for quite a while. Thanks,
On Thu, Feb 10, 2022 at 5:36 PM Gabriel Krisman Bertazi <krisman@collabora.com> wrote: > > Shreeya Patel <shreeya.patel@collabora.com> writes: > > > We are racing the registering of .to_irq when probing the > > i2c driver. This results in random failure of touchscreen > > devices. > > > > Following errors could be seen in dmesg logs when gc->to_irq is NULL > > > > [2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ > > [2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22 > > > > To avoid this situation, defer probing until to_irq is registered. > > > > This issue has been reported many times in past and people have been > > using workarounds like changing the pinctrl_amd to built-in instead > > of loading it as a module or by adding a softdep for pinctrl_amd into > > the config file. > > > > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413 > > Reviewed-by: Linus Walleij <linus.walleij@linaro.org> > > Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> > > Reported-by: kernel test robot <lkp@intel.com> > > Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com> > > Hi guys, > > This seems to not have reached the Linus tree on 5.17. If I'm not > mistaken, it also hasn't reached linux-next as of today. Is there > anything I'm missing here? > > This is required to prevent spurious probe crashes of devices like this > FocalTech touchscreen, FT3528, when using pinctrl-amd. We've been > carrying it downstream for quite a while. > > Thanks, > > -- > Gabriel Krisman Bertazi Hi Gabriel! My email address changed in September, that's why I didn't see the email you sent in November to my old one. gpiod_to_irq() can be used in context other than driver probing, I'm worried existing users would not know how to handle it. Also: how come you can get the GPIO descriptor from the provider but its interrupts are not yet set up? Bart
Bartosz Golaszewski <brgl@bgdev.pl> writes: > My email address changed in September, that's why I didn't see the > email you sent in November to my old one. Hi Bart, thanks for the prompt reply and sorry for the wrong email address. > gpiod_to_irq() can be used in context other than driver probing, I'm > worried existing users would not know how to handle it. Also: how come > you can get the GPIO descriptor from the provider but its interrupts > are not yet set up? I'm definitely some context here, as its been quite a while. Shreeya, feel free to pitch in. :) This is one of the races we saw in gpiochip_add_irqchip, depending on the probe order. The gc is already visible while partially initialized, if pinctrl-amd hasn't been probed yet. Another device being probed can hit an -ENXIO here if to_irq is yet uninitialized or enter .to_irq() and oops. Shreeya's patch workarounds the first issue, but is not a solution for the second. There is another patch that has been flying around to address the Oops. https://lkml.org/lkml/2021/11/8/900 She's been working on a proper solution for that one, which might actually address this too and replace the current patch. Maybe you could help us get to a proper solution there? I'm quite unfamiliar with this code myself :)
On 11/02/22 6:56 am, Gabriel Krisman Bertazi wrote: > Bartosz Golaszewski <brgl@bgdev.pl> writes: > >> My email address changed in September, that's why I didn't see the >> email you sent in November to my old one. > Hi Bart, > > thanks for the prompt reply and sorry for the wrong email address. > >> gpiod_to_irq() can be used in context other than driver probing, I'm >> worried existing users would not know how to handle it. Also: how come >> you can get the GPIO descriptor from the provider but its interrupts >> are not yet set up? > I'm definitely some context here, as its been quite a while. > Shreeya, feel free to pitch in. :) Existing users will probably receive -ENXIO in case to_irq is not set and wasn't intended to be set. We are trying to solve the race which happens frequently in cases where I2C is set as built-in and pinctrl-amd is set as module. There is no dependency between I2C and pinctrl-amd, while pinctrl-amd is still trying to set the gc irq members through gpiochip_add_irqchip, I2C calls gpiod_to_irq() which leads to returning -ENXIO since gc->to_irq is still NULL There have also been cases where gc->to_irq is set successfully but other members are yet to be initalized by gpiochip_add_irqchip like the domain variable which is being used in .to_irq() and ultimately leads to a NULL pointer dereference as Gabriel mentioned. I am working on a fix which would use mutex to not let gc irq members be accessed until they all have been completely initialized. I2C calls gpiod_to_irq through the following stack trace kernel: Call Trace: kernel: gpiod_to_irq.cold+0x49/0x8f kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0 kernel: i2c_acpi_get_irq+0xc0/0xd0 kernel: i2c_device_probe+0x28a/0x2a0 kernel: really_probe+0xf2/0x460 kernel: driver_probe_device+0xe8/0x160 and pinctrl-amd makes gc visible through gpiochip_add_data_with_key() Thanks, Shreeya Patel > This is one of the races we saw in gpiochip_add_irqchip, depending on > the probe order. The gc is already visible while partially initialized, > if pinctrl-amd hasn't been probed yet. Another device being probed can > hit an -ENXIO here if to_irq is yet uninitialized or enter .to_irq() and > oops. Shreeya's patch workarounds the first issue, but is not a > solution for the second. > > There is another patch that has been flying around to address the Oops. > > https://lkml.org/lkml/2021/11/8/900 > > She's been working on a proper solution for that one, which might > actually address this too and replace the current patch. Maybe you > could help us get to a proper solution there? I'm quite unfamiliar with > this code myself :) >
On 10/02/22 11:30 pm, Bartosz Golaszewski wrote: > On Thu, Feb 10, 2022 at 5:36 PM Gabriel Krisman Bertazi > <krisman@collabora.com> wrote: >> Shreeya Patel <shreeya.patel@collabora.com> writes: >> >>> We are racing the registering of .to_irq when probing the >>> i2c driver. This results in random failure of touchscreen >>> devices. >>> >>> Following errors could be seen in dmesg logs when gc->to_irq is NULL >>> >>> [2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ >>> [2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22 >>> >>> To avoid this situation, defer probing until to_irq is registered. >>> >>> This issue has been reported many times in past and people have been >>> using workarounds like changing the pinctrl_amd to built-in instead >>> of loading it as a module or by adding a softdep for pinctrl_amd into >>> the config file. >>> >>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413 >>> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> >>> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> >>> Reported-by: kernel test robot <lkp@intel.com> >>> Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com> >> Hi guys, >> >> This seems to not have reached the Linus tree on 5.17. If I'm not >> mistaken, it also hasn't reached linux-next as of today. Is there >> anything I'm missing here? >> >> This is required to prevent spurious probe crashes of devices like this >> FocalTech touchscreen, FT3528, when using pinctrl-amd. We've been >> carrying it downstream for quite a while. >> >> Thanks, >> >> -- >> Gabriel Krisman Bertazi > Hi Gabriel! > > My email address changed in September, that's why I didn't see the > email you sent in November to my old one. > > gpiod_to_irq() can be used in context other than driver probing, I'm > worried existing users would not know how to handle it. Also: how come > you can get the GPIO descriptor from the provider but its interrupts > are not yet set up? Hi Bartosz, We would be only changing the error code here for gpio driver race condition. Anything affected by this patch would have already been broken and might be returning -ENXIO. There is a theoretical chance that a driver exists which uses gpiod_to_irq outside of probe and affected by race condition. The more instructions between gpiod_get and gpiod_to_irq the smaller the chance of hitting the race condition though. Also anything hitting the race condition is broken right now and there hasn't been any issues reported so far. In any case that system is not fixed by this patch, it is still a good idea to apply this patch since the proper fix is a lot more invasive and probably might not be suitable for stable backporting. This minimal patch does not make things worse. I have sent a v5 for this patch with better commit message. Thanks, Shreeya Patel > > Bart >
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index abfbf546d159..7b3f7f4d1d06 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -3111,6 +3111,16 @@ int gpiod_to_irq(const struct gpio_desc *desc) return retirq; } +#ifdef CONFIG_GPIOLIB_IRQCHIP + if (gc->irq.chip) { + /* + * Avoid race condition with other code, which tries to lookup + * an IRQ before the irqchip has been properly registered, + * i.e. while gpiochip is still being brought up. + */ + return -EPROBE_DEFER; + } +#endif return -ENXIO; } EXPORT_SYMBOL_GPL(gpiod_to_irq);