Message ID | 20240331182440.14477-1-kl@kl.wtf |
---|---|
State | New |
Headers | show |
Series | [v2] HID: i2c-hid: Revert to await reset ACK before reading report descriptor | expand |
Hi Kenny, Sorry for causing this regression and thank you for your fix. One small remark comment below. In the hope of getting this merged soon I'll prepare a v3 addressing this myself (keeping you as the author). On 3/31/24 8:24 PM, Kenny Levinsen wrote: > In af93a167eda9, i2c_hid_parse was changed to continue with reading the > report descriptor before waiting for reset to be acknowledged. > > This has lead to two regressions: > > 1. We fail to handle reset acknowledgement if it happens while reading > the report descriptor. The transfer sets I2C_HID_READ_PENDING, which > causes the IRQ handler to return without doing anything. > > This affects both a Wacom touchscreen and a Sensel touchpad. > > 2. On a Sensel touchpad, reading the report descriptor this quickly > after reset results in all zeroes or partial zeroes. > > The issues were observed on the Lenovo Thinkpad Z16 Gen 2. > > The change in question was made based on a Microsoft article[0] stating > that Windows 8 *may* read the report descriptor in parallel with > awaiting reset acknowledgement, intended as a slight reset performance > optimization. Perhaps they only do this if reset is not completing > quickly enough for their tastes? > > As the code is not currently ready to read registers in parallel with a > pending reset acknowledgement, and as reading quickly breaks the report > descriptor on the Sensel touchpad, revert to waiting for reset > acknowledgement before proceeding to read the report descriptor. > > [0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management > > Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor") > Signed-off-by: Kenny Levinsen <kl@kl.wtf> > --- > drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++--------- > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c > index 2df1ab3c31cc..72d2bccf5621 100644 > --- a/drivers/hid/i2c-hid/i2c-hid-core.c > +++ b/drivers/hid/i2c-hid/i2c-hid-core.c > @@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid) > mutex_lock(&ihid->reset_lock); > do { > ret = i2c_hid_start_hwreset(ihid); > - if (ret) > + if (ret == 0) > + ret = i2c_hid_finish_hwreset(ihid); > + else > msleep(1000); > } while (tries-- > 0 && ret); > + mutex_unlock(&ihid->reset_lock); > > if (ret) > goto abort_reset; The abort_reset label here and in other places now is no longer necessary. i2c_hid_start_hwreset() (on error) and i2c_hid_finish_hwreset() (regardless of error or not) always clear I2C_HID_RESET_PENDING. And we only do "goto abort_reset;" here and in 2 other places below in a "if (ret) {}" branch, and abort_reset itself is: abort_reset: clear_bit(I2C_HID_RESET_PENDING, &ihid->flags); if (ret) goto out; Since the reset loop now always exits with I2C_HID_RESET_PENDING cleared, the clear_bit() is not necessary after your changes and ret != 0 is always true when doing goto abort_reset so "goto abort_reset" can be replaced with "goto out" or if there is nothing to cleanup with a simple "return ret". As mentioned above I'll post a v3 with this addressed myself, so that we can hopefully get the fix upstream soonest. Regards, Hans > @@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid) > } > } > > - /* > - * Windows directly reads the report-descriptor after sending reset > - * and then waits for resets completion afterwards. Some touchpads > - * actually wait for the report-descriptor to be read before signalling > - * reset completion. > - */ > - ret = i2c_hid_finish_hwreset(ihid); > abort_reset: > clear_bit(I2C_HID_RESET_PENDING, &ihid->flags); > - mutex_unlock(&ihid->reset_lock); > if (ret) > goto out; >
On 4/2/24 1:06 PM, Hans de Goede wrote: > One small remark comment below. In the hope of getting this merged > soon I'll prepare a v3 addressing this myself (keeping you as the author). A-OK from my side, the abort_reset label is indeed redundant now. (The split between start and finish is also technically redundant when we always finish after start, but I wanted to keep the change minimal.)
On 31.03.24 20:24, Kenny Levinsen wrote: > In af93a167eda9, i2c_hid_parse was changed to continue with reading the > report descriptor before waiting for reset to be acknowledged. > > This has lead to two regressions: Lo! Jiri, Benjamin, quick question: is there a reason why this fix for a 6.8-rc1 regression after more than two and half weeks is not yet mainlined? Or is there some good reason why we should be should be extra cautious? Side note: I noticed this due to the tracking today, but I also saw a user that recently ran into the problem the quoted fix is supposed to resolve: https://social.lol/@major/112294923280815017 Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. #regzbot poke > 1. We fail to handle reset acknowledgement if it happens while reading > the report descriptor. The transfer sets I2C_HID_READ_PENDING, which > causes the IRQ handler to return without doing anything. > > This affects both a Wacom touchscreen and a Sensel touchpad. > > 2. On a Sensel touchpad, reading the report descriptor this quickly > after reset results in all zeroes or partial zeroes. > > The issues were observed on the Lenovo Thinkpad Z16 Gen 2. > > The change in question was made based on a Microsoft article[0] stating > that Windows 8 *may* read the report descriptor in parallel with > awaiting reset acknowledgement, intended as a slight reset performance > optimization. Perhaps they only do this if reset is not completing > quickly enough for their tastes? > > As the code is not currently ready to read registers in parallel with a > pending reset acknowledgement, and as reading quickly breaks the report > descriptor on the Sensel touchpad, revert to waiting for reset > acknowledgement before proceeding to read the report descriptor. > > [0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management > > Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor") > Signed-off-by: Kenny Levinsen <kl@kl.wtf> > --- > drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++--------- > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c > index 2df1ab3c31cc..72d2bccf5621 100644 > --- a/drivers/hid/i2c-hid/i2c-hid-core.c > +++ b/drivers/hid/i2c-hid/i2c-hid-core.c > @@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid) > mutex_lock(&ihid->reset_lock); > do { > ret = i2c_hid_start_hwreset(ihid); > - if (ret) > + if (ret == 0) > + ret = i2c_hid_finish_hwreset(ihid); > + else > msleep(1000); > } while (tries-- > 0 && ret); > + mutex_unlock(&ihid->reset_lock); > > if (ret) > goto abort_reset; > @@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid) > } > } > > - /* > - * Windows directly reads the report-descriptor after sending reset > - * and then waits for resets completion afterwards. Some touchpads > - * actually wait for the report-descriptor to be read before signalling > - * reset completion. > - */ > - ret = i2c_hid_finish_hwreset(ihid); > abort_reset: > clear_bit(I2C_HID_RESET_PENDING, &ihid->flags); > - mutex_unlock(&ihid->reset_lock); > if (ret) > goto out; >
On Mon, Apr 22, 2024 at 7:11 PM Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info> wrote: > > On 31.03.24 20:24, Kenny Levinsen wrote: > > In af93a167eda9, i2c_hid_parse was changed to continue with reading the > > report descriptor before waiting for reset to be acknowledged. > > > > This has lead to two regressions: > > Lo! Jiri, Benjamin, quick question: is there a reason why this fix for a > 6.8-rc1 regression after more than two and half weeks is not yet > mainlined? Or is there some good reason why we should be should be extra > cautious? No special reasons I guess. Neither Jiri nor I have sent a HID update for this rc cycle, so it's still there, waiting to be pushed. I've been quite busy with BPF lately and dropped the ball slightly on the HID maintainer side, but I'm sure we'll send the PR to Linus this week or the next. Cheers, Benjamin > > > Side note: I noticed this due to the tracking today, but I also saw a > user that recently ran into the problem the quoted fix is supposed to > resolve: https://social.lol/@major/112294923280815017 > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) > -- > Everything you wanna know about Linux kernel regression tracking: > https://linux-regtracking.leemhuis.info/about/#tldr > If I did something stupid, please tell me, as explained on that page. > > #regzbot poke > > > 1. We fail to handle reset acknowledgement if it happens while reading > > the report descriptor. The transfer sets I2C_HID_READ_PENDING, which > > causes the IRQ handler to return without doing anything. > > > > This affects both a Wacom touchscreen and a Sensel touchpad. > > > > 2. On a Sensel touchpad, reading the report descriptor this quickly > > after reset results in all zeroes or partial zeroes. > > > > The issues were observed on the Lenovo Thinkpad Z16 Gen 2. > > > > The change in question was made based on a Microsoft article[0] stating > > that Windows 8 *may* read the report descriptor in parallel with > > awaiting reset acknowledgement, intended as a slight reset performance > > optimization. Perhaps they only do this if reset is not completing > > quickly enough for their tastes? > > > > As the code is not currently ready to read registers in parallel with a > > pending reset acknowledgement, and as reading quickly breaks the report > > descriptor on the Sensel touchpad, revert to waiting for reset > > acknowledgement before proceeding to read the report descriptor. > > > > [0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management > > > > Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor") > > Signed-off-by: Kenny Levinsen <kl@kl.wtf> > > --- > > drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++--------- > > 1 file changed, 4 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c > > index 2df1ab3c31cc..72d2bccf5621 100644 > > --- a/drivers/hid/i2c-hid/i2c-hid-core.c > > +++ b/drivers/hid/i2c-hid/i2c-hid-core.c > > @@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid) > > mutex_lock(&ihid->reset_lock); > > do { > > ret = i2c_hid_start_hwreset(ihid); > > - if (ret) > > + if (ret == 0) > > + ret = i2c_hid_finish_hwreset(ihid); > > + else > > msleep(1000); > > } while (tries-- > 0 && ret); > > + mutex_unlock(&ihid->reset_lock); > > > > if (ret) > > goto abort_reset; > > @@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid) > > } > > } > > > > - /* > > - * Windows directly reads the report-descriptor after sending reset > > - * and then waits for resets completion afterwards. Some touchpads > > - * actually wait for the report-descriptor to be read before signalling > > - * reset completion. > > - */ > > - ret = i2c_hid_finish_hwreset(ihid); > > abort_reset: > > clear_bit(I2C_HID_RESET_PENDING, &ihid->flags); > > - mutex_unlock(&ihid->reset_lock); > > if (ret) > > goto out; > > >
Linus, On 23.04.24 16:59, Benjamin Tissoires wrote: > On Mon, Apr 22, 2024 at 7:11 PM Linux regression tracking (Thorsten > Leemhuis) <regressions@leemhuis.info> wrote: >> On 31.03.24 20:24, Kenny Levinsen wrote: > > [previous subject: [PATCH v2] HID: i2c-hid: Revert to await reset ACK before reading report descriptor] > >>> In af93a167eda9, i2c_hid_parse was changed to continue with reading the >>> report descriptor before waiting for reset to be acknowledged. >>> >>> This has lead to two regressions: >> >> Lo! Jiri, Benjamin, quick question: is there a reason why this fix for a >> 6.8-rc1 regression after more than two and half weeks is not yet >> mainlined? Or is there some good reason why we should be should be extra >> cautious? > > No special reasons I guess. Neither Jiri nor I have sent a HID update > for this rc cycle, so it's still there, waiting to be pushed. > I've been quite busy with BPF lately and dropped the ball slightly on > the HID maintainer side, but I'm sure we'll send the PR to Linus this > week or the next. out of interest: what's your stance on regression fixes sitting in subsystem git trees for a week or longer before being mainlined? The quoted patch is such a case. It fixes a regression caused by a change that made it into 6.8-rc1, but the problem afaik was only reported on 2024-03-19, e.g. ~nine days after 6.8 was out[1]; Kenny, the author of the fix, apparently noticed and fixed the problem a bit later independently[2]. Jiri merged a newer version of the fix on 2024-04-03[3], which was included in -next a day later -- the Thursday before 6.9-rc3. The fix thus would even have gotten two days of testing in -next, if Benjamin or Jiri would have send it your way for that pre-release. But from Benjamin's statement quoted above it seems the fix might even make -rc6. That obviously heavily reduces the time the fix will be tested before 6.9 is released. It obviously also means that 6.8.y is as of now still unfixed, as the stable team usually only applies fixes once they landed in mainline. Which also means that even more people ran into the problem with 6.8.y[4] or mainline even after Jiri merged the patch into the hid tree -- and maybe some of those people wasted their time on a bisection only to find out that a fix exists. That sounds, ehh, sub-optimal to me. Which is why I wonder what's your stance here, as I encounter similar situations frequently[5] -- which sometimes is kinda demotivating. :-/ Ciao, Thorsten [1] https://lore.kernel.org/all/a587f3f3-e0d5-4779-80a4-a9f7110b0bd2@manjaro.org/ [2] https://lore.kernel.org/all/20240331132332.6694-1-kl@kl.wtf/ [3] https://lore.kernel.org/all/nycvar.YFH.7.76.2404031401411.20263@cbobk.fhfr.pm/ [4] https://social.lol/@major/112294923280815017 [5] This fix for example: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=master&id=afc89870ea677bd5a44516eb981f7a259b74280c Reports: https://lore.kernel.org/lkml/ZYhQ2-OnjDgoqjvt@wens.tw/ https://lore.kernel.org/lkml/1553a526-6f28-4a68-88a8-f35bd22d9894@linumiz.com/ > [...] >>> 1. We fail to handle reset acknowledgement if it happens while reading >>> the report descriptor. The transfer sets I2C_HID_READ_PENDING, which >>> causes the IRQ handler to return without doing anything. >>> >>> This affects both a Wacom touchscreen and a Sensel touchpad. >>> >>> 2. On a Sensel touchpad, reading the report descriptor this quickly >>> after reset results in all zeroes or partial zeroes. >>> >>> The issues were observed on the Lenovo Thinkpad Z16 Gen 2. >>> >>> The change in question was made based on a Microsoft article[0] stating >>> that Windows 8 *may* read the report descriptor in parallel with >>> awaiting reset acknowledgement, intended as a slight reset performance >>> optimization. Perhaps they only do this if reset is not completing >>> quickly enough for their tastes? >>> >>> As the code is not currently ready to read registers in parallel with a >>> pending reset acknowledgement, and as reading quickly breaks the report >>> descriptor on the Sensel touchpad, revert to waiting for reset >>> acknowledgement before proceeding to read the report descriptor. >>> >>> [0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management >>> >>> Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor") >>> Signed-off-by: Kenny Levinsen <kl@kl.wtf> >>> --- >>> drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++--------- >>> 1 file changed, 4 insertions(+), 9 deletions(-) >>> >>> diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c >>> index 2df1ab3c31cc..72d2bccf5621 100644 >>> --- a/drivers/hid/i2c-hid/i2c-hid-core.c >>> +++ b/drivers/hid/i2c-hid/i2c-hid-core.c >>> @@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid) >>> mutex_lock(&ihid->reset_lock); >>> do { >>> ret = i2c_hid_start_hwreset(ihid); >>> - if (ret) >>> + if (ret == 0) >>> + ret = i2c_hid_finish_hwreset(ihid); >>> + else >>> msleep(1000); >>> } while (tries-- > 0 && ret); >>> + mutex_unlock(&ihid->reset_lock); >>> >>> if (ret) >>> goto abort_reset; >>> @@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid) >>> } >>> } >>> >>> - /* >>> - * Windows directly reads the report-descriptor after sending reset >>> - * and then waits for resets completion afterwards. Some touchpads >>> - * actually wait for the report-descriptor to be read before signalling >>> - * reset completion. >>> - */ >>> - ret = i2c_hid_finish_hwreset(ihid); >>> abort_reset: >>> clear_bit(I2C_HID_RESET_PENDING, &ihid->flags); >>> - mutex_unlock(&ihid->reset_lock); >>> if (ret) >>> goto out; >>> >> > > >
On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis <regressions@leemhuis.info> wrote: > > out of interest: what's your stance on regression fixes sitting in > subsystem git trees for a week or longer before being mainlined? Annoying, but probably depends on circumstances. The fact that it took a while to even be noticed presumably means it's not common or holding anything up. That said, th4e last HID pull I have is from March 14. If the issue is just that there's nothing else happening, I think people should just point me to the patch and say "can you apply this single fix?" Linus
On 24.04.24 20:53, Linus Torvalds wrote: > On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis > <regressions@leemhuis.info> wrote: >> >> out of interest: what's your stance on regression fixes sitting in >> subsystem git trees for a week or longer before being mainlined? > > Annoying, but probably depends on circumstances. The fact that it took > a while to even be noticed presumably means it's not common or holding > anything up. Well, I searched and found quite a few users that reported the problem: https://bbs.archlinux.org/viewtopic.php?id=293971 (at least 4 people) https://bbs.archlinux.org/viewtopic.php?id=293978 (2 people) https://bugzilla.redhat.com/show_bug.cgi?id=2271136 (1) https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2061040 (1) https://forums.opensuse.org/t/no-touchpad-found-el-touchpad-a-veces-es-reconocido-por-el-sistema/174100 (1) https://oldos.me/@jay/112294956758222518 (1) There are also these two I mentioned earlier already: https://social.lol/@major/112294920993272987 (1) https://lore.kernel.org/all/9a880b2b-2a28-4647-9f0f-223f9976fdee@manjaro.org/ (1) Side note: there were more discussions about it here: https://forums.lenovo.com/t5/Fedora/PSA-Z16-Gen-2-touchpad-not-working-on-kernel-6-8/m-p/5299530 https://www.reddit.com/r/thinkpad/comments/1bwxwnr/review_thinkpad_z16_gen_2_with_arch_linux/ https://www.reddit.com/r/linuxhardware/comments/1bwxhwa/review_thinkpad_z16_gen_2_arch_linux/ And the arch linux wiki even documents a workaround: https://wiki.archlinux.org/title/Lenovo_ThinkPad_Z16_Gen_2#Initialization_failure Those are just the reports and discussions I found. And you know how it is: many people that struggle will never report a problem. IMHO this all casts a bad light on our "no regression" rule, as the fix is ready, just not mainlined and backported. And as I mentioned: I see similar situations all the time. That's why I made noise here. > That said, th4e last HID pull I have is from March 14. If the issue is > just that there's nothing else happening, I think people should just > point me to the patch and say "can you apply this single fix?" Then I'll likely do so in my regression reports more often. Is cherry picking from -next as easy for you? Maintainers sometimes improve small details when merging a fix, so it might be better to take fixes from there instead of pulling them from lore. Ciao, Thorsten P.S: Wondering if I should team up with the kernel package maintainers of Arch Linux, Fedora, and openSUSE and start a git tree based on the latest stable tree with additional fixes and reverts for regressions not yet fixed upstream...[1] But that feels kinda wrong: it IMHO would be better to resolve those problems quickly in the proper upstream trees. [1] yes, I'm fully aware that such a tree can only address some of the issues; but from what I see that already would make quite a difference.
On Apr 25 2024, Thorsten Leemhuis wrote: > On 24.04.24 20:53, Linus Torvalds wrote: > > On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis > > <regressions@leemhuis.info> wrote: > >> > >> out of interest: what's your stance on regression fixes sitting in > >> subsystem git trees for a week or longer before being mainlined? > > > > Annoying, but probably depends on circumstances. The fact that it took > > a while to even be noticed presumably means it's not common or holding > > anything up. > > Well, I searched and found quite a few users that reported the problem: > > https://bbs.archlinux.org/viewtopic.php?id=293971 (at least 4 people) > https://bbs.archlinux.org/viewtopic.php?id=293978 (2 people) > https://bugzilla.redhat.com/show_bug.cgi?id=2271136 (1) > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2061040 (1) > https://forums.opensuse.org/t/no-touchpad-found-el-touchpad-a-veces-es-reconocido-por-el-sistema/174100 (1) > https://oldos.me/@jay/112294956758222518 (1) > > There are also these two I mentioned earlier already: > https://social.lol/@major/112294920993272987 (1) > https://lore.kernel.org/all/9a880b2b-2a28-4647-9f0f-223f9976fdee@manjaro.org/ (1) > > Side note: there were more discussions about it here: > https://forums.lenovo.com/t5/Fedora/PSA-Z16-Gen-2-touchpad-not-working-on-kernel-6-8/m-p/5299530 > https://www.reddit.com/r/thinkpad/comments/1bwxwnr/review_thinkpad_z16_gen_2_with_arch_linux/ > https://www.reddit.com/r/linuxhardware/comments/1bwxhwa/review_thinkpad_z16_gen_2_arch_linux/ > > And the arch linux wiki even documents a workaround: > https://wiki.archlinux.org/title/Lenovo_ThinkPad_Z16_Gen_2#Initialization_failure > > Those are just the reports and discussions I found. And you know how > it is: many people that struggle will never report a problem. > short FYI, (I've Cc-ed you on the PR), but I just sent the PR for HID, which includes this fix. > > IMHO this all casts a bad light on our "no regression" rule, as the > fix is ready, just not mainlined and backported. And as I mentioned: > I see similar situations all the time. That's why I made noise here. > > > > That said, th4e last HID pull I have is from March 14. If the issue is > > just that there's nothing else happening, I think people should just > > point me to the patch and say "can you apply this single fix?" > > Then I'll likely do so in my regression reports more often. > > Is cherry picking from -next as easy for you? Maintainers sometimes > improve small details when merging a fix, so it might be better to > take fixes from there instead of pulling them from lore. Maybe one suggestion that might help to reduce these kind of situations in the future: can you configure your bot to notify the maintainers after a couple of days that the patch has been merged that it would be nice if they could send the PR to Linus? In this case I bet Jiri forgot to send it because he was overloaded and so was I. So a friendly reminder could make things go faster. And maybe, before sending the reminder, if you could also check that the target branch hasn't been touched in 2 days that would prevent annoyances when we just added a commit and want to let it stay in for-next for 24h before sending the full branch. > > Ciao, Thorsten > > P.S: Wondering if I should team up with the kernel package maintainers > of Arch Linux, Fedora, and openSUSE and start a git tree based on the > latest stable tree with additional fixes and reverts for regressions > not yet fixed upstream...[1] But that feels kinda wrong: it IMHO > would be better to resolve those problems quickly in the proper > upstream trees. I would also say that this is wrong. Unless all regressions go through your tree and you then send PR to Linus, you might quickly get overloaded because sometimes the fix can not be cherry-picked if there is one other change just before. However, do you have some kind of dashboard that you could share with the package maintainers? This way they could easily compare the not-yet applied fixes with their bugs and decide to backport them themselves. In other words: let others do the hard work, you are doing a lot already :) Anyway, I really think a friendly reminder would help makes things go faster. Something like "Hey, it seems that you applied a regression fix that I am currently tracking and that you haven't sent the PR to Linus yet. Could you please send it ASAP as we already have several users reporting the issue?". Cheers, Benjamin > > [1] yes, I'm fully aware that such a tree can only address some of the > issues; but from what I see that already would make quite a difference.
On 25.04.24 10:44, Benjamin Tissoires wrote: > On Apr 25 2024, Thorsten Leemhuis wrote: >> On 24.04.24 20:53, Linus Torvalds wrote: >>> On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis >>> <regressions@leemhuis.info> wrote: >> [...] >> And the arch linux wiki even documents a workaround: >> https://wiki.archlinux.org/title/Lenovo_ThinkPad_Z16_Gen_2#Initialization_failure >> >> Those are just the reports and discussions I found. And you know how >> it is: many people that struggle will never report a problem. > > short FYI, (I've Cc-ed you on the PR), but I just sent the PR for HID, > which includes this fix. Great, many thx. Saw it right after sending my mail... :-/ >> Is cherry picking from -next as easy for you? Maintainers sometimes >> improve small details when merging a fix, so it might be better to >> take fixes from there instead of pulling them from lore. > > Maybe one suggestion that might help to reduce these kind of situations > in the future: can you configure your bot to notify the maintainers > after a couple of days that the patch has been merged that it would be > nice if they could send the PR to Linus? Yes, that is an idea in the long run, but I'm not sure if it's wise now or later. People easily get annoyed by these mails (which I totally understand!) and then will start hating the bot or regression tracking in general. That's why I'm really careful here. There are also the subsystems that regularly flush their fixes shortly before a new -rc, so they likely never want to see such reminders. And sending them right after a new -rc is better than nothing, but not ideal either. IOW: it's complicated. :-/ > In this case I bet Jiri forgot to send it because he was overloaded and > so was I. Understood and no worries. But this became a good opportunity to raise the general problem, as that is something that bugs me. Sorry. Hope you don't mind to much that I used that chance. > So a friendly reminder could make things go faster. I'll already did this occasionally manually, but that of course does not scale. Sometimes I wonder if it would be more efficient for nearly all of us if subsystems just flushed their -fixes branch shortly before each new -rc, as Linus apparently is not bothered by PRs that contain just a change or two. But that of course creates work for each of the subsystem maintainers, unless they creates scripts to handle that work nearly for free (it seems to me the x86 folks have something like that). Of course that would mean... > And maybe, before sending the reminder, if you could also check that the > target branch hasn't been touched in 2 days that would prevent annoyances > when we just added a commit and want to let it stay in for-next for 24h > before sending the full branch. ...that nothing big or slightly dangerous should be merged to -fixes branches on Fridays. >> P.S: Wondering if I should team up with the kernel package maintainers >> of Arch Linux, Fedora, and openSUSE and start a git tree based on the >> latest stable tree with additional fixes and reverts for regressions >> not yet fixed upstream...[1] But that feels kinda wrong: it IMHO >> would be better to resolve those problems quickly in the proper >> upstream trees. > > I would also say that this is wrong. Unless all regressions go through > your tree and you then send PR to Linus, [...] Ohh, sorry, I was not clear here, as that would be totally wrong -- fixes definitely should go through the subsystems trees, as they have the knowledge and the infra to check them (hmm, maybe a dedicated tree might make sense for the smaller subsystems, but let's ignore that). What I meant was just a tree those distros could merge into their kernels to quickly resolve issues that upstream is slow to fix. But that obviously has downsides, too. And is yet more work. > However, do you have some kind of dashboard that you could share with > the package maintainers? This way they could easily compare the not-yet > applied fixes with their bugs and decide to backport them themselves. I have for the kernel overall, but nothing subsystem specific. But that is pretty high on my todo list, as... > In other words: let others do the hard work, you are doing a lot already ...I'm very well aware of this. :-/ Ciao, Thorsten
diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c index 2df1ab3c31cc..72d2bccf5621 100644 --- a/drivers/hid/i2c-hid/i2c-hid-core.c +++ b/drivers/hid/i2c-hid/i2c-hid-core.c @@ -735,9 +735,12 @@ static int i2c_hid_parse(struct hid_device *hid) mutex_lock(&ihid->reset_lock); do { ret = i2c_hid_start_hwreset(ihid); - if (ret) + if (ret == 0) + ret = i2c_hid_finish_hwreset(ihid); + else msleep(1000); } while (tries-- > 0 && ret); + mutex_unlock(&ihid->reset_lock); if (ret) goto abort_reset; @@ -767,16 +770,8 @@ static int i2c_hid_parse(struct hid_device *hid) } } - /* - * Windows directly reads the report-descriptor after sending reset - * and then waits for resets completion afterwards. Some touchpads - * actually wait for the report-descriptor to be read before signalling - * reset completion. - */ - ret = i2c_hid_finish_hwreset(ihid); abort_reset: clear_bit(I2C_HID_RESET_PENDING, &ihid->flags); - mutex_unlock(&ihid->reset_lock); if (ret) goto out;
In af93a167eda9, i2c_hid_parse was changed to continue with reading the report descriptor before waiting for reset to be acknowledged. This has lead to two regressions: 1. We fail to handle reset acknowledgement if it happens while reading the report descriptor. The transfer sets I2C_HID_READ_PENDING, which causes the IRQ handler to return without doing anything. This affects both a Wacom touchscreen and a Sensel touchpad. 2. On a Sensel touchpad, reading the report descriptor this quickly after reset results in all zeroes or partial zeroes. The issues were observed on the Lenovo Thinkpad Z16 Gen 2. The change in question was made based on a Microsoft article[0] stating that Windows 8 *may* read the report descriptor in parallel with awaiting reset acknowledgement, intended as a slight reset performance optimization. Perhaps they only do this if reset is not completing quickly enough for their tastes? As the code is not currently ready to read registers in parallel with a pending reset acknowledgement, and as reading quickly breaks the report descriptor on the Sensel touchpad, revert to waiting for reset acknowledgement before proceeding to read the report descriptor. [0]: https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management Fixes: af93a167eda9 ("HID: i2c-hid: Move i2c_hid_finish_hwreset() to after reading the report-descriptor") Signed-off-by: Kenny Levinsen <kl@kl.wtf> --- drivers/hid/i2c-hid/i2c-hid-core.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-)