diff mbox series

HID: i2c-hid: fix handling of unpopulated devices

Message ID 20230918125851.310-1-johan+linaro@kernel.org
State Superseded
Headers show
Series HID: i2c-hid: fix handling of unpopulated devices | expand

Commit Message

Johan Hovold Sept. 18, 2023, 12:58 p.m. UTC
A recent commit reordered probe so that the interrupt line is now
requested before making sure that the device exists.

This breaks machines like the Lenovo ThinkPad X13s which rely on the
HID driver to probe second-source devices and only register the variant
that is actually populated. Specifically, the interrupt line may now
already be (temporarily) claimed when doing asynchronous probing of the
touchpad:

	genirq: Flags mismatch irq 191. 00082008 (hid-over-i2c) vs. 00082008 (hid-over-i2c)
	i2c_hid_of 21-0015: Could not register for hid-over-i2c interrupt, irq = 191, ret = -16
	i2c_hid_of: probe of 21-0015 failed with error -16

Fix this by restoring the old behaviour of first making sure the device
exists before requesting the interrupt line.

Note that something like this should probably be implemented also for
"panel followers", whose actual probe is currently effectively deferred
until the DRM panel is probed (e.g. by powering down the device after
making sure it exists and only then register it as a follower).

Fixes: 675cd877c952 ("HID: i2c-hid: Rearrange probe() to power things up later")
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
 drivers/hid/i2c-hid/i2c-hid-core.c | 142 ++++++++++++++++-------------
 1 file changed, 80 insertions(+), 62 deletions(-)

Comments

Doug Anderson Sept. 18, 2023, 3 p.m. UTC | #1
Hi,

On Mon, Sep 18, 2023 at 6:00 AM Johan Hovold <johan+linaro@kernel.org> wrote:
>
> A recent commit reordered probe so that the interrupt line is now
> requested before making sure that the device exists.
>
> This breaks machines like the Lenovo ThinkPad X13s which rely on the
> HID driver to probe second-source devices and only register the variant
> that is actually populated. Specifically, the interrupt line may now
> already be (temporarily) claimed when doing asynchronous probing of the
> touchpad:
>
>         genirq: Flags mismatch irq 191. 00082008 (hid-over-i2c) vs. 00082008 (hid-over-i2c)
>         i2c_hid_of 21-0015: Could not register for hid-over-i2c interrupt, irq = 191, ret = -16
>         i2c_hid_of: probe of 21-0015 failed with error -16
>
> Fix this by restoring the old behaviour of first making sure the device
> exists before requesting the interrupt line.
>
> Note that something like this should probably be implemented also for
> "panel followers", whose actual probe is currently effectively deferred
> until the DRM panel is probed (e.g. by powering down the device after
> making sure it exists and only then register it as a follower).
>
> Fixes: 675cd877c952 ("HID: i2c-hid: Rearrange probe() to power things up later")
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Maxime Ripard <mripard@kernel.org>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
>  drivers/hid/i2c-hid/i2c-hid-core.c | 142 ++++++++++++++++-------------
>  1 file changed, 80 insertions(+), 62 deletions(-)

Ugh, sorry for the regression. :( It actually turns out that I've been
digging into this same issue on a different device (see
mt8173-elm-hana). I hadn't realized that it was a regression caused by
my recent change, though.

I haven't yet reviewed your change in detail, but to me it seems like
at most a short term fix. Specifically, I think the way that this has
been working has been partially via hacks and partially via luck. Let
me explain...

Currently, to make this work the `sc8280xp-lenovo-thinkpad-x13s.dts`
file has a hack in it. You can see that the `tpad_default` pinctrl
entry has been moved up to the i2c bus level even though it doesn't
belong there (it should be in each trackpad). This is because,
otherwise, you would have run into similar type problems as the device
core would have failed to claim the pin for one of the devices.

Currently, we're getting a bit lucky with
`sc8280xp-lenovo-thinkpad-x13s.dts` that there are no other shared
resources between the two devices besides the interrupt. Specifically
a number of trackpads / touchscreens also have a "reset" GPIO that
needs to be power sequenced properly in order to talk to the
touchscreen. In this case we'll be stuck again because both instances
would need to grab the "reset" GPIO before being able to confirm if
the device is there.

This is an old problem. The first I remember running into it was back
in 2015 on rk3288-veryron-minnie. We had a downstream hack to make
this work with -EPROBE_DEFER. https://crrev.com/c/266224. By the time
we shipped, though, we decided not to do the 2nd sourcing. After that
I always NAKed HW designs like this, but I guess that didn't help with
Mediatek hardware I wasn't involved with. :( ...and, of course, it
didn't help with devices that aren't Chromebooks like the Thinkpad
X13S.

FWIW: as a short term solution, we ended up forcing synchronous probe
in <https://crrev.com/c/4857566>. This has some pretty serious boot
time implications, but it's also very simple.


I'm actively working on coming up with a better solution here. My
current thought is that that maybe we want to do:

1. Undo the hack in the device tree and have each "2nd source" have
its own pinctrl entry.

2. In core pinctrl / device probing code detect the pinctrl conflict
and only probe one of the devices at a time.

...that sounds like a nice/elegant solution and I'm trying to make it
work, though it does have some downsides. Namely:

a) It requires "dts" changes to work. Namely we've got to undo the
hack that pushed the pinctrl up to the controller level (or, in the
case of mt8173-elm-hana, that just totally skipped the "pinctrl" entry
altogether). Unfortunately those same "dts" changes will actually make
things _worse_ if you don't have the code change. :(

b) It only handles the case where the resources shared by 2nd sourcing
are expressed by pinctrl. In a practical sense this seems to be most
cases, but conceivably you could imagine running into this situation
with a non-pin-related shared resource.

c) To solve this in the core, we have to make sure we properly handle
(without hanging/failing) multiple partially-conflicting devices and
devices that might acquire resources in arbitrary orders.

Though the above solution detecting the pinctrl conflicts sounds
appealing and I'm currently working on prototyping it, I'm still not
100% convinced. I'm worried about the above downsides.


Personally, I feel like we could add information to the device tree
that would help us out. The question is: is this an abuse of device
tree for something that Linux ought to be able to figure out on its
own, or is it OK? To make it concrete, I was thinking about something
like this:

/ {
  tp_ex_group: trackpad-exclusion-group {
    members = <&tp1>, <&tp2>, <&tp3>;
  };
};

&i2c_bus {
  tp1: trackpad@10 {
    ...
    mutual-exclusion-group = <&tp_ex_group>;
  };
  tp2: trackpad@20 {
    ...
    mutual-exclusion-group = <&tp_ex_group>;
  };
  tp3: trackpad@30 {
    ...
    mutual-exclusion-group = <&tp_ex_group>;
  };
};

Then the device core would know not to probe devices in the same
"mutual-exclusion-group" at the same time.

If DT folks are OK with the "mutual-exclusion-group" idea then I'll
probably backburner my attempt to make this work on the pinctrl level
and go with that.
Johan Hovold Sept. 19, 2023, 7:07 a.m. UTC | #2
On Mon, Sep 18, 2023 at 08:00:15AM -0700, Doug Anderson wrote:
> On Mon, Sep 18, 2023 at 6:00 AM Johan Hovold <johan+linaro@kernel.org> wrote:
> > A recent commit reordered probe so that the interrupt line is now
> > requested before making sure that the device exists.
> >
> > This breaks machines like the Lenovo ThinkPad X13s which rely on the
> > HID driver to probe second-source devices and only register the variant
> > that is actually populated. Specifically, the interrupt line may now
> > already be (temporarily) claimed when doing asynchronous probing of the
> > touchpad:
> >
> >         genirq: Flags mismatch irq 191. 00082008 (hid-over-i2c) vs. 00082008 (hid-over-i2c)
> >         i2c_hid_of 21-0015: Could not register for hid-over-i2c interrupt, irq = 191, ret = -16
> >         i2c_hid_of: probe of 21-0015 failed with error -16
> >
> > Fix this by restoring the old behaviour of first making sure the device
> > exists before requesting the interrupt line.

> Ugh, sorry for the regression. :( It actually turns out that I've been
> digging into this same issue on a different device (see
> mt8173-elm-hana). I hadn't realized that it was a regression caused by
> my recent change, though.
> 
> I haven't yet reviewed your change in detail, but to me it seems like
> at most a short term fix. Specifically, I think the way that this has
> been working has been partially via hacks and partially via luck. Let
> me explain...
> 
> Currently, to make this work the `sc8280xp-lenovo-thinkpad-x13s.dts`
> file has a hack in it. You can see that the `tpad_default` pinctrl
> entry has been moved up to the i2c bus level even though it doesn't
> belong there (it should be in each trackpad). This is because,
> otherwise, you would have run into similar type problems as the device
> core would have failed to claim the pin for one of the devices.

Ḯ'm well aware of that and it was mentioned in the commit message for
4367d763698c ("arm64: dts: qcom: sc8280xp-x13s: enable alternate
touchpad") as well as discussed briefly with Rob here:

	https://lore.kernel.org/all/Y3teH14YduOQQkNn@hovoldconsulting.com/

> Currently, we're getting a bit lucky with
> `sc8280xp-lenovo-thinkpad-x13s.dts` that there are no other shared
> resources between the two devices besides the interrupt. Specifically
> a number of trackpads / touchscreens also have a "reset" GPIO that
> needs to be power sequenced properly in order to talk to the
> touchscreen. In this case we'll be stuck again because both instances
> would need to grab the "reset" GPIO before being able to confirm if
> the device is there.

Right, this will only work for fairly simple cases, but we do have a few
of those in tree since some years back.

> This is an old problem. The first I remember running into it was back
> in 2015 on rk3288-veryron-minnie. We had a downstream hack to make
> this work with -EPROBE_DEFER. https://crrev.com/c/266224. By the time
> we shipped, though, we decided not to do the 2nd sourcing. After that
> I always NAKed HW designs like this, but I guess that didn't help with
> Mediatek hardware I wasn't involved with. :( ...and, of course, it
> didn't help with devices that aren't Chromebooks like the Thinkpad
> X13S.
> 
> FWIW: as a short term solution, we ended up forcing synchronous probe
> in <https://crrev.com/c/4857566>. This has some pretty serious boot
> time implications, but it's also very simple.
> 
> 
> I'm actively working on coming up with a better solution here. My
> current thought is that that maybe we want to do:
> 
> 1. Undo the hack in the device tree and have each "2nd source" have
> its own pinctrl entry.
> 
> 2. In core pinctrl / device probing code detect the pinctrl conflict
> and only probe one of the devices at a time.
> 
> ...that sounds like a nice/elegant solution and I'm trying to make it
> work, though it does have some downsides. Namely:
> 
> a) It requires "dts" changes to work. Namely we've got to undo the
> hack that pushed the pinctrl up to the controller level (or, in the
> case of mt8173-elm-hana, that just totally skipped the "pinctrl" entry
> altogether). Unfortunately those same "dts" changes will actually make
> things _worse_ if you don't have the code change. :(

Right, a proper solution will likely require an updated DT.

> b) It only handles the case where the resources shared by 2nd sourcing
> are expressed by pinctrl. In a practical sense this seems to be most
> cases, but conceivably you could imagine running into this situation
> with a non-pin-related shared resource.

Indeed.

> c) To solve this in the core, we have to make sure we properly handle
> (without hanging/failing) multiple partially-conflicting devices and
> devices that might acquire resources in arbitrary orders.
> 
> Though the above solution detecting the pinctrl conflicts sounds
> appealing and I'm currently working on prototyping it, I'm still not
> 100% convinced. I'm worried about the above downsides.

Yes, I agree that we'd need to take a broader look at this and not just
focus on the immediate pinctrl issue.
 
> Personally, I feel like we could add information to the device tree
> that would help us out. The question is: is this an abuse of device
> tree for something that Linux ought to be able to figure out on its
> own, or is it OK? To make it concrete, I was thinking about something
> like this:
> 
> / {
>   tp_ex_group: trackpad-exclusion-group {
>     members = <&tp1>, <&tp2>, <&tp3>;
>   };
> };
> 
> &i2c_bus {
>   tp1: trackpad@10 {
>     ...
>     mutual-exclusion-group = <&tp_ex_group>;
>   };
>   tp2: trackpad@20 {
>     ...
>     mutual-exclusion-group = <&tp_ex_group>;
>   };
>   tp3: trackpad@30 {
>     ...
>     mutual-exclusion-group = <&tp_ex_group>;
>   };
> };
> 
> Then the device core would know not to probe devices in the same
> "mutual-exclusion-group" at the same time.
> 
> If DT folks are OK with the "mutual-exclusion-group" idea then I'll
> probably backburner my attempt to make this work on the pinctrl level
> and go with that.

I expressed something along these lines in the thread above:

	It seems we'd need some way to describe the devices as mutually
	exclusive...

but given that we had prior art for handling simple cases and due to
lack of time, I left it on the ever-growing todo list.

But regardless of what a long-term proper solution to this may look
like, we need to fix the regression in 6.6-rc1 by restoring the old
behaviour.

Johan
Doug Anderson Sept. 19, 2023, 6:15 p.m. UTC | #3
Hi,

On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <johan@kernel.org> wrote:
>
> > c) To solve this in the core, we have to make sure we properly handle
> > (without hanging/failing) multiple partially-conflicting devices and
> > devices that might acquire resources in arbitrary orders.
> >
> > Though the above solution detecting the pinctrl conflicts sounds
> > appealing and I'm currently working on prototyping it, I'm still not
> > 100% convinced. I'm worried about the above downsides.
>
> Yes, I agree that we'd need to take a broader look at this and not just
> focus on the immediate pinctrl issue.

OK. FWIW, I got blocked on trying to solve this in the core
automatically by just using the conflicting "pinctrl" entries. There
are probably some ways to get it solved, but none of them are easy.


> > Personally, I feel like we could add information to the device tree
> > that would help us out. The question is: is this an abuse of device
> > tree for something that Linux ought to be able to figure out on its
> > own, or is it OK? To make it concrete, I was thinking about something
> > like this:
> >
> > / {
> >   tp_ex_group: trackpad-exclusion-group {
> >     members = <&tp1>, <&tp2>, <&tp3>;
> >   };
> > };
> >
> > &i2c_bus {
> >   tp1: trackpad@10 {
> >     ...
> >     mutual-exclusion-group = <&tp_ex_group>;
> >   };
> >   tp2: trackpad@20 {
> >     ...
> >     mutual-exclusion-group = <&tp_ex_group>;
> >   };
> >   tp3: trackpad@30 {
> >     ...
> >     mutual-exclusion-group = <&tp_ex_group>;
> >   };
> > };
> >
> > Then the device core would know not to probe devices in the same
> > "mutual-exclusion-group" at the same time.
> >
> > If DT folks are OK with the "mutual-exclusion-group" idea then I'll
> > probably backburner my attempt to make this work on the pinctrl level
> > and go with that.
>
> I expressed something along these lines in the thread above:

I'm going to try coding up the above to see how it looks. Assuming
nothing comes up, I'll try to have something in the next few days.


>         It seems we'd need some way to describe the devices as mutually
>         exclusive...
>
> but given that we had prior art for handling simple cases and due to
> lack of time, I left it on the ever-growing todo list.
>
> But regardless of what a long-term proper solution to this may look
> like, we need to fix the regression in 6.6-rc1 by restoring the old
> behaviour.

OK, fair enough. I'll take a look at your patch, though I think the
person that really needs to approve it is Benjamin...

Style-wise, I will say that Benjamin really wanted to keep the "panel
follower" code out of the main probe routine. Some of my initial
patches adding "panel follower" looked more like the results after
your patch but Benjamin really wasn't happy until there were no
special cases for panel-followers in the main probe routine. This is
why the code is structured as it is.

Thinking that way, is there any reason you can't just move the
i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
could replace the call to enable_irq() with it and then remove the
`IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
wanted to use a 2nd source + the panel follower concept? Both devices
would probe, but only one of them would actually grab the interrupt
and only one of them would actually create real HID devices. We might
need to do some work to keep from trying again at every poweron of the
panel, but it would probably be workable? I think this would also be a
smaller change...

-Doug
Doug Anderson Sept. 20, 2023, 3:41 p.m. UTC | #4
Hi,

On Wed, Sep 20, 2023 at 12:26 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Tue, Sep 19, 2023 at 11:15:46AM -0700, Doug Anderson wrote:
> > On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <johan@kernel.org> wrote:
>
> > > But regardless of what a long-term proper solution to this may look
> > > like, we need to fix the regression in 6.6-rc1 by restoring the old
> > > behaviour.
> >
> > OK, fair enough. I'll take a look at your patch, though I think the
> > person that really needs to approve it is Benjamin...
> >
> > Style-wise, I will say that Benjamin really wanted to keep the "panel
> > follower" code out of the main probe routine. Some of my initial
> > patches adding "panel follower" looked more like the results after
> > your patch but Benjamin really wasn't happy until there were no
> > special cases for panel-followers in the main probe routine. This is
> > why the code is structured as it is.
>
> Ok, I prefer not hiding away things like that as it obscures what's
> really going on, for example, in this case, that you register a device
> without really having probed it.

I can see your reasoning and I think that intuition is why the earlier
versions of my patches had explicit "panel follower" logic in probe.
However, Benjamin really liked the logic abstracted out.


> As I alluded to in the commit message, you probably want to be able to
> support second-source touchscreen panel followers as well at some point
> and then deferring checking whether device is populated until the panel
> is powered on is not going to work.

Yeah, I've been pondering this too. I _think_ it would work OK-ish if
both devices probed and then only one of the two would actually make
the sub-HID devices. So you'd actually see both devices succeed at
probing but only one of them would actually be functional. It's a bit
ugly, though. :(  Maybe marginally better would be if we could figure
out how to have the device which fails to get its interrupt later
unbind itself, if that's possible...

The only other thought I had would be to have the parent i2c bus
understand that it had children that were panel followers, which it
should be able to do by seeing the "panel" attribute in their device
tree. Then the i2c bus could itself register as a panel follower and
could wait to probe its children until they were powered on. This
could happen in the i2c core so we didn't have to add code like this
to all i2c bus drivers. ...and, if necessary, we could add this to
other busses like SPI. It feels a little awkward but could work.


> I skimmed the thread were you added this, but I'm not sure I saw any
> reason for why powering on the panel follower temporarily during probe
> would not work?

My first instinct says we can't do this, but let's think about it...

In general the "panel follower" API is designed to give all the
decision making about when to power things on and off to the panel
driver, which is controlled by DRM.

The reason for this is from experience I had when dealing with the
Samsung ATNA33XC20 panel that's on "sc7180-trogdor-homestar". The TCON
on that panel tended to die if you didn't sequence it just right.
Specifically, if you were sending pixels to the panel and then stopped
then you absolutely needed to power the panel off and on again. Folks
I talked to even claimed that the panel was working "to spec" since,
in the "Power Sequencing" section of the eDP spec it clearly shows
that you _must_ turn the panel off and on again after you stop giving
it bits. ...this is despite the fact that no other panel I've worked
with cares. ;-)

On homestar, since we didn't have the "panel follower" API, we ended
up adding cost to the hardware and putting the panel and touchscreens
on different power rails. However, I wanted to make sure that if we
ran into a similar situation in the future (or maybe if we were trying
to make hardware work that we didn't have control over) that we could
solve it.

The other reason for giving full control to the panel driver is just
how userspace usually works. Right now userspace tends to power off
panels if they're not used (like if a lid is closed on a laptop) but
doesn't necessarily power off the touchscreen. Thus if the touchscreen
has the ability to keep things powered on then we'd never get to a low
power state.

The above all explains why panel followers like the touchscreen
shouldn't be able to keep power on. However, you are specifically
suggesting that we just turn the power on temporarily during probe. As
I think about that, it might be possible? I guess you'd have to
temporarily block DRM from changing the state of the panel while the
touchscreen is probing. Then if the panel was off then you'd turn it
on briefly, do your probe, and then turn it off again. If the panel
was on then by blocking DRM you'd ensure that it stayed on. I'm not
sure how palatable that would be or if there are any other tricky
parts I'm not thinking about.


> > Thinking that way, is there any reason you can't just move the
> > i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
> > could replace the call to enable_irq() with it and then remove the
> > `IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
> > wanted to use a 2nd source + the panel follower concept? Both devices
> > would probe, but only one of them would actually grab the interrupt
> > and only one of them would actually create real HID devices. We might
> > need to do some work to keep from trying again at every poweron of the
> > panel, but it would probably be workable? I think this would also be a
> > smaller change...
>
> That was my first idea as well, but conceptually it is more correct to
> request resources at probe time and not at some later point when you can
> no longer fail probe.
>
> You'd also need to handle the fact that the interrupt may never have
> been requested when remove() is called, which adds unnecessary
> complexity.

I don't think it's a lot of complexity, is it? Just an extra "if" statement...

-Doug
Johan Hovold Sept. 22, 2023, 9:08 a.m. UTC | #5
On Wed, Sep 20, 2023 at 08:41:12AM -0700, Doug Anderson wrote:
> On Wed, Sep 20, 2023 at 12:26 AM Johan Hovold <johan@kernel.org> wrote:
> > On Tue, Sep 19, 2023 at 11:15:46AM -0700, Doug Anderson wrote:
> > > On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <johan@kernel.org> wrote:

> > As I alluded to in the commit message, you probably want to be able to
> > support second-source touchscreen panel followers as well at some point
> > and then deferring checking whether device is populated until the panel
> > is powered on is not going to work.
> 
> Yeah, I've been pondering this too. I _think_ it would work OK-ish if
> both devices probed and then only one of the two would actually make
> the sub-HID devices. So you'd actually see both devices succeed at
> probing but only one of them would actually be functional. It's a bit
> ugly, though. :(  Maybe marginally better would be if we could figure
> out how to have the device which fails to get its interrupt later
> unbind itself, if that's possible...
> 
> The only other thought I had would be to have the parent i2c bus
> understand that it had children that were panel followers, which it
> should be able to do by seeing the "panel" attribute in their device
> tree. Then the i2c bus could itself register as a panel follower and
> could wait to probe its children until they were powered on. This
> could happen in the i2c core so we didn't have to add code like this
> to all i2c bus drivers. ...and, if necessary, we could add this to
> other busses like SPI. It feels a little awkward but could work.

There may be other device on the bus that have nothing to do with
panels, but I guess you mean that this would only apply to a subset of
the children. In any case, this feels like a hack and layering
violation.

> > I skimmed the thread were you added this, but I'm not sure I saw any
> > reason for why powering on the panel follower temporarily during probe
> > would not work?
> 
> My first instinct says we can't do this, but let's think about it...
> 
> In general the "panel follower" API is designed to give all the
> decision making about when to power things on and off to the panel
> driver, which is controlled by DRM.
> 
> The reason for this is from experience I had when dealing with the
> Samsung ATNA33XC20 panel that's on "sc7180-trogdor-homestar". The TCON
> on that panel tended to die if you didn't sequence it just right.
> Specifically, if you were sending pixels to the panel and then stopped
> then you absolutely needed to power the panel off and on again. Folks
> I talked to even claimed that the panel was working "to spec" since,
> in the "Power Sequencing" section of the eDP spec it clearly shows
> that you _must_ turn the panel off and on again after you stop giving
> it bits. ...this is despite the fact that no other panel I've worked
> with cares. ;-)
> 
> On homestar, since we didn't have the "panel follower" API, we ended
> up adding cost to the hardware and putting the panel and touchscreens
> on different power rails. However, I wanted to make sure that if we
> ran into a similar situation in the future (or maybe if we were trying
> to make hardware work that we didn't have control over) that we could
> solve it.
> 
> The other reason for giving full control to the panel driver is just
> how userspace usually works. Right now userspace tends to power off
> panels if they're not used (like if a lid is closed on a laptop) but
> doesn't necessarily power off the touchscreen. Thus if the touchscreen
> has the ability to keep things powered on then we'd never get to a low
> power state.

Don't you need to keep the touchscreen powered to support wakeup events
(e.g. when not closing the lid)?

And if you close the lid with wakeup disabled, you should still be able
to power down the touchscreen as part of suspend, right?

> The above all explains why panel followers like the touchscreen
> shouldn't be able to keep power on. However, you are specifically
> suggesting that we just turn the power on temporarily during probe. As
> I think about that, it might be possible? I guess you'd have to
> temporarily block DRM from changing the state of the panel while the
> touchscreen is probing. Then if the panel was off then you'd turn it
> on briefly, do your probe, and then turn it off again. If the panel
> was on then by blocking DRM you'd ensure that it stayed on. I'm not
> sure how palatable that would be or if there are any other tricky
> parts I'm not thinking about.

As this would allow actually probing the touchscreen during probe(), as
the driver model expects, this seems like a better approach then
deferring probe and registration if it's at all doable.

> > > Thinking that way, is there any reason you can't just move the
> > > i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
> > > could replace the call to enable_irq() with it and then remove the
> > > `IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
> > > wanted to use a 2nd source + the panel follower concept? Both devices
> > > would probe, but only one of them would actually grab the interrupt
> > > and only one of them would actually create real HID devices. We might
> > > need to do some work to keep from trying again at every poweron of the
> > > panel, but it would probably be workable? I think this would also be a
> > > smaller change...
> >
> > That was my first idea as well, but conceptually it is more correct to
> > request resources at probe time and not at some later point when you can
> > no longer fail probe.
> >
> > You'd also need to handle the fact that the interrupt may never have
> > been requested when remove() is called, which adds unnecessary
> > complexity.
> 
> I don't think it's a lot of complexity, is it? Just an extra "if" statement...

Well you'd need keep track of whether the interrupt has been requested
or not (and manage serialisation) yourself for a start.

But the main reason is still that requesting resources belongs in
probe() and should not be deferred to some later random time where you
cannot inform driver core of failures (e.g. for probe deferral if the
interrupt controller is not yet available).

Johan
Doug Anderson Sept. 22, 2023, 4:37 p.m. UTC | #6
Hi,

On Fri, Sep 22, 2023 at 2:08 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Wed, Sep 20, 2023 at 08:41:12AM -0700, Doug Anderson wrote:
> > On Wed, Sep 20, 2023 at 12:26 AM Johan Hovold <johan@kernel.org> wrote:
> > > On Tue, Sep 19, 2023 at 11:15:46AM -0700, Doug Anderson wrote:
> > > > On Tue, Sep 19, 2023 at 12:07 AM Johan Hovold <johan@kernel.org> wrote:
>
> > > As I alluded to in the commit message, you probably want to be able to
> > > support second-source touchscreen panel followers as well at some point
> > > and then deferring checking whether device is populated until the panel
> > > is powered on is not going to work.
> >
> > Yeah, I've been pondering this too. I _think_ it would work OK-ish if
> > both devices probed and then only one of the two would actually make
> > the sub-HID devices. So you'd actually see both devices succeed at
> > probing but only one of them would actually be functional. It's a bit
> > ugly, though. :(  Maybe marginally better would be if we could figure
> > out how to have the device which fails to get its interrupt later
> > unbind itself, if that's possible...
> >
> > The only other thought I had would be to have the parent i2c bus
> > understand that it had children that were panel followers, which it
> > should be able to do by seeing the "panel" attribute in their device
> > tree. Then the i2c bus could itself register as a panel follower and
> > could wait to probe its children until they were powered on. This
> > could happen in the i2c core so we didn't have to add code like this
> > to all i2c bus drivers. ...and, if necessary, we could add this to
> > other busses like SPI. It feels a little awkward but could work.
>
> There may be other device on the bus that have nothing to do with
> panels, but I guess you mean that this would only apply to a subset of
> the children. In any case, this feels like a hack and layering
> violation.

Right, the idea would be to only do this for the subset of children
that are panel followers.

It definitely doesn't seem ideal, but it also didn't seem too terrible to me.


> > > I skimmed the thread were you added this, but I'm not sure I saw any
> > > reason for why powering on the panel follower temporarily during probe
> > > would not work?
> >
> > My first instinct says we can't do this, but let's think about it...
> >
> > In general the "panel follower" API is designed to give all the
> > decision making about when to power things on and off to the panel
> > driver, which is controlled by DRM.
> >
> > The reason for this is from experience I had when dealing with the
> > Samsung ATNA33XC20 panel that's on "sc7180-trogdor-homestar". The TCON
> > on that panel tended to die if you didn't sequence it just right.
> > Specifically, if you were sending pixels to the panel and then stopped
> > then you absolutely needed to power the panel off and on again. Folks
> > I talked to even claimed that the panel was working "to spec" since,
> > in the "Power Sequencing" section of the eDP spec it clearly shows
> > that you _must_ turn the panel off and on again after you stop giving
> > it bits. ...this is despite the fact that no other panel I've worked
> > with cares. ;-)
> >
> > On homestar, since we didn't have the "panel follower" API, we ended
> > up adding cost to the hardware and putting the panel and touchscreens
> > on different power rails. However, I wanted to make sure that if we
> > ran into a similar situation in the future (or maybe if we were trying
> > to make hardware work that we didn't have control over) that we could
> > solve it.
> >
> > The other reason for giving full control to the panel driver is just
> > how userspace usually works. Right now userspace tends to power off
> > panels if they're not used (like if a lid is closed on a laptop) but
> > doesn't necessarily power off the touchscreen. Thus if the touchscreen
> > has the ability to keep things powered on then we'd never get to a low
> > power state.
>
> Don't you need to keep the touchscreen powered to support wakeup events
> (e.g. when not closing the lid)?

No. The only reason you'd use panel follower is if the hardware was
designed such that the touchscreen needed to be power sequenced with
the panel. If the touchscreen can stay powered when the panel is off
then it is, by definition, not a panel follower.

For a laptop I don't think most people expect the touchscreen to stay
powered when the screen is off. I certainly wouldn't expect it. If the
screen was off and I wanted to interact with the device, I would hit a
key on the keyboard or touch the trackpad. When the people designing
sc7180-trogdor chose to have the display and touchscreen share a power
rail they made a conscious choice that they didn't need the
touchscreen active when the screen was off.

For the other hardware I'm aware of that needs panel-follower there is
a single external chip on the board that handles driving the panel and
the touchscreen. The power sequencing requirements for this chip
simply don't allow the touchscreen to be powered on while the display
is off.

One use case where I could intuitively think I might touch a
touchscreen of a screen that was "off" would be a kiosk of some sort.
It would make sense there to have two power rails. ...or, I suppose,
userspace could just choose to turn the backlight off but keep the
screen (and touchscreen) powered.


> And if you close the lid with wakeup disabled, you should still be able
> to power down the touchscreen as part of suspend, right?
>
> > The above all explains why panel followers like the touchscreen
> > shouldn't be able to keep power on. However, you are specifically
> > suggesting that we just turn the power on temporarily during probe. As
> > I think about that, it might be possible? I guess you'd have to
> > temporarily block DRM from changing the state of the panel while the
> > touchscreen is probing. Then if the panel was off then you'd turn it
> > on briefly, do your probe, and then turn it off again. If the panel
> > was on then by blocking DRM you'd ensure that it stayed on. I'm not
> > sure how palatable that would be or if there are any other tricky
> > parts I'm not thinking about.
>
> As this would allow actually probing the touchscreen during probe(), as
> the driver model expects, this seems like a better approach then
> deferring probe and registration if it's at all doable.

Yeah, I don't 100% know if it's doable but it seems possible.
Certainly it's something for future investigation.

Luckily, at the moment anything I'm aware of that truly needs panel
follower also doesn't have multiple sources for a touchscreen.


> > > > Thinking that way, is there any reason you can't just move the
> > > > i2c_hid_init_irq() into __do_i2c_hid_core_initial_power_up()? You
> > > > could replace the call to enable_irq() with it and then remove the
> > > > `IRQF_NO_AUTOEN` flag? I think that would also solve the issue if you
> > > > wanted to use a 2nd source + the panel follower concept? Both devices
> > > > would probe, but only one of them would actually grab the interrupt
> > > > and only one of them would actually create real HID devices. We might
> > > > need to do some work to keep from trying again at every poweron of the
> > > > panel, but it would probably be workable? I think this would also be a
> > > > smaller change...
> > >
> > > That was my first idea as well, but conceptually it is more correct to
> > > request resources at probe time and not at some later point when you can
> > > no longer fail probe.
> > >
> > > You'd also need to handle the fact that the interrupt may never have
> > > been requested when remove() is called, which adds unnecessary
> > > complexity.
> >
> > I don't think it's a lot of complexity, is it? Just an extra "if" statement...
>
> Well you'd need keep track of whether the interrupt has been requested
> or not (and manage serialisation) yourself for a start.

Sure. So I guess an "if" test plus a boolean state variable. I still
don't think it's a lot of complexity.


> But the main reason is still that requesting resources belongs in
> probe() and should not be deferred to some later random time where you
> cannot inform driver core of failures (e.g. for probe deferral if the
> interrupt controller is not yet available).

OK, I guess the -EPROBE_DEFER is technically possible though probably
not likely in practice. ...so that's a good reason to make sure we
request the IRQ in probe even in the "panel follower" case. I still
beleive Benjamin would prefer that this was abstracted out and not in
the actual probe() routine, but I guess we can wait to hear from him.

One last idea I had while digging would be to wonder if we could
somehow solve this case with "IRQF_PROBE_SHARED". I guess that doesn't
work well together with "IRQF_NO_AUTOEN", but conceivably we could
have the interrupt handler return "IRQ_NONE" if the initial power up
never happened? I haven't spent much time poking with shared
interrupts though, so I don't know if there are other side effects...


-Doug
diff mbox series

Patch

diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c
index 9601c0605fd9..e66c058a4b00 100644
--- a/drivers/hid/i2c-hid/i2c-hid-core.c
+++ b/drivers/hid/i2c-hid/i2c-hid-core.c
@@ -998,45 +998,29 @@  static int i2c_hid_core_resume(struct i2c_hid *ihid)
 	return hid_driver_reset_resume(hid);
 }
 
-/**
- * __do_i2c_hid_core_initial_power_up() - First time power up of the i2c-hid device.
- * @ihid: The ihid object created during probe.
- *
- * This function is called at probe time.
- *
- * The initial power on is where we do some basic validation that the device
- * exists, where we fetch the HID descriptor, and where we create the actual
- * HID devices.
- *
- * Return: 0 or error code.
+/*
+ * Check that the device exists and parse the HID descriptor.
  */
-static int __do_i2c_hid_core_initial_power_up(struct i2c_hid *ihid)
+static int __i2c_hid_core_probe(struct i2c_hid *ihid)
 {
 	struct i2c_client *client = ihid->client;
 	struct hid_device *hid = ihid->hid;
 	int ret;
 
-	ret = i2c_hid_core_power_up(ihid);
-	if (ret)
-		return ret;
-
 	/* Make sure there is something at this address */
 	ret = i2c_smbus_read_byte(client);
 	if (ret < 0) {
 		i2c_hid_dbg(ihid, "nothing at this address: %d\n", ret);
-		ret = -ENXIO;
-		goto err;
+		return -ENXIO;
 	}
 
 	ret = i2c_hid_fetch_hid_descriptor(ihid);
 	if (ret < 0) {
 		dev_err(&client->dev,
 			"Failed to fetch the HID Descriptor\n");
-		goto err;
+		return ret;
 	}
 
-	enable_irq(client->irq);
-
 	hid->version = le16_to_cpu(ihid->hdesc.bcdVersion);
 	hid->vendor = le16_to_cpu(ihid->hdesc.wVendorID);
 	hid->product = le16_to_cpu(ihid->hdesc.wProductID);
@@ -1050,17 +1034,49 @@  static int __do_i2c_hid_core_initial_power_up(struct i2c_hid *ihid)
 
 	ihid->quirks = i2c_hid_lookup_quirk(hid->vendor, hid->product);
 
+	return 0;
+}
+
+static int i2c_hid_core_register_hid(struct i2c_hid *ihid)
+{
+	struct i2c_client *client = ihid->client;
+	struct hid_device *hid = ihid->hid;
+	int ret;
+
+	enable_irq(client->irq);
+
 	ret = hid_add_device(hid);
 	if (ret) {
 		if (ret != -ENODEV)
 			hid_err(client, "can't add hid device: %d\n", ret);
-		goto err;
+		disable_irq(client->irq);
+		return ret;
 	}
 
 	return 0;
+}
+
+static int i2c_hid_core_probe_panel_follower(struct i2c_hid *ihid)
+{
+	int ret;
 
-err:
+	ret = i2c_hid_core_power_up(ihid);
+	if (ret)
+		return ret;
+
+	ret = __i2c_hid_core_probe(ihid);
+	if (ret)
+		goto err_power_down;
+
+	ret = i2c_hid_core_register_hid(ihid);
+	if (ret)
+		goto err_power_down;
+
+	return 0;
+
+err_power_down:
 	i2c_hid_core_power_down(ihid);
+
 	return ret;
 }
 
@@ -1077,7 +1093,7 @@  static void ihid_core_panel_prepare_work(struct work_struct *work)
 	 * steps.
 	 */
 	if (!hid->version)
-		ret = __do_i2c_hid_core_initial_power_up(ihid);
+		ret = i2c_hid_core_probe_panel_follower(ihid);
 	else
 		ret = i2c_hid_core_resume(ihid);
 
@@ -1156,30 +1172,6 @@  static int i2c_hid_core_register_panel_follower(struct i2c_hid *ihid)
 	return 0;
 }
 
-static int i2c_hid_core_initial_power_up(struct i2c_hid *ihid)
-{
-	/*
-	 * If we're a panel follower, we'll register and do our initial power
-	 * up when the panel turns on; otherwise we do it right away.
-	 */
-	if (drm_is_panel_follower(&ihid->client->dev))
-		return i2c_hid_core_register_panel_follower(ihid);
-	else
-		return __do_i2c_hid_core_initial_power_up(ihid);
-}
-
-static void i2c_hid_core_final_power_down(struct i2c_hid *ihid)
-{
-	/*
-	 * If we're a follower, the act of unfollowing will cause us to be
-	 * powered down. Otherwise we need to manually do it.
-	 */
-	if (ihid->is_panel_follower)
-		drm_panel_remove_follower(&ihid->panel_follower);
-	else
-		i2c_hid_core_suspend(ihid, true);
-}
-
 int i2c_hid_core_probe(struct i2c_client *client, struct i2chid_ops *ops,
 		       u16 hid_descriptor_address, u32 quirks)
 {
@@ -1224,14 +1216,10 @@  int i2c_hid_core_probe(struct i2c_client *client, struct i2chid_ops *ops,
 		return ret;
 	device_enable_async_suspend(&client->dev);
 
-	ret = i2c_hid_init_irq(client);
-	if (ret < 0)
-		goto err_buffers_allocated;
-
 	hid = hid_allocate_device();
 	if (IS_ERR(hid)) {
 		ret = PTR_ERR(hid);
-		goto err_irq;
+		goto err_free_buffers;
 	}
 
 	ihid->hid = hid;
@@ -1242,19 +1230,42 @@  int i2c_hid_core_probe(struct i2c_client *client, struct i2chid_ops *ops,
 	hid->bus = BUS_I2C;
 	hid->initial_quirks = quirks;
 
-	ret = i2c_hid_core_initial_power_up(ihid);
+	/* Power on and probe unless device is a panel follower. */
+	if (!drm_is_panel_follower(&ihid->client->dev)) {
+		ret = i2c_hid_core_power_up(ihid);
+		if (ret < 0)
+			goto err_destroy_device;
+
+		ret = __i2c_hid_core_probe(ihid);
+		if (ret < 0)
+			goto err_power_down;
+	}
+
+	ret = i2c_hid_init_irq(client);
+	if (ret < 0)
+		goto err_power_down;
+
+	/*
+	 * If we're a panel follower, we'll register when the panel turns on;
+	 * otherwise we do it right away.
+	 */
+	if (drm_is_panel_follower(&ihid->client->dev))
+		ret = i2c_hid_core_register_panel_follower(ihid);
+	else
+		ret = i2c_hid_core_register_hid(ihid);
 	if (ret)
-		goto err_mem_free;
+		goto err_free_irq;
 
 	return 0;
 
-err_mem_free:
-	hid_destroy_device(hid);
-
-err_irq:
+err_free_irq:
 	free_irq(client->irq, ihid);
-
-err_buffers_allocated:
+err_power_down:
+	if (!drm_is_panel_follower(&ihid->client->dev))
+		i2c_hid_core_power_down(ihid);
+err_destroy_device:
+	hid_destroy_device(hid);
+err_free_buffers:
 	i2c_hid_free_buffers(ihid);
 
 	return ret;
@@ -1266,7 +1277,14 @@  void i2c_hid_core_remove(struct i2c_client *client)
 	struct i2c_hid *ihid = i2c_get_clientdata(client);
 	struct hid_device *hid;
 
-	i2c_hid_core_final_power_down(ihid);
+	/*
+	 * If we're a follower, the act of unfollowing will cause us to be
+	 * powered down. Otherwise we need to manually do it.
+	 */
+	if (ihid->is_panel_follower)
+		drm_panel_remove_follower(&ihid->panel_follower);
+	else
+		i2c_hid_core_suspend(ihid, true);
 
 	hid = ihid->hid;
 	hid_destroy_device(hid);