diff mbox series

[RFC] of: device: Support 2nd sources of probeable but undiscoverable devices

Message ID 20230921102420.RFC.1.I9dddd99ccdca175e3ceb1b9fa1827df0928c5101@changeid
State New
Headers show
Series [RFC] of: device: Support 2nd sources of probeable but undiscoverable devices | expand

Commit Message

Doug Anderson Sept. 21, 2023, 5:24 p.m. UTC
Support for multiple "equivalent" sources for components (also known
as second sourcing components) is a standard practice that helps keep
cost down and also makes sure that if one component is unavailable due
to a shortage that we don't need to stop production for the whole
product.

Some components are very easy to second source. eMMC, for instance, is
fully discoverable and probable so you can stuff a wide variety of
similar eMMC chips on your board and things will work without a hitch.

Some components are more difficult to second source, specifically
because it's difficult for software to probe what component is present
on any given board. In cases like this software is provided
supplementary information to help it, like a GPIO strap or a SKU ID
programmed into an EEPROM. This helpful information can allow the
bootloader to select a different device tree. The various different
"SKUs" of different Chromebooks are examples of this.

Some components are somewhere in between. These in-between components
are the subject of this patch. Specifically, these components are
easily "probeable" but not easily "discoverable".

A good example of a probeable but undiscoverable device is an
i2c-connected touchscreen or trackpad. Two separate components may be
electrically compatible with each other and may have compatible power
sequencing requirements but may require different software. If
software is told about the different possible components (because it
can't discover them), it can safely probe them to figure out which
ones are present.

On systems using device tree, if we want to tell the OS about all of
the different components we need to list them all in the device
tree. This leads to a problem. The multiple sources for components
likely use the same resources (GPIOs, interrupts, regulators). If the
OS tries to probe all of these components at the same time then it
will detect a resource conflict and that's a fatal error.

The fact that Linux can't handle these probeable but undiscoverable
devices well has had a few consequences:
1. In some cases, we've abandoned the idea of second sourcing
   components for a given board, which increases cost / generates
   manufacturing headaches.
2. In some cases, we've been forced to add some sort of strapping /
   EEPROM to indicate which component is present. This adds difficulty
   to manufacturing / refurb processes.
3. In some cases, we've managed to make things work by the skin of our
   teeth through slightly hacky solutions. Specifically, if we remove
   the "pinctrl" entry from the various options then it won't
   conflict. Regulators inherently can have more than one consumer, so
   as long as there are no GPIOs involved in power sequencing and
   probing devices then things can work. This is how
   "sc8280xp-lenovo-thinkpad-x13s" works and also how
   "mt8173-elm-hana" works.

Let's attempt to do something better. Specifically, we'll allow
tagging nodes in the device tree as mutually exclusive from one
another. This says that only one of the components in this group is
present on any given board. To make it concrete, in my proposal this
looks like:

  / {
    tp_ex_group: trackpad-exclusion-group {
    };
  };

  &i2c_bus {
    tp1: trackpad@10 {
      ...
      mutual-exclusion-group = <&tp_ex_group>;
    };
    tp2: trackpad@20 {
      ...
      mutual-exclusion-group = <&tp_ex_group>;
    };
    tp3: trackpad@30 {
      ...
      mutual-exclusion-group = <&tp_ex_group>;
    };
  };

In Linux, we can make things work by simply only probing one of the
devices in the group at a time. We can make a mutex per group and
enforce locking that mutex around probe. If the first device that gets
the mutex fails to probe then it won't try again. If it succeeds then
it will acquire the shared resources and future devices (which we know
can't be present) will fail to get the shared resources. Future
patches could quiet down errors about failing to acquire shared
resources or failing to probe if a device is in a
mutual-exclusion-group.

A traditional response to a proposal to express this type of
information in the device tree is that it's a "hack" to work around
Linux's quirks and is not a proper hardware description.

One often proposed solution instead of this "hack" is that firmware
should be probing the hardware and it should ensure that the device
tree only expresses the hardware that's present. This has a few
serious downsides:
1. It slows down boot. Powering up a component to probe it could take
   hundreds of milliseconds and, in the bootloader, it can't be
   parallelized with anything else.
2. It adds complexity to firmware. By its nature, firmware is harder
   to update regularly and impossible to keep "lockstep" with the
   kernel. This leads to the general principle that if we can keep
   code out of firmware then we should.
3. Not all firmware can be updated. If a device originally shipped as
   a Windows laptop or an Android phone, the bootloader might not be
   open source and easy to update.

Another proposed solution instead of this "hack" is that Linux should
automagically handle this. The idea here is that during probe a device
should get its resources provisionally and not commit to them until
the probe is a success. While possible, this is difficult to implement
in a generic way across all possible resources.

Instead of thinking of this as a "hack", it doesn't seem too
unreasonable to think of this as a hardware description even if it's
an inexact one. We are describing that the hardware has one of N
different variants and we describe the non-discoverable properties of
those components.

For some prior discussions:
- We discussed a bit of this recently in a patch that Johan posted to
  make simple i2c-hid devices (those that don't need reset GPIOs) work
  again [1].
- Johan pointed to a previous discussion with Rob [2].
- Dmitry did some previous prototyping of trying to handle this
  automagically for GPIOs [3].

[1] https://lore.kernel.org/r/20230918125851.310-1-johan+linaro@kernel.org
[2] https://lore.kernel.org/r/Y3teH14YduOQQkNn@hovoldconsulting.com/
[3] https://crrev.com/c/461349

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I definitely understand that, if we decide to go this way, somewhere
in DT documentation we need to document it. However, I wasn't sure
where that should happen. I'd love advice!

 drivers/base/core.c       |  1 +
 drivers/base/dd.c         |  7 +++++
 drivers/of/device.c       | 54 +++++++++++++++++++++++++++++++++++++++
 include/linux/device.h    |  5 ++++
 include/linux/of_device.h |  6 +++++
 5 files changed, 73 insertions(+)

Comments

Doug Anderson Sept. 22, 2023, 5:40 p.m. UTC | #1
Hi,

On Fri, Sep 22, 2023 at 7:14 AM Rob Herring <robh+dt@kernel.org> wrote:
>
> > Let's attempt to do something better. Specifically, we'll allow
> > tagging nodes in the device tree as mutually exclusive from one
> > another. This says that only one of the components in this group is
> > present on any given board. To make it concrete, in my proposal this
> > looks like:
> >
> >   / {
> >     tp_ex_group: trackpad-exclusion-group {
> >     };
>
> Interesting way to just get a unique identifier. But it could be any
> phandle not used by another group. So just point all the devices in a
> group to one of the devices in the group.

Fair enough.


> >   &i2c_bus {
> >     tp1: trackpad@10 {
> >       ...
> >       mutual-exclusion-group = <&tp_ex_group>;
> >     };
> >     tp2: trackpad@20 {
> >       ...
> >       mutual-exclusion-group = <&tp_ex_group>;
> >     };
> >     tp3: trackpad@30 {
> >       ...
> >       mutual-exclusion-group = <&tp_ex_group>;
> >     };
> >   };
> >
> > In Linux, we can make things work by simply only probing one of the
> > devices in the group at a time. We can make a mutex per group and
> > enforce locking that mutex around probe. If the first device that gets
> > the mutex fails to probe then it won't try again. If it succeeds then
> > it will acquire the shared resources and future devices (which we know
> > can't be present) will fail to get the shared resources. Future
> > patches could quiet down errors about failing to acquire shared
> > resources or failing to probe if a device is in a
> > mutual-exclusion-group.
>
> This seems like overkill to me. Do we really need groups and a mutex
> for each group? Worst case is what? 2-3 groups of 2-3 devices?
> Instead, what about extending "status" with another value
> ("fail-needs-probe"? (fail-xxx is a documented value)). Currently, the
> kernel would just ignore nodes with that status. Then we can process
> those nodes separately 1-by-1.

My worry here is that this has the potential to impact boot speed in a
non-trivial way. While trackpads and touchscreens _are_ probable,
their probe routines are often quite slow. This is even mentioned in
Dmitry's initial patches adding async probe to the kernel. See commit
765230b5f084 ("driver-core: add asynchronous probing support for
drivers") where he specifically brings up input devices as examples.

It wouldn't be absurd to have a system that has multiple sources for
both the trackpad and the touchscreen. If we have to probe each of
these one at a time then it could be slow. It would be quicker to be
able to probe the trackpads (one at a time) at the same time we're
probing the touchscreens (one at a time). Using the "fail-needs-probe"
doesn't provide information needed to know which devices conflict with
each other. IMO this is still better than nothing, but it worries me
to pick the less-expressive solution for the dts which means that the
information simply isn't there and the OS can't be made better later.

Thinking about this more, I guess even my proposed solution isn't
ideal for probe speed. Let's imagine that we had:

  &i2c_bus {
    tp1: trackpad@10 {
      compatible = "hid-over-i2c";
      reg = <0x10>;
      post-power-on-delay-ms = <200>;
      ...
      mutual-exclusion-group = <&tp1>;
    };
    tp2: trackpad@20 {
      compatible = "hid-over-i2c";
      reg = <0x20>;
      post-power-on-delay-ms = <200>;
      ...
      mutual-exclusion-group = <&tp1>;
    };
  };

With my solution, we'd power the first device up, wait 200 ms, then
check to see if anything acks an i2c xfer at address 0x10. If it
didn't, we'd power down. Then we'd power up the second device
(presumably the same power rail), wait 200 ms, and check to see if
anything acks an i2c xfer at 0x20. It would have been better to just
power up once, wait 200 ms, then check for a device at either 0x10 or
0x20.

I guess with more complex touchscreens this could be more important. I
don't know if we need to try to solve it at this point, but I guess I
could imagine a case where we truly need to take into account all
possible devices (maybe taking the maximum of delays?) to ensure we
don't violate power sequencing requirements for any of them while
probing.

That would lead me to suggest this:

  &i2c_bus {
    trackpad-prober {
      compatible = "mt8173-elm-hana-trackpad-prober";

      tp1: trackpad@10 {
        compatible = "hid-over-i2c";
        reg = <0x10>;
        ...
        post-power-on-delay-ms = <200>;
      };
      tp2: trackpad@20 {
        compatible = "hid-over-i2c";
        reg = <0x20>;
        ...
        post-power-on-delay-ms = <200>;
      };
    };
  };

...but I suspect that would be insta-NAKed because it's creating a
completely virtual device ("mt8173-elm-hana-trackpad-prober") in the
device tree. I don't know if there's something that's functionally
similar that would be OK?

-Doug
Mark Brown Sept. 22, 2023, 7:28 p.m. UTC | #2
On Fri, Sep 22, 2023 at 02:08:08PM -0500, Rob Herring wrote:

> You could always make the driver probe smarter where if your supply
> was already powered on, then don't delay. Then something else could
> ensure that the supply is enabled. I'm not sure if regulators have the
> same issue as clocks where the clock might be on from the bootloader,
> then a failed probe which gets then puts the clock turns it off.

That'll happen.
Doug Anderson Sept. 27, 2023, 11:50 p.m. UTC | #3
Hi,

On Tue, Sep 26, 2023 at 7:37 PM Jeff LaBundy <jeff@labundy.com> wrote:
>
> Hi Doug,
>
> On Fri, Sep 22, 2023 at 05:11:10PM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Sep 22, 2023 at 12:08 PM Rob Herring <robh+dt@kernel.org> wrote:
> > >
> > > > > This seems like overkill to me. Do we really need groups and a mutex
> > > > > for each group? Worst case is what? 2-3 groups of 2-3 devices?
> > > > > Instead, what about extending "status" with another value
> > > > > ("fail-needs-probe"? (fail-xxx is a documented value)). Currently, the
> > > > > kernel would just ignore nodes with that status. Then we can process
> > > > > those nodes separately 1-by-1.
> > > >
> > > > My worry here is that this has the potential to impact boot speed in a
> > > > non-trivial way. While trackpads and touchscreens _are_ probable,
> > > > their probe routines are often quite slow. This is even mentioned in
> > > > Dmitry's initial patches adding async probe to the kernel. See commit
> > > > 765230b5f084 ("driver-core: add asynchronous probing support for
> > > > drivers") where he specifically brings up input devices as examples.
>
> Ideally, all but one driver in a group should bail out of probe quickly if
> the device is not populated. If not, I would consider that to be a bug or at
> least room for improvement in that driver.
>
> The reason input devices can take a while to probe is because they may be
> loading FW over I2C or performing some sort of calibration procedure; only
> one driver in the group should get that far.

Hmm, that's not my experience. Specifically I've seen i2c-hid devices
whose datasheets say that you're not allowed to talk i2c to them at
all for hundreds of milliseconds after you power them on. See, for
instance, "i2c-hid-of-goodix.c" which has a "post_gpio_reset_delay_ms"
of 180 ms and "i2c-hid-of-elan.c" which has one of 300 ms.

As I understand it these touchscreens have firmware on them and that
firmware can take a while to boot. Until the firmware boots they won't
respond over i2c. This is simply not something that Linux can do
anything about.

About the best you could do would be to add a board-specific driver
that understood that it could power up the rails, wait the maximum
amount of time that all possible touchscreens might need, and then
look for i2c ACKs. I'm still hoping to hear from Rob about how I would
get a board-specific driver to load on a DT system so I can
investigate / prototype this.


> > > We could add information on the class of device. touchscreen and
> > > touchpad aliases or something.
> >
> > Ah, I see. So something like "fail-needs-probe-<class>". The
> > touchscreens could have "fail-needs-probe-touchscreen" and the
> > trackpads could have "fail-needs-probe-trackpad" ? That could work. In
> > theory that could fall back to the same solution of grabbing a mutex
> > based on the group ID...
> >
> > Also: if having the mutex in the "struct device" is seen as a bad
> > idea, it would also be easy to remove. __driver_probe_device() could
> > just make a call like "of_device_probe_start()" at the beginning that
> > locks the mutex and then "of_device_probe_end()" that unlocks it. Both
> > of those calls could easily lookup the mutex in a list, which would
> > get rid of the need to store it in the "struct device".
> >
> >
> > > > That would lead me to suggest this:
> > > >
> > > >   &i2c_bus {
> > > >     trackpad-prober {
> > > >       compatible = "mt8173-elm-hana-trackpad-prober";
> > > >
> > > >       tp1: trackpad@10 {
> > > >         compatible = "hid-over-i2c";
> > > >         reg = <0x10>;
> > > >         ...
> > > >         post-power-on-delay-ms = <200>;
> > > >       };
> > > >       tp2: trackpad@20 {
> > > >         compatible = "hid-over-i2c";
> > > >         reg = <0x20>;
> > > >         ...
> > > >         post-power-on-delay-ms = <200>;
> > > >       };
> > > >     };
> > > >   };
> > > >
> > > > ...but I suspect that would be insta-NAKed because it's creating a
> > > > completely virtual device ("mt8173-elm-hana-trackpad-prober") in the
> > > > device tree. I don't know if there's something that's functionally
> > > > similar that would be OK?
>
> This solution seems a bit confusing to me, and would require more edits
> to the dts each time a second source is added. It also means one would
> have to write a small platform driver for each group of devices, correct?

No matter what we need to add something to the dts each time a second
source is added, right?

While it's true that we'd end up with some extra drivers, if we do it
correctly we don't necessarily need a driver for each group of devices
nor even a driver per board. If several boards have very similar
probing requirements then, even if they have unique "compatible"
strings they could still end up using the same Linux driver.

I've actually been talking offline with folks on ChromeOS more about
this problem as well. Chen-Yu actually pointed at a patch series (that
never landed, I guess) that has some similar ideas [1]. I guess in
that case Hans was actually constructing device tree properties
manually in the driver. I was thinking more of having all of the
options listed in the device tree and then doing something that only
causes some of them to probe.

If Rob was OK with it, I guess I could have some sort of top-level
"hwmanager" node like Hans did and then have phandle links to all the
hardware that are managed by it. Then I could just change those to
"okay"?

Ideally, though, this could somehow use device tree "overlays" I
guess. That seems like almost a perfect fit. I guess the issue here,
though, is that I'd want the overlays bundled together with the
original DT and then the board-specific "hardware prober" driver to
actually apply the overlays after probing. Does that seem sensible?


[1] https://lore.kernel.org/linux-arm-kernel/20160901190820.21987-1-hdegoede@redhat.com/
Rob Herring Sept. 28, 2023, 8:11 p.m. UTC | #4
On Fri, Sep 22, 2023 at 7:11 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Fri, Sep 22, 2023 at 12:08 PM Rob Herring <robh+dt@kernel.org> wrote:
> >
> > > > This seems like overkill to me. Do we really need groups and a mutex
> > > > for each group? Worst case is what? 2-3 groups of 2-3 devices?
> > > > Instead, what about extending "status" with another value
> > > > ("fail-needs-probe"? (fail-xxx is a documented value)). Currently, the
> > > > kernel would just ignore nodes with that status. Then we can process
> > > > those nodes separately 1-by-1.
> > >
> > > My worry here is that this has the potential to impact boot speed in a
> > > non-trivial way. While trackpads and touchscreens _are_ probable,
> > > their probe routines are often quite slow. This is even mentioned in
> > > Dmitry's initial patches adding async probe to the kernel. See commit
> > > 765230b5f084 ("driver-core: add asynchronous probing support for
> > > drivers") where he specifically brings up input devices as examples.
> >
> > Perhaps then this should be solved in userspace where it can learn
> > which device is actually present and save that information for
> > subsequent boots.
>
> Yeah, the thought occurred to me as well. I think there are a few
> problems, though:
>
> a) Userspace can't itself probe these devices effectively. While
> userspace could turn on GPIOs manually and query the i2c bus manually,
> it can't (I believe) turn on regulators nor can it turn on clocks, if
> they are needed. About the best userspace could do would be to blindly
> try binding an existing kernel driver, and in that case why did we
> need userspace involved anyway?
>
> b) While deferring to userspace can work for solutions like ChromeOS
> or Android where it's easy to ensure the userspace bits are there,
> it's less appealing as a general solution. I think in Johan's case
> he's taking a laptop that initially ran Windows and then is trying to
> run a generic Linux distro on it. For anyone in a similar situation,
> they'd either need to pick a Linux distro that has the magic userspace
> bits that are needed or they need to know that, on their laptop, they
> need to manually install some software. While requiring special
> userspace might make sense if you've got a special peripheral, like an
> LTE modem, it makes less sense to need special userspace just to get
> the right devices bound...

I did not mean do it all in userspace, but for userspace to save off
what devices are actually present. For example, if userspace has
access to the dtb, it could just update the dtb to enable the right
nodes. Then after the first boot, boot time is faster. Or a driver
could try to load an overlay with the config enabling the right
devices. Though maybe waiting til userspace is available wouldn't
speed things up.

> > > It wouldn't be absurd to have a system that has multiple sources for
> > > both the trackpad and the touchscreen. If we have to probe each of
> > > these one at a time then it could be slow. It would be quicker to be
> > > able to probe the trackpads (one at a time) at the same time we're
> > > probing the touchscreens (one at a time). Using the "fail-needs-probe"
> > > doesn't provide information needed to know which devices conflict with
> > > each other.
> >
> > I would guess most of the time that's pretty evident. They are going
> > to be on the same bus/link. If unrelated devices are on the same bus,
> > then that's going to get serialized anyways (if bus accesses are what
> > make things slow).
> >
> > We could add information on the class of device. touchscreen and
> > touchpad aliases or something.
>
> Ah, I see. So something like "fail-needs-probe-<class>". The
> touchscreens could have "fail-needs-probe-touchscreen" and the
> trackpads could have "fail-needs-probe-trackpad" ? That could work. In
> theory that could fall back to the same solution of grabbing a mutex
> based on the group ID...

I would not combine the 2 things. Knowing the class/type of the device
may be useful independent of your problem.

> Also: if having the mutex in the "struct device" is seen as a bad
> idea, it would also be easy to remove. __driver_probe_device() could
> just make a call like "of_device_probe_start()" at the beginning that
> locks the mutex and then "of_device_probe_end()" that unlocks it. Both
> of those calls could easily lookup the mutex in a list, which would
> get rid of the need to store it in the "struct device".

That could be useful for other things too. Like moving some of the hw
init we do outside of probe (though that's mostly abstracted to be not
DT specific, so maybe not).

> > > That would lead me to suggest this:
> > >
> > >   &i2c_bus {
> > >     trackpad-prober {
> > >       compatible = "mt8173-elm-hana-trackpad-prober";
> > >
> > >       tp1: trackpad@10 {
> > >         compatible = "hid-over-i2c";
> > >         reg = <0x10>;
> > >         ...
> > >         post-power-on-delay-ms = <200>;
> > >       };
> > >       tp2: trackpad@20 {
> > >         compatible = "hid-over-i2c";
> > >         reg = <0x20>;
> > >         ...
> > >         post-power-on-delay-ms = <200>;
> > >       };
> > >     };
> > >   };
> > >
> > > ...but I suspect that would be insta-NAKed because it's creating a
> > > completely virtual device ("mt8173-elm-hana-trackpad-prober") in the
> > > device tree. I don't know if there's something that's functionally
> > > similar that would be OK?
> >
> > Why do you need the intermediate node other than a convenient way to
> > instantiate a driver? You just need a flag in each node which needs
> > this special handling. Again, "status" could work well here since it
> > keeps the normal probe from happening. But I'm not saying you can't
> > have some board specific code. Sometimes you just need code to deal
> > with this stuff. Don't try to parameterize everything to DT
> > properties.
>
> I think I'd have an easier time understanding if I knew where you
> envisioned the board-specific code living. Do you have an example of
> board specific code running at boot time in the kernel on DT systems?

drivers/soc/ or drivers/platform/ are the dumping grounds. Don't we
already have CrOS stuff there?

> > Note that the above only works with "generic" compatibles with
> > "generic" power sequencing properties (I won't repeat my dislike
> > again).
>
> I don't think so? I was imagining that we'd have some board specific
> code that ran that knew all the possible combinations of devices,
> could probe them, and then could instantiate the correct driver.

Okay, just making sure you weren't trying to parameterize everything
when generally, how to power sequence is implicit.

>
> Imagine that instead of the hated "hid-over-i2c" compatible we were
> using two other devices. Imagine that a given board could have a
> "elan,ekth6915" and a "goodix,gt7375p". Both of these devices have
> specific timing requirements on how to sequence their supplies and
> reset GPIOs. For Elan we power on the supplies, wait at least 1 ms,
> deassert reset, wait at least 300 ms, and then can talk i2c. For
> Goodix we power on the supply, wait at least 10 ms, deassert reset,
> wait at least 180 ms, and then can talk i2c. If we had a
> board-specific probing driver then it would power on the supplies,
> wait at least 10 ms (the max of the two), deassert reset, wait at
> least 300 ms (the max of the two), and then see which device talked.
> Then it would instantiate whichever of the two drivers. This could be
> done for any two devices that EEs have determined have "compatible"
> probing sequences.

My point was that in the above example, all these delay times would
have to be defined in the kernel, not DT.

> Ideally in the above situation we'd be able to avoid turning the
> device off and on again between the board-specific probe code and the
> normal driver. That optimization might need special code per-driver
> but it feels doable by passing some sort of hint to the child driver
> when it's instantiated.

I think fixing regulators getting turned off on failed probes and
having a "regulator on time" would go a long way towards providing
that hint even if the on time was just since clocksource started.


> > If only the driver knows how to handle the device, then you
> > still just have to have the driver probe. If you *only* wanted to
> > solve the above case, I'd just make "hid-over-i2c" take a 2nd (and
> > 3rd) I2C address in reg and have those as fallbacks.
>
> Yeah, it did occur to me that having "hid-over-i2c" take more than one
> register (and I guess more than one "hid-descr-addr") would work in my
> earlier example and this might actually be a good solution for Johan.
> I'm hoping for a better generic solution, though.
>
>
> > You could always make the driver probe smarter where if your supply
> > was already powered on, then don't delay. Then something else could
> > ensure that the supply is enabled. I'm not sure if regulators have the
> > same issue as clocks where the clock might be on from the bootloader,
> > then a failed probe which gets then puts the clock turns it off.
>
> I'm not sure it's that simple. Even if the supply didn't turn off by
> itself in some cases, we wouldn't know how long the supply was on.

We know the time since the clocksource was initialized. I'd imagine
for most cases, more than enough time would elapse by the time you get
to these drivers.

Rob
Rob Herring Sept. 28, 2023, 8:37 p.m. UTC | #5
On Wed, Sep 27, 2023 at 6:50 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Tue, Sep 26, 2023 at 7:37 PM Jeff LaBundy <jeff@labundy.com> wrote:
> >
> > Hi Doug,
> >
> > On Fri, Sep 22, 2023 at 05:11:10PM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Fri, Sep 22, 2023 at 12:08 PM Rob Herring <robh+dt@kernel.org> wrote:
> > > >
> > > > > > This seems like overkill to me. Do we really need groups and a mutex
> > > > > > for each group? Worst case is what? 2-3 groups of 2-3 devices?
> > > > > > Instead, what about extending "status" with another value
> > > > > > ("fail-needs-probe"? (fail-xxx is a documented value)). Currently, the
> > > > > > kernel would just ignore nodes with that status. Then we can process
> > > > > > those nodes separately 1-by-1.
> > > > >
> > > > > My worry here is that this has the potential to impact boot speed in a
> > > > > non-trivial way. While trackpads and touchscreens _are_ probable,
> > > > > their probe routines are often quite slow. This is even mentioned in
> > > > > Dmitry's initial patches adding async probe to the kernel. See commit
> > > > > 765230b5f084 ("driver-core: add asynchronous probing support for
> > > > > drivers") where he specifically brings up input devices as examples.
> >
> > Ideally, all but one driver in a group should bail out of probe quickly if
> > the device is not populated. If not, I would consider that to be a bug or at
> > least room for improvement in that driver.
> >
> > The reason input devices can take a while to probe is because they may be
> > loading FW over I2C or performing some sort of calibration procedure; only
> > one driver in the group should get that far.
>
> Hmm, that's not my experience. Specifically I've seen i2c-hid devices
> whose datasheets say that you're not allowed to talk i2c to them at
> all for hundreds of milliseconds after you power them on. See, for
> instance, "i2c-hid-of-goodix.c" which has a "post_gpio_reset_delay_ms"
> of 180 ms and "i2c-hid-of-elan.c" which has one of 300 ms.
>
> As I understand it these touchscreens have firmware on them and that
> firmware can take a while to boot. Until the firmware boots they won't
> respond over i2c. This is simply not something that Linux can do
> anything about.
>
> About the best you could do would be to add a board-specific driver
> that understood that it could power up the rails, wait the maximum
> amount of time that all possible touchscreens might need, and then
> look for i2c ACKs. I'm still hoping to hear from Rob about how I would
> get a board-specific driver to load on a DT system so I can
> investigate / prototype this.

foo_initcall()
{
  if (of_machine_is_compatible(...))
    platform_device_create();
}

That chunk would have to be built in and if that's part of the driver
module, autoloading wouldn't work.

We could have a match table of board compatibles and driver names. I'm
not worried about that list being big, so I'm happy to stick that in
drivers/of/.

> > > > We could add information on the class of device. touchscreen and
> > > > touchpad aliases or something.
> > >
> > > Ah, I see. So something like "fail-needs-probe-<class>". The
> > > touchscreens could have "fail-needs-probe-touchscreen" and the
> > > trackpads could have "fail-needs-probe-trackpad" ? That could work. In
> > > theory that could fall back to the same solution of grabbing a mutex
> > > based on the group ID...
> > >
> > > Also: if having the mutex in the "struct device" is seen as a bad
> > > idea, it would also be easy to remove. __driver_probe_device() could
> > > just make a call like "of_device_probe_start()" at the beginning that
> > > locks the mutex and then "of_device_probe_end()" that unlocks it. Both
> > > of those calls could easily lookup the mutex in a list, which would
> > > get rid of the need to store it in the "struct device".
> > >
> > >
> > > > > That would lead me to suggest this:
> > > > >
> > > > >   &i2c_bus {
> > > > >     trackpad-prober {
> > > > >       compatible = "mt8173-elm-hana-trackpad-prober";
> > > > >
> > > > >       tp1: trackpad@10 {
> > > > >         compatible = "hid-over-i2c";
> > > > >         reg = <0x10>;
> > > > >         ...
> > > > >         post-power-on-delay-ms = <200>;
> > > > >       };
> > > > >       tp2: trackpad@20 {
> > > > >         compatible = "hid-over-i2c";
> > > > >         reg = <0x20>;
> > > > >         ...
> > > > >         post-power-on-delay-ms = <200>;
> > > > >       };
> > > > >     };
> > > > >   };
> > > > >
> > > > > ...but I suspect that would be insta-NAKed because it's creating a
> > > > > completely virtual device ("mt8173-elm-hana-trackpad-prober") in the
> > > > > device tree. I don't know if there's something that's functionally
> > > > > similar that would be OK?
> >
> > This solution seems a bit confusing to me, and would require more edits
> > to the dts each time a second source is added. It also means one would
> > have to write a small platform driver for each group of devices, correct?
>
> No matter what we need to add something to the dts each time a second
> source is added, right?

That was my thought. There is the case of the devices are almost the
same, so we lie. That's what you are doing for displays IIRC.

> While it's true that we'd end up with some extra drivers, if we do it
> correctly we don't necessarily need a driver for each group of devices
> nor even a driver per board. If several boards have very similar
> probing requirements then, even if they have unique "compatible"
> strings they could still end up using the same Linux driver.
>
> I've actually been talking offline with folks on ChromeOS more about
> this problem as well. Chen-Yu actually pointed at a patch series (that
> never landed, I guess) that has some similar ideas [1]. I guess in
> that case Hans was actually constructing device tree properties
> manually in the driver. I was thinking more of having all of the
> options listed in the device tree and then doing something that only
> causes some of them to probe.
>
> If Rob was OK with it, I guess I could have some sort of top-level
> "hwmanager" node like Hans did and then have phandle links to all the
> hardware that are managed by it. Then I could just change those to
> "okay"?

That's really just making the mutex node link the other direction. The
devices link to the common mutex node vs. the hwmanager node(s) links
to all the devices. That's really just picking the paint colors.

> Ideally, though, this could somehow use device tree "overlays" I
> guess. That seems like almost a perfect fit. I guess the issue here,
> though, is that I'd want the overlays bundled together with the
> original DT and then the board-specific "hardware prober" driver to
> actually apply the overlays after probing. Does that seem sensible?

BTW, there was an idea long ago from maintainer emeritus Likely to
append overlays to the base DTB for the kernel to apply.

How would that help you here? Are you going to have an overlay for
each device that enables it? It's much easier to just call
of_update_property() to change "status".

Rob
Doug Anderson Sept. 28, 2023, 11:21 p.m. UTC | #6
Hi,

On Thu, Sep 28, 2023 at 1:12 PM Rob Herring <robh+dt@kernel.org> wrote:
>
> > > Perhaps then this should be solved in userspace where it can learn
> > > which device is actually present and save that information for
> > > subsequent boots.
> >
> > Yeah, the thought occurred to me as well. I think there are a few
> > problems, though:
> >
> > a) Userspace can't itself probe these devices effectively. While
> > userspace could turn on GPIOs manually and query the i2c bus manually,
> > it can't (I believe) turn on regulators nor can it turn on clocks, if
> > they are needed. About the best userspace could do would be to blindly
> > try binding an existing kernel driver, and in that case why did we
> > need userspace involved anyway?
> >
> > b) While deferring to userspace can work for solutions like ChromeOS
> > or Android where it's easy to ensure the userspace bits are there,
> > it's less appealing as a general solution. I think in Johan's case
> > he's taking a laptop that initially ran Windows and then is trying to
> > run a generic Linux distro on it. For anyone in a similar situation,
> > they'd either need to pick a Linux distro that has the magic userspace
> > bits that are needed or they need to know that, on their laptop, they
> > need to manually install some software. While requiring special
> > userspace might make sense if you've got a special peripheral, like an
> > LTE modem, it makes less sense to need special userspace just to get
> > the right devices bound...
>
> I did not mean do it all in userspace, but for userspace to save off
> what devices are actually present. For example, if userspace has
> access to the dtb, it could just update the dtb to enable the right
> nodes. Then after the first boot, boot time is faster. Or a driver
> could try to load an overlay with the config enabling the right
> devices. Though maybe waiting til userspace is available wouldn't
> speed things up.

At least for the ChromeOS boot flow userspace isn't able to make any
changes to the dtb so I don't think this would help us unless I'm
misunderstanding.


> > Imagine that instead of the hated "hid-over-i2c" compatible we were
> > using two other devices. Imagine that a given board could have a
> > "elan,ekth6915" and a "goodix,gt7375p". Both of these devices have
> > specific timing requirements on how to sequence their supplies and
> > reset GPIOs. For Elan we power on the supplies, wait at least 1 ms,
> > deassert reset, wait at least 300 ms, and then can talk i2c. For
> > Goodix we power on the supply, wait at least 10 ms, deassert reset,
> > wait at least 180 ms, and then can talk i2c. If we had a
> > board-specific probing driver then it would power on the supplies,
> > wait at least 10 ms (the max of the two), deassert reset, wait at
> > least 300 ms (the max of the two), and then see which device talked.
> > Then it would instantiate whichever of the two drivers. This could be
> > done for any two devices that EEs have determined have "compatible"
> > probing sequences.
>
> My point was that in the above example, all these delay times would
> have to be defined in the kernel, not DT.

In the case of using the actual "hid-over-i2c" driver and not one of
the specialized ones, I think we'd still need to put the delay times
in the DT for the individual "hid-over-i2c" nodes, right? The
board-specific driver could still have an implicit delay depending on
the board compatible, but if you set the "hid-over-i2c" node to "okay"
then that driver is going to be expecting to read the delay from DT.
It will use the delay it reads from the DT for powering on after
suspend/resume...


> > Ideally in the above situation we'd be able to avoid turning the
> > device off and on again between the board-specific probe code and the
> > normal driver. That optimization might need special code per-driver
> > but it feels doable by passing some sort of hint to the child driver
> > when it's instantiated.
>
> I think fixing regulators getting turned off on failed probes and
> having a "regulator on time" would go a long way towards providing
> that hint even if the on time was just since clocksource started.

You're suggesting that when a touchscreen fails to probe because it
didn't find the touchscreen on the i2c bus that it should leave its
regulator on? That doesn't seem right to me. While we might find a way
for it to help in the 2nd sourcing case, it doesn't seem right in the
case of a device truly being missing...

I'm also not sure that it truly solves the problem since the power
sequencing of these devices is more than just a regulator but often
involves several regulators being turned on (perhaps with delays in
between) and some enable and/or reset GPIOs. In the case of many
touchscreens the delay needed is actually the delay from after the
reset GPIO is deasserted.

In any case, we can talk more about this in my other response.

-Doug
Doug Anderson Sept. 28, 2023, 11:21 p.m. UTC | #7
Hi,

On Thu, Sep 28, 2023 at 1:38 PM Rob Herring <robh+dt@kernel.org> wrote:
>
> > About the best you could do would be to add a board-specific driver
> > that understood that it could power up the rails, wait the maximum
> > amount of time that all possible touchscreens might need, and then
> > look for i2c ACKs. I'm still hoping to hear from Rob about how I would
> > get a board-specific driver to load on a DT system so I can
> > investigate / prototype this.
>
> foo_initcall()
> {
>   if (of_machine_is_compatible(...))
>     platform_device_create();
> }
>
> That chunk would have to be built in and if that's part of the driver
> module, autoloading wouldn't work.
>
> We could have a match table of board compatibles and driver names. I'm
> not worried about that list being big, so I'm happy to stick that in
> drivers/of/.

Ah, got it. So your proposal is that we don't add anything to the
device tree but we just probe the hardware manager based on the top
level compatible string. I guess it could work. It wouldn't mesh
amazingly well with the current Chromebook rev/sku stuff in the
top-level compatible without being a bit of a jumble. It could
probably work with some sort of wildcarding (I'd assume glob-style is
enough?). So essentially:

static const struct hw_prober_map[] {
  { .glob = "google,lazor*", .func = lazor_hw_prober_init },
  { .glob = "google,homestar*", .func = homestar_hw_prober_init },
  ...
};

for (i = 0; i < ARRAY_SIZE(hw_prober_map), i++) {
  if (of_machine_is_compatible_glob(hw_prober_map[i].glob)
    hw_prober_map[i].func();
}

If that got to be too big to be built-in then I guess we could always
figure out a way to move stuff to modules and have them auto-loaded.
For now the driver could be in
"drivers/platform/chrome/cros_hwprober.c" or something?


Hmmm, I guess one issue of doing this, though, is that it's going to
be more of a pain to find the DT nodes associated with the resources
we want to enable, right? Since there's no DT note associated with the
"HW prober" driver we don't just have phandles to them. Do we just use
the whole "status" concept and search the whole DT for
"fail-needs-probe" type statuses? Like if we have an elan vs. goodix
touchscreen and we have a realtek vs. synaptic trackpad then we have
statuses like:

status = "fail-needs-probe-touchscreen-elan";
status = "fail-needs-probe-touchscreen-goodix";
status = "fail-needs-probe-trackpad-realtek";
status = "fail-needs-probe-trackpad-synaptic";

...or did you have something else in mind? I'd rather not have the HW
prober driver need to hardcode full DT paths for the devices it's
looking for.


> > > This solution seems a bit confusing to me, and would require more edits
> > > to the dts each time a second source is added. It also means one would
> > > have to write a small platform driver for each group of devices, correct?
> >
> > No matter what we need to add something to the dts each time a second
> > source is added, right?
>
> That was my thought.

OK, cool.


> There is the case of the devices are almost the
> same, so we lie. That's what you are doing for displays IIRC.

Well, we used to do it for display, but it kept biting us. That's why
we created the generic "edp-panel", right? In any case, I'd tend to
keep it as one node per possible device and have HW prober just update
the status.


> > While it's true that we'd end up with some extra drivers, if we do it
> > correctly we don't necessarily need a driver for each group of devices
> > nor even a driver per board. If several boards have very similar
> > probing requirements then, even if they have unique "compatible"
> > strings they could still end up using the same Linux driver.
> >
> > I've actually been talking offline with folks on ChromeOS more about
> > this problem as well. Chen-Yu actually pointed at a patch series (that
> > never landed, I guess) that has some similar ideas [1]. I guess in
> > that case Hans was actually constructing device tree properties
> > manually in the driver. I was thinking more of having all of the
> > options listed in the device tree and then doing something that only
> > causes some of them to probe.
> >
> > If Rob was OK with it, I guess I could have some sort of top-level
> > "hwmanager" node like Hans did and then have phandle links to all the
> > hardware that are managed by it. Then I could just change those to
> > "okay"?
>
> That's really just making the mutex node link the other direction. The
> devices link to the common mutex node vs. the hwmanager node(s) links
> to all the devices. That's really just picking the paint colors.

I don't think the HW Manager concept is the same as the common mutex
at all, so I probably didn't explain it properly.

With the mutex approach the idea is that you simply keep probing each
device one at a time until one succeeds and the mutex keeps them all
from probing at the same time.

With the hardware manager approach you run a bit of board-specific
code that understands which devices are available and can probe for
them in a way that's safer and more efficient. It's safer because it
can take into account the timing requirements of all the possible
devices to ensure that none of their power sequences are violated.
Imagine two touchscreens that each have two power rails and a reset
line. The power sequences are:

TS1:
1. Power up VCC
2. Wait 0 ms (ensure ordering between VCC and VCCIO)
3. Power up VCCIO
4. Wait 100 ms
5. Deassert reset
6. Wait 50 ms.
7. Talk I2C

TS2:
1. Power up VCC
2. Wait 10 ms
3. Power up VCCIO
4. Wait 50 ms.
5. Deassert reset
6. Wait 100 ms
7. Talk I2C

With the "mutex" approach then when we try probing TS1 we'll violate
TS2's specs (not enough delay between VCC and VCCIO). When we try
probing TS2 we'll violate TS1's specs (not enough time between VCCIO
and deasserting reset).

With the a board-specific hardware manager we could know that, for all
possible touchscreens on this board, we can always safely probe for
them with:
1. Power up VCC
2. Wait 10 ms
3. Power up VCCIO
4. Wait 100 ms.
5. Deassert reset
6. Wait 100 ms
7. Talk I2C

Once we've realized which touchscreen is actually present then all
future power ons (like after suspend/resume) can be faster, but this
would be safer for the initial probe.

The above is not only safer but also more efficient. If, in the mutex
case, we probed TS1 first but actually had TS2 then we'd spend 100 +
50 + 10 + 50 + 100 = 310 ms. With the hardware manager we'd probe for
both touchscreens in step 7 and thus we'd only take 10 + 100 + 100 =
210 ms.

The issue with the hardware manager is that we'd then run the normal
driver probe and, unless we could somehow give it a hint, it would
need to re-run through the power sequence again. In your other
response you suggested that the normal driver could just detect that
its regulator was already on and it could skip the regulator power
sequence. I'm not convinced that's a reliable hint. If nothing else
there are some boards the touchscreen regulator is shared and/or
always-on but that doesn't mean someone has properly power sequenced
the "reset" GPIO. I feel like we'd want a more explicit hint, but
that's more something to solve in the Linux driver model and not
something to solve in DT bindings.


> > Ideally, though, this could somehow use device tree "overlays" I
> > guess. That seems like almost a perfect fit. I guess the issue here,
> > though, is that I'd want the overlays bundled together with the
> > original DT and then the board-specific "hardware prober" driver to
> > actually apply the overlays after probing. Does that seem sensible?
>
> BTW, there was an idea long ago from maintainer emeritus Likely to
> append overlays to the base DTB for the kernel to apply.
>
> How would that help you here? Are you going to have an overlay for
> each device that enables it? It's much easier to just call
> of_update_property() to change "status".

Ah, OK. Somehow I assumed that using overlays would be more palatable.
If it's OK to just update the property then that seems fine to me.

...although one other reason I thought to use overlays is I think you
mentioned there was code to make late-arriving devices probe, but I'm
sure that can be handled.

---

So I guess the overall summary is: I'm strongly leaning towards
prototyping the "HW prober" approach. Hopefully that sounds OK.

-Doug
diff mbox series

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4d8b315c48a1..adeceea331df 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3109,6 +3109,7 @@  void device_initialize(struct device *dev)
 	dev->dma_coherent = dma_default_coherent;
 #endif
 	swiotlb_dev_init(dev);
+	of_device_init(dev);
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index a528cec24264..476d411b5443 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -789,6 +789,9 @@  static int __driver_probe_device(struct device_driver *drv, struct device *dev)
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	if (dev->probe_mutex)
+		mutex_lock(dev->probe_mutex);
+
 	pm_runtime_get_suppliers(dev);
 	if (dev->parent)
 		pm_runtime_get_sync(dev->parent);
@@ -804,6 +807,10 @@  static int __driver_probe_device(struct device_driver *drv, struct device *dev)
 		pm_runtime_put(dev->parent);
 
 	pm_runtime_put_suppliers(dev);
+
+	if (dev->probe_mutex)
+		mutex_unlock(dev->probe_mutex);
+
 	return ret;
 }
 
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 1ca42ad9dd15..c58c716507e8 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -304,3 +304,57 @@  int of_device_uevent_modalias(const struct device *dev, struct kobj_uevent_env *
 	return 0;
 }
 EXPORT_SYMBOL_GPL(of_device_uevent_modalias);
+
+struct of_mutex_list_node {
+	struct list_head list;
+	struct device_node *np;
+	struct mutex mutex;
+};
+
+static DEFINE_MUTEX(of_mutex_list_lock);
+static LIST_HEAD(of_mutex_list);
+
+/**
+ * of_device_init() - Init a OF-related elements in a new struct device
+ * @dev: the new struct device
+ *
+ * The only initialization we need done at the moment is to init the
+ * "probe_mutex" if this device is part of a mutual-exclusion-group.
+ */
+void of_device_init(struct device *dev)
+{
+	struct of_mutex_list_node *node;
+	struct device_node *mutex_np;
+
+	mutex_np = of_parse_phandle(dev->of_node, "mutual-exclusion-group", 0);
+	if (!mutex_np)
+		return;
+
+	mutex_lock(&of_mutex_list_lock);
+
+	/*
+	 * Check to see if we've already created a mutex for this group. If
+	 * so then we're done.
+	 */
+	list_for_each_entry(node, &of_mutex_list, list) {
+		if (node->np == mutex_np) {
+			of_node_put(mutex_np);
+			dev->probe_mutex = &node->mutex;
+			goto exit;
+		}
+	}
+
+	/*
+	 * We need to create a new mutex. We'll never free the memory for this
+	 * (nor release the referenced to the mutual-exclusion-group node) but
+	 * there is only one object per group.
+	 */
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	mutex_init(&node->mutex);
+	node->np = mutex_np;
+	list_add_tail(&node->list, &of_mutex_list);
+	dev->probe_mutex = &node->mutex;
+
+exit:
+	mutex_unlock(&of_mutex_list_lock);
+}
diff --git a/include/linux/device.h b/include/linux/device.h
index 56d93a1ffb7b..f3cecf535bca 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -672,6 +672,9 @@  struct device_physical_location {
  * @iommu:	Per device generic IOMMU runtime data
  * @physical_location: Describes physical location of the device connection
  *		point in the system housing.
+ * @probe_mutex: If non-NULL, this mutex will be held during device probe
+ *		to allow mutual exclusion between multiple sources of probable
+ *		but non-discoverable devices with conflicting resources.
  * @removable:  Whether the device can be removed from the system. This
  *              should be set by the subsystem / bus driver that discovered
  *              the device.
@@ -790,6 +793,8 @@  struct device {
 
 	struct device_physical_location *physical_location;
 
+	struct mutex		*probe_mutex;
+
 	enum device_removable	removable;
 
 	bool			offline_disabled:1;
diff --git a/include/linux/of_device.h b/include/linux/of_device.h
index 2c7a3d4bc775..8ebaf4d58ecd 100644
--- a/include/linux/of_device.h
+++ b/include/linux/of_device.h
@@ -30,6 +30,7 @@  extern ssize_t of_device_modalias(struct device *dev, char *str, ssize_t len);
 
 extern void of_device_uevent(const struct device *dev, struct kobj_uevent_env *env);
 extern int of_device_uevent_modalias(const struct device *dev, struct kobj_uevent_env *env);
+extern void of_device_init(struct device *dev);
 
 int of_dma_configure_id(struct device *dev,
 		     struct device_node *np,
@@ -82,6 +83,11 @@  static inline int of_dma_configure(struct device *dev,
 {
 	return 0;
 }
+
+static inline void of_device_init(struct device *dev)
+{
+}
+
 #endif /* CONFIG_OF */
 
 #endif /* _LINUX_OF_DEVICE_H */