mbox series

[v2,0/9] Add support for Microsoft Surface System Aggregator Module

Message ID 20201203212640.663931-1-luzmaximilian@gmail.com
Headers show
Series Add support for Microsoft Surface System Aggregator Module | expand

Message

Maximilian Luz Dec. 3, 2020, 9:26 p.m. UTC
Hello,

Here is version two of the Surface System Aggregator Module (SAM/SSAM)
driver series, adding initial support for the embedded controller on 5th
and later generation Microsoft Surface devices. Initial support includes
the ACPI interface to the controller, via which battery and thermal
information is provided on some of these devices.

The previous version and cover letter detailing what this series is
about can be found at

  https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/

This patch-set can also be found at the following repository and
reference, if you prefer to look at a kernel tree instead of these
emails:

  https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2

Thank you all for the feedback to v1, I hope I have addressed all
comments.

Regards,
Max


Note: This patch depends on

  [PATCH v4] platform/surface: Create a platform subdirectory for
             Microsoft Surface devices

which can be found at

  https://lore.kernel.org/platform-driver-x86/20201009141128.683254-1-luzmaximilian@gmail.com/

and is currently in platform-drivers-x86/for-next.


Changes in v1 (from RFC, overview):
 - move to platform/surface
 - add copyright lines
 - change SPDX identifier to GPL-2.0+ (was GPL-2.0-or-later)
 - change user-space interface from debugfs to misc-device
 - address issues in user-space interface
 - fix typos in commit messages and documentation
 - fix some bugs, address other issues

Changes in v2 (overview):
 - simplify some code, mostly with regards to concurrency
 - add architectural overview to documentation
 - improve comments for documentation
 - use printk specifier for hex prefix instead of hard-coding it
 - spell check comments and strings, fix typos
 - unify comment style
 - run checkpatch --strict, fix these and other style issues

Changes regarding specific patches (and more details) can be found on
the individual patch.


Maximilian Luz (9):
  platform/surface: Add Surface Aggregator subsystem
  platform/surface: aggregator: Add control packet allocation caching
  platform/surface: aggregator: Add event item allocation caching
  platform/surface: aggregator: Add trace points
  platform/surface: aggregator: Add error injection capabilities
  platform/surface: aggregator: Add dedicated bus and device type
  docs: driver-api: Add Surface Aggregator subsystem documentation
  platform/surface: Add Surface Aggregator user-space interface
  platform/surface: Add Surface ACPI Notify driver

 Documentation/driver-api/index.rst            |    1 +
 .../surface_aggregator/client-api.rst         |   38 +
 .../driver-api/surface_aggregator/client.rst  |  393 +++
 .../surface_aggregator/clients/cdev.rst       |   85 +
 .../surface_aggregator/clients/index.rst      |   21 +
 .../surface_aggregator/clients/san.rst        |   44 +
 .../driver-api/surface_aggregator/index.rst   |   21 +
 .../surface_aggregator/internal-api.rst       |   67 +
 .../surface_aggregator/internal.rst           |  577 ++++
 .../surface_aggregator/overview.rst           |   77 +
 .../driver-api/surface_aggregator/ssh.rst     |  344 +++
 .../userspace-api/ioctl/ioctl-number.rst      |    2 +
 MAINTAINERS                                   |   13 +
 drivers/platform/surface/Kconfig              |   38 +
 drivers/platform/surface/Makefile             |    3 +
 drivers/platform/surface/aggregator/Kconfig   |   69 +
 drivers/platform/surface/aggregator/Makefile  |   17 +
 drivers/platform/surface/aggregator/bus.c     |  415 +++
 drivers/platform/surface/aggregator/bus.h     |   27 +
 .../platform/surface/aggregator/controller.c  | 2543 +++++++++++++++++
 .../platform/surface/aggregator/controller.h  |  285 ++
 drivers/platform/surface/aggregator/core.c    |  833 ++++++
 .../platform/surface/aggregator/ssh_msgb.h    |  205 ++
 .../surface/aggregator/ssh_packet_layer.c     | 2046 +++++++++++++
 .../surface/aggregator/ssh_packet_layer.h     |  190 ++
 .../platform/surface/aggregator/ssh_parser.c  |  228 ++
 .../platform/surface/aggregator/ssh_parser.h  |  154 +
 .../surface/aggregator/ssh_request_layer.c    | 1256 ++++++++
 .../surface/aggregator/ssh_request_layer.h    |  143 +
 drivers/platform/surface/aggregator/trace.h   |  632 ++++
 .../platform/surface/surface_acpi_notify.c    |  886 ++++++
 .../surface/surface_aggregator_cdev.c         |  299 ++
 include/linux/mod_devicetable.h               |   18 +
 include/linux/surface_acpi_notify.h           |   39 +
 include/linux/surface_aggregator/controller.h |  824 ++++++
 include/linux/surface_aggregator/device.h     |  423 +++
 include/linux/surface_aggregator/serial_hub.h |  672 +++++
 include/uapi/linux/surface_aggregator/cdev.h  |   58 +
 scripts/mod/devicetable-offsets.c             |    8 +
 scripts/mod/file2alias.c                      |   23 +
 40 files changed, 14017 insertions(+)
 create mode 100644 Documentation/driver-api/surface_aggregator/client-api.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/client.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/cdev.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/index.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/san.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/index.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/internal-api.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/internal.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/overview.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/ssh.rst
 create mode 100644 drivers/platform/surface/aggregator/Kconfig
 create mode 100644 drivers/platform/surface/aggregator/Makefile
 create mode 100644 drivers/platform/surface/aggregator/bus.c
 create mode 100644 drivers/platform/surface/aggregator/bus.h
 create mode 100644 drivers/platform/surface/aggregator/controller.c
 create mode 100644 drivers/platform/surface/aggregator/controller.h
 create mode 100644 drivers/platform/surface/aggregator/core.c
 create mode 100644 drivers/platform/surface/aggregator/ssh_msgb.h
 create mode 100644 drivers/platform/surface/aggregator/ssh_packet_layer.c
 create mode 100644 drivers/platform/surface/aggregator/ssh_packet_layer.h
 create mode 100644 drivers/platform/surface/aggregator/ssh_parser.c
 create mode 100644 drivers/platform/surface/aggregator/ssh_parser.h
 create mode 100644 drivers/platform/surface/aggregator/ssh_request_layer.c
 create mode 100644 drivers/platform/surface/aggregator/ssh_request_layer.h
 create mode 100644 drivers/platform/surface/aggregator/trace.h
 create mode 100644 drivers/platform/surface/surface_acpi_notify.c
 create mode 100644 drivers/platform/surface/surface_aggregator_cdev.c
 create mode 100644 include/linux/surface_acpi_notify.h
 create mode 100644 include/linux/surface_aggregator/controller.h
 create mode 100644 include/linux/surface_aggregator/device.h
 create mode 100644 include/linux/surface_aggregator/serial_hub.h
 create mode 100644 include/uapi/linux/surface_aggregator/cdev.h

--
2.29.2

Comments

Leon Romanovsky Dec. 6, 2020, 7:07 a.m. UTC | #1
On Thu, Dec 03, 2020 at 10:26:31PM +0100, Maximilian Luz wrote:
> Hello,

>

> Here is version two of the Surface System Aggregator Module (SAM/SSAM)

> driver series, adding initial support for the embedded controller on 5th

> and later generation Microsoft Surface devices. Initial support includes

> the ACPI interface to the controller, via which battery and thermal

> information is provided on some of these devices.

>

> The previous version and cover letter detailing what this series is

> about can be found at

>

>   https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/

>

> This patch-set can also be found at the following repository and

> reference, if you prefer to look at a kernel tree instead of these

> emails:

>

>   https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2

>

> Thank you all for the feedback to v1, I hope I have addressed all

> comments.



I think that it is too far fetched to attempt and expose UAPI headers
for some obscure char device that we are all know won't be around in
a couple of years from now due to the nature of how this embedded world
works.

More on that, the whole purpose of proposed interface is to debug and
not intended to be used by any user space code.

Also the idea that you are creating new bus just for this device doesn't
really sound right. I recommend you to take a look on auxiliary bus and
use it or come with very strong justifications why it is not fit yet.

I'm sorry to say, but this series is not ready to be merged yet.

NAK: Leon Romanovsky <leon@kernel.org>

Thanks
Greg Kroah-Hartman Dec. 6, 2020, 8:32 a.m. UTC | #2
On Sun, Dec 06, 2020 at 09:07:05AM +0200, Leon Romanovsky wrote:
> On Thu, Dec 03, 2020 at 10:26:31PM +0100, Maximilian Luz wrote:
> > Hello,
> >
> > Here is version two of the Surface System Aggregator Module (SAM/SSAM)
> > driver series, adding initial support for the embedded controller on 5th
> > and later generation Microsoft Surface devices. Initial support includes
> > the ACPI interface to the controller, via which battery and thermal
> > information is provided on some of these devices.
> >
> > The previous version and cover letter detailing what this series is
> > about can be found at
> >
> >   https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/
> >
> > This patch-set can also be found at the following repository and
> > reference, if you prefer to look at a kernel tree instead of these
> > emails:
> >
> >   https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2
> >
> > Thank you all for the feedback to v1, I hope I have addressed all
> > comments.
> 
> 
> I think that it is too far fetched to attempt and expose UAPI headers
> for some obscure char device that we are all know won't be around in
> a couple of years from now due to the nature of how this embedded world
> works.

No, that's not ok, we do this for loads of devices out there.  If there
is a device that wants to be supported for Linux, and a developer that
wants to support it, we will take it.

> More on that, the whole purpose of proposed interface is to debug and
> not intended to be used by any user space code.

I thought that debugfs was going to be used for most of the debugging
code, or has that changed in newer versions of this patchset?

thanks,

greg k-h
Leon Romanovsky Dec. 6, 2020, 8:35 a.m. UTC | #3
On Sun, Dec 06, 2020 at 09:32:30AM +0100, Greg Kroah-Hartman wrote:
> On Sun, Dec 06, 2020 at 09:07:05AM +0200, Leon Romanovsky wrote:

> > On Thu, Dec 03, 2020 at 10:26:31PM +0100, Maximilian Luz wrote:

> > > Hello,

> > >

> > > Here is version two of the Surface System Aggregator Module (SAM/SSAM)

> > > driver series, adding initial support for the embedded controller on 5th

> > > and later generation Microsoft Surface devices. Initial support includes

> > > the ACPI interface to the controller, via which battery and thermal

> > > information is provided on some of these devices.

> > >

> > > The previous version and cover letter detailing what this series is

> > > about can be found at

> > >

> > >   https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/

> > >

> > > This patch-set can also be found at the following repository and

> > > reference, if you prefer to look at a kernel tree instead of these

> > > emails:

> > >

> > >   https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2

> > >

> > > Thank you all for the feedback to v1, I hope I have addressed all

> > > comments.

> >

> >

> > I think that it is too far fetched to attempt and expose UAPI headers

> > for some obscure char device that we are all know won't be around in

> > a couple of years from now due to the nature of how this embedded world

> > works.

>

> No, that's not ok, we do this for loads of devices out there.  If there

> is a device that wants to be supported for Linux, and a developer that

> wants to support it, we will take it.

>

> > More on that, the whole purpose of proposed interface is to debug and

> > not intended to be used by any user space code.

>

> I thought that debugfs was going to be used for most of the debugging

> code, or has that changed in newer versions of this patchset?


https://lore.kernel.org/lkml/20201203212640.663931-9-luzmaximilian@gmail.com/#Z30include:uapi:linux:surface_aggregator:cdev.h

+ * Definitions, structs, and IOCTLs for the /dev/surface/aggregator misc
+ * device. This device provides direct user-space access to the SSAM EC.
+ * Intended for debugging and development.

>

> thanks,

>

> greg k-h
Hans de Goede Dec. 6, 2020, 8:41 a.m. UTC | #4
Hi Leon,

On 12/6/20 8:07 AM, Leon Romanovsky wrote:
> On Thu, Dec 03, 2020 at 10:26:31PM +0100, Maximilian Luz wrote:

>> Hello,

>>

>> Here is version two of the Surface System Aggregator Module (SAM/SSAM)

>> driver series, adding initial support for the embedded controller on 5th

>> and later generation Microsoft Surface devices. Initial support includes

>> the ACPI interface to the controller, via which battery and thermal

>> information is provided on some of these devices.

>>

>> The previous version and cover letter detailing what this series is

>> about can be found at

>>

>>   https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/

>>

>> This patch-set can also be found at the following repository and

>> reference, if you prefer to look at a kernel tree instead of these

>> emails:

>>

>>   https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2

>>

>> Thank you all for the feedback to v1, I hope I have addressed all

>> comments.

> 

> 

> I think that it is too far fetched to attempt and expose UAPI headers

> for some obscure char device that we are all know won't be around in

> a couple of years from now due to the nature of how this embedded world

> works.


This is not for an embedded device, but for the popular line of
Microsoft Surface laptops / 2-in-1s...

> More on that, the whole purpose of proposed interface is to debug and

> not intended to be used by any user space code.


The purpose is to provide raw access to the Surface Serial Hub protocol,
just like we provide raw access to USB devices and have hidraw devices.

So this goes a litle beyond just debugging; and eventually the choice
may be made to implement some functionality with userspace drivers,
just like we do for some HID and USB devices.

Still I agree with you that adding new userspace API is something which
needs to be considered carefully. So I will look at this closely when
reviewing this set.

> Also the idea that you are creating new bus just for this device doesn't

> really sound right. I recommend you to take a look on auxiliary bus and

> use it or come with very strong justifications why it is not fit yet.


AFAIK the auxiliary bus is for sharing a single device between multiple
drivers, while the main device-driver also still offers functionality
(beyond the providing of access) itself.

This is more akin to how the WMI driver also models different WMI
functions as a bus + devices on the bus.

Or how the SDIO driver multiplex a single SDIO device into its
functions by again using a bus + devices on the bus model.

Also this has been in the works for quite a while now, the Linux on
Microsoft Surface devices community has been working on this out of
tree for a long time, see:
https://github.com/linux-surface/

And an RFC and a v1 have been posted a while ago, while auxiliary
bus support is not even in the mainline kernel yet. I would agree
with you that this should switch to auxiliary bus, despite the timing,
if that would lead to much better code. But ATM I don't really see
switching to auxiliary bus offering much benefits here.

> I'm sorry to say, but this series is not ready to be merged yet.

> 

> NAK: Leon Romanovsky <leon@kernel.org>


See above, I believe that this all is a bit harsh and I have not
really heard convincing arguments for not merging this.

Moreover such a quick nack does not really promote working upstream,
where as we actually want people to work upstream as much as possible.
I know this is not a reason for taking bad code, but I'm not
convinced that this is bad code.

I have not reviewed this myself yet, but once I have reviewed
this and any review remarks have been addressed I do expect to
merge this series through the platform-drivers-x86 tree.

Regards,

Hans de Goede
(drivers/platform/x86 and drivers/platform/surface subsys maintainer)
Blaž Hrastnik Dec. 6, 2020, 8:58 a.m. UTC | #5
> 

> > More on that, the whole purpose of proposed interface is to debug and

> > not intended to be used by any user space code.

> 

> The purpose is to provide raw access to the Surface Serial Hub protocol,

> just like we provide raw access to USB devices and have hidraw devices.

> 

> So this goes a litle beyond just debugging; and eventually the choice

> may be made to implement some functionality with userspace drivers,

> just like we do for some HID and USB devices.

> 

> Still I agree with you that adding new userspace API is something which

> needs to be considered carefully. So I will look at this closely when

> reviewing this set.


To add to that: this was previously a debugfs interface but was moved to misc after review on the initial RFC:
https://lkml.org/lkml/2020/9/24/96
Leon Romanovsky Dec. 6, 2020, 9:06 a.m. UTC | #6
On Sun, Dec 06, 2020 at 05:58:32PM +0900, Blaž Hrastnik wrote:
> >

> > > More on that, the whole purpose of proposed interface is to debug and

> > > not intended to be used by any user space code.

> >

> > The purpose is to provide raw access to the Surface Serial Hub protocol,

> > just like we provide raw access to USB devices and have hidraw devices.

> >

> > So this goes a litle beyond just debugging; and eventually the choice

> > may be made to implement some functionality with userspace drivers,

> > just like we do for some HID and USB devices.

> >

> > Still I agree with you that adding new userspace API is something which

> > needs to be considered carefully. So I will look at this closely when

> > reviewing this set.

>

> To add to that: this was previously a debugfs interface but was moved to misc after review on the initial RFC:

> https://lkml.org/lkml/2020/9/24/96


There is a huge difference between the suggestion and final implementation.

Greg suggested to add new debug module to the drivers/misc that will
open char device explicitly after user loaded that module to debug this
hub. However, the author added full blown char device as a first citizen
that has all not-break-user constrains.

Thanks
Hans de Goede Dec. 6, 2020, 10:04 a.m. UTC | #7
Hi,

On 12/6/20 9:56 AM, Leon Romanovsky wrote:
> On Sun, Dec 06, 2020 at 09:41:21AM +0100, Hans de Goede wrote:
>> Hi Leon,
>>
>> On 12/6/20 8:07 AM, Leon Romanovsky wrote:
>>> On Thu, Dec 03, 2020 at 10:26:31PM +0100, Maximilian Luz wrote:
>>>> Hello,
>>>>
>>>> Here is version two of the Surface System Aggregator Module (SAM/SSAM)
>>>> driver series, adding initial support for the embedded controller on 5th
>>>> and later generation Microsoft Surface devices. Initial support includes
>>>> the ACPI interface to the controller, via which battery and thermal
>>>> information is provided on some of these devices.
>>>>
>>>> The previous version and cover letter detailing what this series is
>>>> about can be found at
>>>>
>>>>   https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/
>>>>
>>>> This patch-set can also be found at the following repository and
>>>> reference, if you prefer to look at a kernel tree instead of these
>>>> emails:
>>>>
>>>>   https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2
>>>>
>>>> Thank you all for the feedback to v1, I hope I have addressed all
>>>> comments.
>>>
>>>
>>> I think that it is too far fetched to attempt and expose UAPI headers
>>> for some obscure char device that we are all know won't be around in
>>> a couple of years from now due to the nature of how this embedded world
>>> works.
>>
>> This is not for an embedded device, but for the popular line of
>> Microsoft Surface laptops / 2-in-1s...
> 
> It is the naming, we don't have char device for every "laptop" vendor.
> Why is Microsoft different here?

Because their hardware department has invented a whole new way of dealing
with a bunch of things at the hardware level (for some reason).

Also almost all laptop vendors have a whole bunch of laptop vendor
specific userspace API in the form of sysfs files exported by
drivers/platform/x86/laptop-vendor.c drivers. E.g. do:

ls /sys/bus/platform/devices/thinkpad_acpi/

An any IBM/Lenovo Thinkpad (and only on a Thinkpad) to see a bunch
of laptop vendor specific UAPI.

Since I've become the pdx86 subsys maintainer I've actually been
pushing back against adding more of this, instead trying to
either use existing UAPIs, or defining new common UAPIs which can
be shared between vendors.

So I agree very much with you that we need to be careful about
needlessly introducing new UAPI.

But there is a difference between being careful and just nacking
it because no new UAPI may be added at all (also see GKH's response).

>>> More on that, the whole purpose of proposed interface is to debug and
>>> not intended to be used by any user space code.
>>
>> The purpose is to provide raw access to the Surface Serial Hub protocol,
>> just like we provide raw access to USB devices and have hidraw devices.
> 
> USB devices implement standard protocol, this surface hub is nothing
> even close to that.

The USB protocol just defines a transport layer, outside of the USB classes
there are plenty of proprietary protocols on top of that transport.

And this chardev just offers access to the Surface Serial Hub transport
protocol. And if you want something even closer the i2cdev module offers
raw I2C transfer access and I2C defines no protocol other then
how to read or write a number of bytes.

I do a lot of hw enablement work and being able to poke HID / USB / I2C
devices directly from userspace is very useful for this.

>> So this goes a litle beyond just debugging; and eventually the choice
>> may be made to implement some functionality with userspace drivers,
>> just like we do for some HID and USB devices.
> 
> I don't know how it goes in device/platform area, but in other large
> subsystems, UAPI should be presented with working user-space part.
> 
>>
>> Still I agree with you that adding new userspace API is something which
>> needs to be considered carefully. So I will look at this closely when
>> reviewing this set.

So this ^^^ still stands, I agree with you that adding new UAPI needs
to be considered carefully and when I get around to reviewing this
that is exactly what I will do.

Maximilian, can you perhaps explain a bit more of what you want / expect
to use the chardev for, and maybe provide pointers to the matching
userspace utilities (which I presume you have) ?

>>> Also the idea that you are creating new bus just for this device doesn't
>>> really sound right. I recommend you to take a look on auxiliary bus and
>>> use it or come with very strong justifications why it is not fit yet.
>>
>> AFAIK the auxiliary bus is for sharing a single device between multiple
>> drivers, while the main device-driver also still offers functionality
>> (beyond the providing of access) itself.
> 
> The idea behind auxiliary bus is to slice various functionalities into
> different sub-drivers, see it as a way to create subsystem inside one
> driver.

AFAIK the idea is to be able to combine multiple physical devices, e.g.
a PCI device + an ACPI enumerated platform device and then slice the
combination of those 2 up in new devices which may use parts of both
parent devices, quoting from:

https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/auxiliary_bus.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294

"multiple devices might implement a common intersection of functionality"

IOW this is for cases where the simpler bus + devices model does not
work well. AFAICT in this case the simpler bus + devices does work
well, so there is no need to use the auxiliary bus.

>> This is more akin to how the WMI driver also models different WMI
>> functions as a bus + devices on the bus.
>>
>> Or how the SDIO driver multiplex a single SDIO device into its
>> functions by again using a bus + devices on the bus model.
>>
>> Also this has been in the works for quite a while now, the Linux on
>> Microsoft Surface devices community has been working on this out of
>> tree for a long time, see:
>> https://github.com/linux-surface/
> 
> It is not relevant, the code is merged than it is ready.

It is relevant, you cannot expect drivers which were written during
the last 6 months to use functionality which is not even in the
mainline kernel yet (yes it is in -next, but not in mainline).

Now if that new functionality where to provide major benefits to
the code making it much cleaner / better then yes asking to rewrite
it to use that new functionality would make sense.

I need to take a closer look at the code but ATM I'm not convinced
that rewriting it to use the new auxiliary bus stuff will make it
better at all, let alone enough to warrant the effort of the rewrite.

<snip>

Regards,

Hans


p.s.

For the record I do not own any Microsoft Surface devices myself, so
I have no interests here other then to make sure that the Linux
kernel is welcoming to new new contributors and does not needlessly
scare them away.
Leon Romanovsky Dec. 6, 2020, 10:33 a.m. UTC | #8
On Sun, Dec 06, 2020 at 11:04:06AM +0100, Hans de Goede wrote:
> Hi,
>
> On 12/6/20 9:56 AM, Leon Romanovsky wrote:
> > On Sun, Dec 06, 2020 at 09:41:21AM +0100, Hans de Goede wrote:
> >> Hi Leon,
> >>
> >> On 12/6/20 8:07 AM, Leon Romanovsky wrote:
> >>> On Thu, Dec 03, 2020 at 10:26:31PM +0100, Maximilian Luz wrote:
> >>>> Hello,
> >>>>
> >>>> Here is version two of the Surface System Aggregator Module (SAM/SSAM)
> >>>> driver series, adding initial support for the embedded controller on 5th
> >>>> and later generation Microsoft Surface devices. Initial support includes
> >>>> the ACPI interface to the controller, via which battery and thermal
> >>>> information is provided on some of these devices.
> >>>>
> >>>> The previous version and cover letter detailing what this series is
> >>>> about can be found at
> >>>>
> >>>>   https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/
> >>>>
> >>>> This patch-set can also be found at the following repository and
> >>>> reference, if you prefer to look at a kernel tree instead of these
> >>>> emails:
> >>>>
> >>>>   https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2
> >>>>
> >>>> Thank you all for the feedback to v1, I hope I have addressed all
> >>>> comments.
> >>>
> >>>
> >>> I think that it is too far fetched to attempt and expose UAPI headers
> >>> for some obscure char device that we are all know won't be around in
> >>> a couple of years from now due to the nature of how this embedded world
> >>> works.
> >>
> >> This is not for an embedded device, but for the popular line of
> >> Microsoft Surface laptops / 2-in-1s...
> >
> > It is the naming, we don't have char device for every "laptop" vendor.
> > Why is Microsoft different here?
>
> Because their hardware department has invented a whole new way of dealing
> with a bunch of things at the hardware level (for some reason).

They are not different from any other vendor, it is much cheaper and easier
to do not follow standard implementations.

>
> Also almost all laptop vendors have a whole bunch of laptop vendor
> specific userspace API in the form of sysfs files exported by
> drivers/platform/x86/laptop-vendor.c drivers. E.g. do:
>
> ls /sys/bus/platform/devices/thinkpad_acpi/

It is different from the proposed /dev/surface... char device.

>
> An any IBM/Lenovo Thinkpad (and only on a Thinkpad) to see a bunch
> of laptop vendor specific UAPI.

Yes, it is gross, IBM did it in early days of Linux. Other vendors don't
have anything like that.

>
> Since I've become the pdx86 subsys maintainer I've actually been
> pushing back against adding more of this, instead trying to
> either use existing UAPIs, or defining new common UAPIs which can
> be shared between vendors.
>
> So I agree very much with you that we need to be careful about
> needlessly introducing new UAPI.
>
> But there is a difference between being careful and just nacking
> it because no new UAPI may be added at all (also see GKH's response).

I saw, the author misunderstood the Greg's comments.

>
> >>> More on that, the whole purpose of proposed interface is to debug and
> >>> not intended to be used by any user space code.
> >>
> >> The purpose is to provide raw access to the Surface Serial Hub protocol,
> >> just like we provide raw access to USB devices and have hidraw devices.
> >
> > USB devices implement standard protocol, this surface hub is nothing
> > even close to that.
>
> The USB protocol just defines a transport layer, outside of the USB classes
> there are plenty of proprietary protocols on top of that transport.
>
> And this chardev just offers access to the Surface Serial Hub transport
> protocol. And if you want something even closer the i2cdev module offers
> raw I2C transfer access and I2C defines no protocol other then
> how to read or write a number of bytes.
>
> I do a lot of hw enablement work and being able to poke HID / USB / I2C
> devices directly from userspace is very useful for this.

Greg wrote how to do it.

>
> >> So this goes a litle beyond just debugging; and eventually the choice
> >> may be made to implement some functionality with userspace drivers,
> >> just like we do for some HID and USB devices.
> >
> > I don't know how it goes in device/platform area, but in other large
> > subsystems, UAPI should be presented with working user-space part.
> >
> >>
> >> Still I agree with you that adding new userspace API is something which
> >> needs to be considered carefully. So I will look at this closely when
> >> reviewing this set.
>
> So this ^^^ still stands, I agree with you that adding new UAPI needs
> to be considered carefully and when I get around to reviewing this
> that is exactly what I will do.
>
> Maximilian, can you perhaps explain a bit more of what you want / expect
> to use the chardev for, and maybe provide pointers to the matching
> userspace utilities (which I presume you have) ?
>
> >>> Also the idea that you are creating new bus just for this device doesn't
> >>> really sound right. I recommend you to take a look on auxiliary bus and
> >>> use it or come with very strong justifications why it is not fit yet.
> >>
> >> AFAIK the auxiliary bus is for sharing a single device between multiple
> >> drivers, while the main device-driver also still offers functionality
> >> (beyond the providing of access) itself.
> >
> > The idea behind auxiliary bus is to slice various functionalities into
> > different sub-drivers, see it as a way to create subsystem inside one
> > driver.
>
> AFAIK the idea is to be able to combine multiple physical devices, e.g.
> a PCI device + an ACPI enumerated platform device and then slice the
> combination of those 2 up in new devices which may use parts of both
> parent devices, quoting from:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/tree/Documentation/driver-api/auxiliary_bus.rst?h=driver-core-next&id=7de3697e9cbd4bd3d62bafa249d57990e1b8f294
>
> "multiple devices might implement a common intersection of functionality"

It is one way, another is to take one device and create many small
devices out of it.
https://lore.kernel.org/alsa-devel/20201026111849.1035786-1-leon@kernel.org/

>
> IOW this is for cases where the simpler bus + devices model does not
> work well. AFAICT in this case the simpler bus + devices does work
> well, so there is no need to use the auxiliary bus.

It is designed to replace invention of custom buses.

>
> >> This is more akin to how the WMI driver also models different WMI
> >> functions as a bus + devices on the bus.
> >>
> >> Or how the SDIO driver multiplex a single SDIO device into its
> >> functions by again using a bus + devices on the bus model.
> >>
> >> Also this has been in the works for quite a while now, the Linux on
> >> Microsoft Surface devices community has been working on this out of
> >> tree for a long time, see:
> >> https://github.com/linux-surface/
> >
> > It is not relevant, the code is merged than it is ready.
>
> It is relevant, you cannot expect drivers which were written during
> the last 6 months to use functionality which is not even in the
> mainline kernel yet (yes it is in -next, but not in mainline).
>
> Now if that new functionality where to provide major benefits to
> the code making it much cleaner / better then yes asking to rewrite
> it to use that new functionality would make sense.

And who will rewrite it? My experience shows that right code should be
written from the beginning otherwise the code has too many chances to be
abandoned.

Thanks
Hans de Goede Dec. 6, 2020, 10:43 a.m. UTC | #9
Hi,

On 12/6/20 11:33 AM, Maximilian Luz wrote:
> On 12/6/20 10:06 AM, Leon Romanovsky wrote:> On Sun, Dec 06, 2020 at 05:58:32PM +0900, Blaž Hrastnik wrote:

>>>>

>>>>> More on that, the whole purpose of proposed interface is to debug and

>>>>> not intended to be used by any user space code.

>>>>

>>>> The purpose is to provide raw access to the Surface Serial Hub protocol,

>>>> just like we provide raw access to USB devices and have hidraw devices.

>>>>

>>>> So this goes a litle beyond just debugging; and eventually the choice

>>>> may be made to implement some functionality with userspace drivers,

>>>> just like we do for some HID and USB devices.

>>>>

>>>> Still I agree with you that adding new userspace API is something which

>>>> needs to be considered carefully. So I will look at this closely when

>>>> reviewing this set.

>>>

>>> To add to that: this was previously a debugfs interface but was moved to misc after review on the initial RFC:

>>> https://lkml.org/lkml/2020/9/24/96

>>

>> There is a huge difference between the suggestion and final implementation.

>>

>> Greg suggested to add new debug module to the drivers/misc that will

>> open char device explicitly after user loaded that module to debug this

>> hub. However, the author added full blown char device as a first citizen

>> that has all not-break-user constrains.

> 

> This module still needs to be loaded explicitly.


Good then I really do not see a problem with this.

> And (I might be wrong

> about this) the "not-break-user constraints" hold as soon as I register

> a misc device at all, no?


Correct.

> So I don't see how this is a) any different

> than previously discussed with Greg and b) how the uapi header now

> introduces any not-break-user constraints that would not be there

> without it.

> 

> This interface is intended as a stable interface. That's something that

> I committed to as soon as I decided to implement this via a misc-device.

> 

> Sure, I can move the definitions in the uapi header to the module

> itself, but I don't see any benefit in that.


Right, if we are going to use a misc chardev for this, then the
correct thing to do is to put the API bits for that chardev under
include/uapi.

It would still be good if you can provide a pointer to some userspace
tools using this new API; and for the next version maybe add that
pointer to the commit message

Regards,

Hans
Maximilian Luz Dec. 6, 2020, 10:56 a.m. UTC | #10
On 12/6/20 11:43 AM, Hans de Goede wrote:
> Hi,

> 

> On 12/6/20 11:33 AM, Maximilian Luz wrote:

>> On 12/6/20 10:06 AM, Leon Romanovsky wrote:> On Sun, Dec 06, 2020 at 05:58:32PM +0900, Blaž Hrastnik wrote:

>>>>>

>>>>>> More on that, the whole purpose of proposed interface is to debug and

>>>>>> not intended to be used by any user space code.

>>>>>

>>>>> The purpose is to provide raw access to the Surface Serial Hub protocol,

>>>>> just like we provide raw access to USB devices and have hidraw devices.

>>>>>

>>>>> So this goes a litle beyond just debugging; and eventually the choice

>>>>> may be made to implement some functionality with userspace drivers,

>>>>> just like we do for some HID and USB devices.

>>>>>

>>>>> Still I agree with you that adding new userspace API is something which

>>>>> needs to be considered carefully. So I will look at this closely when

>>>>> reviewing this set.

>>>>

>>>> To add to that: this was previously a debugfs interface but was moved to misc after review on the initial RFC:

>>>> https://lkml.org/lkml/2020/9/24/96

>>>

>>> There is a huge difference between the suggestion and final implementation.

>>>

>>> Greg suggested to add new debug module to the drivers/misc that will

>>> open char device explicitly after user loaded that module to debug this

>>> hub. However, the author added full blown char device as a first citizen

>>> that has all not-break-user constrains.

>>

>> This module still needs to be loaded explicitly.

> 

> Good then I really do not see a problem with this.

> 

>> And (I might be wrong

>> about this) the "not-break-user constraints" hold as soon as I register

>> a misc device at all, no?

> 

> Correct.

> 

>> So I don't see how this is a) any different

>> than previously discussed with Greg and b) how the uapi header now

>> introduces any not-break-user constraints that would not be there

>> without it.

>>

>> This interface is intended as a stable interface. That's something that

>> I committed to as soon as I decided to implement this via a misc-device.

>>

>> Sure, I can move the definitions in the uapi header to the module

>> itself, but I don't see any benefit in that.

> 

> Right, if we are going to use a misc chardev for this, then the

> correct thing to do is to put the API bits for that chardev under

> include/uapi.

> 

> It would still be good if you can provide a pointer to some userspace

> tools using this new API; and for the next version maybe add that

> pointer to the commit message


Right, I will add that to the commit message. I just linked you the
scripts in my other response, but here again for completeness:

   https://github.com/linux-surface/surface-aggregator-module/tree/master/scripts/ssam

While I'm not using the header directly (the scripts are written in
python) I still think uapi is the right place to put this (please
correct me if I'm wrong). Not putting them there seems to be needless
obfuscating to me.

Regards,
Max
Maximilian Luz Dec. 6, 2020, 11:13 a.m. UTC | #11
On 12/6/20 9:32 AM, Greg Kroah-Hartman wrote:
> On Sun, Dec 06, 2020 at 09:07:05AM +0200, Leon Romanovsky wrote:

>> On Thu, Dec 03, 2020 at 10:26:31PM +0100, Maximilian Luz wrote:

>>> Hello,

>>>

>>> Here is version two of the Surface System Aggregator Module (SAM/SSAM)

>>> driver series, adding initial support for the embedded controller on 5th

>>> and later generation Microsoft Surface devices. Initial support includes

>>> the ACPI interface to the controller, via which battery and thermal

>>> information is provided on some of these devices.

>>>

>>> The previous version and cover letter detailing what this series is

>>> about can be found at

>>>

>>>    https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/

>>>

>>> This patch-set can also be found at the following repository and

>>> reference, if you prefer to look at a kernel tree instead of these

>>> emails:

>>>

>>>    https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2

>>>

>>> Thank you all for the feedback to v1, I hope I have addressed all

>>> comments.

>>

>>

>> I think that it is too far fetched to attempt and expose UAPI headers

>> for some obscure char device that we are all know won't be around in

>> a couple of years from now due to the nature of how this embedded world

>> works.

> 

> No, that's not ok, we do this for loads of devices out there.  If there

> is a device that wants to be supported for Linux, and a developer that

> wants to support it, we will take it.

> 

>> More on that, the whole purpose of proposed interface is to debug and

>> not intended to be used by any user space code.

> 

> I thought that debugfs was going to be used for most of the debugging

> code, or has that changed in newer versions of this patchset?

As per previous discussion (https://lkml.org/lkml/2020/9/24/96) I have
replaced the debugfs device by a misc-device with stable interface.

I also believe that this is probably the better option long-term. The
general idea is to have a device that has direct access to the
EC/transport protocol and can be used for development and prototyping.
Debugging is a part of that. So it's more akin to something raw access
via i2cdev, hidraw, or raw access to USB devices as Hans de Goede
mentioned in one of his mails. Note that the module must still be loaded
manually

Regards,
Max
Leon Romanovsky Dec. 6, 2020, 11:30 a.m. UTC | #12
On Sun, Dec 06, 2020 at 11:33:40AM +0100, Maximilian Luz wrote:
> On 12/6/20 10:06 AM, Leon Romanovsky wrote:> On Sun, Dec 06, 2020 at 05:58:32PM +0900, Blaž Hrastnik wrote:
> > > >
> > > > > More on that, the whole purpose of proposed interface is to debug and
> > > > > not intended to be used by any user space code.
> > > >
> > > > The purpose is to provide raw access to the Surface Serial Hub protocol,
> > > > just like we provide raw access to USB devices and have hidraw devices.
> > > >
> > > > So this goes a litle beyond just debugging; and eventually the choice
> > > > may be made to implement some functionality with userspace drivers,
> > > > just like we do for some HID and USB devices.
> > > >
> > > > Still I agree with you that adding new userspace API is something which
> > > > needs to be considered carefully. So I will look at this closely when
> > > > reviewing this set.
> > >
> > > To add to that: this was previously a debugfs interface but was moved to misc after review on the initial RFC:
> > > https://lkml.org/lkml/2020/9/24/96
> >
> > There is a huge difference between the suggestion and final implementation.
> >
> > Greg suggested to add new debug module to the drivers/misc that will
> > open char device explicitly after user loaded that module to debug this
> > hub. However, the author added full blown char device as a first citizen
> > that has all not-break-user constrains.
>
> This module still needs to be loaded explicitly. And (I might be wrong
> about this) the "not-break-user constraints" hold as soon as I register
> a misc device at all, no?

I don't think so, files in drivers/misc/* don't have such strict policy.

> than previously discussed with Greg and b) how the uapi header now
> introduces any not-break-user constraints that would not be there
> without it.

There is a huge difference between char device for the debug and
exposed UAPI header. The first requires from the user to build and
explicitly run it, while header allows to reliably build on top of
it various applications that we don't control. The not-break-rule
talks about the second.

>
> This interface is intended as a stable interface. That's something that
> I committed to as soon as I decided to implement this via a misc-device.
>
> Sure, I can move the definitions in the uapi header to the module
> itself, but I don't see any benefit in that. If someone really wants to
> use this interface, they can just as well copy the definitions from the
> module source itself. So why not be upfront about it and make life
> easier for everyone?

Because you are actually making life harder for everyone who cares about
UAPIs exposed by the Linux and they definitely different in numbers from
those who needs debug interface for the Microsoft Surface board.

Thanks

>
> Regards,
> Max
>
Leon Romanovsky Dec. 6, 2020, 11:41 a.m. UTC | #13
On Sun, Dec 06, 2020 at 11:41:46AM +0100, Hans de Goede wrote:
> Hi,

>

> On 12/6/20 11:33 AM, Leon Romanovsky wrote:

> > On Sun, Dec 06, 2020 at 11:04:06AM +0100, Hans de Goede wrote:

>

> <snip>

>

> >> But there is a difference between being careful and just nacking

> >> it because no new UAPI may be added at all (also see GKH's response).

> >

> > I saw, the author misunderstood the Greg's comments.

>

> Quoting from patch 8/9:

>

> "

> +==============================

> +User-Space EC Interface (cdev)

> +==============================

> +

> +The ``surface_aggregator_cdev`` module provides a misc-device for the SSAM

> +controller to allow for a (more or less) direct connection from user-space to

> +the SAM EC. It is intended to be used for development and debugging, and

> +therefore should not be used or relied upon in any other way. Note that this

> +module is not loaded automatically, but instead must be loaded manually.

> "

>

> If I'm not mistaken that seems to be pretty much what Greg asked for.


Right, unless you forget the end of his request.
 "
  The "joy" of creating a user api is that no matter how much you tell
  people "do not depend on this", they will, so no matter the file being
  in debugfs, or a misc device, you might be stuck with it for forever,
  sorry.
 "

So I still think that exposing user api for a development and debug of device
that has no future is wrong thing to do.

Thanks

>

> Regards,

>

> Hans

>
Maximilian Luz Dec. 6, 2020, 1:27 p.m. UTC | #14
On 12/6/20 12:30 PM, Leon Romanovsky wrote:
> On Sun, Dec 06, 2020 at 11:33:40AM +0100, Maximilian Luz wrote:
>> On 12/6/20 10:06 AM, Leon Romanovsky wrote:> On Sun, Dec 06, 2020 at 05:58:32PM +0900, Blaž Hrastnik wrote:
>>>>>
>>>>>> More on that, the whole purpose of proposed interface is to debug and
>>>>>> not intended to be used by any user space code.
>>>>>
>>>>> The purpose is to provide raw access to the Surface Serial Hub protocol,
>>>>> just like we provide raw access to USB devices and have hidraw devices.
>>>>>
>>>>> So this goes a litle beyond just debugging; and eventually the choice
>>>>> may be made to implement some functionality with userspace drivers,
>>>>> just like we do for some HID and USB devices.
>>>>>
>>>>> Still I agree with you that adding new userspace API is something which
>>>>> needs to be considered carefully. So I will look at this closely when
>>>>> reviewing this set.
>>>>
>>>> To add to that: this was previously a debugfs interface but was moved to misc after review on the initial RFC:
>>>> https://lkml.org/lkml/2020/9/24/96
>>>
>>> There is a huge difference between the suggestion and final implementation.
>>>
>>> Greg suggested to add new debug module to the drivers/misc that will
>>> open char device explicitly after user loaded that module to debug this
>>> hub. However, the author added full blown char device as a first citizen
>>> that has all not-break-user constrains.
>>
>> This module still needs to be loaded explicitly. And (I might be wrong
>> about this) the "not-break-user constraints" hold as soon as I register
>> a misc device at all, no?
> 
> I don't think so, files in drivers/misc/* don't have such strict policy.

Can I get a link to the documentation stating that or someone else
confirming that?

Also I don't think it makes sense to have a platform/surface device in
drivers/misc, after we've explicitly decided to move this code out of
there. IIRC drivers/misc is not a place for misc-devices, but the
directory for devices that don't have any good place elsewhere.

>> than previously discussed with Greg and b) how the uapi header now
>> introduces any not-break-user constraints that would not be there
>> without it.
> 
> There is a huge difference between char device for the debug and
> exposed UAPI header. The first requires from the user to build and
> explicitly run it, while header allows to reliably build on top of
> it various applications that we don't control. The not-break-rule
> talks about the second.

So it's okay to break stuff that's not explicitly in include/uapi/?
Again, can I get someone to confirm that for me?

As already said, I'm okay with moving the definitions from the header to
the module itself (if there is a consensus on that, CC Greg, Hans),
however both allow you to build user-space tools against the API. Case
in point my python scripts, which don't use the header. Or any other
non-C-based tool. So unless there's a rule that anything without a
header in uapi is fair game, I fail to see your point.

>> This interface is intended as a stable interface. That's something that
>> I committed to as soon as I decided to implement this via a misc-device.
>>
>> Sure, I can move the definitions in the uapi header to the module
>> itself, but I don't see any benefit in that. If someone really wants to
>> use this interface, they can just as well copy the definitions from the
>> module source itself. So why not be upfront about it and make life
>> easier for everyone?
> 
> Because you are actually making life harder for everyone who cares about
> UAPIs exposed by the Linux and they definitely different in numbers from
> those who needs debug interface for the Microsoft Surface board.

This point again depends on the interface not being stable. Unless, of
course, you want me to remove the interface completely and just maintain
it out of tree...

I'm happy to be told otherwise by the authorities here, but from past
conversations it seems that basically everything providing some sort of
user-space access falls under the "don't break user-space" rule as soon
as somebody uses it.

Regards,
Max
Maximilian Luz Dec. 6, 2020, 1:43 p.m. UTC | #15
On 12/6/20 12:41 PM, Leon Romanovsky wrote:
> On Sun, Dec 06, 2020 at 11:41:46AM +0100, Hans de Goede wrote:

>> Hi,

>>

>> On 12/6/20 11:33 AM, Leon Romanovsky wrote:

>>> On Sun, Dec 06, 2020 at 11:04:06AM +0100, Hans de Goede wrote:

>>

>> <snip>

>>

>>>> But there is a difference between being careful and just nacking

>>>> it because no new UAPI may be added at all (also see GKH's response).

>>>

>>> I saw, the author misunderstood the Greg's comments.

>>

>> Quoting from patch 8/9:

>>

>> "

>> +==============================

>> +User-Space EC Interface (cdev)

>> +==============================

>> +

>> +The ``surface_aggregator_cdev`` module provides a misc-device for the SSAM

>> +controller to allow for a (more or less) direct connection from user-space to

>> +the SAM EC. It is intended to be used for development and debugging, and

>> +therefore should not be used or relied upon in any other way. Note that this

>> +module is not loaded automatically, but instead must be loaded manually.

>> "

>>

>> If I'm not mistaken that seems to be pretty much what Greg asked for.

> 

> Right, unless you forget the end of his request.

>   "

>    The "joy" of creating a user api is that no matter how much you tell

>    people "do not depend on this", they will, so no matter the file being

>    in debugfs, or a misc device, you might be stuck with it for forever,

>    sorry.

>   "


Which to me reads as "if you want a user-space interface for development and
debugging, you'll have to make it a stable interface, regardless where it is
implemented". Rather making a point for a well though-out stable interface.
Specifically with regards to

    "
     >  - So you suggest I go with a misc device instead of putting this into

     >    debugfs?


     Yes.
    "

Unless of course I'm misunderstanding things entirely. Greg, please feel free
to yell at me if I've got this wrong.

> So I still think that exposing user api for a development and debug of device

> that has no future is wrong thing to do.


Unless you know something that I don't, MS is rumored to come out with a new
Surface Pro 8 and Surface Laptop 4 early next year, which I fully expect to
also have this EC built in. And if they once again decided to move some
functionality normally provided via ACPI or other means to the EC for some
reason, we'll likely need that interface again.

Yes, it has no future outside of Surface devices, but so has every other
platform driver with respect to their specific platform. What are your
alternatives to exposing a user API? If we want to be able to easily test
and attempt to provide support for Surface devices, there has to be some
interaction with user-space.

With respect to stability of the interface and future changes, I believe
that IOCTLs are the way to go. If that's in debugfs or, as was the
result from the previous discussion about this, via a misc-device...
I'll be happy to implement whatever a consensus yields, as long as it
can be used for its intended purpose: aid development with regards to
the EC found on Surface devices.

Regards,
Max
Maximilian Luz Dec. 6, 2020, 3:58 p.m. UTC | #16
On 12/6/20 8:07 AM, Leon Romanovsky wrote:
> On Thu, Dec 03, 2020 at 10:26:31PM +0100, Maximilian Luz wrote:
>> Hello,
>>
>> Here is version two of the Surface System Aggregator Module (SAM/SSAM)
>> driver series, adding initial support for the embedded controller on 5th
>> and later generation Microsoft Surface devices. Initial support includes
>> the ACPI interface to the controller, via which battery and thermal
>> information is provided on some of these devices.
>>
>> The previous version and cover letter detailing what this series is
>> about can be found at
>>
>>    https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/
>>
>> This patch-set can also be found at the following repository and
>> reference, if you prefer to look at a kernel tree instead of these
>> emails:
>>
>>    https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2
>>
>> Thank you all for the feedback to v1, I hope I have addressed all
>> comments.
> 
> 
> I think that it is too far fetched to attempt and expose UAPI headers
> for some obscure char device that we are all know won't be around in
> a couple of years from now due to the nature of how this embedded world
> works.
> 
> More on that, the whole purpose of proposed interface is to debug and
> not intended to be used by any user space code.

I believe this has already been extensively discussed. I want to focus
more on the part below in this response:

> Also the idea that you are creating new bus just for this device doesn't
> really sound right. I recommend you to take a look on auxiliary bus and
> use it or come with very strong justifications why it is not fit yet.

I tend to agree that this is a valid concern to bring up, and adding a
new bus is not something that should be done lightly.

Let's ignore that this has been merged into -next after I've submitted
this (and that I only recently became aware of this) for the time being.
If I would see a clear benefit, I would not hesitate to switch the
driver and subsystem over to this.

What does concern me most, is the device/driver matching by string.
Right now, this subsystem matches those via a device UID. This UID is
directly tied to the EC functionality provided by the device. A bit of
background to this:

Requests sent to the EC contain an address, so to say. This consists of

  - Target category (TC): Broad group of functionality, e.g. battery/AC,
    thermal, HID input, ..., i.e. a subsystem of sorts.

  - Target ID (TID): Some major device, e.g. the dual batteries on the
    Surface Book 3 are addressed by target ID 1 and 2, some functionality
    is only available at 2 and some only at 1. May be related to physical
    parts of/locations on the device.

  - Instance ID (IID): A device instance, e.g. for thermal sensors each
    sensor is at TC=0x03 (thermal) and has a different instance ID.

Those can be used to pretty much uniquely identify a sub-device on the
EC.

Note the "pretty much". To truly make them unique we can add a function
ID (FN). With that, we can for example match for TC=0x03, TID=*, IID=*,
FN=0x00 to load a driver against all thermal sensors. And this is
basically the device UID that the subsystem uses for matching (modulo
domain for virtual devices, i.e. device hubs). Sure, we can use some
string, but that then leads to having to come up with creative names
once we need some driver specific data, e.g. in the battery driver [1]:

     const struct auxiliary_device_id my_auxiliary_id_table[] = {
         { .name = "surface_aggregator_registry.battery", .driver_data = x },
         { .name = "surface_aggregator_registry.battery_sb3", .driver_data = y },
         { },
     }

Arguably, not _that_ big of a deal.

What worries me more is that this will block any path of auto-detecting
devices on a more general/global level. Right now, we hard-code devices
because we haven't found any way to detect them via some EC query yet
[2] (FYI the node groups contain all devices that will eventually be
added to the bus, which are already 11 devices on the Surface Book 3
without taking missing thermal sensors into account; also they are
spread across a bunch of subsystems, so not just platform). That's of
course not an ideal solution and one that I hope we can eventually fix.
If we can auto-detect devices, it's very likely that we know or can
easily get to the device UID. A meaningful string is somewhat more
difficult.

This registry, which is loaded against a platform device that, from what
we can tell differentiates the models for some driver bindings by
Windows (that's speculation), is also the reason why we don't register
client devices directly under the main module, so instead of a nice
"surface_aggregator.<devicename>", you'll get
"surface_aggregator_registry.<devicename>". And it may not end there.

Something that's currently not implemented is support for thermal
sensors on 7th generation devices. With thermal sensors, we can already
detect which sensors, i.e. which IIDs, are present. Naturally, that's
part of the EC-API for thermal devices (TC=0x03), so would warrant a
master driver that registers the individual sensor drivers (that's a
place where I'd argue that in a normal situation, the auxiliary bus
makes sense). So with the auxiliary bus we'd now end up with devices
with "surface_thermal.sensor" for the sensors as well as
"surface_aggregator_registry.<devicename>", both of type ssam_device
(which then would be a wrapper around auxiliary_device with UID stored
in that wrapper). Note that they need to be of type ssam_device (or
another wrapper around that) as they again need the reference to the
controller device, their UID for access, etc. With a proper bus, device,
and the UID for matching, we can just add the sensor devices to the bus
again, as they will have a meaningful and guaranteed unique UID.

 From some reports I've seen it looks like thermal sensors may also be
available separately on TID=0x01 as well as TID=0x02 on some devices,
at which point I believe you'd need to introduce some IDA for ID
allocation to not cause a clash with IDs. At least if you separate the
base drivers for each TC, which I guess should be preferred due to
code-reuse. Then again they might use different event registries so you
may end up needing "surface_thermal.sensor_tc1" and
"surface_thermal.sensor_tc2" as device names to differentiate those
for driver loading. Or store the registry in software node properties
when registering the device.

I'm repeating myself here, but to me it looks cleaner to have a single
bus type as opposed to spreading the same base auxiliary device type
over several namespaces.

Which then leads me to the question of how a function like
"is_ssam_device()", i.e. a function testing if the device is of a given
type, would be implemented without enforcing and testing against some
part of the device name. Something that, again, doesn't look clean to
me. Although the use of such a function could probably avoided, but that
then feels like working around the auxiliary bus.

Unfortunately, there are a couple more hypotheticals at play than I'd
like to have (making this not an easy decision), but it's a reverse
engineered driver so I guess that comes with the territory. All in all,
I believe it's possible to do this (i.e. use the auxiliary bus), but, to
me at least, the implementation using a discrete bus feels tidier and
more true to the hardware (or virtual hardware anyway) behind this. I'm
happy to hear any arguments against this though.

Regards,
Max

[1]: https://github.com/linux-surface/surface-aggregator-module/blob/61b9bb859c30a8e17654c3a06696feb2691438f7/module/src/clients/surface_battery.c#L1075-L1079
[2]: https://github.com/linux-surface/surface-aggregator-module/blob/61b9bb859c30a8e17654c3a06696feb2691438f7/module/src/clients/surface_aggregator_registry.c
Hans de Goede Dec. 7, 2020, 8:49 a.m. UTC | #17
Hi,

On 12/6/20 4:58 PM, Maximilian Luz wrote:
> On 12/6/20 8:07 AM, Leon Romanovsky wrote:
>> On Thu, Dec 03, 2020 at 10:26:31PM +0100, Maximilian Luz wrote:
>>> Hello,
>>>
>>> Here is version two of the Surface System Aggregator Module (SAM/SSAM)
>>> driver series, adding initial support for the embedded controller on 5th
>>> and later generation Microsoft Surface devices. Initial support includes
>>> the ACPI interface to the controller, via which battery and thermal
>>> information is provided on some of these devices.
>>>
>>> The previous version and cover letter detailing what this series is
>>> about can be found at
>>>
>>>    https://lore.kernel.org/platform-driver-x86/20201115192143.21571-1-luzmaximilian@gmail.com/
>>>
>>> This patch-set can also be found at the following repository and
>>> reference, if you prefer to look at a kernel tree instead of these
>>> emails:
>>>
>>>    https://github.com/linux-surface/kernel tags/s/surface-aggregator/v2
>>>
>>> Thank you all for the feedback to v1, I hope I have addressed all
>>> comments.
>>
>>
>> I think that it is too far fetched to attempt and expose UAPI headers
>> for some obscure char device that we are all know won't be around in
>> a couple of years from now due to the nature of how this embedded world
>> works.
>>
>> More on that, the whole purpose of proposed interface is to debug and
>> not intended to be used by any user space code.
> 
> I believe this has already been extensively discussed. I want to focus
> more on the part below in this response:
> 
>> Also the idea that you are creating new bus just for this device doesn't
>> really sound right. I recommend you to take a look on auxiliary bus and
>> use it or come with very strong justifications why it is not fit yet.
> 
> I tend to agree that this is a valid concern to bring up, and adding a
> new bus is not something that should be done lightly.
> 
> Let's ignore that this has been merged into -next after I've submitted
> this (and that I only recently became aware of this) for the time being.
> If I would see a clear benefit, I would not hesitate to switch the
> driver and subsystem over to this.
> 
> What does concern me most, is the device/driver matching by string.
> Right now, this subsystem matches those via a device UID. This UID is
> directly tied to the EC functionality provided by the device. A bit of
> background to this:
> 
> Requests sent to the EC contain an address, so to say. This consists of
> 
>  - Target category (TC): Broad group of functionality, e.g. battery/AC,
>    thermal, HID input, ..., i.e. a subsystem of sorts.
> 
>  - Target ID (TID): Some major device, e.g. the dual batteries on the
>    Surface Book 3 are addressed by target ID 1 and 2, some functionality
>    is only available at 2 and some only at 1. May be related to physical
>    parts of/locations on the device.
> 
>  - Instance ID (IID): A device instance, e.g. for thermal sensors each
>    sensor is at TC=0x03 (thermal) and has a different instance ID.
> 
> Those can be used to pretty much uniquely identify a sub-device on the
> EC.

Thank you for this explanation, that is going to be useful to know
when I get around to reviewing this set (although I guess that you
probably also have written this down in one of the commit msgs /
docs I did not check).

> 
> Note the "pretty much". To truly make them unique we can add a function
> ID (FN). With that, we can for example match for TC=0x03, TID=*, IID=*,
> FN=0x00 to load a driver against all thermal sensors. And this is
> basically the device UID that the subsystem uses for matching (modulo
> domain for virtual devices, i.e. device hubs). Sure, we can use some
> string, but that then leads to having to come up with creative names
> once we need some driver specific data, e.g. in the battery driver [1]:
> 
>     const struct auxiliary_device_id my_auxiliary_id_table[] = {
>         { .name = "surface_aggregator_registry.battery", .driver_data = x },
>         { .name = "surface_aggregator_registry.battery_sb3", .driver_data = y },
>         { },
>     }
> 
> Arguably, not _that_ big of a deal.
> 
> What worries me more is that this will block any path of auto-detecting
> devices on a more general/global level. Right now, we hard-code devices
> because we haven't found any way to detect them via some EC query yet
> [2] (FYI the node groups contain all devices that will eventually be
> added to the bus, which are already 11 devices on the Surface Book 3
> without taking missing thermal sensors into account; also they are
> spread across a bunch of subsystems, so not just platform). That's of
> course not an ideal solution and one that I hope we can eventually fix.
> If we can auto-detect devices, it's very likely that we know or can
> easily get to the device UID. A meaningful string is somewhat more
> difficult.
> 
> This registry, which is loaded against a platform device that, from what
> we can tell differentiates the models for some driver bindings by
> Windows (that's speculation), is also the reason why we don't register
> client devices directly under the main module, so instead of a nice
> "surface_aggregator.<devicename>", you'll get
> "surface_aggregator_registry.<devicename>". And it may not end there.
> 
> Something that's currently not implemented is support for thermal
> sensors on 7th generation devices. With thermal sensors, we can already
> detect which sensors, i.e. which IIDs, are present. Naturally, that's
> part of the EC-API for thermal devices (TC=0x03), so would warrant a
> master driver that registers the individual sensor drivers (that's a
> place where I'd argue that in a normal situation, the auxiliary bus
> makes sense). So with the auxiliary bus we'd now end up with devices
> with "surface_thermal.sensor" for the sensors as well as
> "surface_aggregator_registry.<devicename>", both of type ssam_device
> (which then would be a wrapper around auxiliary_device with UID stored
> in that wrapper). Note that they need to be of type ssam_device (or
> another wrapper around that) as they again need the reference to the
> controller device, their UID for access, etc. With a proper bus, device,
> and the UID for matching, we can just add the sensor devices to the bus
> again, as they will have a meaningful and guaranteed unique UID.
> 
> From some reports I've seen it looks like thermal sensors may also be
> available separately on TID=0x01 as well as TID=0x02 on some devices,
> at which point I believe you'd need to introduce some IDA for ID
> allocation to not cause a clash with IDs. At least if you separate the
> base drivers for each TC, which I guess should be preferred due to
> code-reuse. Then again they might use different event registries so you
> may end up needing "surface_thermal.sensor_tc1" and
> "surface_thermal.sensor_tc2" as device names to differentiate those
> for driver loading. Or store the registry in software node properties
> when registering the device.
> 
> I'm repeating myself here, but to me it looks cleaner to have a single
> bus type as opposed to spreading the same base auxiliary device type
> over several namespaces.
> 
> Which then leads me to the question of how a function like
> "is_ssam_device()", i.e. a function testing if the device is of a given
> type, would be implemented without enforcing and testing against some
> part of the device name. Something that, again, doesn't look clean to
> me. Although the use of such a function could probably avoided, but that
> then feels like working around the auxiliary bus.
> 
> Unfortunately, there are a couple more hypotheticals at play than I'd
> like to have (making this not an easy decision), but it's a reverse
> engineered driver so I guess that comes with the territory. All in all,


> I believe it's possible to do this (i.e. use the auxiliary bus), but, to
> me at least, the implementation using a discrete bus feels tidier and
> more true to the hardware (or virtual hardware anyway) behind this. I'm
> happy to hear any arguments against this though.

I agree, the whole setup with the TC + TID + IID feels like the functionality
is nicely (and cleanly) split into separate functions and as with other
busses using a bus + 1 device per function for this is a perfectly clean
way to handle this.

Note if in the future you do see benefit in switching the auxiliary bus
I have no problems with that. But atm I don't really see any benefits of
doing so, so then we would just be switching over for the sake of switching
over which does not seem productive.

Regards,

Hans
Greg Kroah-Hartman Dec. 7, 2020, 9:12 a.m. UTC | #18
On Mon, Dec 07, 2020 at 09:49:03AM +0100, Hans de Goede wrote:
> Note if in the future you do see benefit in switching the auxiliary bus

> I have no problems with that. But atm I don't really see any benefits of

> doing so, so then we would just be switching over for the sake of switching

> over which does not seem productive.


I too do not see the benefit at this time to switch either.

thanks,

greg k-h
Maximilian Luz Dec. 8, 2020, 2:37 p.m. UTC | #19
On 12/8/20 2:01 PM, Hans de Goede wrote:
> Hi,
> 
> Thank you for your patch series.
> 
> Full review done, I have a couple of minor review remarks inline below.
> 
> Note these are really all suggestions and not must fix for this to
> be merged items.

Thank you! I will try to address and fix them.

[...]

>> +/**
>> + * ssam_event_matches_notifier() - Test if an event matches a notifier.
>> + * @n: The event notifier to test against.
>> + * @event: The event to test.
>> + *
>> + * Return: Returns %true iff the given event matches the given notifier
> 
> s/iff/if/

Ack.

>> +/**
>> + * ssam_nf_refcount_inc() - Increment reference-/activation-count of the given
>> + * event.
>> + * @nf:  The notifier system reference.
>> + * @reg: The registry used to enable/disable the event.
>> + * @id:  The event ID.
>> + *
>> + * Increments the reference-/activation-count associated with the specified
>> + * event type/ID, allocating a new entry for this event ID if necessary. A
>> + * newly allocated entry will have a refcount of one.
>> + *
>> + * Note: Must be synchronized by the caller with regards to other
>> + * ssam_nf_refcount_inc() and ssam_nf_refcount_dec() calls, e.g. via
>> + * ``nf->lock``. Note that this lock should also be used to ensure the
>> + * corresponding EC requests are sent, if necessary.
> 
> You write "e.g. via ``nf->lock``", are any other locking strategies used
> in the other patches ?  If not then it might be good to change this to just
> stating that nf->lock must be held and adding an assert for that. I would
> prefer a clear set of locking rules like this over rules like:
> "must be synchronized ...".

There are no other locking strategies here. I've worded it this way as
"nf->lock" is mainly intended to synchronize the enable requests and is
just by extension also used to synchronize this (i.e. the lock is on a
different abstraction layer). I agree that for clarity it makes sense to
directly specify "nf->lock", so I will re-write this to just say that
"nf->lock" must be held.
  
> Note I am naively assuming here that it is possible to come up with such a
> clear set of locking rules, I'm not sure if that is the case.
> 
> Note in case it is possible to always take nf->lock but you are not doing
> so in some places because of performance concerns then I would prefer to
> move to a model where the caller simply always takes nf->lock. AFAICT non
> of the events/data going through the SSH are high frequency (iow
> none of them occur at a high rate of see 50 times / second or more) so
> I don't think that we need to focus too much on optimizing things for
> speed.

The registration/deregistration functions are all synchronized via
"nf->lock". I also don't consider that path critical. Event handling
(i.e. the actual critical path) is done via a RCU list (reasoning behind
that is mostly for touchpad input events).

>> + *
>> + * Return: Returns the refcount entry on success. Returns an error pointer
>> + * with %-ENOSPC if there have already been %INT_MAX events of the specified
>> + * ID and type registered, or %-ENOMEM if the entry could not be allocated.
>> + */
>> +static struct ssam_nf_refcount_entry
>> +*ssam_nf_refcount_inc(struct ssam_nf *nf, struct ssam_event_registry reg,
>> +		      struct ssam_event_id id)
>> +{
>> +	struct ssam_nf_refcount_entry *entry;
>> +	struct ssam_nf_refcount_key key;
>> +	struct rb_node **link = &nf->refcount.rb_node;
>> +	struct rb_node *parent = NULL;
>> +	int cmp;
>> +
>> +	key.reg = reg;
>> +	key.id = id;
>> +
>> +	while (*link) {
>> +		entry = rb_entry(*link, struct ssam_nf_refcount_entry, node);
>> +		parent = *link;
>> +
>> +		cmp = memcmp(&key, &entry->key, sizeof(key));
>> +		if (cmp < 0) {
>> +			link = &(*link)->rb_left;
>> +		} else if (cmp > 0) {
>> +			link = &(*link)->rb_right;
>> +		} else if (entry->refcount < INT_MAX) {
>> +			entry->refcount++;
>> +			return entry;
>> +		} else {
> 
> So we hit this only if entry->refcount == INT_MAX, right?
> 
> That seems like something which should never happen, right?
> 
> So maybe add a:
> 
> 			WARN_ON(1);
> 
> here?

Yes and yes. That seems like a good idea, I'll do that.

>> +			return ERR_PTR(-ENOSPC);
>> +		}
>> +	}
>> +
>> +	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
>> +	if (!entry)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	entry->key = key;
>> +	entry->refcount = 1;
>> +
>> +	rb_link_node(&entry->node, parent, link);
>> +	rb_insert_color(&entry->node, &nf->refcount);
>> +
>> +	return entry;
>> +}

[...]

>> +/**
>> + * ssam_nf_call() - Call notification callbacks for the provided event.
>> + * @nf:    The notifier system
>> + * @dev:   The associated device, only used for logging.
>> + * @rqid:  The request ID of the event.
>> + * @event: The event provided to the callbacks.
>> + *
>> + * Execute registered callbacks in order of their priority until either no
>> + * callback is left or a callback returns a value with the %SSAM_NOTIF_STOP
>> + * bit set. Note that this bit is set automatically when converting non-zero
>> + * error values via ssam_notifier_from_errno() to notifier values.
>> + *
>> + * Also note that any callback that could handle an event should return a value
>> + * with bit %SSAM_NOTIF_HANDLED set, indicating that the event does not go
>> + * unhandled/ignored. In case no registered callback could handle an event,
>> + * this function will emit a warning.
>> + *
>> + * In case a callback failed, this function will emit an error message.
>> + */
>> +static void ssam_nf_call(struct ssam_nf *nf, struct device *dev, u16 rqid,
>> +			 struct ssam_event *event)
>> +{
>> +	struct ssam_nf_head *nf_head;
>> +	int status, nf_ret;
>> +
>> +	if (!ssh_rqid_is_event(rqid)) {
>> +		dev_warn(dev, "event: unsupported rqid: %#06x\n", rqid);
>> +		return;
>> +	}
>> +
>> +	nf_head = &nf->head[ssh_rqid_to_event(rqid)];
>> +	nf_ret = ssam_nfblk_call_chain(nf_head, event);
>> +	status = ssam_notifier_to_errno(nf_ret);
>> +
>> +	if (status < 0) {
>> +		dev_err(dev,
>> +			"event: error handling event: %d (tc: %#04x, tid: %#04x, cid: %#04x, iid: %#04x)\n",
>> +			status, event->target_category, event->target_id,
>> +			event->command_id, event->instance_id);
>> +	}
>> +
>> +	if (!(nf_ret & SSAM_NOTIF_HANDLED)) {
> 
> Maybe use "else if" here so that on an error we don't emit both the error
> and the unhandled event warning ?

Right, makes sense. An error should be enough.

>> +		dev_warn(dev,
>> +			 "event: unhandled event (rqid: %#04x, tc: %#04x, tid: %#04x, cid: %#04x, iid: %#04x)\n",
>> +			 rqid, event->target_category, event->target_id,
>> +			 event->command_id, event->instance_id);
>> +	}
>> +}
>> +

[...]

>> +static int ssam_dsm_get_functions(acpi_handle handle, u64 *funcs)
>> +{
>> +	union acpi_object *obj;
>> +	u64 mask = 0;
>> +	int i;
>> +
>> +	*funcs = 0;
>> +
>> +	if (!acpi_has_method(handle, "_DSM"))
>> +		return 0;
> 
> Is this expected on some models?

It is, yes. Specifically: Older models don't have that, newer do.

> If yes then maybe add a comment.

Right. I'll have to look at the DSDTs to check in which generation
specifically that _DSM was introduced and add that as a comment.

> If not then I think this can be dropped as acpi_evaluate_dsm_typed()
> will then error out with -EIO which seems better then return 0
> (while this is not expected).
> 
>> +
>> +	obj = acpi_evaluate_dsm_typed(handle, &SSAM_SSH_DSM_GUID,
>> +				      SSAM_SSH_DSM_REVISION, 0, NULL,
>> +				      ACPI_TYPE_BUFFER);
>> +	if (!obj)
>> +		return -EIO;
>> +
>> +	for (i = 0; i < obj->buffer.length && i < 8; i++)
>> +		mask |= (((u64)obj->buffer.pointer[i]) << (i * 8));
>> +
>> +	if (mask & BIT(0))
>> +		*funcs = mask;
>> +
>> +	ACPI_FREE(obj);
>> +	return 0;
>> +}
>> +
>> +static int ssam_dsm_load_u32(acpi_handle handle, u64 funcs, u64 func, u32 *ret)
>> +{
>> +	union acpi_object *obj;
>> +	u64 val;
>> +> +	if (!(funcs & BIT(func)))
>> +		return 0;
> 
> This exit path leaves *ret uninitialized, looking at the usage if this
> function I see that this is intentional, but it did stood out to me, so maybe
> add a comment like this ? :
> 
> 	if (!(funcs & BIT(func)))
> 		return 0; /* Not supported leave *ret at its default value */

Right, will do that.

>> +
>> +	obj = acpi_evaluate_dsm_typed(handle, &SSAM_SSH_DSM_GUID,
>> +				      SSAM_SSH_DSM_REVISION, func, NULL,
>> +				      ACPI_TYPE_INTEGER);
>> +	if (!obj)
>> +		return -EIO;
>> +
>> +	val = obj->integer.value;
>> +	ACPI_FREE(obj);
>> +
>> +	if (val > U32_MAX)
>> +		return -ERANGE;
>> +
>> +	*ret = val;
>> +	return 0;
>> +}

[...]

>> +/**
>> + * ssam_controller_start() - Start the receiver and transmitter threads of the
>> + * controller.
>> + * @ctrl: The controller.
>> + *
>> + * Note: When this function is called, the controller should be properly
>> + * hooked up to the serdev core via &struct serdev_device_ops. Please refer
>> + * to ssam_controller_init() for more details on controller initialization.
>> + *
>> + * This function must be called from an exclusive context with regards to the
>> + * state, if necessary, by locking the controller via ssam_controller_lock().
> 
> Again you are being a bit hand-wavy (I assume you know what I mean by that)
> wrt the locking requirements. If possible I would prefer clearly spelled out
> locking requirements in the form of "this and that lock must be held when
> calling this function". Preferably backed-up by lockdep_assert-s asserting
> these conditions.

The reason for this is that this function specifically is currently only
called during initialization, when the controller has not been published
yet, i.e. when we have an exclusive reference to the controller.

I'll change this to fully enforce locking (with lockdep_assert).

> And maybe if you are a bit stricter with always holding the lock when
> calling this, you can also drop the WRITE_ONCE and the comment about it
> (in all places where you do this).

The WRITE_ONCE is only there to ensure that the basic test in
ssam_request_sync_submit() can be done. I always try to be explicit
about access that can happen without the respective locks being held.

Unfortunately it's not feasible to hold the reader lock in
ssam_request_sync_submit() due to reentrancy. Specifically, as the lock,
if at all (i.e. if this is not a client driver bound to the controller),
must be held not only during submission but until the request has been
completed. Note that if we would hold the lock during submission, this
is just a smoke-test.

> Note, that like my remarks around the locking docs for
> ssam_nf_refcount_inc() this assumes that it is possible to come
> up with such a clear set of locking rules.  If that for some reason
> is not straight forwardd, then maybe add some docs documenting the
> locking expectations somewhere instead ?

It's possible to fully enforce this here, so I'll do that.

>> + */
>> +int ssam_controller_start(struct ssam_controller *ctrl)
>> +{
>> +	int status;
>> +
>> +	if (ctrl->state != SSAM_CONTROLLER_INITIALIZED)
>> +		return -EINVAL;
>> +
>> +	status = ssh_rtl_start(&ctrl->rtl);
>> +	if (status)
>> +		return status;
>> +
>> +	/*
>> +	 * Set state via write_once even though we expect to be locked/in an
>> +	 * exclusive context, due to smoke-testing in
>> +	 * ssam_request_sync_submit().
>> +	 */
>> +	WRITE_ONCE(ctrl->state, SSAM_CONTROLLER_STARTED);
>> +	return 0;
>> +}
>> +
>> +/*
>> + * SSAM_CTRL_SHUTDOWN_FLUSH_TIMEOUT - Timeout for flushing requests during
>> + * shutdown.
>> + *
>> + * Chosen to be larger than one full request timeout, including packets timing
>> + * out. This value should give ample time to complete any outstanding requests
>> + * during normal operation and account for the odd package timeout.
>> + */
>> +#define SSAM_CTRL_SHUTDOWN_FLUSH_TIMEOUT	msecs_to_jiffies(5000)
>> +
>> +/**
>> + * ssam_controller_shutdown() - Shut down the controller.
>> + * @ctrl: The controller.
>> + *
>> + * Shuts down the controller by flushing all pending requests and stopping the
>> + * transmitter and receiver threads. All requests submitted after this call
>> + * will fail with %-ESHUTDOWN. While it is discouraged to do so, this function
>> + * is safe to use in parallel with ongoing request submission.
>> + *
>> + * In the course of this shutdown procedure, all currently registered
>> + * notifiers will be unregistered. It is, however, strongly recommended to not
>> + * rely on this behavior, and instead the party registering the notifier
>> + * should unregister it before the controller gets shut down, e.g. via the
>> + * SSAM bus which guarantees client devices to be removed before a shutdown.
>> + *
>> + * Note that events may still be pending after this call, but, due to the
>> + * notifiers being unregistered, these events will be dropped when the
>> + * controller is subsequently destroyed via ssam_controller_destroy().
>> + *
>> + * This function must be called from an exclusive context with regards to the
>> + * state, if necessary, by locking the controller via ssam_controller_lock().
> 
> Same comment as earlier wrt the locking.

Similar to the above, this can be currently called outside of the lock
during initialization failure. I'll change that to fully enforce locking
too.
>> + */
>> +void ssam_controller_shutdown(struct ssam_controller *ctrl)
>> +{
>> +	enum ssam_controller_state s = ctrl->state;
>> +	int status;
>> +
>> +	if (s == SSAM_CONTROLLER_UNINITIALIZED || s == SSAM_CONTROLLER_STOPPED)
>> +		return;
>> +
>> +	/*
>> +	 * Try to flush pending events and requests while everything still
>> +	 * works. Note: There may still be packets and/or requests in the
>> +	 * system after this call (e.g. via control packets submitted by the
>> +	 * packet transport layer or flush timeout / failure, ...). Those will
>> +	 * be handled with the ssh_rtl_shutdown() call below.
>> +	 */
>> +	status = ssh_rtl_flush(&ctrl->rtl, SSAM_CTRL_SHUTDOWN_FLUSH_TIMEOUT);
>> +	if (status) {
>> +		ssam_err(ctrl, "failed to flush request transport layer: %d\n",
>> +			 status);
>> +	}
>> +
>> +	/* Try to flush all currently completing requests and events. */
>> +	ssam_cplt_flush(&ctrl->cplt);
>> +
>> +	/*
>> +	 * We expect all notifiers to have been removed by the respective client
>> +	 * driver that set them up at this point. If this warning occurs, some
>> +	 * client driver has not done that...
>> +	 */
>> +	WARN_ON(!ssam_notifier_is_empty(ctrl));
>> +
>> +	/*
>> +	 * Nevertheless, we should still take care of drivers that don't behave
>> +	 * well. Thus disable all enabled events, unregister all notifiers.
>> +	 */
>> +	ssam_notifier_unregister_all(ctrl);
>> +
>> +	/*
>> +	 * Cancel remaining requests. Ensure no new ones can be queued and stop
>> +	 * threads.
>> +	 */
>> +	ssh_rtl_shutdown(&ctrl->rtl);
>> +
>> +	/*
>> +	 * Set state via write_once even though we expect to be locked/in an
>> +	 * exclusive context, due to smoke-testing in
>> +	 * ssam_request_sync_submit().
>> +	 */
>> +	WRITE_ONCE(ctrl->state, SSAM_CONTROLLER_STOPPED);
>> +	ctrl->rtl.ptl.serdev = NULL;
>> +}
>> +
>> +/**
>> + * ssam_controller_destroy() - Destroy the controller and free its resources.
>> + * @ctrl: The controller.
>> + *
>> + * Ensures that all resources associated with the controller get freed. This
>> + * function should only be called after the controller has been stopped via
>> + * ssam_controller_shutdown(). In general, this function should not be called
>> + * directly. The only valid place to call this function directly is during
>> + * initialization, before the controller has been fully initialized and passed
>> + * to other processes. This function is called automatically when the
>> + * reference count of the controller reaches zero.
>> + *
>> + * Must be called from an exclusive context with regards to the controller
>> + * state.
> 
> Same comment as earlier wrt the locking.

Right, I'll change that as well.

>> + */
>> +void ssam_controller_destroy(struct ssam_controller *ctrl)
>> +{
>> +	if (ctrl->state == SSAM_CONTROLLER_UNINITIALIZED)
>> +		return;
>> +
>> +	WARN_ON(ctrl->state != SSAM_CONTROLLER_STOPPED);
>> +
>> +	/*
>> +	 * Note: New events could still have been received after the previous
>> +	 * flush in ssam_controller_shutdown, before the request transport layer
>> +	 * has been shut down. At this point, after the shutdown, we can be sure
>> +	 * that no new events will be queued. The call to ssam_cplt_destroy will
>> +	 * ensure that those remaining are being completed and freed.
>> +	 */
>> +
>> +	/* Actually free resources. */
>> +	ssam_cplt_destroy(&ctrl->cplt);
>> +	ssh_rtl_destroy(&ctrl->rtl);
>> +
>> +	/*
>> +	 * Set state via write_once even though we expect to be locked/in an
>> +	 * exclusive context, due to smoke-testing in
>> +	 * ssam_request_sync_submit().
>> +	 */
>> +	WRITE_ONCE(ctrl->state, SSAM_CONTROLLER_UNINITIALIZED);
>> +}
>> +

[...]

>> +/* Must be called with queue lock held. */
>> +static struct list_head *__ssh_ptl_queue_find_entrypoint(struct ssh_packet *p)
>> +{
>> +	struct list_head *head;
>> +	struct ssh_packet *q;
>> +
>> +	/*
>> +	 * We generally assume that there are less control (ACK/NAK) packets
>> +	 * and re-submitted data packets as there are normal data packets (at
>> +	 * least in situations in which many packets are queued; if there
>> +	 * aren't many packets queued the decision on how to iterate should be
>> +	 * basically irrelevant; the number of control/data packets is more or
>> +	 * less limited via the maximum number of pending packets). Thus, when
>> +	 * inserting a control or re-submitted data packet, (determined by
>> +	 * their priority), we search from front to back. Normal data packets
>> +	 * are, usually queued directly at the tail of the queue, so for those
>> +	 * search from back to front.
>> +	 */
>> +
>> +	if (p->priority > SSH_PACKET_PRIORITY(DATA, 0)) {
>> +		list_for_each(head, &p->ptl->queue.head) {
>> +			q = list_entry(head, struct ssh_packet, queue_node);
>> +
>> +			if (q->priority < p->priority)
>> +				break;
>> +		}
>> +	} else {
>> +		list_for_each_prev(head, &p->ptl->queue.head) {
>> +			q = list_entry(head, struct ssh_packet, queue_node);
>> +
>> +			if (q->priority >= p->priority) {
>> +				head = head->next;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	return head;
>> +}
>> +
>> +/* Must be called with queue lock held. */
> 
> Maybe add a lockdep_assert for this ?
> 
> (and the same for a bunch of similar cases below)

Right, makes sense. I'll do that.

>> +static int __ssh_ptl_queue_push(struct ssh_packet *packet)
>> +{
>> +	struct ssh_ptl *ptl = packet->ptl;
>> +	struct list_head *head;
>> +
>> +	if (test_bit(SSH_PTL_SF_SHUTDOWN_BIT, &ptl->state))
>> +		return -ESHUTDOWN;
>> +
>> +	/* Avoid further transitions when canceling/completing. */
>> +	if (test_bit(SSH_PACKET_SF_LOCKED_BIT, &packet->state))
>> +		return -EINVAL;
>> +
>> +	/* If this packet has already been queued, do not add it. */
>> +	if (test_and_set_bit(SSH_PACKET_SF_QUEUED_BIT, &packet->state))
>> +		return -EALREADY;
>> +
>> +	head = __ssh_ptl_queue_find_entrypoint(packet);
>> +
>> +	list_add_tail(&ssh_packet_get(packet)->queue_node, head);
>> +	return 0;
>> +}
> 
> <snip about 5000 more lines>
> 
> Snipped because nothing stood out in the rest of this patch.
> 
> Phew that was one large patch to review. I've run out of review
> bandwidth for today, so I will review the rest of the series
> later (probably coming Thursday).

I'm sorry about the size of this. I've tried to split out as much as
possible (caching, tracing, error injection) but couldn't get it smaller
than this while also having something functional.

Please do review this at your own pace. There is no need to hurry.

Thank you for your review!

Regards,
Max
Hans de Goede Dec. 8, 2020, 2:43 p.m. UTC | #20
Hi,

On 12/8/20 3:37 PM, Maximilian Luz wrote:

<snip>

>>> +
>>> +    obj = acpi_evaluate_dsm_typed(handle, &SSAM_SSH_DSM_GUID,
>>> +                      SSAM_SSH_DSM_REVISION, func, NULL,
>>> +                      ACPI_TYPE_INTEGER);
>>> +    if (!obj)
>>> +        return -EIO;
>>> +
>>> +    val = obj->integer.value;
>>> +    ACPI_FREE(obj);
>>> +
>>> +    if (val > U32_MAX)
>>> +        return -ERANGE;
>>> +
>>> +    *ret = val;
>>> +    return 0;
>>> +}
> 
> [...]
> 
>>> +/**
>>> + * ssam_controller_start() - Start the receiver and transmitter threads of the
>>> + * controller.
>>> + * @ctrl: The controller.
>>> + *
>>> + * Note: When this function is called, the controller should be properly
>>> + * hooked up to the serdev core via &struct serdev_device_ops. Please refer
>>> + * to ssam_controller_init() for more details on controller initialization.
>>> + *
>>> + * This function must be called from an exclusive context with regards to the
>>> + * state, if necessary, by locking the controller via ssam_controller_lock().
>>
>> Again you are being a bit hand-wavy (I assume you know what I mean by that)
>> wrt the locking requirements. If possible I would prefer clearly spelled out
>> locking requirements in the form of "this and that lock must be held when
>> calling this function". Preferably backed-up by lockdep_assert-s asserting
>> these conditions.
> 
> The reason for this is that this function specifically is currently only
> called during initialization, when the controller has not been published
> yet, i.e. when we have an exclusive reference to the controller.
> 
> I'll change this to fully enforce locking (with lockdep_assert).
> 
>> And maybe if you are a bit stricter with always holding the lock when
>> calling this, you can also drop the WRITE_ONCE and the comment about it
>> (in all places where you do this).
> 
> The WRITE_ONCE is only there to ensure that the basic test in
> ssam_request_sync_submit() can be done. I always try to be explicit
> about access that can happen without the respective locks being held.

Yes I saw the matching READ_ONCE later on (as the comment indicated
I would), which made it more obvious to me why the WRITE_ONCE is here,'
so maybe I should have gone back and updated this comment.

Anyways, keeping the WRITE_ONCE + READ_ONCE for this is fine.

> Unfortunately it's not feasible to hold the reader lock in
> ssam_request_sync_submit() due to reentrancy. Specifically, as the lock,
> if at all (i.e. if this is not a client driver bound to the controller),
> must be held not only during submission but until the request has been
> completed. Note that if we would hold the lock during submission, this
> is just a smoke-test.

Ack.

<more snip>

Regards,

Hans
Maximilian Luz Dec. 8, 2020, 2:54 p.m. UTC | #21
On 12/8/20 3:43 PM, Hans de Goede wrote:
> Hi,

> 

> On 12/8/20 3:37 PM, Maximilian Luz wrote:

> 

> <snip>

> 

>>>> +

>>>> +    obj = acpi_evaluate_dsm_typed(handle, &SSAM_SSH_DSM_GUID,

>>>> +                      SSAM_SSH_DSM_REVISION, func, NULL,

>>>> +                      ACPI_TYPE_INTEGER);

>>>> +    if (!obj)

>>>> +        return -EIO;

>>>> +

>>>> +    val = obj->integer.value;

>>>> +    ACPI_FREE(obj);

>>>> +

>>>> +    if (val > U32_MAX)

>>>> +        return -ERANGE;

>>>> +

>>>> +    *ret = val;

>>>> +    return 0;

>>>> +}

>>

>> [...]

>>

>>>> +/**

>>>> + * ssam_controller_start() - Start the receiver and transmitter threads of the

>>>> + * controller.

>>>> + * @ctrl: The controller.

>>>> + *

>>>> + * Note: When this function is called, the controller should be properly

>>>> + * hooked up to the serdev core via &struct serdev_device_ops. Please refer

>>>> + * to ssam_controller_init() for more details on controller initialization.

>>>> + *

>>>> + * This function must be called from an exclusive context with regards to the

>>>> + * state, if necessary, by locking the controller via ssam_controller_lock().

>>>

>>> Again you are being a bit hand-wavy (I assume you know what I mean by that)

>>> wrt the locking requirements. If possible I would prefer clearly spelled out

>>> locking requirements in the form of "this and that lock must be held when

>>> calling this function". Preferably backed-up by lockdep_assert-s asserting

>>> these conditions.

>>

>> The reason for this is that this function specifically is currently only

>> called during initialization, when the controller has not been published

>> yet, i.e. when we have an exclusive reference to the controller.

>>

>> I'll change this to fully enforce locking (with lockdep_assert).

>>

>>> And maybe if you are a bit stricter with always holding the lock when

>>> calling this, you can also drop the WRITE_ONCE and the comment about it

>>> (in all places where you do this).

>>

>> The WRITE_ONCE is only there to ensure that the basic test in

>> ssam_request_sync_submit() can be done. I always try to be explicit

>> about access that can happen without the respective locks being held.

> 

> Yes I saw the matching READ_ONCE later on (as the comment indicated

> I would), which made it more obvious to me why the WRITE_ONCE is here,'

> so maybe I should have gone back and updated this comment.


No worries, always good to have another look at these kinds of things.

> Anyways, keeping the WRITE_ONCE + READ_ONCE for this is fine.

> 

>> Unfortunately it's not feasible to hold the reader lock in

>> ssam_request_sync_submit() due to reentrancy. Specifically, as the lock,

>> if at all (i.e. if this is not a client driver bound to the controller),

>> must be held not only during submission but until the request has been

>> completed. Note that if we would hold the lock during submission, this

>> is just a smoke-test.

> 

> Ack.

> 

> <more snip>

> 

> Regards,

> 

> Hans

>