mbox series

[v3,0/2] efi/x86: Call set_os() protocol on dual GPU Macs

Message ID 20240701140940.2340297-4-ardb+git@google.com
Headers show
Series efi/x86: Call set_os() protocol on dual GPU Macs | expand

Message

Ard Biesheuvel July 1, 2024, 2:09 p.m. UTC
From: Ard Biesheuvel <ardb@kernel.org>

v3:
- add patch to make the SMBIOS protocol glue code compatible with mixed
  mode on x86
- update Aditya's patch to limit the effect to dual GPU Macs that are
  known to require the set_os() calls in order for both GPUs to remain
  active after boot
- drop mixed mode handling of set_os() protocol, and dereference the
  struct members directly

Cc: Aditya Garg <gargaditya08@live.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Kerem Karabay <kekrby@gmail.com>
Cc: Orlando Chamberlain <orlandoch.dev@gmail.com>

Aditya Garg (1):
  x86/efistub: Call Apple set_os protocol on dual GPU Intel Macs

Ard Biesheuvel (1):
  efistub/x86: Enable SMBIOS protocol handling for x86

 drivers/firmware/efi/libstub/Makefile   |  2 +-
 drivers/firmware/efi/libstub/smbios.c   | 42 ++++++++----
 drivers/firmware/efi/libstub/x86-stub.c | 72 +++++++++++++++++++-
 include/linux/efi.h                     |  1 +
 4 files changed, 98 insertions(+), 19 deletions(-)

Comments

Lukas Wunner July 1, 2024, 2:17 p.m. UTC | #1
On Mon, Jul 01, 2024 at 04:09:41PM +0200, Ard Biesheuvel wrote:
> Aditya Garg (1):
>   x86/efistub: Call Apple set_os protocol on dual GPU Intel Macs
> 
> Ard Biesheuvel (1):
>   efistub/x86: Enable SMBIOS protocol handling for x86

For the series,

Reviewed-by: Lukas Wunner <lukas@wunner.de>
Aditya Garg July 1, 2024, 7:42 p.m. UTC | #2
Tested-by: Aditya Garg <gargaditya08@live.com>

Thanks for the support Ard :)

> On 1 Jul 2024, at 7:40 PM, Ard Biesheuvel <ardb+git@google.com> wrote:
> 
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> v3:
> - add patch to make the SMBIOS protocol glue code compatible with mixed
>  mode on x86
> - update Aditya's patch to limit the effect to dual GPU Macs that are
>  known to require the set_os() calls in order for both GPUs to remain
>  active after boot
> - drop mixed mode handling of set_os() protocol, and dereference the
>  struct members directly
> 
> Cc: Aditya Garg <gargaditya08@live.com>
> Cc: Hans de Goede <hdegoede@redhat.com>
> Cc: Lukas Wunner <lukas@wunner.de>
> Cc: Kerem Karabay <kekrby@gmail.com>
> Cc: Orlando Chamberlain <orlandoch.dev@gmail.com>
> 
> Aditya Garg (1):
>  x86/efistub: Call Apple set_os protocol on dual GPU Intel Macs
> 
> Ard Biesheuvel (1):
>  efistub/x86: Enable SMBIOS protocol handling for x86
> 
> drivers/firmware/efi/libstub/Makefile   |  2 +-
> drivers/firmware/efi/libstub/smbios.c   | 42 ++++++++----
> drivers/firmware/efi/libstub/x86-stub.c | 72 +++++++++++++++++++-
> include/linux/efi.h                     |  1 +
> 4 files changed, 98 insertions(+), 19 deletions(-)
> 
> --
> 2.45.2.803.g4e1b14247a-goog
>
Aditya Garg July 17, 2024, 4:35 p.m. UTC | #3
Hi Ard, Lukas

Although the patch has been upstreamed, and works well for the Macs included,
we have noticed another issue. For the Macs having a single GPU, in case a person
uses an eGPU, they still need this apple-set-os quirk for hybrid graphics. This was
not reported that time by anyone since people who use an eGPU are quite rare.

I'm not sure how to handle this. Had a few ideas like:

1. Enable this for all T2 Macs.
2. Enable this for all Macs.
3. Add a kernel parameter.

Would like your ideas and a possible fix for the same.

Thanks
Aditya

> On 2 Jul 2024, at 1:12 AM, Aditya Garg <gargaditya08@live.com> wrote:
> 
> Tested-by: Aditya Garg <gargaditya08@live.com>
> 
> Thanks for the support Ard :)
> 
>> On 1 Jul 2024, at 7:40 PM, Ard Biesheuvel <ardb+git@google.com> wrote:
>> 
>> From: Ard Biesheuvel <ardb@kernel.org>
>> 
>> v3:
>> - add patch to make the SMBIOS protocol glue code compatible with mixed
>> mode on x86
>> - update Aditya's patch to limit the effect to dual GPU Macs that are
>> known to require the set_os() calls in order for both GPUs to remain
>> active after boot
>> - drop mixed mode handling of set_os() protocol, and dereference the
>> struct members directly
>> 
>> Cc: Aditya Garg <gargaditya08@live.com>
>> Cc: Hans de Goede <hdegoede@redhat.com>
>> Cc: Lukas Wunner <lukas@wunner.de>
>> Cc: Kerem Karabay <kekrby@gmail.com>
>> Cc: Orlando Chamberlain <orlandoch.dev@gmail.com>
>> 
>> Aditya Garg (1):
>> x86/efistub: Call Apple set_os protocol on dual GPU Intel Macs
>> 
>> Ard Biesheuvel (1):
>> efistub/x86: Enable SMBIOS protocol handling for x86
>> 
>> drivers/firmware/efi/libstub/Makefile   |  2 +-
>> drivers/firmware/efi/libstub/smbios.c   | 42 ++++++++----
>> drivers/firmware/efi/libstub/x86-stub.c | 72 +++++++++++++++++++-
>> include/linux/efi.h                     |  1 +
>> 4 files changed, 98 insertions(+), 19 deletions(-)
>> 
>> --
>> 2.45.2.803.g4e1b14247a-goog
>>
Ard Biesheuvel July 17, 2024, 4:46 p.m. UTC | #4
On Wed, 17 Jul 2024 at 09:35, Aditya Garg <gargaditya08@live.com> wrote:
>
> Hi Ard, Lukas
>
> Although the patch has been upstreamed, and works well for the Macs included,
> we have noticed another issue. For the Macs having a single GPU, in case a person
> uses an eGPU, they still need this apple-set-os quirk for hybrid graphics. This was
> not reported that time by anyone since people who use an eGPU are quite rare.
>
> I'm not sure how to handle this. Had a few ideas like:
>
> 1. Enable this for all T2 Macs.
> 2. Enable this for all Macs.
> 3. Add a kernel parameter.
>
> Would like your ideas and a possible fix for the same.
>

Hi,

Is this a theoretical concern? Or are you aware of any user that is
actually affected by this?

In any case, given the niche nature, enabling this more widely by
default does not seem like a great approach, as the risk for
regressions outweighs the benefit.

I could imagine the use of a EFI variable to opt into this, would that
work? It would have to be set only once from user space (using
efivarfs)
Aditya Garg July 17, 2024, 4:52 p.m. UTC | #5
Hi Ard

> Hi,
> 
> Is this a theoretical concern? Or are you aware of any user that is
> actually affected by this?

We had a Mac Mini user who wasn't able to use hybrid graphics
on his Mac after using the eGPU. So no, it's not a theoretical concern
> In any case, given the niche nature, enabling this more widely by
> default does not seem like a great approach, as the risk for
> regressions outweighs the benefit.
> 
> I could imagine the use of a EFI variable to opt into this, would that
> work? It would have to be set only once from user space (using
> efivarfs)

I'm not really sure about efivars here, but am ready to test. As long as
it doesn't break booting of macOS or Windows, I don't find it an issue.
macOS may also reset the variable, again have to verify by testing the same.
I guess it will also get reset if a users resets their NVRAM.
Sharpened Blade July 17, 2024, 6:27 p.m. UTC | #6
> Hi Ard
> 
> > Hi,
> > 
> > Is this a theoretical concern? Or are you aware of any user that is
> > actually affected by this?
> 
> 
> We had a Mac Mini user who wasn't able to use hybrid graphics
> on his Mac after using the eGPU. So no, it's not a theoretical concern
> 
> > In any case, given the niche nature, enabling this more widely by
> > default does not seem like a great approach, as the risk for
> > regressions outweighs the benefit.
> > 
> > I could imagine the use of a EFI variable to opt into this, would that
> > work? It would have to be set only once from user space (using
> > efivarfs)
> 
> 
> I'm not really sure about efivars here, but am ready to test. As long as
> it doesn't break booting of macOS or Windows, I don't find it an issue.
> macOS may also reset the variable, again have to verify by testing the same.

On my mac, macOS does not reset EFI variables that it does not use. It should be the same for other models and firmware versions.

> I guess it will also get reset if a users resets their NVRAM.

Users have to manually reset their nvram so this will not happen during normal usage. This is actually a good thing since users can easily disable the eGPU if it is misbehaving and they dont have graphical output.
Lukas Wunner July 17, 2024, 6:58 p.m. UTC | #7
On Wed, Jul 17, 2024 at 04:35:15PM +0000, Aditya Garg wrote:
> For the Macs having a single GPU, in case a person uses an eGPU,
> they still need this apple-set-os quirk for hybrid graphics.

I don't quite follow.  You mean the integrated graphics are
disabled by EFI firmware if an eGPU is attached?

This sounds like a bug on Apple's part:  The panel can be switched
between integrated graphics and discrete graphics, but an external
display can't be switched between eGPU and iGPU.  Is the person
affected using the latest EFI firmware update from Apple?

We need more information before we can devise a way to solve the
issue:

- Exact model name
  (I'd be surprised if this affected pre-Haswell models)
- Full dmesg output with command line option "dump_apple_properties"
  (EFI drivers provide various properties for Thunderbolt-attached
  devices and graphics cards, those could be queried to recognize
  the situation causing the issue)
- Full lspci output with and without eGPU attached

Perhaps you can open a bug at bugzilla.kernel.org and attach the
files there.

Thanks,

Lukas
Aditya Garg July 23, 2024, 4:25 p.m. UTC | #8
Sending this message again as for some reason it got sent only to Lukas:

Full model name: Mac mini (2018) (Macmini8,1)

The drive link below has the logs:

https://drive.google.com/file/d/1P3-GlksU6WppvzvWC0A-nAoTZh7oPPxk/view?usp=drive_link
Aditya Garg July 23, 2024, 4:26 p.m. UTC | #9
Sending this message again because it previously got sent only to Lukas by mistake.

Hi,

Not sure whether it's relevant, but this user here also claims that without setting the os to macOS, the external GPUs had issues.

https://www.phoronix.com/forums/forum/software/general-linux-open-source/1479158-linux-6-11-efi-will-fake-that-it-s-booting-apple-macos-to-fix-some-dual-gpu-macs?p=1479205#post1479205
Lukas Wunner July 24, 2024, 4:01 p.m. UTC | #10
On Tue, Jul 23, 2024 at 04:25:19PM +0000, Aditya Garg wrote:
> On Wed, Jul 17, 2024 at 04:35:15PM +0000, Aditya Garg wrote:
> > For the Macs having a single GPU, in case a person uses an eGPU,
> > they still need this apple-set-os quirk for hybrid graphics.
> 
> Sending this message again as for some reason it got sent only to Lukas:
> 
> Full model name: Mac mini (2018) (Macmini8,1)
> 
> The drive link below has the logs:
> 
> https://drive.google.com/file/d/1P3-GlksU6WppvzvWC0A-nAoTZh7oPPxk/view?usp=drive_link

Some observations:

* dmesg-with-egpu.txt:  It seems the system was actually booted *without*
  an eGPU, so the filename appears to be a misnomer.

* The two files in the with_apple_set_os_efi directory only contain
  incomplete dmesg output.  Boot with log_buf_len=16M to solve this.
  Fortunately the truncated log is sufficient to see what's going on.

* If the apple_set_os protocol is not used, the attached eGPU is not
  enumerated by the kernel on boot and a rescan is required.
  So neither the iGPU nor the eGPU are working.  The reason is BIOS
  sets up incorrect bridge windows for the Thunderbolt host controller:
  Its two downstream ports' 64-bit windows overlap.  The 32-bit windows
  do not overlap.  If apple_set_os is used, the eGPU is using the
  (non-overlapping) 32-bit window.  If apple_set_os is not used,
  the attached eGPU is using the (overlapping, hence broken) 64-bit window.  

  So not only is apple_set_os needed to keep the iGPU enabled,
  but also to ensure BIOS sets up bridge windows in a manner that is
  only halfway broken and not totally broken.

  Below, 0000:06:01.0 and 0000:06:04.0 are the downstream ports on the
  Thunderbolt host controller and 0000:09:00.0 is the upstream port of
  the attached eGPU.

  iGPU enabled, no eGPU attached (dmesg.txt):
  pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
  pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xb83fffff 64bit pref]
  pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
  pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]

  iGPU disabled, eGPU attached, apple_set_os not used (journalctl.txt):
  pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
  pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
  pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
  pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
  pci 0000:06:04.0: bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]: can't claim; address conflict with PCI Bus 0000:09 [mem 0xb1400000-0xbf3fffff 64bit pref]

  iGPU enabled, eGPU attached, apple_set_os used (working-journalctl.txt):
  pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
  pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
  pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
  pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
  pci 0000:09:00.0:   bridge window [mem 0x81900000-0x81cfffff]

* As to how we can solve this and keep using apple_set_os only when
  necessary:

  I note that on x86, the efistub walks over all PCI devices in the system
  (see setup_efi_pci() in drivers/firmware/efi/libstub/x86-stub.c) and
  retrieves the Device ID and Vendor ID.  We could additionally retrieve
  the Class Code and count the number of GPUs in the system by checking
  whether the Class Code matches PCI_BASE_CLASS_DISPLAY.  If there's
  at least 2 GPUs in the system, invoke apple_set_os.

  The question is whether this is needed on *all* Apple products or only
  on newer ones.  I suspect that the eGPU issue may be specific to
  recent products.  Ideally we'd find someone with a Haswell or Ivy Bridge
  era Mac Mini and an eGPU who could verify whether apple_set_os is needed
  on older models as well.

  We could constrain apple_set_os to newer models by checking for
  presence of the T2 PCI device [106b:1802].  Alternatively, we could
  use the BIOS date (DMI_BIOS_DATE in SMBIOS data) to enforce a
  cut-off such that only machines with a recent BIOS use apple_set_os.

Thanks,

Lukas
Aditya Garg July 24, 2024, 4:21 p.m. UTC | #11
https://www.phoronix.com/forums/forum/software/general-linux-open-source/1479158-linux-6-11-efi-will-fake-that-it-s-booting-apple-macos-to-fix-some-dual-gpu-macs?p=1479205#post1479205

By looking at this post, it seems to affect non T2 Macs as well.

Also, T2 Macs report that EFI v2 is being used, although by a kernel patch, we force them to use v1.  Have we can use this as a quirk for enabling apple-set-os?

https://lore.kernel.org/linux-efi/20220112101413.188234-1-ardb@kernel.org/

Although, the NVRAM variable idea by Ard looks better to me.
Aditya Garg July 24, 2024, 4:26 p.m. UTC | #12
> On 24 Jul 2024, at 9:31 PM, Lukas Wunner <lukas@wunner.de> wrote:
> 
> On Tue, Jul 23, 2024 at 04:25:19PM +0000, Aditya Garg wrote:
>>> On Wed, Jul 17, 2024 at 04:35:15PM +0000, Aditya Garg wrote:
>>> For the Macs having a single GPU, in case a person uses an eGPU,
>>> they still need this apple-set-os quirk for hybrid graphics.
>> 
>> Sending this message again as for some reason it got sent only to Lukas:
>> 
>> Full model name: Mac mini (2018) (Macmini8,1)
>> 
>> The drive link below has the logs:
>> 
>> https://drive.google.com/file/d/1P3-GlksU6WppvzvWC0A-nAoTZh7oPPxk/view?usp=drive_link
> 
> Some observations:
> 
> * dmesg-with-egpu.txt:  It seems the system was actually booted *without*
>  an eGPU, so the filename appears to be a misnomer.
> 
> * The two files in the with_apple_set_os_efi directory only contain
>  incomplete dmesg output.  Boot with log_buf_len=16M to solve this.
>  Fortunately the truncated log is sufficient to see what's going on.
> 
> * If the apple_set_os protocol is not used, the attached eGPU is not
>  enumerated by the kernel on boot and a rescan is required.
>  So neither the iGPU nor the eGPU are working.  The reason is BIOS
>  sets up incorrect bridge windows for the Thunderbolt host controller:
>  Its two downstream ports' 64-bit windows overlap.  The 32-bit windows
>  do not overlap.  If apple_set_os is used, the eGPU is using the
>  (non-overlapping) 32-bit window.  If apple_set_os is not used,
>  the attached eGPU is using the (overlapping, hence broken) 64-bit window.  
> 
>  So not only is apple_set_os needed to keep the iGPU enabled,
>  but also to ensure BIOS sets up bridge windows in a manner that is
>  only halfway broken and not totally broken.
> 
>  Below, 0000:06:01.0 and 0000:06:04.0 are the downstream ports on the
>  Thunderbolt host controller and 0000:09:00.0 is the upstream port of
>  the attached eGPU.
> 
>  iGPU enabled, no eGPU attached (dmesg.txt):
>  pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
>  pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xb83fffff 64bit pref]
>  pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
>  pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
> 
>  iGPU disabled, eGPU attached, apple_set_os not used (journalctl.txt):
>  pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
>  pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
>  pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
>  pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
>  pci 0000:06:04.0: bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]: can't claim; address conflict with PCI Bus 0000:09 [mem 0xb1400000-0xbf3fffff 64bit pref]
> 
>  iGPU enabled, eGPU attached, apple_set_os used (working-journalctl.txt):
>  pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
>  pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
>  pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
>  pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
>  pci 0000:09:00.0:   bridge window [mem 0x81900000-0x81cfffff]
> 
> * As to how we can solve this and keep using apple_set_os only when
>  necessary:
> 
>  I note that on x86, the efistub walks over all PCI devices in the system
>  (see setup_efi_pci() in drivers/firmware/efi/libstub/x86-stub.c) and
>  retrieves the Device ID and Vendor ID.  We could additionally retrieve
>  the Class Code and count the number of GPUs in the system by checking
>  whether the Class Code matches PCI_BASE_CLASS_DISPLAY.  If there's
>  at least 2 GPUs in the system, invoke apple_set_os.

This also looks like a good idea, but I'm not well aware of the pci quirks in the Linux kernel. So, would consider it a bug report for the maintainers to fix.
> 
>  The question is whether this is needed on *all* Apple products or only
>  on newer ones.  I suspect that the eGPU issue may be specific to
>  recent products.  Ideally we'd find someone with a Haswell or Ivy Bridge
>  era Mac Mini and an eGPU who could verify whether apple_set_os is needed
>  on older models as well.
> 
>  We could constrain apple_set_os to newer models by checking for
>  presence of the T2 PCI device [106b:1802].  Alternatively, we could
>  use the BIOS date (DMI_BIOS_DATE in SMBIOS data) to enforce a
>  cut-off such that only machines with a recent BIOS use apple_set_os.
> 
> Thanks,
> 
> Lukas
Lukas Wunner July 24, 2024, 4:28 p.m. UTC | #13
On Wed, Jul 24, 2024 at 04:21:03PM +0000, Aditya Garg wrote:
> https://www.phoronix.com/forums/forum/software/general-linux-open-source/1479158-linux-6-11-efi-will-fake-that-it-s-booting-apple-macos-to-fix-some-dual-gpu-macs?p=1479205#post1479205
> 
> By looking at this post, it seems to affect non T2 Macs as well.

Disabling iGPU if dGPU is present was a behavior that existed well
before T2 Macs, yes.  That affected MacBook Pros from Haswell onward.

However the question is since when iGPU is disabled if an eGPU is present.
I had never heard of this before.  It would be good to have more datapoints
to come up with a quirk that's as narrowly applied as possible to avoid
regressions.

Thanks,

Lukas
Aditya Garg July 24, 2024, 4:30 p.m. UTC | #14
> On 24 Jul 2024, at 9:58 PM, Lukas Wunner <lukas@wunner.de> wrote:
> 
> On Wed, Jul 24, 2024 at 04:21:03PM +0000, Aditya Garg wrote:
>> https://www.phoronix.com/forums/forum/software/general-linux-open-source/1479158-linux-6-11-efi-will-fake-that-it-s-booting-apple-macos-to-fix-some-dual-gpu-macs?p=1479205#post1479205
>> 
>> By looking at this post, it seems to affect non T2 Macs as well.
> 
> Disabling iGPU if dGPU is present was a behavior that existed well
> before T2 Macs, yes.  That affected MacBook Pros from Haswell onward.

If you read the person's comment carefully, he has said it affected the thunderbolt eGPUs as well. Fat chance he is using a T2 Mac, because I have been a part of the t2linux community for a long time and we received such case recently.

> 
> However the question is since when iGPU is disabled if an eGPU is present.
> I had never heard of this before.  It would be good to have more datapoints
> to come up with a quirk that's as narrowly applied as possible to avoid
> regressions.
> 
> Thanks,
> 
> Lukas
Lukas Wunner July 24, 2024, 4:31 p.m. UTC | #15
On Wed, Jul 24, 2024 at 04:26:58PM +0000, Aditya Garg wrote:
> > On 24 Jul 2024, at 9:31 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > I note that on x86, the efistub walks over all PCI devices in the system
> > (see setup_efi_pci() in drivers/firmware/efi/libstub/x86-stub.c) and
> > retrieves the Device ID and Vendor ID.  We could additionally retrieve
> > the Class Code and count the number of GPUs in the system by checking
> > whether the Class Code matches PCI_BASE_CLASS_DISPLAY.  If there's
> > at least 2 GPUs in the system, invoke apple_set_os.
> 
> This also looks like a good idea, but I'm not well aware of the pci
> quirks in the Linux kernel. So, would consider it a bug report for
> the maintainers to fix.

This is not a PCI quirk in the kernel.  The efistub is a separate
program.  I'm saying that the efistub already walks over all PCI devices,
it would be trivial to hook into this to count GPUs, recognize the T2
device or do something else entirely.

Thanks,

Lukas
Aditya Garg July 24, 2024, 4:34 p.m. UTC | #16
> On 24 Jul 2024, at 10:01 PM, Lukas Wunner <lukas@wunner.de> wrote:
> 
> On Wed, Jul 24, 2024 at 04:26:58PM +0000, Aditya Garg wrote:
>>>> On 24 Jul 2024, at 9:31 PM, Lukas Wunner <lukas@wunner.de> wrote:
>>> I note that on x86, the efistub walks over all PCI devices in the system
>>> (see setup_efi_pci() in drivers/firmware/efi/libstub/x86-stub.c) and
>>> retrieves the Device ID and Vendor ID.  We could additionally retrieve
>>> the Class Code and count the number of GPUs in the system by checking
>>> whether the Class Code matches PCI_BASE_CLASS_DISPLAY.  If there's
>>> at least 2 GPUs in the system, invoke apple_set_os.
>> 
>> This also looks like a good idea, but I'm not well aware of the pci
>> quirks in the Linux kernel. So, would consider it a bug report for
>> the maintainers to fix.
> 
> This is not a PCI quirk in the kernel.  The efistub is a separate
> program.  I'm saying that the efistub already walks over all PCI devices,
> it would be trivial to hook into this to count GPUs, recognize the T2
> device or do something else entirely.
> 

I'll leave this to Ard. I'm really confused in choosing the best possible option here.

> Thanks,
> 
> Lukas
Ard Biesheuvel July 25, 2024, 12:39 p.m. UTC | #17
On Wed, 24 Jul 2024 at 18:27, Aditya Garg <gargaditya08@live.com> wrote:
>
>
>
> > On 24 Jul 2024, at 9:31 PM, Lukas Wunner <lukas@wunner.de> wrote:
> >
> > On Tue, Jul 23, 2024 at 04:25:19PM +0000, Aditya Garg wrote:
> >>> On Wed, Jul 17, 2024 at 04:35:15PM +0000, Aditya Garg wrote:
> >>> For the Macs having a single GPU, in case a person uses an eGPU,
> >>> they still need this apple-set-os quirk for hybrid graphics.
> >>
> >> Sending this message again as for some reason it got sent only to Lukas:
> >>
> >> Full model name: Mac mini (2018) (Macmini8,1)
> >>
> >> The drive link below has the logs:
> >>
> >> https://drive.google.com/file/d/1P3-GlksU6WppvzvWC0A-nAoTZh7oPPxk/view?usp=drive_link
> >
> > Some observations:
> >
> > * dmesg-with-egpu.txt:  It seems the system was actually booted *without*
> >  an eGPU, so the filename appears to be a misnomer.
> >
> > * The two files in the with_apple_set_os_efi directory only contain
> >  incomplete dmesg output.  Boot with log_buf_len=16M to solve this.
> >  Fortunately the truncated log is sufficient to see what's going on.
> >
> > * If the apple_set_os protocol is not used, the attached eGPU is not
> >  enumerated by the kernel on boot and a rescan is required.
> >  So neither the iGPU nor the eGPU are working.  The reason is BIOS
> >  sets up incorrect bridge windows for the Thunderbolt host controller:
> >  Its two downstream ports' 64-bit windows overlap.  The 32-bit windows
> >  do not overlap.  If apple_set_os is used, the eGPU is using the
> >  (non-overlapping) 32-bit window.  If apple_set_os is not used,
> >  the attached eGPU is using the (overlapping, hence broken) 64-bit window.
> >
> >  So not only is apple_set_os needed to keep the iGPU enabled,
> >  but also to ensure BIOS sets up bridge windows in a manner that is
> >  only halfway broken and not totally broken.
> >
> >  Below, 0000:06:01.0 and 0000:06:04.0 are the downstream ports on the
> >  Thunderbolt host controller and 0000:09:00.0 is the upstream port of
> >  the attached eGPU.
> >
> >  iGPU enabled, no eGPU attached (dmesg.txt):
> >  pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
> >  pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xb83fffff 64bit pref]
> >  pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
> >  pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
> >
> >  iGPU disabled, eGPU attached, apple_set_os not used (journalctl.txt):
> >  pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
> >  pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
> >  pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
> >  pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
> >  pci 0000:06:04.0: bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]: can't claim; address conflict with PCI Bus 0000:09 [mem 0xb1400000-0xbf3fffff 64bit pref]
> >
> >  iGPU enabled, eGPU attached, apple_set_os used (working-journalctl.txt):
> >  pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
> >  pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
> >  pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
> >  pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
> >  pci 0000:09:00.0:   bridge window [mem 0x81900000-0x81cfffff]
> >
> > * As to how we can solve this and keep using apple_set_os only when
> >  necessary:
> >
> >  I note that on x86, the efistub walks over all PCI devices in the system
> >  (see setup_efi_pci() in drivers/firmware/efi/libstub/x86-stub.c) and
> >  retrieves the Device ID and Vendor ID.  We could additionally retrieve
> >  the Class Code and count the number of GPUs in the system by checking
> >  whether the Class Code matches PCI_BASE_CLASS_DISPLAY.  If there's
> >  at least 2 GPUs in the system, invoke apple_set_os.
>
> This also looks like a good idea, but I'm not well aware of the pci quirks in the Linux kernel. So, would consider it a bug report for the maintainers to fix.

That is not how it works.

This is not a regression in Linux, and even if it was, it is not the
maintainers' job to fix bugs.

If Linux is lacking functionality that you find important, please
propose a patch the implements it, and argue why it should be merged.
Ard Biesheuvel July 25, 2024, 12:42 p.m. UTC | #18
On Wed, 24 Jul 2024 at 18:31, Lukas Wunner <lukas@wunner.de> wrote:
>
> On Wed, Jul 24, 2024 at 04:26:58PM +0000, Aditya Garg wrote:
> > > On 24 Jul 2024, at 9:31 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > I note that on x86, the efistub walks over all PCI devices in the system
> > > (see setup_efi_pci() in drivers/firmware/efi/libstub/x86-stub.c) and
> > > retrieves the Device ID and Vendor ID.  We could additionally retrieve
> > > the Class Code and count the number of GPUs in the system by checking
> > > whether the Class Code matches PCI_BASE_CLASS_DISPLAY.  If there's
> > > at least 2 GPUs in the system, invoke apple_set_os.
> >
> > This also looks like a good idea, but I'm not well aware of the pci
> > quirks in the Linux kernel. So, would consider it a bug report for
> > the maintainers to fix.
>
> This is not a PCI quirk in the kernel.  The efistub is a separate
> program.  I'm saying that the efistub already walks over all PCI devices,
> it would be trivial to hook into this to count GPUs, recognize the T2
> device or do something else entirely.
>

Thanks for the analysis, and for the suggestions.

I wouldn't object to changes to the EFI stub that implement something
along these lines, although I'd like to understand a bit better what
the actual issue is.

If PCI resource allocation is the culprit here, wouldn't it be better
to force Linux to reallocate those from scratch? IIRC there is already
a command line option for this.
Aditya Garg July 25, 2024, 12:55 p.m. UTC | #19
> On 25 Jul 2024, at 6:09 PM, Ard Biesheuvel <ardb@kernel.org> wrote:
> 
> On Wed, 24 Jul 2024 at 18:27, Aditya Garg <gargaditya08@live.com> wrote:
>> 
>> 
>> 
>>>> On 24 Jul 2024, at 9:31 PM, Lukas Wunner <lukas@wunner.de> wrote:
>>> 
>>> On Tue, Jul 23, 2024 at 04:25:19PM +0000, Aditya Garg wrote:
>>>>> On Wed, Jul 17, 2024 at 04:35:15PM +0000, Aditya Garg wrote:
>>>>> For the Macs having a single GPU, in case a person uses an eGPU,
>>>>> they still need this apple-set-os quirk for hybrid graphics.
>>>> 
>>>> Sending this message again as for some reason it got sent only to Lukas:
>>>> 
>>>> Full model name: Mac mini (2018) (Macmini8,1)
>>>> 
>>>> The drive link below has the logs:
>>>> 
>>>> https://drive.google.com/file/d/1P3-GlksU6WppvzvWC0A-nAoTZh7oPPxk/view?usp=drive_link
>>> 
>>> Some observations:
>>> 
>>> * dmesg-with-egpu.txt:  It seems the system was actually booted *without*
>>> an eGPU, so the filename appears to be a misnomer.
>>> 
>>> * The two files in the with_apple_set_os_efi directory only contain
>>> incomplete dmesg output.  Boot with log_buf_len=16M to solve this.
>>> Fortunately the truncated log is sufficient to see what's going on.
>>> 
>>> * If the apple_set_os protocol is not used, the attached eGPU is not
>>> enumerated by the kernel on boot and a rescan is required.
>>> So neither the iGPU nor the eGPU are working.  The reason is BIOS
>>> sets up incorrect bridge windows for the Thunderbolt host controller:
>>> Its two downstream ports' 64-bit windows overlap.  The 32-bit windows
>>> do not overlap.  If apple_set_os is used, the eGPU is using the
>>> (non-overlapping) 32-bit window.  If apple_set_os is not used,
>>> the attached eGPU is using the (overlapping, hence broken) 64-bit window.
>>> 
>>> So not only is apple_set_os needed to keep the iGPU enabled,
>>> but also to ensure BIOS sets up bridge windows in a manner that is
>>> only halfway broken and not totally broken.
>>> 
>>> Below, 0000:06:01.0 and 0000:06:04.0 are the downstream ports on the
>>> Thunderbolt host controller and 0000:09:00.0 is the upstream port of
>>> the attached eGPU.
>>> 
>>> iGPU enabled, no eGPU attached (dmesg.txt):
>>> pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
>>> pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xb83fffff 64bit pref]
>>> pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
>>> pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
>>> 
>>> iGPU disabled, eGPU attached, apple_set_os not used (journalctl.txt):
>>> pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
>>> pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
>>> pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
>>> pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
>>> pci 0000:06:04.0: bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]: can't claim; address conflict with PCI Bus 0000:09 [mem 0xb1400000-0xbf3fffff 64bit pref]
>>> 
>>> iGPU enabled, eGPU attached, apple_set_os used (working-journalctl.txt):
>>> pci 0000:06:01.0:   bridge window [mem 0x81900000-0x888fffff]
>>> pci 0000:06:01.0:   bridge window [mem 0xb1400000-0xc6ffffff 64bit pref]
>>> pci 0000:06:04.0:   bridge window [mem 0x88900000-0x8f8fffff]
>>> pci 0000:06:04.0:   bridge window [mem 0xb8400000-0xbf3fffff 64bit pref]
>>> pci 0000:09:00.0:   bridge window [mem 0x81900000-0x81cfffff]
>>> 
>>> * As to how we can solve this and keep using apple_set_os only when
>>> necessary:
>>> 
>>> I note that on x86, the efistub walks over all PCI devices in the system
>>> (see setup_efi_pci() in drivers/firmware/efi/libstub/x86-stub.c) and
>>> retrieves the Device ID and Vendor ID.  We could additionally retrieve
>>> the Class Code and count the number of GPUs in the system by checking
>>> whether the Class Code matches PCI_BASE_CLASS_DISPLAY.  If there's
>>> at least 2 GPUs in the system, invoke apple_set_os.
>> 
>> This also looks like a good idea, but I'm not well aware of the pci quirks in the Linux kernel. So, would consider it a bug report for the maintainers to fix.
> 
> That is not how it works.
> 
> This is not a regression in Linux, and even if it was, it is not the
> maintainers' job to fix bugs.
> 
> If Linux is lacking functionality that you find important, please
> propose a patch the implements it, and argue why it should be merged.

Hi Ard

I believe Linux needs the functionality to be able to properly use eGPUs on Macs. Since the data availability on these Macs is too low and not clear, I believe we should be atleast be able to enable this from userspace as well. Maybe we can do something like:

1. Add 'efi=enable_apple_set_os' and 'efi=disable_apple_set_os' to efi-stub-helper.c (I can make a patch for the same).
2. Use an NVRAM variable as suggested by you (Would need help for this).

If you find any of the option out of these acceptable, I'll start working on them.
Aditya Garg July 25, 2024, 12:58 p.m. UTC | #20
> On 25 Jul 2024, at 6:12 PM, Ard Biesheuvel <ardb@kernel.org> wrote:
> 
> On Wed, 24 Jul 2024 at 18:31, Lukas Wunner <lukas@wunner.de> wrote:
>> 
>> On Wed, Jul 24, 2024 at 04:26:58PM +0000, Aditya Garg wrote:
>>>>> On 24 Jul 2024, at 9:31 PM, Lukas Wunner <lukas@wunner.de> wrote:
>>>>> I note that on x86, the efistub walks over all PCI devices in the system
>>>>> (see setup_efi_pci() in drivers/firmware/efi/libstub/x86-stub.c) and
>>>>> retrieves the Device ID and Vendor ID.  We could additionally retrieve
>>>>> the Class Code and count the number of GPUs in the system by checking
>>>>> whether the Class Code matches PCI_BASE_CLASS_DISPLAY.  If there's
>>>>> at least 2 GPUs in the system, invoke apple_set_os.
>>> 
>>> This also looks like a good idea, but I'm not well aware of the pci
>>> quirks in the Linux kernel. So, would consider it a bug report for
>>> the maintainers to fix.
>> 
>> This is not a PCI quirk in the kernel.  The efistub is a separate
>> program.  I'm saying that the efistub already walks over all PCI devices,
>> it would be trivial to hook into this to count GPUs, recognize the T2
>> device or do something else entirely.
>> 
> 
> Thanks for the analysis, and for the suggestions.
> 
> I wouldn't object to changes to the EFI stub that implement something
> along these lines, although I'd like to understand a bit better what
> the actual issue is.
> 
> If PCI resource allocation is the culprit here, wouldn't it be better
> to force Linux to reallocate those from scratch? IIRC there is already
> a command line option for this.

I guess you are talking about 'pci=realloc'. Adding this doesn't fix the issue.
Ard Biesheuvel July 29, 2024, 7:27 a.m. UTC | #21
On Sun, 28 Jul 2024 at 14:03, Orlando Chamberlain
<orlandoch.dev@gmail.com> wrote:
>
> On Wed, 24 Jul 2024 18:01:08 +0200, Lukas Wunner Wrote:
> > On Tue, Jul 23, 2024 at 04:25:19PM +0000, Aditya Garg wrote:
> > > On Wed, Jul 17, 2024 at 04:35:15PM +0000, Aditya Garg wrote:
> > > > For the Macs having a single GPU, in case a person uses an eGPU,
> > > > they still need this apple-set-os quirk for hybrid graphics.
> > >
> > > Sending this message again as for some reason it got sent only to
> > > Lukas:
> > >
> > > Full model name: Mac mini (2018) (Macmini8,1)
> > >
> > > The drive link below has the logs:
> > >
> > > https://drive.google.com/file/d/1P3-GlksU6WppvzvWC0A-nAoTZh7oPPxk/view?usp=drive_link
> >
> > Some observations:
> >
> > * dmesg-with-egpu.txt:  It seems the system was actually booted
> >   *without* an eGPU, so the filename appears to be a misnomer.
> >
> > * The two files in the with_apple_set_os_efi directory only contain
> >   incomplete dmesg output.  Boot with log_buf_len=16M to solve this.
> >   Fortunately the truncated log is sufficient to see what's going on.
>
> Hi Lukas, in case it helps, I got the user with the macmini and egpu to
> get logs that don't have the start cut off [0].
>
> >   We could constrain apple_set_os to newer models by checking for
> >   presence of the T2 PCI device [106b:1802].  Alternatively, we could
> >   use the BIOS date (DMI_BIOS_DATE in SMBIOS data) to enforce a
> >   cut-off such that only machines with a recent BIOS use apple_set_os.
>
> It might be simpler to match "iBridge" in the DMI bios_version as this
> indicates that a computer has the T2 (aka iBridge which runs bridgeOS).
> I don't think there have been any issues from when the downstream T2
> kernels had apple-set-os unconditionally so it should be fine to enable
> for all T2 macs at least. This would exclude the T1 Macs with possibly
> missing trackpad dimensions in applespi that you mentioned [1].
>
> On my MacBookPro16,1:
>
> $ cat /sys/class/dmi/id/bios_version
> 1715.60.5.0.0 (iBridge: 19.16.10647.0.0,0)
>
> [0]: https://gist.github.com/Redecorating/8111324065016363223b5ce719e48676/
> [1]: https://lore.kernel.org/all/ZoJPgSlZJ3ZlU2zL@wunner.de/
>

Thanks, this seems useful.

Does this mean we can drop the type1 product name list, and just look
for 'iBridge' in the type0 version string? Or does that list contain
non-T2 hardware as well?
Aditya Garg July 29, 2024, 7:46 a.m. UTC | #22
> Thanks, this seems useful.
> 
> Does this mean we can drop the type1 product name list, and just look
> for 'iBridge' in the type0 version string? Or does that list contain
> non-T2 hardware as well?

MacBookPro11,3
MacBookPro11,5
MacBookPro13,3
MacBookPro14,3

These are non T2 Macs that were in the previous list