diff mbox series

[v3,RESEND] acpi,pci: warn about duplicate IRQ routing entries returned from _PRT

Message ID 20230121153314.6109-1-mat.jonczyk@o2.pl
State Superseded
Headers show
Series [v3,RESEND] acpi,pci: warn about duplicate IRQ routing entries returned from _PRT | expand

Commit Message

Mateusz Jończyk Jan. 21, 2023, 3:33 p.m. UTC
On some platforms, the ACPI _PRT function returns duplicate interrupt
routing entries. Linux uses the first matching entry, but sometimes the
second matching entry contains the correct interrupt vector.

Print an error to dmesg if duplicate interrupt routing entries are
present, so that we could check how many models are affected.

This happens on a Dell Latitude E6500 laptop with the i2c-i801 Intel
SMBus controller. This controller was nonfunctional unless its interrupt
usage was disabled (using the "disable_features=0x10" module parameter).

After investigation, it turned out that the driver was using an
incorrect interrupt vector: in lspci output for this device there was:
        Interrupt: pin B routed to IRQ 19
but after running i2cdetect (without using any i2c-i801 module
parameters) the following was logged to dmesg:

        [...]
        i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
        i801_smbus 0000:00:1f.3: Transaction timeout
        i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
        i801_smbus 0000:00:1f.3: Transaction timeout
        irq 17: nobody cared (try booting with the "irqpoll" option)

Existence of duplicate entries in a table returned by the _PRT method
was confirmed by disassembling the ACPI DSDT table.

Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jean Delvare <jdelvare@suse.de>

--
v2: - add a newline at the end of the kernel log message,
    - replace: "if (match == NULL)" -> "if (!match)"
    - patch description tweaks.
v3: - fix C style issues pointed by Jean Delvare,
    - switch severity from warning to error.
v3 RESEND: retested on top of v6.2-rc4

To consider: should we print a warning or an error in case of duplicate
entries? This may be not serious enough to disturb the user with an
error message at boot. On the other hand, hardware vendors should see
it and the kernel uses this logging severity in similar cases.
---
 drivers/acpi/pci_irq.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)


base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4

Comments

Bjorn Helgaas Jan. 23, 2023, 8:33 p.m. UTC | #1
On Sat, Jan 21, 2023 at 04:33:14PM +0100, Mateusz Jończyk wrote:
> On some platforms, the ACPI _PRT function returns duplicate interrupt
> routing entries. Linux uses the first matching entry, but sometimes the
> second matching entry contains the correct interrupt vector.
>
> Print an error to dmesg if duplicate interrupt routing entries are
> present, so that we could check how many models are affected.

It shouldn't be too hard to use qemu to figure out whether Windows
uses the last matching entry, i.e., treating _PRT entries as
assignments.  If so, maybe Linux could just do the same.

Is anybody up for that?

Bjorn
Mateusz Jończyk Jan. 23, 2023, 9 p.m. UTC | #2
W dniu 23.01.2023 o 21:33, Bjorn Helgaas pisze:
> On Sat, Jan 21, 2023 at 04:33:14PM +0100, Mateusz Jończyk wrote:
>> On some platforms, the ACPI _PRT function returns duplicate interrupt
>> routing entries. Linux uses the first matching entry, but sometimes the
>> second matching entry contains the correct interrupt vector.
>>
>> Print an error to dmesg if duplicate interrupt routing entries are
>> present, so that we could check how many models are affected.
> It shouldn't be too hard to use qemu to figure out whether Windows
> uses the last matching entry, i.e., treating _PRT entries as
> assignments.  If so, maybe Linux could just do the same.
>
> Is anybody up for that?
>
> Bjorn

The hardware in question has a working Windows XP installation,
and I could in theory check which interrupt vector it uses - but
I think that such reverse engineering is forbidden by Windows' EULA.

Greetings,
Mateusz
Rafael J. Wysocki Jan. 30, 2023, 3:56 p.m. UTC | #3
On Mon, Jan 23, 2023 at 10:44 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Jan 23, 2023 at 10:00:43PM +0100, Mateusz Jończyk wrote:
> > W dniu 23.01.2023 o 21:33, Bjorn Helgaas pisze:
> > > On Sat, Jan 21, 2023 at 04:33:14PM +0100, Mateusz Jończyk wrote:
> > >> On some platforms, the ACPI _PRT function returns duplicate interrupt
> > >> routing entries. Linux uses the first matching entry, but sometimes the
> > >> second matching entry contains the correct interrupt vector.
> > >>
> > >> Print an error to dmesg if duplicate interrupt routing entries are
> > >> present, so that we could check how many models are affected.
> > >
> > > It shouldn't be too hard to use qemu to figure out whether Windows
> > > uses the last matching entry, i.e., treating _PRT entries as
> > > assignments.  If so, maybe Linux could just do the same.
> > >
> > > Is anybody up for that?
> >
> > The hardware in question has a working Windows XP installation,
> > and I could in theory check which interrupt vector it uses - but
> > I think that such reverse engineering is forbidden by Windows' EULA.
>
> I'm not talking about any sort of disassembly or anything like that;
> just that we can observe what Windows does given the _PRT contents.
> You've already figured out that on your particular hardware, the _PRT
> has two entries, and Linux uses the first one while Windows uses the
> second one, right?
>
> On qemu, we have control over the BIOS and can easily update _PRT to
> whatever we want, and then we could boot Windows and see what it uses.
> But I guess maybe that wouldn't tell us anything more than what you
> already discovered.
>
> So my inclination would be to make Linux use the last matching entry.

But it would be able to log a diagnostic message anyway IMO.

So maybe two steps can be taken here, (1) adding the message printout
(this patch) and (2) changing the behavior?
Mateusz Jończyk Jan. 30, 2023, 7:36 p.m. UTC | #4
W dniu 30.01.2023 o 16:56, Rafael J. Wysocki pisze:
> On Mon, Jan 23, 2023 at 10:44 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>> On Mon, Jan 23, 2023 at 10:00:43PM +0100, Mateusz Jończyk wrote:
>>> W dniu 23.01.2023 o 21:33, Bjorn Helgaas pisze:
>>>> On Sat, Jan 21, 2023 at 04:33:14PM +0100, Mateusz Jończyk wrote:
>>>>> On some platforms, the ACPI _PRT function returns duplicate interrupt
>>>>> routing entries. Linux uses the first matching entry, but sometimes the
>>>>> second matching entry contains the correct interrupt vector.
>>>>>
>>>>> Print an error to dmesg if duplicate interrupt routing entries are
>>>>> present, so that we could check how many models are affected.
>>>> It shouldn't be too hard to use qemu to figure out whether Windows
>>>> uses the last matching entry, i.e., treating _PRT entries as
>>>> assignments.  If so, maybe Linux could just do the same.
>>>>
>>>> Is anybody up for that?
>>> The hardware in question has a working Windows XP installation,
>>> and I could in theory check which interrupt vector it uses - but
>>> I think that such reverse engineering is forbidden by Windows' EULA.
>> I'm not talking about any sort of disassembly or anything like that;
>> just that we can observe what Windows does given the _PRT contents.
>> You've already figured out that on your particular hardware, the _PRT
>> has two entries, and Linux uses the first one while Windows uses the
>> second one, right?
>>
>> On qemu, we have control over the BIOS and can easily update _PRT to
>> whatever we want, and then we could boot Windows and see what it uses.
>> But I guess maybe that wouldn't tell us anything more than what you
>> already discovered.
>>
>> So my inclination would be to make Linux use the last matching entry.
> But it would be able to log a diagnostic message anyway IMO.
>
> So maybe two steps can be taken here, (1) adding the message printout
> (this patch) and (2) changing the behavior?

I have checked and Windows XP uses a different interrupt number for
the device than both numbers from the table returned by the _PRT
method (the correct and the incorrect ones).

It uses IRQ3, as shown in the Device Manager and in HWiNFO32 output:

    Intel 82801IB ICH9 - SMBus Controller [A3] --------------------------------

     [General Information]
      Device Name:                            Intel 82801IB ICH9 - SMBus Controller [A3]
      Original Device Name:                   Intel 82801IB ICH9 - SMBus Controller [A3]
      Device Class:                           SMBus (System Management Bus)
      Revision ID:                            3 [A3]
      PCI Address (Bus:Device:Function) Number: 0:31:3
      PCI Latency Timer:                      0
      Hardware ID:                            PCI\VEN_8086&DEV_2930&SUBSYS_024F1028&REV_03
     [System Resources]
      Interrupt Line:                         IRQ3
      Interrupt Pin:                          INTB#
      Memory Base Address 0                   F6ADAF00
      I/O Base Address 4                      1100
     [Features]
      Bus Mastering:                          Disabled
      Running At 66 MHz:                      Not Capable
      Fast Back-to-Back Transactions:         Capable
      DeviceInstanceId                        PCI\VEN_8086&DEV_2930&SUBSYS_024F1028&REV_03\3&61AAA01&0&FB

Unpatched Linux kernel uses IRQ19, which is incorrect; the correct vector (under Linux at least) is IRQ17. On the other hand, HWiNFO32 has neatly decoded contents of the SPD EEPROMs, which are connected to the SMBus controller in question. So either: 1. Windows' driver (or driver from HWiNFO32 perhaps?) does not use interrupts for the device. 2. Windows somehow has reassigned the interrupt vector for the device. The datasheet [1] indicates it is possible (page 375, Device 31 Interrupt Route Register), but I'm not sure if the interrupt can be reassigned to such a low number. So all in all, I would suggest shipping a patch that only prints a warning for a release or two before changing the default. Greetings, Mateusz [1] https://www.intel.com/content/www/us/en/io/io-controller-hub-9-datasheet.html
diff mbox series

Patch

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..28fcae93cc24 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -203,6 +203,8 @@  static int acpi_pci_irq_find_prt_entry(struct pci_dev *dev,
 	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
 	struct acpi_pci_routing_table *entry;
 	acpi_handle handle = NULL;
+	struct acpi_prt_entry *match = NULL;
+	const char *match_int_source = NULL;
 
 	if (dev->bus->bridge)
 		handle = ACPI_HANDLE(dev->bus->bridge);
@@ -219,13 +221,31 @@  static int acpi_pci_irq_find_prt_entry(struct pci_dev *dev,
 
 	entry = buffer.pointer;
 	while (entry && (entry->length > 0)) {
-		if (!acpi_pci_irq_check_entry(handle, dev, pin,
-						 entry, entry_ptr))
-			break;
+		struct acpi_prt_entry *curr;
+
+		if (!acpi_pci_irq_check_entry(handle, dev, pin, entry, &curr)) {
+			if (!match) {
+				match = curr;
+				match_int_source = entry->source;
+			} else {
+				pr_err(FW_BUG
+				       "ACPI _PRT returned duplicate IRQ routing entries for device %04x:%02x:%02x[INT%c]: %s[%d] and %s[%d]\n",
+				       curr->id.segment, curr->id.bus, curr->id.device,
+				       pin_name(curr->pin),
+				       match_int_source, match->index,
+				       entry->source, curr->index);
+				/* We use the first matching entry nonetheless,
+				 * for compatibility with older kernels.
+				 */
+			}
+		}
+
 		entry = (struct acpi_pci_routing_table *)
 		    ((unsigned long)entry + entry->length);
 	}
 
+	*entry_ptr = match;
+
 	kfree(buffer.pointer);
 	return 0;
 }