Message ID | 20230415150542.2368179-1-noltari@gmail.com |
---|---|
State | New |
Headers | show |
Series | ath9k: fix calibration data endianness | expand |
Álvaro Fernández Rojas <noltari@gmail.com> writes: > BCM63xx (Big Endian MIPS) devices store the calibration data in MTD > partitions but it needs to be swapped in order to work, otherwise it fails: > ath9k 0000:00:01.0: enabling device (0000 -> 0002) > ath: phy0: Ignoring endianness difference in EEPROM magic bytes. > ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0 > ath: phy0: Unable to initialize hardware; initialization status: -22 > ath9k 0000:00:01.0: Failed to initialize device > ath9k: probe of 0000:00:01.0 failed with error -22 How does this affect other platforms? Why was the NO_EEP_SWAP flag set in the first place? Christian, care to comment on this? -Toke
Hi, On 4/15/23 17:25, Toke Høiland-Jørgensen wrote: > Álvaro Fernández Rojas <noltari@gmail.com> writes: > >> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD >> partitions but it needs to be swapped in order to work, otherwise it fails: >> ath9k 0000:00:01.0: enabling device (0000 -> 0002) >> ath: phy0: Ignoring endianness difference in EEPROM magic bytes. >> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0 >> ath: phy0: Unable to initialize hardware; initialization status: -22 >> ath9k 0000:00:01.0: Failed to initialize device >> ath9k: probe of 0000:00:01.0 failed with error -22 > > How does this affect other platforms? Why was the NO_EEP_SWAP flag set > in the first place? Christian, care to comment on this? I knew this would come up. I've written what I know and remember in the pull-request/buglink. Maybe this can be added to the commit? Link: https://github.com/openwrt/openwrt/pull/12365 | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom. Since the existing request_firmware eeprom fetcher code set the flag, the nvmem code had to do it too. In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc. I don't know if there are devices out there, which have a swapped magic (which is used to detect the endianess), but the caldata is in the correct endiannes (or vice versa - Magic is correct, but data needs swapping). I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220) and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine. So I don't expect there to be a new issue there). The older platform_data code has a "endian_check" flag in the ath9k_platform_data struct: <https://github.com/torvalds/linux/blob/master/include/linux/ath9k_platform.h#L38> Which enables the check and the endian-swapping (though it's disabled by default). Cheers, Christian
On 4/15/23 18:02, Christian Lamparter wrote: > Hi, > > On 4/15/23 17:25, Toke Høiland-Jørgensen wrote: >> Álvaro Fernández Rojas <noltari@gmail.com> writes: >> >>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD >>> partitions but it needs to be swapped in order to work, otherwise it fails: >>> ath9k 0000:00:01.0: enabling device (0000 -> 0002) >>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes. >>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0 >>> ath: phy0: Unable to initialize hardware; initialization status: -22 >>> ath9k 0000:00:01.0: Failed to initialize device >>> ath9k: probe of 0000:00:01.0 failed with error -22 >> >> How does this affect other platforms? Why was the NO_EEP_SWAP flag set >> in the first place? Christian, care to comment on this? > > I knew this would come up. I've written what I know and remember in the > pull-request/buglink. > > Maybe this can be added to the commit? > Link: https://github.com/openwrt/openwrt/pull/12365 > > | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom. > > Since the existing request_firmware eeprom fetcher code set the flag, > the nvmem code had to do it too. > > In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc. > I don't know if there are devices out there, which have a swapped magic (which is > used to detect the endianess), but the caldata is in the correct endiannes (or > vice versa - Magic is correct, but data needs swapping). > > I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220) > and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine. > So I don't expect there to be a new issue there). Nope! This is a classic self-own!... Well at least, this now gets documented! Here are my findings. Please excuse the overlong lines. ## The good news / AVM FritzBox 7360v2 ## The good news: The AVM FritzBox 7360v2 worked the same as before. Bootlog with the patch applied - cold start: |[ 14.738647] ath9k_pci_owl_loader 0000:01:00.0: enabling device (0000 -> 0002) |[ 14.747344] ath9k_pci_owl_loader 0000:01:00.0: fixup device configuration |[ 15.116403] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000 |[ 15.121083] pci 0000:01:00.0: reg 0x10: [mem 0x1c000000-0x1c00ffff 64bit] |[ 15.128162] pci 0000:01:00.0: supports D1 |[ 15.131841] pci 0000:01:00.0: PME# supported from D0 D1 D3hot |[ 15.144186] pci 0000:01:00.0: BAR 0: assigned [mem 0x1c000000-0x1c00ffff 64bit] |[...] Wireguard loading |[ 15.367461] ifx_pcie_bios_map_irq port 0 dev 0000:01:00.0 slot 0 pin 1 |[ 15.372676] ifx_pcie_bios_map_irq dev 0000:01:00.0 irq 144 assigned |[ 15.379031] ath9k 0000:01:00.0: enabling device (0140 -> 0142) |[ 15.392181] ath: EEPROM regdomain: 0x8114 |[ 15.392219] ath: EEPROM indicates we should expect a country code |[ 15.392232] ath: doing EEPROM country->regdmn map search |[ 15.392242] ath: country maps to regdmn code: 0x37 |[ 15.392254] ath: Country alpha2 being used: DE |[ 15.392264] ath: Regpair used: 0x37 |[ 15.406435] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht' |[ 15.411722] ieee80211 phy0: Atheros AR9287 Rev:2 mem=0xbc000000, irq=144 ## The not so good news / Netgear WNDR3700v2 ## But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 - doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with: "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0" |[ 23.260755] ath9k_pci_owl_loader 0000:00:11.0: enabling device (0000 -> 0002) |[ 23.268319] ath9k_pci_owl_loader 0000:00:12.0: enabling device (0000 -> 0002) |[ 23.285197] ath9k_pci_owl_loader 0000:00:11.0: fixup device configuration |[ 23.297232] ath9k_pci_owl_loader 0000:00:12.0: fixup device configuration |[ 23.320031] pci 0000:00:11.0: [168c:0029] type 00 class 0x028000 |[ 23.326070] pci 0000:00:11.0: reg 0x10: [mem 0x10000000-0x1000ffff] |[ 23.332430] pci 0000:00:11.0: PME# supported from D0 D3hot |[ 23.338570] pci 0000:00:11.0: BAR 0: assigned [mem 0x10000000-0x1000ffff] |[ 23.347404] pci 0000:00:12.0: [168c:0029] type 00 class 0x028000 |[ 23.353441] pci 0000:00:12.0: reg 0x10: [mem 0x10010000-0x1001ffff] |[ 23.359799] pci 0000:00:12.0: PME# supported from D0 D3hot |[ 23.365906] pci 0000:00:12.0: BAR 0: assigned [mem 0x10010000-0x1001ffff] |[...] Wireguard loading |[ 24.333503] ath9k 0000:00:11.0: enabling device (0000 -> 0002) |[ 24.363290] ath: EEPROM regdomain: 0x0 |[ 24.363319] ath: EEPROM indicates default country code should be used |[ 24.363325] ath: doing EEPROM country->regdmn map search |[ 24.363338] ath: country maps to regdmn code: 0x3a |[ 24.363346] ath: Country alpha2 being used: US |[ 24.363353] ath: Regpair used: 0x3a |[ 24.381709] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht' |[ 24.383599] gpio-508 (fixed antenna group 1): hogged as output/high |[ 24.389941] gpio-509 (fixed antenna group 1): hogged as output/high |[ 24.396204] gpio-510 (fixed antenna group 1): hogged as output/high |[ 24.402480] gpio-511 (fixed antenna group 1): hogged as output/high |[ 24.409007] ieee80211 phy0: Atheros AR9280 Rev:2 mem=0xb0000000, irq=18 |[ 24.415793] ath9k 0000:00:12.0: enabling device (0000 -> 0002) |[ 24.505496] ath: phy1: Bad EEPROM VER 0x0001 or REV 0x06e0 |[ 24.511027] ath: phy1: Unable to initialize hardware; initialization status: -22 |[ 24.518420] ath9k 0000:00:12.0: Failed to initialize device |[ 24.524004] ath9k: probe of 0000:00:12.0 failed with error -22 without the patch (rebuild mac80211/backports and uploaded the newly created ath9k*.ko files and loaded them manually): [ 1205.670387] ath: phy10: Ignoring endianness difference in EEPROM magic bytes. <---- [ 1205.679077] ath: EEPROM regdomain: 0x0 [ 1205.679086] ath: EEPROM indicates default country code should be used [ 1205.679092] ath: doing EEPROM country->regdmn map search [ 1205.679105] ath: country maps to regdmn code: 0x3a [ 1205.679113] ath: Country alpha2 being used: US [ 1205.679120] ath: Regpair used: 0x3a [ 1205.692894] ieee80211 phy10: Selected rate control algorithm 'minstrel_ht' [ 1205.694757] gpio-508 (fixed antenna group 1): hogged as output/high [ 1205.701115] gpio-509 (fixed antenna group 1): hogged as output/high [ 1205.707396] gpio-510 (fixed antenna group 1): hogged as output/high [ 1205.713656] gpio-511 (fixed antenna group 1): hogged as output/high [ 1205.720192] ieee80211 phy10: Atheros AR9280 Rev:2 mem=0xb0000000, irq=18 [ 1205.816842] ath: phy11: Ignoring endianness difference in EEPROM magic bytes. <---- [ 1205.825521] ath: EEPROM regdomain: 0x0 [ 1205.825530] ath: EEPROM indicates default country code should be used [ 1205.825537] ath: doing EEPROM country->regdmn map search [ 1205.825549] ath: country maps to regdmn code: 0x3a [ 1205.825557] ath: Country alpha2 being used: US [ 1205.825564] ath: Regpair used: 0x3a [ 1205.841571] ieee80211 phy11: Selected rate control algorithm 'minstrel_ht' [ 1205.843795] ieee80211 phy11: Atheros AR9280 Rev:2 mem=0xb0010000, irq=19 Everything was fine again. ### Hexdump extract of relevant ART data (/dev/mtd6) ### |00001000 a5 5a 00 00 00 03 60 00 16 8c 00 29 60 08 00 01 |.Z....`....)`...| <- 2.4GHz WiFi OwlLoader boot code |00001010 02 80 60 2c 16 8c a0 95 50 00 16 8c 00 2a 50 08 |..`,....P....*P.| Apart from the Magic 0x5AA5 (or here in LE 0xA55A) |00001020 00 01 02 80 50 2c 16 8c a0 95 50 64 0c c0 05 04 |....P,....Pd....| this is irrelevant for ath9k (it skips it, |00001030 50 6c 38 11 00 03 40 04 07 3b 00 40 40 74 00 03 |Pl8...@..;.@@t..| see ar5416_eep_start_loc in ath9k's eeprom_def.c) |00001040 00 00 40 00 00 00 01 c2 60 34 00 44 00 00 ff ff |..@.....`4.D....| |00001050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| |* |00001200 0c b8 d5 cc e0 16 02 01 00 00 00 1f 00 21 b7 9b |.............!..| <- 2.4GHz ath9k's base_eep_header |00001210 81 c3 03 03 00 00 00 00 00 00 00 09 07 00 04 01 |................| |00001220 00 ff 02 00 00 01 00 00 00 fb 00 00 00 00 00 00 |................| |00001230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |............. |[...] |00005000 a5 5a 00 00 00 03 60 00 16 8c 00 29 60 08 00 01 |.Z....`....)`...| <- 5GHz WiFi OwlLoader boot code |00005010 02 80 60 2c 16 8c a0 94 50 00 16 8c 00 2a 50 08 |..`,....P....*P.| |00005020 00 01 02 80 50 2c 16 8c a0 94 50 64 0c c0 05 04 |....P,....Pd....| |00005030 50 6c 38 11 00 03 40 04 07 3b 00 40 40 74 00 03 |Pl8...@..;.@@t..| |00005040 00 00 40 00 00 00 01 c2 60 34 00 44 00 00 ff ff |..@.....`4.D....| |00005050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| |* |00005200 0c b8 27 40 e0 16 01 01 00 00 00 1f 00 21 b7 9b |..'@.........!..| <- 5GHz ath9k's base_eep_header |00005210 81 c3 03 03 00 00 00 00 00 00 00 09 07 00 04 00 |................| |00005220 01 ff 02 00 00 01 00 00 00 fe 00 00 00 00 00 00 |................| |00005230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| ### ath9k debug dumps of ath9k w/o patch (working) ### # cat /sys/kernel/debug/ieee80211/phy10/ath9k/base_eeprom // 2.4GHz Major Version : 14 Minor Version : 22 Checksum : 54732 Length : 3256 RegDomain1 : 0 RegDomain2 : 31 TX Mask : 3 RX Mask : 3 Allow 5GHz : 0 Allow 2GHz : 1 Disable 2GHz HT20 : 0 Disable 2GHz HT40 : 0 Disable 5Ghz HT20 : 0 Disable 5Ghz HT40 : 0 Big Endian : 1 Cal Bin Major Ver : 0 Cal Bin Minor Ver : 7 Cal Bin Build : 9 OpenLoop Power Ctrl : 0 MacAddress : 00:21:b7:9b:81:c3 <--- This looks like a valid MAC (Note: This MAC is not used. Netgear stores the MAC at a different location in the ART - not included in the extract above) # cat /sys/kernel/debug/ieee80211/phy11/ath9k // 5Ghz Major Version : 14 Minor Version : 22 Checksum : 10048 Length : 3256 RegDomain1 : 0 RegDomain2 : 31 TX Mask : 3 RX Mask : 3 Allow 5GHz : 1 Allow 2GHz : 0 Disable 2GHz HT20 : 0 Disable 2GHz HT40 : 0 Disable 5Ghz HT20 : 0 Disable 5Ghz HT40 : 0 Big Endian : 1 Cal Bin Major Ver : 0 Cal Bin Minor Ver : 7 Cal Bin Build : 9 OpenLoop Power Ctrl : 0 MacAddress : 00:21:b7:9b:81:c3 <--- This one too (it's the same)! # cat /sys/kernel/debug/ieee80211/phy0/ath9k/modal_eeprom [...] Ant. Common Control : 288 // That's 0x0120 [...] ### ath9k debug dumps of the ath9k with patch (Broken) ### # cat /sys/kernel/debug/ieee80211/phy0/ath9k/base_eeprom // This is the 2.4GHz Wifi, But now it says 5 GHz Major Version : 14 Minor Version : 22 Checksum : 54732 <--- checksum matches 2.4GHz @ 0x1200 (0xd5cc) Length : 3256 <--- this looks OK? Ok! RegDomain1 : 0 RegDomain2 : 31 TX Mask : 3 RX Mask : 3 Allow 5GHz : 1 <-- Says it's 5G! (0x1) Allow 2GHz : 0 Disable 2GHz HT20 : 0 Disable 2GHz HT40 : 0 Disable 5Ghz HT20 : 0 Disable 5Ghz HT40 : 0 Big Endian : 0 <-- Oh? what happend here? This should have been 1 (Is this why the 5GHz PHY worked?) Cal Bin Major Ver : 7 | My best guess is that the u16 swap ath9k does, swapped opCapFlags and eepmisc. | So opCapFlags's value of 0x2/AR5416_OPFLAGS_11G in eepMisc is not publicy defined/known. | while the eepMisc value of 0x1 (AR5416_OPFLAGS_11G) became "AR5416_OPFLAGS_11A" | (=aka 5GHz capable) . Cal Bin Minor Ver : 0 Cal Bin Build : 0 OpenLoop Power Ctrl : 1 MacAddress : 21:00:9b:b7:c3:81 <--- Multicast Bit set? This is bogus! # cat /sys/kernel/debug/ieee80211/phy0/ath9k/modal_eeprom [...] Ant. Common Control : 18874368 // That's 0x01 20 00 00 [...] Cheers, Christian
Christian Lamparter <chunkeey@gmail.com> writes: > On 4/15/23 18:02, Christian Lamparter wrote: >> Hi, >> >> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote: >>> Álvaro Fernández Rojas <noltari@gmail.com> writes: >>> >>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD >>>> partitions but it needs to be swapped in order to work, otherwise it fails: >>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002) >>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes. >>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0 >>>> ath: phy0: Unable to initialize hardware; initialization status: -22 >>>> ath9k 0000:00:01.0: Failed to initialize device >>>> ath9k: probe of 0000:00:01.0 failed with error -22 >>> >>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set >>> in the first place? Christian, care to comment on this? >> >> I knew this would come up. I've written what I know and remember in the >> pull-request/buglink. >> >> Maybe this can be added to the commit? >> Link: https://github.com/openwrt/openwrt/pull/12365 >> >> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom. >> >> Since the existing request_firmware eeprom fetcher code set the flag, >> the nvmem code had to do it too. >> >> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc. >> I don't know if there are devices out there, which have a swapped magic (which is >> used to detect the endianess), but the caldata is in the correct endiannes (or >> vice versa - Magic is correct, but data needs swapping). >> >> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220) >> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine. >> So I don't expect there to be a new issue there). > > Nope! This is a classic self-own!... Well at least, this now gets documented! > > Here are my findings. Please excuse the overlong lines. > > ## The good news / AVM FritzBox 7360v2 ## > > The good news: The AVM FritzBox 7360v2 worked the same as before. [...] > ## The not so good news / Netgear WNDR3700v2 ## > > But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 - > doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with: > "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0" [...] Alright, so IIUC, we have a situation where some devices only work *with* the flag, and some devices only work *without* the flag? So we'll need some kind of platform-specific setting? Could we put this in the device trees, or is there a better solution? -Toke
On 4/16/23 12:50, Toke Høiland-Jørgensen wrote: > Christian Lamparter <chunkeey@gmail.com> writes: > >> On 4/15/23 18:02, Christian Lamparter wrote: >>> Hi, >>> >>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote: >>>> Álvaro Fernández Rojas <noltari@gmail.com> writes: >>>> >>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD >>>>> partitions but it needs to be swapped in order to work, otherwise it fails: >>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002) >>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes. >>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0 >>>>> ath: phy0: Unable to initialize hardware; initialization status: -22 >>>>> ath9k 0000:00:01.0: Failed to initialize device >>>>> ath9k: probe of 0000:00:01.0 failed with error -22 >>>> >>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set >>>> in the first place? Christian, care to comment on this? >>> >>> I knew this would come up. I've written what I know and remember in the >>> pull-request/buglink. >>> >>> Maybe this can be added to the commit? >>> Link: https://github.com/openwrt/openwrt/pull/12365 >>> >>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom. >>> >>> Since the existing request_firmware eeprom fetcher code set the flag, >>> the nvmem code had to do it too. >>> >>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc. >>> I don't know if there are devices out there, which have a swapped magic (which is >>> used to detect the endianess), but the caldata is in the correct endiannes (or >>> vice versa - Magic is correct, but data needs swapping). >>> >>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220) >>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine. >>> So I don't expect there to be a new issue there). >> >> Nope! This is a classic self-own!... Well at least, this now gets documented! >> >> Here are my findings. Please excuse the overlong lines. >> >> ## The good news / AVM FritzBox 7360v2 ## >> >> The good news: The AVM FritzBox 7360v2 worked the same as before. > > [...] > >> ## The not so good news / Netgear WNDR3700v2 ## >> >> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 - >> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with: >> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0" > > [...] > > Alright, so IIUC, we have a situation where some devices only work > *with* the flag, and some devices only work *without* the flag? So we'll > need some kind of platform-specific setting? Could we put this in the > device trees, or is there a better solution? Depends. From what I gather, ath9k calls this "need_swap". Thing is, the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the official documentation about the AR9170 Base EEPROM (has the same base structure as AR5008 up to AR92xx) this is specified as: "Only bit 0 is defined as Big Endian. This bit should be written as 1 when the structure is interpreted in big Endian byte ordering. This bit must be reviewed before any larger than byte parameters can be interpreted." It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2), the caldata should be in "Big-Endian" too... so no swapping is necessary. Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals with this eepmisc flag: | if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) { | *swap_needed = true; | ath_dbg(common, EEPROM, | "Big Endian EEPROM detected according to EEPMISC register.\n"); | } else { | *swap_needed = false; | } This doesn't take into consideration that swapping is not needed if the data is in big endian format on a big endian device. So, this could be changed so that the *swap_needed is only true if the flag and device endiannes disagrees? That said, Martin and Felix have written their reasons in the cover letter and patches for why the code is what it is: <https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements> Toke, What's your take on this? Having something similar like the check_endian bool... but for OF? Or more logic that can somehow figure out if it's big or little endian. Cheers, Christian
El dom, 16 abr 2023 a las 15:37, Christian Lamparter (<chunkeey@gmail.com>) escribió: > > On 4/16/23 12:50, Toke Høiland-Jørgensen wrote: > > Christian Lamparter <chunkeey@gmail.com> writes: > > > >> On 4/15/23 18:02, Christian Lamparter wrote: > >>> Hi, > >>> > >>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote: > >>>> Álvaro Fernández Rojas <noltari@gmail.com> writes: > >>>> > >>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD > >>>>> partitions but it needs to be swapped in order to work, otherwise it fails: > >>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002) > >>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes. > >>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0 > >>>>> ath: phy0: Unable to initialize hardware; initialization status: -22 > >>>>> ath9k 0000:00:01.0: Failed to initialize device > >>>>> ath9k: probe of 0000:00:01.0 failed with error -22 > >>>> > >>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set > >>>> in the first place? Christian, care to comment on this? > >>> > >>> I knew this would come up. I've written what I know and remember in the > >>> pull-request/buglink. > >>> > >>> Maybe this can be added to the commit? > >>> Link: https://github.com/openwrt/openwrt/pull/12365 > >>> > >>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom. > >>> > >>> Since the existing request_firmware eeprom fetcher code set the flag, > >>> the nvmem code had to do it too. > >>> > >>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc. > >>> I don't know if there are devices out there, which have a swapped magic (which is > >>> used to detect the endianess), but the caldata is in the correct endiannes (or > >>> vice versa - Magic is correct, but data needs swapping). > >>> > >>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220) > >>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine. > >>> So I don't expect there to be a new issue there). > >> > >> Nope! This is a classic self-own!... Well at least, this now gets documented! > >> > >> Here are my findings. Please excuse the overlong lines. > >> > >> ## The good news / AVM FritzBox 7360v2 ## > >> > >> The good news: The AVM FritzBox 7360v2 worked the same as before. > > > > [...] > > > >> ## The not so good news / Netgear WNDR3700v2 ## > >> > >> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 - > >> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with: > >> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0" > > > > [...] > > > > Alright, so IIUC, we have a situation where some devices only work > > *with* the flag, and some devices only work *without* the flag? So we'll > > need some kind of platform-specific setting? Could we put this in the > > device trees, or is there a better solution? > > Depends. From what I gather, ath9k calls this "need_swap". Thing is, > the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the > official documentation about the AR9170 Base EEPROM (has the same base > structure as AR5008 up to AR92xx) this is specified as: > > "Only bit 0 is defined as Big Endian. This bit should be written as 1 > when the structure is interpreted in big Endian byte ordering. This bit > must be reviewed before any larger than byte parameters can be interpreted." > > It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2), > the caldata should be in "Big-Endian" too... so no swapping is necessary. > > Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals > with this eepmisc flag: > > | if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) { > | *swap_needed = true; > | ath_dbg(common, EEPROM, > | "Big Endian EEPROM detected according to EEPMISC register.\n"); > | } else { > | *swap_needed = false; > | } > > This doesn't take into consideration that swapping is not needed if > the data is in big endian format on a big endian device. So, this > could be changed so that the *swap_needed is only true if the flag and > device endiannes disagrees? > > That said, Martin and Felix have written their reasons in the cover letter > and patches for why the code is what it is: > <https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements> > > Toke, What's your take on this? Having something similar like the > check_endian bool... but for OF? Or more logic that can somehow > figure out if it's big or little endian. I already have v2 patches adding a new device tree flag, but I'll wait for Toke's answer before sending them. In my patches I added a new DT flag "qca,endian-check", which follows the one from ath9k_platform_data (endian_check). > > Cheers, > Christian Best regards, Álvaro.
> | if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) { > | *swap_needed = true; > | ath_dbg(common, EEPROM, > | "Big Endian EEPROM detected according to EEPMISC register.\n"); > | } else { > | *swap_needed = false; > | } > > This doesn't take into consideration that swapping is not needed if > the data is in big endian format on a big endian device. So, this > could be changed so that the *swap_needed is only true if the flag and > device endiannes disagrees? There are versions of the macro which performs the swap which understands the CPU endianness and become a NOP when it is not required. htons()/ntohs() are the classic examples. So you need to consider: Despite swap_needed being true, it is possible no swap it actually happening, because such a macro is being used. and Maybe using these variant can make the code simpler, by just doing the NOP swap when the CPU endianess does not require it. Andrew
Christian Lamparter <chunkeey@gmail.com> writes: > On 4/16/23 12:50, Toke Høiland-Jørgensen wrote: >> Christian Lamparter <chunkeey@gmail.com> writes: >> >>> On 4/15/23 18:02, Christian Lamparter wrote: >>>> Hi, >>>> >>>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote: >>>>> Álvaro Fernández Rojas <noltari@gmail.com> writes: >>>>> >>>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD >>>>>> partitions but it needs to be swapped in order to work, otherwise it fails: >>>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002) >>>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes. >>>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0 >>>>>> ath: phy0: Unable to initialize hardware; initialization status: -22 >>>>>> ath9k 0000:00:01.0: Failed to initialize device >>>>>> ath9k: probe of 0000:00:01.0 failed with error -22 >>>>> >>>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set >>>>> in the first place? Christian, care to comment on this? >>>> >>>> I knew this would come up. I've written what I know and remember in the >>>> pull-request/buglink. >>>> >>>> Maybe this can be added to the commit? >>>> Link: https://github.com/openwrt/openwrt/pull/12365 >>>> >>>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom. >>>> >>>> Since the existing request_firmware eeprom fetcher code set the flag, >>>> the nvmem code had to do it too. >>>> >>>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc. >>>> I don't know if there are devices out there, which have a swapped magic (which is >>>> used to detect the endianess), but the caldata is in the correct endiannes (or >>>> vice versa - Magic is correct, but data needs swapping). >>>> >>>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220) >>>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine. >>>> So I don't expect there to be a new issue there). >>> >>> Nope! This is a classic self-own!... Well at least, this now gets documented! >>> >>> Here are my findings. Please excuse the overlong lines. >>> >>> ## The good news / AVM FritzBox 7360v2 ## >>> >>> The good news: The AVM FritzBox 7360v2 worked the same as before. >> >> [...] >> >>> ## The not so good news / Netgear WNDR3700v2 ## >>> >>> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 - >>> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with: >>> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0" >> >> [...] >> >> Alright, so IIUC, we have a situation where some devices only work >> *with* the flag, and some devices only work *without* the flag? So we'll >> need some kind of platform-specific setting? Could we put this in the >> device trees, or is there a better solution? > > Depends. From what I gather, ath9k calls this "need_swap". Thing is, > the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the > official documentation about the AR9170 Base EEPROM (has the same base > structure as AR5008 up to AR92xx) this is specified as: > > "Only bit 0 is defined as Big Endian. This bit should be written as 1 > when the structure is interpreted in big Endian byte ordering. This bit > must be reviewed before any larger than byte parameters can be interpreted." > > It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2), > the caldata should be in "Big-Endian" too... so no swapping is necessary. > > Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals > with this eepmisc flag: > > | if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) { > | *swap_needed = true; > | ath_dbg(common, EEPROM, > | "Big Endian EEPROM detected according to EEPMISC register.\n"); > | } else { > | *swap_needed = false; > | } > > This doesn't take into consideration that swapping is not needed if > the data is in big endian format on a big endian device. So, this > could be changed so that the *swap_needed is only true if the flag and > device endiannes disagrees? > > That said, Martin and Felix have written their reasons in the cover letter > and patches for why the code is what it is: > <https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements> > > Toke, What's your take on this? Having something similar like the > check_endian bool... but for OF? Or more logic that can somehow > figure out if it's big or little endian. Digging into that old thread, it seems we are re-hashing a lot of the old discussion when those patches went in. Basically, the code you quoted above is correct because the commit that introduced it sets all fields to be __le16 and __le32 types and reads them using the leXX_to_cpu() macros. The code *further up* in that function is what is enabled by Alvaro's patch. Which is a different type of swapping (where the whole eeprom is swab16()'ed, not just the actual multi-byte data fields in them). However, in OpenWrt the in-driver code to do this is not used; instead, a hotplug script applies the swapping before the device is seen by the driver, as described in this commit[0]. Martin indeed mentions that this is a device-specific thing, so the driver can't actually do the right thing without some outside feature flag[1]. The commit[0] also indicates that there used used to exist a device-tree binding in the out-of-tree device trees used in OpenWrt to do the unconditional swab16(). The code in [0] still exists in OpenWrt today, albeit in a somewhat modified form[2]. I guess the question then boils down to, Álvaro, can your issue be resolved by a pre-processing step similar to that which is done in [2]? Or do we need the device tree flag after all? -Toke [0] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=afa37092663d00aa0abf8c61943d9a1b5558b144 [1] https://narkive.com/2q5A6nu0.34 [2] https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/lantiq/xway/base-files/etc/hotplug.d/firmware/12-ath9k-eeprom;h=98bb9af6947a298775ff7fa26ac6501c57df8378;hb=HEAD
Hi Toke, El dom, 16 abr 2023 a las 23:49, Toke Høiland-Jørgensen (<toke@toke.dk>) escribió: > > Christian Lamparter <chunkeey@gmail.com> writes: > > > On 4/16/23 12:50, Toke Høiland-Jørgensen wrote: > >> Christian Lamparter <chunkeey@gmail.com> writes: > >> > >>> On 4/15/23 18:02, Christian Lamparter wrote: > >>>> Hi, > >>>> > >>>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote: > >>>>> Álvaro Fernández Rojas <noltari@gmail.com> writes: > >>>>> > >>>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD > >>>>>> partitions but it needs to be swapped in order to work, otherwise it fails: > >>>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002) > >>>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes. > >>>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0 > >>>>>> ath: phy0: Unable to initialize hardware; initialization status: -22 > >>>>>> ath9k 0000:00:01.0: Failed to initialize device > >>>>>> ath9k: probe of 0000:00:01.0 failed with error -22 > >>>>> > >>>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set > >>>>> in the first place? Christian, care to comment on this? > >>>> > >>>> I knew this would come up. I've written what I know and remember in the > >>>> pull-request/buglink. > >>>> > >>>> Maybe this can be added to the commit? > >>>> Link: https://github.com/openwrt/openwrt/pull/12365 > >>>> > >>>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom. > >>>> > >>>> Since the existing request_firmware eeprom fetcher code set the flag, > >>>> the nvmem code had to do it too. > >>>> > >>>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc. > >>>> I don't know if there are devices out there, which have a swapped magic (which is > >>>> used to detect the endianess), but the caldata is in the correct endiannes (or > >>>> vice versa - Magic is correct, but data needs swapping). > >>>> > >>>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220) > >>>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine. > >>>> So I don't expect there to be a new issue there). > >>> > >>> Nope! This is a classic self-own!... Well at least, this now gets documented! > >>> > >>> Here are my findings. Please excuse the overlong lines. > >>> > >>> ## The good news / AVM FritzBox 7360v2 ## > >>> > >>> The good news: The AVM FritzBox 7360v2 worked the same as before. > >> > >> [...] > >> > >>> ## The not so good news / Netgear WNDR3700v2 ## > >>> > >>> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 - > >>> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with: > >>> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0" > >> > >> [...] > >> > >> Alright, so IIUC, we have a situation where some devices only work > >> *with* the flag, and some devices only work *without* the flag? So we'll > >> need some kind of platform-specific setting? Could we put this in the > >> device trees, or is there a better solution? > > > > Depends. From what I gather, ath9k calls this "need_swap". Thing is, > > the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the > > official documentation about the AR9170 Base EEPROM (has the same base > > structure as AR5008 up to AR92xx) this is specified as: > > > > "Only bit 0 is defined as Big Endian. This bit should be written as 1 > > when the structure is interpreted in big Endian byte ordering. This bit > > must be reviewed before any larger than byte parameters can be interpreted." > > > > It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2), > > the caldata should be in "Big-Endian" too... so no swapping is necessary. > > > > Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals > > with this eepmisc flag: > > > > | if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) { > > | *swap_needed = true; > > | ath_dbg(common, EEPROM, > > | "Big Endian EEPROM detected according to EEPMISC register.\n"); > > | } else { > > | *swap_needed = false; > > | } > > > > This doesn't take into consideration that swapping is not needed if > > the data is in big endian format on a big endian device. So, this > > could be changed so that the *swap_needed is only true if the flag and > > device endiannes disagrees? > > > > That said, Martin and Felix have written their reasons in the cover letter > > and patches for why the code is what it is: > > <https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements> > > > > Toke, What's your take on this? Having something similar like the > > check_endian bool... but for OF? Or more logic that can somehow > > figure out if it's big or little endian. > > Digging into that old thread, it seems we are re-hashing a lot of the > old discussion when those patches went in. Basically, the code you > quoted above is correct because the commit that introduced it sets all > fields to be __le16 and __le32 types and reads them using the > leXX_to_cpu() macros. > > The code *further up* in that function is what is enabled by Alvaro's > patch. Which is a different type of swapping (where the whole eeprom is > swab16()'ed, not just the actual multi-byte data fields in them). > However, in OpenWrt the in-driver code to do this is not used; instead, > a hotplug script applies the swapping before the device is seen by the > driver, as described in this commit[0]. Martin indeed mentions that this > is a device-specific thing, so the driver can't actually do the right > thing without some outside feature flag[1]. The commit[0] also indicates > that there used used to exist a device-tree binding in the out-of-tree > device trees used in OpenWrt to do the unconditional swab16(). > > The code in [0] still exists in OpenWrt today, albeit in a somewhat > modified form[2]. I guess the question then boils down to, Álvaro, can > your issue be resolved by a pre-processing step similar to that which is > done in [2]? Or do we need the device tree flag after all? TBH, yes, it can be solved by a pre-processing step similar to what's done in [2], but then having added nvmem support would make no sense at all for those devices that need swapping, since it's unusable without the flag. So, in my opinion the flag should be added in order to be able to use it without pre-processing the calibration data and to take advantage of nvmem support. I will send the v2 patch and even if it's not accepted I think I will add it as a downstream patch on OpenWrt... > > -Toke > > [0] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=afa37092663d00aa0abf8c61943d9a1b5558b144 > [1] https://narkive.com/2q5A6nu0.34 > [2] https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/lantiq/xway/base-files/etc/hotplug.d/firmware/12-ath9k-eeprom;h=98bb9af6947a298775ff7fa26ac6501c57df8378;hb=HEAD Best regards, Álvaro.
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c index 4f00400c7ffb..1314fbc55c0b 100644 --- a/drivers/net/wireless/ath/ath9k/init.c +++ b/drivers/net/wireless/ath/ath9k/init.c @@ -615,7 +615,6 @@ static int ath9k_nvmem_request_eeprom(struct ath_softc *sc) ah->nvmem_blob_len = len; ah->ah_flags &= ~AH_USE_EEPROM; - ah->ah_flags |= AH_NO_EEP_SWAP; return 0; }
BCM63xx (Big Endian MIPS) devices store the calibration data in MTD partitions but it needs to be swapped in order to work, otherwise it fails: ath9k 0000:00:01.0: enabling device (0000 -> 0002) ath: phy0: Ignoring endianness difference in EEPROM magic bytes. ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0 ath: phy0: Unable to initialize hardware; initialization status: -22 ath9k 0000:00:01.0: Failed to initialize device ath9k: probe of 0000:00:01.0 failed with error -22 Fixes: eb3a97a69be8 ("ath9k: fetch calibration data via nvmem subsystem") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> --- drivers/net/wireless/ath/ath9k/init.c | 1 - 1 file changed, 1 deletion(-)