Message ID | CAKv+Gu8LEFvdu99vCHbHGPmGLpHWdT+Wb2DECzBEEg8g_yynvg@mail.gmail.com |
---|---|
State | New |
On 5 November 2014 11:54, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> Hi Will,
>

You are not going to believe this, but I received a report about boot failures on legacy (non-EFI) x86 using the SMBIOS 3.0 code, so could you please hold off on pulling this until we know what is going on there? This does not affect ARM at all, but I may need to update patch 'dmi: add support for SMBIOS 3.0 64-bit entry point' to drop the SMBIOS 3.0 check in the non-EFI code path.
On Fri, Nov 07, 2014 at 07:34:12AM +0000, Ard Biesheuvel wrote:
> You are not going to believe this, but I received a report about boot
> failures on legacy (non-EFI) x86 using the SMBIOS 3.0 code, so could
> you please hold off on pulling this until we know what is going on
> there? This does not affect ARM at all, but I may need to update patch
> 'dmi: add support for SMBIOS 3.0 64-bit entry point' to drop the
> SMBIOS 3.0 check in the non-EFI code path.

D'oh, I put this into -next last night (perhaps that's where the report came from), so I'll recreate that branch now without your series.

Please send another pull request without the SMBIOS patch, then we can perhaps add that on top later on (that way it's easier to revert in the future :).

Will
On Fri, Nov 07, 2014 at 10:04:35AM +0000, Will Deacon wrote:
> Please send another pull request without the SMBIOS patch, then we can
> perhaps add that on top later on (that way it's easier to revert in the
> future :).

You have a typo: s/perhaps/never/ ;)
On 7 November 2014 11:04, Will Deacon <will.deacon@arm.com> wrote:
> D'oh, I put this into -next last night (perhaps that's where the report came
> from), so I'll recreate that branch now without your series.

Ah, yes, that is probably where they found it. Note that this is a different patch from the one that has been causing us (me) grief before, but apparently it has now infected other patches as well :-)

> Please send another pull request without the SMBIOS patch, then we can
> perhaps add that on top later on (that way it's easier to revert in the
> future :).

In the meantime, we have pinpointed this. It turns out that on x86, the RHS of this expression

    u64 dmi_base = get_unaligned_le32(buf + 8);

is promoted to a signed type before being assigned, even though get_unaligned_le32() returns u32. On ARM, it works as expected.

I have confirmation from Matt and another Intel engineer that adding a (arguably redundant) 'u32' cast solves the issue.

Should I include the corrected patch and send an updated pull request? Or, if you and Matt prefer, we could take this patch and the preceding one (efi: dmi: add support for SMBIOS 3.0 UEFI configuration table) through Matt's tree as well. There are no interdependencies between those two and the other patches.
On Fri, 2014-11-07 at 11:29 +0100, Ard Biesheuvel wrote:
>
> I have confirmation from Matt and another Intel engineer that adding a
> (arguably redundant) 'u32' cast solves the issue.

I think we need to get a better handle on this.

It is completely surprising to me that type promotion from u32 to u64 goes through sign extension.

And by "surprising" I mean, it sounds wrong.

Ard, if you could throw me a unified diff of objdump -dr vmlinux, with and without the u32 cast, I'll take a look at figuring out what's happening.
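Matt's expectation matches the C conversion rules. As a minimal C11 sketch (the helpers `widen_u32()` and `widen_s32()` are hypothetical, written only for illustration): converting a `uint32_t` to `uint64_t` preserves the value, so the upper 32 bits are always zero, whereas converting a negative 32-bit signed value to `uint64_t` reduces it modulo 2^64, which has the same effect as sign extension.

```c
#include <assert.h>
#include <stdint.h>

/* u32 -> u64 is value-preserving: the upper 32 bits are always zero,
 * whatever the state of bit 31 in the source. */
static uint64_t widen_u32(uint32_t v)
{
	return v;
}

/* A negative signed value converted to u64 is reduced modulo 2^64,
 * which gives the same bit pattern as sign-extending it (what a
 * sign-extending instruction such as x86-64 'cltq' produces). */
static uint64_t widen_s32(int32_t v)
{
	return (uint64_t)v;
}

/* Compile-time checks of both rules. */
static_assert((uint64_t)(uint32_t)0x80000000u == UINT64_C(0x80000000),
	      "u32 -> u64 zero-extends");
static_assert((uint64_t)(int32_t)INT32_MIN == UINT64_C(0xFFFFFFFF80000000),
	      "negative s32 -> u64 behaves like sign extension");
```

In other words, only an operand that is already signed can end up sign-extended; the `u64` target type on its own never causes it.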
On 7 November 2014 11:48, Matt Fleming <matt.fleming@intel.com> wrote:
> Ard, if you could throw me a unified diff of objdump -dr vmlinux, with
> and without the u32 cast, I'll take a look at figuring out what's
> happening.

With my compiler

    gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu9)

the objdump of drivers/firmware/dmi_scan.o is identical with and without the 'u32' cast.

If I look at the original code:

    dmi_base = (buf[11] << 24) | (buf[10] << 16) | (buf[9] << 8) | buf[8];

I see a 'cltq' instruction in the dump that disappears once I add the 'u32' cast (which is what we addressed by introducing the get_unaligned_le32() in the 1st place).

So, I'd happily share more objdumps, but perhaps we should find out first which compiler Yuanhan has been using?
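The 'cltq' is consistent with C's integer promotion rules: each `u8` operand is promoted to `int` before the shift, so the whole open-coded expression has type `int`, and a negative `int` assigned to a `u64` is sign-extended. The C11 sketch below checks the expression types at compile time with `_Generic`. `HAS_TYPE` and `dmi_base_fixed()` are hypothetical names, and widening each byte to `uint32_t` before shifting is just one way to keep the arithmetic unsigned, similar in effect to using get_unaligned_le32().

```c
#include <assert.h>
#include <stdint.h>

/* Evaluates to 1 if EXPR has exactly type T (C11 generic selection). */
#define HAS_TYPE(expr, T) _Generic((expr), T: 1, default: 0)

/* A uint8_t operand is promoted to (signed) int before the shift, so the
 * open-coded expression is computed in int. With buf[11] >= 0x80 the
 * shift overflows int, which is undefined behaviour in C; the 'cltq'
 * seen in the dump matches GCC treating the result as a negative int
 * and sign-extending it on assignment to a u64. */
static_assert(HAS_TYPE((uint8_t)0 << 24, int),
	      "u8 << 24 is computed in int");

/* Widening each byte to uint32_t first keeps the arithmetic unsigned. */
static_assert(HAS_TYPE((uint32_t)(uint8_t)0 << 24, uint32_t),
	      "(u32)u8 << 24 is computed in u32");

static uint64_t dmi_base_fixed(const uint8_t *buf)
{
	/* uint32_t result, zero-extended on return to uint64_t */
	return ((uint32_t)buf[11] << 24) | ((uint32_t)buf[10] << 16) |
	       ((uint32_t)buf[9] << 8) | buf[8];
}
```

With bytes 8..11 set to 0x78 0x56 0x34 0x92, `dmi_base_fixed()` returns 0x92345678 with the upper 32 bits clear, even though bit 31 is set.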
On Fri, Nov 07, 2014 at 12:06:02PM +0100, Ard Biesheuvel wrote:
> If I look at the original code:
>
> dmi_base = (buf[11] << 24) | (buf[10] << 16) | (buf[9] << 8) | buf[8];
>
> I see a 'cltq' instruction in the dump that disappears once I add the
> 'u32' cast (which is what we addressed by introducing the
> get_unaligned_le32() in the 1st place)

Yes, that's another mistake I made :(

The story is that LKP reported this issue against an old commit, aacdce6e880894acb57d71dcb2e3fc61b4ed4e96 ("dmi: add support for SMBIOS 3.0 64-bit entry point"), which still uses the original code you showed above:

    dmi_num = (buf[13] << 8) | buf[12];
    dmi_len = (buf[7] << 8) | buf[6];
    dmi_base = (buf[11] << 24) | (buf[10] << 16) |
               (buf[9] << 8) | buf[8];

I didn't expect there to be two versions, so when the debug patch you gave me failed to apply, I assumed you had written it on top of your branch, and I applied it on top of the first bad commit ("aacdce6e"). I fixed it up manually and got your debug patch applied, but the original code was still there, so it never worked.

Then you asked me to add the (u32) cast, which assumes get_unaligned_le32(). I changed the original code to use get_unaligned_le16/32(), added the cast, and it worked as expected, so I thought the cast had fixed it. Actually, it was the change to get_unaligned_le16/32() that fixed it: I double-checked this time, and that's why I didn't see the panic with your updated commits from efi-for-arm64 (the code we bisected was from efi-for-3.19).

So, these were all stupid mistakes on my part. Very sorry for the noise and for taking up you guys' time!

--yliu
On 7 November 2014 12:24, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> So, it's totally kind of stupid mistakes I made. Very sorry for the
> noise and for taking you guys time!

No worries. At the very least, we are now a bit more confident than before that the code is fine.

Thanks,
Ard.