mbox series

[v2,0/3] The UEFI panic notification mechanism, 2nd round

Message ID 20220729194532.228403-1-gpiccoli@igalia.com
Headers show
Series The UEFI panic notification mechanism, 2nd round | expand

Message

Guilherme G. Piccoli July 29, 2022, 7:45 p.m. UTC
Hey folks, this is the 2nd iteration of the patchset adding a simple
mechanism to notify the UEFI firmware about a panic event in the kernel.
V1 here:
https://lore.kernel.org/lkml/20220520195028.1347426-1-gpiccoli@igalia.com/


First thing, the differences in this V2:

- Ardb response in V1 mentioned a refactor aimed for v5.20 that removes an
obsolete/confusing way of setting EFI variables - this led to a weird
condition, deleted variables stayed in sysfs after deletion. Well, I've
refactored the code based on efi/next, so I'm using the recommended API
now - thanks a bunch for the advice Ardb!

- I've changed NULL-terminating char in patch 1 to the format I've seen
in Ardb's code, L'\0'.

- Patch 2 is new, it's somewhat a fix for a patch only in efi/next, part
of the efivar refactor.


In the V1 review, it was mentioned we could maybe use efi-pstore as a way
to signal the firmware about a panic event - in the end, the efi-pstore
mechanism can collect a dmesg, so it's even richer in the information level.
But I disagree that it is the way to go, for 3 main reasons:

a) efi-pstore could be impossible to use, if the users are already using
another pstore backend (like ramoops), which is _exactly_ our case!
Of course, we could rework pstore and allow 2 backends, quite a bit of work,
but...see next points!

b) Even if (a) is a not an issue, we have another one, even more important:
signaling the firmware about a panic is *different* than collecting a bunch
of data, a full dmesg even. This could be considered a security issue for
some users; also, the dmesg collected consumes a bunch more memory in the
(potentially scarce) UEFI available memory.
Although related, the goal of pstore is orthogonal to our mechanism here:
users rely on pstore to collect data, our proposal is a simple infrastructure
to just let the firmware know about a panic. Our kernel module also shows a
message and automatically clears the UEFI variable, so it tracks a single
panic, whereas efi-pstore logs are kept by default, in order to provide
data to users.

c) Finally, it's faster and less "invasive"/risky to just write a byte in a
variable on a panic event than having a ksmg dumper collecting the full dmesg
and writing it to the UEFI memory; again, some users wish to have the logs,
but not all of them.


With all of that said, I think this module makes sense, it's a very simple
solution that opens doors to firmware panic handling approaches, like in our
proposed case (a different splash screen on panic).

Finally, the variable name (PanicWarn) and value (0xFF by default, can be
changed by a module parameter) are just my personal choices but I'm open to
suggestions, not strongly attached to them heh

Thanks again for the reviews/suggestions!
Cheers,


Guilherme


Guilherme G. Piccoli (3):
  efi: Add a generic helper to convert strings to unicode
  efi: efibc: Guard against allocation failure
  efi-panic: Introduce the UEFI panic notification mechanism

 drivers/firmware/efi/Kconfig      | 10 ++++
 drivers/firmware/efi/Makefile     |  1 +
 drivers/firmware/efi/efi-panic.c  | 89 +++++++++++++++++++++++++++++++
 drivers/firmware/efi/efi-pstore.c |  4 +-
 drivers/firmware/efi/efibc.c      | 11 ++--
 include/linux/efi.h               | 18 +++++++
 6 files changed, 125 insertions(+), 8 deletions(-)
 create mode 100644 drivers/firmware/efi/efi-panic.c

Comments

Guilherme G. Piccoli Aug. 29, 2022, 1:04 p.m. UTC | #1
On 29/07/2022 16:45, Guilherme G. Piccoli wrote:
> Hey folks, this is the 2nd iteration of the patchset adding a simple
> mechanism to notify the UEFI firmware about a panic event in the kernel.
> V1 here:
> https://lore.kernel.org/lkml/20220520195028.1347426-1-gpiccoli@igalia.com/
> 
> 
> First thing, the differences in this V2:
> 
> - Ardb response in V1 mentioned a refactor aimed for v5.20 that removes an
> obsolete/confusing way of setting EFI variables - this led to a weird
> condition, deleted variables stayed in sysfs after deletion. Well, I've
> refactored the code based on efi/next, so I'm using the recommended API
> now - thanks a bunch for the advice Ardb!
> 
> - I've changed NULL-terminating char in patch 1 to the format I've seen
> in Ardb's code, L'\0'.
> 
> - Patch 2 is new, it's somewhat a fix for a patch only in efi/next, part
> of the efivar refactor.
> 
> 
> In the V1 review, it was mentioned we could maybe use efi-pstore as a way
> to signal the firmware about a panic event - in the end, the efi-pstore
> mechanism can collect a dmesg, so it's even richer in the information level.
> But I disagree that it is the way to go, for 3 main reasons:
> 
> a) efi-pstore could be impossible to use, if the users are already using
> another pstore backend (like ramoops), which is _exactly_ our case!
> Of course, we could rework pstore and allow 2 backends, quite a bit of work,
> but...see next points!
> 
> b) Even if (a) is a not an issue, we have another one, even more important:
> signaling the firmware about a panic is *different* than collecting a bunch
> of data, a full dmesg even. This could be considered a security issue for
> some users; also, the dmesg collected consumes a bunch more memory in the
> (potentially scarce) UEFI available memory.
> Although related, the goal of pstore is orthogonal to our mechanism here:
> users rely on pstore to collect data, our proposal is a simple infrastructure
> to just let the firmware know about a panic. Our kernel module also shows a
> message and automatically clears the UEFI variable, so it tracks a single
> panic, whereas efi-pstore logs are kept by default, in order to provide
> data to users.
> 
> c) Finally, it's faster and less "invasive"/risky to just write a byte in a
> variable on a panic event than having a ksmg dumper collecting the full dmesg
> and writing it to the UEFI memory; again, some users wish to have the logs,
> but not all of them.
> 
> 
> With all of that said, I think this module makes sense, it's a very simple
> solution that opens doors to firmware panic handling approaches, like in our
> proposed case (a different splash screen on panic).
> 
> Finally, the variable name (PanicWarn) and value (0xFF by default, can be
> changed by a module parameter) are just my personal choices but I'm open to
> suggestions, not strongly attached to them heh
> 
> Thanks again for the reviews/suggestions!
> Cheers,
> 
> 
> Guilherme
> 
> 

Hi Ard, sorry for the ping =]

Any opinions in this one? Patch 2 is a simple fix, BTW.
Cheers,


Guilherme
Ard Biesheuvel Sept. 5, 2022, 9:50 a.m. UTC | #2
On Mon, 29 Aug 2022 at 15:04, Guilherme G. Piccoli <gpiccoli@igalia.com> wrote:
>
> On 29/07/2022 16:45, Guilherme G. Piccoli wrote:
> > Hey folks, this is the 2nd iteration of the patchset adding a simple
> > mechanism to notify the UEFI firmware about a panic event in the kernel.
> > V1 here:
> > https://lore.kernel.org/lkml/20220520195028.1347426-1-gpiccoli@igalia.com/
> >
> >
> > First thing, the differences in this V2:
> >
> > - Ardb response in V1 mentioned a refactor aimed for v5.20 that removes an
> > obsolete/confusing way of setting EFI variables - this led to a weird
> > condition, deleted variables stayed in sysfs after deletion. Well, I've
> > refactored the code based on efi/next, so I'm using the recommended API
> > now - thanks a bunch for the advice Ardb!
> >
> > - I've changed NULL-terminating char in patch 1 to the format I've seen
> > in Ardb's code, L'\0'.
> >
> > - Patch 2 is new, it's somewhat a fix for a patch only in efi/next, part
> > of the efivar refactor.
> >
> >
> > In the V1 review, it was mentioned we could maybe use efi-pstore as a way
> > to signal the firmware about a panic event - in the end, the efi-pstore
> > mechanism can collect a dmesg, so it's even richer in the information level.
> > But I disagree that it is the way to go, for 3 main reasons:
> >
> > a) efi-pstore could be impossible to use, if the users are already using
> > another pstore backend (like ramoops), which is _exactly_ our case!
> > Of course, we could rework pstore and allow 2 backends, quite a bit of work,
> > but...see next points!
> >
> > b) Even if (a) is a not an issue, we have another one, even more important:
> > signaling the firmware about a panic is *different* than collecting a bunch
> > of data, a full dmesg even. This could be considered a security issue for
> > some users; also, the dmesg collected consumes a bunch more memory in the
> > (potentially scarce) UEFI available memory.
> > Although related, the goal of pstore is orthogonal to our mechanism here:
> > users rely on pstore to collect data, our proposal is a simple infrastructure
> > to just let the firmware know about a panic. Our kernel module also shows a
> > message and automatically clears the UEFI variable, so it tracks a single
> > panic, whereas efi-pstore logs are kept by default, in order to provide
> > data to users.
> >
> > c) Finally, it's faster and less "invasive"/risky to just write a byte in a
> > variable on a panic event than having a ksmg dumper collecting the full dmesg
> > and writing it to the UEFI memory; again, some users wish to have the logs,
> > but not all of them.
> >
> >
> > With all of that said, I think this module makes sense, it's a very simple
> > solution that opens doors to firmware panic handling approaches, like in our
> > proposed case (a different splash screen on panic).
> >
> > Finally, the variable name (PanicWarn) and value (0xFF by default, can be
> > changed by a module parameter) are just my personal choices but I'm open to
> > suggestions, not strongly attached to them heh
> >
> > Thanks again for the reviews/suggestions!
> > Cheers,
> >
> >
> > Guilherme
> >
> >
>
> Hi Ard, sorry for the ping =]
>
> Any opinions in this one? Patch 2 is a simple fix, BTW.

Hey,

No worries about the ping - apologies for the late response, I was on vacation.

I still don't see the point of this series, but I will take the fix if
you could please rebase it so it doesn't depend on the first patch.

Thanks,
Ard.
Guilherme G. Piccoli Sept. 9, 2022, 7:54 p.m. UTC | #3
On 05/09/2022 06:50, Ard Biesheuvel wrote:
> [...]
>> Hi Ard, sorry for the ping =]
>>
>> Any opinions in this one? Patch 2 is a simple fix, BTW.
> 
> Hey,
> 
> No worries about the ping - apologies for the late response, I was on vacation.

Hi Ard, no need for apologies at all - this is a known vacation period
in the year =)


> 
> I still don't see the point of this series, but I will take the fix if
> you could please rebase it so it doesn't depend on the first patch.
> 

That part is sad heheh
I mean, it's a pity I could convince you - I still don't see how can
kernel let UEFI know about a panic without this patch (or pstore, which
I still think is the wrong way and sometimes impossible to use, due to
other pstore backend usage).

Anyway, nothing I can do about that it seems, unfortunately heheh

Submitted the V2 fix here:
https://lore.kernel.org/linux-efi/20220909194214.186731-1-gpiccoli@igalia.com/

Thanks,


Guilherme