Message ID | 20220427224924.592546-5-gpiccoli@igalia.com |
---|---|
State | New |
Headers | show |
Series | The panic notifiers refactor | expand |
On Wed, Apr 27, 2022 at 3:51 PM Guilherme G. Piccoli <gpiccoli@igalia.com> wrote: > > Currently the gsmi driver registers a panic notifier as well as > reboot and die notifiers. The callbacks registered are called in > atomic and very limited context - for instance, panic disables > preemption, local IRQs and all other CPUs that aren't running the > current panic function. > > With that said, taking a spinlock in this scenario is a > dangerous invitation for a deadlock scenario. So, we fix > that in this commit by changing the regular spinlock with > a trylock, which is a safer approach. > > Fixes: 74c5b31c6618 ("driver: Google EFI SMI") > Cc: Ard Biesheuvel <ardb@kernel.org> > Cc: David Gow <davidgow@google.com> > Cc: Evan Green <evgreen@chromium.org> > Cc: Julius Werner <jwerner@chromium.org> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > --- > drivers/firmware/google/gsmi.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c > index adaa492c3d2d..b01ed02e4a87 100644 > --- a/drivers/firmware/google/gsmi.c > +++ b/drivers/firmware/google/gsmi.c > @@ -629,7 +629,10 @@ static int gsmi_shutdown_reason(int reason) > if (saved_reason & (1 << reason)) > return 0; > > - spin_lock_irqsave(&gsmi_dev.lock, flags); > + if (!spin_trylock_irqsave(&gsmi_dev.lock, flags)) { > + rc = -EBUSY; > + goto out; > + } gsmi_shutdown_reason() is a common function called in other scenarios as well, like reboot and thermal trip, where it may still make sense to wait to acquire a spinlock. Maybe we should add a parameter to gsmi_shutdown_reason() so that you can get your change on panic, but we don't convert other callbacks into try-fail scenarios causing us to miss logs. Though thinking more about it, is this really a Good Change (TM)? The spinlock itself already disables interrupts, meaning the only case where this change makes a difference is if the panic happens from within the function that grabbed the spinlock (in which case the callback is also likely to panic), or in an NMI that panics within that window. The downside of this change is that if one core was politely working through an event with the lock held, and another core panics, we now might lose the panic log, even though it probably would have gone through fine assuming the other core has a chance to continue. -Evan
On 03/05/2022 15:03, Evan Green wrote: > [...] > gsmi_shutdown_reason() is a common function called in other scenarios > as well, like reboot and thermal trip, where it may still make sense > to wait to acquire a spinlock. Maybe we should add a parameter to > gsmi_shutdown_reason() so that you can get your change on panic, but > we don't convert other callbacks into try-fail scenarios causing us to > miss logs. > Hi Evan, thanks for your feedback, much appreciated! What I've done in other cases like this was to have a helper checking the spinlock in the panic notifier - if we can acquire that, go ahead but if not, bail out. For a proper example of an implementation, check patch 13 of the series: https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ . Do you agree with that, or prefer really a parameter in gsmi_shutdown_reason() ? I'll follow your choice =) > Though thinking more about it, is this really a Good Change (TM)? The > spinlock itself already disables interrupts, meaning the only case > where this change makes a difference is if the panic happens from > within the function that grabbed the spinlock (in which case the > callback is also likely to panic), or in an NMI that panics within > that window. The downside of this change is that if one core was > politely working through an event with the lock held, and another core > panics, we now might lose the panic log, even though it probably would > have gone through fine assuming the other core has a chance to > continue. My feeling is that this is a good change, indeed - a lot of places are getting changed like this, in this series. Reasoning: the problem with your example is that, by default, secondary CPUs are disabled in the panic path, through an IPI mechanism. IPIs take precedence and interrupt the work in these CPUs, effectively interrupting the "polite work" with the lock held heh Then, such CPU is put to sleep and we finally reach the panic notifier hereby discussed, in the main CPU. If the other CPU was shut-off *with the lock held*, it's never finishing such work, so the lock is never to be released. Conclusion: the spinlock can't be acquired, hence we broke the machine (which is already broken, given it's panic) in the path of this notifier. This should be really rare, but..possible. So I think we should protect against this scenario. We can grab others' feedback if you prefer, and of course you have the rights to refuse this change in the gsmi code, but from my point-of-view, I don't see any advantage in just assume the risk, specially since the change is very very simple. Cheers, Guilherme
Hi Guilherme, On Tue, May 3, 2022 at 12:12 PM Guilherme G. Piccoli <gpiccoli@igalia.com> wrote: > > On 03/05/2022 15:03, Evan Green wrote: > > [...] > > gsmi_shutdown_reason() is a common function called in other scenarios > > as well, like reboot and thermal trip, where it may still make sense > > to wait to acquire a spinlock. Maybe we should add a parameter to > > gsmi_shutdown_reason() so that you can get your change on panic, but > > we don't convert other callbacks into try-fail scenarios causing us to > > miss logs. > > > > Hi Evan, thanks for your feedback, much appreciated! > What I've done in other cases like this was to have a helper checking > the spinlock in the panic notifier - if we can acquire that, go ahead > but if not, bail out. For a proper example of an implementation, check > patch 13 of the series: > https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ . > > Do you agree with that, or prefer really a parameter in > gsmi_shutdown_reason() ? I'll follow your choice =) I'm fine with either, thanks for the link. Mostly I want to make sure other paths to gsmi_shutdown_reason() aren't also converted to a try. > > > > Though thinking more about it, is this really a Good Change (TM)? The > > spinlock itself already disables interrupts, meaning the only case > > where this change makes a difference is if the panic happens from > > within the function that grabbed the spinlock (in which case the > > callback is also likely to panic), or in an NMI that panics within > > that window. The downside of this change is that if one core was > > politely working through an event with the lock held, and another core > > panics, we now might lose the panic log, even though it probably would > > have gone through fine assuming the other core has a chance to > > continue. > > My feeling is that this is a good change, indeed - a lot of places are > getting changed like this, in this series. > > Reasoning: the problem with your example is that, by default, secondary > CPUs are disabled in the panic path, through an IPI mechanism. IPIs take > precedence and interrupt the work in these CPUs, effectively > interrupting the "polite work" with the lock held heh The IPI can only interrupt a CPU with irqs disabled if the IPI is an NMI. I haven't looked before to see if we use NMI IPIs to corral the other CPUs on panic. On x86, I grepped my way down to native_stop_other_cpus(), which looks like it does a normal IPI, waits 1 second, then does an NMI IPI. So, if a secondary CPU has the lock held, on x86 it has roughly 1s to finish what it's doing and re-enable interrupts before smp_send_stop() brings the NMI hammer down. I think this should be more than enough time for the secondary CPU to get out and release the lock. So then it makes sense to me that you're fixing cases where we panicked with the lock held, or hung with the lock held. Given the 1 second grace period x86 gives us, I'm on board, as that helps mitigate the risk that we bailed out early with the try and should have spun a bit longer instead. Thanks. -Evan > > Then, such CPU is put to sleep and we finally reach the panic notifier > hereby discussed, in the main CPU. If the other CPU was shut-off *with > the lock held*, it's never finishing such work, so the lock is never to > be released. Conclusion: the spinlock can't be acquired, hence we broke > the machine (which is already broken, given it's panic) in the path of > this notifier. > This should be really rare, but..possible. So I think we should protect > against this scenario. > > We can grab others' feedback if you prefer, and of course you have the > rights to refuse this change in the gsmi code, but from my > point-of-view, I don't see any advantage in just assume the risk, > specially since the change is very very simple. > > Cheers, > > > Guilherme
On 03/05/2022 18:56, Evan Green wrote: > Hi Guilherme, > [...] >> Do you agree with that, or prefer really a parameter in >> gsmi_shutdown_reason() ? I'll follow your choice =) > > I'm fine with either, thanks for the link. Mostly I want to make sure > other paths to gsmi_shutdown_reason() aren't also converted to a try. Hi Evan, thanks for the prompt response! So, I'll proceed like I did in s390, for consistency. > [...] >> Reasoning: the problem with your example is that, by default, secondary >> CPUs are disabled in the panic path, through an IPI mechanism. IPIs take >> precedence and interrupt the work in these CPUs, effectively >> interrupting the "polite work" with the lock held heh > > The IPI can only interrupt a CPU with irqs disabled if the IPI is an > NMI. I haven't looked before to see if we use NMI IPIs to corral the > other CPUs on panic. On x86, I grepped my way down to > native_stop_other_cpus(), which looks like it does a normal IPI, waits > 1 second, then does an NMI IPI. So, if a secondary CPU has the lock > held, on x86 it has roughly 1s to finish what it's doing and re-enable > interrupts before smp_send_stop() brings the NMI hammer down. I think > this should be more than enough time for the secondary CPU to get out > and release the lock. > > So then it makes sense to me that you're fixing cases where we > panicked with the lock held, or hung with the lock held. Given the 1 > second grace period x86 gives us, I'm on board, as that helps mitigate > the risk that we bailed out early with the try and should have spun a > bit longer instead. Thanks. > > -Evan Well, in the old path without "crash_kexec_post_notifiers", we indeed end-up relying on native_stop_other_cpus() for x86 as you said, and the "1s rule" makes sense. But after this series (or even before, if the kernel parameter "crash_kexec_post_notifiers" was used) the function used to stop CPUs in the panic path is crash_smp_send_stop(), and the call chain is like: Main CPU: crash_smp_send_stop() --kdump_nmi_shootdown_cpus() ----nmi_shootdown_cpus() Then, in each CPU (except the main one, running panic() path), we execute kdump_nmi_callback() in NMI context. So, we seem to indeed interrupt any context (even with IRQs disabled), increasing the likelihood of the potential lockups due to stopped CPUs holding the locks heheh Thanks again for the good discussion, let me know if anything I'm saying doesn't make sense - this crash path is a bit convoluted, specially in x86, I might have understood something wrongly =) Cheers, Guilherme
On Tue 2022-05-03 16:12:09, Guilherme G. Piccoli wrote: > On 03/05/2022 15:03, Evan Green wrote: > > [...] > > gsmi_shutdown_reason() is a common function called in other scenarios > > as well, like reboot and thermal trip, where it may still make sense > > to wait to acquire a spinlock. Maybe we should add a parameter to > > gsmi_shutdown_reason() so that you can get your change on panic, but > > we don't convert other callbacks into try-fail scenarios causing us to > > miss logs. > > > > Hi Evan, thanks for your feedback, much appreciated! > What I've done in other cases like this was to have a helper checking > the spinlock in the panic notifier - if we can acquire that, go ahead > but if not, bail out. For a proper example of an implementation, check > patch 13 of the series: > https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ . > > Do you agree with that, or prefer really a parameter in > gsmi_shutdown_reason() ? I'll follow your choice =) I see two more alternative solutions: 1st variant is a trick already used in console write() callbacks. They do trylock() when oops_in_progress is set. They remember the result to prevent double unlock when printing Oops messages and the system will try to continue working. For example: pl011_console_write(struct console *co, const char *s, unsigned int count) { [...] int locked = 1; [...] if (uap->port.sysrq) locked = 0; else if (oops_in_progress) locked = spin_trylock(&uap->port.lock); else spin_lock(&uap->port.lock); [...] if (locked) spin_unlock(&uap->port.lock); } 2nd variant is to check panic_cpu variable. It is used in printk.c. We might move the function to panic.h: static bool panic_in_progress(void) { return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID); } and then do: if (panic_in_progress()) { ... > > Though thinking more about it, is this really a Good Change (TM)? The > > spinlock itself already disables interrupts, meaning the only case > > where this change makes a difference is if the panic happens from > > within the function that grabbed the spinlock (in which case the > > callback is also likely to panic), or in an NMI that panics within > > that window. As already mentioned in the other reply, panic() sometimes stops the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus(). Another situation is when the CPU using the lock ends in some infinite loop because something went wrong. The system is in an unpredictable state during panic(). I am not sure if this is possible with the code under gsmi_dev.lock but such things really happen during panic() in other subsystems. Using trylock in the panic() code path is a good practice. Best Regards, Petr
On 10/05/2022 08:38, Petr Mladek wrote: > [...] > I see two more alternative solutions: > > 1st variant is a trick already used in console write() callbacks. > They do trylock() when oops_in_progress is set. They remember > the result to prevent double unlock when printing Oops messages and > the system will try to continue working. For example: > > pl011_console_write(struct console *co, const char *s, unsigned int count) > { > [...] > int locked = 1; > [...] > if (uap->port.sysrq) > locked = 0; > else if (oops_in_progress) > locked = spin_trylock(&uap->port.lock); > else > spin_lock(&uap->port.lock); > > [...] > > if (locked) > spin_unlock(&uap->port.lock); > } > > > 2nd variant is to check panic_cpu variable. It is used in printk.c. > We might move the function to panic.h: > > static bool panic_in_progress(void) > { > return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID); > } > > and then do: > > if (panic_in_progress()) { > ... Thanks for the review Petr! I feel alternative two is way better, it checks for panic - the oops_in_progress isn't really enough, since we can call panic() directly, not necessarily through an oops path, correct? For me, we could stick with the lock check, but I'll defer to Evan - I didn't work the V2 patch yet, what do you prefer Evan? > [...] > As already mentioned in the other reply, panic() sometimes stops > the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus(). > > Another situation is when the CPU using the lock ends in some > infinite loop because something went wrong. The system is in > an unpredictable state during panic(). > > I am not sure if this is possible with the code under gsmi_dev.lock > but such things really happen during panic() in other subsystems. > Using trylock in the panic() code path is a good practice. > > Best Regards, > Petr Makes total sense, thanks for confirming! Cheers, Guilherme
On Tue, 10 May 2022 13:38:39 +0200 Petr Mladek <pmladek@suse.com> wrote: > As already mentioned in the other reply, panic() sometimes stops > the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus(). > > Another situation is when the CPU using the lock ends in some > infinite loop because something went wrong. The system is in > an unpredictable state during panic(). > > I am not sure if this is possible with the code under gsmi_dev.lock > but such things really happen during panic() in other subsystems. > Using trylock in the panic() code path is a good practice. I believe that Peter Zijlstra had a special spin lock for NMIs or early printk, where it would not block if the lock was held on the same CPU. That is, if an NMI happened and paniced while this lock was held on the same CPU, it would not deadlock. But it would block if the lock was held on another CPU. -- Steve
On 2022-05-10, Steven Rostedt <rostedt@goodmis.org> wrote: >> As already mentioned in the other reply, panic() sometimes stops the >> other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus(). >> >> Another situation is when the CPU using the lock ends in some >> infinite loop because something went wrong. The system is in >> an unpredictable state during panic(). >> >> I am not sure if this is possible with the code under gsmi_dev.lock >> but such things really happen during panic() in other subsystems. >> Using trylock in the panic() code path is a good practice. > > I believe that Peter Zijlstra had a special spin lock for NMIs or > early printk, where it would not block if the lock was held on the > same CPU. That is, if an NMI happened and paniced while this lock was > held on the same CPU, it would not deadlock. But it would block if the > lock was held on another CPU. Yes. And starting with 5.19 it will be carrying the name that _you_ came up with (cpu_sync): printk_cpu_sync_get_irqsave() printk_cpu_sync_put_irqrestore() John
On Tue 2022-05-10 21:46:38, John Ogness wrote: > On 2022-05-10, Steven Rostedt <rostedt@goodmis.org> wrote: > >> As already mentioned in the other reply, panic() sometimes stops the > >> other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus(). > >> > >> Another situation is when the CPU using the lock ends in some > >> infinite loop because something went wrong. The system is in > >> an unpredictable state during panic(). > >> > >> I am not sure if this is possible with the code under gsmi_dev.lock > >> but such things really happen during panic() in other subsystems. > >> Using trylock in the panic() code path is a good practice. > > > > I believe that Peter Zijlstra had a special spin lock for NMIs or > > early printk, where it would not block if the lock was held on the > > same CPU. That is, if an NMI happened and paniced while this lock was > > held on the same CPU, it would not deadlock. But it would block if the > > lock was held on another CPU. > > Yes. And starting with 5.19 it will be carrying the name that _you_ came > up with (cpu_sync): > > printk_cpu_sync_get_irqsave() > printk_cpu_sync_put_irqrestore() There is a risk that this lock might become a big kernel lock. This special lock would need to be used even during normal system operation. It does not make sense to suddenly start using another lock during panic. So I think that we should think twice before using it. I would prefer using trylock of the original lock when possible during panic. It is possible that I miss something. Best Regards, Petr
diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c index adaa492c3d2d..b01ed02e4a87 100644 --- a/drivers/firmware/google/gsmi.c +++ b/drivers/firmware/google/gsmi.c @@ -629,7 +629,10 @@ static int gsmi_shutdown_reason(int reason) if (saved_reason & (1 << reason)) return 0; - spin_lock_irqsave(&gsmi_dev.lock, flags); + if (!spin_trylock_irqsave(&gsmi_dev.lock, flags)) { + rc = -EBUSY; + goto out; + } saved_reason |= (1 << reason); @@ -646,6 +649,7 @@ static int gsmi_shutdown_reason(int reason) spin_unlock_irqrestore(&gsmi_dev.lock, flags); +out: if (rc < 0) printk(KERN_ERR "gsmi: Log Shutdown Reason failed\n"); else
Currently the gsmi driver registers a panic notifier as well as reboot and die notifiers. The callbacks registered are called in atomic and very limited context - for instance, panic disables preemption, local IRQs and all other CPUs that aren't running the current panic function. With that said, taking a spinlock in this scenario is a dangerous invitation for a deadlock scenario. So, we fix that in this commit by changing the regular spinlock with a trylock, which is a safer approach. Fixes: 74c5b31c6618 ("driver: Google EFI SMI") Cc: Ard Biesheuvel <ardb@kernel.org> Cc: David Gow <davidgow@google.com> Cc: Evan Green <evgreen@chromium.org> Cc: Julius Werner <jwerner@chromium.org> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> --- drivers/firmware/google/gsmi.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)