Message ID | 20240411004907.649394-1-yu.c.chen@intel.com |
---|---|
State | Superseded |
Series | [v2] efi/unaccepted: touch soft lockup during memory accept |
On 2024-04-11 at 08:49:07 +0800, Chen Yu wrote:
> Commit 50e782a86c98 ("efi/unaccepted: Fix soft lockups caused
> by parallel memory acceptance") released the spinlock so that
> other CPUs can accept memory in parallel without triggering
> soft lockups on those CPUs.
>
> However, the soft lockup still shows up intermittently if the
> memory of the TD guest is large and the soft lockup timeout is
> set to 1 second.
>
> The symptom is:
> When local irqs are re-enabled at the end of accept_memory(),
> the soft lockup detector finds that the watchdog on this CPU
> has not been fed for a while. That is to say, even though other
> CPUs are no longer blocked on the spinlock, the current CPU may
> be stuck with local irqs disabled for a long time, which hurts
> not only the NMI watchdog but also the soft lockup detector.
>
> Chao Gao pointed out that memory acceptance can be time
> consuming and there was a similar report before. Thus, to avoid
> any soft lockup detection during this stage, give the
> soft lockup detector a flag to skip the timeout check at the
> end of accept_memory(), by invoking touch_softlockup_watchdog().
>
> Fixes: 50e782a86c98 ("efi/unaccepted: Fix soft lockups caused by parallel memory acceptance")
> Reported-by: "Hossain, Md Iqbal" <md.iqbal.hossain@intel.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
> v1 -> v2:
> Refine the commit log and add Fixes/Reviewed-by tags from Kirill.

Gently pinging about this patch.

thanks,
Chenyu
On Mon, 22 Apr 2024 at 16:40, Chen Yu <yu.c.chen@intel.com> wrote:
>
> On 2024-04-11 at 08:49:07 +0800, Chen Yu wrote:
> > Commit 50e782a86c98 ("efi/unaccepted: Fix soft lockups caused
> > by parallel memory acceptance") released the spinlock so that
> > other CPUs can accept memory in parallel [...]
>
> Gently pinging about this patch.

Queued up in efi/urgent now, thanks.
On Wed, 24 Apr 2024 at 19:12, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 22 Apr 2024 at 16:40, Chen Yu <yu.c.chen@intel.com> wrote:
> >
> > On 2024-04-11 at 08:49:07 +0800, Chen Yu wrote:
> > > [...]
> >
> > Gently pinging about this patch.
>
> Queued up in efi/urgent now, thanks.

OK, I was about to send this patch to Linus (and I am still going to).

However, I do wonder if sprinkling touch_softlockup_watchdog() left
and right is really the right solution here.

Looking at the backtrace, this is a page fault originating in user
space. So why do we end up calling into the hypervisor to accept a
chunk of memory large enough to trigger the softlockup watchdog? Or is
the hypercall simply taking a disproportionate amount of time?

And AIUI, touch_softlockup_watchdog() hides the fact that we are
hogging the CPU for way too long - is there any way we can at least
yield the CPU on this condition?
On Fri, May 03, 2024 at 12:31:12PM +0200, Ard Biesheuvel wrote:
> On Wed, 24 Apr 2024 at 19:12, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > [...]
> >
> > Queued up in efi/urgent now, thanks.
>
> OK, I was about to send this patch to Linus (and I am still going to).
>
> However, I do wonder if sprinkling touch_softlockup_watchdog() left
> and right is really the right solution here.
>
> Looking at the backtrace, this is a page fault originating in user
> space. So why do we end up calling into the hypervisor to accept a
> chunk of memory large enough to trigger the softlockup watchdog? Or is
> the hypercall simply taking a disproportionate amount of time?

Note that the softlockup timeout was set to 1 second to trigger this, so
this is an exaggerated case.

> And AIUI, touch_softlockup_watchdog() hides the fact that we are
> hogging the CPU for way too long - is there any way we can at least
> yield the CPU on this condition?

Not really. There's no magic entity that handles accept. It is done by
the CPU.

There's a feature in the pipeline that makes page accept interruptible in
TDX guests. It can help in some cases. But if we end up in this codepath
from a non-preemptible context, it won't help.
On 2024-05-03 at 16:47:49 +0300, Kirill A. Shutemov wrote:
> On Fri, May 03, 2024 at 12:31:12PM +0200, Ard Biesheuvel wrote:
> > [...]
> >
> > And AIUI, touch_softlockup_watchdog() hides the fact that we are
> > hogging the CPU for way too long - is there any way we can at least
> > yield the CPU on this condition?
>
> Not really. There's no magic entity that handles accept. It is done by
> the CPU.
>
> There's a feature in the pipeline that makes page accept interruptible in
> TDX guests. It can help in some cases. But if we end up in this codepath
> from a non-preemptible context, it won't help.
>

Is it possible to enable local irqs for a little while after each
arch_accept_memory(phys_start, phys_end), and even to split
[phys_start, phys_end] into smaller regions, so that the watchdog can be
fed on time and the tick stays normal? But for now, feeding the soft
lockup watchdog at the end seems easier to implement.

thanks,
Chenyu
On Fri, May 03, 2024 at 11:00:18PM +0800, Chen Yu wrote:
> On 2024-05-03 at 16:47:49 +0300, Kirill A. Shutemov wrote:
> > [...]
> >
> > There's a feature in the pipeline that makes page accept interruptible in
> > TDX guests. It can help in some cases. But if we end up in this codepath
> > from a non-preemptible context, it won't help.
>
> Is it possible to enable local irqs for a little while after each
> arch_accept_memory(phys_start, phys_end), and even to split
> [phys_start, phys_end] into smaller regions, so that the watchdog can be
> fed on time and the tick stays normal? But for now, feeding the soft
> lockup watchdog at the end seems easier to implement.

That's what I did initially. But Vlastimil correctly pointed out that it
would lead to a deadlock:

https://lore.kernel.org/all/088593ea-e001-fa87-909f-a196b1373ca4@suse.cz/
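For context, a minimal sketch of that rejected alternative (splitting the
range into chunks and re-enabling local irqs between them) is shown below.
It is illustrative only: CHUNK_BITS is a placeholder, and the other names
are assumptions modelled on drivers/firmware/efi/unaccepted_memory.c, not
a quote of real code.

	while (range_start < range_end) {
		unsigned long chunk_end = min(range_start + CHUNK_BITS, range_end);

		/*
		 * Drop the lock AND re-enable local irqs while accepting one
		 * chunk, so ticks arrive and the watchdog gets fed...
		 */
		spin_unlock_irqrestore(&unaccepted_memory_lock, flags);

		arch_accept_memory(unaccepted->phys_base + range_start * unit_size,
				   unaccepted->phys_base + chunk_end * unit_size);

		/*
		 * ...but an interrupt taken in that window may itself call
		 * accept_memory() for the same range and then wait for it
		 * forever with irqs disabled, while the interrupted context
		 * doing the work never gets to resume -- the deadlock
		 * Vlastimil pointed out in the linked thread.
		 */
		spin_lock_irqsave(&unaccepted_memory_lock, flags);
		bitmap_clear(unaccepted->bitmap, range_start, chunk_end - range_start);

		range_start = chunk_end;
	}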
diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
index 5b439d04079c..50f6503fe49f 100644
--- a/drivers/firmware/efi/unaccepted_memory.c
+++ b/drivers/firmware/efi/unaccepted_memory.c
@@ -4,6 +4,7 @@
 #include <linux/memblock.h>
 #include <linux/spinlock.h>
 #include <linux/crash_dump.h>
+#include <linux/nmi.h>
 #include <asm/unaccepted_memory.h>
 
 /* Protects unaccepted memory bitmap and accepting_list */
@@ -149,6 +150,9 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
 	}
 
 	list_del(&range.list);
+
+	touch_softlockup_watchdog();
+
 	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
 }
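Read together with the hunk context, the tail of accept_memory() after this
change looks roughly as follows (the statements are taken from the diff
above; the comment is an editorial gloss, not kernel source):

	list_del(&range.list);

	/*
	 * The accept loop earlier in this function runs with local irqs
	 * disabled and can exceed the soft lockup threshold (1 second in
	 * the report), so reset the soft lockup detector for this CPU
	 * once, just before the unlock below re-enables irqs.
	 */
	touch_softlockup_watchdog();

	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
}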