
arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()

Message ID 20210514095001.13236-1-catalin.marinas@arm.com
State Accepted
Commit 588a513d34257fdde95a9f0df0202e31998e85c6
Series arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()

Commit Message

Catalin Marinas May 14, 2021, 9:50 a.m. UTC
To ensure that instructions are observable in a new mapping, the arm64
set_pte_at() implementation cleans the D-cache and invalidates the
I-cache to the PoU. As an optimisation, this is only done on executable
mappings and the PG_dcache_clean page flag is set to avoid future cache
maintenance on the same page.

When two different processes map the same page (e.g. private executable
file or shared mapping) there's a potential race on checking and setting
PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
fault paths the page is locked (PG_locked), mprotect() does not take the
page lock. The result is that one process may see the PG_dcache_clean
flag set but the I/D cache maintenance not yet performed.

Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
and set_bit(). In the rare event of a race, the cache maintenance is
done twice.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Cc: <stable@vger.kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
---

Found while debating with Steven a similar race on PG_mte_tagged. For
the latter we'll have to take a lock but hopefully in practice it will
only happen when restoring from swap. Separate thread anyway.

There's at least arch/arm with a similar race. Powerpc seems to do it
properly with separate test/set. Other architectures have a bigger
problem as they do a similar check in update_mmu_cache(), called after
the pte was already exposed to user.

I looked at fixing this in the mprotect() code but taking the page lock
will slow it down, so not sure how popular this would be for such a rare
race.

 arch/arm64/mm/flush.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Steven Price May 14, 2021, 9:54 a.m. UTC | #1
On 14/05/2021 10:50, Catalin Marinas wrote:
> To ensure that instructions are observable in a new mapping, the arm64
> set_pte_at() implementation cleans the D-cache and invalidates the
> I-cache to the PoU. As an optimisation, this is only done on executable
> mappings and the PG_dcache_clean page flag is set to avoid future cache
> maintenance on the same page.
>
> [...]

Thanks for writing up a proper patch.

Reviewed-by: Steven Price <steven.price@arm.com>


Steve

Will Deacon May 14, 2021, 10:38 a.m. UTC | #2
On Fri, May 14, 2021 at 10:50:01AM +0100, Catalin Marinas wrote:
> To ensure that instructions are observable in a new mapping, the arm64
> set_pte_at() implementation cleans the D-cache and invalidates the
> I-cache to the PoU. As an optimisation, this is only done on executable
> mappings and the PG_dcache_clean page flag is set to avoid future cache
> maintenance on the same page.
>
> [...]

> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index ac485163a4a7..6d44c028d1c9 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -55,8 +55,10 @@ void __sync_icache_dcache(pte_t pte)
>  {
>  	struct page *page = pte_page(pte);
>  
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +	if (!test_bit(PG_dcache_clean, &page->flags)) {
>  		sync_icache_aliases(page_address(page), page_size(page));
> +		set_bit(PG_dcache_clean, &page->flags);
> +	}


Acked-by: Will Deacon <will@kernel.org>


I wondered about the ISB for a bit (we don't broadcast it), but it should
be fine as the racing CPU needs to return to userspace.

Will
Catalin Marinas May 14, 2021, 4:21 p.m. UTC | #3
On Fri, 14 May 2021 10:50:01 +0100, Catalin Marinas wrote:
> To ensure that instructions are observable in a new mapping, the arm64
> set_pte_at() implementation cleans the D-cache and invalidates the
> I-cache to the PoU. As an optimisation, this is only done on executable
> mappings and the PG_dcache_clean page flag is set to avoid future cache
> maintenance on the same page.
>
> When two different processes map the same page (e.g. private executable
> file or shared mapping) there's a potential race on checking and setting
> PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
> fault paths the page is locked (PG_locked), mprotect() does not take the
> page lock. The result is that one process may see the PG_dcache_clean
> flag set but the I/D cache maintenance not yet performed.
>
> [...]


Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
      https://git.kernel.org/arm64/c/588a513d3425

-- 
Catalin

Patch

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index ac485163a4a7..6d44c028d1c9 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -55,8 +55,10 @@ void __sync_icache_dcache(pte_t pte)
 {
 	struct page *page = pte_page(pte);
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+	if (!test_bit(PG_dcache_clean, &page->flags)) {
 		sync_icache_aliases(page_address(page), page_size(page));
+		set_bit(PG_dcache_clean, &page->flags);
+	}
 }
 EXPORT_SYMBOL_GPL(__sync_icache_dcache);