Message ID | 20210514095001.13236-1-catalin.marinas@arm.com |
---|---|
State | Accepted |
Commit | 588a513d34257fdde95a9f0df0202e31998e85c6 |
Series | arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache() |
On 14/05/2021 10:50, Catalin Marinas wrote:
> To ensure that instructions are observable in a new mapping, the arm64
> set_pte_at() implementation cleans the D-cache and invalidates the
> I-cache to the PoU. As an optimisation, this is only done on executable
> mappings and the PG_dcache_clean page flag is set to avoid future cache
> maintenance on the same page.
>
> When two different processes map the same page (e.g. private executable
> file or shared mapping) there's a potential race on checking and setting
> PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
> fault paths the page is locked (PG_locked), mprotect() does not take the
> page lock. The result is that one process may see the PG_dcache_clean
> flag set but the I/D cache maintenance not yet performed.
>
> Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
> and set_bit(). In the rare event of a race, the cache maintenance is
> done twice.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: <stable@vger.kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Steven Price <steven.price@arm.com>

Thanks for writing up a proper patch.

Reviewed-by: Steven Price <steven.price@arm.com>

Steve

> ---
>
> Found while debating with Steven a similar race on PG_mte_tagged. For
> the latter we'll have to take a lock but hopefully in practice it will
> only happen when restoring from swap. Separate thread anyway.
>
> There's at least arch/arm with a similar race. Powerpc seems to do it
> properly with separate test/set. Other architectures have a bigger
> problem as they do a similar check in update_mmu_cache(), called after
> the pte was already exposed to user.
>
> I looked at fixing this in the mprotect() code but taking the page lock
> will slow it down, so not sure how popular this would be for such a rare
> race.
>
>  arch/arm64/mm/flush.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index ac485163a4a7..6d44c028d1c9 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -55,8 +55,10 @@ void __sync_icache_dcache(pte_t pte)
>  {
>  	struct page *page = pte_page(pte);
>
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +	if (!test_bit(PG_dcache_clean, &page->flags)) {
>  		sync_icache_aliases(page_address(page), page_size(page));
> +		set_bit(PG_dcache_clean, &page->flags);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(__sync_icache_dcache);
>

On Fri, May 14, 2021 at 10:50:01AM +0100, Catalin Marinas wrote:
> To ensure that instructions are observable in a new mapping, the arm64
> set_pte_at() implementation cleans the D-cache and invalidates the
> I-cache to the PoU. As an optimisation, this is only done on executable
> mappings and the PG_dcache_clean page flag is set to avoid future cache
> maintenance on the same page.
>
> When two different processes map the same page (e.g. private executable
> file or shared mapping) there's a potential race on checking and setting
> PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
> fault paths the page is locked (PG_locked), mprotect() does not take the
> page lock. The result is that one process may see the PG_dcache_clean
> flag set but the I/D cache maintenance not yet performed.
>
> Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
> and set_bit(). In the rare event of a race, the cache maintenance is
> done twice.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: <stable@vger.kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Steven Price <steven.price@arm.com>
> ---
>
> Found while debating with Steven a similar race on PG_mte_tagged. For
> the latter we'll have to take a lock but hopefully in practice it will
> only happen when restoring from swap. Separate thread anyway.
>
> There's at least arch/arm with a similar race. Powerpc seems to do it
> properly with separate test/set. Other architectures have a bigger
> problem as they do a similar check in update_mmu_cache(), called after
> the pte was already exposed to user.
>
> I looked at fixing this in the mprotect() code but taking the page lock
> will slow it down, so not sure how popular this would be for such a rare
> race.
>
>  arch/arm64/mm/flush.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index ac485163a4a7..6d44c028d1c9 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -55,8 +55,10 @@ void __sync_icache_dcache(pte_t pte)
>  {
>  	struct page *page = pte_page(pte);
>
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +	if (!test_bit(PG_dcache_clean, &page->flags)) {
>  		sync_icache_aliases(page_address(page), page_size(page));
> +		set_bit(PG_dcache_clean, &page->flags);
> +	}

Acked-by: Will Deacon <will@kernel.org>

I wondered about the ISB for a bit (we don't broadcast it), but should be
fine as the racing CPU needs to return to userspace.

Will

On Fri, 14 May 2021 10:50:01 +0100, Catalin Marinas wrote:
> To ensure that instructions are observable in a new mapping, the arm64
> set_pte_at() implementation cleans the D-cache and invalidates the
> I-cache to the PoU. As an optimisation, this is only done on executable
> mappings and the PG_dcache_clean page flag is set to avoid future cache
> maintenance on the same page.
>
> When two different processes map the same page (e.g. private executable
> file or shared mapping) there's a potential race on checking and setting
> PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
> fault paths the page is locked (PG_locked), mprotect() does not take the
> page lock. The result is that one process may see the PG_dcache_clean
> flag set but the I/D cache maintenance not yet performed.
>
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
      https://git.kernel.org/arm64/c/588a513d3425

-- 
Catalin

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index ac485163a4a7..6d44c028d1c9 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -55,8 +55,10 @@ void __sync_icache_dcache(pte_t pte)
 {
 	struct page *page = pte_page(pte);

-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+	if (!test_bit(PG_dcache_clean, &page->flags)) {
 		sync_icache_aliases(page_address(page), page_size(page));
+		set_bit(PG_dcache_clean, &page->flags);
+	}
 }
 EXPORT_SYMBOL_GPL(__sync_icache_dcache);

To ensure that instructions are observable in a new mapping, the arm64
set_pte_at() implementation cleans the D-cache and invalidates the
I-cache to the PoU. As an optimisation, this is only done on executable
mappings and the PG_dcache_clean page flag is set to avoid future cache
maintenance on the same page.

When two different processes map the same page (e.g. private executable
file or shared mapping) there's a potential race on checking and setting
PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
fault paths the page is locked (PG_locked), mprotect() does not take the
page lock. The result is that one process may see the PG_dcache_clean
flag set but the I/D cache maintenance not yet performed.

Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
and set_bit(). In the rare event of a race, the cache maintenance is
done twice.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
---

Found while debating with Steven a similar race on PG_mte_tagged. For
the latter we'll have to take a lock but hopefully in practice it will
only happen when restoring from swap. Separate thread anyway.

There's at least arch/arm with a similar race. Powerpc seems to do it
properly with separate test/set. Other architectures have a bigger
problem as they do a similar check in update_mmu_cache(), called after
the pte was already exposed to user.

I looked at fixing this in the mprotect() code but taking the page lock
will slow it down, so not sure how popular this would be for such a rare
race.

 arch/arm64/mm/flush.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
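
The ordering argument in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct model_page`, `old_claim()`, `new_needs_sync()`, `new_publish()`, `do_maintenance()` and the scenario functions are all hypothetical names, C11 atomics stand in for the kernel bitops, and a counter stands in for sync_icache_aliases(). The interleavings are replayed by hand in a single thread, so the model says nothing about barriers or real cache behaviour; it only shows which flag states a second mapper can observe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in; this is not the kernel's struct page. */
struct model_page {
	atomic_bool dcache_clean;    /* models the PG_dcache_clean flag */
	atomic_int maintenance_runs; /* counts modelled cache maintenance */
};

static void page_init(struct model_page *p)
{
	atomic_init(&p->dcache_clean, false);
	atomic_init(&p->maintenance_runs, 0);
}

/* Models test_and_set_bit(): true if the flag was clear and is now set. */
static bool old_claim(struct model_page *p)
{
	return !atomic_exchange(&p->dcache_clean, true);
}

/* Models test_bit(): true if maintenance still appears to be needed. */
static bool new_needs_sync(struct model_page *p)
{
	return !atomic_load(&p->dcache_clean);
}

/* Models set_bit(), issued only after the maintenance has been done. */
static void new_publish(struct model_page *p)
{
	atomic_store(&p->dcache_clean, true);
}

/* Stands in for sync_icache_aliases(). */
static void do_maintenance(struct model_page *p)
{
	atomic_fetch_add(&p->maintenance_runs, 1);
}

/* True if a second mapper would skip maintenance that has not happened. */
static bool second_mapper_skips_stale(struct model_page *p)
{
	return atomic_load(&p->dcache_clean) &&
	       atomic_load(&p->maintenance_runs) == 0;
}

/* Old ordering: the flag is set before the maintenance is performed. */
static bool old_ordering_has_window(void)
{
	struct model_page p;
	page_init(&p);

	bool first = old_claim(&p);                 /* CPU0 sets the flag */
	assert(first);                              /* flag started clear */
	bool stale = second_mapper_skips_stale(&p); /* CPU1 looks here */
	do_maintenance(&p);                         /* CPU0 finishes late */
	return stale;
}

/* New ordering: CPU1 is sampled at every point of CPU0's sequence. */
static bool new_ordering_has_window(void)
{
	struct model_page p;
	page_init(&p);
	bool stale = false;

	if (new_needs_sync(&p)) {
		stale |= second_mapper_skips_stale(&p);
		do_maintenance(&p);
		stale |= second_mapper_skips_stale(&p);
		new_publish(&p);
	}
	stale |= second_mapper_skips_stale(&p);
	return stale;
}

/* New ordering, worst case: both CPUs test before either one sets. */
static int new_ordering_worst_case_runs(void)
{
	struct model_page p;
	page_init(&p);

	bool cpu0 = new_needs_sync(&p);
	bool cpu1 = new_needs_sync(&p);
	if (cpu0) { do_maintenance(&p); new_publish(&p); }
	if (cpu1) { do_maintenance(&p); new_publish(&p); }
	return atomic_load(&p.maintenance_runs);
}
```

In this model the old ordering exposes a point where the flag is set but no maintenance has run, while the new ordering never does; the price is the duplicated maintenance counted by `new_ordering_worst_case_runs()`, which corresponds to the "done twice" case described in the commit message.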