Message ID | 20160607163024.GB20477@arm.com |
---|---|
State | New |
Headers | show |
On Tue, Jun 07, 2016 at 05:30:25PM +0100, Will Deacon wrote: > On Tue, Jun 07, 2016 at 04:13:03PM +0200, Alexander Graf wrote: > > On 13.04.16 17:01, Catalin Marinas wrote: > > > When hardware updates of the access and dirty states are enabled, the > > > default ptep_set_access_flags() implementation based on calling > > > set_pte_at() directly is potentially racy. This triggers the "racy dirty > > > state clearing" warning in set_pte_at() because an existing writable PTE > > > is overridden with a clean entry. > > [...] > > > This patch breaks swapping for me. > > > > I've hit weird issues where systems stopped working half-way, with the > > kernel still being fine and user space applications just stopping to > > respond. > > > > After some debugging we found out that it always happens when swapping > > (to anything, backing storage doesn't matter). A quick bisect points to > > this commit as culprit and indeed, if I disable CONFIG_ARM64_HW_AFDBM > > the system works as expected. > > [...] > > > The back traces indicate that the page fault handler goes through and > > the process just keeps banging on the same page fault over and over > > again. I have not yet figured out what *exactly* is going wrong or why > > this patch would actually give us that effect. > > > > I was able to fully reproduce the issue with current Linus tree (4.7-rc2+). > > It looks like the following happens: > > 1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte > 2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF > 3. A read faults due to the missing access flag > 4. ptep_set_access_flags is called with dirty = 0, due to the read fault > 5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!) > 6. A write faults, but pte_write is true so we get stuck > > Does the diff below fix the issue for you? If so, I'll write a proper > patch. > > Cheers, > > Will > > --->8 > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > index 5954881a35ac..ba3fc12bd272 100644 > --- a/arch/arm64/mm/fault.c > +++ b/arch/arm64/mm/fault.c > @@ -109,7 +109,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, > * PTE_RDONLY is cleared by default in the asm below, so set it in > * back if necessary (read-only or clean PTE). > */ > - if (!pte_write(entry) || !dirty) > + if (!pte_write(entry) || !pte_sw_dirty(entry)) > pte_val(entry) |= PTE_RDONLY; > > /* Whether it fixes this particular case or not, I think this patch is still valid. So feel free to add: Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 5954881a35ac..ba3fc12bd272 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -109,7 +109,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, * PTE_RDONLY is cleared by default in the asm below, so set it in * back if necessary (read-only or clean PTE). */ - if (!pte_write(entry) || !dirty) + if (!pte_write(entry) || !pte_sw_dirty(entry)) pte_val(entry) |= PTE_RDONLY; /*