Message ID | 1496771916-28203-3-git-send-email-will.deacon@arm.com |
---|---|
State | New |
Headers | show |
Series | mm: huge pages: Misc fixes for issues found during fuzzing | expand |
On 06/06/2017 07:58 PM, Will Deacon wrote: > page_ref_freeze and page_ref_unfreeze are designed to be used as a pair, > wrapping a critical section where struct pages can be modified without > having to worry about consistency for a concurrent fast-GUP. > > Whilst page_ref_freeze has full barrier semantics due to its use of > atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which > doesn't provide any barrier semantics and allows the operation to be > reordered with respect to page modifications in the critical section. > > This patch ensures that page_ref_unfreeze is ordered after any critical > section updates, by invoking smp_mb__before_atomic() prior to the > atomic_set. > > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Acked-by: Steve Capper <steve.capper@arm.com> > Signed-off-by: Will Deacon <will.deacon@arm.com> Undecided if it's really needed. This is IMHO not the classical case from Documentation/core-api/atomic_ops.rst where we have to make modifications visible before we let others see them? Here the one who is freezing is doing it so others can't get their page pin and interfere with the freezer's work. But maybe there are some (documented or not) consistency guarantees to expect once you obtain the pin, that can be violated, or they might be added later, so it would be safer to add the barrier? > --- > include/linux/page_ref.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h > index 610e13271918..74d32d7905cb 100644 > --- a/include/linux/page_ref.h > +++ b/include/linux/page_ref.h > @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count) > VM_BUG_ON_PAGE(page_count(page) != 0, page); > VM_BUG_ON(count == 0); > > + smp_mb__before_atomic(); > atomic_set(&page->_refcount, count); > if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze)) > __page_ref_unfreeze(page, count); >
On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote: > On 06/06/2017 07:58 PM, Will Deacon wrote: > > page_ref_freeze and page_ref_unfreeze are designed to be used as a pair, > > wrapping a critical section where struct pages can be modified without > > having to worry about consistency for a concurrent fast-GUP. > > > > Whilst page_ref_freeze has full barrier semantics due to its use of > > atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which > > doesn't provide any barrier semantics and allows the operation to be > > reordered with respect to page modifications in the critical section. > > > > This patch ensures that page_ref_unfreeze is ordered after any critical > > section updates, by invoking smp_mb__before_atomic() prior to the > > atomic_set. > > > > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > > Acked-by: Steve Capper <steve.capper@arm.com> > > Signed-off-by: Will Deacon <will.deacon@arm.com> > > Undecided if it's really needed. This is IMHO not the classical case > from Documentation/core-api/atomic_ops.rst where we have to make > modifications visible before we let others see them? Here the one who is > freezing is doing it so others can't get their page pin and interfere > with the freezer's work. But maybe there are some (documented or not) > consistency guarantees to expect once you obtain the pin, that can be > violated, or they might be added later, so it would be safer to add the > barrier? The problem comes if the unfreeze is reordered so that it happens before the freezer has performed its work. For example, in migrate_huge_page_move_mapping: if (!page_ref_freeze(page, expected_count)) { spin_unlock_irq(&mapping->tree_lock); return -EAGAIN; } newpage->index = page->index; newpage->mapping = page->mapping; get_page(newpage); radix_tree_replace_slot(&mapping->page_tree, pslot, newpage); page_ref_unfreeze(page, expected_count - 1); then there's nothing stopping the CPU (and potentially the compiler) from reordering the unfreeze call so that it effectively becomes: if (!page_ref_freeze(page, expected_count)) { spin_unlock_irq(&mapping->tree_lock); return -EAGAIN; } page_ref_unfreeze(page, expected_count - 1); newpage->index = page->index; newpage->mapping = page->mapping; get_page(newpage); radix_tree_replace_slot(&mapping->page_tree, pslot, newpage); which then means that the freezer's work is carried out without the page being frozen. Will
On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote: > On 06/06/2017 07:58 PM, Will Deacon wrote: > > page_ref_freeze and page_ref_unfreeze are designed to be used as a pair, > > wrapping a critical section where struct pages can be modified without > > having to worry about consistency for a concurrent fast-GUP. > > > > Whilst page_ref_freeze has full barrier semantics due to its use of > > atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which > > doesn't provide any barrier semantics and allows the operation to be > > reordered with respect to page modifications in the critical section. > > > > This patch ensures that page_ref_unfreeze is ordered after any critical > > section updates, by invoking smp_mb__before_atomic() prior to the > > atomic_set. > > > > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > > Acked-by: Steve Capper <steve.capper@arm.com> > > Signed-off-by: Will Deacon <will.deacon@arm.com> > > Undecided if it's really needed. This is IMHO not the classical case > from Documentation/core-api/atomic_ops.rst where we have to make > modifications visible before we let others see them? Here the one who is > freezing is doing it so others can't get their page pin and interfere > with the freezer's work. Hm.. I'm not sure I'm getting what you are talking about. What would guarantee others to see changes to page before seeing page unfreezed? > > include/linux/page_ref.h | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h > > index 610e13271918..74d32d7905cb 100644 > > --- a/include/linux/page_ref.h > > +++ b/include/linux/page_ref.h > > @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count) > > VM_BUG_ON_PAGE(page_count(page) != 0, page); > > VM_BUG_ON(count == 0); > > > > + smp_mb__before_atomic(); > > atomic_set(&page->_refcount, count); I *think* it should be smp_mb(), not __before_atomic(). atomic_set() is not really atomic. For instance on x86 it's plain WRITE_ONCE() which CPU would happily reorder. > > if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze)) > > __page_ref_unfreeze(page, count); > > > -- Kirill A. Shutemov
On 06/08/2017 12:34 PM, Will Deacon wrote: > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote: >> >> Undecided if it's really needed. This is IMHO not the classical case >> from Documentation/core-api/atomic_ops.rst where we have to make >> modifications visible before we let others see them? Here the one who is >> freezing is doing it so others can't get their page pin and interfere >> with the freezer's work. But maybe there are some (documented or not) >> consistency guarantees to expect once you obtain the pin, that can be >> violated, or they might be added later, so it would be safer to add the >> barrier? > > The problem comes if the unfreeze is reordered so that it happens before the > freezer has performed its work. For example, in > migrate_huge_page_move_mapping: > > > if (!page_ref_freeze(page, expected_count)) { > spin_unlock_irq(&mapping->tree_lock); > return -EAGAIN; > } > > newpage->index = page->index; > newpage->mapping = page->mapping; > > get_page(newpage); > > radix_tree_replace_slot(&mapping->page_tree, pslot, newpage); > > page_ref_unfreeze(page, expected_count - 1); > > > then there's nothing stopping the CPU (and potentially the compiler) from > reordering the unfreeze call so that it effectively becomes: > > > if (!page_ref_freeze(page, expected_count)) { > spin_unlock_irq(&mapping->tree_lock); > return -EAGAIN; > } > > page_ref_unfreeze(page, expected_count - 1); > > newpage->index = page->index; > newpage->mapping = page->mapping; > > get_page(newpage); > > radix_tree_replace_slot(&mapping->page_tree, pslot, newpage); > > > which then means that the freezer's work is carried out without the page > being frozen. But in this example the modifications are for newpage and freezing is for page, so I think it doesn't apply. But I get the point. > Will > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >
On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote: > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote: >> On 06/06/2017 07:58 PM, Will Deacon wrote: >>> page_ref_freeze and page_ref_unfreeze are designed to be used as a pair, >>> wrapping a critical section where struct pages can be modified without >>> having to worry about consistency for a concurrent fast-GUP. >>> >>> Whilst page_ref_freeze has full barrier semantics due to its use of >>> atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which >>> doesn't provide any barrier semantics and allows the operation to be >>> reordered with respect to page modifications in the critical section. >>> >>> This patch ensures that page_ref_unfreeze is ordered after any critical >>> section updates, by invoking smp_mb__before_atomic() prior to the >>> atomic_set. >>> >>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> >>> Acked-by: Steve Capper <steve.capper@arm.com> >>> Signed-off-by: Will Deacon <will.deacon@arm.com> >> >> Undecided if it's really needed. This is IMHO not the classical case >> from Documentation/core-api/atomic_ops.rst where we have to make >> modifications visible before we let others see them? Here the one who is >> freezing is doing it so others can't get their page pin and interfere >> with the freezer's work. > > Hm.. I'm not sure I'm getting what you are talking about. > > What would guarantee others to see changes to page before seeing page > unfreezed? My point was that we do the freezing for other reasons than to guarantee this, but it can be needed too. >>> include/linux/page_ref.h | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h >>> index 610e13271918..74d32d7905cb 100644 >>> --- a/include/linux/page_ref.h >>> +++ b/include/linux/page_ref.h >>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count) >>> VM_BUG_ON_PAGE(page_count(page) != 0, page); >>> VM_BUG_ON(count == 0); >>> >>> + smp_mb__before_atomic(); >>> atomic_set(&page->_refcount, count); > > I *think* it should be smp_mb(), not __before_atomic(). atomic_set() is > not really atomic. For instance on x86 it's plain WRITE_ONCE() which CPU > would happily reorder. Yeah but there are compile barriers, and x86 is TSO, so that's enough? Also I found other instances by git grep (not a proof, though :) >>> if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze)) >>> __page_ref_unfreeze(page, count); >>> >> >
[+ PeterZ] On Thu, Jun 08, 2017 at 01:07:02PM +0200, Vlastimil Babka wrote: > On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote: > > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote: > >> On 06/06/2017 07:58 PM, Will Deacon wrote: > >>> include/linux/page_ref.h | 1 + > >>> 1 file changed, 1 insertion(+) > >>> > >>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h > >>> index 610e13271918..74d32d7905cb 100644 > >>> --- a/include/linux/page_ref.h > >>> +++ b/include/linux/page_ref.h > >>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count) > >>> VM_BUG_ON_PAGE(page_count(page) != 0, page); > >>> VM_BUG_ON(count == 0); > >>> > >>> + smp_mb__before_atomic(); > >>> atomic_set(&page->_refcount, count); > > > > I *think* it should be smp_mb(), not __before_atomic(). atomic_set() is > > not really atomic. For instance on x86 it's plain WRITE_ONCE() which CPU > > would happily reorder. > > Yeah but there are compile barriers, and x86 is TSO, so that's enough? > Also I found other instances by git grep (not a proof, though :) I think it boils down to whether: smp_mb__before_atomic(); atomic_set(); should have the same memory ordering semantics as: smp_mb(); atomic_set(); which it doesn't with the x86 implementation AFAICT. The horribly out-of-date atomic_ops.txt isn't so useful: | If a caller requires memory barrier semantics around an atomic_t | operation which does not return a value, a set of interfaces are | defined which accomplish this:: | | void smp_mb__before_atomic(void); | void smp_mb__after_atomic(void); | | For example, smp_mb__before_atomic() can be used like so:: | | obj->dead = 1; | smp_mb__before_atomic(); | atomic_dec(&obj->ref_count); | | It makes sure that all memory operations preceding the atomic_dec() | call are strongly ordered with respect to the atomic counter | operation. In the above example, it guarantees that the assignment of | "1" to obj->dead will be globally visible to other cpus before the | atomic counter decrement. | | Without the explicit smp_mb__before_atomic() call, the | implementation could legally allow the atomic counter update visible | to other cpus before the "obj->dead = 1;" assignment. which makes it sound more like the barrier is ordering all prior accesses against the atomic operation itself (without going near cumulativity...), and not with respect to anything later in program order. Anyway, I think that's sufficient for what we want here, but we should probably iron out the semantics of this thing. Will
On Thu, Jun 08, 2017 at 12:24:33PM +0100, Will Deacon wrote: > [+ PeterZ] > > On Thu, Jun 08, 2017 at 01:07:02PM +0200, Vlastimil Babka wrote: > > On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote: > > > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote: > > >> On 06/06/2017 07:58 PM, Will Deacon wrote: > > >>> include/linux/page_ref.h | 1 + > > >>> 1 file changed, 1 insertion(+) > > >>> > > >>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h > > >>> index 610e13271918..74d32d7905cb 100644 > > >>> --- a/include/linux/page_ref.h > > >>> +++ b/include/linux/page_ref.h > > >>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count) > > >>> VM_BUG_ON_PAGE(page_count(page) != 0, page); > > >>> VM_BUG_ON(count == 0); > > >>> > > >>> + smp_mb__before_atomic(); > > >>> atomic_set(&page->_refcount, count); Yeah, that's broken. smp_mb__{before,after}_atomic() goes with the atomic RmW ops that do not already imply barriers, such like atomic_add() and atomic_fetch_add_relaxed(). atomic_set() and atomic_read() are not RmW ops, they are plain write/read ops respectively. > > > > > > I *think* it should be smp_mb(), not __before_atomic(). atomic_set() is > > > not really atomic. For instance on x86 it's plain WRITE_ONCE() which CPU > > > would happily reorder. > > > > Yeah but there are compile barriers, and x86 is TSO, so that's enough? > > Also I found other instances by git grep (not a proof, though :) > > I think it boils down to whether: > > smp_mb__before_atomic(); > atomic_set(); > > should have the same memory ordering semantics as: > > smp_mb(); > atomic_set(); > > which it doesn't with the x86 implementation AFAICT. Correct, it doesn't. The smp_mb__{before,after}_atomic() are to provide those barriers that are required to upgrade the atomic RmW primitive of the architecture. x86 has LOCK prefix instructions that are SC and therefore don't need any upgrading. ARM OTOH has unordered LL/SC and will need full DMB(ISH) for both. MIPS has pretty much all variants under the sun, strongly ordered LL/SC, half ordered LL/SC and weakly ordered LL/SC.. > The horribly out-of-date atomic_ops.txt isn't so useful: > > | If a caller requires memory barrier semantics around an atomic_t > | operation which does not return a value, a set of interfaces are > | defined which accomplish this:: > | > | void smp_mb__before_atomic(void); > | void smp_mb__after_atomic(void); > | > | For example, smp_mb__before_atomic() can be used like so:: > | > | obj->dead = 1; > | smp_mb__before_atomic(); > | atomic_dec(&obj->ref_count); > | > | It makes sure that all memory operations preceding the atomic_dec() > | call are strongly ordered with respect to the atomic counter > | operation. In the above example, it guarantees that the assignment of > | "1" to obj->dead will be globally visible to other cpus before the > | atomic counter decrement. > | > | Without the explicit smp_mb__before_atomic() call, the > | implementation could legally allow the atomic counter update visible > | to other cpus before the "obj->dead = 1;" assignment. > > which makes it sound more like the barrier is ordering all prior accesses > against the atomic operation itself (without going near cumulativity...), > and not with respect to anything later in program order. This is correct. > Anyway, I think that's sufficient for what we want here, but we should > probably iron out the semantics of this thing. s/smp_mb__\(before\|after\)_atomic/smp_mb/g should not change the semantics of the code in _any_ way, just make it slower on architectures that already have SC atomic primitives (like x86).
On Thu, Jun 08, 2017 at 02:16:41PM +0200, Peter Zijlstra wrote: > On Thu, Jun 08, 2017 at 12:24:33PM +0100, Will Deacon wrote: > > The horribly out-of-date atomic_ops.txt isn't so useful: > > > > | If a caller requires memory barrier semantics around an atomic_t > > | operation which does not return a value, a set of interfaces are > > | defined which accomplish this:: > > | > > | void smp_mb__before_atomic(void); > > | void smp_mb__after_atomic(void); > > | > > | For example, smp_mb__before_atomic() can be used like so:: > > | > > | obj->dead = 1; > > | smp_mb__before_atomic(); > > | atomic_dec(&obj->ref_count); > > | > > | It makes sure that all memory operations preceding the atomic_dec() > > | call are strongly ordered with respect to the atomic counter > > | operation. In the above example, it guarantees that the assignment of > > | "1" to obj->dead will be globally visible to other cpus before the > > | atomic counter decrement. > > | > > | Without the explicit smp_mb__before_atomic() call, the > > | implementation could legally allow the atomic counter update visible > > | to other cpus before the "obj->dead = 1;" assignment. > > > > which makes it sound more like the barrier is ordering all prior accesses > > against the atomic operation itself (without going near cumulativity...), > > and not with respect to anything later in program order. > > This is correct. Ah, my bad, It orders against everything later, the first of which is (obviously) the atomic op itself. It being a full barrier means both the Read and the Write of the RmW must happen _after_ everything preceding. > > Anyway, I think that's sufficient for what we want here, but we should > > probably iron out the semantics of this thing. > > s/smp_mb__\(before\|after\)_atomic/smp_mb/g > > should not change the semantics of the code in _any_ way, just make it > slower on architectures that already have SC atomic primitives (like > x86). >
On Thu, Jun 08, 2017 at 12:24:33PM +0100, Will Deacon wrote: > [+ PeterZ] > > On Thu, Jun 08, 2017 at 01:07:02PM +0200, Vlastimil Babka wrote: > > On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote: > > > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote: > > >> On 06/06/2017 07:58 PM, Will Deacon wrote: > > >>> include/linux/page_ref.h | 1 + > > >>> 1 file changed, 1 insertion(+) > > >>> > > >>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h > > >>> index 610e13271918..74d32d7905cb 100644 > > >>> --- a/include/linux/page_ref.h > > >>> +++ b/include/linux/page_ref.h > > >>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count) > > >>> VM_BUG_ON_PAGE(page_count(page) != 0, page); > > >>> VM_BUG_ON(count == 0); > > >>> > > >>> + smp_mb__before_atomic(); > > >>> atomic_set(&page->_refcount, count); So depending on what it actually required, we do have atomic_set_release() (atomic_t equivalent to smp_store_release()).
On Thu, Jun 08, 2017 at 02:50:59PM +0200, Peter Zijlstra wrote: > On Thu, Jun 08, 2017 at 12:24:33PM +0100, Will Deacon wrote: > > [+ PeterZ] > > > > On Thu, Jun 08, 2017 at 01:07:02PM +0200, Vlastimil Babka wrote: > > > On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote: > > > > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote: > > > >> On 06/06/2017 07:58 PM, Will Deacon wrote: > > > >>> include/linux/page_ref.h | 1 + > > > >>> 1 file changed, 1 insertion(+) > > > >>> > > > >>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h > > > >>> index 610e13271918..74d32d7905cb 100644 > > > >>> --- a/include/linux/page_ref.h > > > >>> +++ b/include/linux/page_ref.h > > > >>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count) > > > >>> VM_BUG_ON_PAGE(page_count(page) != 0, page); > > > >>> VM_BUG_ON(count == 0); > > > >>> > > > >>> + smp_mb__before_atomic(); > > > >>> atomic_set(&page->_refcount, count); > > So depending on what it actually required, we do have > atomic_set_release() (atomic_t equivalent to smp_store_release()). Yeah, I was wondering about that this morning. I think it should do the trick here, but smp_mb() would be a better fit for the other parts of this API (page_ref_freeze uses atomic_cmpxchg and page_cache_get_speculative uses atomic_add_unless). I'll send a v2 with the full barrier. Will
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h index 610e13271918..74d32d7905cb 100644 --- a/include/linux/page_ref.h +++ b/include/linux/page_ref.h @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count) VM_BUG_ON_PAGE(page_count(page) != 0, page); VM_BUG_ON(count == 0); + smp_mb__before_atomic(); atomic_set(&page->_refcount, count); if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze)) __page_ref_unfreeze(page, count);