diff mbox series

userfaultfd: hugetlbfs: fix new flag usage in error path

Message ID 20210521233952.236434-1-mike.kravetz@oracle.com
State Accepted
Commit e32905e57358fdfb82f9de024534f205b3af7dac
Headers show
Series userfaultfd: hugetlbfs: fix new flag usage in error path | expand

Commit Message

Mike Kravetz May 21, 2021, 11:39 p.m. UTC
In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific
page flags") the use of PagePrivate to indicate a reservation count
should be restored at free time was changed to the hugetlb specific flag
HPageRestoreReserve.  Changes to a userfaultfd error path as well as a
VM_BUG_ON() in remove_inode_hugepages() were overlooked.

Users could see incorrect hugetlb reserve counts if they experience an
error with a UFFDIO_COPY operation.  Specifically, this would be the
result of an unlikely copy_huge_page_from_user error.  There is not an
increased chance of hitting the VM_BUG_ON.

Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c |  2 +-
 mm/userfaultfd.c     | 28 ++++++++++++++--------------
 2 files changed, 15 insertions(+), 15 deletions(-)

Comments

Mina Almasry May 22, 2021, 7:41 p.m. UTC | #1
On Fri, May 21, 2021 at 4:40 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>

> In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific

> page flags") the use of PagePrivate to indicate a reservation count

> should be restored at free time was changed to the hugetlb specific flag

> HPageRestoreReserve.  Changes to a userfaultfd error path as well as a

> VM_BUG_ON() in remove_inode_hugepages() were overlooked.

>

> Users could see incorrect hugetlb reserve counts if they experience an

> error with a UFFDIO_COPY operation.  Specifically, this would be the

> result of an unlikely copy_huge_page_from_user error.  There is not an

> increased chance of hitting the VM_BUG_ON.

>

> Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")

> Cc: <stable@vger.kernel.org>

> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

> ---

>  fs/hugetlbfs/inode.c |  2 +-

>  mm/userfaultfd.c     | 28 ++++++++++++++--------------

>  2 files changed, 15 insertions(+), 15 deletions(-)

>


Reviewed-by: Mina Almasry <almasry.mina@google.com>


I'm guessing this may interact with my patch in review "[PATCH v3] mm,
hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY". I'll rebase my
patch and re-test.

> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c

> index 9d9e0097c1d3..55efd3dd04f6 100644

> --- a/fs/hugetlbfs/inode.c

> +++ b/fs/hugetlbfs/inode.c

> @@ -529,7 +529,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,

>                          * the subpool and global reserve usage count can need

>                          * to be adjusted.

>                          */

> -                       VM_BUG_ON(PagePrivate(page));

> +                       VM_BUG_ON(HPageRestoreReserve(page));

>                         remove_huge_page(page);

>                         freed++;

>                         if (!truncate_op) {

> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c

> index 5508f7d9e2dc..9ce5a3793ad4 100644

> --- a/mm/userfaultfd.c

> +++ b/mm/userfaultfd.c

> @@ -432,38 +432,38 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,

>                  * If a reservation for the page existed in the reservation

>                  * map of a private mapping, the map was modified to indicate

>                  * the reservation was consumed when the page was allocated.

> -                * We clear the PagePrivate flag now so that the global

> +                * We clear the HPageRestoreReserve flag now so that the global

>                  * reserve count will not be incremented in free_huge_page.

>                  * The reservation map will still indicate the reservation

>                  * was consumed and possibly prevent later page allocation.

>                  * This is better than leaking a global reservation.  If no

> -                * reservation existed, it is still safe to clear PagePrivate

> -                * as no adjustments to reservation counts were made during

> -                * allocation.

> +                * reservation existed, it is still safe to clear

> +                * HPageRestoreReserve as no adjustments to reservation counts

> +                * were made during allocation.

>                  *

>                  * The reservation map for shared mappings indicates which

>                  * pages have reservations.  When a huge page is allocated

>                  * for an address with a reservation, no change is made to

> -                * the reserve map.  In this case PagePrivate will be set

> -                * to indicate that the global reservation count should be

> +                * the reserve map.  In this case HPageRestoreReserve will be

> +                * set to indicate that the global reservation count should be

>                  * incremented when the page is freed.  This is the desired

>                  * behavior.  However, when a huge page is allocated for an

>                  * address without a reservation a reservation entry is added

> -                * to the reservation map, and PagePrivate will not be set.

> -                * When the page is freed, the global reserve count will NOT

> -                * be incremented and it will appear as though we have leaked

> -                * reserved page.  In this case, set PagePrivate so that the

> -                * global reserve count will be incremented to match the

> -                * reservation map entry which was created.

> +                * to the reservation map, and HPageRestoreReserve will not be

> +                * set. When the page is freed, the global reserve count will

> +                * NOT be incremented and it will appear as though we have

> +                * leaked reserved page.  In this case, set HPageRestoreReserve

> +                * so that the global reserve count will be incremented to

> +                * match the reservation map entry which was created.

>                  *

>                  * Note that vm_alloc_shared is based on the flags of the vma

>                  * for which the page was originally allocated.  dst_vma could

>                  * be different or NULL on error.

>                  */

>                 if (vm_alloc_shared)

> -                       SetPagePrivate(page);

> +                       SetHPageRestoreReserve(page);

>                 else

> -                       ClearPagePrivate(page);

> +                       ClearHPageRestoreReserve(page);

>                 put_page(page);

>         }

>         BUG_ON(copied < 0);

> --

> 2.31.1

>
Muchun Song May 23, 2021, 2:45 a.m. UTC | #2
On Sat, May 22, 2021 at 7:40 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>

> In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific

> page flags") the use of PagePrivate to indicate a reservation count

> should be restored at free time was changed to the hugetlb specific flag

> HPageRestoreReserve.  Changes to a userfaultfd error path as well as a

> VM_BUG_ON() in remove_inode_hugepages() were overlooked.

>

> Users could see incorrect hugetlb reserve counts if they experience an

> error with a UFFDIO_COPY operation.  Specifically, this would be the

> result of an unlikely copy_huge_page_from_user error.  There is not an

> increased chance of hitting the VM_BUG_ON.

>

> Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")

> Cc: <stable@vger.kernel.org>

> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>


Reviewed-by: Muchun Song <songmuchun@bytedance.com>


Thanks Mike.
diff mbox series

Patch

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9d9e0097c1d3..55efd3dd04f6 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -529,7 +529,7 @@  static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			 * the subpool and global reserve usage count can need
 			 * to be adjusted.
 			 */
-			VM_BUG_ON(PagePrivate(page));
+			VM_BUG_ON(HPageRestoreReserve(page));
 			remove_huge_page(page);
 			freed++;
 			if (!truncate_op) {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 5508f7d9e2dc..9ce5a3793ad4 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -432,38 +432,38 @@  static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		 * If a reservation for the page existed in the reservation
 		 * map of a private mapping, the map was modified to indicate
 		 * the reservation was consumed when the page was allocated.
-		 * We clear the PagePrivate flag now so that the global
+		 * We clear the HPageRestoreReserve flag now so that the global
 		 * reserve count will not be incremented in free_huge_page.
 		 * The reservation map will still indicate the reservation
 		 * was consumed and possibly prevent later page allocation.
 		 * This is better than leaking a global reservation.  If no
-		 * reservation existed, it is still safe to clear PagePrivate
-		 * as no adjustments to reservation counts were made during
-		 * allocation.
+		 * reservation existed, it is still safe to clear
+		 * HPageRestoreReserve as no adjustments to reservation counts
+		 * were made during allocation.
 		 *
 		 * The reservation map for shared mappings indicates which
 		 * pages have reservations.  When a huge page is allocated
 		 * for an address with a reservation, no change is made to
-		 * the reserve map.  In this case PagePrivate will be set
-		 * to indicate that the global reservation count should be
+		 * the reserve map.  In this case HPageRestoreReserve will be
+		 * set to indicate that the global reservation count should be
 		 * incremented when the page is freed.  This is the desired
 		 * behavior.  However, when a huge page is allocated for an
 		 * address without a reservation a reservation entry is added
-		 * to the reservation map, and PagePrivate will not be set.
-		 * When the page is freed, the global reserve count will NOT
-		 * be incremented and it will appear as though we have leaked
-		 * reserved page.  In this case, set PagePrivate so that the
-		 * global reserve count will be incremented to match the
-		 * reservation map entry which was created.
+		 * to the reservation map, and HPageRestoreReserve will not be
+		 * set. When the page is freed, the global reserve count will
+		 * NOT be incremented and it will appear as though we have
+		 * leaked reserved page.  In this case, set HPageRestoreReserve
+		 * so that the global reserve count will be incremented to
+		 * match the reservation map entry which was created.
 		 *
 		 * Note that vm_alloc_shared is based on the flags of the vma
 		 * for which the page was originally allocated.  dst_vma could
 		 * be different or NULL on error.
 		 */
 		if (vm_alloc_shared)
-			SetPagePrivate(page);
+			SetHPageRestoreReserve(page);
 		else
-			ClearPagePrivate(page);
+			ClearHPageRestoreReserve(page);
 		put_page(page);
 	}
 	BUG_ON(copied < 0);