diff mbox series

[11/13] mm: add unsafe_follow_pfn

Message ID 20201007164426.1812530-12-daniel.vetter@ffwll.ch
State Superseded
Headers show
Series [01/13] drm/exynos: Stop using frame_vector helpers | expand

Commit Message

Daniel Vetter Oct. 7, 2020, 4:44 p.m. UTC
Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there's some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now annotate these as unsafe and splat appropriately.

This patch adds an unsafe_follow_pfn, which later patches will then
roll out to all appropriate places.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
 mm/nommu.c         | 17 +++++++++++++++++
 security/Kconfig   | 13 +++++++++++++
 4 files changed, 63 insertions(+), 1 deletion(-)

Comments

Jason Gunthorpe Oct. 7, 2020, 5:36 p.m. UTC | #1
On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:
> Way back it was a reasonable assumptions that iomem mappings never

> change the pfn range they point at. But this has changed:

> 

> - gpu drivers dynamically manage their memory nowadays, invalidating

> ptes with unmap_mapping_range when buffers get moved

> 

> - contiguous dma allocations have moved from dedicated carvetouts to

> cma regions. This means if we miss the unmap the pfn might contain

> pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

> 

> - even /dev/mem now invalidates mappings when the kernel requests that

> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87

> ("/dev/mem: Revoke mappings when a driver claims the region")

> 

> Accessing pfns obtained from ptes without holding all the locks is

> therefore no longer a good idea.

> 

> Unfortunately there's some users where this is not fixable (like v4l

> userptr of iomem mappings) or involves a pile of work (vfio type1

> iommu). For now annotate these as unsafe and splat appropriately.

> 

> This patch adds an unsafe_follow_pfn, which later patches will then

> roll out to all appropriate places.

> 

> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

> Cc: Jason Gunthorpe <jgg@ziepe.ca>

> Cc: Kees Cook <keescook@chromium.org>

> Cc: Dan Williams <dan.j.williams@intel.com>

> Cc: Andrew Morton <akpm@linux-foundation.org>

> Cc: John Hubbard <jhubbard@nvidia.com>

> Cc: Jérôme Glisse <jglisse@redhat.com>

> Cc: Jan Kara <jack@suse.cz>

> Cc: Dan Williams <dan.j.williams@intel.com>

> Cc: linux-mm@kvack.org

> Cc: linux-arm-kernel@lists.infradead.org

> Cc: linux-samsung-soc@vger.kernel.org

> Cc: linux-media@vger.kernel.org

> Cc: kvm@vger.kernel.org

> ---

>  include/linux/mm.h |  2 ++

>  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-

>  mm/nommu.c         | 17 +++++++++++++++++

>  security/Kconfig   | 13 +++++++++++++

>  4 files changed, 63 insertions(+), 1 deletion(-)


Makes sense to me.

I wonder if we could change the original follow_pfn to require the
ptep and then lockdep_assert_held() it against the page table lock?

> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,

> +	unsigned long *pfn)

> +{

> +#ifdef CONFIG_STRICT_FOLLOW_PFN

> +	pr_info("unsafe follow_pfn usage rejected, see

> CONFIG_STRICT_FOLLOW_PFN\n");


Wonder if we can print something useful here, like the current
PID/process name?

> diff --git a/security/Kconfig b/security/Kconfig

> index 7561f6f99f1d..48945402e103 100644

> --- a/security/Kconfig

> +++ b/security/Kconfig

> @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH

>  	  If you wish for all usermode helper programs to be disabled,

>  	  specify an empty string here (i.e. "").

>  

> +config STRICT_FOLLOW_PFN

> +	bool "Disable unsafe use of follow_pfn"

> +	depends on MMU


I would probably invert this CONFIG_ALLOW_UNSAFE_FOLLOW_PFN
default n

Jason
Daniel Vetter Oct. 7, 2020, 6:10 p.m. UTC | #2
On Wed, Oct 7, 2020 at 7:36 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carvetouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> >
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea.
> >
> > Unfortunately there's some users where this is not fixable (like v4l
> > userptr of iomem mappings) or involves a pile of work (vfio type1
> > iommu). For now annotate these as unsafe and splat appropriately.
> >
> > This patch adds an unsafe_follow_pfn, which later patches will then
> > roll out to all appropriate places.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: kvm@vger.kernel.org
> > ---
> >  include/linux/mm.h |  2 ++
> >  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
> >  mm/nommu.c         | 17 +++++++++++++++++
> >  security/Kconfig   | 13 +++++++++++++
> >  4 files changed, 63 insertions(+), 1 deletion(-)
>
> Makes sense to me.
>
> I wonder if we could change the original follow_pfn to require the
> ptep and then lockdep_assert_held() it against the page table lock?

The safe variant with the pagetable lock is follow_pte_pmd. The only
way to make follow_pfn safe is if you have an mmu notifier and
corresponding retry logic. That is not covered by lockdep (it would
splat if we annotate the retry side), so I'm not sure how you'd check
for that?

Checking for ptep lock doesn't work here, since the one leftover safe
user of this (kvm) doesn't need that at all, because it has the mmu
notifier.

Also follow_pte_pmd will splat with lockdep if you get it wrong, since
the function leaves you with the right ptlock lock when it returns. If
you forget to unlock that, lockdep will complain.

So I think we're as good as it gets, since I really have no idea how
to make sure follow_pfn callers do have an mmu notifier registered.

> > +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> > +     unsigned long *pfn)
> > +{
> > +#ifdef CONFIG_STRICT_FOLLOW_PFN
> > +     pr_info("unsafe follow_pfn usage rejected, see
> > CONFIG_STRICT_FOLLOW_PFN\n");
>
> Wonder if we can print something useful here, like the current
> PID/process name?

Yeah adding comm/pid here makes sense.

> > diff --git a/security/Kconfig b/security/Kconfig
> > index 7561f6f99f1d..48945402e103 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
> >         If you wish for all usermode helper programs to be disabled,
> >         specify an empty string here (i.e. "").
> >
> > +config STRICT_FOLLOW_PFN
> > +     bool "Disable unsafe use of follow_pfn"
> > +     depends on MMU
>
> I would probably invert this CONFIG_ALLOW_UNSAFE_FOLLOW_PFN
> default n

I've followed the few other CONFIG_STRICT_FOO I've seen, which are all
explicit enables and default to "do not break uapi, damn the
(security) bugs". Which is I think how this should be done. It is in
the security section though, so hopefully competent distros will
enable this all.
-Daniel
Jason Gunthorpe Oct. 7, 2020, 7 p.m. UTC | #3
On Wed, Oct 07, 2020 at 08:10:34PM +0200, Daniel Vetter wrote:
> On Wed, Oct 7, 2020 at 7:36 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:

> >

> > On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:

> > > Way back it was a reasonable assumptions that iomem mappings never

> > > change the pfn range they point at. But this has changed:

> > >

> > > - gpu drivers dynamically manage their memory nowadays, invalidating

> > > ptes with unmap_mapping_range when buffers get moved

> > >

> > > - contiguous dma allocations have moved from dedicated carvetouts to

> > > cma regions. This means if we miss the unmap the pfn might contain

> > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

> > >

> > > - even /dev/mem now invalidates mappings when the kernel requests that

> > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87

> > > ("/dev/mem: Revoke mappings when a driver claims the region")

> > >

> > > Accessing pfns obtained from ptes without holding all the locks is

> > > therefore no longer a good idea.

> > >

> > > Unfortunately there's some users where this is not fixable (like v4l

> > > userptr of iomem mappings) or involves a pile of work (vfio type1

> > > iommu). For now annotate these as unsafe and splat appropriately.

> > >

> > > This patch adds an unsafe_follow_pfn, which later patches will then

> > > roll out to all appropriate places.

> > >

> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>

> > > Cc: Kees Cook <keescook@chromium.org>

> > > Cc: Dan Williams <dan.j.williams@intel.com>

> > > Cc: Andrew Morton <akpm@linux-foundation.org>

> > > Cc: John Hubbard <jhubbard@nvidia.com>

> > > Cc: Jérôme Glisse <jglisse@redhat.com>

> > > Cc: Jan Kara <jack@suse.cz>

> > > Cc: Dan Williams <dan.j.williams@intel.com>

> > > Cc: linux-mm@kvack.org

> > > Cc: linux-arm-kernel@lists.infradead.org

> > > Cc: linux-samsung-soc@vger.kernel.org

> > > Cc: linux-media@vger.kernel.org

> > > Cc: kvm@vger.kernel.org

> > >  include/linux/mm.h |  2 ++

> > >  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-

> > >  mm/nommu.c         | 17 +++++++++++++++++

> > >  security/Kconfig   | 13 +++++++++++++

> > >  4 files changed, 63 insertions(+), 1 deletion(-)

> >

> > Makes sense to me.

> >

> > I wonder if we could change the original follow_pfn to require the

> > ptep and then lockdep_assert_held() it against the page table lock?

> 

> The safe variant with the pagetable lock is follow_pte_pmd. The only

> way to make follow_pfn safe is if you have an mmu notifier and

> corresponding retry logic. That is not covered by lockdep (it would

> splat if we annotate the retry side), so I'm not sure how you'd check

> for that?


Right OK.

> Checking for ptep lock doesn't work here, since the one leftover safe

> user of this (kvm) doesn't need that at all, because it has the mmu

> notifier.


Ah, so a better name and/or function kdoc for follow_pfn is probably a
good iead in this patch as well.

> So I think we're as good as it gets, since I really have no idea how

> to make sure follow_pfn callers do have an mmu notifier registered.


Yah, can't be done. Most mmu notifier users should be using
hmm_range_fault anyhow, kvm is really very special here.
 
> I've followed the few other CONFIG_STRICT_FOO I've seen, which are all

> explicit enables and default to "do not break uapi, damn the

> (security) bugs". Which is I think how this should be done. It is in

> the security section though, so hopefully competent distros will

> enable this all.


I thought the strict ones were more general and less clear security
worries, not bugs like this.

This is "allow a user triggerable use after free bug to exist in the
kernel"

Jason
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@  int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags, unsigned long *prot, resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 8d467e23b44e..8db7ad1c261c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4821,7 +4821,12 @@  EXPORT_SYMBOL(follow_pte_pmd);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
+ * the access instead. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * Return: zero and the pfn at @pfn on success, -ve otherwise.
  */
@@ -4844,6 +4849,31 @@  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+	return -EINVAL;
+#else
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@  config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").
 
+config STRICT_FOLLOW_PFN
+	bool "Disable unsafe use of follow_pfn"
+	depends on MMU
+	help
+	  Some functionality in the kernel follows userspace mappings to iomem
+	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
+	  buffers sharing.
+
+	  If this option is switched on, such access is rejected. Only enable
+	  this option when you must run userspace which requires this.
+
+	  If in doubt, say Y.
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"