Message ID | 20220505215947.364694-1-javierm@redhat.com |
---|---|
Headers | show |
Series | fbdev: Fix use-after-free caused by wrong fb_info cleanup in drivers | expand |
On 5/5/22 23:59, Javier Martinez Canillas wrote: > Hello, > > This series contains patches suggested by Daniel Vetter to fix a use-after-free > error in the fb_release() function, due a fb_info associated with a fbdev being > freed too early while a user-space process still has the fbdev dev node opened. > Pushed this to drm-misc (drm-misc-fixes).
Hello Andrzej, On 5/6/22 15:07, Andrzej Hajda wrote: > On 06.05.2022 00:05, Javier Martinez Canillas wrote: [snip] >> + >> + framebuffer_release(info); >> + >> if (request_mem_succeeded) >> release_mem_region(info->apertures->ranges[0].base, >> info->apertures->ranges[0].size); > > You are releasing info, then you are using it. > > I suspect it is responsible for multiple failures of Intel CI [1]. > Yes, it is :( sorry about the mess. Ville already reported this to me. I'll post a patch in a minute. I wonder how this didn't happen before since .remove() happens before .fb_destroy() AFAIU...
On 06.05.2022 00:04, Javier Martinez Canillas wrote: > From: Daniel Vetter <daniel.vetter@ffwll.ch> > > Most fbdev drivers have issues with the fb_info lifetime, because call to > framebuffer_release() from their driver's .remove callback, rather than > doing from fbops.fb_destroy callback. > > Doing that will destroy the fb_info too early, while references to it may > still exist, leading to a use-after-free error. > > To prevent this, check the fb_info reference counter when attempting to > kfree the data structure in framebuffer_release(). That will leak it but > at least will prevent the mentioned error. > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> > Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> > --- > > (no changes since v1) > > drivers/video/fbdev/core/fbsysfs.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c > index 8c1ee9ecec3d..c2a60b187467 100644 > --- a/drivers/video/fbdev/core/fbsysfs.c > +++ b/drivers/video/fbdev/core/fbsysfs.c > @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info) > { > if (!info) > return; > + > + if (WARN_ON(refcount_read(&info->count))) > + return; > + Regarding drm: What about drm_fb_helper_fini? It calls also framebuffer_release and is called often from _remove paths (checked intel/radeon/nouveau). I guess it should be fixed as well. Do you plan to fix it? Regarding fb drivers, just for stats: git grep -p framebuffer_release | grep _remove | wc -l Suggests there is at least 70 incorrect users of this :) Regards Andrzej
Hello Andrzej, On 5/9/22 16:56, Andrzej Hajda wrote: > On 06.05.2022 00:04, Javier Martinez Canillas wrote: >> From: Daniel Vetter <daniel.vetter@ffwll.ch> >> >> Most fbdev drivers have issues with the fb_info lifetime, because call to >> framebuffer_release() from their driver's .remove callback, rather than >> doing from fbops.fb_destroy callback. >> >> Doing that will destroy the fb_info too early, while references to it may >> still exist, leading to a use-after-free error. >> >> To prevent this, check the fb_info reference counter when attempting to >> kfree the data structure in framebuffer_release(). That will leak it but >> at least will prevent the mentioned error. >> >> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> >> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> >> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> >> --- >> >> (no changes since v1) >> >> drivers/video/fbdev/core/fbsysfs.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c >> index 8c1ee9ecec3d..c2a60b187467 100644 >> --- a/drivers/video/fbdev/core/fbsysfs.c >> +++ b/drivers/video/fbdev/core/fbsysfs.c >> @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info) >> { >> if (!info) >> return; >> + >> + if (WARN_ON(refcount_read(&info->count))) >> + return; >> + > > Regarding drm: > What about drm_fb_helper_fini? It calls also framebuffer_release and is > called often from _remove paths (checked intel/radeon/nouveau). I guess > it should be fixed as well. Do you plan to fix it? > I think you are correct. Maybe we need something like the following? diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..b09598f7af28 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) if (info) { if (info->cmap.len) fb_dealloc_cmap(&info->cmap); - framebuffer_release(info); } fb_helper->fbdev = NULL; @@ -2112,6 +2111,7 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) static void drm_fbdev_fb_destroy(struct fb_info *info) { drm_fbdev_release(info->par); + framebuffer_release(info); } static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma) > > Regarding fb drivers, just for stats: > git grep -p framebuffer_release | grep _remove | wc -l > Suggests there is at least 70 incorrect users of this :) > Yes, Daniel already mentioned that most of them get this wrong but I was mostly interested in {simple,efi,vesa}fb since are used with "nomodeset". But given that I only touched those tree and still managed to introduce a regression, I won't attempt to fix the others.
On 09.05.2022 17:30, Javier Martinez Canillas wrote: > Hello Andrzej, > > On 5/9/22 16:56, Andrzej Hajda wrote: >> On 06.05.2022 00:04, Javier Martinez Canillas wrote: >>> From: Daniel Vetter <daniel.vetter@ffwll.ch> >>> >>> Most fbdev drivers have issues with the fb_info lifetime, because call to >>> framebuffer_release() from their driver's .remove callback, rather than >>> doing from fbops.fb_destroy callback. >>> >>> Doing that will destroy the fb_info too early, while references to it may >>> still exist, leading to a use-after-free error. >>> >>> To prevent this, check the fb_info reference counter when attempting to >>> kfree the data structure in framebuffer_release(). That will leak it but >>> at least will prevent the mentioned error. >>> >>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> >>> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> >>> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> >>> --- >>> >>> (no changes since v1) >>> >>> drivers/video/fbdev/core/fbsysfs.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c >>> index 8c1ee9ecec3d..c2a60b187467 100644 >>> --- a/drivers/video/fbdev/core/fbsysfs.c >>> +++ b/drivers/video/fbdev/core/fbsysfs.c >>> @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info) >>> { >>> if (!info) >>> return; >>> + >>> + if (WARN_ON(refcount_read(&info->count))) >>> + return; >>> + >> Regarding drm: >> What about drm_fb_helper_fini? It calls also framebuffer_release and is >> called often from _remove paths (checked intel/radeon/nouveau). I guess >> it should be fixed as well. Do you plan to fix it? >> > I think you are correct. Maybe we need something like the following? > > diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c > index d265a73313c9..b09598f7af28 100644 > --- a/drivers/gpu/drm/drm_fb_helper.c > +++ b/drivers/gpu/drm/drm_fb_helper.c > @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) > if (info) { > if (info->cmap.len) > fb_dealloc_cmap(&info->cmap); > - framebuffer_release(info); What about cmap? I am not an fb expert, but IMO cmap can be accessed from userspace as well. Regards Andrzej > } > fb_helper->fbdev = NULL; > > @@ -2112,6 +2111,7 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) > static void drm_fbdev_fb_destroy(struct fb_info *info) > { > drm_fbdev_release(info->par); > + framebuffer_release(info); > } > > static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma) > >> Regarding fb drivers, just for stats: >> git grep -p framebuffer_release | grep _remove | wc -l >> Suggests there is at least 70 incorrect users of this :) >> > Yes, Daniel already mentioned that most of them get this wrong but I was > mostly interested in {simple,efi,vesa}fb since are used with "nomodeset". > > But given that I only touched those tree and still managed to introduce > a regression, I won't attempt to fix the others. >
On 5/9/22 17:51, Andrzej Hajda wrote: [snip] >>>> + >>> Regarding drm: >>> What about drm_fb_helper_fini? It calls also framebuffer_release and is >>> called often from _remove paths (checked intel/radeon/nouveau). I guess >>> it should be fixed as well. Do you plan to fix it? >>> >> I think you are correct. Maybe we need something like the following? >> >> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c >> index d265a73313c9..b09598f7af28 100644 >> --- a/drivers/gpu/drm/drm_fb_helper.c >> +++ b/drivers/gpu/drm/drm_fb_helper.c >> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) >> if (info) { >> if (info->cmap.len) >> fb_dealloc_cmap(&info->cmap); >> - framebuffer_release(info); > > What about cmap? I am not an fb expert, but IMO cmap can be accessed > from userspace as well. > I actually thought about the same but then remembered what Daniel said in [0] (AFAIU at least) that these should be done in .remove() so the current code looks like matches that and only framebuffer_release() should be moved. For vesafb a previous patch proposed to also move a release_region() call to .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1]. But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that is the correct thing to do. [0]: https://www.spinics.net/lists/dri-devel/msg346257.html [1]: https://www.spinics.net/lists/dri-devel/msg346261.html
Hi Javier Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas: > On 5/9/22 17:51, Andrzej Hajda wrote: > > [snip] > >>>>> + >>>> Regarding drm: >>>> What about drm_fb_helper_fini? It calls also framebuffer_release and is >>>> called often from _remove paths (checked intel/radeon/nouveau). I guess >>>> it should be fixed as well. Do you plan to fix it? >>>> >>> I think you are correct. Maybe we need something like the following? >>> >>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c >>> index d265a73313c9..b09598f7af28 100644 >>> --- a/drivers/gpu/drm/drm_fb_helper.c >>> +++ b/drivers/gpu/drm/drm_fb_helper.c >>> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) >>> if (info) { >>> if (info->cmap.len) >>> fb_dealloc_cmap(&info->cmap); >>> - framebuffer_release(info); >> >> What about cmap? I am not an fb expert, but IMO cmap can be accessed >> from userspace as well. >> > > I actually thought about the same but then remembered what Daniel said in [0] > (AFAIU at least) that these should be done in .remove() so the current code > looks like matches that and only framebuffer_release() should be moved. > > For vesafb a previous patch proposed to also move a release_region() call to > .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1]. > > But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that > is the correct thing to do. The cmap data structure is software state that can be accessed via icotl as long as the devfile is open. Drivers update the hardware from it. See [1]. Moving that cleanup into fb_destroy seems appropriate to me. Best regards Thomas [1] https://elixir.bootlin.com/linux/v5.17.6/source/drivers/video/fbdev/core/fbcmap.c#L231 > > [0]: https://www.spinics.net/lists/dri-devel/msg346257.html > [1]: https://www.spinics.net/lists/dri-devel/msg346261.html >
Hi Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas: > On 5/9/22 17:51, Andrzej Hajda wrote: > > [snip] > >>>>> + >>>> Regarding drm: >>>> What about drm_fb_helper_fini? It calls also framebuffer_release and is >>>> called often from _remove paths (checked intel/radeon/nouveau). I guess >>>> it should be fixed as well. Do you plan to fix it? >>>> >>> I think you are correct. Maybe we need something like the following? >>> >>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c >>> index d265a73313c9..b09598f7af28 100644 >>> --- a/drivers/gpu/drm/drm_fb_helper.c >>> +++ b/drivers/gpu/drm/drm_fb_helper.c >>> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) >>> if (info) { >>> if (info->cmap.len) >>> fb_dealloc_cmap(&info->cmap); >>> - framebuffer_release(info); After reviewing that code, drm_fb_helper_fini() appears to be called from .fb_destroy (see drm_fbdev_release). The code is hard to follow though. If there another way of releasing the framebuffer here? Best regards Thomas >> >> What about cmap? I am not an fb expert, but IMO cmap can be accessed >> from userspace as well. >> > > I actually thought about the same but then remembered what Daniel said in [0] > (AFAIU at least) that these should be done in .remove() so the current code > looks like matches that and only framebuffer_release() should be moved. > > For vesafb a previous patch proposed to also move a release_region() call to > .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1]. > > But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that > is the correct thing to do. > > [0]: https://www.spinics.net/lists/dri-devel/msg346257.html > [1]: https://www.spinics.net/lists/dri-devel/msg346261.html >
Hello Thomas, On 5/9/22 20:32, Thomas Zimmermann wrote: > Hi > > Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas: >> On 5/9/22 17:51, Andrzej Hajda wrote: >> >> [snip] >> >>>>>> + >>>>> Regarding drm: >>>>> What about drm_fb_helper_fini? It calls also framebuffer_release and is >>>>> called often from _remove paths (checked intel/radeon/nouveau). I guess >>>>> it should be fixed as well. Do you plan to fix it? >>>>> >>>> I think you are correct. Maybe we need something like the following? >>>> >>>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c >>>> index d265a73313c9..b09598f7af28 100644 >>>> --- a/drivers/gpu/drm/drm_fb_helper.c >>>> +++ b/drivers/gpu/drm/drm_fb_helper.c >>>> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) >>>> if (info) { >>>> if (info->cmap.len) >>>> fb_dealloc_cmap(&info->cmap); >>>> - framebuffer_release(info); > > After reviewing that code, drm_fb_helper_fini() appears to be called > from .fb_destroy (see drm_fbdev_release). The code is hard to follow > though. If there another way of releasing the framebuffer here? > Andrzej mentioned intel/radeon/nouveau as example, I only looked at i915 and the call chain is the following as far as I can tell: struct pci_driver i915_pci_driver = { ... .remove = i915_pci_remove, ... }; i915_driver_remove intel_modeset_driver_remove_noirq intel_fbdev_fini intel_fbdev_destroy drm_fb_helper_fini framebuffer_release So my underdestanding is that if a program has the emulated fbdev device opened and the i915 module is removed, then a use-after-free would be triggered on drm_fbdev_fb_destroy() once the program closes the device: drm_fbdev_fb_destroy drm_fbdev_release(info->par); <-- info was already freed on .remove
On 5/9/22 20:12, Thomas Zimmermann wrote: [snip] >> I actually thought about the same but then remembered what Daniel said in [0] >> (AFAIU at least) that these should be done in .remove() so the current code >> looks like matches that and only framebuffer_release() should be moved. >> >> For vesafb a previous patch proposed to also move a release_region() call to >> .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1]. >> >> But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that >> is the correct thing to do. > > The cmap data structure is software state that can be accessed via icotl > as long as the devfile is open. Drivers update the hardware from it. See > [1]. Moving that cleanup into fb_destroy seems appropriate to me. > I see, that makes sense. Then something like the following instead? diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..ce0d89c49e42 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) cancel_work_sync(&fb_helper->resume_work); cancel_work_sync(&fb_helper->damage_work); - info = fb_helper->fbdev; - if (info) { - if (info->cmap.len) - fb_dealloc_cmap(&info->cmap); - framebuffer_release(info); - } fb_helper->fbdev = NULL; mutex_lock(&kernel_fb_helper_lock); @@ -2111,7 +2105,11 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) */ static void drm_fbdev_fb_destroy(struct fb_info *info) { + if (info->cmap.len) + fb_dealloc_cmap(&info->cmap); + drm_fbdev_release(info->par); + framebuffer_release(info); } static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
On 09.05.2022 22:03, Javier Martinez Canillas wrote: > On 5/9/22 20:12, Thomas Zimmermann wrote: > > [snip] > >>> I actually thought about the same but then remembered what Daniel said in [0] >>> (AFAIU at least) that these should be done in .remove() so the current code >>> looks like matches that and only framebuffer_release() should be moved. >>> >>> For vesafb a previous patch proposed to also move a release_region() call to >>> .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1]. >>> >>> But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that >>> is the correct thing to do. >> The cmap data structure is software state that can be accessed via icotl >> as long as the devfile is open. Drivers update the hardware from it. See >> [1]. Moving that cleanup into fb_destroy seems appropriate to me. >> > I see, that makes sense. Then something like the following instead? > > diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c > index d265a73313c9..ce0d89c49e42 100644 > --- a/drivers/gpu/drm/drm_fb_helper.c > +++ b/drivers/gpu/drm/drm_fb_helper.c > @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) > cancel_work_sync(&fb_helper->resume_work); > cancel_work_sync(&fb_helper->damage_work); > > - info = fb_helper->fbdev; > - if (info) { > - if (info->cmap.len) > - fb_dealloc_cmap(&info->cmap); > - framebuffer_release(info); > - } > fb_helper->fbdev = NULL; > > mutex_lock(&kernel_fb_helper_lock); > @@ -2111,7 +2105,11 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) > */ > static void drm_fbdev_fb_destroy(struct fb_info *info) > { > + if (info->cmap.len) > + fb_dealloc_cmap(&info->cmap); > + > drm_fbdev_release(info->par); > + framebuffer_release(info); I would put drm_fbdev_release at the beginning - it cancels workers which could expect cmap to be still valid. Regards Andrzej > } > > static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma) >
On 5/10/22 00:22, Andrzej Hajda wrote: [snip] >> static void drm_fbdev_fb_destroy(struct fb_info *info) >> { >> + if (info->cmap.len) >> + fb_dealloc_cmap(&info->cmap); >> + >> drm_fbdev_release(info->par); >> + framebuffer_release(info); > > I would put drm_fbdev_release at the beginning - it cancels workers > which could expect cmap to be still valid. > Indeed, you are correct again. [0] is the final version of the patch I've but don't have an i915 test machine to give it a try. I'll test tomorrow on my test systems to verify that it doesn't cause any regressions since with other DRM drivers. I think that besides this patch, drivers shouldn't need to call to the drm_fb_helper_fini() function directly. Since that would be called during drm_fbdev_fb_destroy() anyways. We should probably remove that call in all drivers and make this helper function static and just private to drm_fb_helper functions. Or am I missing something here ? [0]:
On 10.05.2022 00:42, Javier Martinez Canillas wrote: > On 5/10/22 00:22, Andrzej Hajda wrote: > > [snip] > >>> static void drm_fbdev_fb_destroy(struct fb_info *info) >>> { >>> + if (info->cmap.len) >>> + fb_dealloc_cmap(&info->cmap); >>> + >>> drm_fbdev_release(info->par); >>> + framebuffer_release(info); >> I would put drm_fbdev_release at the beginning - it cancels workers >> which could expect cmap to be still valid. >> > Indeed, you are correct again. [0] is the final version of the patch I've > but don't have an i915 test machine to give it a try. I'll test tomorrow > on my test systems to verify that it doesn't cause any regressions since > with other DRM drivers. > > I think that besides this patch, drivers shouldn't need to call to the > drm_fb_helper_fini() function directly. Since that would be called during > drm_fbdev_fb_destroy() anyways. > > We should probably remove that call in all drivers and make this helper > function static and just private to drm_fb_helper functions. > > Or am I missing something here ? This is question for experts :) I do not know what are user API/ABI expectations regarding removal of fbdev driver, I wonder if they are documented somewhere :) Apparently we have some process of 'zombification' here - we need to remove the driver without waiting for userspace closing framebuffer(???) (to unbind ops-es and remove references to driver related things), but we need to leave some structures to fool userspace, 'info' seems to be one of them. So I guess there should be something called on driver's _remove path, and sth on destroy path. Regards Andrzej > > [0]: > From 5170cafcf2936da8f1c53231e3baa7d7a2b16c61 Mon Sep 17 00:00:00 2001 > From: Javier Martinez Canillas <javierm@redhat.com> > Date: Tue May 10 00:39:55 2022 +0200 > Subject: [RFT PATCH] drm/fb-helper: Don't deallocate fb colormap and free fb info > too early > > Currently these are done in drm_fb_helper_fini() but this helper is called > by drivers in their .remove callback, which could lead to a use-after-free > if a process has opened the emulated fbdev node while a driver is removed. > > For example, in i915 driver the call chain during remove is the following: > > struct pci_driver i915_pci_driver = { > ... > .remove = i915_pci_remove, > ... > }; > > i915_pci_remove > i915_driver_remove > intel_modeset_driver_remove_noirq > intel_fbdev_fini > intel_fbdev_destroy > drm_fb_helper_fini > framebuffer_release > > Later the process will close the fbdev node file descriptor leading to the > mentioned use-after-free bug in drm_fbdev_fb_destroy(), due the following: > > drm_fbdev_fb_destroy > drm_fbdev_release(info->par); <-- info was already freed on .remove > > To prevent that, let's move the framebuffer_release() call to the end of > the drm_fbdev_fb_destroy() function. > > Also, the call to fb_dealloc_cmap() in drm_fb_helper_fini() is too early > and is more correct to do it in drm_fbdev_fb_destroy() as well. After a > call to drm_fbdev_release() has been made. > > Reported-by: Andrzej Hajda <andrzej.hajda@intel.com> > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> > --- > drivers/gpu/drm/drm_fb_helper.c | 9 +++------ > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c > index d265a73313c9..7288fbd26bcc 100644 > --- a/drivers/gpu/drm/drm_fb_helper.c > +++ b/drivers/gpu/drm/drm_fb_helper.c > @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) > cancel_work_sync(&fb_helper->resume_work); > cancel_work_sync(&fb_helper->damage_work); > > - info = fb_helper->fbdev; > - if (info) { > - if (info->cmap.len) > - fb_dealloc_cmap(&info->cmap); > - framebuffer_release(info); > - } > fb_helper->fbdev = NULL; > > mutex_lock(&kernel_fb_helper_lock); > @@ -2112,6 +2106,9 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) > static void drm_fbdev_fb_destroy(struct fb_info *info) > { > drm_fbdev_release(info->par); > + if (info->cmap.len) > + fb_dealloc_cmap(&info->cmap); > + framebuffer_release(info); > } > > static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
On 5/10/22 09:19, Andrzej Hajda wrote: > > > On 10.05.2022 00:42, Javier Martinez Canillas wrote: >> On 5/10/22 00:22, Andrzej Hajda wrote: >> >> [snip] >> >>>> static void drm_fbdev_fb_destroy(struct fb_info *info) >>>> { >>>> + if (info->cmap.len) >>>> + fb_dealloc_cmap(&info->cmap); >>>> + >>>> drm_fbdev_release(info->par); >>>> + framebuffer_release(info); >>> I would put drm_fbdev_release at the beginning - it cancels workers >>> which could expect cmap to be still valid. >>> >> Indeed, you are correct again. [0] is the final version of the patch I've >> but don't have an i915 test machine to give it a try. I'll test tomorrow >> on my test systems to verify that it doesn't cause any regressions since >> with other DRM drivers. >> >> I think that besides this patch, drivers shouldn't need to call to the >> drm_fb_helper_fini() function directly. Since that would be called during >> drm_fbdev_fb_destroy() anyways. >> >> We should probably remove that call in all drivers and make this helper >> function static and just private to drm_fb_helper functions. >> >> Or am I missing something here ? > > This is question for experts :) Fair. I'm definitely not one of them :) > I do not know what are user API/ABI expectations regarding removal of > fbdev driver, I wonder if they are documented somewhere :) I don't know. At least I haven't found them. > Apparently we have some process of 'zombification' here - we need to > remove the driver without waiting for userspace closing framebuffer(???) > (to unbind ops-es and remove references to driver related things), but > we need to leave some structures to fool userspace, 'info' seems to be > one of them. That's correct, yes. I think that any driver that provides a .mmap file operation would have the same issue. But drivers keep an internal state and just return -ENODEV or whatever on read/write/close after a removal. The fbdev subsystem is different though since as you said it, the fbdev core unconditionally calls to the driver .fb_release() callback with a struct fb_info reference as argument. I tried to prevent that with commit aafa025c76dc ("fbdev: Make fb_release() return -ENODEV if fbdev was unregistered") but Daniel pointed out that is was wrong since could leak memory allocated and was expected to be freed on release. That's why I instead fixed the issue in the fbdev drivers and just added a warn on fb_release(), that is $SUBJECT. > So I guess there should be something called on driver's _remove path, > and sth on destroy path. > That was my question actually, do we need something to be called in the destroy path ? Since that could just be internal to the DRM fb helpers. In other words, drivers should only care about setting a generic fbdev by calling drm_fbdev_generic_setup(), and then do any HW cleanup in the removal path, but let the fb helpers to handle the SW cleanup in destroy.
Hi Am 10.05.22 um 00:42 schrieb Javier Martinez Canillas: > On 5/10/22 00:22, Andrzej Hajda wrote: > > [snip] > >>> static void drm_fbdev_fb_destroy(struct fb_info *info) >>> { >>> + if (info->cmap.len) >>> + fb_dealloc_cmap(&info->cmap); >>> + >>> drm_fbdev_release(info->par); >>> + framebuffer_release(info); >> >> I would put drm_fbdev_release at the beginning - it cancels workers >> which could expect cmap to be still valid. >> > > Indeed, you are correct again. [0] is the final version of the patch I've > but don't have an i915 test machine to give it a try. I'll test tomorrow > on my test systems to verify that it doesn't cause any regressions since > with other DRM drivers. You have to go through all DRM drivers that call drm_fb_helper_fini() and make sure that they free fb_info. For example armada appears to be leaking now. [1] Best regards Thomas [1] https://elixir.bootlin.com/linux/v5.17.6/source/drivers/gpu/drm/armada/armada_fbdev.c#L152 > > I think that besides this patch, drivers shouldn't need to call to the > drm_fb_helper_fini() function directly. Since that would be called during > drm_fbdev_fb_destroy() anyways. > > We should probably remove that call in all drivers and make this helper > function static and just private to drm_fb_helper functions. > > Or am I missing something here ? > > [0]: > From 5170cafcf2936da8f1c53231e3baa7d7a2b16c61 Mon Sep 17 00:00:00 2001 > From: Javier Martinez Canillas <javierm@redhat.com> > Date: Tue May 10 00:39:55 2022 +0200 > Subject: [RFT PATCH] drm/fb-helper: Don't deallocate fb colormap and free fb info > too early > > Currently these are done in drm_fb_helper_fini() but this helper is called > by drivers in their .remove callback, which could lead to a use-after-free > if a process has opened the emulated fbdev node while a driver is removed. > > For example, in i915 driver the call chain during remove is the following: > > struct pci_driver i915_pci_driver = { > ... > .remove = i915_pci_remove, > ... > }; > > i915_pci_remove > i915_driver_remove > intel_modeset_driver_remove_noirq > intel_fbdev_fini > intel_fbdev_destroy > drm_fb_helper_fini > framebuffer_release > > Later the process will close the fbdev node file descriptor leading to the > mentioned use-after-free bug in drm_fbdev_fb_destroy(), due the following: > > drm_fbdev_fb_destroy > drm_fbdev_release(info->par); <-- info was already freed on .remove > > To prevent that, let's move the framebuffer_release() call to the end of > the drm_fbdev_fb_destroy() function. > > Also, the call to fb_dealloc_cmap() in drm_fb_helper_fini() is too early > and is more correct to do it in drm_fbdev_fb_destroy() as well. After a > call to drm_fbdev_release() has been made. > > Reported-by: Andrzej Hajda <andrzej.hajda@intel.com> > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> > --- > drivers/gpu/drm/drm_fb_helper.c | 9 +++------ > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c > index d265a73313c9..7288fbd26bcc 100644 > --- a/drivers/gpu/drm/drm_fb_helper.c > +++ b/drivers/gpu/drm/drm_fb_helper.c > @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) > cancel_work_sync(&fb_helper->resume_work); > cancel_work_sync(&fb_helper->damage_work); > > - info = fb_helper->fbdev; > - if (info) { > - if (info->cmap.len) > - fb_dealloc_cmap(&info->cmap); > - framebuffer_release(info); > - } > fb_helper->fbdev = NULL; > > mutex_lock(&kernel_fb_helper_lock); > @@ -2112,6 +2106,9 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) > static void drm_fbdev_fb_destroy(struct fb_info *info) > { > drm_fbdev_release(info->par); > + if (info->cmap.len) > + fb_dealloc_cmap(&info->cmap); > + framebuffer_release(info); > } > > static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
Hello Thomas, On 5/10/22 10:04, Thomas Zimmermann wrote: > Hi > > Am 10.05.22 um 00:42 schrieb Javier Martinez Canillas: >> On 5/10/22 00:22, Andrzej Hajda wrote: >> >> [snip] >> >>>> static void drm_fbdev_fb_destroy(struct fb_info *info) >>>> { >>>> + if (info->cmap.len) >>>> + fb_dealloc_cmap(&info->cmap); >>>> + >>>> drm_fbdev_release(info->par); >>>> + framebuffer_release(info); >>> >>> I would put drm_fbdev_release at the beginning - it cancels workers >>> which could expect cmap to be still valid. >>> >> >> Indeed, you are correct again. [0] is the final version of the patch I've >> but don't have an i915 test machine to give it a try. I'll test tomorrow >> on my test systems to verify that it doesn't cause any regressions since >> with other DRM drivers. > > You have to go through all DRM drivers that call drm_fb_helper_fini() > and make sure that they free fb_info. For example armada appears to be > leaking now. [1] > But shouldn't fb_info be freed when unregister_framebuffer() is called through drm_dev_unregister() ? AFAICT the call chain is the following: drm_put_dev() drm_dev_unregister() drm_client_dev_unregister() drm_fbdev_client_unregister() drm_fb_helper_unregister_fbi() unregister_framebuffer() do_unregister_framebuffer() put_fb_info() drm_fbdev_fb_destroy() framebuffer_release() which is the reason why I believe that drm_fb_helper_fini() should be an internal static function and only called from drm_fbdev_fb_destroy(). Drivers shouldn't really explicitly call this helper in my opinion.
Hi Am 10.05.22 um 10:30 schrieb Javier Martinez Canillas: > Hello Thomas, > > On 5/10/22 10:04, Thomas Zimmermann wrote: >> Hi >> >> Am 10.05.22 um 00:42 schrieb Javier Martinez Canillas: >>> On 5/10/22 00:22, Andrzej Hajda wrote: >>> >>> [snip] >>> >>>>> static void drm_fbdev_fb_destroy(struct fb_info *info) >>>>> { >>>>> + if (info->cmap.len) >>>>> + fb_dealloc_cmap(&info->cmap); >>>>> + >>>>> drm_fbdev_release(info->par); >>>>> + framebuffer_release(info); >>>> >>>> I would put drm_fbdev_release at the beginning - it cancels workers >>>> which could expect cmap to be still valid. >>>> >>> >>> Indeed, you are correct again. [0] is the final version of the patch I've >>> but don't have an i915 test machine to give it a try. I'll test tomorrow >>> on my test systems to verify that it doesn't cause any regressions since >>> with other DRM drivers. >> >> You have to go through all DRM drivers that call drm_fb_helper_fini() >> and make sure that they free fb_info. For example armada appears to be >> leaking now. [1] >> > > But shouldn't fb_info be freed when unregister_framebuffer() is called > through drm_dev_unregister() ? AFAICT the call chain is the following: > > drm_put_dev() > drm_dev_unregister() > drm_client_dev_unregister() > drm_fbdev_client_unregister() > drm_fb_helper_unregister_fbi() > unregister_framebuffer() > do_unregister_framebuffer() > put_fb_info() > drm_fbdev_fb_destroy() > framebuffer_release() > > which is the reason why I believe that drm_fb_helper_fini() should be > an internal static function and only called from drm_fbdev_fb_destroy(). > > Drivers shouldn't really explicitly call this helper in my opinion. Thanks. That makes sense. Best regards Thomas >
Hi Am 10.05.22 um 10:37 schrieb Thomas Zimmermann: ... >>> >>> You have to go through all DRM drivers that call drm_fb_helper_fini() >>> and make sure that they free fb_info. For example armada appears to be >>> leaking now. [1] >>> >> >> But shouldn't fb_info be freed when unregister_framebuffer() is called >> through drm_dev_unregister() ? AFAICT the call chain is the following: >> >> drm_put_dev() >> drm_dev_unregister() >> drm_client_dev_unregister() >> drm_fbdev_client_unregister() >> drm_fb_helper_unregister_fbi() >> unregister_framebuffer() >> do_unregister_framebuffer() >> put_fb_info() >> drm_fbdev_fb_destroy() >> framebuffer_release() >> >> which is the reason why I believe that drm_fb_helper_fini() should be >> an internal static function and only called from drm_fbdev_fb_destroy(). >> >> Drivers shouldn't really explicitly call this helper in my opinion. One more stupid question: does armada actually use drm_fbdev_fb_destroy()? It's supposed to be a callback for struct fb_ops. Armada uses it's own instance of fb_ops, which apparently doesn't contain fb_destroy. [1] Best regards Thomas [1] https://elixir.bootlin.com/linux/v5.17.6/source/drivers/gpu/drm/armada/armada_fbdev.c#L19 > > Thanks. That makes sense. > > Best regards > Thomas > > >> >
Hello Thomas, On 5/10/22 10:50, Thomas Zimmermann wrote: [snip] >>> Drivers shouldn't really explicitly call this helper in my opinion. > > One more stupid question: does armada actually use > drm_fbdev_fb_destroy()? It's supposed to be a callback for struct > fb_ops. Armada uses it's own instance of fb_ops, which apparently > doesn't contain fb_destroy. [1] > No stupid question at all. You are correct on this. So I guess we still need this call in the drivers that don't provide a .fb_destroy() handler. I see many options here: 1) Document in drm_fb_helper_alloc_fbi() that drivers only need to call drm_fb_helper_fini() explicitly if they are not setting up a fbdev with drm_fbdev_generic_setup(), otherwise is not needed. 2) Make drm_fbdev_fb_destroy() an exported symbol so drivers that have custom fb_ops can use it. 3) Set .fb_destroy to drm_fbdev_fb_destroy() if isn't set by drivers when they call drm_fb_helper_initial_config() or drm_fb_helper_fill_info(). I'm leaning towards option (3). Then the fb_info release will be automatic whether drivers are using the generic setup or a custom one.
Hi Javier Am 10.05.22 um 11:06 schrieb Javier Martinez Canillas: > Hello Thomas, > > On 5/10/22 10:50, Thomas Zimmermann wrote: > > [snip] > >>>> Drivers shouldn't really explicitly call this helper in my opinion. >> >> One more stupid question: does armada actually use >> drm_fbdev_fb_destroy()? It's supposed to be a callback for struct >> fb_ops. Armada uses it's own instance of fb_ops, which apparently >> doesn't contain fb_destroy. [1] >> > > No stupid question at all. You are correct on this. So I guess we still > need this call in the drivers that don't provide a .fb_destroy() handler. > > I see many options here: > > 1) Document in drm_fb_helper_alloc_fbi() that drivers only need to call > drm_fb_helper_fini() explicitly if they are not setting up a fbdev > with drm_fbdev_generic_setup(), otherwise is not needed. > > 2) Make drm_fbdev_fb_destroy() an exported symbol so drivers that have > custom fb_ops can use it. > > 3) Set .fb_destroy to drm_fbdev_fb_destroy() if isn't set by drivers when > they call drm_fb_helper_initial_config() or drm_fb_helper_fill_info(). > > I'm leaning towards option (3). Then the fb_info release will be automatic > whether drivers are using the generic setup or a custom one. IMHO this would just be another glitch to paper over all the broken code. And if you follow through drm_fbdev_fb_helper(), [1] it'll call _fini at some point and probably blow up in some other way. Instances of struct fb_ops are also usually const. The only reliable way AFAICT is to do what generic fbdev does: use unregister_framebuffer and do the software cleanup somewhere within fb_destroy. And then fix all drivers to use that pattern. Best regards Thomas [1] https://elixir.bootlin.com/linux/v5.17.6/source/drivers/gpu/drm/drm_fb_helper.c#L2082 >
On 5/10/22 11:39, Thomas Zimmermann wrote: [snip] >> >> 3) Set .fb_destroy to drm_fbdev_fb_destroy() if isn't set by drivers when >> they call drm_fb_helper_initial_config() or drm_fb_helper_fill_info(). >> >> I'm leaning towards option (3). Then the fb_info release will be automatic >> whether drivers are using the generic setup or a custom one. > > IMHO this would just be another glitch to paper over all the broken > code. And if you follow through drm_fbdev_fb_helper(), [1] it'll call > _fini at some point and probably blow up in some other way. Instances of > struct fb_ops are also usually const. > > The only reliable way AFAICT is to do what generic fbdev does: use > unregister_framebuffer and do the software cleanup somewhere within > fb_destroy. And then fix all drivers to use that pattern. > Right. We can't really abstract this away from drivers that are not using the generic fbdev helpers. So then they will have to provide their own .fb_destroy() callback and do the cleanup.
On Mon, May 09, 2022 at 10:00:41PM +0200, Javier Martinez Canillas wrote: > Hello Thomas, > > On 5/9/22 20:32, Thomas Zimmermann wrote: > > Hi > > > > Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas: > >> On 5/9/22 17:51, Andrzej Hajda wrote: > >> > >> [snip] > >> > >>>>>> + > >>>>> Regarding drm: > >>>>> What about drm_fb_helper_fini? It calls also framebuffer_release and is > >>>>> called often from _remove paths (checked intel/radeon/nouveau). I guess > >>>>> it should be fixed as well. Do you plan to fix it? > >>>>> > >>>> I think you are correct. Maybe we need something like the following? > >>>> > >>>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c > >>>> index d265a73313c9..b09598f7af28 100644 > >>>> --- a/drivers/gpu/drm/drm_fb_helper.c > >>>> +++ b/drivers/gpu/drm/drm_fb_helper.c > >>>> @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) > >>>> if (info) { > >>>> if (info->cmap.len) > >>>> fb_dealloc_cmap(&info->cmap); > >>>> - framebuffer_release(info); > > > > After reviewing that code, drm_fb_helper_fini() appears to be called > > from .fb_destroy (see drm_fbdev_release). The code is hard to follow > > though. If there another way of releasing the framebuffer here? > > > > Andrzej mentioned intel/radeon/nouveau as example, I only looked at i915 > and the call chain is the following as far as I can tell: > > struct pci_driver i915_pci_driver = { > ... > .remove = i915_pci_remove, > ... > }; > > > i915_driver_remove > intel_modeset_driver_remove_noirq > intel_fbdev_fini > intel_fbdev_destroy > drm_fb_helper_fini > framebuffer_release > > So my underdestanding is that if a program has the emulated fbdev device > opened and the i915 module is removed, then a use-after-free would be > triggered on drm_fbdev_fb_destroy() once the program closes the device: > > drm_fbdev_fb_destroy > drm_fbdev_release(info->par); <-- info was already freed on .remove Yeah the old drivers that haven't switched over to the drm_client based fbdev emulations are all kinds of wrong and release it too early. Note that they don't use the provided ->remove hook, since that would result in double-cleanup like you point out. Those old drivers work more like all the other fbdev drivers where all the cleanup is done in ->remove, and if it's a real hotunplug you just die in fire :-/ Switching them all over to the drm_client based fbdev helpers and unexporting these old (dangerous!) functions would be really neat. But it's also a loooooot of work, and generally those big drivers don't get hotunplugged. -Daniel