Message ID | 20220718072322.8927-8-tzimmermann@suse.de |
---|---|
State | New |
Headers | show |
Series | fbdev: Maintain device ownership with aperture helpers | expand |
Hi, > diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c > index f42a0d8bc211..101e13c2cf41 100644 > --- a/drivers/video/aperture.c > +++ b/drivers/video/aperture.c > @@ -8,6 +8,7 @@ > #include <linux/pci.h> > #include <linux/platform_device.h> > #include <linux/slab.h> > +#include <linux/sysfb.h> > #include <linux/types.h> > #include <linux/vgaarb.h> > > @@ -286,7 +287,20 @@ int > aperture_remove_conflicting_devices(resource_size_t base, > resource_size_t si > #if IS_REACHABLE(CONFIG_FB) > struct apertures_struct *a; > int ret; > +#endif > + > + /* > + * If a driver asked to unregister a platform device registered by > + * sysfb, then can be assumed that this is a driver for a display > + * that is set up by the system firmware and has a generic driver. > + * > + * Drivers for devices that don't have a generic driver will never > + * ask for this, so let's assume that a real driver for the display > + * was already probed and prevent sysfb to register devices later. > + */ > + sysfb_disable(); This call to sysfb_disable() has been causing trouble with regard to VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices to get rid of any console drivers (d173780620792c) using the device in question, but now even unrelated drivers are getting killed. Example situation: Machine has two GPUs and uses efifb for the console. Efifb registers with the aperture system the efi framebuffer region, which is covered by a BAR resource of GPU 1. VFIO grabs GPU 2 and calls aperture_remove_conflicting_pci_devices(GPU 2). GPU 2 has no overlap with the efifb on GPU1 but the efifb is killed regardless due to the unconditional call to sysfb_disable(). The console switches to dummy and locks up from the user perspective. This seems unnecessary, as the device is unrelated. I do not quite understand the comment justifying the call. Some discussions with workarounds: https://old.reddit.com/r/VFIO/comments/11qei4t/framebuffer_doesnt_work_anymore_after_passthrough/ https://bbs.archlinux.org/viewtopic.php?id=280512 Thanks, Samuel
Hi Am 20.03.23 um 02:47 schrieb Samuel Čavoj: > Hi, > >> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c >> index f42a0d8bc211..101e13c2cf41 100644 >> --- a/drivers/video/aperture.c >> +++ b/drivers/video/aperture.c >> @@ -8,6 +8,7 @@ >> #include <linux/pci.h> >> #include <linux/platform_device.h> >> #include <linux/slab.h> >> +#include <linux/sysfb.h> >> #include <linux/types.h> >> #include <linux/vgaarb.h> >> >> @@ -286,7 +287,20 @@ int >> aperture_remove_conflicting_devices(resource_size_t base, >> resource_size_t si >> #if IS_REACHABLE(CONFIG_FB) >> struct apertures_struct *a; >> int ret; >> +#endif >> + >> + /* >> + * If a driver asked to unregister a platform device registered by >> + * sysfb, then can be assumed that this is a driver for a display >> + * that is set up by the system firmware and has a generic driver. >> + * >> + * Drivers for devices that don't have a generic driver will never >> + * ask for this, so let's assume that a real driver for the display >> + * was already probed and prevent sysfb to register devices later. >> + */ >> + sysfb_disable(); > > This call to sysfb_disable() has been causing trouble with regard to > VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices to > get rid of any console drivers (d173780620792c) using the device in > question, but now even unrelated drivers are getting killed. Example > situation: Which drivers do you use? Best regards Thomas > > Machine has two GPUs and uses efifb for the console. Efifb registers > with the aperture system the efi framebuffer region, which is covered > by a BAR resource of GPU 1. VFIO grabs GPU 2 and calls > aperture_remove_conflicting_pci_devices(GPU 2). GPU 2 has no overlap > with the efifb on GPU1 but the efifb is killed regardless due to > the unconditional call to sysfb_disable(). The console switches > to dummy and locks up from the user perspective. > This seems unnecessary, as the device is unrelated. > > I do not quite understand the comment justifying the call. > > Some discussions with workarounds: > https://old.reddit.com/r/VFIO/comments/11qei4t/framebuffer_doesnt_work_anymore_after_passthrough/ > https://bbs.archlinux.org/viewtopic.php?id=280512 > > > Thanks, > Samuel
Thomas Zimmermann <tzimmermann@suse.de> writes: [...] >>> + /* >>> + * If a driver asked to unregister a platform device registered by >>> + * sysfb, then can be assumed that this is a driver for a display >>> + * that is set up by the system firmware and has a generic driver. >>> + * >>> + * Drivers for devices that don't have a generic driver will never >>> + * ask for this, so let's assume that a real driver for the display >>> + * was already probed and prevent sysfb to register devices later. >>> + */ >>> + sysfb_disable(); >> >> This call to sysfb_disable() has been causing trouble with regard to >> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices to >> get rid of any console drivers (d173780620792c) using the device in >> question, but now even unrelated drivers are getting killed. Example >> situation: > > Which drivers do you use? > Also, what kernel version? [...] >> >> Machine has two GPUs and uses efifb for the console. Efifb registers >> with the aperture system the efi framebuffer region, which is covered >> by a BAR resource of GPU 1. VFIO grabs GPU 2 and calls >> aperture_remove_conflicting_pci_devices(GPU 2). GPU 2 has no overlap >> with the efifb on GPU1 but the efifb is killed regardless due to >> the unconditional call to sysfb_disable(). The console switches >> to dummy and locks up from the user perspective. >> This seems unnecessary, as the device is unrelated. >> That's a bug indeed but I thought that was already fixed...
On 2023-03-20 11:13, Javier Martinez Canillas wrote: > Thomas Zimmermann <tzimmermann@suse.de> writes: > > [...] > >>>> + /* >>>> + * If a driver asked to unregister a platform device registered >>>> by >>>> + * sysfb, then can be assumed that this is a driver for a >>>> display >>>> + * that is set up by the system firmware and has a generic >>>> driver. >>>> + * >>>> + * Drivers for devices that don't have a generic driver will >>>> never >>>> + * ask for this, so let's assume that a real driver for the >>>> display >>>> + * was already probed and prevent sysfb to register devices >>>> later. >>>> + */ >>>> + sysfb_disable(); >>> >>> This call to sysfb_disable() has been causing trouble with regard to >>> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices >>> to >>> get rid of any console drivers (d173780620792c) using the device in >>> question, but now even unrelated drivers are getting killed. Example >>> situation: >> >> Which drivers do you use? This happens with either no drivers loaded or the proprietary nvidia driver. Nouveau is fine as it doesn't rely on efifb but brings its own. >> > > Also, what kernel version? I tried with 6.2.6, can build mainline and test there as well. Thanks for help! > > [...] > >>> >>> Machine has two GPUs and uses efifb for the console. Efifb registers >>> with the aperture system the efi framebuffer region, which is covered >>> by a BAR resource of GPU 1. VFIO grabs GPU 2 and calls >>> aperture_remove_conflicting_pci_devices(GPU 2). GPU 2 has no overlap >>> with the efifb on GPU1 but the efifb is killed regardless due to >>> the unconditional call to sysfb_disable(). The console switches >>> to dummy and locks up from the user perspective. >>> This seems unnecessary, as the device is unrelated. >>> > > That's a bug indeed but I thought that was already fixed...
Samuel Čavoj <samuel@cavoj.net> writes: [...] >>>> This call to sysfb_disable() has been causing trouble with regard to >>>> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices >>>> to >>>> get rid of any console drivers (d173780620792c) using the device in >>>> question, but now even unrelated drivers are getting killed. Example >>>> situation: >>> >>> Which drivers do you use? > > This happens with either no drivers loaded or the proprietary nvidia > driver. Nouveau is fine as it doesn't rely on efifb but brings its own. > Which is what all DRM drivers should do. If they want to make sure that a fbdev will be present after the DRM driver probes, then should register an emulated fbdev. There was an attempt to workaround that in [0], in particular patch [1] but that effort was not continued since the only DRM driver that would be affected is the Nvidia proprietary driver that relies on efifb/simpledrm to have a VT. [0]: https://patchwork.kernel.org/project/dri-devel/list/?series=711019&archive=both [1]: https://patchwork.kernel.org/project/dri-devel/patch/20230111154112.90575-11-daniel.vetter@ffwll.ch/
On 2023-03-20 13:12, Javier Martinez Canillas wrote: > Samuel Čavoj <samuel@cavoj.net> writes: > > [...] > >>>>> This call to sysfb_disable() has been causing trouble with regard >>>>> to >>>>> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices >>>>> to >>>>> get rid of any console drivers (d173780620792c) using the device in >>>>> question, but now even unrelated drivers are getting killed. >>>>> Example >>>>> situation: >>>> >>>> Which drivers do you use? >> >> This happens with either no drivers loaded or the proprietary nvidia >> driver. Nouveau is fine as it doesn't rely on efifb but brings its >> own. >> > > Which is what all DRM drivers should do. If they want to make sure that > a > fbdev will be present after the DRM driver probes, then should register > an > emulated fbdev. I don't see how this is specific to Nvidia or DRM drivers. The efifb is killed if vfio-pci (or another driver which uses the aperture system to remove conflicting drivers) is bound to ANY pci device, regardless of whether it's nvidia's fault for not implementing a framebuffer. Fair enough, I agree that they should, but I for one expect my efifb to not die at a random time when a random unrelated driver does a random thing with another unrelated GPU. Or is the efifb considered a stop-gap solution the only purpose of which is early boot--before another GPU driver is loaded? > > There was an attempt to workaround that in [0], in particular patch [1] > but that effort was not continued since the only DRM driver that would > be > affected is the Nvidia proprietary driver that relies on > efifb/simpledrm > to have a VT. > > [0]: > https://patchwork.kernel.org/project/dri-devel/list/?series=711019&archive=both > [1]: > https://patchwork.kernel.org/project/dri-devel/patch/20230111154112.90575-11-daniel.vetter@ffwll.ch/
Samuel Čavoj <samuel@cavoj.net> writes: Hello Samuel, > On 2023-03-20 13:12, Javier Martinez Canillas wrote: >> Samuel Čavoj <samuel@cavoj.net> writes: >> >> [...] >> >>>>>> This call to sysfb_disable() has been causing trouble with regard >>>>>> to >>>>>> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices >>>>>> to >>>>>> get rid of any console drivers (d173780620792c) using the device in >>>>>> question, but now even unrelated drivers are getting killed. >>>>>> Example >>>>>> situation: >>>>> >>>>> Which drivers do you use? >>> >>> This happens with either no drivers loaded or the proprietary nvidia >>> driver. Nouveau is fine as it doesn't rely on efifb but brings its >>> own. >>> >> >> Which is what all DRM drivers should do. If they want to make sure that >> a >> fbdev will be present after the DRM driver probes, then should register >> an >> emulated fbdev. > > I don't see how this is specific to Nvidia or DRM drivers. > Not specific to Nvidia per se but as mentioned it only affected Nvidia due that driver relying on a different graphics driver to get a VT console. > The efifb is killed if vfio-pci (or another driver which uses the > aperture system to remove conflicting drivers) is bound to ANY pci > device, regardless of whether it's nvidia's fault for not implementing > a framebuffer. Fair enough, I agree that they should, but > I for one expect my efifb to not die at a random time > when a random unrelated driver does a random thing with another > unrelated GPU. > There was a patch series to address that: https://patchwork.kernel.org/project/dri-devel/list/?series=711019&archive=both In particular, this patch: https://patchwork.kernel.org/project/dri-devel/patch/20230111154112.90575-11-daniel.vetter@ffwll.ch/ > Or is the efifb considered a stop-gap solution the only purpose of > which is early boot--before another GPU driver is loaded? > All the firmware-provided graphics drivers are really a best effort IMO, that is something only to be used to get early video output and any in the case of "nomodeset" (i.e: some distros have a "Safe graphics mode" boot entry that prevents DRM drivers to be loaded but used for troubleshooting. But as soon as a real DRM driver is probed (either in the host or a guest when the device is passed-through), I believe that is very likely that it won't work anymore. In other words, is not a robust way to get output and is just a best effort.
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c index f42a0d8bc211..101e13c2cf41 100644 --- a/drivers/video/aperture.c +++ b/drivers/video/aperture.c @@ -8,6 +8,7 @@ #include <linux/pci.h> #include <linux/platform_device.h> #include <linux/slab.h> +#include <linux/sysfb.h> #include <linux/types.h> #include <linux/vgaarb.h> @@ -286,7 +287,20 @@ int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t si #if IS_REACHABLE(CONFIG_FB) struct apertures_struct *a; int ret; +#endif + + /* + * If a driver asked to unregister a platform device registered by + * sysfb, then can be assumed that this is a driver for a display + * that is set up by the system firmware and has a generic driver. + * + * Drivers for devices that don't have a generic driver will never + * ask for this, so let's assume that a real driver for the display + * was already probed and prevent sysfb to register devices later. + */ + sysfb_disable(); +#if IS_REACHABLE(CONFIG_FB) a = alloc_apertures(1); if (!a) return -ENOMEM; diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c index 877de85309f3..3ee3ea018245 100644 --- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -19,7 +19,6 @@ #include <linux/kernel.h> #include <linux/major.h> #include <linux/slab.h> -#include <linux/sysfb.h> #include <linux/mm.h> #include <linux/mman.h> #include <linux/vt.h> @@ -1765,17 +1764,6 @@ int remove_conflicting_framebuffers(struct apertures_struct *a, do_free = true; } - /* - * If a driver asked to unregister a platform device registered by - * sysfb, then can be assumed that this is a driver for a display - * that is set up by the system firmware and has a generic driver. - * - * Drivers for devices that don't have a generic driver will never - * ask for this, so let's assume that a real driver for the display - * was already probed and prevent sysfb to register devices later. - */ - sysfb_disable(); - mutex_lock(®istration_lock); do_remove_conflicting_framebuffers(a, name, primary); mutex_unlock(®istration_lock);