Message ID | 20220607182338.344270-4-javierm@redhat.com |
---|---|
State | New |
Headers | show |
Series | None | expand |
On Tue, 2022-06-07 at 20:23 +0200, Javier Martinez Canillas wrote: > The platform devices registered by sysfb match with firmware-based DRM or > fbdev drivers, that are used to have early graphics using a framebuffer > provided by the system firmware. > > DRM or fbdev drivers later are probed and remove all conflicting framebuffers, > leading to these platform devices for generic drivers to be unregistered. > > But the current solution has a race, since the sysfb_init() function could > be called after a DRM or fbdev driver is probed and request to unregister > the devices for drivers with conflicting framebuffes. > > To prevent this, disable any future sysfb platform device registration by > calling sysfb_disable(), if a driver requests to remove the conflicting > framebuffers. > > Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > --- > > Changes in v6: > - Move the sysfb_disable() before the remove conflicting framebuffers > loop (Daniel Vetter). > > Changes in v5: > - Move the sysfb_disable() call at conflicting framebuffers again to > avoid the need of a DRIVER_FIRMWARE capability flag. > - Add Daniel Vetter's Reviewed-by tag again since reverted to the old > patch that he already reviewed in v2. > > Changes in v3: > - Call sysfb_disable() when a DRM dev and a fbdev are registered rather > than when conflicting framebuffers are removed (Thomas Zimmermann). > - Call sysfb_disable() when a fbdev framebuffer is registered rather > than when conflicting framebuffers are removed (Thomas Zimmermann). > - Drop Daniel Vetter's Reviewed-by tag since patch changed a lot. > > Changes in v2: > - Explain in the commit message that fbmem has to unregister the device > as fallback if a driver registered the device itself (Daniel Vetter). > - Also explain that fallback in a comment in the code (Daniel Vetter). > - Don't encode in fbmem the assumption that sysfb will always register > platform devices (Daniel Vetter). > - Add a FIXME comment about drivers registering devices (Daniel Vetter). > > drivers/video/fbdev/core/fbmem.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c > index 2fda5917c212..e0720fef0ee6 100644 > --- a/drivers/video/fbdev/core/fbmem.c > +++ b/drivers/video/fbdev/core/fbmem.c > @@ -19,6 +19,7 @@ > #include <linux/kernel.h> > #include <linux/major.h> > #include <linux/slab.h> > +#include <linux/sysfb.h> > #include <linux/mm.h> > #include <linux/mman.h> > #include <linux/vt.h> > @@ -1764,6 +1765,17 @@ int remove_conflicting_framebuffers(struct apertures_struct *a, > do_free = true; > } > > + /* > + * If a driver asked to unregister a platform device registered by > + * sysfb, then can be assumed that this is a driver for a display > + * that is set up by the system firmware and has a generic driver. > + * > + * Drivers for devices that don't have a generic driver will never > + * ask for this, so let's assume that a real driver for the display > + * was already probed and prevent sysfb to register devices later. > + */ > + sysfb_disable(); > + > mutex_lock(®istration_lock); > do_remove_conflicting_framebuffers(a, name, primary); > mutex_unlock(®istration_lock); Hi, Javier. This change broke arm64 with vmwgfx. We get a kernel oops at boot (let me know if you'd like .config or just have us test something directly for you): Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000001787ee000 [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] SMP Modules linked in: vmwgfx(+) e1000e(+) nvme ahci(+) xhci_pci drm_ttm_helper ttm sha256_arm64 sha1_ce nvme_core xhci_pci_renesas aes_neon_bs aes_neon_blk aes> CPU: 3 PID: 215 Comm: systemd-udevd Tainted: G U 5.18.0-rc5-vmwgfx #12 Hardware name: VMware, Inc. VBSA/VBSA, BIOS VEFI 12/31/2020 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kernfs_find_and_get_ns+0x2c/0x80 lr : sysfs_unmerge_group+0x30/0x80 sp : ffff80000b78b3f0 x29: ffff80000b78b3f0 x28: ffff0000f7970910 x27: 0000000000000002 x26: ffff0000f7970900 x25: ffff80000abca358 x24: ffff80000936cfa0 x23: ffff80000ac8f458 x22: 0000000000000000 x21: ffff800009431938 x20: ffff800009431800 x19: 0000000000000000 x18: 0000000000000000 x17: 42555300302e7265 x16: 66667562656d6172 x15: 4d006d726f667461 x14: 6c703d4d45545359 x13: 595342555300302e x12: 726566667562656d x11: 00323137323d4d55 x10: 4e51455300726566 x9 : ffff8000085f7400 x8 : ffff0000f7970980 x7 : 0000000000000000 x6 : 0000000077ffffff x5 : 0000000070000000 x4 : ffff80000b78b548 x3 : ffff8000088b0420 x2 : 0000000000000000 x1 : ffff800009431938 x0 : 0000000000000000 Call trace: kernfs_find_and_get_ns+0x2c/0x80 sysfs_unmerge_group+0x30/0x80 dpm_sysfs_remove+0x3c/0x17c device_del+0xb0/0x3a0 platform_device_del.part.0+0x24/0xb0 platform_device_unregister+0x30/0x50 sysfb_disable+0x4c/0x80 remove_conflicting_framebuffers+0x40/0x100 remove_conflicting_pci_framebuffers+0x128/0x240 drm_aperture_remove_conflicting_pci_framebuffers+0xb8/0x170 vmw_probe+0x50/0xd30 [vmwgfx] local_pci_probe+0x4c/0xc0 pci_device_probe+0x1e8/0x230 really_probe+0x18c/0x3f0 __driver_probe_device+0x124/0x1c0 driver_probe_device+0x44/0x140 __driver_attach+0xe0/0x234 bus_for_each_dev+0x7c/0xe0 driver_attach+0x30/0x40 bus_add_driver+0x158/0x250 driver_register+0x84/0x140 __pci_register_driver+0x50/0x5c vmw_pci_driver_init+0x44/0x1000 [vmwgfx] do_one_initcall+0x50/0x250 do_init_module+0x50/0x260 load_module+0x23e4/0x27c0 __do_sys_finit_module+0xac/0x12c __arm64_sys_finit_module+0x2c/0x40 invoke_syscall+0x78/0x100 el0_svc_common.constprop.0+0x54/0x184 do_el0_svc+0x34/0x9c el0_svc+0x54/0x1e0 el0t_64_sync_handler+0xa4/0x130 el0t_64_sync+0x1a0/0x1a4 Code: aa0003f3 a9025bf5 aa0103f5 aa0203f6 (f9400400) ---[ end trace 0000000000000000 ]---
Hello Zack, On 6/16/22 21:29, Zack Rusin wrote: > On Tue, 2022-06-07 at 20:23 +0200, Javier Martinez Canillas wrote: >> The platform devices registered by sysfb match with firmware-based DRM or >> fbdev drivers, that are used to have early graphics using a framebuffer >> provided by the system firmware. >> [snip] > > Hi, Javier. > > This change broke arm64 with vmwgfx. We get a kernel oops at boot (let me know if > you'd like .config or just have us test something directly for you): > Yes please share your .config and I'll try to reproduce on an arm64 machine. > > Unable to handle kernel NULL pointer dereference at virtual address > 0000000000000008 > Mem abort info: > ESR = 0x96000004 > EC = 0x25: DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > FSC = 0x04: level 0 translation fault > Data abort info: > ISV = 0, ISS = 0x00000004 > CM = 0, WnR = 0 > user pgtable: 4k pages, 48-bit VAs, pgdp=00000001787ee000 > [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 > Internal error: Oops: 96000004 [#1] SMP > Modules linked in: vmwgfx(+) e1000e(+) nvme ahci(+) xhci_pci drm_ttm_helper ttm > sha256_arm64 sha1_ce nvme_core xhci_pci_renesas aes_neon_bs aes_neon_blk aes> > CPU: 3 PID: 215 Comm: systemd-udevd Tainted: G U 5.18.0-rc5-vmwgfx > #12 I'm confused, your kernel version seems to be 5.18.0-rc5 but this patch is only in drm-misc-next now and will land in 5.20... Did you backport it? Can you please try to reproduce with latest drm-tip ?
On Thu, 2022-06-16 at 21:55 +0200, Javier Martinez Canillas wrote: > Hello Zack, > > On 6/16/22 21:29, Zack Rusin wrote: > > On Tue, 2022-06-07 at 20:23 +0200, Javier Martinez Canillas wrote: > > > The platform devices registered by sysfb match with firmware-based DRM or > > > fbdev drivers, that are used to have early graphics using a framebuffer > > > provided by the system firmware. > > > > > [snip] > > > > > Hi, Javier. > > > > This change broke arm64 with vmwgfx. We get a kernel oops at boot (let me know if > > you'd like .config or just have us test something directly for you): > > > > Yes please share your .config and I'll try to reproduce on an arm64 machine. Attached. It might be a little hard to reproduce unless you have an arm64 machine with a dedicated gpu. You'll need a system that actually transitions from a generic fb driver (e.g. efifb) to the dedicated one. > > > > Unable to handle kernel NULL pointer dereference at virtual address > > 0000000000000008 > > Mem abort info: > > ESR = 0x96000004 > > EC = 0x25: DABT (current EL), IL = 32 bits > > SET = 0, FnV = 0 > > EA = 0, S1PTW = 0 > > FSC = 0x04: level 0 translation fault > > Data abort info: > > ISV = 0, ISS = 0x00000004 > > CM = 0, WnR = 0 > > user pgtable: 4k pages, 48-bit VAs, pgdp=00000001787ee000 > > [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 > > Internal error: Oops: 96000004 [#1] SMP > > Modules linked in: vmwgfx(+) e1000e(+) nvme ahci(+) xhci_pci drm_ttm_helper ttm > > sha256_arm64 sha1_ce nvme_core xhci_pci_renesas aes_neon_bs aes_neon_blk aes> > > CPU: 3 PID: 215 Comm: systemd-udevd Tainted: G U 5.18.0-rc5-vmwgfx > > #12 > > I'm confused, your kernel version seems to be 5.18.0-rc5 but this patch > is only in drm-misc-next now and will land in 5.20... > > Did you backport it? Can you please try to reproduce with latest drm-tip ? No, this is drm-misc-next as of yesterday. drm-misc-next was still on 5.18.0-rc5 yesterday. z
On 6/16/22 23:03, Zack Rusin wrote: > On Thu, 2022-06-16 at 21:55 +0200, Javier Martinez Canillas wrote: >> Hello Zack, >> >> On 6/16/22 21:29, Zack Rusin wrote: >>> On Tue, 2022-06-07 at 20:23 +0200, Javier Martinez Canillas wrote: >>>> The platform devices registered by sysfb match with firmware-based DRM or >>>> fbdev drivers, that are used to have early graphics using a framebuffer >>>> provided by the system firmware. >>>> >> >> [snip] >> >>> >>> Hi, Javier. >>> >>> This change broke arm64 with vmwgfx. We get a kernel oops at boot (let me know if >>> you'd like .config or just have us test something directly for you): >>> >> >> Yes please share your .config and I'll try to reproduce on an arm64 machine. > > Attached. It might be a little hard to reproduce unless you have an arm64 machine > with a dedicated gpu. You'll need a system that actually transitions from a generic > fb driver (e.g. efifb) to the dedicated one. > Yes, all my testing for this was done with a rpi4 so I should be able to reproduce that case. I'm confused though because I tested efifb -> vc4, simplefb -> vc4 and simpledrm -> vc4. >>> >>> Unable to handle kernel NULL pointer dereference at virtual address >>> 0000000000000008 >>> Mem abort info: >>> ESR = 0x96000004 >>> EC = 0x25: DABT (current EL), IL = 32 bits >>> SET = 0, FnV = 0 >>> EA = 0, S1PTW = 0 >>> FSC = 0x04: level 0 translation fault >>> Data abort info: >>> ISV = 0, ISS = 0x00000004 >>> CM = 0, WnR = 0 >>> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001787ee000 >>> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 >>> Internal error: Oops: 96000004 [#1] SMP >>> Modules linked in: vmwgfx(+) e1000e(+) nvme ahci(+) xhci_pci drm_ttm_helper ttm >>> sha256_arm64 sha1_ce nvme_core xhci_pci_renesas aes_neon_bs aes_neon_blk aes> >>> CPU: 3 PID: 215 Comm: systemd-udevd Tainted: G U 5.18.0-rc5-vmwgfx >>> #12 >> >> I'm confused, your kernel version seems to be 5.18.0-rc5 but this patch >> is only in drm-misc-next now and will land in 5.20... >> >> Did you backport it? Can you please try to reproduce with latest drm-tip ? > > No, this is drm-misc-next as of yesterday. drm-misc-next was still on 5.18.0-rc5 > yesterday. > Right! I looked at the base for drm-tip but forgot that drm-misc was still on 5.18. I'll look at this tomorrow but in the meantime, could you please look if the following commits on top of drm-misc-next help ? d258d00fb9c7 fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove 1b5853dfab7f fbdev: efifb: Fix a use-after-free due early fb_info cleanup
On 6/17/22 00:18, Javier Martinez Canillas wrote: > On 6/16/22 23:03, Zack Rusin wrote: [snip] > > I'll look at this tomorrow but in the meantime, could you please look if the following > commits on top of drm-misc-next help ? > > d258d00fb9c7 fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove > 1b5853dfab7f fbdev: efifb: Fix a use-after-free due early fb_info cleanup > Scratch that. I see in your config now that you are not using efifb but instead simpledrm: CONFIG_DRM_SIMPLEDRM=y, CONFIG_SYSFB_SIMPLEFB=y and CONFIG_DRM_VMWGFX. Since you mentioned efifb I misunderstood that you are using it. Anyways, as said I'll investigate this tomorrow.
On Fri, 2022-06-17 at 01:21 +0200, Javier Martinez Canillas wrote: > On 6/17/22 00:18, Javier Martinez Canillas wrote: > > On 6/16/22 23:03, Zack Rusin wrote: > > [snip] > > > > > I'll look at this tomorrow but in the meantime, could you please look if the following > > commits on top of drm-misc-next help ? > > > > d258d00fb9c7 fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove > > 1b5853dfab7f fbdev: efifb: Fix a use-after-free due early fb_info cleanup > > > > Scratch that. I see in your config now that you are not using efifb but instead > simpledrm: CONFIG_DRM_SIMPLEDRM=y, CONFIG_SYSFB_SIMPLEFB=y and CONFIG_DRM_VMWGFX. > > Since you mentioned efifb I misunderstood that you are using it. Anyways, as > said I'll investigate this tomorrow. Sounds good. Let me know if you'd like me to try it without SIMPLEFB. z
Hello Zack, On 6/17/22 03:35, Zack Rusin wrote: > On Fri, 2022-06-17 at 01:21 +0200, Javier Martinez Canillas wrote: >> On 6/17/22 00:18, Javier Martinez Canillas wrote: >>> On 6/16/22 23:03, Zack Rusin wrote: >> >> [snip] >> >>> >>> I'll look at this tomorrow but in the meantime, could you please look if the following >>> commits on top of drm-misc-next help ? >>> >>> d258d00fb9c7 fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove >>> 1b5853dfab7f fbdev: efifb: Fix a use-after-free due early fb_info cleanup >>> >> >> Scratch that. I see in your config now that you are not using efifb but instead >> simpledrm: CONFIG_DRM_SIMPLEDRM=y, CONFIG_SYSFB_SIMPLEFB=y and CONFIG_DRM_VMWGFX. >> >> Since you mentioned efifb I misunderstood that you are using it. Anyways, as >> said I'll investigate this tomorrow. > > Sounds good. Let me know if you'd like me to try it without SIMPLEFB. > Yes, please do. Either with CONFIG_SYSFB_SIMPLEFB disabled and CONFIG_FB_EFI enabled (so that "efi-framebuffer" is registered and efifb probed) or with CONFIG_SYSFB_SIMPLEFB but CONFIG_FB_SIMPLE enabled (so "simple-framebuffer is used too but with simplefb instead of simpledrm). I'm not able to reproduce, it would be useful to have another data point.
On Fri, 2022-06-17 at 08:46 +0200, Javier Martinez Canillas wrote: > Hello Zack, > > On 6/17/22 03:35, Zack Rusin wrote: > > On Fri, 2022-06-17 at 01:21 +0200, Javier Martinez Canillas wrote: > > > On 6/17/22 00:18, Javier Martinez Canillas wrote: > > > > On 6/16/22 23:03, Zack Rusin wrote: > > > > > > [snip] > > > > > > > > > > > I'll look at this tomorrow but in the meantime, could you please look if the following > > > > commits on top of drm-misc-next help ? > > > > > > > > d258d00fb9c7 fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove > > > > 1b5853dfab7f fbdev: efifb: Fix a use-after-free due early fb_info cleanup > > > > > > > > > > Scratch that. I see in your config now that you are not using efifb but instead > > > simpledrm: CONFIG_DRM_SIMPLEDRM=y, CONFIG_SYSFB_SIMPLEFB=y and CONFIG_DRM_VMWGFX. > > > > > > Since you mentioned efifb I misunderstood that you are using it. Anyways, as > > > said I'll investigate this tomorrow. > > > > Sounds good. Let me know if you'd like me to try it without SIMPLEFB. > > > > Yes, please do. Either with CONFIG_SYSFB_SIMPLEFB disabled and CONFIG_FB_EFI > enabled (so that "efi-framebuffer" is registered and efifb probed) or with > CONFIG_SYSFB_SIMPLEFB but CONFIG_FB_SIMPLE enabled (so "simple-framebuffer > is used too but with simplefb instead of simpledrm). > > I'm not able to reproduce, it would be useful to have another data point. Also happening for me with CONFIG_SYSFB_SIMPLEFB, on a Intel Core i7- 1065G7 (with iGPU). Reverting this commit on top of 5.19-rc5 "fixes" the issue.
Hello Xi, On 7/4/22 12:29, Xi Ruoyao wrote: > On Mon, 2022-07-04 at 17:36 +0800, Xi Ruoyao wrote: > >>> Yes, please do. Either with CONFIG_SYSFB_SIMPLEFB disabled and CONFIG_FB_EFI >>> enabled (so that "efi-framebuffer" is registered and efifb probed) or with >>> CONFIG_SYSFB_SIMPLEFB but CONFIG_FB_SIMPLE enabled (so "simple-framebuffer >>> is used too but with simplefb instead of simpledrm). >>> >>> I'm not able to reproduce, it would be useful to have another data point. >> >> Also happening for me with CONFIG_SYSFB_SIMPLEFB, on a Intel Core i7- >> 1065G7 (with iGPU). >> >> Reverting this commit on top of 5.19-rc5 "fixes" the issue. > > With CONFIG_SYSFB_SIMPLEFB and CONFIG_FB_SIMPLE enabled, there is no > issue. > > I guess it's something going wrong on a "drm -> drm" pass over. For now > I'll continue to use simpledrm with this commit reverted. > Yes, we need to also cherry-pick b84efa28a48 ("drm/aperture: Run fbdev removal before internal helpers") now that the sysfb_disable() patches are in v5.19-rc5.
On Mon, 2022-07-04 at 13:04 +0200, Javier Martinez Canillas wrote: > Hello Xi, > > > > With CONFIG_SYSFB_SIMPLEFB and CONFIG_FB_SIMPLE enabled, there is no > > issue. > > > > I guess it's something going wrong on a "drm -> drm" pass over. For now > > I'll continue to use simpledrm with this commit reverted. > > > > Yes, we need to also cherry-pick b84efa28a48 ("drm/aperture: Run fbdev > removal before internal helpers") now that the sysfb_disable() patches > are in v5.19-rc5. I confirm that cherry-picking b84efa28a48 fixes the issue for v5.19-rc5.
On 7/4/22 14:11, Xi Ruoyao wrote: > On Mon, 2022-07-04 at 13:04 +0200, Javier Martinez Canillas wrote: >> Hello Xi, >>> >>> With CONFIG_SYSFB_SIMPLEFB and CONFIG_FB_SIMPLE enabled, there is no >>> issue. >>> >>> I guess it's something going wrong on a "drm -> drm" pass over. For now >>> I'll continue to use simpledrm with this commit reverted. >>> >> >> Yes, we need to also cherry-pick b84efa28a48 ("drm/aperture: Run fbdev >> removal before internal helpers") now that the sysfb_disable() patches >> are in v5.19-rc5. > > I confirm that cherry-picking b84efa28a48 fixes the issue for v5.19-rc5. > Thanks for testing it! Thomas is getting that patch through the drm-fixes path, so it should make it to the v5.19-rc cycle at some point.
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c index 2fda5917c212..e0720fef0ee6 100644 --- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -19,6 +19,7 @@ #include <linux/kernel.h> #include <linux/major.h> #include <linux/slab.h> +#include <linux/sysfb.h> #include <linux/mm.h> #include <linux/mman.h> #include <linux/vt.h> @@ -1764,6 +1765,17 @@ int remove_conflicting_framebuffers(struct apertures_struct *a, do_free = true; } + /* + * If a driver asked to unregister a platform device registered by + * sysfb, then can be assumed that this is a driver for a display + * that is set up by the system firmware and has a generic driver. + * + * Drivers for devices that don't have a generic driver will never + * ask for this, so let's assume that a real driver for the display + * was already probed and prevent sysfb to register devices later. + */ + sysfb_disable(); + mutex_lock(®istration_lock); do_remove_conflicting_framebuffers(a, name, primary); mutex_unlock(®istration_lock);