Message ID | 89fa50d0-945f-d1ed-ae55-ee947187f209@collabora.com |
---|---|
State | New |
Series | next/master boot bisection: Oops in nouveau driver on jetson-tk1 |

On 08/12/2018 00:08, Lyude Paul wrote:
> uhhhhhhhhhhhhh
> didn't we fix this weeks ago? with "drm/nouveau: tegra: Call
> nouveau_drm_device_init()"

Yes here's the fix from Thierry:

  https://patchwork.freedesktop.org/patch/263587/

and I can confirm that it does fix the Oops when applied on top
of next-20181206 (what I used for the bisection last week):

  http://lava.baylibre.com:10080/scheduler/job/71109

However the fix doesn't appear to have been applied in any
upstream tree yet.

Guillaume

> On Fri, 2018-12-07 at 23:31 +0000, Guillaume Tucker wrote:
>> Please find below an automated bisection report for a kernel Oops
>> seen during the initialisation of the nouveau GPU driver on
>> jetson-tk1.
>>
>> All the LAVA test jobs for this bisection can be found here:
>>
>> http://lava.baylibre.com:10080/scheduler/alljobs?length=25&search=lava-bisect-staging-7366#table
>>
>> Here's the beginning of the Oops stack trace:
>>
>> [    7.485361] [00000064] *pgd=f9e7b835
>> [    7.485372] Internal error: Oops: 17 [#1] SMP ARM
>> [    7.485376] Modules linked in: snd_soc_tegra_rt5640(+) snd_soc_tegra_utils snd_soc_rt5640(+) snd_soc_rl6231 snd_soc_tegra30_ahub snd_hda_tegra snd_soc_core snd_hda_codec snd_hda_core ac97_bus snd_pcm_dmaengine snd_pcm xhci_tegra(+) snd_timer snd soundcore nouveau(+) ttm tegra_devfreq tegra_wdt
>> [    7.542227] CPU: 1 PID: 128 Comm: udevd Not tainted 4.20.0-rc5-next-20181206 #44
>> [    7.549603] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>> [    7.555859] PC is at drm_plane_register_all+0x18/0x50
>> [    7.560899] LR is at drm_modeset_register_all+0xc/0x6c
>>
>> Full log:
>>
>> http://lava.baylibre.com:10080/scheduler/job/68628#L816
>>
>> The bisection was run from next-20181206 as this is where the
>> issue was discovered on kernelci.org but the patch it found has
>> already been merged in mainline.
>>
>> Hope this helps!
>>
>> Guillaume
>>
>> -----------8<------------------------8<-----------
>>
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>> * This automated bisection report was sent to you on the basis  *
>> * that you may be involved with the breaking commit it has      *
>> * found. No manual investigation has been done to verify it,    *
>> * and the root cause of the problem may be somewhere else.      *
>> * Hope this helps!                                              *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
>> Bisection result for next/master (next-20181206) on jetson-tk1
>>
>>   Good:       84df9525b0c2 Linux 4.19
>>   Bad:        4c92b7b3080d Add linux-next specific files for 20181206
>>   Found:      cfea88a4d866 drm/nouveau: Start using new drm_dev initialization helpers
>>
>> Checks:
>>   revert:     PASS
>>   verify:     PASS
>>
>> Parameters:
>>   Tree:       next
>>   URL:        None
>>   Branch:     master
>>   Target:     jetson-tk1
>>   Lab:        lab-baylibre
>>   Config:     multi_v7_defconfig
>>   Plan:       dmesg-nouveau
>>
>> Breaking commit found:
>>
>> -------------------------------------------------------------------------------
>> commit cfea88a4d86632f28cf80be97079f131645b7869
>> Author: Lyude Paul <lyude@redhat.com>
>> Date:   Wed Aug 22 21:40:07 2018 -0400
>>
>>     drm/nouveau: Start using new drm_dev initialization helpers
>>
>>     Per the documentation in drm_get_pci_dev(), this function is deprecated
>>     and shouldn't be used anymore. As it turns out, we're going to need to
>>     stop using drm_get_pci_dev() anyway in order to allow us to turn off the
>>     card before full system shutdowns, otherwise we'll hit race conditions
>>     with userspace while trying to tear down the card on shutdown.
>>
>>     So, start using drm_dev_get() and drm_dev_put(), and just turn our
>>     load/unload callbacks into open coded init/fini() functions.
>>
>>     Signed-off-by: Lyude Paul <lyude@redhat.com>
>>     Cc: Karol Herbst <kherbst@redhat.com>
>>     Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> index 905956809d21..2b2baf6e0e0d 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> @@ -458,75 +458,8 @@ nouveau_accel_init(struct nouveau_drm *drm)
>>  	nouveau_bo_move_init(drm);
>>  }
>>
>> -static int nouveau_drm_probe(struct pci_dev *pdev,
>> -			     const struct pci_device_id *pent)
>> -{
>> -	struct nvkm_device *device;
>> -	struct apertures_struct *aper;
>> -	bool boot = false;
>> -	int ret;
>> -
>> -	if (vga_switcheroo_client_probe_defer(pdev))
>> -		return -EPROBE_DEFER;
>> -
>> -	/* We need to check that the chipset is supported before booting
>> -	 * fbdev off the hardware, as there's no way to put it back.
>> -	 */
>> -	ret = nvkm_device_pci_new(pdev, NULL, "error", true, false, 0, &device);
>> -	if (ret)
>> -		return ret;
>> -
>> -	nvkm_device_del(&device);
>> -
>> -	/* Remove conflicting drivers (vesafb, efifb etc). */
>> -	aper = alloc_apertures(3);
>> -	if (!aper)
>> -		return -ENOMEM;
>> -
>> -	aper->ranges[0].base = pci_resource_start(pdev, 1);
>> -	aper->ranges[0].size = pci_resource_len(pdev, 1);
>> -	aper->count = 1;
>> -
>> -	if (pci_resource_len(pdev, 2)) {
>> -		aper->ranges[aper->count].base = pci_resource_start(pdev, 2);
>> -		aper->ranges[aper->count].size = pci_resource_len(pdev, 2);
>> -		aper->count++;
>> -	}
>> -
>> -	if (pci_resource_len(pdev, 3)) {
>> -		aper->ranges[aper->count].base = pci_resource_start(pdev, 3);
>> -		aper->ranges[aper->count].size = pci_resource_len(pdev, 3);
>> -		aper->count++;
>> -	}
>> -
>> -#ifdef CONFIG_X86
>> -	boot = pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW;
>> -#endif
>> -	if (nouveau_modeset != 2)
>> -		drm_fb_helper_remove_conflicting_framebuffers(aper, "nouveaufb", boot);
>> -	kfree(aper);
>> -
>> -	ret = nvkm_device_pci_new(pdev, nouveau_config, nouveau_debug,
>> -				  true, true, ~0ULL, &device);
>> -	if (ret)
>> -		return ret;
>> -
>> -	pci_set_master(pdev);
>> -
>> -	if (nouveau_atomic)
>> -		driver_pci.driver_features |= DRIVER_ATOMIC;
>> -
>> -	ret = drm_get_pci_dev(pdev, pent, &driver_pci);
>> -	if (ret) {
>> -		nvkm_device_del(&device);
>> -		return ret;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>>  static int
>> -nouveau_drm_load(struct drm_device *dev, unsigned long flags)
>> +nouveau_drm_device_init(struct drm_device *dev)
>>  {
>>  	struct nouveau_drm *drm;
>>  	int ret;
>> @@ -613,7 +546,7 @@ nouveau_drm_load(struct drm_device *dev, unsigned long flags)
>>  }
>>
>>  static void
>> -nouveau_drm_unload(struct drm_device *dev)
>> +nouveau_drm_device_fini(struct drm_device *dev)
>>  {
>>  	struct nouveau_drm *drm = nouveau_drm(dev);
>>
>> @@ -642,18 +575,116 @@ nouveau_drm_unload(struct drm_device *dev)
>>  	kfree(drm);
>>  }
>>
>> +static int nouveau_drm_probe(struct pci_dev *pdev,
>> +			     const struct pci_device_id *pent)
>> +{
>> +	struct nvkm_device *device;
>> +	struct drm_device *drm_dev;
>> +	struct apertures_struct *aper;
>> +	bool boot = false;
>> +	int ret;
>> +
>> +	if (vga_switcheroo_client_probe_defer(pdev))
>> +		return -EPROBE_DEFER;
>> +
>> +	/* We need to check that the chipset is supported before booting
>> +	 * fbdev off the hardware, as there's no way to put it back.
>> +	 */
>> +	ret = nvkm_device_pci_new(pdev, NULL, "error", true, false, 0, &device);
>> +	if (ret)
>> +		return ret;
>> +
>> +	nvkm_device_del(&device);
>> +
>> +	/* Remove conflicting drivers (vesafb, efifb etc). */
>> +	aper = alloc_apertures(3);
>> +	if (!aper)
>> +		return -ENOMEM;
>> +
>> +	aper->ranges[0].base = pci_resource_start(pdev, 1);
>> +	aper->ranges[0].size = pci_resource_len(pdev, 1);
>> +	aper->count = 1;
>> +
>> +	if (pci_resource_len(pdev, 2)) {
>> +		aper->ranges[aper->count].base = pci_resource_start(pdev, 2);
>> +		aper->ranges[aper->count].size = pci_resource_len(pdev, 2);
>> +		aper->count++;
>> +	}
>> +
>> +	if (pci_resource_len(pdev, 3)) {
>> +		aper->ranges[aper->count].base = pci_resource_start(pdev, 3);
>> +		aper->ranges[aper->count].size = pci_resource_len(pdev, 3);
>> +		aper->count++;
>> +	}
>> +
>> +#ifdef CONFIG_X86
>> +	boot = pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW;
>> +#endif
>> +	if (nouveau_modeset != 2)
>> +		drm_fb_helper_remove_conflicting_framebuffers(aper, "nouveaufb", boot);
>> +	kfree(aper);
>> +
>> +	ret = nvkm_device_pci_new(pdev, nouveau_config, nouveau_debug,
>> +				  true, true, ~0ULL, &device);
>> +	if (ret)
>> +		return ret;
>> +
>> +	pci_set_master(pdev);
>> +
>> +	if (nouveau_atomic)
>> +		driver_pci.driver_features |= DRIVER_ATOMIC;
>> +
>> +	drm_dev = drm_dev_alloc(&driver_pci, &pdev->dev);
>> +	if (IS_ERR(drm_dev)) {
>> +		ret = PTR_ERR(drm_dev);
>> +		goto fail_nvkm;
>> +	}
>> +
>> +	ret = pci_enable_device(pdev);
>> +	if (ret)
>> +		goto fail_drm;
>> +
>> +	drm_dev->pdev = pdev;
>> +	pci_set_drvdata(pdev, drm_dev);
>> +
>> +	ret = nouveau_drm_device_init(drm_dev);
>> +	if (ret)
>> +		goto fail_pci;
>> +
>> +	ret = drm_dev_register(drm_dev, pent->driver_data);
>> +	if (ret)
>> +		goto fail_drm_dev_init;
>> +
>> +	return 0;
>> +
>> +fail_drm_dev_init:
>> +	nouveau_drm_device_fini(drm_dev);
>> +fail_pci:
>> +	pci_disable_device(pdev);
>> +fail_drm:
>> +	drm_dev_put(drm_dev);
>> +fail_nvkm:
>> +	nvkm_device_del(&device);
>> +	return ret;
>> +}
>> +
>>  void
>>  nouveau_drm_device_remove(struct drm_device *dev)
>>  {
>> +	struct pci_dev *pdev = dev->pdev;
>>  	struct nouveau_drm *drm = nouveau_drm(dev);
>>  	struct nvkm_client *client;
>>  	struct nvkm_device *device;
>>
>> +	drm_dev_unregister(dev);
>> +
>>  	dev->irq_enabled = false;
>>  	client = nvxx_client(&drm->client.base);
>>  	device = nvkm_device_find(client->device);
>> -	drm_put_dev(dev);
>>
>> +	nouveau_drm_device_fini(dev);
>> +	pci_disable_device(pdev);
>> +	drm_dev_put(dev);
>>  	nvkm_device_del(&device);
>>  }
>>
>> @@ -1020,8 +1051,6 @@ driver_stub = {
>>  		DRIVER_GEM | DRIVER_MODESET | DRIVER_PRIME | DRIVER_RENDER |
>>  		DRIVER_KMS_LEGACY_CONTEXT,
>>
>> -	.load = nouveau_drm_load,
>> -	.unload = nouveau_drm_unload,
>>  	.open = nouveau_drm_open,
>>  	.postclose = nouveau_drm_postclose,
>>  	.lastclose = nouveau_vga_lastclose,
>> -------------------------------------------------------------------------------
>>
>> Git bisection log:
>>
>> -------------------------------------------------------------------------------
>> git bisect start
>> # good: [84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d] Linux 4.19
>> git bisect good 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d
>> # bad: [4c92b7b3080d8281941ae81c51cd62bb49bdc3d4] Add linux-next specific files for 20181206
>> git bisect bad 4c92b7b3080d8281941ae81c51cd62bb49bdc3d4
>> # bad: [c38239b4be1ac7e4bcf5bbd971353bae51525b8f] Merge branch 'parisc-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
>> git bisect bad c38239b4be1ac7e4bcf5bbd971353bae51525b8f
>> # good: [d49f8a52b15bf35db778035340d8a673149f9f93] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>> git bisect good d49f8a52b15bf35db778035340d8a673149f9f93
>> # good: [ac747c0715f29c2be3848b719a1b7e65b07f7b21] Merge tag 'kbuild-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
>> git bisect good ac747c0715f29c2be3848b719a1b7e65b07f7b21
>> # bad: [46972c03ab667dc298cad0c9db517fb9b1521b5f] Merge tag 'drm-misc-next-fixes-2018-10-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
>> git bisect bad 46972c03ab667dc298cad0c9db517fb9b1521b5f
>> # good: [6ac99a328ee16d3f8cc253f1df62623cee3e9ea5] drm/exynos: mixer: Make plane alpha configurable
>> git bisect good 6ac99a328ee16d3f8cc253f1df62623cee3e9ea5
>> # good: [0957dc7097a3f462f6cedb45cf9b9785cc29e5bb] drm/amdgpu: revert "stop using gart_start as offset for the GTT domain"
>> git bisect good 0957dc7097a3f462f6cedb45cf9b9785cc29e5bb
>> # good: [2de0b0a158bf423208c3898522c8fa1c1078df48] Merge tag 'drm/tegra/for-4.20-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
>> git bisect good 2de0b0a158bf423208c3898522c8fa1c1078df48
>> # good: [04b96b63c5640a305e30611def7a9c5fcd7a72cf] drm/msm/dpu: Remove unneeded checks in dpu_crtc.c
>> git bisect good 04b96b63c5640a305e30611def7a9c5fcd7a72cf
>> # good: [6952e3a1dffcb931cf8625aa01642b9afac2af61] Merge branch 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld into drm-next
>> git bisect good 6952e3a1dffcb931cf8625aa01642b9afac2af61
>> # good: [62e681f7dcab746412dce22d4b75b32c5ea38cdb] Merge tag 'drm-msm-fixes-2018-10-09' of git://people.freedesktop.org/~robclark/linux into drm-next
>> git bisect good 62e681f7dcab746412dce22d4b75b32c5ea38cdb
>> # bad: [7e6191d4360a2df6cf2a2613dcb79680cb943df8] Merge branch 'linux-4.20' of git://github.com/skeggsb/linux into drm-next
>> git bisect bad 7e6191d4360a2df6cf2a2613dcb79680cb943df8
>> # good: [c4cee69a4497d9c6ad8868d63568b30e50cac9e9] drm/nouveau: Fix potential memory leak in nouveau_drm_load()
>> git bisect good c4cee69a4497d9c6ad8868d63568b30e50cac9e9
>> # bad: [a971558c298755d2c07bc5508c65d689471763c8] drm/nouveau/disp: keep track of high-speed state, program into clock
>> git bisect bad a971558c298755d2c07bc5508c65d689471763c8
>> # bad: [4126b99e744b7a29746e201e2be6644d2edf3c56] drm/nouveau/disp: add a way to configure scrambling/tmds for hdmi 2.0
>> git bisect bad 4126b99e744b7a29746e201e2be6644d2edf3c56
>> # bad: [cfea88a4d86632f28cf80be97079f131645b7869] drm/nouveau: Start using new drm_dev initialization helpers
>> git bisect bad cfea88a4d86632f28cf80be97079f131645b7869
>> # first bad commit: [cfea88a4d86632f28cf80be97079f131645b7869] drm/nouveau: Start using new drm_dev initialization helpers
>> -------------------------------------------------------------------------------

On Mon, Dec 10, 2018 at 10:00:08AM +0000, Guillaume Tucker wrote:
> On 08/12/2018 00:08, Lyude Paul wrote:
> > uhhhhhhhhhhhhh
> > didn't we fix this weeks ago? with "drm/nouveau: tegra: Call
> > nouveau_drm_device_init()"
>
> Yes here's the fix from Thierry:
>
>   https://patchwork.freedesktop.org/patch/263587/
>
> and I can confirm that it does fix the Oops when applied on top
> of next-20181206 (what I used for the bisection last week):
>
>   http://lava.baylibre.com:10080/scheduler/job/71109
>
> However the fix doesn't appear to have been applied in any
> upstream tree yet.

This has been broken for a considerable time now with no response from
Ben - is there some other path we can use to get the fix merged?
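
The ordering that matters in the bisected commit can be shown outside the kernel. In the new nouveau_drm_probe(), the device is allocated, nouveau_drm_device_init() runs, and only then is drm_dev_register() called, with goto labels undoing the completed steps in reverse order on failure. The title of Thierry's fix suggests that the Tegra path reached registration without that init call. What follows is a minimal userspace sketch of the pattern, not the driver code: every fake_* name, the planes field and the skip_init/fail_register switches are hypothetical stand-ins, and unlike the kernel the sketch returns -EINVAL where an uninitialised device would otherwise be dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device; not the real struct drm_device. */
struct fake_dev {
	int *planes;	/* state created by fake_device_init() */
};

static struct fake_dev *fake_dev_alloc(void)
{
	return calloc(1, sizeof(struct fake_dev));
}

static void fake_dev_put(struct fake_dev *dev)
{
	free(dev);
}

/* Plays the role of an init step that must run before registration. */
static int fake_device_init(struct fake_dev *dev)
{
	dev->planes = calloc(1, sizeof(*dev->planes));
	return dev->planes ? 0 : -ENOMEM;
}

static void fake_device_fini(struct fake_dev *dev)
{
	free(dev->planes);	/* free(NULL) is a no-op */
	dev->planes = NULL;
}

/* Registration relies on init having run; the sketch checks instead of crashing. */
static int fake_dev_register(struct fake_dev *dev, int fail_register)
{
	if (!dev->planes)
		return -EINVAL;
	if (fail_register)
		return -EIO;
	return 0;
}

/* alloc -> init -> register, unwinding in reverse order on failure. */
static int fake_probe(int skip_init, int fail_register, struct fake_dev **out)
{
	struct fake_dev *dev;
	int ret;

	dev = fake_dev_alloc();
	if (!dev)
		return -ENOMEM;

	if (!skip_init) {
		ret = fake_device_init(dev);
		if (ret)
			goto fail_alloc;
	}

	ret = fake_dev_register(dev, fail_register);
	if (ret)
		goto fail_init;

	*out = dev;
	return 0;

fail_init:
	fake_device_fini(dev);
fail_alloc:
	fake_dev_put(dev);
	return ret;
}

/* Teardown mirrors probe: undo init, then drop the device. */
static void fake_remove(struct fake_dev *dev)
{
	assert(dev != NULL);
	fake_device_fini(dev);
	fake_dev_put(dev);
}
```

Calling fake_probe(0, 0, &dev) succeeds and hands back a device to pass to fake_remove(), while fake_probe(1, 0, &dev) models the missing init call and fails cleanly without touching dev.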