Message ID | 20200915184607.84435-1-alexander.deucher@amd.com |
---|---|
State | New |
Headers | show |
Series | Revert "drm/radeon: handle PCIe root ports with addressing limitations" | expand |
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote: > This change breaks tons of systems. Very vague :( This commit has also been merged for over a year, why the sudden problem now? > This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713. You mean "33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations")"? That's the proper way to reference commits in changelogs please. It's even documented that way... > > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973 > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697 > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763 > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140 > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287 > Signed-off-by: Alex Deucher <alexander.deucher@amd.com> > Cc: stable@vger.kernel.org > Cc: Christoph Hellwig <hch@lst.de> > Cc: christian.koenig@amd.com Fixes: 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") as well? thanks, greg k-h
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
> This change breaks tons of systems.
Did you do at least some basic root causing on why? Do GPUs get
fed address they can't deal with? Any examples?
Bug 1 doesn't seem to contain any analysis and was reported against
a very old kernel that had all kind of fixes since.
Bug 2 seems to imply a drm kthread is accessing some structure it
shouldn't, which would imply a mismatch between pools used by radeon
now and those actually provided by the core. Something that should
be pretty to trivial to fix for someone understanding the whole ttm
pool maze.
Bug 3: same as 1, but an even older kernel.
Bug 4: looks like 1 and 3, and actually verified to work properly
in 5.9-rc. Did you try to get the other reporters test this as well?
All over not a very useful changelog.
On Wed, Sep 16, 2020 at 06:16:25PM -0400, Alex Deucher wrote: > On Wed, Sep 16, 2020 at 3:04 AM Christoph Hellwig <hch@lst.de> wrote: > > > > On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote: > > > This change breaks tons of systems. > > > > Did you do at least some basic root causing on why? Do GPUs get > > fed address they can't deal with? Any examples? > > > > Bug 1 doesn't seem to contain any analysis and was reported against > > a very old kernel that had all kind of fixes since. > > > > Bug 2 seems to imply a drm kthread is accessing some structure it > > shouldn't, which would imply a mismatch between pools used by radeon > > now and those actually provided by the core. Something that should > > be pretty to trivial to fix for someone understanding the whole ttm > > pool maze. > > > > Bug 3: same as 1, but an even older kernel. > > > > Bug 4: looks like 1 and 3, and actually verified to work properly > > in 5.9-rc. Did you try to get the other reporters test this as well? > > It would appear that the change in 5.9 to disable AGP on radeon fixed > the issue. I'm following up on the other tickets to see if I can get > confirmation. On another thread[1], the user was able to avoid the > issue by disabling HIMEM. Looks like some issue with HIMEM and/or > AGP. Thanks. I'll try to spend some time to figure out what could be highmem related. I'd much rather get this fixed properly.
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index b7c3fb2bfb54..eed23dffccf4 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -2391,6 +2391,7 @@ struct radeon_device { struct radeon_wb wb; struct radeon_dummy_page dummy_page; bool shutdown; + bool need_dma32; bool need_swiotlb; bool accel_working; bool fastfb_working; /* IGP feature*/ diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 266e3cbbd09b..f74c74ad8b5d 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1363,25 +1363,28 @@ int radeon_device_init(struct radeon_device *rdev, else rdev->mc.mc_mask = 0xffffffffULL; /* 32 bit MC */ - /* set DMA mask. + /* set DMA mask + need_dma32 flags. * PCIE - can handle 40-bits. * IGP - can handle 40-bits * AGP - generally dma32 is safest * PCI - dma32 for legacy pci gart, 40 bits on newer asics */ - dma_bits = 40; + rdev->need_dma32 = false; if (rdev->flags & RADEON_IS_AGP) - dma_bits = 32; + rdev->need_dma32 = true; if ((rdev->flags & RADEON_IS_PCI) && (rdev->family <= CHIP_RS740)) - dma_bits = 32; + rdev->need_dma32 = true; #ifdef CONFIG_PPC64 if (rdev->family == CHIP_CEDAR) - dma_bits = 32; + rdev->need_dma32 = true; #endif + dma_bits = rdev->need_dma32 ? 32 : 40; r = dma_set_mask_and_coherent(&rdev->pdev->dev, DMA_BIT_MASK(dma_bits)); if (r) { + rdev->need_dma32 = true; + dma_bits = 32; pr_warn("radeon: No suitable DMA available\n"); return r; } diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 357e8e98cca9..d2550862313e 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -787,7 +787,7 @@ int radeon_ttm_init(struct radeon_device *rdev) &radeon_bo_driver, rdev->ddev->anon_inode->i_mapping, rdev->ddev->vma_offset_manager, - dma_addressing_limited(&rdev->pdev->dev)); + rdev->need_dma32); if (r) { DRM_ERROR("failed initializing buffer object driver(%d).\n", r); return r;
This change breaks tons of systems. This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Cc: Christoph Hellwig <hch@lst.de> Cc: christian.koenig@amd.com --- drivers/gpu/drm/radeon/radeon.h | 1 + drivers/gpu/drm/radeon/radeon_device.c | 13 ++++++++----- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- 3 files changed, 10 insertions(+), 6 deletions(-)