diff mbox series

Revert "drm/radeon: handle PCIe root ports with addressing limitations"

Message ID 20200915184607.84435-1-alexander.deucher@amd.com
State New
Headers show
Series Revert "drm/radeon: handle PCIe root ports with addressing limitations" | expand

Commit Message

Alex Deucher Sept. 15, 2020, 6:46 p.m. UTC
This change breaks tons of systems.

This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/radeon/radeon.h        |  1 +
 drivers/gpu/drm/radeon/radeon_device.c | 13 ++++++++-----
 drivers/gpu/drm/radeon/radeon_ttm.c    |  2 +-
 3 files changed, 10 insertions(+), 6 deletions(-)

Comments

Greg Kroah-Hartman Sept. 16, 2020, 6:33 a.m. UTC | #1
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
> This change breaks tons of systems.


Very vague :(

This commit has also been merged for over a year, why the sudden
problem now?

> This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.


You mean "33b3ad3788ab ("drm/radeon: handle PCIe root ports with
addressing limitations")"?

That's the proper way to reference commits in changelogs please.  It's
even documented that way...

> 

> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973

> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697

> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763

> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140

> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287

> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

> Cc: stable@vger.kernel.org

> Cc: Christoph Hellwig <hch@lst.de>

> Cc: christian.koenig@amd.com


Fixes: 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations")

as well?

thanks,

greg k-h
Christoph Hellwig Sept. 16, 2020, 7:04 a.m. UTC | #2
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
> This change breaks tons of systems.


Did you do at least some basic root causing on why?  Do GPUs get
fed address they can't deal with?  Any examples?

Bug 1 doesn't seem to contain any analysis and was reported against
a very old kernel that had all kind of fixes since.

Bug 2 seems to imply a drm kthread is accessing some structure it
shouldn't, which would imply a mismatch between pools used by radeon
now and those actually provided by the core.  Something that should
be pretty to trivial to fix for someone understanding the whole ttm
pool maze.

Bug 3: same as 1, but an even older kernel.

Bug 4: looks like 1 and 3, and actually verified to work properly
in 5.9-rc.  Did you try to get the other reporters test this as well?

All over not a very useful changelog.
Christoph Hellwig Sept. 17, 2020, 5:27 p.m. UTC | #3
On Wed, Sep 16, 2020 at 06:16:25PM -0400, Alex Deucher wrote:
> On Wed, Sep 16, 2020 at 3:04 AM Christoph Hellwig <hch@lst.de> wrote:

> >

> > On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:

> > > This change breaks tons of systems.

> >

> > Did you do at least some basic root causing on why?  Do GPUs get

> > fed address they can't deal with?  Any examples?

> >

> > Bug 1 doesn't seem to contain any analysis and was reported against

> > a very old kernel that had all kind of fixes since.

> >

> > Bug 2 seems to imply a drm kthread is accessing some structure it

> > shouldn't, which would imply a mismatch between pools used by radeon

> > now and those actually provided by the core.  Something that should

> > be pretty to trivial to fix for someone understanding the whole ttm

> > pool maze.

> >

> > Bug 3: same as 1, but an even older kernel.

> >

> > Bug 4: looks like 1 and 3, and actually verified to work properly

> > in 5.9-rc.  Did you try to get the other reporters test this as well?

> 

> It would appear that the change in 5.9 to disable AGP on radeon fixed

> the issue.  I'm following up on the other tickets to see if I can get

> confirmation.  On another thread[1], the user was able to avoid the

> issue by disabling HIMEM.  Looks like some issue with HIMEM and/or

> AGP.


Thanks.  I'll try to spend some time to figure out what could be
highmem related.  I'd much rather get this fixed properly.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index b7c3fb2bfb54..eed23dffccf4 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2391,6 +2391,7 @@  struct radeon_device {
 	struct radeon_wb		wb;
 	struct radeon_dummy_page	dummy_page;
 	bool				shutdown;
+	bool				need_dma32;
 	bool				need_swiotlb;
 	bool				accel_working;
 	bool				fastfb_working; /* IGP feature*/
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 266e3cbbd09b..f74c74ad8b5d 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1363,25 +1363,28 @@  int radeon_device_init(struct radeon_device *rdev,
 	else
 		rdev->mc.mc_mask = 0xffffffffULL; /* 32 bit MC */
 
-	/* set DMA mask.
+	/* set DMA mask + need_dma32 flags.
 	 * PCIE - can handle 40-bits.
 	 * IGP - can handle 40-bits
 	 * AGP - generally dma32 is safest
 	 * PCI - dma32 for legacy pci gart, 40 bits on newer asics
 	 */
-	dma_bits = 40;
+	rdev->need_dma32 = false;
 	if (rdev->flags & RADEON_IS_AGP)
-		dma_bits = 32;
+		rdev->need_dma32 = true;
 	if ((rdev->flags & RADEON_IS_PCI) &&
 	    (rdev->family <= CHIP_RS740))
-		dma_bits = 32;
+		rdev->need_dma32 = true;
 #ifdef CONFIG_PPC64
 	if (rdev->family == CHIP_CEDAR)
-		dma_bits = 32;
+		rdev->need_dma32 = true;
 #endif
 
+	dma_bits = rdev->need_dma32 ? 32 : 40;
 	r = dma_set_mask_and_coherent(&rdev->pdev->dev, DMA_BIT_MASK(dma_bits));
 	if (r) {
+		rdev->need_dma32 = true;
+		dma_bits = 32;
 		pr_warn("radeon: No suitable DMA available\n");
 		return r;
 	}
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 357e8e98cca9..d2550862313e 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -787,7 +787,7 @@  int radeon_ttm_init(struct radeon_device *rdev)
 			       &radeon_bo_driver,
 			       rdev->ddev->anon_inode->i_mapping,
 			       rdev->ddev->vma_offset_manager,
-			       dma_addressing_limited(&rdev->pdev->dev));
+			       rdev->need_dma32);
 	if (r) {
 		DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
 		return r;