Message ID | 7eb19a73-a558-d2e6-bd8d-34fe95045dfd@gmail.com |
---|---|
State | New |
Headers | show |
Am 28.10.2016 um 19:37 schrieb Mario Kleiner: > > > On 10/28/2016 03:34 AM, Michel Dänzer wrote: >> On 27/10/16 10:33 PM, Mike Lothian wrote: >>> >>> Just another gentle ping to see where you are with this? >> >> I haven't got a chance to look into this any further. >> >> > > Fwiw., as a proof of concept, the attached experimental patch does > work as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and > DRI3/Present when applied to drm-next (updated from a few days ago). > With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone > under all loads. The tearing with "windowed" windows now looks as > expected for regular tearing not related to Prime. Yeah, that's pretty much what I had in mind as well. You additionally need to wait for the shared fences when you export the BO for the first time, but that's only a nitpick. > > ftrace confirms the i915 driver's pageflip function is waiting on the > fence in reservation_object_wait_timeout_rcu() as it should. > > That entry->tv.shared needs to be set false for such buffers in > amdgpu_bo_list_set() makes sense to me, as that is part of the buffer > validation for command stream submission. There are other places in > the driver where tv.shared is set, which i didn't check so far. > > I don't know which of these would need to be updated with a "exported > bo" check as well, e.g., for video decoding or maybe gpu compute? > Adding or removing the check to amdgpu_gem_va_update_vm(), e.g., made > no difference. I assume that makes sense because that functions seems > to deal with amdgpu internal vm page tables or page table entries for > such a bo, not with something visible to external clients? Yes, exactly. VM updates doesn't matter for anybody else than amdgpu. Additional to that we don't even add a fence to the shared slot we reserve (could probably drop that for optimization). Please remove the debugging stuff and the extra code on the VM updates and add a reservation_object_wait_timeout_rcu() to the export path and we should be good to go. Regards, Christian. > > All i can say is it fixes 3D rendering under DRI3 + Prime + > pageflipping without causing any obvious new problems. > > -mario
On 29/10/16 02:37 AM, Mario Kleiner wrote: > On 10/28/2016 03:34 AM, Michel Dänzer wrote: >> On 27/10/16 10:33 PM, Mike Lothian wrote: >>> >>> Just another gentle ping to see where you are with this? >> >> I haven't got a chance to look into this any further. > > Fwiw., as a proof of concept, the attached experimental patch does work > as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and > DRI3/Present when applied to drm-next (updated from a few days ago). > With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone > under all loads. The tearing with "windowed" windows now looks as > expected for regular tearing not related to Prime. > > ftrace confirms the i915 driver's pageflip function is waiting on the > fence in reservation_object_wait_timeout_rcu() as it should. > > That entry->tv.shared needs to be set false for such buffers in > amdgpu_bo_list_set() makes sense to me, as that is part of the buffer > validation for command stream submission. There are other places in the > driver where tv.shared is set, which i didn't check so far. > > I don't know which of these would need to be updated with a "exported > bo" check as well, e.g., for video decoding or maybe gpu compute? Adding > or removing the check to amdgpu_gem_va_update_vm(), e.g., made no > difference. I assume that makes sense because that functions seems to > deal with amdgpu internal vm page tables or page table entries for such > a bo, not with something visible to external clients? > > All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping > without causing any obvious new problems. [...] > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c > index 7700dc2..bfbfeb9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c > @@ -121,5 +121,8 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev, > if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) > return ERR_PTR(-EPERM); > > + bo->prime_exported = true; > + DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo); > + > return drm_gem_prime_export(dev, gobj, flags); > } > This will take effect in non-PRIME cases as well, at least DRI3 and GL<->[other API] interop off the top of my head. Is that okay?
On 10/28/2016 07:48 PM, Christian König wrote: > Am 28.10.2016 um 19:37 schrieb Mario Kleiner: >> >> >> On 10/28/2016 03:34 AM, Michel Dänzer wrote: >>> On 27/10/16 10:33 PM, Mike Lothian wrote: >>>> >>>> Just another gentle ping to see where you are with this? >>> >>> I haven't got a chance to look into this any further. >>> >>> >> >> Fwiw., as a proof of concept, the attached experimental patch does >> work as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and >> DRI3/Present when applied to drm-next (updated from a few days ago). >> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone >> under all loads. The tearing with "windowed" windows now looks as >> expected for regular tearing not related to Prime. > > Yeah, that's pretty much what I had in mind as well. You additionally > need to wait for the shared fences when you export the BO for the first > time, but that's only a nitpick. > >> >> ftrace confirms the i915 driver's pageflip function is waiting on the >> fence in reservation_object_wait_timeout_rcu() as it should. >> >> That entry->tv.shared needs to be set false for such buffers in >> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer >> validation for command stream submission. There are other places in >> the driver where tv.shared is set, which i didn't check so far. >> >> I don't know which of these would need to be updated with a "exported >> bo" check as well, e.g., for video decoding or maybe gpu compute? >> Adding or removing the check to amdgpu_gem_va_update_vm(), e.g., made >> no difference. I assume that makes sense because that functions seems >> to deal with amdgpu internal vm page tables or page table entries for >> such a bo, not with something visible to external clients? > > Yes, exactly. VM updates doesn't matter for anybody else than amdgpu. > Additional to that we don't even add a fence to the shared slot we > reserve (could probably drop that for optimization). > > Please remove the debugging stuff and the extra code on the VM updates > and add a reservation_object_wait_timeout_rcu() to the export path and > we should be good to go. > Done. Patch v2 just sent out after retesting with Tonga only and Intel + Tonga renderoffload. Would be cool if we could get this into 4.9-rc. Ideally also backported to 4.8, given it is a simple change, as that would be the next official kernel for *buntu 16.04-LTS and derivatives, but that's probably breaking the rules as it doesn't fix a regression? thanks, -mario > Regards, > Christian. > >> >> All i can say is it fixes 3D rendering under DRI3 + Prime + >> pageflipping without causing any obvious new problems. >> >> -mario > >
From 2a8d7fcd36da30305fa675df311c697162792597 Mon Sep 17 00:00:00 2001 From: Mario Kleiner <mario.kleiner.de@gmail.com> Date: Wed, 26 Oct 2016 10:58:00 +0200 Subject: [PATCH] drm/amdgpu: Attach exclusive fence to prime exported bo's. External clients which import our bo's wait only for exclusive dmabuf-fences, not on shared ones, so attach fences on such exported buffers as exclusive ones, not shared ones. -> Backup commit. Work in progress. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 5 ++++- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 3 +++ 4 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 039b57e..a337d56 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -459,6 +459,7 @@ struct amdgpu_bo { u64 metadata_flags; void *metadata; u32 metadata_size; + bool prime_exported; /* list of all virtual address to which this bo * is associated to */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c index 651115d..6e1d7b3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c @@ -132,7 +132,10 @@ static int amdgpu_bo_list_set(struct amdgpu_device *adev, entry->priority = min(info[i].bo_priority, AMDGPU_BO_LIST_MAX_PRIORITY); entry->tv.bo = &entry->robj->tbo; - entry->tv.shared = true; + entry->tv.shared = !entry->robj->prime_exported; + + if (entry->robj->prime_exported) + DRM_DEBUG_PRIME("Exclusive fence for exported prime bo %p\n", entry->robj); if (entry->robj->prefered_domains == AMDGPU_GEM_DOMAIN_GDS) gds_obj = entry->robj; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index a7ea9a3..730a68e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -494,6 +494,12 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev, tv.bo = &bo_va->bo->tbo; tv.shared = true; + + if (bo_va->bo->prime_exported) { + DRM_DEBUG_PRIME("Update for exported prime bo %p\n", bo_va->bo); + /* tv.shared = false; */ + } + list_add(&tv.head, &list); amdgpu_vm_get_pd_bo(bo_va->vm, &list, &vm_pd); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c index 7700dc2..bfbfeb9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c @@ -121,5 +121,8 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev, if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) return ERR_PTR(-EPERM); + bo->prime_exported = true; + DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo); + return drm_gem_prime_export(dev, gobj, flags); } -- 2.7.4