Message ID | 20210630164413.25481-1-ville.syrjala@linux.intel.com |
---|---|
State | Accepted |
Commit | 2feeb52859fc1ab94cd35b61ada3a6ac4ff24243 |
Headers | show |
Series | drm/i915/gt: Fix -EDEADLK handling regression | expand |
On Thu, Jul 1, 2021 at 9:07 AM Maarten Lankhorst <maarten.lankhorst@linux.intel.com> wrote: > Op 30-06-2021 om 18:44 schreef Ville Syrjala: > > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > The conversion to ww mutexes failed to address the fence code which > > already returns -EDEADLK when we run out of fences. Ww mutexes on > > the other hand treat -EDEADLK as an internal errno value indicating > > a need to restart the operation due to a deadlock. So now when the > > fence code returns -EDEADLK the higher level code erroneously > > restarts everything instead of returning the error to userspace > > as is expected. > > > > To remedy this let's switch the fence code to use a different errno > > value for this. -ENOBUFS seems like a semi-reasonable unique choice. > > Apart from igt the only user of this I could find is sna, and even > > there all we do is dump the current fence registers from debugfs > > into the X server log. So no user visible functionality is affected. > > If we really cared about preserving this we could of course convert > > back to -EDEADLK higher up, but doesn't seem like that's worth > > the hassle here. > > > > Not quite sure which commit specifically broke this, but I'll > > just attribute it to the general gem ww mutex work. > > > > Cc: stable@vger.kernel.org > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > Cc: Thomas Hellström <thomas.hellstrom@intel.com> > > Testcase: igt/gem_pread/exhaustion > > Testcase: igt/gem_pwrite/basic-exhaustion > > Testcase: igt/gem_fenced_exec_thrash/too-many-fences > > Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.") > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > --- > > drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > index cac7f3f44642..f8948de72036 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt) > > if (intel_has_pending_fb_unpin(ggtt->vm.i915)) > > return ERR_PTR(-EAGAIN); > > > > - return ERR_PTR(-EDEADLK); > > + return ERR_PTR(-ENOBUFS); > > } > > > > int __i915_vma_pin_fence(struct i915_vma *vma) > > Makes sense.. > > Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > Is it a slightly more reent commit? Might probably be the part that converts execbuffer to use ww locks. - please cc: dri-devel on anything gem/gt related. - this should probably be ENOSPC or something like that for at least a seeming retention of errno consistentcy: https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#recommended-ioctl-return-values Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Tue, Jul 13, 2021 at 9:58 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Thu, Jul 1, 2021 at 9:07 AM Maarten Lankhorst > <maarten.lankhorst@linux.intel.com> wrote: > > Op 30-06-2021 om 18:44 schreef Ville Syrjala: > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > > > The conversion to ww mutexes failed to address the fence code which > > > already returns -EDEADLK when we run out of fences. Ww mutexes on > > > the other hand treat -EDEADLK as an internal errno value indicating > > > a need to restart the operation due to a deadlock. So now when the > > > fence code returns -EDEADLK the higher level code erroneously > > > restarts everything instead of returning the error to userspace > > > as is expected. > > > > > > To remedy this let's switch the fence code to use a different errno > > > value for this. -ENOBUFS seems like a semi-reasonable unique choice. > > > Apart from igt the only user of this I could find is sna, and even > > > there all we do is dump the current fence registers from debugfs > > > into the X server log. So no user visible functionality is affected. > > > If we really cared about preserving this we could of course convert > > > back to -EDEADLK higher up, but doesn't seem like that's worth > > > the hassle here. > > > > > > Not quite sure which commit specifically broke this, but I'll > > > just attribute it to the general gem ww mutex work. > > > > > > Cc: stable@vger.kernel.org > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > Cc: Thomas Hellström <thomas.hellstrom@intel.com> > > > Testcase: igt/gem_pread/exhaustion > > > Testcase: igt/gem_pwrite/basic-exhaustion > > > Testcase: igt/gem_fenced_exec_thrash/too-many-fences > > > Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.") > > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > --- > > > drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > > index cac7f3f44642..f8948de72036 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > > @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt) > > > if (intel_has_pending_fb_unpin(ggtt->vm.i915)) > > > return ERR_PTR(-EAGAIN); > > > > > > - return ERR_PTR(-EDEADLK); > > > + return ERR_PTR(-ENOBUFS); > > > } > > > > > > int __i915_vma_pin_fence(struct i915_vma *vma) > > > > Makes sense.. > > > > Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > Is it a slightly more reent commit? Might probably be the part that converts execbuffer to use ww locks. > > - please cc: dri-devel on anything gem/gt related. > - this should probably be ENOSPC or something like that for at least a > seeming retention of errno consistentcy: > > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#recommended-ioctl-return-values Other option would be to map that back to EDEADLK in the execbuf ioctl somewhere, so we retain a distinct errno code. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Tue, Jul 13, 2021 at 09:59:18PM +0200, Daniel Vetter wrote: > On Tue, Jul 13, 2021 at 9:58 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > On Thu, Jul 1, 2021 at 9:07 AM Maarten Lankhorst > > <maarten.lankhorst@linux.intel.com> wrote: > > > Op 30-06-2021 om 18:44 schreef Ville Syrjala: > > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > > > > > The conversion to ww mutexes failed to address the fence code which > > > > already returns -EDEADLK when we run out of fences. Ww mutexes on > > > > the other hand treat -EDEADLK as an internal errno value indicating > > > > a need to restart the operation due to a deadlock. So now when the > > > > fence code returns -EDEADLK the higher level code erroneously > > > > restarts everything instead of returning the error to userspace > > > > as is expected. > > > > > > > > To remedy this let's switch the fence code to use a different errno > > > > value for this. -ENOBUFS seems like a semi-reasonable unique choice. > > > > Apart from igt the only user of this I could find is sna, and even > > > > there all we do is dump the current fence registers from debugfs > > > > into the X server log. So no user visible functionality is affected. > > > > If we really cared about preserving this we could of course convert > > > > back to -EDEADLK higher up, but doesn't seem like that's worth > > > > the hassle here. > > > > > > > > Not quite sure which commit specifically broke this, but I'll > > > > just attribute it to the general gem ww mutex work. > > > > > > > > Cc: stable@vger.kernel.org > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > Cc: Thomas Hellström <thomas.hellstrom@intel.com> > > > > Testcase: igt/gem_pread/exhaustion > > > > Testcase: igt/gem_pwrite/basic-exhaustion > > > > Testcase: igt/gem_fenced_exec_thrash/too-many-fences > > > > Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.") > > > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > --- > > > > drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > > > index cac7f3f44642..f8948de72036 100644 > > > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > > > @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt) > > > > if (intel_has_pending_fb_unpin(ggtt->vm.i915)) > > > > return ERR_PTR(-EAGAIN); > > > > > > > > - return ERR_PTR(-EDEADLK); > > > > + return ERR_PTR(-ENOBUFS); > > > > } > > > > > > > > int __i915_vma_pin_fence(struct i915_vma *vma) > > > > > > Makes sense.. > > > > > > Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > > > Is it a slightly more reent commit? Might probably be the part that converts execbuffer to use ww locks. > > > > - please cc: dri-devel on anything gem/gt related. > > - this should probably be ENOSPC or something like that for at least a > > seeming retention of errno consistentcy: > > > > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#recommended-ioctl-return-values > > Other option would be to map that back to EDEADLK in the execbuf ioctl > somewhere, so we retain a distinct errno code. I'm about to push this patch to drm-intel-fixes... I'm assuming if there's any fix it will be a follow-up patch and not a revert or force push, right?! > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Tue, Jul 13, 2021 at 09:59:18PM +0200, Daniel Vetter wrote: > On Tue, Jul 13, 2021 at 9:58 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > On Thu, Jul 1, 2021 at 9:07 AM Maarten Lankhorst > > <maarten.lankhorst@linux.intel.com> wrote: > > > Op 30-06-2021 om 18:44 schreef Ville Syrjala: > > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > > > > > The conversion to ww mutexes failed to address the fence code which > > > > already returns -EDEADLK when we run out of fences. Ww mutexes on > > > > the other hand treat -EDEADLK as an internal errno value indicating > > > > a need to restart the operation due to a deadlock. So now when the > > > > fence code returns -EDEADLK the higher level code erroneously > > > > restarts everything instead of returning the error to userspace > > > > as is expected. > > > > > > > > To remedy this let's switch the fence code to use a different errno > > > > value for this. -ENOBUFS seems like a semi-reasonable unique choice. > > > > Apart from igt the only user of this I could find is sna, and even > > > > there all we do is dump the current fence registers from debugfs > > > > into the X server log. So no user visible functionality is affected. > > > > If we really cared about preserving this we could of course convert > > > > back to -EDEADLK higher up, but doesn't seem like that's worth > > > > the hassle here. > > > > > > > > Not quite sure which commit specifically broke this, but I'll > > > > just attribute it to the general gem ww mutex work. > > > > > > > > Cc: stable@vger.kernel.org > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > Cc: Thomas Hellström <thomas.hellstrom@intel.com> > > > > Testcase: igt/gem_pread/exhaustion > > > > Testcase: igt/gem_pwrite/basic-exhaustion > > > > Testcase: igt/gem_fenced_exec_thrash/too-many-fences > > > > Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.") > > > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > --- > > > > drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > > > index cac7f3f44642..f8948de72036 100644 > > > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c > > > > @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt) > > > > if (intel_has_pending_fb_unpin(ggtt->vm.i915)) > > > > return ERR_PTR(-EAGAIN); > > > > > > > > - return ERR_PTR(-EDEADLK); > > > > + return ERR_PTR(-ENOBUFS); > > > > } > > > > > > > > int __i915_vma_pin_fence(struct i915_vma *vma) > > > > > > Makes sense.. > > > > > > Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > > > Is it a slightly more reent commit? Might probably be the part that converts execbuffer to use ww locks. > > > > - please cc: dri-devel on anything gem/gt related. Thought I did. Apparently got lost somewhere. > > - this should probably be ENOSPC or something like that for at least a > > seeming retention of errno consistentcy: ENOSPC is already used for other things. > > > > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#recommended-ioctl-return-values > > Other option would be to map that back to EDEADLK in the execbuf ioctl > somewhere, so we retain a distinct errno code. Already mentioned in the commit msg. -- Ville Syrjälä Intel
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c index cac7f3f44642..f8948de72036 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c @@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt) if (intel_has_pending_fb_unpin(ggtt->vm.i915)) return ERR_PTR(-EAGAIN); - return ERR_PTR(-EDEADLK); + return ERR_PTR(-ENOBUFS); } int __i915_vma_pin_fence(struct i915_vma *vma)