From patchwork Wed Sep 16 09:42:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 309544 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0540C43461 for ; Wed, 16 Sep 2020 09:42:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 87FB021974 for ; Wed, 16 Sep 2020 09:42:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726751AbgIPJm0 (ORCPT ); Wed, 16 Sep 2020 05:42:26 -0400 Received: from mail.fireflyinternet.com ([77.68.26.236]:62292 "EHLO fireflyinternet.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726723AbgIPJmZ (ORCPT ); Wed, 16 Sep 2020 05:42:25 -0400 X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 22443433-1500050 for multiple; Wed, 16 Sep 2020 10:42:19 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Cc: Chris Wilson , Joonas Lahtinen , stable@vger.kernel.org Subject: [PATCH 2/4] drm/i915: Cancel outstanding work after disabling heartbeats on an engine Date: Wed, 16 Sep 2020 10:42:17 +0100 Message-Id: <20200916094219.3878-2-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200916094219.3878-1-chris@chris-wilson.co.uk> References: <20200916094219.3878-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org We only allow persistent requests to remain on the GPU past the closure of their containing context (and process) so long as they are continuously checked for hangs or allow other requests to preempt them, as we need to ensure forward progress of the system. If we allow persistent contexts to remain on the system after the the hangcheck mechanism is disabled, the system may grind to a halt. On disabling the mechanism, we sent a pulse along the engine to remove all executing contexts from the engine which would check for hung contexts -- but we did not prevent those contexts from being resubmitted if they survived the final hangcheck. Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs") Testcase: igt/gem_ctx_persistence/heartbeat-stop Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Cc: # v5.7+ Reviewed-by: Tvrtko Ursulin Acked-by: Joonas Lahtinen --- drivers/gpu/drm/i915/gt/intel_engine.h | 9 +++++++++ drivers/gpu/drm/i915/i915_request.c | 5 +++++ 2 files changed, 14 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 08e2c000dcc3..7c3a1012e702 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -337,4 +337,13 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine) return intel_engine_has_preemption(engine); } +static inline bool +intel_engine_has_heartbeat(const struct intel_engine_cs *engine) +{ + if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL)) + return false; + + return READ_ONCE(engine->props.heartbeat_interval_ms); +} + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 436ce368ddaa..0e813819b041 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -542,8 +542,13 @@ bool __i915_request_submit(struct i915_request *request) if (i915_request_completed(request)) goto xfer; + if (unlikely(intel_context_is_closed(request->context) && + !intel_engine_has_heartbeat(engine))) + intel_context_set_banned(request->context); + if (unlikely(intel_context_is_banned(request->context))) i915_request_set_error_once(request, -EIO); + if (unlikely(fatal_error(request->fence.error))) __i915_request_skip(request); From patchwork Wed Sep 16 09:42:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 263802 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B6D6C35256 for ; Wed, 16 Sep 2020 09:42:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1191B2222D for ; Wed, 16 Sep 2020 09:42:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726612AbgIPJmd (ORCPT ); Wed, 16 Sep 2020 05:42:33 -0400 Received: from mail.fireflyinternet.com ([77.68.26.236]:62293 "EHLO fireflyinternet.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726760AbgIPJm2 (ORCPT ); Wed, 16 Sep 2020 05:42:28 -0400 X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 22443434-1500050 for multiple; Wed, 16 Sep 2020 10:42:19 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Cc: Chris Wilson , Joonas Lahtinen , stable@vger.kernel.org Subject: [PATCH 3/4] drm/i915/gt: Always send a pulse down the engine after disabling heartbeat Date: Wed, 16 Sep 2020 10:42:18 +0100 Message-Id: <20200916094219.3878-3-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200916094219.3878-1-chris@chris-wilson.co.uk> References: <20200916094219.3878-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org Currently, we check we can send a pulse prior to disabling the heartbeat to verify that we can change the heartbeat, but since we may re-evaluate execution upon changing the heartbeat interval we need another pulse afterwards to refresh execution. Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs") Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Cc: # v5.7+ --- drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index 8ffdf676c0a0..d09df370f7cd 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -192,10 +192,12 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine, WRITE_ONCE(engine->props.heartbeat_interval_ms, delay); if (intel_engine_pm_get_if_awake(engine)) { - if (delay) + if (delay) { intel_engine_unpark_heartbeat(engine); - else + } else { intel_engine_park_heartbeat(engine); + intel_engine_pulse(engine); /* recheck execution */ + } intel_engine_pm_put(engine); } From patchwork Wed Sep 16 09:42:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 309543 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB3B3C2BB84 for ; Wed, 16 Sep 2020 09:42:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B241A21974 for ; Wed, 16 Sep 2020 09:42:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726302AbgIPJma (ORCPT ); Wed, 16 Sep 2020 05:42:30 -0400 Received: from mail.fireflyinternet.com ([77.68.26.236]:62292 "EHLO fireflyinternet.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726758AbgIPJm2 (ORCPT ); Wed, 16 Sep 2020 05:42:28 -0400 X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 22443435-1500050 for multiple; Wed, 16 Sep 2020 10:42:19 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Cc: Chris Wilson , Joonas Lahtinen , stable@vger.kernel.org Subject: [PATCH 4/4] drm/i915/gem: Always test execution status on closing the context Date: Wed, 16 Sep 2020 10:42:19 +0100 Message-Id: <20200916094219.3878-4-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200916094219.3878-1-chris@chris-wilson.co.uk> References: <20200916094219.3878-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org Verify that if a context is active at the time it is closed, that it is either persistent and preemptible (with hangcheck running) or it shall be removed from execution. Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs") Testcase: igt/gem_ctx_persistence/heartbeat-close Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Cc: # v5.7+ Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 48 +++++---------------- 1 file changed, 10 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index a548626fa8bc..4fd38101bb56 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -390,24 +390,6 @@ __context_engines_static(const struct i915_gem_context *ctx) return rcu_dereference_protected(ctx->engines, true); } -static bool __reset_engine(struct intel_engine_cs *engine) -{ - struct intel_gt *gt = engine->gt; - bool success = false; - - if (!intel_has_reset_engine(gt)) - return false; - - if (!test_and_set_bit(I915_RESET_ENGINE + engine->id, - >->reset.flags)) { - success = intel_engine_reset(engine, NULL) == 0; - clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, - >->reset.flags); - } - - return success; -} - static void __reset_context(struct i915_gem_context *ctx, struct intel_engine_cs *engine) { @@ -431,12 +413,7 @@ static bool __cancel_engine(struct intel_engine_cs *engine) * kill the banned context, we fallback to doing a local reset * instead. */ - if (IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT) && - !intel_engine_pulse(engine)) - return true; - - /* If we are unable to send a pulse, try resetting this engine. */ - return __reset_engine(engine); + return intel_engine_pulse(engine) == 0; } static bool @@ -506,7 +483,7 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce) return engine; } -static void kill_engines(struct i915_gem_engines *engines) +static void kill_engines(struct i915_gem_engines *engines, bool ban) { struct i915_gem_engines_iter it; struct intel_context *ce; @@ -521,7 +498,7 @@ static void kill_engines(struct i915_gem_engines *engines) for_each_gem_engine(ce, engines, it) { struct intel_engine_cs *engine; - if (intel_context_set_banned(ce)) + if (ban && intel_context_set_banned(ce)) continue; /* @@ -534,7 +511,7 @@ static void kill_engines(struct i915_gem_engines *engines) engine = active_engine(ce); /* First attempt to gracefully cancel the context */ - if (engine && !__cancel_engine(engine)) + if (engine && !__cancel_engine(engine) && ban) /* * If we are unable to send a preemptive pulse to bump * the context from the GPU, we have to resort to a full @@ -544,8 +521,10 @@ static void kill_engines(struct i915_gem_engines *engines) } } -static void kill_stale_engines(struct i915_gem_context *ctx) +static void kill_context(struct i915_gem_context *ctx) { + bool ban = (!i915_gem_context_is_persistent(ctx) || + !ctx->i915->params.enable_hangcheck); struct i915_gem_engines *pos, *next; spin_lock_irq(&ctx->stale.lock); @@ -558,7 +537,7 @@ static void kill_stale_engines(struct i915_gem_context *ctx) spin_unlock_irq(&ctx->stale.lock); - kill_engines(pos); + kill_engines(pos, ban); spin_lock_irq(&ctx->stale.lock); GEM_BUG_ON(i915_sw_fence_signaled(&pos->fence)); @@ -570,11 +549,6 @@ static void kill_stale_engines(struct i915_gem_context *ctx) spin_unlock_irq(&ctx->stale.lock); } -static void kill_context(struct i915_gem_context *ctx) -{ - kill_stale_engines(ctx); -} - static void engines_idle_release(struct i915_gem_context *ctx, struct i915_gem_engines *engines) { @@ -609,7 +583,7 @@ static void engines_idle_release(struct i915_gem_context *ctx, kill: if (list_empty(&engines->link)) /* raced, already closed */ - kill_engines(engines); + kill_engines(engines, true); i915_sw_fence_commit(&engines->fence); } @@ -667,9 +641,7 @@ static void context_close(struct i915_gem_context *ctx) * case we opt to forcibly kill off all remaining requests on * context close. */ - if (!i915_gem_context_is_persistent(ctx) || - !ctx->i915->params.enable_hangcheck) - kill_context(ctx); + kill_context(ctx); i915_gem_context_put(ctx); }