From patchwork Tue May 12 08:59:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 209838 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B9CBC54E94 for ; Tue, 12 May 2020 09:03:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 662D4206B7 for ; Tue, 12 May 2020 09:03:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="G3K7uswl" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729348AbgELJAD (ORCPT ); Tue, 12 May 2020 05:00:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729322AbgELJAB (ORCPT ); Tue, 12 May 2020 05:00:01 -0400 Received: from mail-wm1-x344.google.com (mail-wm1-x344.google.com [IPv6:2a00:1450:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA361C05BD09 for ; Tue, 12 May 2020 01:59:59 -0700 (PDT) Received: by mail-wm1-x344.google.com with SMTP id m24so11242153wml.2 for ; Tue, 12 May 2020 01:59:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=I6zGk0sRXoiLMVlOrHdYKmGYdqgb8HIUHeWjfHU8LZ8=; b=G3K7uswlmhW+rpia5CreMV57ZkKdKlwZ6FOlSyfm3Oe7LuGmiEiRHD/PML6gjh4GhM OdvDdQt4u51YIl+RwDLbW7P8kjQMLKpQZcHKzE4dlQdQnOJqX7Su2t5BwmDsqLUnYibz JMBsPiQPy7q4aI4KfXGjLvCIMnBefB60TFyyI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=I6zGk0sRXoiLMVlOrHdYKmGYdqgb8HIUHeWjfHU8LZ8=; b=FXEMqVI0w+69WYMKPsvDSvIninlHn92vWdokY4yHiU199BQo9bF/5+xanQOJr7U/mg My5ZJ+wmoXvhyyHr7BGrQScRAOgIcoZJuNs4wtSILfFBhs3hOSRXot4iA/3gdh93mquN N2HZ4sgN8/vHGl7T7K4XRvSV4kQg3AUWz/J8wSZYELHJs1NoDZpa3FRG3G7MsUMlXstX FlE3EidhKIIiOVXTlOfCTMX29rKwlkitEWYH/o3e6G3Phdn5+rOG5d/gmwjbPfracEBi QXAUIKxOg2iqgW1nfy3zUDItunK3Y1pB4NqScXcVLbWyt/M46H6AvCcyjGP5JNaPBPwV UZoQ== X-Gm-Message-State: AGi0PuZc1dNMQYsuntnI+saGWazxua3RqDsWFhkaXYRn4PPjYQdQzmoa dSGS395n/RFNlBW3sM6Z4RmY1xjdKB8= X-Google-Smtp-Source: APiQypLCYf7Wgh0Wtz0pUmQn/xpulX/EPJ0KyLgnUHWq+1CmCns3LkuSYY1ZCdFMCm+vHgIBfsEl0Q== X-Received: by 2002:a1c:3bc5:: with SMTP id i188mr25545127wma.90.1589273998427; Tue, 12 May 2020 01:59:58 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.01.59.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 01:59:57 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_K=C3=B6nig?= , Daniel Vetter Subject: [RFC 02/17] dma-fence: basic lockdep annotations Date: Tue, 12 May 2020 10:59:29 +0200 Message-Id: <20200512085944.222637-3-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org Design is similar to the lockdep annotations for workers, but with some twists: - We use a read-lock for the execution/worker/completion side, so that this explicit annotation can be more liberally sprinkled around. With read locks lockdep isn't going to complain if the read-side isn't nested the same way under all circumstances, so ABBA deadlocks are ok. Which they are, since this is an annotation only. - We're using non-recursive lockdep read lock mode, since in recursive read lock mode lockdep does not catch read side hazards. And we _very_ much want read side hazards to be caught. For full details of this limitation see commit e91498589746065e3ae95d9a00b068e525eec34f Author: Peter Zijlstra Date: Wed Aug 23 13:13:11 2017 +0200 locking/lockdep/selftests: Add mixed read-write ABBA tests - To allow nesting of the read-side explicit annotations we explicitly keep track of the nesting. lock_is_held() allows us to do that. - The wait-side annotation is a write lock, and entirely done within dma_fence_wait() for everyone by default. - To be able to freely annotate helper functions I want to make it ok to call dma_fence_begin/end_signalling from soft/hardirq context. First attempt was using the hardirq locking context for the write side in lockdep, but this forces all normal spinlocks nested within dma_fence_begin/end_signalling to be spinlocks. That bollocks. The approach now is to simple check in_atomic(), and for these cases entirely rely on the might_sleep() check in dma_fence_wait(). That will catch any wrong nesting against spinlocks from soft/hardirq contexts. The idea here is that every code path that's critical for eventually signalling a dma_fence should be annotated with dma_fence_begin/end_signalling. The annotation ideally starts right after a dma_fence is published (added to a dma_resv, exposed as a sync_file fd, attached to a drm_syncobj fd, or anything else that makes the dma_fence visible to other kernel threads), up to and including the dma_fence_wait(). Examples are irq handlers, the scheduler rt threads, the tail of execbuf (after the corresponding fences are visible), any workers that end up signalling dma_fences and really anything else. Not annotated should be code paths that only complete fences opportunistically as the gpu progresses, like e.g. shrinker/eviction code. The main class of deadlocks this is supposed to catch are: Thread A: mutex_lock(A); mutex_unlock(A); dma_fence_signal(); Thread B: mutex_lock(A); dma_fence_wait(); mutex_unlock(A); Thread B is blocked on A signalling the fence, but A never gets around to that because it cannot acquire the lock A. Note that dma_fence_wait() is allowed to be nested within dma_fence_begin/end_signalling sections. To allow this to happen the read lock needs to be upgraded to a write lock, which means that any other lock is acquired between the dma_fence_begin_signalling() call and the call to dma_fence_wait(), and still held, this will result in an immediate lockdep complaint. The only other option would be to not annotate such calls, defeating the point. Therefore these annotations cannot be sprinkled over the code entirely mindless to avoid false positives. v2: handle soft/hardirq ctx better against write side and dont forget EXPORT_SYMBOL, drivers can't use this otherwise. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/dma-buf/dma-fence.c | 53 +++++++++++++++++++++++++++++++++++++ include/linux/dma-fence.h | 12 +++++++++ 2 files changed, 65 insertions(+) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 6802125349fb..d5c0fd2efc70 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -110,6 +110,52 @@ u64 dma_fence_context_alloc(unsigned num) } EXPORT_SYMBOL(dma_fence_context_alloc); +#ifdef CONFIG_LOCKDEP +struct lockdep_map dma_fence_lockdep_map = { + .name = "dma_fence_map" +}; + +bool dma_fence_begin_signalling(void) +{ + /* explicitly nesting ... */ + if (lock_is_held_type(&dma_fence_lockdep_map, 1)) + return true; + + /* rely on might_sleep check for soft/hardirq locks */ + if (in_atomic()) + return true; + + /* ... and non-recursive readlock */ + lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_); + + return false; +} +EXPORT_SYMBOL(dma_fence_begin_signalling); + +void dma_fence_end_signalling(bool cookie) +{ + if (cookie) + return; + + lock_release(&dma_fence_lockdep_map, _RET_IP_); +} +EXPORT_SYMBOL(dma_fence_end_signalling); + +void __dma_fence_might_wait(void) +{ + bool tmp; + + tmp = lock_is_held_type(&dma_fence_lockdep_map, 1); + if (tmp) + lock_release(&dma_fence_lockdep_map, _THIS_IP_); + lock_map_acquire(&dma_fence_lockdep_map); + lock_map_release(&dma_fence_lockdep_map); + if (tmp) + lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_); +} +#endif + + /** * dma_fence_signal_locked - signal completion of a fence * @fence: the fence to signal @@ -170,14 +216,19 @@ int dma_fence_signal(struct dma_fence *fence) { unsigned long flags; int ret; + bool tmp; if (!fence) return -EINVAL; + tmp = dma_fence_begin_signalling(); + spin_lock_irqsave(fence->lock, flags); ret = dma_fence_signal_locked(fence); spin_unlock_irqrestore(fence->lock, flags); + dma_fence_end_signalling(tmp); + return ret; } EXPORT_SYMBOL(dma_fence_signal); @@ -211,6 +262,8 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout) if (timeout > 0) might_sleep(); + __dma_fence_might_wait(); + trace_dma_fence_wait_start(fence); if (fence->ops->wait) ret = fence->ops->wait(fence, intr, timeout); diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 3347c54f3a87..3f288f7db2ef 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -357,6 +357,18 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep) } while (1); } +#ifdef CONFIG_LOCKDEP +bool dma_fence_begin_signalling(void); +void dma_fence_end_signalling(bool cookie); +#else +static inline bool dma_fence_begin_signalling(void) +{ + return true; +} +static inline void dma_fence_end_signalling(bool cookie) {} +static inline void __dma_fence_might_wait(void) {} +#endif + int dma_fence_signal(struct dma_fence *fence); int dma_fence_signal_locked(struct dma_fence *fence); signed long dma_fence_default_wait(struct dma_fence *fence, From patchwork Tue May 12 08:59:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 209837 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 987DBC54E8F for ; Tue, 12 May 2020 09:04:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 789F520714 for ; Tue, 12 May 2020 09:04:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="CLBPuhz5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729386AbgELJD5 (ORCPT ); Tue, 12 May 2020 05:03:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42728 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729270AbgELJAB (ORCPT ); Tue, 12 May 2020 05:00:01 -0400 Received: from mail-wm1-x343.google.com (mail-wm1-x343.google.com [IPv6:2a00:1450:4864:20::343]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E64E9C05BD0D for ; Tue, 12 May 2020 02:00:00 -0700 (PDT) Received: by mail-wm1-x343.google.com with SMTP id m12so15932238wmc.0 for ; Tue, 12 May 2020 02:00:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=NZNungkfzjrpX3wvUKHqphES4AQA2BjvgdkrK2cjuiE=; b=CLBPuhz5g03xplqY3Lp/3c6jCI7cBpGlK2yI20bsQjZlk+Y0DSRAAzM7Uk0yJ/74fT qODjSJnOacCEhx0BQuKhrp9xCJljtl7WlTp/enOv9Yf5zmkcwWAQwPHdsOTS6iInjswc E2mZfpyZhXG3pjjGV03WLfeasYqdUTvXYUfyk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=NZNungkfzjrpX3wvUKHqphES4AQA2BjvgdkrK2cjuiE=; b=isat1JYgb8Q0hNwgQSv+5TdfWe2K2BoWbu7avBNq0oxFwTqJ0Gr32COONAM5sQaHbS 1HZ1HlUewnLtfDAiCvAeDDi/0+BMJIGHfvYCjeWLxAwYMf1Mi6XiPz3zjX0438EMwvYY ztF/xpOKqC6xEkVNoFCrfxl9PWKib5rXhCua4VNK27qU7zNhKQki7RsjIVj3TkZwLtIM +hoUgBLDpGcPS4DzELhMT2rvYI3qgXPTc5lMskRh8bY8jHkIXNf7baPl4CijpkBC/VLW 4VDWdDYEzQZUWU+8/4zabRVDcEW3ME0eb/jvZQMJdMVdYiz/n2lFggO5cD+YLzHmAYSO 6rHg== X-Gm-Message-State: AGi0Pubagt5FLQCHEfNFhrsz6gQRsAniV8gUGuRWQrWc6Fu62s1d/YFs JECJOk5bchfmcgKM2aE76mDRzg== X-Google-Smtp-Source: APiQypKft08OoGrIYtbiSYRYQwvgFtVy2fvk5tEE2SUvyg8qKyVKerp+e40UsbcgCmUWXpSeXbccUQ== X-Received: by 2002:a05:600c:2055:: with SMTP id p21mr13223418wmg.127.1589273999661; Tue, 12 May 2020 01:59:59 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.01.59.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 01:59:59 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_K=C3=B6nig?= , Daniel Vetter Subject: [RFC 03/17] dma-fence: prime lockdep annotations Date: Tue, 12 May 2020 10:59:30 +0200 Message-Id: <20200512085944.222637-4-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org Two in one go: - it is allowed to call dma_fence_wait() while holding a dma_resv_lock(). This is fundamental to how eviction works with ttm, so required. - it is allowed to call dma_fence_wait() from memory reclaim contexts, specifically from shrinker callbacks (which i915 does), and from mmu notifier callbacks (which amdgpu does, and which i915 sometimes also does, and probably always should, but that's kinda a debate). Also for stuff like HMM we really need to be able to do this, or things get real dicey. Consequence is that any critical path necessary to get to a dma_fence_signal for a fence must never a) call dma_resv_lock nor b) allocate memory with GFP_KERNEL. Also by implication of dma_resv_lock(), no userspace faulting allowed. That's some supremely obnoxious limitations, which is why we need to sprinkle the right annotations to all relevant paths. The one big locking context we're leaving out here is mmu notifiers, added in commit 23b68395c7c78a764e8963fc15a7cfd318bf187f Author: Daniel Vetter Date: Mon Aug 26 22:14:21 2019 +0200 mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end that one covers a lot of other callsites, and it's also allowed to wait on dma-fences from mmu notifiers. But there's no ready-made functions exposed to prime this, so I've left it out for now. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/dma-buf/dma-resv.c | 1 + include/linux/dma-fence.h | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 99c0a33c918d..392c336f0732 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -115,6 +115,7 @@ static int __init dma_resv_lockdep(void) if (ret == -EDEADLK) dma_resv_lock_slow(&obj, &ctx); fs_reclaim_acquire(GFP_KERNEL); + __dma_fence_might_wait(); fs_reclaim_release(GFP_KERNEL); ww_mutex_unlock(&obj.lock); ww_acquire_fini(&ctx); diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 3f288f7db2ef..09e23adb351d 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -360,6 +360,7 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep) #ifdef CONFIG_LOCKDEP bool dma_fence_begin_signalling(void); void dma_fence_end_signalling(bool cookie); +void __dma_fence_might_wait(void); #else static inline bool dma_fence_begin_signalling(void) { From patchwork Tue May 12 08:59:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 209843 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 966A2C54E4B for ; Tue, 12 May 2020 09:00:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 72716207FF for ; Tue, 12 May 2020 09:00:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="QOpkeUs0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729423AbgELJAJ (ORCPT ); Tue, 12 May 2020 05:00:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42766 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729392AbgELJAH (ORCPT ); Tue, 12 May 2020 05:00:07 -0400 Received: from mail-wr1-x441.google.com (mail-wr1-x441.google.com [IPv6:2a00:1450:4864:20::441]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 85C1CC061A0F for ; Tue, 12 May 2020 02:00:06 -0700 (PDT) Received: by mail-wr1-x441.google.com with SMTP id y16so7170668wrs.3 for ; Tue, 12 May 2020 02:00:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=QpRcEBLBFti7+uwIPhFpaG5e28opFzcZsYVebGatC0I=; b=QOpkeUs0h0wN4sti4X0g6ThD3SrxXZsFWkczPFMfaK9K848FuZY/LBbwUozpO63nQf eTJs+eMU1PYoj07tdTFH4MaPU0yMQAi0jxHqRvYdOuXMTkIN/Mt01aR+HFippq1ZJM2r G6xGmiSojmbfOZiM1YCpKK0rOM2Epr1zxLZfQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=QpRcEBLBFti7+uwIPhFpaG5e28opFzcZsYVebGatC0I=; b=HLoom1W6BJH3eR6sOILeQ62dFUZDFg14bZ+2fEhWxNRIWxKyliOnhtIOztRqxIKydz UXS4LMG9KrGyY6FKvt/SXWetf61aNPrVsYKsedmgKBg8YigqX4+BLddknATM60Ehh0y/ fRRo4y7S5fVgaXtLW0+Qg31PQL9kmqo1ywr5FjpkWtDkUC/hhGjr3HDEaieEstzSxheD ms9HugEgdmcZzFQYDD/bL2cDGRlxnFuQytJEBkyg8vwfKqYhEqC5TNb/6sjEQ2AK4lmc VZnh7GtP3lVL5E7krXXuVJVHsUhimkv6DvCGGbwLRif78P2OOz/MKaXOsAwJ1Wps0I5l gw4A== X-Gm-Message-State: AGi0PubASu61Vwo2faa90Lde+qRR7pTCwXeeL9MVo9saHvqG4cN7SeW2 r9pjJA0J9fD+tYjLScqn/tmGCQ== X-Google-Smtp-Source: APiQypKd1E6Oeu9FhxQgfCRXfBgs5PqqzFNBcX4uC9QIYdSJsh2y79L6Is6aCs3MeuVyGs7xUaybiQ== X-Received: by 2002:adf:8302:: with SMTP id 2mr24564990wrd.114.1589274005258; Tue, 12 May 2020 02:00:05 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:04 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_K=C3=B6nig?= , Daniel Vetter Subject: [RFC 08/17] drm/scheduler: use dma-fence annotations in main thread Date: Tue, 12 May 2020 10:59:35 +0200 Message-Id: <20200512085944.222637-9-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org If the scheduler rt thread gets stuck on a mutex that we're holding while waiting for gpu workloads to complete, we have a problem. Add dma-fence annotations so that lockdep can check this for us. I've tried to quite carefully review this, and I think it's at the right spot. But obviosly no expert on drm scheduler. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 2f319102ae9f..06a736e506ad 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -763,9 +763,12 @@ static int drm_sched_main(void *param) struct sched_param sparam = {.sched_priority = 1}; struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param; int r; + bool fence_cookie; sched_setscheduler(current, SCHED_FIFO, &sparam); + fence_cookie = dma_fence_begin_signalling(); + while (!kthread_should_stop()) { struct drm_sched_entity *entity = NULL; struct drm_sched_fence *s_fence; @@ -823,6 +826,9 @@ static int drm_sched_main(void *param) wake_up(&sched->job_scheduled); } + + dma_fence_end_signalling(fence_cookie); + return 0; } From patchwork Tue May 12 08:59:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 209847 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B924C54E4A for ; Tue, 12 May 2020 09:00:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 220B6214DB for ; Tue, 12 May 2020 09:00:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="SDwGwQpJ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729433AbgELJAK (ORCPT ); Tue, 12 May 2020 05:00:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42774 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729392AbgELJAK (ORCPT ); Tue, 12 May 2020 05:00:10 -0400 Received: from mail-wr1-x442.google.com (mail-wr1-x442.google.com [IPv6:2a00:1450:4864:20::442]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 86D9EC05BD0A for ; Tue, 12 May 2020 02:00:08 -0700 (PDT) Received: by mail-wr1-x442.google.com with SMTP id l11so8435857wru.0 for ; Tue, 12 May 2020 02:00:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=3/9uzJ3fevUELKKtrW0wppzXu6P75uy22NGdQSlmDz8=; b=SDwGwQpJaRBMJHLwj1aebHZrNlfPB3lM8R5Nxnw2iursO6/Tq0w7BeIxMzSmD391GY iPFvU8Hqv0qaq3/SShkh7frLZZM3Zpz71K/gsOA/w0EdvszMgINIghxumAUMRmpCHVZ6 PNjddJYrJBWaP/HRgZfmAEs0NKLQNEFW8NDqo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=3/9uzJ3fevUELKKtrW0wppzXu6P75uy22NGdQSlmDz8=; b=oena61bb5Qi+6IN37xQakhCUZRieDPgvTUpVoUKFA6TKNT/o/zcz4fsl53qARtW8dS ddFSzxET/CvN8BXpSsYL7XUVcA06tsBET036yR2vxSvQebka3wUDj44Xi2x3hTY/knSX 0sbCXB8VDye9FVahpvlcDZGQz7bA092dEo6aU9roumKtdiHjSSnT/ybh1kkIGJ/3ajIO 9mLY/CUGObq53zNPxxR3A9WCvvouqcQrrk4jGf3cl3RZI0LURPllOzHmFqQg+iP45WZY DAvxznnkXKAIVWMB/lfROj0EMvBnTPUPyFQ4Azbd33PPWZN5yElmoyv9sn+7ZCJLa0Y2 zrSg== X-Gm-Message-State: AGi0PuZuod8hIOkV3xD/UVRbH5ZfMLuErjHA2NCu4HtuyXszoPByG5i/ mXmB5rPKbEl6h1FbtfRWK7gvWw== X-Google-Smtp-Source: APiQypJNdraV8h80zz8iXZE8tE3/9vYAqexMkf4hJDDcWF5MN2cVkPdcNZnVM3rSm/bq3B4xJ2e1wQ== X-Received: by 2002:a5d:6702:: with SMTP id o2mr17490653wru.231.1589274006358; Tue, 12 May 2020 02:00:06 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:05 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_K=C3=B6nig?= , Daniel Vetter Subject: [RFC 09/17] drm/amdgpu: use dma-fence annotations in cs_submit() Date: Tue, 12 May 2020 10:59:36 +0200 Message-Id: <20200512085944.222637-10-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org This is a bit tricky, since ->notifier_lock is held while calling dma_fence_wait we must ensure that also the read side (i.e. dma_fence_begin_signalling) is on the same side. If we mix this up lockdep complaints, and that's again why we want to have these annotations. A nice side effect of this is that because of the fs_reclaim priming for dma_fence_enable lockdep now automatically checks for us that nothing in here allocates memory, without even running any userptr workloads. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 7653f62b1b2d..6db3f3c629b0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1213,6 +1213,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, struct amdgpu_job *job; uint64_t seq; int r; + bool fence_cookie; job = p->job; p->job = NULL; @@ -1227,6 +1228,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, */ mutex_lock(&p->adev->notifier_lock); + fence_cookie = dma_fence_begin_signalling(); + /* If userptr are invalidated after amdgpu_cs_parser_bos(), return * -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl. */ @@ -1264,12 +1267,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm); ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence); + dma_fence_end_signalling(fence_cookie); mutex_unlock(&p->adev->notifier_lock); return 0; error_abort: drm_sched_job_cleanup(&job->base); + dma_fence_end_signalling(fence_cookie); mutex_unlock(&p->adev->notifier_lock); error_unlock: From patchwork Tue May 12 08:59:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 209842 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50B8EC8182B for ; Tue, 12 May 2020 09:00:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 32447206A3 for ; Tue, 12 May 2020 09:00:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="GdZJjNq2" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729437AbgELJAx (ORCPT ); Tue, 12 May 2020 05:00:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729428AbgELJAK (ORCPT ); Tue, 12 May 2020 05:00:10 -0400 Received: from mail-wm1-x341.google.com (mail-wm1-x341.google.com [IPv6:2a00:1450:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AE9AFC061A0F for ; Tue, 12 May 2020 02:00:08 -0700 (PDT) Received: by mail-wm1-x341.google.com with SMTP id m24so11242659wml.2 for ; Tue, 12 May 2020 02:00:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=r5PRcG/JtprEwCOUf2/ZPMLqKS9RbW1APrGVc2im3Nw=; b=GdZJjNq28F9KcRJyjlTagS8E3fkL1j9MVdiS32+c3J3oz4My0gpMGd7IrjrVwziyZj 3INpjpPYeyhbD4+NwUOkIx7M0B6lMTqDeQsZGgAtFaTLfzA0CcV9W86y8Gzd1CxGRleL BXUlHw6StLiPmKsmxJLvaFWoCXKd6BPsYXKG0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=r5PRcG/JtprEwCOUf2/ZPMLqKS9RbW1APrGVc2im3Nw=; b=XlYRMV8RdH1XINQS1YQer1Uc61hlocpwLmxcl/rlgfrPHuNlZvLIIWBKXQYQ+qM31B 6QhRJGvqxumn13FnxjAI8H+DX2emAK64CUm5cuV7LFNKuGGZb36JQqfQ71V8sDby2fuR logIjCxXHGAgCm0EK7Ejf+badeN7DwNDYAr/Cc6xpZFFvTep0rYfdU31tuHe1Gej7nL2 1Xi0hJNv+/sJ/rrFX4cR5MUnyh2wOMMx9wOwun51W2zSctkCbvPtNsAFDyRu8G2rmahr 7X1CUFz+Vj8/IktD/0hj315JNgPgfO0FCh5Q5/xroto4eXiFP6/M/N1HY9mzCrxo2Ysl z+Ng== X-Gm-Message-State: AGi0PublpwcjOwm82MVuDnKBaj40cmuzyHVJU6FcmB64mGMELJVlhzuK TIL1idvn7gWBkObuDOImnPSgOQ== X-Google-Smtp-Source: APiQypLn0gzMJwxb1G5NuhzTlxFQ4Fcx0V6C0P/n0rDCyjAZoEVClTf3ENM81ku64jmXd4+qf9UzyQ== X-Received: by 2002:a05:600c:2055:: with SMTP id p21mr13224168wmg.127.1589274007383; Tue, 12 May 2020 02:00:07 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:06 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_K=C3=B6nig?= , Daniel Vetter Subject: [RFC 10/17] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code Date: Tue, 12 May 2020 10:59:37 +0200 Message-Id: <20200512085944.222637-11-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org My dma-fence lockdep annotations caught an inversion because we allocate memory where we really shouldn't: kmem_cache_alloc+0x2b/0x6d0 amdgpu_fence_emit+0x30/0x330 [amdgpu] amdgpu_ib_schedule+0x306/0x550 [amdgpu] amdgpu_job_run+0x10f/0x260 [amdgpu] drm_sched_main+0x1b9/0x490 [gpu_sched] kthread+0x12e/0x150 Trouble right now is that lockdep only validates against GFP_FS, which would be good enough for shrinkers. But for mmu_notifiers we actually need !GFP_ATOMIC, since they can be called from any page laundering, even if GFP_NOFS or GFP_NOIO are set. I guess we should improve the lockdep annotations for fs_reclaim_acquire/release. Ofc real fix is to properly preallocate this fence and stuff it into the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of the way. v2: Two more allocations in scheduler paths. Frist one: __kmalloc+0x58/0x720 amdgpu_vmid_grab+0x100/0xca0 [amdgpu] amdgpu_job_dependency+0xf9/0x120 [amdgpu] drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched] drm_sched_main+0xf9/0x490 [gpu_sched] Second one: kmem_cache_alloc+0x2b/0x6d0 amdgpu_sync_fence+0x7e/0x110 [amdgpu] amdgpu_vmid_grab+0x86b/0xca0 [amdgpu] amdgpu_job_dependency+0xf9/0x120 [amdgpu] drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched] drm_sched_main+0xf9/0x490 [gpu_sched] Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index d878fe7fee51..055b47241bb1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, uint32_t seq; int r; - fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL); + fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC); if (fence == NULL) return -ENOMEM; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c index fe92dcd94d4a..fdcd6659f5ad 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm, if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait)) return amdgpu_sync_fence(sync, ring->vmid_wait, false); - fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL); + fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC); if (!fences) return -ENOMEM; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c index b87ca171986a..330476cc0c86 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c @@ -168,7 +168,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f, if (amdgpu_sync_add_later(sync, f, explicit)) return 0; - e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL); + e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC); if (!e) return -ENOMEM; From patchwork Tue May 12 08:59:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 209844 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C642C81825 for ; Tue, 12 May 2020 09:00:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0DB562222D for ; Tue, 12 May 2020 09:00:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="PnuV7xfl" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729520AbgELJAr (ORCPT ); Tue, 12 May 2020 05:00:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42794 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729438AbgELJAM (ORCPT ); Tue, 12 May 2020 05:00:12 -0400 Received: from mail-wr1-x443.google.com (mail-wr1-x443.google.com [IPv6:2a00:1450:4864:20::443]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E6AA6C05BD0A for ; Tue, 12 May 2020 02:00:10 -0700 (PDT) Received: by mail-wr1-x443.google.com with SMTP id v12so14305955wrp.12 for ; Tue, 12 May 2020 02:00:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=U6eMxQ6gGsP+PGOv8CvoIWRRS5TJUyI8UxMjiqc47Mo=; b=PnuV7xflEjOlvfaUhZ+KpylEZ6B+nAuxnCL/fVvuzcQE1nkl8ut88s1ez31P+4ljF4 iSgH127tca2DU8l3HJWyuwwLmSRVfbD3QBzO1HejgP3sc52tGIzmsdHjyMsMiHc5wDaC KKhsrvU1VK5jPM+f5/epbuW0a2Bm2UKTZqW0g= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=U6eMxQ6gGsP+PGOv8CvoIWRRS5TJUyI8UxMjiqc47Mo=; b=OyRYPwZuUCKPo3+bmpw4t75BkLIf7rqOtJBTfV0VcQAwmsNi68ae4LLdOa37vFDwzV LFYGpKa5CCx91mtLV8kp5dS4feessd3IfoAH697c7xbpSr5UjU2yxIyPHfeyZmEr+7jL 54hAWH8HaDxsvSU47/kkLOl6Y+kfMbffIHXz0WRv2ekBtaAnIUyCuGaqBApfy8g25amB sd+GRa8D6za3OuPXDOw5+W8pEWjkq8kbNZNtE/8cUIwAyxmFx9vgDg1aH4OVNLyK1U+h Slxdz2K69Ehm7C1mNIZHL7qhdhu/J5JsVEQxMDjpgTS+9/LZ2aGHga5MpVKRFohSDe2v khWQ== X-Gm-Message-State: AGi0PuYWdfXlRe/l7EmwfA8JPMG482E/SK1MQ6tCZaSNAJ9X6l51Up5V grFZsmcO7v/lmApjbYJIFOMpVg== X-Google-Smtp-Source: APiQypLQNaARhwh4BlYnWAU5WfsoUDeU0pdENYNXKn3KrAt4Qe7dKmRwhwnGgQvcs+0sUHauBpwL7w== X-Received: by 2002:adf:e751:: with SMTP id c17mr25218165wrn.351.1589274009683; Tue, 12 May 2020 02:00:09 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:09 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_K=C3=B6nig?= , Daniel Vetter Subject: [RFC 12/17] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail Date: Tue, 12 May 2020 10:59:39 +0200 Message-Id: <20200512085944.222637-13-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org Trying to grab dma_resv_lock while in commit_tail before we've done all the code that leads to the eventual signalling of the vblank event (which can be a dma_fence) is deadlock-y. Don't do that. Here the solution is easy because just grabbing locks to read something races anyway. We don't need to bother, READ_ONCE is equivalent. And avoids the locking issue. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 9bfaa4cad483..28e1af9f823c 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6699,7 +6699,11 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, * explicitly on fences instead * and in general should be called for * blocking commit to as per framework helpers + * + * Yes, this deadlocks, since you're calling dma_resv_lock in a + * path that leads to a dma_fence_signal(). Don't do that. */ +#if 0 r = amdgpu_bo_reserve(abo, true); if (unlikely(r != 0)) DRM_ERROR("failed to reserve buffer before flip\n"); @@ -6709,6 +6713,12 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, tmz_surface = amdgpu_bo_encrypted(abo); amdgpu_bo_unreserve(abo); +#endif + /* + * this races anyway, so READ_ONCE isn't any better or worse + * than the stuff above. Except the stuff above can deadlock. + */ + tiling_flags = READ_ONCE(abo->tiling_flags); fill_dc_plane_info_and_addr( dm->adev, new_plane_state, tiling_flags, From patchwork Tue May 12 08:59:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 209845 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4BCCC54E97 for ; Tue, 12 May 2020 09:00:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A1FFB20A8B for ; Tue, 12 May 2020 09:00:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="kRhMNR3h" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729492AbgELJAg (ORCPT ); Tue, 12 May 2020 05:00:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729459AbgELJAO (ORCPT ); Tue, 12 May 2020 05:00:14 -0400 Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 98109C061A0E for ; Tue, 12 May 2020 02:00:14 -0700 (PDT) Received: by mail-wr1-x444.google.com with SMTP id l17so794363wrr.4 for ; Tue, 12 May 2020 02:00:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=VL9TtbPRMRPCTQiRlmMi8z5JITNnf6io7xX7MRKl+Ko=; b=kRhMNR3hP3pKPVHZhZ9cT25ln+EAm07SNjcy31Y4dhe3XkzIe07JM4Ya995O4xRy86 +saK6s4RC0NG7qbyo1lVxn1eZUImoXrpCgVi47HZi5titk7nGYe9UjbrS8UkF6O3hPRR ovYttYkktjgExaN6BUOrX19W0nVFlKxDNLAKE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=VL9TtbPRMRPCTQiRlmMi8z5JITNnf6io7xX7MRKl+Ko=; b=TYsW7922fhaqnuSKJ4GdNM1r/RIIqVIjy/YR2uNIwJ+yCFkon/VALrtgTnsK3D9rCi FnBKC3rLU4E98bJgdBDUKilrc0xWC0YCkLk5H7t7m35QmDZYi/sIYN3wR+jM9SAqkUvD cizWSDN78SFlq71mhkMzIPEhNwDSWKsPJd0TDFb7I2iXF+BWBC+5qNwXDjnfl7Hg0cxj zebmXGN0eGYM5EHHCv3duLFBFVw/wsOqMXsCFk9cIrM+w3ucIxFsucVlH0CC0u3dpp4w mSqhodcD6wjDwIZJJ3EjL9mcFwlm40AUohLNs3aRLIDtu8qugnEzsRrZoZmUCMO9cnuB ginA== X-Gm-Message-State: AGi0Pub6sQEfVoRUX6GFH0YXGtcu+caJJlsj5ONog8dh0hn/M7eXr2da KeRrw70KHsfVgcuG2hxk9QQn2Q== X-Google-Smtp-Source: APiQypISnc9hbO9Grrz8ir5cKtb2Z6Km73KFVxCF0gD5VBDJDeUiMEfX/2s/6rU4lnpOUPSzhy7kNw== X-Received: by 2002:a5d:4e0a:: with SMTP id p10mr23345391wrt.215.1589274013291; Tue, 12 May 2020 02:00:13 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:12 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_K=C3=B6nig?= , Daniel Vetter Subject: [RFC 15/17] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset" Date: Tue, 12 May 2020 10:59:42 +0200 Message-Id: <20200512085944.222637-16-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org This is one from the department of "maybe play lottery if you hit this, karma compensation might work". Or at least lockdep ftw! This reverts commit 565d1941557756a584ac357d945bc374d5fcd1d0. It's not quite as low-risk as the commit message claims, because this grabs console_lock, which might be held when we allocate memory, which might never happen because the dma_fence_wait() is stuck waiting on our gpu reset: [ 136.763714] ====================================================== [ 136.763714] WARNING: possible circular locking dependency detected [ 136.763715] 5.7.0-rc3+ #346 Tainted: G W [ 136.763716] ------------------------------------------------------ [ 136.763716] kworker/2:3/682 is trying to acquire lock: [ 136.763716] ffffffff8226f140 (console_lock){+.+.}-{0:0}, at: drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763723] but task is already holding lock: [ 136.763724] ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 136.763726] which lock already depends on the new lock. [ 136.763726] the existing dependency chain (in reverse order) is: [ 136.763727] -> #2 (dma_fence_map){++++}-{0:0}: [ 136.763730] __dma_fence_might_wait+0x41/0xb0 [ 136.763732] dma_resv_lockdep+0x171/0x202 [ 136.763734] do_one_initcall+0x5d/0x2f0 [ 136.763736] kernel_init_freeable+0x20d/0x26d [ 136.763738] kernel_init+0xa/0xfb [ 136.763740] ret_from_fork+0x27/0x50 [ 136.763740] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 136.763743] fs_reclaim_acquire.part.0+0x25/0x30 [ 136.763745] kmem_cache_alloc_trace+0x2e/0x6e0 [ 136.763747] device_create_groups_vargs+0x52/0xf0 [ 136.763747] device_create+0x49/0x60 [ 136.763749] fb_console_init+0x25/0x145 [ 136.763750] fbmem_init+0xcc/0xe2 [ 136.763750] do_one_initcall+0x5d/0x2f0 [ 136.763751] kernel_init_freeable+0x20d/0x26d [ 136.763752] kernel_init+0xa/0xfb [ 136.763753] ret_from_fork+0x27/0x50 [ 136.763753] -> #0 (console_lock){+.+.}-{0:0}: [ 136.763755] __lock_acquire+0x1241/0x23f0 [ 136.763756] lock_acquire+0xad/0x370 [ 136.763757] console_lock+0x47/0x70 [ 136.763761] drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763809] amdgpu_device_gpu_recover.cold+0x21e/0xe7b [amdgpu] [ 136.763850] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 136.763851] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 136.763852] process_one_work+0x23c/0x580 [ 136.763853] worker_thread+0x50/0x3b0 [ 136.763854] kthread+0x12e/0x150 [ 136.763855] ret_from_fork+0x27/0x50 [ 136.763855] other info that might help us debug this: [ 136.763856] Chain exists of: console_lock --> fs_reclaim --> dma_fence_map [ 136.763857] Possible unsafe locking scenario: [ 136.763857] CPU0 CPU1 [ 136.763857] ---- ---- [ 136.763857] lock(dma_fence_map); [ 136.763858] lock(fs_reclaim); [ 136.763858] lock(dma_fence_map); [ 136.763858] lock(console_lock); [ 136.763859] *** DEADLOCK *** [ 136.763860] 4 locks held by kworker/2:3/682: [ 136.763860] #0: ffff8887fb81c938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 136.763862] #1: ffffc90000cafe58 ((work_completion)(&(&sched->work_tdr)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 136.763863] #2: ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 136.763865] #3: ffff8887ab621748 (&adev->lock_reset){+.+.}-{3:3}, at: amdgpu_device_gpu_recover.cold+0x5ab/0xe7b [amdgpu] [ 136.763914] stack backtrace: [ 136.763915] CPU: 2 PID: 682 Comm: kworker/2:3 Tainted: G W 5.7.0-rc3+ #346 [ 136.763916] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018 [ 136.763918] Workqueue: events drm_sched_job_timedout [gpu_sched] [ 136.763919] Call Trace: [ 136.763922] dump_stack+0x8f/0xd0 [ 136.763924] check_noncircular+0x162/0x180 [ 136.763926] __lock_acquire+0x1241/0x23f0 [ 136.763927] lock_acquire+0xad/0x370 [ 136.763932] ? drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763933] ? mark_held_locks+0x2d/0x80 [ 136.763934] ? _raw_spin_unlock_irqrestore+0x46/0x60 [ 136.763936] console_lock+0x47/0x70 [ 136.763940] ? drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763944] drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763993] amdgpu_device_gpu_recover.cold+0x21e/0xe7b [amdgpu] [ 136.764036] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 136.764038] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 136.764040] process_one_work+0x23c/0x580 [ 136.764041] worker_thread+0x50/0x3b0 [ 136.764042] ? process_one_work+0x580/0x580 [ 136.764044] kthread+0x12e/0x150 [ 136.764045] ? kthread_create_worker_on_cpu+0x70/0x70 [ 136.764046] ret_from_fork+0x27/0x50 Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5560d045b2e0..3584e29323c0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4047,8 +4047,6 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, if (r) goto out; - amdgpu_fbdev_set_suspend(tmp_adev, 0); - /* must succeed. */ amdgpu_ras_resume(tmp_adev); @@ -4217,8 +4215,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, */ amdgpu_unregister_gpu_instance(tmp_adev); - amdgpu_fbdev_set_suspend(tmp_adev, 1); - /* disable ras on ALL IPs */ if (!(in_ras_intr && !use_baco) && amdgpu_device_ip_need_full_reset(tmp_adev)) From patchwork Tue May 12 08:59:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 209846 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17475C54E94 for ; Tue, 12 May 2020 09:00:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E385E2495C for ; Tue, 12 May 2020 09:00:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="i9sr1b15" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729354AbgELJAd (ORCPT ); Tue, 12 May 2020 05:00:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42830 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729229AbgELJAQ (ORCPT ); Tue, 12 May 2020 05:00:16 -0400 Received: from mail-wm1-x341.google.com (mail-wm1-x341.google.com [IPv6:2a00:1450:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 180DAC061A0F for ; Tue, 12 May 2020 02:00:16 -0700 (PDT) Received: by mail-wm1-x341.google.com with SMTP id e26so20839341wmk.5 for ; Tue, 12 May 2020 02:00:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=SfgLaCvxfD7XNn5XcBekzeLLwvBoLk08cBBrX2rnnjo=; b=i9sr1b15hbiGaOUTudiYdX3zH08R5W0n+iiQBJ4jzvHs74gg+UjUiBZvQJSouDtJEs U8ZHUVFajgAOIXV46QqfuBEq50M9oQlamOF5wQdCRZXIwzGeHVJqbdbx5a26kN6Nt22r FqdwM2qxFwhdtXPbYwpqsk/7a6mCzD3lqiM4A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=SfgLaCvxfD7XNn5XcBekzeLLwvBoLk08cBBrX2rnnjo=; b=pj/VapyJzYgOg6TzlBq71jpZw4l6fAIq48VrLNQ/zkkRH0qaSQz589YaAy0q5GPytk 0wUgRq9oAuQgE5TupdMSxnmswlkmkWScJoWuIjMWWr0i6ECLvIW/fwOjmnp9mie8O9YF 6UFupo0YXFUY81Az7v93B2U62xHBDgEgdHX5wYmdBtycPGm58OY/IEELB8oywbRJf6u2 gLzOoy6qyJGJnwidV1Z0l4hr0KOKYaF0VX+yidnyVFAQXJrpPfgcykv4ToV5dtoHH+8K PTJPD5C0BVXoyaH5pZ/wJ2tuvVlEwNfJ9VaSnnx8LRNBZ+kNCqyHnBTjTv/SsLV0ehsU HsEw== X-Gm-Message-State: AGi0Pubr63jSInSXo34Gbu7L+w1IGBtyPNTAeBWmmeLCsZzdl/EMTfRR b3Sr8FqnpEo8/3I6NiktNyWRug== X-Google-Smtp-Source: APiQypK/giXJoKHEmIIgvZJ7mM3tDZU5ICCASqmWo22Iyt5Tb6Wt3spXNWAx+Pb9At2o6/v8/Np1tg== X-Received: by 2002:a05:600c:40d:: with SMTP id q13mr26312967wmb.69.1589274014662; Tue, 12 May 2020 02:00:14 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:14 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_K=C3=B6nig?= , Daniel Vetter Subject: [RFC 16/17] drm/amdgpu: gpu recovery does full modesets Date: Tue, 12 May 2020 10:59:43 +0200 Message-Id: <20200512085944.222637-17-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org ... I think it's time to stop this little exercise. The lockdep splat, for the record: [ 132.583381] ====================================================== [ 132.584091] WARNING: possible circular locking dependency detected [ 132.584775] 5.7.0-rc3+ #346 Tainted: G W [ 132.585461] ------------------------------------------------------ [ 132.586184] kworker/2:3/865 is trying to acquire lock: [ 132.586857] ffffc90000677c70 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.587569] but task is already holding lock: [ 132.589044] ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 132.589803] which lock already depends on the new lock. [ 132.592009] the existing dependency chain (in reverse order) is: [ 132.593507] -> #2 (dma_fence_map){++++}-{0:0}: [ 132.595019] dma_fence_begin_signalling+0x50/0x60 [ 132.595767] drm_atomic_helper_commit+0xa1/0x180 [drm_kms_helper] [ 132.596567] drm_client_modeset_commit_atomic+0x1ea/0x250 [drm] [ 132.597420] drm_client_modeset_commit_locked+0x55/0x190 [drm] [ 132.598178] drm_client_modeset_commit+0x24/0x40 [drm] [ 132.598948] drm_fb_helper_restore_fbdev_mode_unlocked+0x4b/0xa0 [drm_kms_helper] [ 132.599738] drm_fb_helper_set_par+0x30/0x40 [drm_kms_helper] [ 132.600539] fbcon_init+0x2e8/0x660 [ 132.601344] visual_init+0xce/0x130 [ 132.602156] do_bind_con_driver+0x1bc/0x2b0 [ 132.602970] do_take_over_console+0x115/0x180 [ 132.603763] do_fbcon_takeover+0x58/0xb0 [ 132.604564] register_framebuffer+0x1ee/0x300 [ 132.605369] __drm_fb_helper_initial_config_and_unlock+0x36e/0x520 [drm_kms_helper] [ 132.606187] amdgpu_fbdev_init+0xb3/0xf0 [amdgpu] [ 132.607032] amdgpu_device_init.cold+0xe90/0x1677 [amdgpu] [ 132.607862] amdgpu_driver_load_kms+0x5a/0x200 [amdgpu] [ 132.608697] amdgpu_pci_probe+0xf7/0x180 [amdgpu] [ 132.609511] local_pci_probe+0x42/0x80 [ 132.610324] pci_device_probe+0x104/0x1a0 [ 132.611130] really_probe+0x147/0x3c0 [ 132.611939] driver_probe_device+0xb6/0x100 [ 132.612766] device_driver_attach+0x53/0x60 [ 132.613593] __driver_attach+0x8c/0x150 [ 132.614419] bus_for_each_dev+0x7b/0xc0 [ 132.615249] bus_add_driver+0x14c/0x1f0 [ 132.616071] driver_register+0x6c/0xc0 [ 132.616902] do_one_initcall+0x5d/0x2f0 [ 132.617731] do_init_module+0x5c/0x230 [ 132.618560] load_module+0x2981/0x2bc0 [ 132.619391] __do_sys_finit_module+0xaa/0x110 [ 132.620228] do_syscall_64+0x5a/0x250 [ 132.621064] entry_SYSCALL_64_after_hwframe+0x49/0xb3 [ 132.621903] -> #1 (crtc_ww_class_mutex){+.+.}-{3:3}: [ 132.623587] __ww_mutex_lock.constprop.0+0xcc/0x10c0 [ 132.624448] ww_mutex_lock+0x43/0xb0 [ 132.625315] drm_modeset_lock+0x44/0x120 [drm] [ 132.626184] drmm_mode_config_init+0x2db/0x8b0 [drm] [ 132.627098] amdgpu_device_init.cold+0xbd1/0x1677 [amdgpu] [ 132.628007] amdgpu_driver_load_kms+0x5a/0x200 [amdgpu] [ 132.628920] amdgpu_pci_probe+0xf7/0x180 [amdgpu] [ 132.629804] local_pci_probe+0x42/0x80 [ 132.630690] pci_device_probe+0x104/0x1a0 [ 132.631583] really_probe+0x147/0x3c0 [ 132.632479] driver_probe_device+0xb6/0x100 [ 132.633379] device_driver_attach+0x53/0x60 [ 132.634275] __driver_attach+0x8c/0x150 [ 132.635170] bus_for_each_dev+0x7b/0xc0 [ 132.636069] bus_add_driver+0x14c/0x1f0 [ 132.636974] driver_register+0x6c/0xc0 [ 132.637870] do_one_initcall+0x5d/0x2f0 [ 132.638765] do_init_module+0x5c/0x230 [ 132.639654] load_module+0x2981/0x2bc0 [ 132.640522] __do_sys_finit_module+0xaa/0x110 [ 132.641372] do_syscall_64+0x5a/0x250 [ 132.642203] entry_SYSCALL_64_after_hwframe+0x49/0xb3 [ 132.643022] -> #0 (crtc_ww_class_acquire){+.+.}-{0:0}: [ 132.644643] __lock_acquire+0x1241/0x23f0 [ 132.645469] lock_acquire+0xad/0x370 [ 132.646274] drm_modeset_acquire_init+0xd2/0x100 [drm] [ 132.647071] drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.647902] dm_suspend+0x1c/0x60 [amdgpu] [ 132.648698] amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu] [ 132.649498] amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu] [ 132.650300] amdgpu_device_gpu_recover.cold+0x4e6/0xe64 [amdgpu] [ 132.651084] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 132.651825] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 132.652594] process_one_work+0x23c/0x580 [ 132.653402] worker_thread+0x50/0x3b0 [ 132.654139] kthread+0x12e/0x150 [ 132.654868] ret_from_fork+0x27/0x50 [ 132.655598] other info that might help us debug this: [ 132.657739] Chain exists of: crtc_ww_class_acquire --> crtc_ww_class_mutex --> dma_fence_map [ 132.659877] Possible unsafe locking scenario: [ 132.661416] CPU0 CPU1 [ 132.662126] ---- ---- [ 132.662847] lock(dma_fence_map); [ 132.663574] lock(crtc_ww_class_mutex); [ 132.664319] lock(dma_fence_map); [ 132.665063] lock(crtc_ww_class_acquire); [ 132.665799] *** DEADLOCK *** [ 132.667965] 4 locks held by kworker/2:3/865: [ 132.668701] #0: ffff8887fb81c938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 132.669462] #1: ffffc90000677e58 ((work_completion)(&(&sched->work_tdr)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 132.670242] #2: ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 132.671039] #3: ffff8887b84a1748 (&adev->lock_reset){+.+.}-{3:3}, at: amdgpu_device_gpu_recover.cold+0x59e/0xe64 [amdgpu] [ 132.671902] stack backtrace: [ 132.673515] CPU: 2 PID: 865 Comm: kworker/2:3 Tainted: G W 5.7.0-rc3+ #346 [ 132.674347] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018 [ 132.675194] Workqueue: events drm_sched_job_timedout [gpu_sched] [ 132.676046] Call Trace: [ 132.676897] dump_stack+0x8f/0xd0 [ 132.677748] check_noncircular+0x162/0x180 [ 132.678604] ? stack_trace_save+0x4b/0x70 [ 132.679459] __lock_acquire+0x1241/0x23f0 [ 132.680311] lock_acquire+0xad/0x370 [ 132.681163] ? drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.682021] ? cpumask_next+0x16/0x20 [ 132.682880] ? module_assert_mutex_or_preempt+0x14/0x40 [ 132.683737] ? __module_address+0x28/0xf0 [ 132.684601] drm_modeset_acquire_init+0xd2/0x100 [drm] [ 132.685466] ? drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.686335] drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.687255] dm_suspend+0x1c/0x60 [amdgpu] [ 132.688152] amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu] [ 132.689057] ? amdgpu_fence_process+0x4c/0x150 [amdgpu] [ 132.689963] amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu] [ 132.690893] amdgpu_device_gpu_recover.cold+0x4e6/0xe64 [amdgpu] [ 132.691818] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 132.692707] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 132.693597] process_one_work+0x23c/0x580 [ 132.694487] worker_thread+0x50/0x3b0 [ 132.695373] ? process_one_work+0x580/0x580 [ 132.696264] kthread+0x12e/0x150 [ 132.697154] ? kthread_create_worker_on_cpu+0x70/0x70 [ 132.698057] ret_from_fork+0x27/0x50 Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 3584e29323c0..b3b84a0d3baf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2415,6 +2415,14 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev) /* displays are handled separately */ if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE) { /* XXX handle errors */ + + /* + * This is dm_suspend, which calls modeset locks, and + * that a pretty good inversion against dma_fence_signal + * which gpu recovery is supposed to guarantee. + * + * Dont ask me how to fix this. + */ r = adev->ip_blocks[i].version->funcs->suspend(adev); /* XXX handle errors */ if (r) {