From patchwork Fri Jan 17 18:47:07 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Connor Abbott
X-Patchwork-Id: 858494
From: Connor Abbott
Date: Fri, 17 Jan 2025 13:47:07 -0500
Subject: [PATCH 1/3] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault
Precedence: bulk
X-Mailing-List: linux-arm-msm@vger.kernel.org
Message-Id: <20250117-msm-gpu-fault-fixes-next-v1-1-bc9b332b5d0b@gmail.com>
References: <20250117-msm-gpu-fault-fixes-next-v1-0-bc9b332b5d0b@gmail.com>
In-Reply-To: <20250117-msm-gpu-fault-fixes-next-v1-0-bc9b332b5d0b@gmail.com>
To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
Cc: iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, freedreno@lists.freedesktop.org, Connor Abbott
X-Mailer: b4 0.14.2

On some SMMUv2 implementations, including MMU-500, SMMU_CBn_FSR.SS asserts an interrupt. The only way to clear that bit is to resume the transaction by writing SMMU_CBn_RESUME, but typically resuming the transaction requires complex operations (copying in pages, etc.)
that can't be done in IRQ context. drm/msm already has a problem, because its fault handler sometimes schedules a job to dump the GPU state and doesn't resume translation until this is complete. Work around this by disabling context fault interrupts until after the transaction is resumed. Because other context banks can share an IRQ line, we may still get an interrupt intended for another context bank, but in this case only SMMU_CBn_FSR.SS will be asserted and we can skip it assuming that interrupts are disabled which is accomplished by removing the bit from ARM_SMMU_CB_FSR_FAULT. Signed-off-by: Connor Abbott --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 15 +++++++++++++- drivers/iommu/arm/arm-smmu/arm-smmu.c | 32 ++++++++++++++++++++++++++++++ drivers/iommu/arm/arm-smmu/arm-smmu.h | 2 +- 3 files changed, 47 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index 59d02687280e8d37b5e944619fcfe4ebd1bd6926..ee2fdf7e79a6d04bc2700e454253c96b573c5569 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -125,12 +125,25 @@ static void qcom_adreno_smmu_resume_translation(const void *cookie, bool termina struct arm_smmu_domain *smmu_domain = (void *)cookie; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; struct arm_smmu_device *smmu = smmu_domain->smmu; - u32 reg = 0; + u32 reg = 0, sctlr; + unsigned long flags; if (terminate) reg |= ARM_SMMU_RESUME_TERMINATE; + spin_lock_irqsave(&smmu_domain->stall_lock, flags); + arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg); + + /* + * Re-enable interrupts after they were disabled by + * arm_smmu_context_fault(). 
+ */ + sctlr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR); + sctlr |= ARM_SMMU_SCTLR_CFIE; + arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, sctlr); + + spin_unlock_irqrestore(&smmu_domain->stall_lock, flags); } static void qcom_adreno_smmu_set_prr_bit(const void *cookie, bool set) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 79afc92e1d8b984dd35c469a3f283ad0c78f3d26..c92de760940ee2872f22dbe1b2519e02766aa143 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -457,12 +457,43 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) DEFAULT_RATELIMIT_BURST); int idx = smmu_domain->cfg.cbndx; int ret; + unsigned long flags; arm_smmu_read_context_fault_info(smmu, idx, &cfi); if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT)) return IRQ_NONE; + /* + * On some implementations FSR.SS asserts a context fault + * interrupt. We do not want this behavior, because resolving the + * original context fault typically requires operations that cannot be + * performed in IRQ context but leaving the stall unacknowledged will + * immediately lead to another spurious interrupt as FSR.SS is still + * set. Work around this by disabling interrupts for this context bank. + * It's expected that interrupts are re-enabled after resuming the + * translation. + * + * We have to do this before report_iommu_fault() so that we don't + * leave interrupts disabled in case the downstream user decides the + * fault can be resolved inside its fault handler. + * + * There is a possible race if there are multiple context banks sharing + * the same interrupt and both signal an interrupt in between writing + * RESUME and SCTLR. We could disable interrupts here before we + * re-enable them in the resume handler, leaving interrupts enabled. + * Lock the write to serialize it with the resume handler. 
+ */ + if (cfi.fsr & ARM_SMMU_CB_FSR_SS) { + u32 val; + + spin_lock_irqsave(&smmu_domain->stall_lock, flags); + val = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR); + val &= ~ARM_SMMU_SCTLR_CFIE; + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, val); + spin_unlock_irqrestore(&smmu_domain->stall_lock, flags); + } + ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova, cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ); @@ -921,6 +952,7 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) mutex_init(&smmu_domain->init_mutex); spin_lock_init(&smmu_domain->cb_lock); + spin_lock_init(&smmu_domain->stall_lock); return &smmu_domain->domain; } diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h index 2dbf3243b5ad2db01e17fb26c26c838942a491be..153fac131b2484d468fd482ffbf130efc8cfb8f6 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h @@ -216,7 +216,6 @@ enum arm_smmu_cbar_type { ARM_SMMU_CB_FSR_TLBLKF) #define ARM_SMMU_CB_FSR_FAULT (ARM_SMMU_CB_FSR_MULTI | \ - ARM_SMMU_CB_FSR_SS | \ ARM_SMMU_CB_FSR_UUT | \ ARM_SMMU_CB_FSR_EF | \ ARM_SMMU_CB_FSR_PF | \ @@ -384,6 +383,7 @@ struct arm_smmu_domain { enum arm_smmu_domain_stage stage; struct mutex init_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ + spinlock_t stall_lock; struct iommu_domain domain; };

From patchwork Fri Jan 17 18:47:08 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Connor Abbott
X-Patchwork-Id: 858493
From: Connor Abbott
Date: Fri, 17 Jan 2025 13:47:08 -0500
Subject: [PATCH 2/3] iommu/arm-smmu-qcom: Make set_stall work when the device is on
Precedence: bulk
X-Mailing-List: linux-arm-msm@vger.kernel.org
Message-Id: <20250117-msm-gpu-fault-fixes-next-v1-2-bc9b332b5d0b@gmail.com>
References: <20250117-msm-gpu-fault-fixes-next-v1-0-bc9b332b5d0b@gmail.com>
In-Reply-To: <20250117-msm-gpu-fault-fixes-next-v1-0-bc9b332b5d0b@gmail.com>
To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
Cc: iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, freedreno@lists.freedesktop.org, Connor Abbott
X-Mailer: b4 0.14.2

Up until now we have only called the set_stall callback during initialization when the device is off. But we will soon start calling it to temporarily disable stall-on-fault when the device is on, so handle that by checking if the device is on and writing SCTLR.
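
For illustration only, the change-detection logic described above can be modelled in plain userspace C. This is a minimal sketch, not the driver code: struct sketch_smmu, sketch_set_stall(), SKETCH_NUM_CB, the "powered" flag (standing in for the runtime-PM check) and the write counter are all hypothetical names invented for the example, and SKETCH_SCTLR_CFCFG is only a stand-in for ARM_SMMU_SCTLR_CFCFG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_CB      8u         /* hypothetical number of context banks */
#define SKETCH_SCTLR_CFCFG (1u << 7)  /* stand-in for ARM_SMMU_SCTLR_CFCFG */

/* Hypothetical model of the state touched by set_stall. */
struct sketch_smmu {
	uint32_t stall_enabled;         /* cached per-bank stall setting */
	uint32_t sctlr[SKETCH_NUM_CB];  /* stand-in for the SCTLR registers */
	bool powered;                   /* stand-in for the runtime-PM check */
	unsigned int reg_writes;        /* number of simulated register writes */
};

static void sketch_set_stall(struct sketch_smmu *smmu, unsigned int cbndx,
			     bool enabled)
{
	uint32_t mask;
	bool changed;

	assert(cbndx < SKETCH_NUM_CB);
	mask = 1u << cbndx;

	/* Compare the cached bit with the request before updating it. */
	changed = !!(smmu->stall_enabled & mask) != enabled;

	if (enabled)
		smmu->stall_enabled |= mask;
	else
		smmu->stall_enabled &= ~mask;

	/* Only touch the register if the setting changed and the device is on. */
	if (changed && smmu->powered) {
		uint32_t reg = smmu->sctlr[cbndx];

		if (enabled)
			reg |= SKETCH_SCTLR_CFCFG;
		else
			reg &= ~SKETCH_SCTLR_CFCFG;

		smmu->sctlr[cbndx] = reg;
		smmu->reg_writes++;
	}
}
```

In this model a call made while "powered" is false only updates the cached mask, which mirrors the existing init-time behaviour described above. The memory barrier and the runtime-PM reference handling from the real patch are deliberately left out.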
Signed-off-by: Connor Abbott --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 31 +++++++++++++++++++++++++++--- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index ee2fdf7e79a6d04bc2700e454253c96b573c5569..54be27f7b49d78b7542fd714d6aade2b23c65fc0 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -112,12 +112,37 @@ static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled) { struct arm_smmu_domain *smmu_domain = (void *)cookie; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu); + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); + u32 mask = BIT(cfg->cbndx); + bool stall_changed = !!(qsmmu->stall_enabled & mask) != enabled; if (enabled) - qsmmu->stall_enabled |= BIT(cfg->cbndx); + qsmmu->stall_enabled |= mask; else - qsmmu->stall_enabled &= ~BIT(cfg->cbndx); + qsmmu->stall_enabled &= ~mask; + + /* + * If the device is on and we changed the setting, update the register. + */ + if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) { + u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR); + + if (enabled) + reg |= ARM_SMMU_SCTLR_CFCFG; + else + reg &= ~ARM_SMMU_SCTLR_CFCFG; + + arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg); + + /* + * If doing this in the context fault handler, make sure the + * update lands before we acknowledge the fault. 
+ */ + wmb(); + + pm_runtime_put_autosuspend(smmu->dev); + } } static void qcom_adreno_smmu_resume_translation(const void *cookie, bool terminate)

From patchwork Fri Jan 17 18:47:09 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Connor Abbott
X-Patchwork-Id: 858296
From: Connor Abbott
Date: Fri, 17 Jan 2025 13:47:09 -0500
Subject: [PATCH 3/3] drm/msm: Temporarily disable stall-on-fault after a page fault
Precedence: bulk
X-Mailing-List: linux-arm-msm@vger.kernel.org
Message-Id: <20250117-msm-gpu-fault-fixes-next-v1-3-bc9b332b5d0b@gmail.com>
References: <20250117-msm-gpu-fault-fixes-next-v1-0-bc9b332b5d0b@gmail.com>
In-Reply-To: <20250117-msm-gpu-fault-fixes-next-v1-0-bc9b332b5d0b@gmail.com>
To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
Cc: iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, freedreno@lists.freedesktop.org, Connor Abbott
X-Mailer: b4 0.14.2

When things go wrong, the GPU is capable of quickly generating millions of faulting translation requests per second.
When that happens, in the stall-on-fault model each access stalls until it wins the race to signal the fault and the RESUME register is then written. This slows page-fault processing to a crawl, because the GPU can generate faults much faster than the CPU can acknowledge them. It also means that all available resources in the SMMU are saturated waiting for the stalled transactions, so that other transactions, such as those generated by the GMU, which shares a context bank with the GPU, cannot proceed. This causes a GMU watchdog timeout, which leads to a failed reset, because GX cannot collapse while a transaction is pending, and ultimately to a permanently hung GPU. On older platforms with qcom,smmu-v2, it seems that when one transaction is stalled subsequent faulting transactions are terminated, which avoids this problem, but the MMU-500 follows the spec here.

To work around this problem, disable stall-on-fault as soon as we get a page fault and keep it disabled until a cooldown period after page faults stop. This gives the GMU some guaranteed time to continue working. We also keep it disabled so long as the current devcoredump hasn't been deleted, because in that case we likely won't capture another one if there's a fault.

After this commit HFI messages still occasionally time out, because the crashdump handler doesn't run fast enough to let the GMU resume, but the driver seems to recover from it. This will probably go away after the HFI timeout is increased.
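
The cooldown behaviour described above can be sketched as a small state machine in userspace C. This is only an illustration under stated assumptions: the sketch_* names, the explicit millisecond clock and the crashstate_pending flag are hypothetical stand-ins for the kernel timer and gpu->crashstate, and locking and jiffies rounding are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_COOLDOWN_MS 500u  /* "at least half a second" in the patch */

/* Hypothetical model of the per-GPU stall-on-fault state. */
struct sketch_gpu {
	bool stall_enabled;           /* stall-on-fault currently enabled */
	bool enable_stall_on_submit;  /* cooldown has elapsed */
	bool crashstate_pending;      /* a devcoredump has not been deleted yet */
	bool timer_armed;
	uint64_t timer_deadline_ms;
};

/* Holds in this sketch: stalling is never on while re-enabling is blocked. */
static void sketch_check(const struct sketch_gpu *g)
{
	assert(!(g->stall_enabled && !g->enable_stall_on_submit));
}

static void sketch_init(struct sketch_gpu *g)
{
	g->stall_enabled = true;
	g->enable_stall_on_submit = true;
	g->crashstate_pending = false;
	g->timer_armed = false;
	g->timer_deadline_ms = 0;
}

/* Page fault: turn stalling off and push the cooldown deadline out. */
static void sketch_on_fault(struct sketch_gpu *g, uint64_t now_ms)
{
	if (g->stall_enabled) {
		g->stall_enabled = false;
		g->enable_stall_on_submit = false;
	}
	g->timer_armed = true;
	g->timer_deadline_ms = now_ms + SKETCH_COOLDOWN_MS;
	sketch_check(g);
}

/* Timer expiry: allow the next submit to re-enable stalling. */
static void sketch_tick(struct sketch_gpu *g, uint64_t now_ms)
{
	if (g->timer_armed && now_ms >= g->timer_deadline_ms) {
		g->timer_armed = false;
		g->enable_stall_on_submit = true;
	}
	sketch_check(g);
}

/* Submit: re-enable only after the cooldown and with no pending crashdump. */
static void sketch_on_submit(struct sketch_gpu *g)
{
	if (!g->stall_enabled && g->enable_stall_on_submit &&
	    !g->crashstate_pending)
		g->stall_enabled = true;
	sketch_check(g);
}
```

Because every fault moves the deadline forward, a continuing fault storm keeps stalling disabled indefinitely in this model, and stalling only returns on a submit that happens after the faults have stopped for the full cooldown.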
Signed-off-by: Connor Abbott --- drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 2 ++ drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 +++ drivers/gpu/drm/msm/adreno/adreno_gpu.c | 56 ++++++++++++++++++++++++++++++++- drivers/gpu/drm/msm/adreno/adreno_gpu.h | 21 +++++++++++++ drivers/gpu/drm/msm/msm_iommu.c | 9 ++++++ drivers/gpu/drm/msm/msm_mmu.h | 1 + 6 files changed, 92 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index 71dca78cd7a5324e9ff5b14f173e2209fa42e196..a559e47af5b549e154fa6c32ef8879dd856531a2 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -131,6 +131,8 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit) struct msm_ringbuffer *ring = submit->ring; unsigned int i, ibs = 0; + adreno_gpu_enable_iommu_stall(adreno_gpu); + if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) { ring->cur_ctx_seqno = 0; a5xx_submit_in_rb(gpu, submit); diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 0ae29a7c8a4d3f74236a35cc919f69d5c0a384a0..0e63ee62d3eff3e274bae375430efbdf6f8dccf0 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -212,6 +212,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit) struct msm_ringbuffer *ring = submit->ring; unsigned int i, ibs = 0; + adreno_gpu_enable_iommu_stall(adreno_gpu); + a6xx_set_pagetable(a6xx_gpu, ring, submit); get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0), @@ -335,6 +337,8 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit) struct msm_ringbuffer *ring = submit->ring; unsigned int i, ibs = 0; + adreno_gpu_enable_iommu_stall(adreno_gpu); + /* * Toggle concurrent binning for pagetable switch and set the thread to * BR since only it can execute the pagetable switch packets. 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 1238f326597808eb28b4c6822cbd41a26e555eb9..6bf834d075219193cce187ec5f55aa691121aad3 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -246,16 +246,65 @@ u64 adreno_private_address_space_size(struct msm_gpu *gpu) return SZ_4G; } +void adreno_gpu_enable_iommu_stall(struct adreno_gpu *adreno_gpu) +{ + struct msm_gpu *gpu = &adreno_gpu->base; + unsigned long flags; + + /* + * Wait until the cooldown period has passed and we would actually + * collect a crashdump to re-enable stall-on-fault. + */ + spin_lock_irqsave(&adreno_gpu->fault_stall_lock, flags); + if (!adreno_gpu->stall_enabled && + READ_ONCE(adreno_gpu->enable_stall_on_submit) && + !READ_ONCE(gpu->crashstate)) { + adreno_gpu->stall_enabled = true; + + gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, true); + } + spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, flags); +} + +static void fault_stall_handler(struct timer_list *t) +{ + struct adreno_gpu *gpu = from_timer(gpu, t, fault_stall_timer); + + WRITE_ONCE(gpu->enable_stall_on_submit, true); +} + + #define ARM_SMMU_FSR_TF BIT(1) #define ARM_SMMU_FSR_PF BIT(3) #define ARM_SMMU_FSR_EF BIT(4) +#define ARM_SMMU_FSR_SS BIT(30) int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags, struct adreno_smmu_fault_info *info, const char *block, u32 scratch[4]) { + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); const char *type = "UNKNOWN"; - bool do_devcoredump = info && !READ_ONCE(gpu->crashstate); + bool do_devcoredump = info && (info->fsr & ARM_SMMU_FSR_SS) && + !READ_ONCE(gpu->crashstate); + unsigned long irq_flags; + + /* + * In case there is a subsequent storm of pagefaults, disable + * stall-on-fault for at least half a second. 
+ */ + spin_lock_irqsave(&adreno_gpu->fault_stall_lock, irq_flags); + if (adreno_gpu->stall_enabled) { + adreno_gpu->stall_enabled = false; + adreno_gpu->enable_stall_on_submit = false; + + gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, false); + + } + spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, irq_flags); + + mod_timer(&adreno_gpu->fault_stall_timer, + round_jiffies_up(jiffies + msecs_to_jiffies(500))); /* * If we aren't going to be resuming later from fault_worker, then do @@ -1143,6 +1192,11 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev, adreno_gpu->info->inactive_period); pm_runtime_use_autosuspend(dev); + spin_lock_init(&adreno_gpu->fault_stall_lock); + timer_setup(&adreno_gpu->fault_stall_timer, fault_stall_handler, 0); + adreno_gpu->enable_stall_on_submit = true; + adreno_gpu->stall_enabled = true; + return msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base, gpu_name, &adreno_gpu_config); } diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h index dcf454629ce037b2a8274a6699674ad754ce1f07..c59501afa40c223d02bea3ff9b0dbc309d099317 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h @@ -205,6 +205,25 @@ struct adreno_gpu { /* firmware: */ const struct firmware *fw[ADRENO_FW_MAX]; + spinlock_t fault_stall_lock; + + struct timer_list fault_stall_timer; + + /** + * enable_stall_on_submit: + * + * Whether to re-enable stall-on-fault on the next submit. + */ + bool enable_stall_on_submit; + + /** + * stall_enabled: + * + * Whether stall-on-fault is currently enabled. 
+ */ + bool stall_enabled; + + struct { /** * @rgb565_predicator: Unknown, introduced with A650 family, @@ -629,6 +648,8 @@ int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags, struct adreno_smmu_fault_info *info, const char *block, u32 scratch[4]); +void adreno_gpu_enable_iommu_stall(struct adreno_gpu *gpu); + int adreno_read_speedbin(struct device *dev, u32 *speedbin); /* diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 2a94e82316f95c5f9dcc37ef0a4664a29e3492b2..8d5380e6dcc217c7c209b51527bf15748b3ada71 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -351,6 +351,14 @@ static void msm_iommu_resume_translation(struct msm_mmu *mmu) adreno_smmu->resume_translation(adreno_smmu->cookie, true); } +static void msm_iommu_set_stall(struct msm_mmu *mmu, bool enable) +{ + struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(mmu->dev); + + if (adreno_smmu->set_stall) + adreno_smmu->set_stall(adreno_smmu->cookie, enable); +} + static void msm_iommu_detach(struct msm_mmu *mmu) { struct msm_iommu *iommu = to_msm_iommu(mmu); @@ -399,6 +407,7 @@ static const struct msm_mmu_funcs funcs = { .unmap = msm_iommu_unmap, .destroy = msm_iommu_destroy, .resume_translation = msm_iommu_resume_translation, + .set_stall = msm_iommu_set_stall, }; struct msm_mmu *msm_iommu_new(struct device *dev, unsigned long quirks) diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h index 88af4f490881f2a6789ae2d03e1c02d10046331a..2694a356a17904e7572b767b16ed0cee806406cf 100644 --- a/drivers/gpu/drm/msm/msm_mmu.h +++ b/drivers/gpu/drm/msm/msm_mmu.h @@ -16,6 +16,7 @@ struct msm_mmu_funcs { int (*unmap)(struct msm_mmu *mmu, uint64_t iova, size_t len); void (*destroy)(struct msm_mmu *mmu); void (*resume_translation)(struct msm_mmu *mmu); + void (*set_stall)(struct msm_mmu *mmu, bool enable); }; enum msm_mmu_type {