scsi: mpt3sas: page fault in reply q processing

Message ID d625deae-a958-0ace-2ba3-0888dd0a415b@ddn.com
State New

Commit Message

Matt Lupfer March 8, 2022, 3:27 p.m. UTC
We encountered a page fault in mpt3sas on a LUN reset error path:

[  145.763216] mpt3sas_cm1: Task abort tm failed: handle(0x0002),timeout(30) tr_method(0x0) smid(3) msix_index(0)
[  145.778932] scsi 1:0:0:0: task abort: FAILED scmd(0x0000000024ba29a2)
[  145.817307] scsi 1:0:0:0: attempting device reset! scmd(0x0000000024ba29a2)
[  145.827253] scsi 1:0:0:0: [sg1] tag#2 CDB: Receive Diagnostic 1c 01 01 ff fc 00
[  145.837617] scsi target1:0:0: handle(0x0002), sas_address(0x500605b0000272b9), phy(0)
[  145.848598] scsi target1:0:0: enclosure logical id(0x500605b0000272b8), slot(0)
[  149.858378] mpt3sas_cm1: Poll ReplyDescriptor queues for completion of smid(0), task_type(0x05), handle(0x0002)
[  149.875202] BUG: unable to handle page fault for address: 00000007fffc445d
[  149.885617] #PF: supervisor read access in kernel mode
[  149.894346] #PF: error_code(0x0000) - not-present page
[  149.903123] PGD 0 P4D 0
[  149.909387] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  149.917417] CPU: 24 PID: 3512 Comm: scsi_eh_1 Kdump: loaded Tainted: G S         O      5.10.89-altav-1 #1
[  149.934327] Hardware name: DDN           200NVX2             /200NVX2-MB          , BIOS ATHG2.2.02.01 09/10/2021
[  149.951871] RIP: 0010:_base_process_reply_queue+0x4b/0x900 [mpt3sas]
[  149.961889] Code: 0f 84 22 02 00 00 8d 48 01 49 89 fd 48 8d 57 38 f0 0f b1 4f 38 0f 85 d8 01 00 00 49 8b 45 10 45 31 e4 41 8b 55 0c 48 8d 1c d0 <0f> b6 03 83 e0 0f 3c 0f 0f 85 a2 00 00 00 e9 e6 01 00 00 0f b7 ee
[  149.991952] RSP: 0018:ffffc9000f1ebcb8 EFLAGS: 00010246
[  150.000937] RAX: 0000000000000055 RBX: 00000007fffc445d RCX: 000000002548f071
[  150.011841] RDX: 00000000ffff8881 RSI: 0000000000000001 RDI: ffff888125ed50d8
[  150.022670] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffff7fff
[  150.033445] R10: ffffc9000f1ebb68 R11: ffffc9000f1ebb60 R12: 0000000000000000
[  150.044204] R13: ffff888125ed50d8 R14: 0000000000000080 R15: 34cdc00034cdea80
[  150.054963] FS:  0000000000000000(0000) GS:ffff88dfaf200000(0000) knlGS:0000000000000000
[  150.066715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  150.076078] CR2: 00000007fffc445d CR3: 000000012448a006 CR4: 0000000000770ee0
[  150.086887] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  150.097670] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  150.108323] PKRU: 55555554
[  150.114690] Call Trace:
[  150.120497]  ? printk+0x48/0x4a
[  150.127049]  mpt3sas_scsih_issue_tm.cold.114+0x2e/0x2b3 [mpt3sas]
[  150.136453]  mpt3sas_scsih_issue_locked_tm+0x86/0xb0 [mpt3sas]
[  150.145759]  scsih_dev_reset+0xea/0x300 [mpt3sas]
[  150.153891]  scsi_eh_ready_devs+0x541/0x9e0 [scsi_mod]
[  150.162206]  ? __scsi_host_match+0x20/0x20 [scsi_mod]
[  150.170406]  ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[  150.178925]  ? blk_mq_tagset_busy_iter+0x45/0x60
[  150.186638]  ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[  150.195087]  scsi_error_handler+0x3a5/0x4a0 [scsi_mod]
[  150.203206]  ? __schedule+0x1e9/0x610
[  150.209783]  ? scsi_eh_get_sense+0x210/0x210 [scsi_mod]
[  150.217924]  kthread+0x12e/0x150
[  150.224041]  ? kthread_worker_fn+0x130/0x130
[  150.231206]  ret_from_fork+0x1f/0x30

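This is caused by mpt3sas_base_sync_reply_irqs() dereferencing the reply_q
cursor after the list_for_each_entry() loop has finished. Once the full
list has been traversed, the cursor no longer points at a list entry: it
is container_of() applied to the list head, so it must not be used.

Move the _base_process_reply_queue() call inside of the loop so that each
reply queue is polled through a valid pointer.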

Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
Cc: stable@vger.kernel.org
Fixes: 711a923c14d9 ("scsi: mpt3sas: Postprocessing of target and LUN reset")
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Sreekanth Reddy March 11, 2022, 1:49 p.m. UTC | #1
On Tue, Mar 8, 2022 at 8:57 PM Matt Lupfer <mlupfer@ddn.com> wrote:
>
> We encountered a page fault in mpt3sas on a LUN reset error path:
>
> [...]
>
> This is caused by mpt3sas_base_sync_reply_irqs() using an invalid reply_q
> pointer outside of the list_for_each_entry() loop. At the end of the
> full list traversal the pointer is invalid.
>
> Move the _base_process_reply_queue() call inside of the loop.
>
> Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
> Cc: stable@vger.kernel.org
> Fixes: 711a923c14d9 ("scsi: mpt3sas: Postprocessing of target and LUN reset")

Thanks for the patch.
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>

> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
> index 511726f92d9a..76229b839560 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -2011,9 +2011,10 @@ mpt3sas_base_sync_reply_irqs(struct MPT3SAS_ADAPTER *ioc, u8 poll)
>                                 enable_irq(reply_q->os_irq);
>                         }
>                 }
> +
> +               if (poll)
> +                       _base_process_reply_queue(reply_q);
>         }
> -       if (poll)
> -               _base_process_reply_queue(reply_q);
>  }
>
>  /**
> --
> 2.35.1
>
Martin K. Petersen March 15, 2022, 5:01 a.m. UTC | #2
On Tue, 8 Mar 2022 15:27:02 +0000, Matt Lupfer wrote:

> We encountered a page fault in mpt3sas on a LUN reset error path:
> 
> [...]

Applied to 5.17/scsi-fixes, thanks!

[1/1] scsi: mpt3sas: page fault in reply q processing
      https://git.kernel.org/mkp/scsi/c/69ad4ef868c1

Patch

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 511726f92d9a..76229b839560 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2011,9 +2011,10 @@  mpt3sas_base_sync_reply_irqs(struct MPT3SAS_ADAPTER *ioc, u8 poll)
 				enable_irq(reply_q->os_irq);
 			}
 		}
+
+		if (poll)
+			_base_process_reply_queue(reply_q);
 	}
-	if (poll)
-		_base_process_reply_queue(reply_q);
 }
 
 /**