Message ID | d625deae-a958-0ace-2ba3-0888dd0a415b@ddn.com |
---|---|
State | New |
Series | scsi: mpt3sas: page fault in reply q processing |
On Tue, Mar 8, 2022 at 8:57 PM Matt Lupfer <mlupfer@ddn.com> wrote:
>
> We encountered a page fault in mpt3sas on a LUN reset error path:
>
> [...]
>
> Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
> Cc: stable@vger.kernel.org
> Fixes: 711a923c14d9 ("scsi: mpt3sas: Postprocessing of target and LUN reset")

Thanks for the patch.

Ack-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
On Tue, 8 Mar 2022 15:27:02 +0000, Matt Lupfer wrote:
> We encountered a page fault in mpt3sas on a LUN reset error path:
>
> [...]

Applied to 5.17/scsi-fixes, thanks!

[1/1] scsi: mpt3sas: page fault in reply q processing
      https://git.kernel.org/mkp/scsi/c/69ad4ef868c1
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 511726f92d9a..76229b839560 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2011,9 +2011,10 @@ mpt3sas_base_sync_reply_irqs(struct MPT3SAS_ADAPTER *ioc, u8 poll)
 				enable_irq(reply_q->os_irq);
 			}
 		}
+
+		if (poll)
+			_base_process_reply_queue(reply_q);
 	}
-	if (poll)
-		_base_process_reply_queue(reply_q);
 }
 
 /**
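To see why the old placement faults, here is a minimal userspace sketch. The list helpers are simplified re-implementations of the `<linux/list.h>` macros (an assumption for illustration, not the kernel headers; `__typeof__` needs GCC or Clang), and `struct reply_queue`, `process_reply_queue()`, `build_queues()`, `sync_reply_queues()` and `cursor_after_full_walk()` are hypothetical names that only mimic the shape of mpt3sas_base_sync_reply_irqs(). It shows the fixed pattern, where each queue is processed inside the loop, and that a completed traversal leaves the cursor aliasing the list head rather than any real queue.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for a few <linux/list.h> helpers (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the kernel macro: the loop stops when the cursor's embedded
 * list_head is the head itself, so the final cursor is container_of(head),
 * which is not a real entry. */
#define list_for_each_entry(pos, head, member)                          \
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);                                     \
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical, heavily reduced reply queue (not struct adapter_reply_queue). */
struct reply_queue {
	int msix_index;
	int processed;
	struct list_head list;
};

/* Hypothetical stand-in for _base_process_reply_queue(). */
static void process_reply_queue(struct reply_queue *q)
{
	assert(q->msix_index != 0); /* the caller skips queue 0 */
	q->processed++;
}

/* Builds a list of n queues with msix_index 0..n-1. */
static void build_queues(struct list_head *head, struct reply_queue *qs, int n)
{
	list_init(head);
	for (int i = 0; i < n; i++) {
		qs[i].msix_index = i;
		qs[i].processed = 0;
		list_add_tail(&qs[i].list, head);
	}
}

/* Fixed shape: the cursor is only dereferenced inside the loop body.
 * Returns how many queues were polled. */
static int sync_reply_queues(struct list_head *head, int poll)
{
	struct reply_queue *q;
	int polled = 0;

	list_for_each_entry(q, head, list) {
		if (q->msix_index == 0) /* assumed reserved, skipped like TM queue */
			continue;
		if (poll) {
			process_reply_queue(q);
			polled++;
		}
	}
	/* Past this point q must not be dereferenced (the pre-fix bug). */
	return polled;
}

/* Returns the cursor value that a full traversal leaves behind. */
static struct reply_queue *cursor_after_full_walk(struct list_head *head)
{
	struct reply_queue *q;

	list_for_each_entry(q, head, list)
		;
	return q;
}
```

For example, with four queues built by build_queues(), sync_reply_queues(&head, 1) polls three of them (queue 0 is skipped), and `&cursor_after_full_walk(&head)->list == &head` holds. That aliased cursor corresponds to the bogus reply_q the pre-fix code passed to _base_process_reply_queue() after a full traversal.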
We encountered a page fault in mpt3sas on a LUN reset error path:

[ 145.763216] mpt3sas_cm1: Task abort tm failed: handle(0x0002),timeout(30) tr_method(0x0) smid(3) msix_index(0)
[ 145.778932] scsi 1:0:0:0: task abort: FAILED scmd(0x0000000024ba29a2)
[ 145.817307] scsi 1:0:0:0: attempting device reset! scmd(0x0000000024ba29a2)
[ 145.827253] scsi 1:0:0:0: [sg1] tag#2 CDB: Receive Diagnostic 1c 01 01 ff fc 00
[ 145.837617] scsi target1:0:0: handle(0x0002), sas_address(0x500605b0000272b9), phy(0)
[ 145.848598] scsi target1:0:0: enclosure logical id(0x500605b0000272b8), slot(0)
[ 149.858378] mpt3sas_cm1: Poll ReplyDescriptor queues for completion of smid(0), task_type(0x05), handle(0x0002)
[ 149.875202] BUG: unable to handle page fault for address: 00000007fffc445d
[ 149.885617] #PF: supervisor read access in kernel mode
[ 149.894346] #PF: error_code(0x0000) - not-present page
[ 149.903123] PGD 0 P4D 0
[ 149.909387] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 149.917417] CPU: 24 PID: 3512 Comm: scsi_eh_1 Kdump: loaded Tainted: G S O 5.10.89-altav-1 #1
[ 149.934327] Hardware name: DDN 200NVX2 /200NVX2-MB , BIOS ATHG2.2.02.01 09/10/2021
[ 149.951871] RIP: 0010:_base_process_reply_queue+0x4b/0x900 [mpt3sas]
[ 149.961889] Code: 0f 84 22 02 00 00 8d 48 01 49 89 fd 48 8d 57 38 f0 0f b1 4f 38 0f 85 d8 01 00 00 49 8b 45 10 45 31 e4 41 8b 55 0c 48 8d 1c d0 <0f> b6 03 83 e0 0f 3c 0f 0f 85 a2 00 00 00 e9 e6 01 00 00 0f b7 ee
[ 149.991952] RSP: 0018:ffffc9000f1ebcb8 EFLAGS: 00010246
[ 150.000937] RAX: 0000000000000055 RBX: 00000007fffc445d RCX: 000000002548f071
[ 150.011841] RDX: 00000000ffff8881 RSI: 0000000000000001 RDI: ffff888125ed50d8
[ 150.022670] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffff7fff
[ 150.033445] R10: ffffc9000f1ebb68 R11: ffffc9000f1ebb60 R12: 0000000000000000
[ 150.044204] R13: ffff888125ed50d8 R14: 0000000000000080 R15: 34cdc00034cdea80
[ 150.054963] FS: 0000000000000000(0000) GS:ffff88dfaf200000(0000) knlGS:0000000000000000
[ 150.066715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 150.076078] CR2: 00000007fffc445d CR3: 000000012448a006 CR4: 0000000000770ee0
[ 150.086887] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 150.097670] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 150.108323] PKRU: 55555554
[ 150.114690] Call Trace:
[ 150.120497] ? printk+0x48/0x4a
[ 150.127049] mpt3sas_scsih_issue_tm.cold.114+0x2e/0x2b3 [mpt3sas]
[ 150.136453] mpt3sas_scsih_issue_locked_tm+0x86/0xb0 [mpt3sas]
[ 150.145759] scsih_dev_reset+0xea/0x300 [mpt3sas]
[ 150.153891] scsi_eh_ready_devs+0x541/0x9e0 [scsi_mod]
[ 150.162206] ? __scsi_host_match+0x20/0x20 [scsi_mod]
[ 150.170406] ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[ 150.178925] ? blk_mq_tagset_busy_iter+0x45/0x60
[ 150.186638] ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[ 150.195087] scsi_error_handler+0x3a5/0x4a0 [scsi_mod]
[ 150.203206] ? __schedule+0x1e9/0x610
[ 150.209783] ? scsi_eh_get_sense+0x210/0x210 [scsi_mod]
[ 150.217924] kthread+0x12e/0x150
[ 150.224041] ? kthread_worker_fn+0x130/0x130
[ 150.231206] ret_from_fork+0x1f/0x30

This is caused by mpt3sas_base_sync_reply_irqs() using the reply_q
cursor after the list_for_each_entry() loop has finished. Once the loop
has traversed the whole list, reply_q no longer points to a reply queue:
list_for_each_entry() leaves it at the container_of() of the list head
itself, so _base_process_reply_queue() reads fields from memory that is
not a reply queue.

Move the _base_process_reply_queue() call inside of the loop.

Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
Cc: stable@vger.kernel.org
Fixes: 711a923c14d9 ("scsi: mpt3sas: Postprocessing of target and LUN reset")
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
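The register dump in the oops is consistent with this explanation. Decoding the Code: bytes before the <0f> marker by hand (our own reading, not part of the report), the faulting instruction is a byte load from RBX, which is computed just before by `lea (%rax,%rdx,8),%rbx`, i.e. base + index * 8, with the base and the 32-bit index loaded from offsets 0x10 and 0xc of the reply_q pointer held in R13. The small check below uses hypothetical names (`OOPS_RAX`, `OOPS_RDX`, `OOPS_CR2`, `REPLY_DESC_SIZE`, `reply_desc_addr()`) and assumes the 8-byte stride is the size of one reply descriptor; it confirms that the logged RAX and RDX reproduce CR2 exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied verbatim from the oops. */
#define OOPS_RAX UINT64_C(0x0000000000000055) /* base, loaded from reply_q + 0x10 */
#define OOPS_RDX UINT64_C(0x00000000ffff8881) /* index, loaded from reply_q + 0xc */
#define OOPS_CR2 UINT64_C(0x00000007fffc445d) /* faulting address */

/* Assumed size of one reply descriptor, matching the lea scale factor. */
#define REPLY_DESC_SIZE UINT64_C(8)

/* Hypothetical helper mirroring "lea (%rax,%rdx,8),%rbx". The index is
 * 32 bits wide and zero-extended, as "mov 0xc(%r13),%edx" does on x86-64. */
static uint64_t reply_desc_addr(uint64_t base, uint32_t index)
{
	return base + (uint64_t)index * REPLY_DESC_SIZE;
}
```

Here 0xffff8881 * 8 = 0x7fffc4408, and adding 0x55 gives 0x7fffc445d, the CR2 value. The "index" 0xffff8881 also equals the upper 32 bits of R13 (ffff888125ed50d8), which plausibly means it is half of a kernel pointer rather than a real host index, as expected when the fields are read from memory that is not a reply queue.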