Message ID | 1663854664-76165-1-git-send-email-john.garry@huawei.com |
---|---|
State | New |
Headers | show |
Series | scsi: pm8001: Fix running_req for internal abort commands | expand |
On Thu, Sep 22, 2022 at 3:57 PM John Garry <john.garry@huawei.com> wrote: > > Disabling the remote phy for a SATA disk causes a hang: > > root@(none)$ more /sys/class/sas_phy/phy-0:0:8/target_port_protocols > sata > root@(none)$ echo 0 > sys/class/sas_phy/phy-0:0:8/enable > root@(none)$ [ 67.855950] sas: ex 500e004aaaaaaa1f phy08 change count has changed > [ 67.920585] sd 0:0:2:0: [sdc] Synchronizing SCSI cache > [ 67.925780] sd 0:0:2:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK > [ 67.935094] sd 0:0:2:0: [sdc] Stopping disk > [ 67.939305] sd 0:0:2:0: [sdc] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK > ... > [ 123.998998] INFO: task kworker/u192:1:642 blocked for more than 30 seconds. > [ 124.005960] Not tainted 6.0.0-rc1-205202-gf26f8f761e83 #218 > [ 124.012049] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 124.019872] task:kworker/u192:1 state:D stack:0 pid: 642 ppid: 2 flags:0x00000008 > [ 124.028223] Workqueue: 0000:04:00.0_event_q sas_port_event_worker > [ 124.034319] Call trace: > [ 124.036758] __switch_to+0x128/0x278 > [ 124.040333] __schedule+0x434/0xa58 > [ 124.043820] schedule+0x94/0x138 > [ 124.047045] schedule_timeout+0x2fc/0x368 > [ 124.051052] wait_for_completion+0xdc/0x200 > [ 124.055234] __flush_workqueue+0x1a8/0x708 > [ 124.059328] sas_porte_broadcast_rcvd+0xa8/0xc0 > [ 124.063858] sas_port_event_worker+0x60/0x98 > [ 124.068126] process_one_work+0x3f8/0x660 > [ 124.072134] worker_thread+0x70/0x700 > [ 124.075793] kthread+0x1a4/0x1b8 > [ 124.079014] ret_from_fork+0x10/0x20 > > The issue is that the per-device running_req read in > pm8001_dev_gone_notify() never goes to zero and we never make progress. > This is caused by missing accounting for running_req for when an internal > abort command completes. > > In commit 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support") > we started to send internal abort commands as a proper sas_task. In this > when we deliver a sas_task to HW the per-device running_req is incremented > in pm8001_queue_command(). However it is never decremented for internal > abort commnds, so decrement in pm8001_mpi_task_abort_resp(). > > Fixes: 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support") > Signed-off-by: John Garry <john.garry@huawei.com> lgtm Acked-by: Jack Wang <jinpu.wang@ionos.com> > > diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c > index 91d78d0a38fe..628b08ba6770 100644 > --- a/drivers/scsi/pm8001/pm8001_hwi.c > +++ b/drivers/scsi/pm8001/pm8001_hwi.c > @@ -3612,6 +3612,10 @@ int pm8001_mpi_task_abort_resp(struct pm8001_hba_info *pm8001_ha, void *piomb) > pm8001_dbg(pm8001_ha, FAIL, " TASK NULL. RETURNING !!!\n"); > return -1; > } > + > + if (t->task_proto == SAS_PROTOCOL_INTERNAL_ABORT) > + atomic_dec(&pm8001_dev->running_req); > + > ts = &t->task_status; > if (status != 0) > pm8001_dbg(pm8001_ha, FAIL, "task abort failed status 0x%x ,tag = 0x%x, scp= 0x%x\n", > -- > 2.35.3 >
John,
> Disabling the remote phy for a SATA disk causes a hang:
Applied to 6.1/scsi-staging, thanks!
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index 91d78d0a38fe..628b08ba6770 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -3612,6 +3612,10 @@ int pm8001_mpi_task_abort_resp(struct pm8001_hba_info *pm8001_ha, void *piomb) pm8001_dbg(pm8001_ha, FAIL, " TASK NULL. RETURNING !!!\n"); return -1; } + + if (t->task_proto == SAS_PROTOCOL_INTERNAL_ABORT) + atomic_dec(&pm8001_dev->running_req); + ts = &t->task_status; if (status != 0) pm8001_dbg(pm8001_ha, FAIL, "task abort failed status 0x%x ,tag = 0x%x, scp= 0x%x\n",
Disabling the remote phy for a SATA disk causes a hang: root@(none)$ more /sys/class/sas_phy/phy-0:0:8/target_port_protocols sata root@(none)$ echo 0 > sys/class/sas_phy/phy-0:0:8/enable root@(none)$ [ 67.855950] sas: ex 500e004aaaaaaa1f phy08 change count has changed [ 67.920585] sd 0:0:2:0: [sdc] Synchronizing SCSI cache [ 67.925780] sd 0:0:2:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK [ 67.935094] sd 0:0:2:0: [sdc] Stopping disk [ 67.939305] sd 0:0:2:0: [sdc] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK ... [ 123.998998] INFO: task kworker/u192:1:642 blocked for more than 30 seconds. [ 124.005960] Not tainted 6.0.0-rc1-205202-gf26f8f761e83 #218 [ 124.012049] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 124.019872] task:kworker/u192:1 state:D stack:0 pid: 642 ppid: 2 flags:0x00000008 [ 124.028223] Workqueue: 0000:04:00.0_event_q sas_port_event_worker [ 124.034319] Call trace: [ 124.036758] __switch_to+0x128/0x278 [ 124.040333] __schedule+0x434/0xa58 [ 124.043820] schedule+0x94/0x138 [ 124.047045] schedule_timeout+0x2fc/0x368 [ 124.051052] wait_for_completion+0xdc/0x200 [ 124.055234] __flush_workqueue+0x1a8/0x708 [ 124.059328] sas_porte_broadcast_rcvd+0xa8/0xc0 [ 124.063858] sas_port_event_worker+0x60/0x98 [ 124.068126] process_one_work+0x3f8/0x660 [ 124.072134] worker_thread+0x70/0x700 [ 124.075793] kthread+0x1a4/0x1b8 [ 124.079014] ret_from_fork+0x10/0x20 The issue is that the per-device running_req read in pm8001_dev_gone_notify() never goes to zero and we never make progress. This is caused by missing accounting for running_req for when an internal abort command completes. In commit 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support") we started to send internal abort commands as a proper sas_task. In this when we deliver a sas_task to HW the per-device running_req is incremented in pm8001_queue_command(). However it is never decremented for internal abort commnds, so decrement in pm8001_mpi_task_abort_resp(). Fixes: 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support") Signed-off-by: John Garry <john.garry@huawei.com>