Message ID | 20241115162003.3908-2-Kai.Makisara@kolumbus.fi |
---|---|
State | New |
Headers | show |
Series | scsi: st: More reset patches | expand |
> On 15. Nov 2024, at 18.20, Kai Mäkisara <Kai.Makisara@kolumbus.fi> wrote: > > Don't use the was_reset flag set by scsi error handling. It is enough to > recognize device resets based in the UNIT ATTENTION sense data. (The > retry counts are zero and either REQ_OP_DRV_OUT or REC_OP_DRV_IN are > used to prevent retries by midlevel.) > Please hold this patch until the problem below has been solved. The following came into my mind when I looked at John's debugging data he sent to linux-scsi today. I still think that UNIT ATTENTIONs (UAs) reach the high level device without problems. The problem is that the device attached to the target first issuing a SCSI command after reset gets the UA. As long as this is st device, there are no problems. But, if it is the sg device attached to the same target, the tape device misses it. The device->was_reset flag stays set in many (most?) cases forever, unless someone resets it. (Scsi_error resets it only if the associated kthread ends up locking the device door.) Because the flag stays set, the st device can notice it even if the sg device gets the UA. Using a flag set by an other layer is ugly (to put it mildly). Even uglier is that st clears the flag when it has noticed that it is set. And there are cases where the device reset does not originate from the same computer. Does anyone have any suggestions?
> On 15. Nov 2024, at 19.45, Kai Mäkisara (Kolumbus) <kai.makisara@kolumbus.fi> wrote: > ... > > I still think that UNIT ATTENTIONs (UAs) reach the high level device without > problems. The problem is that the device attached to the target first issuing > a SCSI command after reset gets the UA. As long as this is st device, > there are no problems. But, if it is the sg device attached to the same target, > the tape device misses it. > ... > > And there are cases where the device reset does not originate from the > same computer. > > Does anyone have any suggestions? Besides Power on/Reset, the same problem applies to the New Media case. One solution might be the following: the midlevel maintains counters for the Power on/Reset and the Media change UNIT ATTENTIONs. The ULDs can read these counters (using wrappers). If the ULD find for a device that the counter value has changed, then the event corresponding to the counter has occurred. The problem of clearing event flags is avoided, A drawback comes from the counter wrap-arounds. If, e.g., four-bit counters are used and there are 16 Power on/Resets between the checks by the ULD, the event is missed when the counter is used. (The ULDs should also check the sense data they do receive. It is possible/probable that the event is recognized based on the sense data even if the counter check misses it.) This solution is easy to implement. I have a test implementation running and it seems to be working. Other solutions that have come into my mind are much more complicated than the counters. Here are examples: - the UNIT ATTENTIONs would be sent to all ULDs attached to the device when they issue the next SCSI command - the ULDs would have possibility to register a callback for UNIT ATTENTIONs
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c index e8ef27d7ef61..3acaa3561c81 100644 --- a/drivers/scsi/st.c +++ b/drivers/scsi/st.c @@ -163,9 +163,11 @@ static const char *st_formats[] = { static int debugging = DEBUG; +/* Setting these non-zero may risk recognizing resets */ #define MAX_RETRIES 0 #define MAX_WRITE_RETRIES 0 #define MAX_READY_RETRIES 0 + #define NO_TAPE NOT_READY #define ST_TIMEOUT (900 * HZ) @@ -413,10 +415,11 @@ static int st_chk_result(struct scsi_tape *STp, struct st_request * SRpnt) if (cmdstatp->have_sense && cmdstatp->sense_hdr.asc == 0 && cmdstatp->sense_hdr.ascq == 0x17) STp->cleaning_req = 1; /* ASC and ASCQ => cleaning requested */ - if (cmdstatp->have_sense && scode == UNIT_ATTENTION && cmdstatp->sense_hdr.asc == 0x29) + if (cmdstatp->have_sense && scode == UNIT_ATTENTION && + cmdstatp->sense_hdr.asc == 0x29) { STp->pos_unknown = 1; /* ASC => power on / reset */ - - STp->pos_unknown |= STp->device->was_reset; + st_printk(KERN_WARNING, STp, "Power on/reset recognized."); + } if (cmdstatp->have_sense && scode == RECOVERED_ERROR @@ -3631,9 +3634,7 @@ static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg) retval = (-EIO); goto out; } - reset_state(STp); - /* remove this when the midlevel properly clears was_reset */ - STp->device->was_reset = 0; + reset_state(STp); /* Clears pos_unknown */ } if (mtc.mt_op != MTNOP && mtc.mt_op != MTSETBLK &&
Don't use the was_reset flag set by scsi error handling. It is enough to recognize device resets based in the UNIT ATTENTION sense data. (The retry counts are zero and either REQ_OP_DRV_OUT or REC_OP_DRV_IN are used to prevent retries by midlevel.) Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi> --- drivers/scsi/st.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)