diff mbox series

[1/2] scsi: st: Remove use of device->was_reset

Message ID 20241115162003.3908-2-Kai.Makisara@kolumbus.fi
State New
Headers show
Series scsi: st: More reset patches | expand

Commit Message

Kai Mäkisara (Kolumbus) Nov. 15, 2024, 4:20 p.m. UTC
Don't use the was_reset flag set by scsi error handling. It is enough to
recognize device resets based in the UNIT ATTENTION sense data. (The
retry counts are zero and either REQ_OP_DRV_OUT or REC_OP_DRV_IN are
used to prevent retries by midlevel.)

Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
---
 drivers/scsi/st.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Kai Mäkisara (Kolumbus) Nov. 15, 2024, 5:45 p.m. UTC | #1
> On 15. Nov 2024, at 18.20, Kai Mäkisara <Kai.Makisara@kolumbus.fi> wrote:
> 
> Don't use the was_reset flag set by scsi error handling. It is enough to
> recognize device resets based in the UNIT ATTENTION sense data. (The
> retry counts are zero and either REQ_OP_DRV_OUT or REC_OP_DRV_IN are
> used to prevent retries by midlevel.)
> 

Please hold this patch until the problem below has been solved.


The following came into my mind when I looked at John's debugging data he
sent to linux-scsi today.

I still think that UNIT ATTENTIONs (UAs) reach the high level device without
problems. The problem is that the device attached to the target first issuing
a SCSI command after reset gets the UA. As long as this is st device,
there are no problems. But, if it is the sg device attached to the same target,
the tape device misses it.

The device->was_reset flag stays set in many (most?) cases forever, unless
someone resets it. (Scsi_error resets it only if the associated kthread ends
up locking the device door.) Because the flag stays set, the st device can
notice it even if the sg device gets the UA.

Using a flag set by an other layer is ugly (to put it mildly). Even uglier is that
st clears the flag when it has noticed that it is set.

And there are cases where the device reset does not originate from the
same computer.

Does anyone have any suggestions?
Kai Mäkisara (Kolumbus) Nov. 16, 2024, 3:35 p.m. UTC | #2
> On 15. Nov 2024, at 19.45, Kai Mäkisara (Kolumbus) <kai.makisara@kolumbus.fi> wrote:
> 
...
> 
> I still think that UNIT ATTENTIONs (UAs) reach the high level device without
> problems. The problem is that the device attached to the target first issuing
> a SCSI command after reset gets the UA. As long as this is st device,
> there are no problems. But, if it is the sg device attached to the same target,
> the tape device misses it.
> 
...
> 
> And there are cases where the device reset does not originate from the
> same computer.
> 
> Does anyone have any suggestions?

Besides Power on/Reset, the same problem applies to the New Media case.

One solution might be the following: the midlevel maintains counters for
the Power on/Reset and the Media change UNIT ATTENTIONs. The ULDs
can read these counters (using wrappers). If the ULD find for a device that
the counter value has changed, then the event corresponding to the counter
has occurred. The problem of clearing event flags is avoided,

A drawback comes from the counter wrap-arounds. If, e.g., four-bit counters
are used and there are 16 Power on/Resets between the checks by the ULD,
the event is missed when the counter is used. (The ULDs should also check
the sense data they do receive. It is possible/probable that the event is
recognized based on the sense data even if the counter check misses it.)

This solution is easy to implement. I have a test implementation running and
it seems to be working.


Other solutions that have come into my mind are much more complicated 
than the counters. Here are examples:
- the UNIT ATTENTIONs would be sent to all ULDs attached to the device
  when they issue the next SCSI command
- the ULDs would have possibility to register a callback for UNIT ATTENTIONs
diff mbox series

Patch

diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index e8ef27d7ef61..3acaa3561c81 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -163,9 +163,11 @@  static const char *st_formats[] = {
 
 static int debugging = DEBUG;
 
+/* Setting these non-zero may risk recognizing resets */
 #define MAX_RETRIES 0
 #define MAX_WRITE_RETRIES 0
 #define MAX_READY_RETRIES 0
+
 #define NO_TAPE  NOT_READY
 
 #define ST_TIMEOUT (900 * HZ)
@@ -413,10 +415,11 @@  static int st_chk_result(struct scsi_tape *STp, struct st_request * SRpnt)
 	if (cmdstatp->have_sense &&
 	    cmdstatp->sense_hdr.asc == 0 && cmdstatp->sense_hdr.ascq == 0x17)
 		STp->cleaning_req = 1; /* ASC and ASCQ => cleaning requested */
-	if (cmdstatp->have_sense && scode == UNIT_ATTENTION && cmdstatp->sense_hdr.asc == 0x29)
+	if (cmdstatp->have_sense && scode == UNIT_ATTENTION &&
+		cmdstatp->sense_hdr.asc == 0x29) {
 		STp->pos_unknown = 1; /* ASC => power on / reset */
-
-	STp->pos_unknown |= STp->device->was_reset;
+		st_printk(KERN_WARNING, STp, "Power on/reset recognized.");
+	}
 
 	if (cmdstatp->have_sense &&
 	    scode == RECOVERED_ERROR
@@ -3631,9 +3634,7 @@  static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg)
 				retval = (-EIO);
 				goto out;
 			}
-			reset_state(STp);
-			/* remove this when the midlevel properly clears was_reset */
-			STp->device->was_reset = 0;
+			reset_state(STp); /* Clears pos_unknown */
 		}
 
 		if (mtc.mt_op != MTNOP && mtc.mt_op != MTSETBLK &&