Message ID | 20230505204950.21645-1-brian@purestorage.com |
---|---|
State | New |
Headers | show |
Series | scsi: sd: Avoid sending an INQUIRY if the page is not supported | expand |
> On May 8, 2023, at 5:26 AM, Martin K. Petersen <martin.petersen@oracle.com> wrote: > > > Brian, > >> When SCSI devices are discovered the function sd_read_cpr gets called. >> This call results in an INQUIRY to page 0xb9. This VPD page is called >> regardless of whether the target has advertised this page as >> supported. > > We used to check the page list first. However, we found several devices > that we did not discover correctly because the pages, while present, > were not advertised in page 0 and thus ignored. So not checking the > supported pages first is a feature. Wouldn’t it be better if those targets fixed that problem? Do they report nothing when VPD page 0 is requested or an incomplete list that doesn’t include this 0xb9? > > What exactly is the problem you are experiencing wrt. your target > getting an inquiry for the CPR page? What happens for us is that our support and customers are used to looking for errors when they see a problem as you would expect. When we are logging VPD page 0xb9 not supported for every device every time a Linux host reboots it leads to that being incorrectly attributed to whatever actual problem is going on. There is nothing more we can do since we do return the list of page codes supported and this 0xb9 is not in the list. Thanks, Brian > > -- > Martin K. Petersen Oracle Linux Engineering
> On May 8, 2023, at 3:09 AM, Benjamin Block <bblock@linux.ibm.com> wrote: > > On Fri, May 05, 2023 at 01:49:50PM -0700, Brian Bunker wrote: >> When SCSI devices are discovered the function sd_read_cpr gets called. >> This call results in an INQUIRY to page 0xb9. This VPD page is called >> regardless of whether the target has advertised this page as supported. >> >> Instead of just sending this INQUIRY page, first check to see if that >> page is in the supported pages. This will avoid sending requests to >> targets which do not support the page. The error is unexpected on the >> target and leads to questions. I am not sure what percentage of SCSI >> devices support this page, but this will eliminate at least one >> request to the target in the discovery phase for all that do not. The >> function added could also have potential users besides this specific >> one. >> >> Signed-off-by: Brian Bunker <brian@purestorage.com> >> Reviewed-by: Seamus Connor <sconnor@purestorage.com> >> Reviewed-by: Krishna Kant <krishna.kant@purestorage.com> >> --- >> drivers/scsi/scsi.c | 40 ++++++++++++++++++++++++++++++++++++++ >> drivers/scsi/sd.c | 4 +++- >> include/scsi/scsi_device.h | 1 + >> 3 files changed, 44 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c >> index 09ef0b31dfc0..9265b3d6a18f 100644 >> --- a/drivers/scsi/scsi.c >> +++ b/drivers/scsi/scsi.c >> @@ -356,6 +356,46 @@ static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page) >> return result; >> } >> >> +/** >> + * scsi_vpd_page_supported - Check if a VPD page is supported on a SCSI device >> + * @sdev: The device to ask >> + * @page: Check existence of this Vital Product Data page >> + * >> + * Functions which explicitly request a given VPD page >> + * should first check whether that page is among the >> + * supported VPD pages. This will avoid targets returning >> + * unnecessary errors which can cause confusion. -EINVAL is >> + * returned if the page is not supported and 0 if it is. >> + */ >> +int scsi_vpd_page_supported(struct scsi_device *sdev, u8 page) >> +{ >> + const struct scsi_vpd *vpd; >> + uint16_t page_len; > > Probably rather `u16` as per kernel-style. Sure. This can easily be done. > >> + int ret = -EINVAL; > > Been wondering, whether it would make sense to have two different error > levels here. One for the case where the page is not found in the loop > that searches within page 0, and one for when page 0 is not present when > we try to dereference the RCU protected pointer. > > That way we could have a safe fallback. If the page is there, we use its > data, if it is not, we blindly send the INQUIRY like we do today. > > Not sure whether this is a bit too paranoid.. VPD page 0 is mandatory > after all. That could be done, but the problem would still exist for the PURE target. We don’t support the page 0xb9, and we don’t advertise we do in the response to VPD 0. This approach would still lead to the INQUIRY being sent to devices who don’t support it, don’t expect it, and report an unexpected error. What I am trying to avoid is the INQUIRY being sent to devices who don’t invite it. Thanks, Brian > >> + int pos = 0; >> + >> + rcu_read_lock(); >> + vpd = rcu_dereference(sdev->vpd_pg0); >> + if (!vpd) >> + goto out; >> + >> + page_len = get_unaligned_be16(&vpd->data[2]); >> + >> + /* >> + * The first supported page starts at byte 4 in the buffer. >> + * Read from that byte to the last dictated by page_len above. >> + */ >> + for (pos = 4; pos < page_len + 4; ++pos) { >> + if (vpd->data[pos] == page) >> + ret = 0; >> + } >> + >> +out: >> + rcu_read_unlock(); >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(scsi_vpd_page_supported); >> + >> /** >> * scsi_get_vpd_page - Get Vital Product Data from a SCSI device >> * @sdev: The device to ask >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c >> index 1624d528aa1f..0304b7d60747 100644 >> --- a/drivers/scsi/sd.c >> +++ b/drivers/scsi/sd.c >> @@ -3127,7 +3127,9 @@ static void sd_read_cpr(struct scsi_disk *sdkp) >> */ >> buf_len = 64 + 256*32; >> buffer = kmalloc(buf_len, GFP_KERNEL); >> - if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len)) >> + if (!buffer || >> + scsi_vpd_page_supported(sdkp->device, 0xb9) || > > Wouldn't it make sense to do this before the allocation? If it really > turns out to be unsupported, that seems like a waste. > >> + scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len)) >> goto out; >> >> /* We must have at least a 64B header and one 32B range descriptor */ >> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h >> index f10a008e5bfa..359cd8b94312 100644 >> --- a/include/scsi/scsi_device.h >> +++ b/include/scsi/scsi_device.h >> @@ -431,6 +431,7 @@ extern int scsi_mode_select(struct scsi_device *sdev, int pf, int sp, >> struct scsi_sense_hdr *); >> extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout, >> int retries, struct scsi_sense_hdr *sshdr); >> +extern int scsi_vpd_page_supported(struct scsi_device *sdev, u8 page); >> extern int scsi_get_vpd_page(struct scsi_device *, u8 page, unsigned char *buf, >> int buf_len); >> extern int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer, > > -- > Best Regards, Benjamin Block / Linux on IBM Z Kernel Development > IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy > Vors. Aufs.-R.: Gregor Pillen / Geschäftsführung: David Faller > Sitz der Ges.: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294
Brian, > Do they report nothing when VPD page 0 is requested or an incomplete > list that doesn’t include this 0xb9? Unfortunately both scenarios are relatively common. >> What exactly is the problem you are experiencing wrt. your target >> getting an inquiry for the CPR page? > > What happens for us is that our support and customers are used to > looking for errors when they see a problem as you would expect. When > we are logging VPD page 0xb9 not supported for every device every time > a Linux host reboots I suggest you stop logging errors in that case.
On Mon, May 08, 2023 at 09:34:15AM -0700, Brian Bunker wrote: > > On May 8, 2023, at 3:09 AM, Benjamin Block <bblock@linux.ibm.com> wrote: > > On Fri, May 05, 2023 at 01:49:50PM -0700, Brian Bunker wrote: > > > >> + int ret = -EINVAL; > > > > Been wondering, whether it would make sense to have two different error > > levels here. One for the case where the page is not found in the loop > > that searches within page 0, and one for when page 0 is not present when > > we try to dereference the RCU protected pointer. > > > > That way we could have a safe fallback. If the page is there, we use its > > data, if it is not, we blindly send the INQUIRY like we do today. > > > > Not sure whether this is a bit too paranoid.. VPD page 0 is mandatory > > after all. > > That could be done, but the problem would still exist for the PURE target. > We don’t support the page 0xb9, and we don’t advertise we do in the response > to VPD 0. This approach would still lead to the INQUIRY being sent to devices I wasn't meaning to send the INQUIRY regardless to what the page says, if it is present. I was just thinking to having fall-back for when the page 0 is not there at all (initially, when you call `rcu_dereference()`). That would support your storage, as you have page 0, and it would be present for the check I assume. But anyway, it seems this is a no-go regardless. I didn't expect targets sending a valid page 0, but still supporting pages that are not listed in it. > who don’t support it, don’t expect it, and report an unexpected error. What I am > trying to avoid is the INQUIRY being sent to devices who don’t invite it.
On 2023-05-11 12:44, Benjamin Block wrote: > On Mon, May 08, 2023 at 09:34:15AM -0700, Brian Bunker wrote: >>> On May 8, 2023, at 3:09 AM, Benjamin Block <bblock@linux.ibm.com> wrote: >>> On Fri, May 05, 2023 at 01:49:50PM -0700, Brian Bunker wrote: >>> >>>> + int ret = -EINVAL; >>> >>> Been wondering, whether it would make sense to have two different error >>> levels here. One for the case where the page is not found in the loop >>> that searches within page 0, and one for when page 0 is not present when >>> we try to dereference the RCU protected pointer. >>> >>> That way we could have a safe fallback. If the page is there, we use its >>> data, if it is not, we blindly send the INQUIRY like we do today. >>> >>> Not sure whether this is a bit too paranoid.. VPD page 0 is mandatory >>> after all. >> >> That could be done, but the problem would still exist for the PURE target. >> We don’t support the page 0xb9, and we don’t advertise we do in the response >> to VPD 0. This approach would still lead to the INQUIRY being sent to devices > > I wasn't meaning to send the INQUIRY regardless to what the page says, > if it is present. I was just thinking to having fall-back for when the > page 0 is not there at all (initially, when you call > `rcu_dereference()`). That would support your storage, as you have page > 0, and it would be present for the check I assume. > > But anyway, it seems this is a no-go regardless. I didn't expect targets > sending a valid page 0, but still supporting pages that are not listed in it. SCSI standards did contain words that implied vendors would report all VPD/log/mode pages and operation codes using the mechanisms added to those standards over the years. I guess users pressured vendors to do this. Then recently, in T10 approved proposal 21-063r2, the weasel words "may or may not" were added to all those mechanisms. IMO that was a pretty disgraceful move by the storage vendors that control T10. Doug Gilbert P.S. I have a new option appearing in some of my utilities: --examine The idea is that it uses an iota function to check for unreported VPD/log/mode pages as the "report supported" device server responses "may or may not" be trustworthy :-) >> who don’t support it, don’t expect it, and report an unexpected error. What I am >> trying to avoid is the INQUIRY being sent to devices who don’t invite it. >
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 09ef0b31dfc0..9265b3d6a18f 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -356,6 +356,46 @@ static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page) return result; } +/** + * scsi_vpd_page_supported - Check if a VPD page is supported on a SCSI device + * @sdev: The device to ask + * @page: Check existence of this Vital Product Data page + * + * Functions which explicitly request a given VPD page + * should first check whether that page is among the + * supported VPD pages. This will avoid targets returning + * unnecessary errors which can cause confusion. -EINVAL is + * returned if the page is not supported and 0 if it is. + */ +int scsi_vpd_page_supported(struct scsi_device *sdev, u8 page) +{ + const struct scsi_vpd *vpd; + uint16_t page_len; + int ret = -EINVAL; + int pos = 0; + + rcu_read_lock(); + vpd = rcu_dereference(sdev->vpd_pg0); + if (!vpd) + goto out; + + page_len = get_unaligned_be16(&vpd->data[2]); + + /* + * The first supported page starts at byte 4 in the buffer. + * Read from that byte to the last dictated by page_len above. + */ + for (pos = 4; pos < page_len + 4; ++pos) { + if (vpd->data[pos] == page) + ret = 0; + } + +out: + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL_GPL(scsi_vpd_page_supported); + /** * scsi_get_vpd_page - Get Vital Product Data from a SCSI device * @sdev: The device to ask diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 1624d528aa1f..0304b7d60747 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3127,7 +3127,9 @@ static void sd_read_cpr(struct scsi_disk *sdkp) */ buf_len = 64 + 256*32; buffer = kmalloc(buf_len, GFP_KERNEL); - if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len)) + if (!buffer || + scsi_vpd_page_supported(sdkp->device, 0xb9) || + scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len)) goto out; /* We must have at least a 64B header and one 32B range descriptor */ diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index f10a008e5bfa..359cd8b94312 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -431,6 +431,7 @@ extern int scsi_mode_select(struct scsi_device *sdev, int pf, int sp, struct scsi_sense_hdr *); extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries, struct scsi_sense_hdr *sshdr); +extern int scsi_vpd_page_supported(struct scsi_device *sdev, u8 page); extern int scsi_get_vpd_page(struct scsi_device *, u8 page, unsigned char *buf, int buf_len); extern int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,