diff mbox series

Optimal I/O size of 33553920 bytes should not appear.

Message ID 15dec4ee-9b2c-66b3-35fe-333add83bdc4@latius.de
State New
Headers show
Series Optimal I/O size of 33553920 bytes should not appear. | expand

Commit Message

Daniel Hofmann Aug. 3, 2023, 9:30 a.m. UTC
Hello Martin,

the "optimal" I/O size of 33553920 = 0xFFFF*512 bytes still appears
despite of commit 631669a256f9 (see
https://lore.kernel.org/linux-scsi/yq1ilehejy6.fsf@ca-mkp.ca.oracle.com/).

The reason is that there are devices in the wild which both report
sdkp->min_xfer_blocks == 1 and sdkp->opt_xfer_blocks == 0xFFFF. Hence,
opt_xfer_bytes % min_xfer_bytes == 0, and thus the check in
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/tree/drivers/scsi/sd.c?h=6.5/scsi-fixes&id=06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5#n3341
fails.

It seems that sdkp->opt_xfer_blocks == 0xFFFF serves the same purpose as
q->limits.io_opt = 0 in
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/tree/drivers/scsi/sd.c?h=6.5/scsi-fixes&id=06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5#n3453,
namely: to be a dummy value, indicating that no specific information is
available. Therefore, I suggest to explicitly test whether
sdkp->opt_xfer_blocks equals this dummy value 0xFFFF and to explicitly
fail the validation of the optimal transfer size if this is the case,
resulting in q->limits.io_opt = 0 instead of the bogus q->limits.io_opt
= 33553920. For example, consider this patch:


 				"Optimal transfer size %u logical blocks " \


(Additionally, it would be also possible to test whether the size of the
device is divisible (without remainder) by the assumed "optimal I/O
size". If it is not divisible by the assumed "optimal I/O size", it is
quite likely that the "optimal I/O size" is not useful, at least for
alignment of data onto that block device. In this case, the validation
should also fail.)


My real-world example:


# dmesg -H
[...]
[  +0.004324] scsi host2: uas
[  +0.002777] scsi 2:0:0:0: Direct-Access     Inateck  FE202x Series
1.00 PQ: 0 ANSI: 6
[  +0.019153] sd 2:0:0:0: Attached scsi generic sg2 type 0
[  +0.002565] sd 2:0:0:0: [sdb] 4000797360 512-byte logical blocks:
(2.05 TB/1.86 TiB)
[  +0.001319] sd 2:0:0:0: [sdb] Write Protect is off
[  +0.000005] sd 2:0:0:0: [sdb] Mode Sense: 37 00 00 08
[  +0.002499] sd 2:0:0:0: [sdb] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[  +0.001295] sd 2:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
[  +0.000004] sd 2:0:0:0: [sdb] Optimal transfer size 33553920 bytes
[  +0.010352] sd 2:0:0:0: [sdb] Attached SCSI disk
[...]
# lsblk -t /dev/sdb
NAME                                                          ALIGNMENT
MIN-IO   OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE    RA WSAME
sdb                                                                   0
  512 33553920     512     512    0 mq-deadline      58 65532    0B



Fixing this would benefit downstream consumers of this value (libparted
and thus parted, gparted, partitioning tools used during installation of
a Linux system) by preventing them from creating actually unaligned
partitions by taking the bogus "optimal I/O size" as the alignment size.


Cheers,
Dan

Comments

Martin K. Petersen Aug. 25, 2023, 11:02 p.m. UTC | #1
Hello Daniel,

Sorry about the delay!

> The reason is that there are devices in the wild which both report
> sdkp->min_xfer_blocks == 1 and sdkp->opt_xfer_blocks == 0xFFFF.

That's the first time I've seen that particular combination. Most
devices will report either a physical_block_size or a min_xfer_blocks of
4KB and the intent of the check was to validate opt_xfer_blocks against
those two granularities.

> It seems that sdkp->opt_xfer_blocks == 0xFFFF serves the same purpose
> as
> q->limits.io_opt = 0 in
> https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/tree/drivers/scsi/sd.c?h=6.5/scsi-fixes&id=06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5#n3453,
> namely: to be a dummy value, indicating that no specific information is
> available.

0 means the value is not reported, 0xFFFF means the device is explicitly
asking us to send I/O in multiples of 0xFFFF blocks. Which probably
doesn't make sense but, given that min_xfer_blocks is reported as 1 in
your case, could technically be valid.

I would have preferred to check that the reported opt_xfer_blocks was a
nice power of two. However, there are devices out there which use
internal allocation units which are not power of two. So that won't
work.

Alternatively we could make sure that opt_xfer_bytes is a multiple of
PAGE_SIZE. I'll mull over that a bit...

> (Additionally, it would be also possible to test whether the size of the
> device is divisible (without remainder) by the assumed "optimal I/O
> size". If it is not divisible by the assumed "optimal I/O size", it is
> quite likely that the "optimal I/O size" is not useful, at least for
> alignment of data onto that block device. In this case, the validation
> should also fail.)

Unfortunately that's not a reliable heuristic since many devices report
a specific capacity requested by their customers which may not reflect
the actual capacity of the device, nor whichever granularity is in use
internally.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 68b12afa0721..319400ec333d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3314,6 +3314,9 @@  static bool sd_validate_opt_xfer_size(struct
scsi_disk *sdkp,
 	if (sdkp->opt_xfer_blocks == 0)
 		return false;

+	if (sdkp->opt_xfer_blocks == 0xffff)
+		return false;
+
 	if (sdkp->opt_xfer_blocks > dev_max) {
 		sd_first_printk(KERN_WARNING, sdkp,