
[1/4] block: add a hard-readonly flag to struct gendisk

Message ID 20201129181926.897775-2-hch@lst.de
State Superseded

Commit Message

Christoph Hellwig Nov. 29, 2020, 6:19 p.m. UTC
Commit 20bd1d026aac ("scsi: sd: Keep disk read-only when re-reading
partition") addressed a long-standing problem with user read-only
policy being overridden as a result of a device-initiated revalidate.
The commit has since been reverted due to a regression that left some
USB devices read-only indefinitely.

To fix the underlying problems with revalidate we need to keep track
of hardware state and user policy separately.

The gendisk has been updated to reflect the current hardware state set
by the device driver. This is done to allow returning the device to
the hardware state once the user clears the BLKROSET flag.

The resulting semantics are as follows:

 - If BLKROSET is used to set a whole-disk device read-only, any
   partitions will end up in a read-only state until the user
   explicitly clears the flag.

 - If BLKROSET sets a given partition read-only, that partition will
   remain read-only even if the underlying storage stack initiates a
   revalidate. However, the BLKRRPART ioctl will cause the partition
   table to be dropped and any user policy on partitions will be lost.

 - If BLKROSET has not been set, both the whole disk device and any
   partitions will reflect the current write-protect state of the
   underlying device.
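
For illustration only, the semantics listed above can be modeled in a few lines of userspace C. This is a sketch, not the kernel implementation: the `model_disk` and `model_part` structures and all `model_*` helpers are hypothetical names invented here. The only point is that driver-reported hardware state and user policy live in separate fields and are combined at lookup time.

```c
#include <stdbool.h>

/* Hypothetical userspace model; these are not the kernel's structures. */
struct model_disk {
	bool hw_ro;	/* write-protect state reported by the driver */
	bool user_ro;	/* whole-disk policy, as set via BLKROSET */
};

struct model_part {
	const struct model_disk *disk;
	bool user_ro;	/* per-partition policy, as set via BLKROSET */
};

/* Driver path (e.g. a revalidate): only the hardware state changes. */
static void model_driver_set_ro(struct model_disk *disk, bool ro)
{
	disk->hw_ro = ro;
}

/* The whole disk is read-only if the hardware or the user says so. */
static bool model_disk_ro(const struct model_disk *disk)
{
	return disk->hw_ro || disk->user_ro;
}

/* A partition is read-only if its own policy or the disk's state says so. */
static bool model_part_ro(const struct model_part *part)
{
	return part->user_ro || model_disk_ro(part->disk);
}
```

In this model a revalidate that calls `model_driver_set_ro(disk, false)` cannot clear a policy the user set, and clearing the user policy makes the device fall back to whatever the hardware currently reports.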

Based on a patch from Martin K. Petersen <martin.petersen@oracle.com>.

Reported-by: Oleksii Kurochko <olkuroch@cisco.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c        |  2 +-
 block/genhd.c           | 34 +++++++++++++++++++---------------
 block/partitions/core.c |  3 +--
 include/linux/genhd.h   |  6 ++++--
 4 files changed, 25 insertions(+), 20 deletions(-)

Comments

Alex Elder Nov. 30, 2020, 1:23 a.m. UTC | #1
On 11/29/20 12:19 PM, Christoph Hellwig wrote:
> Commit 20bd1d026aac ("scsi: sd: Keep disk read-only when re-reading
> partition") addressed a long-standing problem with user read-only

Little nit I noticed below.	-Alex

. . .

> diff --git a/block/genhd.c b/block/genhd.c
> index 565cf36a5f1864..5e746223b6fa0f 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1625,31 +1625,35 @@ static void set_disk_ro_uevent(struct gendisk *gd, int ro)
>  	kobject_uevent_env(&disk_to_dev(gd)->kobj, KOBJ_CHANGE, envp);
>  }
>  
> -void set_disk_ro(struct gendisk *disk, int flag)
> +/**
> + * set_disk_ro - set a gendisk read-only
> + * @disk:	The disk device
> + * @state:	true or false

s/state/read_only/

> + *
> + * This function is used to indicate whether a given disk device should have its
> + * read-only flag set. set_disk_ro() is typically used by device drivers to
> + * indicate whether the underlying physical device is write-protected.
> + */
> +void set_disk_ro(struct gendisk *disk, bool read_only)

. . .
Hannes Reinecke Nov. 30, 2020, 7:55 a.m. UTC | #2
On 11/29/20 7:19 PM, Christoph Hellwig wrote:
> Commit 20bd1d026aac ("scsi: sd: Keep disk read-only when re-reading
> partition") addressed a long-standing problem with user read-only
> policy being overridden as a result of a device-initiated revalidate.
> The commit has since been reverted due to a regression that left some
> USB devices read-only indefinitely.
> 
> To fix the underlying problems with revalidate we need to keep track
> of hardware state and user policy separately.
> 
> The gendisk has been updated to reflect the current hardware state set
> by the device driver. This is done to allow returning the device to
> the hardware state once the user clears the BLKROSET flag.
> 
> The resulting semantics are as follows:
> 
>   - If BLKROSET is used to set a whole-disk device read-only, any
>     partitions will end up in a read-only state until the user
>     explicitly clears the flag.
> 
>   - If BLKROSET sets a given partition read-only, that partition will
>     remain read-only even if the underlying storage stack initiates a
>     revalidate. However, the BLKRRPART ioctl will cause the partition
>     table to be dropped and any user policy on partitions will be lost.
> 
>   - If BLKROSET has not been set, both the whole disk device and any
>     partitions will reflect the current write-protect state of the
>     underlying device.
> 
> Based on a patch from Martin K. Petersen <martin.petersen@oracle.com>.
> 
> Reported-by: Oleksii Kurochko <olkuroch@cisco.com>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-core.c        |  2 +-
>   block/genhd.c           | 34 +++++++++++++++++++---------------
>   block/partitions/core.c |  3 +--
>   include/linux/genhd.h   |  6 ++++--
>   4 files changed, 25 insertions(+), 20 deletions(-)
> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Martin K. Petersen Dec. 3, 2020, 4:04 a.m. UTC | #3
Hi Christoph!

>  - If BLKROSET is used to set a whole-disk device read-only, any
>    partitions will end up in a read-only state until the user
>    explicitly clears the flag.

This no longer appears to be the case with your tweak.

It's very common for database folks to twiddle the read-only state of
block devices and partitions. I know that our users will find it very
counter-intuitive that setting /dev/sda read-only won't prevent writes
to /dev/sda1.

>  int bdev_read_only(struct block_device *bdev)
>  {
>  	if (!bdev)
>  		return 0;
> -	return bdev->bd_read_only;
> +	return bdev->bd_read_only ||
> +		test_bit(GD_READ_ONLY, &bdev->bd_disk->state);
>  }

I suggest doing bd->bd_read_only || get_disk_ro(...) here. That does
take part0 into account.
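
To make the difference concrete, here is a small userspace sketch comparing the check as posted with the suggested one. It is a model under stated assumptions, not kernel code: `toy_disk`, `toy_bdev` and the `toy_*` functions are hypothetical stand-ins, with `gd_read_only` modeling the GD_READ_ONLY state bit and `bd_read_only` modeling the per-device BLKROSET policy.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct gendisk and struct block_device. */
struct toy_bdev;

struct toy_disk {
	bool gd_read_only;		/* models the GD_READ_ONLY state bit */
	struct toy_bdev *part0;		/* whole-disk block device */
};

struct toy_bdev {
	bool bd_read_only;		/* models the BLKROSET policy */
	struct toy_disk *bd_disk;
};

/* Models get_disk_ro(): part0 policy or hardware state. */
static bool toy_get_disk_ro(const struct toy_disk *disk)
{
	return disk->part0->bd_read_only || disk->gd_read_only;
}

/* The check as posted: part0's policy is not consulted for partitions. */
static bool toy_bdev_read_only_posted(const struct toy_bdev *bdev)
{
	return bdev->bd_read_only || bdev->bd_disk->gd_read_only;
}

/* The suggested check: also honours the whole-disk policy via part0. */
static bool toy_bdev_read_only_suggested(const struct toy_bdev *bdev)
{
	return bdev->bd_read_only || toy_get_disk_ro(bdev->bd_disk);
}
```

With the policy set only on the whole-disk device, the posted variant reports a partition as writable while the suggested variant reports it as read-only.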

>  static inline int get_disk_ro(struct gendisk *disk)
>  {
> -	return disk->part0->bd_read_only;
> +	return disk->part0->bd_read_only ||
> +		test_bit(GD_READ_ONLY, &disk->state);
>  }
>  
>  extern void disk_block_events(struct gendisk *disk);

-- 
Martin K. Petersen	Oracle Linux Engineering
Christoph Hellwig Dec. 3, 2020, 8:53 a.m. UTC | #4
On Wed, Dec 02, 2020 at 11:04:33PM -0500, Martin K. Petersen wrote:
> 
> Hi Christoph!
> 
> >  - If BLKROSET is used to set a whole-disk device read-only, any
> >    partitions will end up in a read-only state until the user
> >    explicitly clears the flag.
> 
> This no longer appears to be the case with your tweak.

True.

> 
> It's very common for database folks to twiddle the read-only state of
> block devices and partitions. I know that our users will find it very
> counter-intuitive that setting /dev/sda read-only won't prevent writes
> to /dev/sda1.

What I'm worried about is that this would be a huge change from the
historic behavior.
Martin K. Petersen Dec. 3, 2020, 2:01 p.m. UTC | #5
Christoph,

>> It's very common for database folks to twiddle the read-only state of
>> block devices and partitions. I know that our users will find it very
>> counter-intuitive that setting /dev/sda read-only won't prevent
>> writes to /dev/sda1.
>
> What I'm worried about is that this would be a huge change from the
> historic behavior.

But that's what my users complained about and what the patch tried to
address.

Also, the existing behavior is inconsistent in the sense that doing:

# blockdev --setro /dev/sda
# echo foo > /dev/sda1

permits writes. But:

# blockdev --setro /dev/sda
<something triggers revalidate>
# echo foo > /dev/sda1

doesn't.

And a subsequent:

# blockdev --setrw /dev/sda
# echo foo > /dev/sda1

doesn't work either since sda1's read-only policy has been inherited
from the whole-disk device.

You need to do:

# blockdev --rereadpt /dev/sda

after setting the whole-disk device rw to effectuate the same change on
the partitions, otherwise they are stuck being read-only indefinitely.

However, setting the read-only policy on a partition does *not* require
the revalidate step. As a matter of fact, doing the revalidate will blow
away the policy setting you just made.

So the user needs to take different actions depending on whether they
are trying to write-protect a whole-disk device or a partition. Despite
using the same ioctl. That is really confusing.
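
As a point of reference, the following is a minimal userspace sketch of that ioctl pair. BLKROSET and BLKROGET are the request codes from <linux/fs.h>; the helper names `blk_set_ro()` and `blk_get_ro()` are hypothetical and exist only for this example. The same helpers work on a whole-disk node or a partition node, which is why the differing follow-up steps are surprising.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKROSET, BLKROGET */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Set (ro != 0) or clear (ro == 0) the read-only policy on the block
 * device node at path. Returns 0 on success, -1 on error with errno set.
 * Changing the policy normally requires CAP_SYS_ADMIN.
 */
static int blk_set_ro(const char *path, int ro)
{
	int fd = open(path, O_RDONLY);
	int ret, saved_errno;

	if (fd < 0)
		return -1;
	ro = !!ro;
	ret = ioctl(fd, BLKROSET, &ro);
	saved_errno = errno;	/* close() may overwrite errno */
	close(fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}

/* Return 1 if the device is read-only, 0 if writable, -1 on error. */
static int blk_get_ro(const char *path)
{
	int fd = open(path, O_RDONLY);
	int ro = 0, ret, saved_errno;

	if (fd < 0)
		return -1;
	ret = ioctl(fd, BLKROGET, &ro);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return ret < 0 ? -1 : !!ro;
}
```

Calling `blk_set_ro("/dev/sda", 1)` corresponds to `blockdev --setro /dev/sda`, and `blk_get_ro("/dev/sda1")` then shows whether the partition actually picked up the whole-disk setting.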

The intent of my patch was to ensure that:

 - Hardware-initiated read-only state changes would not alter the user's
   whole-disk or partition policy settings.

 - Setting a policy on the whole-disk device would prevent writes to the
   whole device as the user clearly intended.

I have lost count of how many times our customers have had data clobbered
because of ambiguity of the existing whole-disk device policy. The
current behavior violates the principle of least surprise by letting the
user think they write protected the whole disk when they actually
didn't.

-- 
Martin K. Petersen	Oracle Linux Engineering

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index cee568389b7e11..0763d1eb85ce15 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -695,7 +695,7 @@  static inline bool bio_check_ro(struct bio *bio, struct block_device *part)
 {
 	const int op = bio_op(bio);
 
-	if (part->bd_read_only && op_is_write(op)) {
+	if (op_is_write(op) && bdev_read_only(part)) {
 		char b[BDEVNAME_SIZE];
 
 		if (op_is_flush(bio->bi_opf) && !bio_sectors(bio))
diff --git a/block/genhd.c b/block/genhd.c
index 565cf36a5f1864..5e746223b6fa0f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1625,31 +1625,35 @@  static void set_disk_ro_uevent(struct gendisk *gd, int ro)
 	kobject_uevent_env(&disk_to_dev(gd)->kobj, KOBJ_CHANGE, envp);
 }
 
-void set_disk_ro(struct gendisk *disk, int flag)
+/**
+ * set_disk_ro - set a gendisk read-only
+ * @disk:	The disk device
+ * @state:	true or false
+ *
+ * This function is used to indicate whether a given disk device should have its
+ * read-only flag set. set_disk_ro() is typically used by device drivers to
+ * indicate whether the underlying physical device is write-protected.
+ */
+void set_disk_ro(struct gendisk *disk, bool read_only)
 {
-	struct disk_part_iter piter;
-	struct block_device *part;
-
-	if (disk->part0->bd_read_only != flag) {
-		set_disk_ro_uevent(disk, flag);
-		disk->part0->bd_read_only = flag;
+	if (read_only) {
+		if (test_and_set_bit(GD_READ_ONLY, &disk->state))
+			return;
+	} else {
+		if (!test_and_clear_bit(GD_READ_ONLY, &disk->state))
+			return;
 	}
-
-	disk_part_iter_init(&piter, disk, DISK_PITER_INCL_EMPTY);
-	while ((part = disk_part_iter_next(&piter)))
-		part->bd_read_only = flag;
-	disk_part_iter_exit(&piter);
+	set_disk_ro_uevent(disk, read_only);
 }
-
 EXPORT_SYMBOL(set_disk_ro);
 
 int bdev_read_only(struct block_device *bdev)
 {
 	if (!bdev)
 		return 0;
-	return bdev->bd_read_only;
+	return bdev->bd_read_only ||
+		test_bit(GD_READ_ONLY, &bdev->bd_disk->state);
 }
-
 EXPORT_SYMBOL(bdev_read_only);
 
 /*
diff --git a/block/partitions/core.c b/block/partitions/core.c
index deca253583bd3f..5a9633183343c0 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -194,7 +194,7 @@  static ssize_t part_start_show(struct device *dev,
 static ssize_t part_ro_show(struct device *dev,
 			    struct device_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%d\n", dev_to_bdev(dev)->bd_read_only);
+	return sprintf(buf, "%d\n", bdev_read_only(dev_to_bdev(dev)));
 }
 
 static ssize_t part_alignment_offset_show(struct device *dev,
@@ -360,7 +360,6 @@  static struct block_device *add_partition(struct gendisk *disk, int partno,
 
 	bdev->bd_start_sect = start;
 	bdev_set_nr_sectors(bdev, len);
-	bdev->bd_read_only = get_disk_ro(disk);
 
 	if (info) {
 		err = -ENOMEM;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 809aaa32d53cba..a62ccbfac54b48 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -163,6 +163,7 @@  struct gendisk {
 	int flags;
 	unsigned long state;
 #define GD_NEED_PART_SCAN		0
+#define GD_READ_ONLY			1
 	struct kobject *slave_dir;
 
 	struct timer_rand_state *random;
@@ -249,11 +250,12 @@  static inline void add_disk_no_queue_reg(struct gendisk *disk)
 extern void del_gendisk(struct gendisk *gp);
 extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
 
-extern void set_disk_ro(struct gendisk *disk, int flag);
+void set_disk_ro(struct gendisk *disk, bool read_only);
 
 static inline int get_disk_ro(struct gendisk *disk)
 {
-	return disk->part0->bd_read_only;
+	return disk->part0->bd_read_only ||
+		test_bit(GD_READ_ONLY, &disk->state);
 }
 
 extern void disk_block_events(struct gendisk *disk);