diff mbox series

[v4,3/4] block: add support for notifications

Message ID 4ebef78f07ff1ea4d553c481ffa9e130d65db772.1719520771.git.daniel@makrotopia.org
State New
Headers show
Series block: preparations for NVMEM provider | expand

Commit Message

Daniel Golle June 27, 2024, 8:50 p.m. UTC
Add notifier block to notify other subsystems about the addition or
removal of block devices.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 block/Kconfig          |  6 +++
 block/Makefile         |  1 +
 block/blk-notify.c     | 87 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h | 11 ++++++
 4 files changed, 105 insertions(+)
 create mode 100644 block/blk-notify.c

Comments

'Christoph Hellwig' June 28, 2024, 4:45 a.m. UTC | #1
On Thu, Jun 27, 2024 at 09:50:50PM +0100, Daniel Golle wrote:
> Add notifier block to notify other subsystems about the addition or
> removal of block devices.

Notification for what?  I really hate the concept of random modular
code being able to hook into device discovery / partition scanning.
And not actually having a user for it is a complete no-go.
Daniel Golle June 28, 2024, 12:23 p.m. UTC | #2
On Thu, Jun 27, 2024 at 09:45:45PM -0700, Christoph Hellwig wrote:
> On Thu, Jun 27, 2024 at 09:50:50PM +0100, Daniel Golle wrote:
> > Add notifier block to notify other subsystems about the addition or
> > removal of block devices.
> 
> Notification for what?  I really hate the concept of random modular
> code being able to hook into device discovery / partition scanning.

Adding a dedicated notification interface (instead of using block
internals in the nvmem driver) has been requested in a previous review
by Bart Van Assche:

https://patchwork.kernel.org/comment/25771998/

Quote from that previous review comment:
>>> Why to add this functionality to the block layer instead of somewhere
>>> in the drivers/ directory?
>> 
>> Simply because we need notifications about appearing and disappearing
>> block devices, or a way to iterate over all block devices in a system.
>> For both there isn't currently any other interface than using a
>> class_interface for that, and that requires access to &block_class
>> which is considered a block subsystem internal.
>
> That's an argument for adding an interface to the block layer that
> implements this functionality but not for adding this code in the block
> layer.
---

So that's what I did consequently. Using the notification interface
the NVMEM driver can live in drivers/nvmem/ and doesn't need to be
using block internals.

> And not actually having a user for it is a complete no-go.
> 

The user will be the nvmem provider, you can see the code in earlier
versions of the patchset where I had included it:

https://patchwork.kernel.org/project/linux-block/patch/96554d6b4d9fa72f936c2c476eb0b023cdd60a64.1717031992.git.daniel@makrotopia.org/

Being another subsystem I thought it'd be better to deal with the
block related things first, and once that has been sorted out I will
move on to add the NVMEM driver and make the necessary changes for
using it on eMMC.


Thank you for taking a look at this. Let me know if anything else
is not clear or needs to be changed.


Cheers


Daniel
'Christoph Hellwig' June 28, 2024, 12:50 p.m. UTC | #3
On Fri, Jun 28, 2024 at 01:23:47PM +0100, Daniel Golle wrote:
> So that's what I did consequently. Using the notification interface
> the NVMEM driver can live in drivers/nvmem/ and doesn't need to be
> using block internals.
> 
> > And not actually having a user for it is a complete no-go.
> > 
> 
> The user will be the nvmem provider, you can see the code in earlier
> versions of the patchset where I had included it:
> 
> https://patchwork.kernel.org/project/linux-block/patch/96554d6b4d9fa72f936c2c476eb0b023cdd60a64.1717031992.git.daniel@makrotopia.org/
> 
> Being another subsystem I thought it'd be better to deal with the
> block related things first, and once that has been sorted out I will
> move on to add the NVMEM driver and make the necessary changes for
> using it on eMMC.

It is rather hard to review an interface without the users.

I still dislike the idea of notifications from bdev discovery /
partition scanning into the users of them.  We have one such users
in the MD legacy autodetect code, but that has largely been considered
at bad idea and distros tend to use mdadm based assembly from initramfs
instead.  Which IMHO feels like the right thing for nvmem as well,
just have an nvmem provider that opens a file for a user provided
path and use kernel_read on it.  This can covered block devices,
character devices and even regular files.  It will require initramfs
support, but that is pretty much used everywhere for storage discovery
and setup anyway.
Daniel Golle June 28, 2024, 2 p.m. UTC | #4
Hi Christoph,

On Fri, Jun 28, 2024 at 05:50:20AM -0700, Christoph Hellwig wrote:
> On Fri, Jun 28, 2024 at 01:23:47PM +0100, Daniel Golle wrote:
> > So that's what I did consequently. Using the notification interface
> > the NVMEM driver can live in drivers/nvmem/ and doesn't need to be
> > using block internals.
> > 
> > > And not actually having a user for it is a complete no-go.
> > > 
> > 
> > The user will be the nvmem provider, you can see the code in earlier
> > versions of the patchset where I had included it:
> > 
> > https://patchwork.kernel.org/project/linux-block/patch/96554d6b4d9fa72f936c2c476eb0b023cdd60a64.1717031992.git.daniel@makrotopia.org/
> > 
> > Being another subsystem I thought it'd be better to deal with the
> > block related things first, and once that has been sorted out I will
> > move on to add the NVMEM driver and make the necessary changes for
> > using it on eMMC.
> 
> It is rather hard to review an interface without the users.

I understand that, please take a look at previous iterations of this very
series or use the link to the patch which I have sent before (see above).

> 
> I still dislike the idea of notifications from bdev discovery /
> partition scanning into the users of them.  We have one such users
> in the MD legacy autodetect code, but that has largely been considered
> at bad idea and distros tend to use mdadm based assembly from initramfs
> instead.  Which IMHO feels like the right thing for nvmem as well,
> just have an nvmem provider that opens a file for a user provided
> path and use kernel_read on it.  This can covered block devices,
> character devices and even regular files.  It will require initramfs
> support, but that is pretty much used everywhere for storage discovery
> and setup anyway.

The problem there is that then we cannot use Device Tree to device the
NVMEM layouts, and reference NVMEM bits to the dirvers which need them.
Hence also the definition of the NVMEM layout would have to happen in
userspace, inside an initramfs. I know that having an initramfs is
common for classic desktop or server distributions, but the same is not
true on more simple embedded devices such as router/firewall or WiFi
access point appliances running OpenWrt.

Carrying all that board-specific knowledge in the form of scripts
identifying the board is not exactly nice, nor is creating an individual
initramfs for each of the 1000+ boards supported by OpenWrt, for
example. Extracting the layout information from /sys/firmware/devicetree
in userspace just to then somehow use libblkid to identify the block
device and throw that information back at the kernel which requested it
e.g. using a firmware hotplug call is an option, but imho an unnecessary
complication. And obviously it would still prevent things like nfsroot
(which requires the MAC address and potentially a WiFi calibration data
to be setup) from working out of the box.

Doing the detour via userspace only for devices with an eMMC would be
different from how it is done for any other type of backing storage such
as simple I2C EEPROMs, (SPI-)flashes or UBI volumes.
Having a unified approach for all of them would make things much more
convenient, as typically the actual layouts are even the same and can be
reused accross devices of the same vendor. GL-iNet or ASUS router
devices are a good example for that: The more high-end ones come with an
eMMC and use the same NVMEM layout inside a GPT partition than mid-field
devices on SPI-NAND or UBI, and older and/or lower-end devices on an
SPI-NOR flash MTD partition.

To better understand the situation, maybe look at a few examples for
which we are currently already using this patchset downstream at
OpenWrt:

Inside a GPT partition on an eMMC:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7981b-unielec-u7981-01-emmc.dts;h=abd4d4e59d74cc0d24e8c9e10e16c0859844e003;hb=HEAD#l39

On raw SPI-NAND (already possible with vanilla Linux):
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7981b-unielec-u7981-01-nand.dts;h=230a612a34f7d370eb09e92284950d9949bf10cd;hb=HEAD#l45

Inside a UBI volume (also already possible with vanilla Linux):
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7986a-asus-tuf-ax4200.dts;h=e40602fa215e1a677b534c85b767e824af041518;hb=HEAD#l255

Inside the boot hwpartition of the eMMC:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7981b-glinet-gl-mt2500.dts;h=15818a90fca02955879d1bcca55ba2153e390621;hb=HEAD#l156

Inside a GPT partition on an eMMC:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7986a-glinet-gl-mt6000.dts;h=fd0e1a69157ed0f27c32376aabab0fcc85ce23b9;hb=HEAD#l317

https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/dts/mt7988a-smartrg-mt-stuart.dtsi;h=2b468f9bb311141d083e51ca5edaa7dce253c91c;hb=HEAD#l504

Those are just the examples I have been working on myself, so it was
easy to come up with them. There are also ASUS devices with Qualcomm
SoC using the same layout as the MediaTek-based devices inside a UBI
volume.

Once a unified way to describe the loaction of the NVMEM bits is also
present in upstream Linux, all those downstream device trees could be
submitted for inclusion in Linux, and one could install a
general-purpose OS like Debian **which wouldn't need to know anything
about the details of where to read MAC addresses or calibration data
from**, all hardware-specific knowledge would reside in DT.

The failure of defining this in a nice way results in very ugly
situations such as distributions carrying the (board-specific!)
calibration data for very few but very common single-board computers
like the RaspberryPi in their rootfs or even in initramfs. Each
distribution with a different level of hacks to give them individual MAC
addresses and to load the correct file... And while this doesn't look so
bad for systems which anyway come only with removable microSD storage,
it **does* get ugly when it comes to systems which do store that
information on their eMMC (typically "appliances" rather than SBCs meant
for tinkerers).

Please take all that into consiration and also note that obviously, on
systems not making use of any of that, you may simple not enable
CONFIG_BLOCK_NOTIFIERS -- other than in previous iterations of the
patchset this is all completely optional now.


Cheers


Daniel
'Christoph Hellwig' July 1, 2024, 5:54 a.m. UTC | #5
On Fri, Jun 28, 2024 at 03:00:05PM +0100, Daniel Golle wrote:
> The problem there is that then we cannot use Device Tree to device the
> NVMEM layouts, and reference NVMEM bits to the dirvers which need them.
> Hence also the definition of the NVMEM layout would have to happen in
> userspace, inside an initramfs. I know that having an initramfs is
> common for classic desktop or server distributions, but the same is not
> true on more simple embedded devices such as router/firewall or WiFi
> access point appliances running OpenWrt.

Maybe it needs to become more common so that we don't need crazy
kernel workarounds for something that can be trivially done with
a few lines of userspace code?
diff mbox series

Patch

diff --git a/block/Kconfig b/block/Kconfig
index 5b623b876d3b..67cd4f92378a 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -209,6 +209,12 @@  config BLK_INLINE_ENCRYPTION_FALLBACK
 	  by falling back to the kernel crypto API when inline
 	  encryption hardware is not present.
 
+config BLOCK_NOTIFIERS
+	bool "Enable support for notifications in block layer"
+	help
+	  Enable this option to provide notifiers for other subsystems
+	  upon addition or removal of block devices.
+
 source "block/partitions/Kconfig"
 
 config BLK_MQ_PCI
diff --git a/block/Makefile b/block/Makefile
index ddfd21c1a9ff..a131fa7d6b26 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -38,3 +38,4 @@  obj-$(CONFIG_BLK_INLINE_ENCRYPTION)	+= blk-crypto.o blk-crypto-profile.o \
 					   blk-crypto-sysfs.o
 obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK)	+= blk-crypto-fallback.o
 obj-$(CONFIG_BLOCK_HOLDER_DEPRECATED)	+= holder.o
+obj-$(CONFIG_BLOCK_NOTIFIERS) += blk-notify.o
diff --git a/block/blk-notify.c b/block/blk-notify.c
new file mode 100644
index 000000000000..fd727288ea19
--- /dev/null
+++ b/block/blk-notify.c
@@ -0,0 +1,87 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Notifiers for addition and removal of block devices
+ *
+ * Copyright (c) 2024 Daniel Golle <daniel@makrotopia.org>
+ */
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+
+#include "blk.h"
+
+struct blk_device_list {
+	struct device *dev;
+	struct list_head list;
+};
+
+static RAW_NOTIFIER_HEAD(blk_notifier_list);
+static DEFINE_MUTEX(blk_notifier_lock);
+static LIST_HEAD(blk_devices);
+
+void blk_register_notify(struct notifier_block *nb)
+{
+	struct blk_device_list *existing_blkdev;
+
+	mutex_lock(&blk_notifier_lock);
+	raw_notifier_chain_register(&blk_notifier_list, nb);
+
+	list_for_each_entry(existing_blkdev, &blk_devices, list)
+		nb->notifier_call(nb, BLK_DEVICE_ADD, existing_blkdev->dev);
+
+	mutex_unlock(&blk_notifier_lock);
+}
+EXPORT_SYMBOL_GPL(blk_register_notify);
+
+void blk_unregister_notify(struct notifier_block *nb)
+{
+	mutex_lock(&blk_notifier_lock);
+	raw_notifier_chain_unregister(&blk_notifier_list, nb);
+	mutex_unlock(&blk_notifier_lock);
+}
+EXPORT_SYMBOL_GPL(blk_unregister_notify);
+
+static int blk_call_notifier_add(struct device *dev)
+{
+	struct blk_device_list *new_blkdev;
+
+	new_blkdev = kmalloc(sizeof(*new_blkdev), GFP_KERNEL);
+	if (!new_blkdev)
+		return -ENOMEM;
+
+	new_blkdev->dev = dev;
+	mutex_lock(&blk_notifier_lock);
+	list_add_tail(&new_blkdev->list, &blk_devices);
+	raw_notifier_call_chain(&blk_notifier_list, BLK_DEVICE_ADD, dev);
+	mutex_unlock(&blk_notifier_lock);
+	return 0;
+}
+
+static void blk_call_notifier_remove(struct device *dev)
+{
+	struct blk_device_list *old_blkdev, *tmp;
+
+	mutex_lock(&blk_notifier_lock);
+	list_for_each_entry_safe(old_blkdev, tmp, &blk_devices, list) {
+		if (old_blkdev->dev != dev)
+			continue;
+
+		list_del(&old_blkdev->list);
+		kfree(old_blkdev);
+	}
+	raw_notifier_call_chain(&blk_notifier_list, BLK_DEVICE_REMOVE, dev);
+	mutex_unlock(&blk_notifier_lock);
+}
+
+static struct class_interface blk_notifications_bus_interface __refdata = {
+	.class = &block_class,
+	.add_dev = &blk_call_notifier_add,
+	.remove_dev = &blk_call_notifier_remove,
+};
+
+static int __init blk_notifications_init(void)
+{
+	return class_interface_register(&blk_notifications_bus_interface);
+}
+device_initcall(blk_notifications_init);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b2f1362c4681..c6e9fb16f76e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1687,4 +1687,15 @@  static inline bool bdev_can_atomic_write(struct block_device *bdev)
 
 #define DEFINE_IO_COMP_BATCH(name)	struct io_comp_batch name = { }
 
+
+#define BLK_DEVICE_ADD		1
+#define BLK_DEVICE_REMOVE	2
+#if defined(CONFIG_BLOCK_NOTIFIERS)
+void blk_register_notify(struct notifier_block *nb);
+void blk_unregister_notify(struct notifier_block *nb);
+#else
+static inline void blk_register_notify(struct notifier_block *nb) { };
+static inline void blk_unregister_notify(struct notifier_block *nb) { };
+#endif
+
 #endif /* _LINUX_BLKDEV_H */