Message ID | 20230118225447.2809787-1-bvanassche@acm.org |
---|---|
Headers | show |
Series | Add support for segments smaller than one page | expand |
On 1/18/23 15:02, Jens Axboe wrote: > On 1/18/23 3:54 PM, Bart Van Assche wrote: >> Prepare for introducing support for segments smaller than the page size >> by introducing the request queue flag QUEUE_FLAG_SUB_PAGE_SEGMENTS. >> Introduce CONFIG_BLK_SUB_PAGE_SEGMENTS to prevent that performance of >> block drivers that support segments >= PAGE_SIZE would be affected. >> >> Cc: Christoph Hellwig <hch@lst.de> >> Cc: Ming Lei <ming.lei@redhat.com> >> Cc: Keith Busch <kbusch@kernel.org> >> Signed-off-by: Bart Van Assche <bvanassche@acm.org> >> --- >> block/Kconfig | 9 +++++++++ >> include/linux/blkdev.h | 7 +++++++ >> 2 files changed, 16 insertions(+) >> >> diff --git a/block/Kconfig b/block/Kconfig >> index 5d9d9c84d516..e85061d2175b 100644 >> --- a/block/Kconfig >> +++ b/block/Kconfig >> @@ -35,6 +35,15 @@ config BLOCK_LEGACY_AUTOLOAD >> created on demand, but scripts that manually create device nodes and >> then call losetup might rely on this behavior. >> >> +config BLK_SUB_PAGE_SEGMENTS >> + bool "Support segments smaller than the page size" >> + default n >> + help >> + Most storage controllers support DMA segments larger than the typical >> + size of a virtual memory page. Some embedded controllers only support >> + DMA segments smaller than the page size. Enable this option to support >> + such controllers. > > This should not be a visible option at all, affected drivers should just > select it. Hi Jens, Thanks for the feedback. Unless anyone requests me to realize this in another way, I plan to merge the patch below into the patch above: diff --git a/block/Kconfig b/block/Kconfig index e85061d2175b..b44b449c47bf 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -36,13 +36,7 @@ config BLOCK_LEGACY_AUTOLOAD then call losetup might rely on this behavior. config BLK_SUB_PAGE_SEGMENTS - bool "Support segments smaller than the page size" - default n - help - Most storage controllers support DMA segments larger than the typical - size of a virtual memory page. Some embedded controllers only support - DMA segments smaller than the page size. Enable this option to support - such controllers. + bool config BLK_RQ_ALLOC_TIME bool
On 1/18/23 3:54?PM, Bart Van Assche wrote: > Hi Jens, > > Several embedded storage controllers need support for DMA segments that are > smaller than the size of one virtual memory page. Hence this patch series. > Please consider this patch series for the next merge window. Before any real reviews are done, I have to ask "why?". This is pretty hairy code in the middle of the fast path for some obscure controller. Why would anyone ship that with > 4k page sizes rather than ship it with a controller that is sane?
On 1/18/23 17:18, Jens Axboe wrote: > On 1/18/23 3:54?PM, Bart Van Assche wrote: >> Hi Jens, >> >> Several embedded storage controllers need support for DMA segments that are >> smaller than the size of one virtual memory page. Hence this patch series. >> Please consider this patch series for the next merge window. > > Before any real reviews are done, I have to ask "why?". This is pretty > hairy code in the middle of the fast path for some obscure controller. > Why would anyone ship that with > 4k page sizes rather than ship it with > a controller that is sane? Hi Jens, The new config variable CONFIG_BLK_SUB_PAGE_SEGMENTS has been introduced to make sure that the hot path is *not* affected if that config variable is disabled. Regarding the question "why": the Google Android team would like to improve performance by switching from 4 KiB pages to 16 KiB pages. One of the widely used UFS controllers (Exynos) has a maximum segment size of 4 KiB. Hence this patch series. This patch series is not only useful for phones but also for Tesla cars since Tesla cars use an Exynos UFS controller. A contributor to the MMC driver told me that this patch series would allow to simplify the MMC driver significantly (I have not yet double checked this). Bart.
On 1/18/23 15:02, Jens Axboe wrote: > On 1/18/23 3:54 PM, Bart Van Assche wrote: >> Prepare for introducing support for segments smaller than the page size >> by introducing the request queue flag QUEUE_FLAG_SUB_PAGE_SEGMENTS. >> Introduce CONFIG_BLK_SUB_PAGE_SEGMENTS to prevent that performance of >> block drivers that support segments >= PAGE_SIZE would be affected. >> >> Cc: Christoph Hellwig <hch@lst.de> >> Cc: Ming Lei <ming.lei@redhat.com> >> Cc: Keith Busch <kbusch@kernel.org> >> Signed-off-by: Bart Van Assche <bvanassche@acm.org> >> --- >> block/Kconfig | 9 +++++++++ >> include/linux/blkdev.h | 7 +++++++ >> 2 files changed, 16 insertions(+) >> >> diff --git a/block/Kconfig b/block/Kconfig >> index 5d9d9c84d516..e85061d2175b 100644 >> --- a/block/Kconfig >> +++ b/block/Kconfig >> @@ -35,6 +35,15 @@ config BLOCK_LEGACY_AUTOLOAD >> created on demand, but scripts that manually create device nodes and >> then call losetup might rely on this behavior. >> >> +config BLK_SUB_PAGE_SEGMENTS >> + bool "Support segments smaller than the page size" >> + default n >> + help >> + Most storage controllers support DMA segments larger than the typical >> + size of a virtual memory page. Some embedded controllers only support >> + DMA segments smaller than the page size. Enable this option to support >> + such controllers. > > This should not be a visible option at all, affected drivers should just > select it. Hi Jens, If CONFIG_BLK_SUB_PAGE_SEGMENTS is made invisible, how should this option be enabled for the scsi_debug and null_blk drivers? Adding "select BLK_SUB_PAGE_SEGMENTS" to the Kconfig section of these drivers would have the unfortunate side effect that enabling either driver would make all block drivers slower. How about making sub-page segment support configurable for the scsi_debug and null_blk drivers only? That would allow kernel developers who want to test the sub-page segment support to enable this functionality without making e.g. distro kernels slower. Thanks, Bart.
On 1/20/23 5:11?PM, Bart Van Assche wrote: > On 1/18/23 15:02, Jens Axboe wrote: >> On 1/18/23 3:54?PM, Bart Van Assche wrote: >>> Prepare for introducing support for segments smaller than the page size >>> by introducing the request queue flag QUEUE_FLAG_SUB_PAGE_SEGMENTS. >>> Introduce CONFIG_BLK_SUB_PAGE_SEGMENTS to prevent that performance of >>> block drivers that support segments >= PAGE_SIZE would be affected. >>> >>> Cc: Christoph Hellwig <hch@lst.de> >>> Cc: Ming Lei <ming.lei@redhat.com> >>> Cc: Keith Busch <kbusch@kernel.org> >>> Signed-off-by: Bart Van Assche <bvanassche@acm.org> >>> --- >>> block/Kconfig | 9 +++++++++ >>> include/linux/blkdev.h | 7 +++++++ >>> 2 files changed, 16 insertions(+) >>> >>> diff --git a/block/Kconfig b/block/Kconfig >>> index 5d9d9c84d516..e85061d2175b 100644 >>> --- a/block/Kconfig >>> +++ b/block/Kconfig >>> @@ -35,6 +35,15 @@ config BLOCK_LEGACY_AUTOLOAD >>> created on demand, but scripts that manually create device nodes and >>> then call losetup might rely on this behavior. >>> +config BLK_SUB_PAGE_SEGMENTS >>> + bool "Support segments smaller than the page size" >>> + default n >>> + help >>> + Most storage controllers support DMA segments larger than the typical >>> + size of a virtual memory page. Some embedded controllers only support >>> + DMA segments smaller than the page size. Enable this option to support >>> + such controllers. >> >> This should not be a visible option at all, affected drivers should just >> select it. > > Hi Jens, > > If CONFIG_BLK_SUB_PAGE_SEGMENTS is made invisible, how should this > option be enabled for the scsi_debug and null_blk drivers? Adding > "select BLK_SUB_PAGE_SEGMENTS" to the Kconfig section of these drivers > would have the unfortunate side effect that enabling either driver > would make all block drivers slower. How about making sub-page segment > support configurable for the scsi_debug and null_blk drivers only? > That would allow kernel developers who want to test the sub-page > segment support to enable this functionality without making e.g. > distro kernels slower. You'd need to have a separate sub-option for each of them, Kconfig style. But this also highlights the usual issue with pretending that Kconfig options means that this doesn't matter, because inevitably they end up getting enabled by default in distros anyway. And then you may as well not even have them... Why can't we just handle this in the driver? The segment path is hard enough to grok in the first place, and this just makes it worse. Generally not a huge fan of punting this to the driver, but just maybe it'd make sense in this case since it's just the one. At least that seems a lot more palatable than the alternative.
On 1/20/23 18:43, Jens Axboe wrote: > On 1/20/23 5:11?PM, Bart Van Assche wrote: >> If CONFIG_BLK_SUB_PAGE_SEGMENTS is made invisible, how should this >> option be enabled for the scsi_debug and null_blk drivers? Adding >> "select BLK_SUB_PAGE_SEGMENTS" to the Kconfig section of these drivers >> would have the unfortunate side effect that enabling either driver >> would make all block drivers slower. How about making sub-page segment >> support configurable for the scsi_debug and null_blk drivers only? >> That would allow kernel developers who want to test the sub-page >> segment support to enable this functionality without making e.g. >> distro kernels slower. > > You'd need to have a separate sub-option for each of them, Kconfig > style. But this also highlights the usual issue with pretending that > Kconfig options means that this doesn't matter, because inevitably they > end up getting enabled by default in distros anyway. And then you may as > well not even have them... Hi Jens, How about the following approach? * Remove CONFIG_BLK_SUB_PAGE_SEGMENTS. * Use the static key mechanism instead of #ifdef CONFIG_BLK_SUB_PAGE_SEGMENTS to prevent that this patch series makes the hot path in the block layer slower. * Count the number of request queues that need support for segments smaller than a page or max_hw_sector values smaller than PAGE_SIZE >> SECTOR_SHIFT. If that number changes from zero to one, enable the code introduced by this patch series. If that number drops to zero, toggle the static key such that the overhead of the code that supports small segments is eliminated. > Why can't we just handle this in the driver? The segment path is hard > enough to grok in the first place, and this just makes it worse. > Generally not a huge fan of punting this to the driver, but just maybe > it'd make sense in this case since it's just the one. At least that > seems a lot more palatable than the alternative. One of the functions modified by this patch series is blk_rq_append_bio(). That function is called by code that builds a bio (e.g. filesystems). Code that builds a bio does not call any block driver code. This is why I think it would be hard to move the functionality introduced by this patch series into block drivers. Thanks, Bart.