Message ID | 20230106185526.260163-1-prabhakar.mahadev-lad.rj@bp.renesas.com |
---|---|
Headers | show |
Series | RISC-V non-coherent function pointer based cache management operations + non-coherent DMA support for AX45MP | expand |
On Fri, Jan 06, 2023 at 06:55:23PM +0000, Prabhakar wrote: > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > Add required ports of the Alternative scheme for Andes CPU cores. > > I/O Coherence Port (IOCP) provides an AXI interface for connecting external > non-caching masters, such as DMA controllers. IOCP is a specification > option and is disabled on the Renesas RZ/Five SoC due to this reason cache > management needs a software workaround. > > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > --- > v5 -> v6 > * Dropped patching alternative and now just probing IOCP > > v4 -> v5 > * Sorted the Kconfig/Makefile/Switch based on Core name > * Added a comments > * Introduced RZFIVE_SBI_EXT_IOCP_SW_WORKAROUND SBI EXT ID to check if > CMO needs to be applied. Is there a way we can access the DTB while patching > as we can drop this SBI EXT ID and add a DT property instead for cmo? > > RFC v3 -> v4 > * New patch > --- > arch/riscv/Kconfig.erratas | 22 +++++++++ > arch/riscv/errata/Makefile | 1 + > arch/riscv/errata/andes/Makefile | 1 + > arch/riscv/errata/andes/errata.c | 71 ++++++++++++++++++++++++++++ > arch/riscv/include/asm/alternative.h | 3 ++ > arch/riscv/kernel/alternative.c | 5 ++ > 6 files changed, 103 insertions(+) > create mode 100644 arch/riscv/errata/andes/Makefile > create mode 100644 arch/riscv/errata/andes/errata.c > > diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas > index 69621ae6d647..f0f0c1abd52b 100644 > --- a/arch/riscv/Kconfig.erratas > +++ b/arch/riscv/Kconfig.erratas > @@ -1,5 +1,27 @@ > menu "CPU errata selection" > > +config ERRATA_ANDES > + bool "Andes AX45MP errata" > + depends on !XIP_KERNEL > + select RISCV_ALTERNATIVE > + help > + All Andes errata Kconfig depend on this Kconfig. Disabling > + this Kconfig will disable all Andes errata. Please say "Y" > + here if your platform uses Andes CPU cores. > + > + Otherwise, please say "N" here to avoid unnecessary overhead. > + > +config ERRATA_ANDES_CMO > + bool "Apply Andes cache management errata" > + depends on ERRATA_ANDES && MMU && ARCH_R9A07G043 > + select RISCV_DMA_NONCOHERENT > + default y > + help > + This will apply the cache management errata to handle the > + non-standard handling on non-coherent operations on Andes cores. > + > + If you don't know what to do here, say "Y". Ideally we would not need errata to turn this stuff on at all, but, as you pointed out to me off-list, arch_setup_dma_ops() complains if we have not set up. I'm happy to commit to trying to sort that out in follow on work w/ MPFS, since in that case it really isn't errata, and not require it for this series as you do fit that particular bill IMO. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Thanks, Conor.
On Fri, Jan 6, 2023, at 19:55, Prabhakar wrote: > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > The current implementation of CMO was handled using the ALTERNATIVE_X() > macro; this was manageable when there were a limited number of platforms > using this. Now that we are having more and more platforms coming through > with the CMO the use of the ALTERNATIVE_X() macro becomes unmanageable. > > To avoid such issues this patch switches to use of function pointers > instead of ALTERNATIVE_X() macro for cache management (the only draw being > performance over the previous approach). > > void (*clean_range)(unsigned long addr, unsigned long size); > void (*inv_range)(unsigned long addr, unsigned long size); > void (*flush_range)(unsigned long addr, unsigned long size); > > The above function pointers are provided to be overridden where platforms > using standard approach and for platforms who want handle the operation > based on the operation the below function pointer is provided: > > void (*riscv_dma_noncoherent_cmo_ops)(void *vaddr, size_t size, > enum dma_data_direction dir, > enum dma_noncoherent_ops ops); > > In the current patch we have moved the ZICBOM and T-Head CMO to use > function pointers. > > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> This looks like a nice improvement! I have a few suggestions for improvements, but no objections here. > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c > index fac5742d1c1e..826b2ba3e61e 100644 > --- a/arch/riscv/errata/thead/errata.c > +++ b/arch/riscv/errata/thead/errata.c ... > @@ -44,6 +106,15 @@ static bool errata_probe_cmo(unsigned int stage, > > riscv_cbom_block_size = L1_CACHE_BYTES; > riscv_noncoherent_supported(); > + > + memset(&thead_cmo_ops, 0x0, sizeof(thead_cmo_ops)); > + if (IS_ENABLED(CONFIG_ERRATA_THEAD_CMO)) { > + thead_cmo_ops.clean_range = &thead_cmo_clean_range; > + thead_cmo_ops.inv_range = &thead_cmo_inval_range; > + thead_cmo_ops.flush_range = &thead_cmo_flush_range; > + riscv_noncoherent_register_cache_ops(&thead_cmo_ops); > + } The implementation here looks reasonable, just wonder whether the classification as an 'errata' makes sense. I would probably consider this a 'driver' at this point, but that's just a question of personal preference. For the operations structure, I think a 'static const struct riscv_cache_ops' is more intuitive than assigning the members individually. > + > +enum dma_noncoherent_ops { > + NON_COHERENT_SYNC_DMA_FOR_DEVICE = 0, > + NON_COHERENT_SYNC_DMA_FOR_CPU, > + NON_COHERENT_DMA_PREP, > + NON_COHERENT_DMA_PMEM, > +}; > + > +/* > + * struct riscv_cache_ops - Structure for CMO function pointers > + * @clean_range: Function pointer for clean cache > + * @inv_range: Function pointer for invalidate cache > + * @flush_range: Function pointer for flushing the cache > + * @riscv_dma_noncoherent_cmo_ops: Function pointer for platforms who > want > + * to handle CMO themselves. If this function pointer is set rest of > the > + * function pointers will be NULL. > + */ > +struct riscv_cache_ops { > + void (*clean_range)(unsigned long addr, unsigned long size); > + void (*inv_range)(unsigned long addr, unsigned long size); > + void (*flush_range)(unsigned long addr, unsigned long size); > + void (*riscv_dma_noncoherent_cmo_ops)(void *vaddr, size_t size, > + enum dma_data_direction dir, > + enum dma_noncoherent_ops ops); > +}; I don't quite see how the fourth operation is used here. Are there cache controllers that need something beyond clean/inv/flush? > + > +#else > + > +static void riscv_noncoherent_register_cache_ops(struct > riscv_cache_ops *ops) {} > + > +static inline void riscv_dma_noncoherent_clean(void *vaddr, size_t > size) {} > + > +static inline void riscv_dma_noncoherent_flush(void *vaddr, size_t > size) {} > + > +static inline void riscv_dma_noncoherent_inval(void *vaddr, size_t > size) {} I think you can drop the #else path here: if there is no noncoherent DMA, then nothing should ever call these functions, right? > + > +#ifdef CONFIG_RISCV_ISA_ZICBOM ... > +struct riscv_cache_ops zicbom_cmo_ops = { > + .clean_range = &zicbom_cmo_clean_range, > + .inv_range = &zicbom_cmo_inval_range, > + .flush_range = &zicbom_cmo_flush_range, > +}; > +#else > +struct riscv_cache_ops zicbom_cmo_ops = { > + .clean_range = NULL, > + .inv_range = NULL, > + .flush_range = NULL, > + .riscv_dma_noncoherent_cmo_ops = NULL, > +}; > +#endif > +EXPORT_SYMBOL(zicbom_cmo_ops); Same here: If the ZICBOM ISA is disabled, nothing should reference zicbom_cmo_ops. Also, since ZICBOM is a standard extension, I think it makes sense to always have it enabled, at least whenever noncoherent DMA is supported, that way it can be the default that gets used in the absence of any nonstandard cache controller. Arnd
On Fri, Jan 06, 2023 at 11:31:33PM +0100, Arnd Bergmann wrote: > On Fri, Jan 6, 2023, at 19:55, Prabhakar wrote: > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > +struct riscv_cache_ops zicbom_cmo_ops = { > > + .clean_range = &zicbom_cmo_clean_range, > > + .inv_range = &zicbom_cmo_inval_range, > > + .flush_range = &zicbom_cmo_flush_range, > > +}; > > +#else > > +struct riscv_cache_ops zicbom_cmo_ops = { > > + .clean_range = NULL, > > + .inv_range = NULL, > > + .flush_range = NULL, > > + .riscv_dma_noncoherent_cmo_ops = NULL, > > +}; > > +#endif > > +EXPORT_SYMBOL(zicbom_cmo_ops); > > Same here: If the ZICBOM ISA is disabled, nothing should > reference zicbom_cmo_ops. > Also, since ZICBOM is a standard > extension, I think it makes sense to always have it enabled, > at least whenever noncoherent DMA is supported, that way > it can be the default that gets used in the absence of any > nonstandard cache controller. While I think of it, this is not possible as Zicbom requires toolchain support whereas the alternative methods for non-coherent DMA do not. Thanks, Conor.
FWIW:
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
On Sat, Jan 7, 2023, at 00:29, Conor Dooley wrote: > On Fri, Jan 06, 2023 at 11:31:33PM +0100, Arnd Bergmann wrote: >> On Fri, Jan 6, 2023, at 19:55, Prabhakar wrote: >> > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> >> > +struct riscv_cache_ops zicbom_cmo_ops = { >> > + .clean_range = &zicbom_cmo_clean_range, >> > + .inv_range = &zicbom_cmo_inval_range, >> > + .flush_range = &zicbom_cmo_flush_range, >> > +}; >> > +#else >> > +struct riscv_cache_ops zicbom_cmo_ops = { >> > + .clean_range = NULL, >> > + .inv_range = NULL, >> > + .flush_range = NULL, >> > + .riscv_dma_noncoherent_cmo_ops = NULL, >> > +}; >> > +#endif >> > +EXPORT_SYMBOL(zicbom_cmo_ops); >> >> Same here: If the ZICBOM ISA is disabled, nothing should >> reference zicbom_cmo_ops. > >> Also, since ZICBOM is a standard >> extension, I think it makes sense to always have it enabled, >> at least whenever noncoherent DMA is supported, that way >> it can be the default that gets used in the absence of any >> nonstandard cache controller. > > While I think of it, this is not possible as Zicbom requires toolchain > support whereas the alternative methods for non-coherent DMA do not. Ah, I see. Would it be possible to use the same .long trick as in the other ones though? Something like #if CONFIG_AS_VERSION >= 23600 /* or whichever version */ /* proper inline asm */ #else /* .long hack */ #endif That way everyone can use it, and the hack would automatically go away in a few years after linux requires a newer toolchain. Alternatively, the entire noncoherent-dma support could be made to depend on whichever toolchain introduced Zicbom. Arnd
Hi Arnd, Thank you for the review. On Fri, Jan 6, 2023 at 10:31 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, Jan 6, 2023, at 19:55, Prabhakar wrote: > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > > > The current implementation of CMO was handled using the ALTERNATIVE_X() > > macro; this was manageable when there were a limited number of platforms > > using this. Now that we are having more and more platforms coming through > > with the CMO the use of the ALTERNATIVE_X() macro becomes unmanageable. > > > > To avoid such issues this patch switches to use of function pointers > > instead of ALTERNATIVE_X() macro for cache management (the only draw being > > performance over the previous approach). > > > > void (*clean_range)(unsigned long addr, unsigned long size); > > void (*inv_range)(unsigned long addr, unsigned long size); > > void (*flush_range)(unsigned long addr, unsigned long size); > > > > The above function pointers are provided to be overridden where platforms > > using standard approach and for platforms who want handle the operation > > based on the operation the below function pointer is provided: > > > > void (*riscv_dma_noncoherent_cmo_ops)(void *vaddr, size_t size, > > enum dma_data_direction dir, > > enum dma_noncoherent_ops ops); > > > > In the current patch we have moved the ZICBOM and T-Head CMO to use > > function pointers. > > > > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > This looks like a nice improvement! I have a few suggestions > for improvements, but no objections here. > > > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c > > index fac5742d1c1e..826b2ba3e61e 100644 > > --- a/arch/riscv/errata/thead/errata.c > > +++ b/arch/riscv/errata/thead/errata.c > ... > > @@ -44,6 +106,15 @@ static bool errata_probe_cmo(unsigned int stage, > > > > riscv_cbom_block_size = L1_CACHE_BYTES; > > riscv_noncoherent_supported(); > > + > > + memset(&thead_cmo_ops, 0x0, sizeof(thead_cmo_ops)); > > + if (IS_ENABLED(CONFIG_ERRATA_THEAD_CMO)) { > > + thead_cmo_ops.clean_range = &thead_cmo_clean_range; > > + thead_cmo_ops.inv_range = &thead_cmo_inval_range; > > + thead_cmo_ops.flush_range = &thead_cmo_flush_range; > > + riscv_noncoherent_register_cache_ops(&thead_cmo_ops); > > + } > > The implementation here looks reasonable, just wonder whether > the classification as an 'errata' makes sense. I would probably > consider this a 'driver' at this point, but that's just > a question of personal preference. > zicbom is a CPU feature that doesn't have any DT node and hence no driver and similarly for T-HEAD SoC. Also the arch_setup_dma_ops() happens quite early before driver probing due to which we get WARN() messages during bootup hence I have implemented it as errata; as errata patching happens quite early. > For the operations structure, I think a 'static const struct > riscv_cache_ops' is more intuitive than assigning the > members individually. OK. > > + > > +enum dma_noncoherent_ops { > > + NON_COHERENT_SYNC_DMA_FOR_DEVICE = 0, > > + NON_COHERENT_SYNC_DMA_FOR_CPU, > > + NON_COHERENT_DMA_PREP, > > + NON_COHERENT_DMA_PMEM, > > +}; > > + > > +/* > > + * struct riscv_cache_ops - Structure for CMO function pointers > > + * @clean_range: Function pointer for clean cache > > + * @inv_range: Function pointer for invalidate cache > > + * @flush_range: Function pointer for flushing the cache > > + * @riscv_dma_noncoherent_cmo_ops: Function pointer for platforms who > > want > > + * to handle CMO themselves. If this function pointer is set rest of > > the > > + * function pointers will be NULL. > > + */ > > +struct riscv_cache_ops { > > + void (*clean_range)(unsigned long addr, unsigned long size); > > + void (*inv_range)(unsigned long addr, unsigned long size); > > + void (*flush_range)(unsigned long addr, unsigned long size); > > + void (*riscv_dma_noncoherent_cmo_ops)(void *vaddr, size_t size, > > + enum dma_data_direction dir, > > + enum dma_noncoherent_ops ops); > > +}; > > I don't quite see how the fourth operation is used here. > Are there cache controllers that need something beyond > clean/inv/flush? > This is for platforms that dont follow standard cache operations (like done in patch 5/6) and there drivers decide on the operations depending on the ops and dir. > > + > > +#else > > + > > +static void riscv_noncoherent_register_cache_ops(struct > > riscv_cache_ops *ops) {} > > + > > +static inline void riscv_dma_noncoherent_clean(void *vaddr, size_t > > size) {} > > + > > +static inline void riscv_dma_noncoherent_flush(void *vaddr, size_t > > size) {} > > + > > +static inline void riscv_dma_noncoherent_inval(void *vaddr, size_t > > size) {} > > I think you can drop the #else path here: if there is no > noncoherent DMA, then nothing should ever call these > functions, right? > Yes it shouldn't be called. > > + > > +#ifdef CONFIG_RISCV_ISA_ZICBOM > ... > > +struct riscv_cache_ops zicbom_cmo_ops = { > > + .clean_range = &zicbom_cmo_clean_range, > > + .inv_range = &zicbom_cmo_inval_range, > > + .flush_range = &zicbom_cmo_flush_range, > > +}; > > +#else > > +struct riscv_cache_ops zicbom_cmo_ops = { > > + .clean_range = NULL, > > + .inv_range = NULL, > > + .flush_range = NULL, > > + .riscv_dma_noncoherent_cmo_ops = NULL, > > +}; > > +#endif > > +EXPORT_SYMBOL(zicbom_cmo_ops); > > Same here: If the ZICBOM ISA is disabled, nothing should > reference zicbom_cmo_ops. OK. Cheers, Prabhakar
On Sat, Jan 07, 2023 at 10:52:55PM +0100, Arnd Bergmann wrote: > On Sat, Jan 7, 2023, at 00:29, Conor Dooley wrote: > > On Fri, Jan 06, 2023 at 11:31:33PM +0100, Arnd Bergmann wrote: > >> On Fri, Jan 6, 2023, at 19:55, Prabhakar wrote: > >> > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > >> > +struct riscv_cache_ops zicbom_cmo_ops = { > >> > + .clean_range = &zicbom_cmo_clean_range, > >> > + .inv_range = &zicbom_cmo_inval_range, > >> > + .flush_range = &zicbom_cmo_flush_range, > >> > +}; > >> > +#else > >> > +struct riscv_cache_ops zicbom_cmo_ops = { > >> > + .clean_range = NULL, > >> > + .inv_range = NULL, > >> > + .flush_range = NULL, > >> > + .riscv_dma_noncoherent_cmo_ops = NULL, > >> > +}; > >> > +#endif > >> > +EXPORT_SYMBOL(zicbom_cmo_ops); > >> > >> Same here: If the ZICBOM ISA is disabled, nothing should > >> reference zicbom_cmo_ops. > > > >> Also, since ZICBOM is a standard > >> extension, I think it makes sense to always have it enabled, > >> at least whenever noncoherent DMA is supported, that way > >> it can be the default that gets used in the absence of any > >> nonstandard cache controller. > > > > While I think of it, this is not possible as Zicbom requires toolchain > > support whereas the alternative methods for non-coherent DMA do not. > > Ah, I see. Would it be possible to use the same .long trick > as in the other ones though? Something like > > #if CONFIG_AS_VERSION >= 23600 /* or whichever version */ > /* proper inline asm */ > #else > /* .long hack */ > #endif > > That way everyone can use it, and the hack would automatically > go away in a few years after linux requires a newer toolchain. > Alternatively, the entire noncoherent-dma support could be > made to depend on whichever toolchain introduced Zicbom. Ehh, I don't think that's a great idea. It'd require far too recent a toolchain IMO. Ideally, in my opinion, we'd just do something like what Drew has proposed for Zicboz, negating the need for a check at all: https://lore.kernel.org/linux-riscv/20221027130247.31634-4-ajones@ventanamicro.com/ Been waiting for that to be re-spun and Palmer to accept it before doing the same thing for Zicbom. At present we have this in the arch Kconfig: config TOOLCHAIN_HAS_ZICBOM bool default y depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zicbom) depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zicbom) depends on LLD_VERSION >= 150000 || LD_VERSION >= 23800 config RISCV_ISA_ZICBOM bool "Zicbom extension support for non-coherent DMA operation" depends on TOOLCHAIN_HAS_ZICBOM The linker version check is entirely due to the linker having issues if it sees zicbom in the ISA string in object files. I'd been intending to do that for Zicbom anyway, so I guess I'll just go do it & Prabhakar can attach it to his v7.. Thanks, Conor.
Hi Conor, Thank you for the review. On Fri, Jan 6, 2023 at 11:48 PM Conor Dooley <conor@kernel.org> wrote: > > +CC Icenowy > > Hey Prabhakar, > > On Fri, Jan 06, 2023 at 06:55:21PM +0000, Prabhakar wrote: > > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > > > The current implementation of CMO was handled using the ALTERNATIVE_X() > > macro; this was manageable when there were a limited number of platforms > > using this. Now that we are having more and more platforms coming through > > with the CMO the use of the ALTERNATIVE_X() macro becomes unmanageable. > > Nitpick time! This opening paragraph is all a little oddly worded IMO. > How about: > > Currently, selecting which CMOs to use on a given platform is done using > and ALTERNATIVE_X() macro. This was manageable when there were just two > CMO implementations, but now that there are more and more platforms coming > needing custom CMOs, the use of the ALTERNATIVE_X() macro is unmanageable. > sounds good. > > To avoid such issues this patch switches to use of function pointers > > instead of ALTERNATIVE_X() macro for cache management (the only draw being > > s/draw/drawback > oops. > > performance over the previous approach). > > Did you notice a performance decrease or are you speculating? > Perhaps Icenowy - who I know benched their implmentation of a new CMO > macro for THEAD stuff can do so again for this. > I actually haven't benchmarked the speeds to tbh I am speculating it because of the additional checks. Maybe If I export the functions instead of having them in the header and have static keys for each callback maybe that will reduce some lag if any. Does that sound good? > > void (*clean_range)(unsigned long addr, unsigned long size); > > void (*inv_range)(unsigned long addr, unsigned long size); > > void (*flush_range)(unsigned long addr, unsigned long size); > > > > The above function pointers are provided to be overridden where platforms > > using standard approach and for platforms who want handle the operation > > based on the operation the below function pointer is provided: > > Think you've left out some words here chief! > Perhaps s/and for platforms who/. For platforms that/ & on the line > after, s/operation/direction/ ? > I'll re-word it. > > void (*riscv_dma_noncoherent_cmo_ops)(void *vaddr, size_t size, > > enum dma_data_direction dir, > > enum dma_noncoherent_ops ops); > > > > In the current patch we have moved the ZICBOM and T-Head CMO to use > > function pointers. > > "Convert ZICBOM..." > OK. > > Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > --- > > v5->v6 > > * New patch > > --- > > arch/riscv/errata/thead/errata.c | 71 ++++++++++++++++++ > > arch/riscv/include/asm/dma-noncoherent.h | 83 +++++++++++++++++++++ > > arch/riscv/include/asm/errata_list.h | 53 ------------- > > arch/riscv/kernel/cpufeature.c | 2 + > > arch/riscv/mm/dma-noncoherent.c | 94 ++++++++++++++++++++++-- > > arch/riscv/mm/pmem.c | 18 ++++- > > 6 files changed, 260 insertions(+), 61 deletions(-) > > create mode 100644 arch/riscv/include/asm/dma-noncoherent.h > > > > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c > > index fac5742d1c1e..826b2ba3e61e 100644 > > --- a/arch/riscv/errata/thead/errata.c > > +++ b/arch/riscv/errata/thead/errata.c > > @@ -8,12 +8,72 @@ > > #include <linux/module.h> > > #include <linux/string.h> > > #include <linux/uaccess.h> > > +#include <asm/dma-noncoherent.h> > > #include <asm/alternative.h> > > #include <asm/cacheflush.h> > > #include <asm/errata_list.h> > > #include <asm/patch.h> > > #include <asm/vendorid_list.h> > > > > +#ifdef CONFIG_ERRATA_THEAD_CMO > > +/* > > + * dcache.ipa rs1 (invalidate, physical address) > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0000001 01010 rs1 000 00000 0001011 > > + * dache.iva rs1 (invalida, virtual address) > > + * 0000001 00110 rs1 000 00000 0001011 > > + * > > + * dcache.cpa rs1 (clean, physical address) > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0000001 01001 rs1 000 00000 0001011 > > + * dcache.cva rs1 (clean, virtual address) > > + * 0000001 00100 rs1 000 00000 0001011 > > + * > > + * dcache.cipa rs1 (clean then invalidate, physical address) > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0000001 01011 rs1 000 00000 0001011 > > + * dcache.civa rs1 (... virtual address) > > + * 0000001 00111 rs1 000 00000 0001011 > > + * > > + * sync.s (make sure all cache operations finished) > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0000000 11001 00000 000 00000 0001011 > > + */ > > +#define THEAD_inval_A0 ".long 0x0265000b" > > +#define THEAD_clean_A0 ".long 0x0245000b" > > +#define THEAD_flush_A0 ".long 0x0275000b" > > +#define THEAD_SYNC_S ".long 0x0190000b" > > + > > +#define THEAD_CMO_OP(_op, _start, _size, _cachesize) \ > > + asm volatile("mv a0, %1\n\t" \ > > + "j 2f\n\t" \ > > + "3:\n\t" \ > > + THEAD_##_op##_A0 "\n\t" \ > > + "add a0, a0, %0\n\t" \ > > + "2:\n\t" \ > > + "bltu a0, %2, 3b\n\t" \ > > + THEAD_SYNC_S \ > > + : : "r"(_cachesize), \ > > + "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)), \ > > + "r"((unsigned long)(_start) + (_size)) \ > > + : "a0") > > I'm not going to provide a tested-by for the THEAD stuff, as I have no > metrics - but I booted my usual NFS and did some tyre kicking. Seemed > grand. > \o/ > > +static void thead_cmo_clean_range(unsigned long addr, unsigned long size) > > +{ > > + THEAD_CMO_OP(clean, addr, size, riscv_cbom_block_size); > > +} > > + > > +static void thead_cmo_flush_range(unsigned long addr, unsigned long size) > > +{ > > + THEAD_CMO_OP(flush, addr, size, riscv_cbom_block_size); > > +} > > + > > +static void thead_cmo_inval_range(unsigned long addr, unsigned long size) > > +{ > > + THEAD_CMO_OP(inval, addr, size, riscv_cbom_block_size); > > +} > > +#endif > > + > > static bool errata_probe_pbmt(unsigned int stage, > > unsigned long arch_id, unsigned long impid) > > { > > @@ -33,6 +93,8 @@ static bool errata_probe_pbmt(unsigned int stage, > > static bool errata_probe_cmo(unsigned int stage, > > unsigned long arch_id, unsigned long impid) > > { > > + struct riscv_cache_ops thead_cmo_ops; > > + > > if (!IS_ENABLED(CONFIG_ERRATA_THEAD_CMO)) > > return false; > > > > @@ -44,6 +106,15 @@ static bool errata_probe_cmo(unsigned int stage, > > > > riscv_cbom_block_size = L1_CACHE_BYTES; > > riscv_noncoherent_supported(); > > + > > + memset(&thead_cmo_ops, 0x0, sizeof(thead_cmo_ops)); > > + if (IS_ENABLED(CONFIG_ERRATA_THEAD_CMO)) { > > With CONFIG_ERRATA_THEAD_CMO=n: > > /stuff/linux/arch/riscv/errata/thead/errata.c: In function 'errata_probe_cmo': > /stuff/linux/arch/riscv/errata/thead/errata.c:112:46: error: 'thead_cmo_clean_range' undeclared (first use in this function) > 112 | thead_cmo_ops.clean_range = &thead_cmo_clean_range; > | ^~~~~~~~~~~~~~~~~~~~~ > /stuff/linux/arch/riscv/errata/thead/errata.c:112:46: note: each undeclared identifier is reported only once for each function it appears in > /stuff/linux/arch/riscv/errata/thead/errata.c:113:44: error: 'thead_cmo_inval_range' undeclared (first use in this function) > 113 | thead_cmo_ops.inv_range = &thead_cmo_inval_range; > | ^~~~~~~~~~~~~~~~~~~~~ > /stuff/linux/arch/riscv/errata/thead/errata.c:114:46: error: 'thead_cmo_flush_range' undeclared (first use in this function) > 114 | thead_cmo_ops.flush_range = &thead_cmo_flush_range; > | ^~~~~~~~~~~~~~~~~~~~~ > > These functions are only present if CONFIG_ERRATA_THEAD_CMO is set, > so this cannot be an IS_ENABLED() as things stand. > OK, I'll fix this. > > + thead_cmo_ops.clean_range = &thead_cmo_clean_range; > > + thead_cmo_ops.inv_range = &thead_cmo_inval_range; > > + thead_cmo_ops.flush_range = &thead_cmo_flush_range; > > As Arnd suggested, I'd rather see a `static const struct > riscv_cache_ops` type deal too, so you can just call register() > directly. > Sure will do that. > > + riscv_noncoherent_register_cache_ops(&thead_cmo_ops); > > + } > > + > > return true; > > } > > > > diff --git a/arch/riscv/include/asm/dma-noncoherent.h b/arch/riscv/include/asm/dma-noncoherent.h > > new file mode 100644 > > index 000000000000..a2af863d2608 > > --- /dev/null > > +++ b/arch/riscv/include/asm/dma-noncoherent.h > > @@ -0,0 +1,83 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +/* > > + * Copyright (C) 2022 Renesas Electronics Corp. > > + */ > > + > > +#ifndef __ASM_DMA_NONCOHERENT_H > > +#define __ASM_DMA_NONCOHERENT_H > > + > > +#include <linux/dma-direct.h> > > + > > +enum dma_noncoherent_ops { > > + NON_COHERENT_SYNC_DMA_FOR_DEVICE = 0, > > + NON_COHERENT_SYNC_DMA_FOR_CPU, > > + NON_COHERENT_DMA_PREP, > > + NON_COHERENT_DMA_PMEM, > > +}; > > + > > +/* > > + * struct riscv_cache_ops - Structure for CMO function pointers > > Drop the "function pointers" from everything here IMO. > OK. > > + * @clean_range: Function pointer for clean cache > > + * @inv_range: Function pointer for invalidate cache > > + * @flush_range: Function pointer for flushing the cache > > + * @riscv_dma_noncoherent_cmo_ops: Function pointer for platforms who want > > + * to handle CMO themselves. If this function pointer is set rest of the > > + * function pointers will be NULL. > > Will be NULL? Or must be NULL? > Assuming you keep it, as I see Arnd is questioning it, I think a better > description is needed for this one. And a rename of the function name, > the one you have right now is oddly juxtaposed and a bit confusing. > I don't really have something much better to suggest, so I am gonna use > cmo_omnibus here... > > @cmo_omnibus: For platforms whose CMO capabilities do not align with the > standard cache management operations. If set, the specific > ops must be left unset. > OK, I'll update the description as above. > Circling back ages later, cmo_universal()? > Sounds good to me. > > +struct riscv_cache_ops { > > + void (*clean_range)(unsigned long addr, unsigned long size); > > + void (*inv_range)(unsigned long addr, unsigned long size); > > + void (*flush_range)(unsigned long addr, unsigned long size); > > + void (*riscv_dma_noncoherent_cmo_ops)(void *vaddr, size_t size, > > + enum dma_data_direction dir, > > + enum dma_noncoherent_ops ops); > > +}; > > + > > +extern struct riscv_cache_ops zicbom_cmo_ops; > > + > > +#ifdef CONFIG_RISCV_DMA_NONCOHERENT > > + > > +extern struct riscv_cache_ops noncoherent_cache_ops; > > + > > +void riscv_noncoherent_register_cache_ops(struct riscv_cache_ops *ops); > > + > > +static inline void riscv_dma_noncoherent_clean(void *vaddr, size_t size) > > +{ > > + if (noncoherent_cache_ops.clean_range) { > > + unsigned long addr = (unsigned long)vaddr; > > + > > + noncoherent_cache_ops.clean_range(addr, size); > > + } > > +} > > + > > +static inline void riscv_dma_noncoherent_flush(void *vaddr, size_t size) > > +{ > > + if (noncoherent_cache_ops.flush_range) { > > + unsigned long addr = (unsigned long)vaddr; > > + > > + noncoherent_cache_ops.flush_range(addr, size); > > + } > > +} > > + > > +static inline void riscv_dma_noncoherent_inval(void *vaddr, size_t size) > > +{ > > + if (noncoherent_cache_ops.inv_range) { > > + unsigned long addr = (unsigned long)vaddr; > > + > > + noncoherent_cache_ops.inv_range(addr, size); > > + } > > +} > > + > > +#else > > + > > +static void riscv_noncoherent_register_cache_ops(struct riscv_cache_ops *ops) {} > > + > > +static inline void riscv_dma_noncoherent_clean(void *vaddr, size_t size) {} > > + > > +static inline void riscv_dma_noncoherent_flush(void *vaddr, size_t size) {} > > + > > +static inline void riscv_dma_noncoherent_inval(void *vaddr, size_t size) {} > > Can these ever be used if DMA_NONCOHERENT is not set? I think > riscv/mm/Makefile gates their usage on the symbol, no? > I think I can get rid of these. > > + > > +#endif > > + > > +#endif /* __ASM_DMA_NONCOHERENT_H */ > > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h > > index 4180312d2a70..ae3fc8b80edd 100644 > > [...] > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > > index d919efab6eba..d9445c266bfd 100644 > > --- a/arch/riscv/mm/dma-noncoherent.c > > +++ b/arch/riscv/mm/dma-noncoherent.c > > @@ -9,23 +9,82 @@ > > #include <linux/dma-map-ops.h> > > #include <linux/mm.h> > > #include <asm/cacheflush.h> > > +#include <asm/dma-noncoherent.h> > > > > static bool noncoherent_supported; > > > > +struct riscv_cache_ops noncoherent_cache_ops = { > > + .clean_range = NULL, > > + .inv_range = NULL, > > + .flush_range = NULL, > > + .riscv_dma_noncoherent_cmo_ops = NULL, > > +}; > > +EXPORT_SYMBOL(noncoherent_cache_ops); > > These exports should be _GPL, no? The file is GPLed. Ditto the others. > Agreed. > > +#ifdef CONFIG_RISCV_ISA_ZICBOM > > +#define ZICBOM_CMO_OP(_op, _start, _size, _cachesize) \ > > + asm volatile("mv a0, %1\n\t" \ > > + "j 2f\n\t" \ > > + "3:\n\t" \ > > + "cbo." __stringify(_op) " (a0)\n\t" \ > > + "add a0, a0, %0\n\t" \ > > + "2:\n\t" \ > > + "bltu a0, %2, 3b\n\t" \ > > + : : "r"(_cachesize), \ > > + "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)), \ > > + "r"((unsigned long)(_start) + (_size)) \ > > + : "a0") > > + > > +static void zicbom_cmo_clean_range(unsigned long addr, unsigned long size) > > +{ > > + ZICBOM_CMO_OP(clean, addr, size, riscv_cbom_block_size); > > +} > > + > > +static void zicbom_cmo_flush_range(unsigned long addr, unsigned long size) > > +{ > > + ZICBOM_CMO_OP(flush, addr, size, riscv_cbom_block_size); > > +} > > + > > +static void zicbom_cmo_inval_range(unsigned long addr, unsigned long size) > > +{ > > + ZICBOM_CMO_OP(inval, addr, size, riscv_cbom_block_size); > > +} > > + > > +struct riscv_cache_ops zicbom_cmo_ops = { > > + .clean_range = &zicbom_cmo_clean_range, > > + .inv_range = &zicbom_cmo_inval_range, > > + .flush_range = &zicbom_cmo_flush_range, > > +}; > > +#else > > +struct riscv_cache_ops zicbom_cmo_ops = { > > + .clean_range = NULL, > > + .inv_range = NULL, > > + .flush_range = NULL, > > + .riscv_dma_noncoherent_cmo_ops = NULL, > > +}; > > +#endif > > +EXPORT_SYMBOL(zicbom_cmo_ops); > > + > > void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, > > enum dma_data_direction dir) > > { > > void *vaddr = phys_to_virt(paddr); > > > > + if (noncoherent_cache_ops.riscv_dma_noncoherent_cmo_ops) { > > + noncoherent_cache_ops.riscv_dma_noncoherent_cmo_ops(vaddr, size, dir, > > + NON_COHERENT_SYNC_DMA_FOR_DEVICE); > > + return; > > + } > > + > > switch (dir) { > > case DMA_TO_DEVICE: > > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > > + riscv_dma_noncoherent_clean(vaddr, size); > > break; > > case DMA_FROM_DEVICE: > > - ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); > > + riscv_dma_noncoherent_clean(vaddr, size); > > break; > > case DMA_BIDIRECTIONAL: > > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > > + riscv_dma_noncoherent_flush(vaddr, size); > > break; > > default: > > break; > > @@ -37,12 +96,18 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, > > { > > void *vaddr = phys_to_virt(paddr); > > > > + if (noncoherent_cache_ops.riscv_dma_noncoherent_cmo_ops) { > > + noncoherent_cache_ops.riscv_dma_noncoherent_cmo_ops(vaddr, size, dir, > > + NON_COHERENT_SYNC_DMA_FOR_CPU); > > + return; > > + } > > + > > switch (dir) { > > case DMA_TO_DEVICE: > > break; > > case DMA_FROM_DEVICE: > > case DMA_BIDIRECTIONAL: > > - ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); > > + riscv_dma_noncoherent_flush(vaddr, size); > > break; > > default: > > break; > > @@ -53,7 +118,13 @@ void arch_dma_prep_coherent(struct page *page, size_t size) > > { > > void *flush_addr = page_address(page); > > > > - ALT_CMO_OP(flush, flush_addr, size, riscv_cbom_block_size); > > + if (noncoherent_cache_ops.riscv_dma_noncoherent_cmo_ops) { > > + noncoherent_cache_ops.riscv_dma_noncoherent_cmo_ops(flush_addr, size, -1, > > + NON_COHERENT_DMA_PREP); > > + return; > > + } > > + > > + riscv_dma_noncoherent_flush(flush_addr, size); > > } > > > > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > > @@ -78,3 +149,16 @@ void riscv_noncoherent_supported(void) > > "Non-coherent DMA support enabled without a block size\n"); > > noncoherent_supported = true; > > } > > Barring the function name, which I really don't like - it really is > confusingly named. Something like cmo_universal() would get it's point > across a bit better I think. > OK, I'll rename riscv_dma_noncoherent_cmo_ops to cmo_universal. > > +void riscv_noncoherent_register_cache_ops(struct riscv_cache_ops *ops) > > +{ > > + if (!ops) > > + return; > > + > > + if (ops->riscv_dma_noncoherent_cmo_ops) > > + noncoherent_cache_ops.riscv_dma_noncoherent_cmo_ops = > > + ops->riscv_dma_noncoherent_cmo_ops; > > + else > > + noncoherent_cache_ops = *ops; > > Can we just chop off the extra step here? Behaviourally this is going to > be no different whether someone sets the universal op and the individual > ones anyway. > Agreed. > > +} > > +EXPORT_SYMBOL(riscv_noncoherent_register_cache_ops); > > _GPL again I think! > OK. > I'm happy with this approach, modulo the minor bits I've pointed out. > I do really like that it takes us away from messing with an asm > alternative every time someone wants to do this stuff differently or for > a new platform. > \o/ > I would like Heiko to have a look at what you've done here, but ye looks > promising IMO. > > Thanks, > Conor. > Cheers, Prabhakar
On Sat, Jan 7, 2023, at 23:10, Lad, Prabhakar wrote: >> > + >> > + memset(&thead_cmo_ops, 0x0, sizeof(thead_cmo_ops)); >> > + if (IS_ENABLED(CONFIG_ERRATA_THEAD_CMO)) { >> > + thead_cmo_ops.clean_range = &thead_cmo_clean_range; >> > + thead_cmo_ops.inv_range = &thead_cmo_inval_range; >> > + thead_cmo_ops.flush_range = &thead_cmo_flush_range; >> > + riscv_noncoherent_register_cache_ops(&thead_cmo_ops); >> > + } >> >> The implementation here looks reasonable, just wonder whether >> the classification as an 'errata' makes sense. I would probably >> consider this a 'driver' at this point, but that's just >> a question of personal preference. >> > zicbom is a CPU feature that doesn't have any DT node and hence no > driver and similarly for T-HEAD SoC. A driver does not have to be a 'struct platform_driver' that matches to a device node, my point was more about what to name it, regardless of how the code is entered. > Also the arch_setup_dma_ops() > happens quite early before driver probing due to which we get WARN() > messages during bootup hence I have implemented it as errata; as > errata patching happens quite early. But there is no more patching here, just setting the function pointers, right? >> > +struct riscv_cache_ops { >> > + void (*clean_range)(unsigned long addr, unsigned long size); >> > + void (*inv_range)(unsigned long addr, unsigned long size); >> > + void (*flush_range)(unsigned long addr, unsigned long size); >> > + void (*riscv_dma_noncoherent_cmo_ops)(void *vaddr, size_t size, >> > + enum dma_data_direction dir, >> > + enum dma_noncoherent_ops ops); >> > +}; >> >> I don't quite see how the fourth operation is used here. >> Are there cache controllers that need something beyond >> clean/inv/flush? >> > This is for platforms that dont follow standard cache operations (like > done in patch 5/6) and there drivers decide on the operations > depending on the ops and dir. My feeling is that the set of operations that get called should not depend on the cache controller but at best the CPU. I tried to enumerate how zicbom and ax45 differ here, and how that compares to other architectures: zicbom ax45,mips,arc arm arm64 fromdevice clean/flush inval/inval inval/inval clean/inval todevice clean/- clean/- clean/- clean/- bidi flush/flush flush/inval clean/inval clean/inval So everyone does the same operation for DMA_TO_DEVICE, but they differ in the DMA_FROM_DEVICE handling, for reasons I don't quite see: Your ax45 code does the same as arc and mips. arm and arm64 skip invalidating the cache before bidi mappings, but arm has a FIXME comment about that. arm64 does a 'clean' instead of 'inval' when mapping a fromdevice page, which seems valid but slower than necessary. Could the zicbom operations be changed to do the same things as the ax45/mips/arc ones, or are there specific details in the zicbom spec that require this? Arnd
On Sat, Jan 07, 2023 at 10:21:52PM +0000, Conor Dooley wrote: > On Sat, Jan 07, 2023 at 10:52:55PM +0100, Arnd Bergmann wrote: > > On Sat, Jan 7, 2023, at 00:29, Conor Dooley wrote: > > > On Fri, Jan 06, 2023 at 11:31:33PM +0100, Arnd Bergmann wrote: > > >> On Fri, Jan 6, 2023, at 19:55, Prabhakar wrote: > > >> > From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> > > >> > +struct riscv_cache_ops zicbom_cmo_ops = { > > >> > + .clean_range = &zicbom_cmo_clean_range, > > >> > + .inv_range = &zicbom_cmo_inval_range, > > >> > + .flush_range = &zicbom_cmo_flush_range, > > >> > +}; > > >> > +#else > > >> > +struct riscv_cache_ops zicbom_cmo_ops = { > > >> > + .clean_range = NULL, > > >> > + .inv_range = NULL, > > >> > + .flush_range = NULL, > > >> > + .riscv_dma_noncoherent_cmo_ops = NULL, > > >> > +}; > > >> > +#endif > > >> > +EXPORT_SYMBOL(zicbom_cmo_ops); > > >> > > >> Same here: If the ZICBOM ISA is disabled, nothing should > > >> reference zicbom_cmo_ops. > > > > > >> Also, since ZICBOM is a standard > > >> extension, I think it makes sense to always have it enabled, > > >> at least whenever noncoherent DMA is supported, that way > > >> it can be the default that gets used in the absence of any > > >> nonstandard cache controller. > > > > > > While I think of it, this is not possible as Zicbom requires toolchain > > > support whereas the alternative methods for non-coherent DMA do not. > > > > Ah, I see. Would it be possible to use the same .long trick > > as in the other ones though? Something like > > > > #if CONFIG_AS_VERSION >= 23600 /* or whichever version */ > > > > /* proper inline asm */ > > #else > > /* .long hack */ > > #endif > > > > That way everyone can use it, and the hack would automatically > > go away in a few years after linux requires a newer toolchain. > > > Alternatively, the entire noncoherent-dma support could be > > made to depend on whichever toolchain introduced Zicbom. > > Ehh, I don't think that's a great idea. It'd require far too recent a > toolchain IMO. > > Ideally, in my opinion, we'd just do something like what Drew has > proposed for Zicboz, negating the need for a check at all: > https://lore.kernel.org/linux-riscv/20221027130247.31634-4-ajones@ventanamicro.com/ > > Been waiting for that to be re-spun and Palmer to accept it before doing > the same thing for Zicbom. At present we have this in the arch Kconfig: > > config TOOLCHAIN_HAS_ZICBOM > bool > default y > depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zicbom) > depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zicbom) > depends on LLD_VERSION >= 150000 || LD_VERSION >= 23800 > > config RISCV_ISA_ZICBOM > bool "Zicbom extension support for non-coherent DMA operation" > depends on TOOLCHAIN_HAS_ZICBOM > > The linker version check is entirely due to the linker having issues if > it sees zicbom in the ISA string in object files. > > I'd been intending to do that for Zicbom anyway, so I guess I'll just go > do it & Prabhakar can attach it to his v7.. Should pop up here in a few minutes.. https://lore.kernel.org/linux-riscv/20230108163356.3063839-1-conor@kernel.org/ Hopefully that both works & makes life easier. Certainly from a CI coverage point of view, relaxing toolchain requirements makes *my* life easier!
On Mon, Jan 09, 2023 at 01:59:12PM +0100, Arnd Bergmann wrote: > I had another look at the arm64 side, which (like the zicbom > variant) uses 'clean' on dma_sync_single_for_device(DMA_FROM_DEVICE), > as that has changed not that long ago, see > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c50f11c6196f45c92ca48b16a5071615d4ae0572 which IIRC has been reverted recently. > I'm still not sure what the correct set of operations has > to be, but nothing in that patch description sounds ISA > or even microarchitecture specific. Nothing is ISA specific, and the only micro architecture related thing is if the specific core can speculate memory accesses. See the table in arch/arc/mm/dma.c for details. And as commented on the arm64 patch I really hate architectures getting creative here, as I'd much prefer to move the choice of primitives to the core DMA code and just provide helpers to invalidate/writeback/ writeback+invalidate from the architectures eventually.
On Tue, Jan 10, 2023, at 08:01, Christoph Hellwig wrote: > On Mon, Jan 09, 2023 at 01:59:12PM +0100, Arnd Bergmann wrote: >> I had another look at the arm64 side, which (like the zicbom >> variant) uses 'clean' on dma_sync_single_for_device(DMA_FROM_DEVICE), >> as that has changed not that long ago, see >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c50f11c6196f45c92ca48b16a5071615d4ae0572 > > which IIRC has been reverted recently. To clarify: I was looking at arch_sync_dma_for_device(), which changed from 'invalidate' to 'clean' last June in commit c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer"). I don't see a revert for that. The one that was reverted recently is arch_dma_prep_coherent, which was changed and reverted in c44094eee32 Aug 23 2022 flush->clean b7d9aae4048 Dec 6 2022 clean->flush I'm primarily interested in the streaming mappings (arch_sync_*) at the moment. >> I'm still not sure what the correct set of operations has >> to be, but nothing in that patch description sounds ISA >> or even microarchitecture specific. > > Nothing is ISA specific, and the only micro architecture related thing > is if the specific core can speculate memory accesses. See the table > in arch/arc/mm/dma.c for details. > > And as commented on the arm64 patch I really hate architectures getting > creative here, as I'd much prefer to move the choice of primitives to > the core DMA code and just provide helpers to invalidate/writeback/ > writeback+invalidate from the architectures eventually. Agreed, that's why I particularly don't like the idea of giving SoC specific L2$ drivers the option of making custom choices here. The arch_dma_prep_coherent() change is arguably arm64 specific because it is only needed for machines sharing memory with EL3 firmware that enforces buffer ownership, but even for that it would be nice to have a central definition and some architecture specific flag that picks one behavior or the other. Same thing for the speculative access difference. I looked at all the implementations now and put them in a table[1] to see what the differences are. The only bit that I think needs discussion is the dma_sync_single_for_device(DMA_FROM_DEVICE) op that I mentioned above. I see that arm64, csky, powerpc, riscv and parisc all write out at least partical cache lines first to avoid losing dirty data in the part that is not written by the device[2][3], while the other ones don't[4]. The other differences I found are: - arm and arm64 use clean instead of flush for dma_sync_single_for_device(DMA_BIDIRECTIONAL). This seems harmless, since there is another invalidation at the _for_cpu() step. - hexagon, m68k, openrisc and sh don't deal with speculative loads - m68k, nios2, openrisc, parisc and xtensa use flush instead of clean for DMA_TO_DEVICE, presumably because they don't have a flush without invalidate. - microblaze, parisc and powerpc use the same function for _for_device and _for_cpu, which is slightly wasteful but harmless. - openrisc lacks support for bidirectional mappings, which is a bug - sparc32 and xtensa only supports writethrough caches and consequently skips the clean before DMA but instead invalidate the buffer in the _for_cpu operation. I also see that at least arc, arm, mips and riscv all want CPU specific cache management operations to be registered at boot time. While Russell had some concerns about your suggestion to generalize the arm version, we could start by moving the abstracted riscv version into kernel/dma/direct.c and make sure it can be shared with at least mips and arc, provided that we can agree on the DMA_FROM_DEVICE behavior. Arnd [1] https://docs.google.com/spreadsheets/d/1qDuMqB6TnRTj_CgUwgIIm_RJ6EZO76qohpTJUMQjUEo/edit#gid=0 [2] https://git.kernel.org/torvalds/c/c50f11c6196f [3] https://git.kernel.org/torvalds/c/03d70617b8a7 [4] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
On Tue, Jan 10, 2023 at 04:03:06PM +0100, Arnd Bergmann wrote: > On Tue, Jan 10, 2023, at 08:01, Christoph Hellwig wrote: > > On Mon, Jan 09, 2023 at 01:59:12PM +0100, Arnd Bergmann wrote: > >> I had another look at the arm64 side, which (like the zicbom > >> variant) uses 'clean' on dma_sync_single_for_device(DMA_FROM_DEVICE), > >> as that has changed not that long ago, see > >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c50f11c6196f45c92ca48b16a5071615d4ae0572 > > > > which IIRC has been reverted recently. > > To clarify: I was looking at arch_sync_dma_for_device(), which > changed from 'invalidate' to 'clean' last June in commit > c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers > at start of DMA transfer"). I don't see a revert for that. > > The one that was reverted recently is arch_dma_prep_coherent, which > was changed and reverted in > > c44094eee32 Aug 23 2022 flush->clean > b7d9aae4048 Dec 6 2022 clean->flush > > I'm primarily interested in the streaming mappings (arch_sync_*) > at the moment. Just as an FYI, but we plan to revert the revert (i.e. go back to 'clean') here once Qualcomm's modem firmware loader has been updated: https://lore.kernel.org/r/20230109034843.23759-1-quic_sibis@quicinc.com Will
On Fri, Jan 20, 2023 at 06:04:37PM +0100, Arnd Bergmann wrote: > Having looked at this some more, I see that the powerpc > version is a bit problematic here as well: this one > flushes the partial cache lines before and after the > DMA transfer, while only invalidating the full > cache lines. That feels really odd, and might be worth a bug report to the PPC maintainers. > Obviously there is no winning either way if the same > cache line gets written by both CPU and device, I'm > just trying to figure out what behavior we actually > want here. There isn't, and that's why we require DMAed regions to be cache line aligned. > Aside from the question for how to handle flush vs invalidate > on DMA_FROM_DEVICE, I'm still trying to figure out how to > best handle highmem with architecture specific cache management > operations. The easy approach would be to leave that up > to the architecture, passing only a physical address to > the flush function. I suspect that is a good enough first step. Especially as I remember that some architectures have physical address based cache management anyway (unless we removed them in the meantime). > A nicer interface might be to move the > loop over highmem pages out into common code, flush > lowmem pages by virtual addresss, and have a separate > callback for highmem pages that takes a page pointer, > like I'd rather avoid multiple callbacks if we can. But maybe solve the simple problem first and just pass the paddr and then iterate from there. > > struct dma_cache_ops { > void (*dma_cache_wback_inv)(void *start, unsigned long sz); > void (*dma_cache_inv)(void *start, unsigned long sz); > void (*dma_cache_wback)(void *start, unsigned long sz); > #ifdef CONFIG_HIGHMEM > void (*dma_cache_wback_inv_high_page)(struct page *, size_t start, unsigned long sz); > void (*dma_cache_inv_high_page)(struct page *, size_t start, unsigned long sz); > void (*dma_cache_wback_high_page)(struct page *, size_t start, unsigned long sz); Btw, I really don't think these should be indirect calls. For sane architectures there should be exactly one way to call them, and the onces that have different implementations really should be using alternatives instead of expensive indirect calls.
On Sat, Jan 21, 2023 at 08:30:23PM +0100, Arnd Bergmann wrote: > > That feels really odd, and might be worth a bug report to the > > PPC maintainers. > > Right, my first step would be to change all of the current > outliers to use the same set of operations where possible. Sounds good. > > I'd rather avoid multiple callbacks if we can. But maybe solve > > the simple problem first and just pass the paddr and then > > iterate from there. > > Ok, fair enough. This means we can't easily put the kmap_atomic() > into common code for highmem, though the per-page loop would > still work. Yes. Given how messy many of the ops are I think one step at a time is always good. > I was thinking of using STATIC_CALL() as an optimization here, which > I find easier to read and understand than alternatives. One advantage > here is that this allows the actual cache operations to be declared > locally in the architecture without letting drivers call them, > but still update the common code to work without indirect branches. > > The main downside is that this is currently only optimized on > powerpc and x86, both of which don't actually need CPU specific > callbacks. ARC, ARM, and MIPS on the other hand already > have indirect function pointers, RISC-V would likely benefit the > most from either alternatives or static_call, as it already > uses alternatives and has one implementation that is clearly > preferred over the others. For now I'd just keep doing direct calls into the arch code, just for the lower level invalidate, writeback, invalidate+writeback calls as that helps cementinc the logic of which of those to use in well documented core code. And I'm not really sure I'd like to go beyond that - making it too easy pluggable will make people feel more comfortable doing stupid things here. And yes, maybe that's personal because I've warned the RISC-V people years ago that they'll need architectural cache management instructions yesterday and the answer was that no one is going to use them on modern CPUs. *sigh*
On Sun, Jan 22, 2023 at 12:04:35PM +0100, Arnd Bergmann wrote: > > And I'm not really sure I'd like to go beyond that - making it too > > easy pluggable will make people feel more comfortable doing stupid > > things here. > > I fear the bigger risk is still making the functions callable > from device driver code than it is to make the functions > globally settable. > > You introduced the mips version in f8c55dc6e828 ("MIPS: use generic > dma noncoherent ops for simple noncoherent platforms"), which > was clearly meant as an implementation detail, yet we already > have a driver that slipped in with 3bdffa8ffb45 ("Input: Add > N64 controller driver") that just calls this directly rather > than using the dma-mapping interface. MIPS actually has a bit of a history of these odd bypasses that it seems like this driver copied. rmk has been very worried by this bypassing, and in general I agree. But there's only so much we can do except for auditing drivers. Especially as none of these helpers is exported and built-in only drivers are quite rare. > > And yes, maybe that's personal because I've warned > > the RISC-V people years ago that they'll need architectural > > cache management instructions yesterday and the answer was that > > no one is going to use them on modern CPUs. *sigh* > > To be fair, from the ISA point of view, it really shouldn't > be necessary as long as you have a sane SoC design. > In practice there are always chips that are cutting corners, > or use the new CPU core as a drop-in for an existing > design. Arm SBSA tried to enforce the same thing and also > failed for pretty much the same reason. Not wiring up IP blocks for cache coherency is cheap. So it totally makes sense for dirt cheap SOCs, or even for low performance periphals in general. The GPU folks also believe they can make some things faster by deliberately turning coherency off on SOCs that support coherency. In addition to that cache maintainance is absolutely needed for NVDIMM support.
From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Hi All, RISC-V non-coherent function pointer based cache management operations ======================================================================= As based on the previous discussion [0]. [1], It was suggested upon the below: * Move out cache drivers from drivers/soc/folder * Implement function pointers for CMO instead of ALTERNATIVE_x() macro * Add cache drivers to drivers/cache folder [0] https://lore.kernel.org/all/43aee000-5b89-4d94-98d2-b37b1a18a83e@app.fastmail.com/ [1] https://lore.kernel.org/linux-riscv/Y62nOqzyuUKqYDpq@spud/#R This v6 version of the patch series add support to use function pointers for CMO and switches the current CMO implementations for zicbom and T-HEAD to use function pointers. Note, with the above approach we might see some performance degradation due to additional checks. ----------x---------------------x--------------------x---------------x-------------- non-coherent DMA support for AX45MP ==================================== On the Andes AX45MP core, cache coherency is a specification option so it may not be supported. In this case DMA will fail. To get around with this issue this patch series does the below: 1] Andes alternative ports is implemented as errata which checks if the IOCP is missing and only then applies to CMO errata. One vendor specific SBI EXT (RZFIVE_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. If we could access the DTB in errata I can get rid of this EXT ID from OpenSBI. Is there any approach we can access the DTB in patch callback? Below are the configs which Andes port provides (and are selected by RZ/Five): - ERRATA_ANDES - ERRATA_ANDES_CMO 2] Andes AX45MP core has a Programmable Physical Memory Attributes (PMA) block that allows dynamic adjustment of memory attributes in the runtime. It contains a configurable amount of PMA entries implemented as CSR registers to control the attributes of memory locations in interest. OpenSBI configures the PMA regions as required and creates a reserve memory node and propagates it to the higher boot stack. An RFC patch for OpenSBI is posted here: https://patchwork.ozlabs.org/project/opensbi/patch/20221212094421.14556-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; pma_resv0@58000000 { compatible = "shared-dma-pool"; reg = <0x0 0x58000000 0x0 0x08000000>; no-map; linux,dma-default; }; }; The above shared DMA pool gets appended to Linux DTB so the DMA memory requests go through this region. 3] We provide callbacks to synchronize specific content between memory and cache. - arch_sync_dma_for_device() - arch_sync_dma_for_cpu() 4] RZ/Five SoC selects the below configs - AX45MP_L2_CACHE - DMA_GLOBAL_POOL - ERRATA_ANDES - ERRATA_ANDES_CMO OpenSBI implementation patches can be found here: 1] https://patchwork.ozlabs.org/project/opensbi/cover/20221210103011.7814-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ 2]https://patchwork.ozlabs.org/project/opensbi/patch/20221212094421.14556-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ ----------x---------------------x--------------------x---------------x-------------- Note, - This series requires testing on Cores with zicbom and T-Head SoCs - Ive used GCC 12.2.0 for compilation (tested with allmodconfig) - Tested all the IP blocks on RZ/Five which use DMA v5.1 -> v6 * Dropped use of ALTERNATIVE_x() macro * Now switched to used function pointers for CMO * Moved driver to drivers/cache folder v5 -> v5.1 * https://patchwork.kernel.org/project/linux-riscv/list/?series=708610&state=%2A&archive=both v4 -> v5 * Rebased ALTERNATIVE_3() macro on top of Andrew's patches * Rebased the changes on top of Heiko's alternative call patches * Dropped configuring the PMA from Linux * Dropped configuring the L2 cache from Linux and dropped the binding for same * Now using runtime patching mechanism instead of compile time config RFC v3 -> v4 * Implemented ALTERNATIVE_3() macro * Now using runtime patching mechanism instead of compile time config * Added Andes CMO as and errata * Fixed comments pointed by Geert RFC v2-> RFC v3 * Fixed review comments pointed by Conor * Move DT binding into cache folder * Fixed DT binding check issue * Added andestech,ax45mp-cache.h header file * Now passing the flags for the PMA setup as part of andestech,pma-regions property. * Added andestech,inst/data-prefetch and andestech,tag/data-ram-ctl properties to configure the L2 cache. * Registered the cache driver as platform driver RFC v1-> RFC v2 * Moved out the code from arc/riscv to drivers/soc/renesas * Now handling the PMA setup as part of the L2 cache * Now making use of dma-noncoherent.c instead SoC specific implementation. * Dropped arch_dma_alloc() and arch_dma_free() * Switched to RISCV_DMA_NONCOHERENT * Included DT binding doc RFC v2: https://patchwork.kernel.org/project/linux-renesas-soc/cover/20221003223222.448551-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ RFC v1: https://patchwork.kernel.org/project/linux-renesas-soc/cover/20220906102154.32526-1-prabhakar.mahadev-lad.rj@bp.renesas.com/ Cheers, Prabhakar Lad Prabhakar (6): riscv: mm: dma-noncoherent: Switch using function pointers for cache management riscv: asm: vendorid_list: Add Andes Technology to the vendors list riscv: errata: Add Andes alternative ports dt-bindings: cache: r9a07g043f-l2-cache: Add DT binding documentation for L2 cache controller cache: Add L2 cache management for Andes AX45MP RISC-V core soc: renesas: Kconfig: Select the required configs for RZ/Five SoC .../cache/andestech,ax45mp-cache.yaml | 81 +++++ arch/riscv/Kconfig.erratas | 22 ++ arch/riscv/errata/Makefile | 1 + arch/riscv/errata/andes/Makefile | 1 + arch/riscv/errata/andes/errata.c | 71 +++++ arch/riscv/errata/thead/errata.c | 71 +++++ arch/riscv/include/asm/alternative.h | 3 + arch/riscv/include/asm/dma-noncoherent.h | 83 ++++++ arch/riscv/include/asm/errata_list.h | 53 ---- arch/riscv/include/asm/vendorid_list.h | 1 + arch/riscv/kernel/alternative.c | 5 + arch/riscv/kernel/cpufeature.c | 2 + arch/riscv/mm/dma-noncoherent.c | 94 +++++- arch/riscv/mm/pmem.c | 18 +- drivers/Kconfig | 2 + drivers/Makefile | 1 + drivers/cache/Kconfig | 10 + drivers/cache/Makefile | 3 + drivers/cache/ax45mp_cache.c | 279 ++++++++++++++++++ drivers/soc/renesas/Kconfig | 4 + 20 files changed, 744 insertions(+), 61 deletions(-) create mode 100644 Documentation/devicetree/bindings/cache/andestech,ax45mp-cache.yaml create mode 100644 arch/riscv/errata/andes/Makefile create mode 100644 arch/riscv/errata/andes/errata.c create mode 100644 arch/riscv/include/asm/dma-noncoherent.h create mode 100644 drivers/cache/Kconfig create mode 100644 drivers/cache/Makefile create mode 100644 drivers/cache/ax45mp_cache.c