diff mbox series

[RFC,3/5] dma-mapping: Enable global non-coherent pool support for RISC-V

Message ID 20210723214031.3251801-4-atish.patra@wdc.com
State New
Headers show
Series Support non-coherent DMA on RISC-V using a global pool | expand

Commit Message

Atish Patra July 23, 2021, 9:40 p.m. UTC
Currently, linux,dma-default is used to reserve a global non-coherent pool
to allocate memory for DMA operations. This can be useful for RISC-V as
well, since the ISA specification does not yet specify a method to modify
PMA attributes or page table entries to mark a region non-cacheable.
A non-cacheable memory window is an alternative option for vendors to
support non-coherent devices. "dma-ranges" must be used in conjunction with
the "linux,dma-default" property to define one or more mappings between
device and CPU accessible memory regions.

This allows RISC-V to use the global pool on non-coherent platforms that
rely on an uncached memory region outside of system RAM.
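For illustration, the intended usage would look roughly like the following reserved-memory node. Everything here is hypothetical (node name, addresses, and sizes are loosely modelled on the beagleV layout discussed in the thread), and note that Rob Herring's review below questions whether dma-ranges is a valid reserved-memory property at all:

```dts
/* Hypothetical example only: a global non-coherent pool backed by an
 * uncached alias of DRAM. Addresses, sizes, and cell layout are
 * illustrative, not a ratified binding. */
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	dma_pool: dma-pool@1000000000 {
		compatible = "shared-dma-pool";
		no-map;
		linux,dma-default;
		reg = <0x10 0x00000000 0x0 0x10000000>;
		/* CPU 0x10_0000_0000 aliases device-visible DRAM 0x80000000 */
		dma-ranges = <0x0 0x80000000 0x10 0x00000000 0x0 0x10000000>;
	};
};
```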

Signed-off-by: Atish Patra <atish.patra@wdc.com>
---
 kernel/dma/coherent.c | 49 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 8 deletions(-)

Comments

Rob Herring July 25, 2021, 10:29 p.m. UTC | #1
On Fri, Jul 23, 2021 at 3:40 PM Atish Patra <atish.patra@wdc.com> wrote:
>
> Currently, linux,dma-default is used to reserve a global non-coherent pool
> to allocate memory for DMA operations. This can be useful for RISC-V as
> well, since the ISA specification does not yet specify a method to modify
> PMA attributes or page table entries to mark a region non-cacheable.
> A non-cacheable memory window is an alternative option for vendors to
> support non-coherent devices. "dma-ranges" must be used in conjunction with
> the "linux,dma-default" property to define one or more mappings between
> device and CPU accessible memory regions.

'dma-ranges' applies to buses. And, well, maybe devices when the bus
is not well defined. It is not a reserved-memory property.

Rob
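For context on Rob's point: dma-ranges normally lives on a bus node, describing how child (device-visible) addresses map to parent (CPU) addresses. A hypothetical bus node (not from this series; addresses reuse the beagleV-style layout for illustration) would look like:

```dts
/* Hypothetical simple-bus node: the dma-ranges entry says that bus
 * (device-visible) address 0x80000000 corresponds to CPU address
 * 0x10_0000_0000, for a 256 MiB window. */
soc {
	compatible = "simple-bus";
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;
	dma-ranges = <0x0 0x80000000 0x10 0x00000000 0x0 0x10000000>;
};
```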
Christoph Hellwig July 26, 2021, 7 a.m. UTC | #2
On Fri, Jul 23, 2021 at 02:40:29PM -0700, Atish Patra wrote:
> Currently, linux,dma-default is used to reserve a global non-coherent pool
> to allocate memory for DMA operations. This can be useful for RISC-V as
> well, since the ISA specification does not yet specify a method to modify
> PMA attributes or page table entries to mark a region non-cacheable.
> A non-cacheable memory window is an alternative option for vendors to
> support non-coherent devices.

Please explain why you do not want to use the simple non-cachable
window support using arch_dma_set_uncached as used by mips, nios2 and
xtensa.

> +static int __dma_init_global_coherent(phys_addr_t phys_addr, dma_addr_t device_addr, size_t size)
>  {
>  	struct dma_coherent_mem *mem;
>
> -	mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
> +	if (phys_addr == device_addr)
> +		mem = dma_init_coherent_memory(phys_addr, device_addr, size, true);
> +	else
> +		mem = dma_init_coherent_memory(phys_addr, device_addr, size, false);

Nak.  The phys_addr != device_addr support is going away.  This needs
to be filled in using the dma-ranges property hanging off the struct device.
Atish Patra July 26, 2021, 10:47 p.m. UTC | #3
On Mon, Jul 26, 2021 at 12:00 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Jul 23, 2021 at 02:40:29PM -0700, Atish Patra wrote:
> > Currently, linux,dma-default is used to reserve a global non-coherent pool
> > to allocate memory for DMA operations. This can be useful for RISC-V as
> > well, since the ISA specification does not yet specify a method to modify
> > PMA attributes or page table entries to mark a region non-cacheable.
> > A non-cacheable memory window is an alternative option for vendors to
> > support non-coherent devices.
>
> Please explain why you do not want to use the simple non-cachable
> window support using arch_dma_set_uncached as used by mips, nios2 and
> xtensa.
>


arch_dma_set_uncached works as well in this case. However, mips, nios2 &
xtensa use a fixed (via config) value for the offset. A similar approach
can't be used here because the platform-specific offset value has to be
determined at runtime so that a single kernel image can boot on all
platforms. That's why we need the following additional changes for RISC-V
to make it work.

1. A new DT property so that arch-specific code is aware of the
   non-cacheable window offset.
    - either under the /chosen node or a completely separate node with
      multiple non-cacheable window support.
   We also need to define how it is going to be referenced from an
   individual device if per-device non-cacheable window support is
   required in the future. As of now, the beagleV memory region lies in
   0x10_0000_0000 - 0x17_FFFF_FFFF, which is mapped to the start of DRAM
   at 0x80000000. All of the non-coherent devices can do 32-bit DMA only.

2. Use dma-ranges and modify the arch_dma_set_uncached function to pass
   the struct device as an argument.

Either way, we will need arch-specific hook-ups and additional changes,
while the global non-coherent pool provides a more elegant solution
without any additional arch-specific code.

If arch_dma_set_uncached is still the preferred way to solve this
problem, I can revise the patch with either approach 1 or approach 2.
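For comparison, a runtime-offset variant of the fixed-offset scheme could look like the sketch below. Everything here is illustrative: `riscv_uc_offset` and `uncached_alias` are hypothetical stand-ins for what a RISC-V arch_dma_set_uncached implementation would do with a DT-supplied offset, not existing kernel code.

```c
#include <stdint.h>

/* Hypothetical: unlike mips/nios2/xtensa, where the uncached window
 * offset is a compile-time constant, RISC-V would discover it at boot
 * (e.g. from a DT property) and keep it in a variable. */
static uint64_t riscv_uc_offset; /* filled in from firmware/DT at boot */

/* Model of what a RISC-V arch_dma_set_uncached() would compute: alias
 * a cached physical address into the uncached window by adding the
 * runtime offset. */
static uint64_t uncached_alias(uint64_t paddr)
{
	return paddr + riscv_uc_offset;
}
```

With the beagleV numbers above (DRAM at 0x8000_0000 aliased uncached at 0x10_0000_0000), the offset would be 0xF_8000_0000, computed once at boot rather than baked into the kernel image.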

> > +static int __dma_init_global_coherent(phys_addr_t phys_addr, dma_addr_t device_addr, size_t size)
> >  {
> >       struct dma_coherent_mem *mem;
> >
> > -     mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
> > +     if (phys_addr == device_addr)
> > +             mem = dma_init_coherent_memory(phys_addr, device_addr, size, true);
> > +     else
> > +             mem = dma_init_coherent_memory(phys_addr, device_addr, size, false);
>
> Nak.  The phys_addr != device_addr support is going away.  This needs

ok.

> to be filled in using the dma-ranges property hanging off the struct device.

struct device is only accessible in rmem_dma_device_init. I couldn't find
a proper way to access it during the dma_reserved_default_memory setup
under the global pool.

Does that mean we should use a per-device memory pool instead of a global
non-coherent pool?

> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

-- 
Regards,
Atish
Christoph Hellwig July 27, 2021, 8:52 a.m. UTC | #4
On Mon, Jul 26, 2021 at 03:47:54PM -0700, Atish Patra wrote:
> arch_dma_set_uncached works as well in this case. However, mips, nios2 &
> xtensa use a fixed (via config) value for the offset. A similar approach
> can't be used here because the platform-specific offset value has to be
> determined at runtime so that a single kernel image can boot on all
> platforms.

Nothing in the interface requires a fixed offset.  And using the offset
has one enormous advantage in that there is no need to declare a
statically sized pool - allocations are fully dynamic.  And any kind of
fixed pool tends to cause huge problems.

> 1. A new DT property so that arch-specific code is aware of the
>    non-cacheable window offset.

Yes.

> individual device if per-device non-cacheable window support is
> required in the future. As of now, the beagleV memory

If you require a per-device noncachable area you can use the per-device
coherent pools.  But why would you want that?

> region lies in 0x10_0000_0000 - 0x17_FFFF_FFFF, which is mapped to the
> start of DRAM at 0x80000000. All of the non-coherent devices can do
> 32-bit DMA only.


Adjust ZONE_DMA32 so that it takes the uncached offset into account.

> > > -     mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
> > > +     if (phys_addr == device_addr)
> > > +             mem = dma_init_coherent_memory(phys_addr, device_addr, size, true);
> > > +     else
> > > +             mem = dma_init_coherent_memory(phys_addr, device_addr, size, false);
> >
> > Nak.  The phys_addr != device_addr support is going away.  This needs
>
> ok.
>
> > to be filled in using the dma-ranges property hanging off the struct device.
>
> struct device is only accessible in rmem_dma_device_init. I couldn't find
> a proper way to access it during the dma_reserved_default_memory setup
> under the global pool.
>
> Does that mean we should use a per-device memory pool instead of a global
> non-coherent pool?

Indeed, that would be a problem in this case.  But if we can just
use the uncached offset directly I think everything will be much
simpler.
Atish Patra Aug. 2, 2021, 6:22 p.m. UTC | #5
On Tue, Jul 27, 2021 at 1:52 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Jul 26, 2021 at 03:47:54PM -0700, Atish Patra wrote:
> > arch_dma_set_uncached works as well in this case. However, mips, nios2 &
> > xtensa use a fixed (via config) value for the offset. A similar approach
> > can't be used here because the platform-specific offset value has to be
> > determined at runtime so that a single kernel image can boot on all
> > platforms.
>
> Nothing in the interface requires a fixed offset.  And using the offset
> has one enormous advantage in that there is no need to declare a
> statically sized pool - allocations are fully dynamic.  And any kind of
> fixed pool tends to cause huge problems.
>
> > 1. A new DT property so that arch-specific code is aware of the
> >    non-cacheable window offset.
>
> Yes.
>
> > individual device if per-device non-cacheable window support is
> > required in the future. As of now, the beagleV memory
>
> If you require a per-device noncachable area you can use the per-device
> coherent pools.  But why would you want that?
>
> > region lies in 0x10_0000_0000 - 0x17_FFFF_FFFF, which is mapped to the
> > start of DRAM at 0x80000000. All of the non-coherent devices can do
> > 32-bit DMA only.
>
> Adjust ZONE_DMA32 so that it takes the uncached offset into account.
>
> > > > -     mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
> > > > +     if (phys_addr == device_addr)
> > > > +             mem = dma_init_coherent_memory(phys_addr, device_addr, size, true);
> > > > +     else
> > > > +             mem = dma_init_coherent_memory(phys_addr, device_addr, size, false);
> > >
> > > Nak.  The phys_addr != device_addr support is going away.  This needs
> >
> > ok.
> >
> > > to be filled in using the dma-ranges property hanging off the struct device.
> >
> > struct device is only accessible in rmem_dma_device_init. I couldn't find
> > a proper way to access it during the dma_reserved_default_memory setup
> > under the global pool.
> >
> > Does that mean we should use a per-device memory pool instead of a global
> > non-coherent pool?
>
> Indeed, that would be a problem in this case.  But if we can just
> use the uncached offset directly I think everything will be much
> simpler.

Yes. I was planning to change this to use an uncached offset. However,
the planned mass production of the beaglev starlight SBC has been
cancelled now [1]. As there is no other board that requires an uncached
offset, I don't think there is a use case for adding uncached offset
support for RISC-V right now. I will revisit this (hopefully we won't
have to) in case any platform implements uncached window support in the
future.

[1] https://www.cnx-software.com/2021/07/31/beaglev-starlight-sbc-wont-be-mass-manufactured-redesigned-beaglev-risc-v-sbc-expected-in-q1-2022/
-- 
Regards,
Atish

Patch

diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index 97677df5408b..d0b33b1a76f0 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -9,6 +9,8 @@ 
 #include <linux/module.h>
 #include <linux/dma-direct.h>
 #include <linux/dma-map-ops.h>
+#include <linux/of_address.h>
+#include <linux/libfdt.h>
 
 struct dma_coherent_mem {
 	void		*virt_base;
@@ -302,19 +304,27 @@  int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *vaddr,
 					vaddr, size, ret);
 }
 
-int dma_init_global_coherent(phys_addr_t phys_addr, size_t size)
+static int __dma_init_global_coherent(phys_addr_t phys_addr, dma_addr_t device_addr, size_t size)
 {
 	struct dma_coherent_mem *mem;
 
-	mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
+	if (phys_addr == device_addr)
+		mem = dma_init_coherent_memory(phys_addr, device_addr, size, true);
+	else
+		mem = dma_init_coherent_memory(phys_addr, device_addr, size, false);
+
 	if (IS_ERR(mem))
 		return PTR_ERR(mem);
 	dma_coherent_default_memory = mem;
 	pr_info("DMA: default coherent area is set\n");
 	return 0;
 }
-#endif /* CONFIG_DMA_GLOBAL_POOL */
 
+int dma_init_global_coherent(phys_addr_t phys_addr, size_t size)
+{
+	return __dma_init_global_coherent(phys_addr, phys_addr, size);
+}
+#endif /* CONFIG_DMA_GLOBAL_POOL */
 /*
  * Support for reserved memory regions defined in device tree
  */
@@ -329,8 +339,8 @@  static int rmem_dma_device_init(struct reserved_mem *rmem, struct device *dev)
 	if (!rmem->priv) {
 		struct dma_coherent_mem *mem;
 
-		mem = dma_init_coherent_memory(rmem->base, rmem->base,
-					       rmem->size, true);
+		mem = dma_init_coherent_memory(rmem->base, rmem->base, rmem->size, true);
+
 		if (IS_ERR(mem))
 			return PTR_ERR(mem);
 		rmem->priv = mem;
@@ -358,7 +368,7 @@  static int __init rmem_dma_setup(struct reserved_mem *rmem)
 	if (of_get_flat_dt_prop(node, "reusable", NULL))
 		return -EINVAL;
 
-#ifdef CONFIG_ARM
+#if defined(CONFIG_ARM) || defined(CONFIG_RISCV)
 	if (!of_get_flat_dt_prop(node, "no-map", NULL)) {
 		pr_err("Reserved memory: regions without no-map are not yet supported\n");
 		return -EINVAL;
@@ -382,10 +392,33 @@  static int __init rmem_dma_setup(struct reserved_mem *rmem)
 #ifdef CONFIG_DMA_GLOBAL_POOL
 static int __init dma_init_reserved_memory(void)
 {
+	struct device_node *np;
+	const struct bus_dma_region *map = NULL;
+	int ret;
+	int64_t uc_offset = 0;
+
 	if (!dma_reserved_default_memory)
 		return -ENOMEM;
-	return dma_init_global_coherent(dma_reserved_default_memory->base,
-					dma_reserved_default_memory->size);
+
+	/* dma-ranges is only valid for global pool i.e. dma-default is set */
+	np = of_find_node_with_property(NULL, "linux,dma-default");
+	if (!np)
+		goto global_init;
+	of_node_put(np);
+
+	ret = of_dma_get_range(np, &map);
+	if (ret < 0)
+		goto global_init;
+
+	/* Sanity check for the non-coherent global pool from uncached region */
+	if (map->dma_start == dma_reserved_default_memory->base &&
+	    map->size == dma_reserved_default_memory->size)
+		uc_offset = map->offset;
+
+global_init:
+	return __dma_init_global_coherent(dma_reserved_default_memory->base + uc_offset,
+					  dma_reserved_default_memory->base,
+					  dma_reserved_default_memory->size);
 }
 core_initcall(dma_init_reserved_memory);
 #endif /* CONFIG_DMA_GLOBAL_POOL */
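As a closing illustration of the address bookkeeping this patch sets up (a toy model with made-up names, not kernel API): dma_init_reserved_memory() registers the pool with the CPU-side uncached address as phys_addr and the DRAM address as device_addr, so a device is handed an address that differs from the CPU's by the dma-ranges offset.

```c
#include <stdint.h>

/* Toy model of the pool configured by dma_init_reserved_memory() in the
 * patch above: phys_base is the CPU-side (uncached-window) address,
 * device_base the device-visible DRAM address. Names are illustrative. */
struct pool_model {
	uint64_t phys_base;   /* CPU view: uncached alias */
	uint64_t device_base; /* device view: DRAM */
	uint64_t size;
};

/* Translate a CPU-side pool address into the handle a device would use,
 * mirroring the phys_addr/device_addr split in __dma_init_global_coherent. */
static uint64_t pool_dma_handle(const struct pool_model *p, uint64_t cpu_addr)
{
	return p->device_base + (cpu_addr - p->phys_base);
}
```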