mbox series

[00/18] BCM2835 DMA mapping cleanups and fixes

Message ID 20240524182702.1317935-1-dave.stevenson@raspberrypi.com
Headers show
Series BCM2835 DMA mapping cleanups and fixes | expand

Message

Dave Stevenson May 24, 2024, 6:26 p.m. UTC
Hi All

This series initially cleans up the BCM2835 DMA driver in preparation for
supporting the 40bit version. It then fixes up the incorrect mapping behaviour
we've had to date.

The cleanups are based on Stefan Wahren's RFC [1], with a couple of minor bugs
fixed, but stopping before actually adding the 40bit support. If we can sort
the mapping issue, it avoids having to have workarounds in the 40bit support.

The mapping issues were discussed in [2].
Up until this point all DMA users have been passing in dma addresses rather than
CPU physical addresses, and the DMA driver has been using those directly rather
than using dma_map_resource() to map them.
The DT has also been missing some of the required mappings in "dma-ranges", but
they have been present in "ranges". I've therefore duplicated the minimum amount
of of_dma_get_range and translate_phys_to_dma to be able to use "ranges" as 
discussed in that thread. I'm assuming that sort of code is not desirable in the
core code as it shouldn't be necessary, so keeping it contained within a driver
is the better solution.

When Andrea posted our downstream patches in [3], Robin Murphy stated that
dma_map_resource is the correct API, but as it currently doesn't check the
dma_range_map we need Sergey Semin's patch [4].
There seemed to be no follow up over the implications of it. I've therefore
included it in the series at least for discussion. If it's not acceptable then
I'm not sure of the route forward in fixing this mapping issue.

I'm expecting there to be some discussion, but also acknowledge that merging this
will need to be phased with the patches 1-13 needing to be merged before any of
14-17, and then 18 merged last to remove the workaround. I suspect that's the
least of my worries though.


I will apologise in advance if I don't respond immediately to comments - I'm
out of the office for the next week, but do appreciate any feedback.

Thanks
  Dave

[1] https://lore.kernel.org/linux-arm-kernel/13ec386b-2305-27da-9765-8fa3ad71146c@i2se.com/T/
[2] https://lore.kernel.org/linux-arm-kernel/CAPY8ntBua=wPVUj+SM0WGcUL0fT56uEHo8YZUTMB8Z54X_aPRw@mail.gmail.com/T/
[3] https://lore.kernel.org/lkml/cover.1706948717.git.andrea.porta@suse.com/T/
[4] https://lore.kernel.org/linux-iommu/20220610080802.11147-1-Sergey.Semin@baikalelectronics.ru/

Dave Stevenson (7):
  ARM: dts: bcm283x: Update to use dma-channel-mask
  dmaengine: bcm2835: Add function to handle DMA mapping
  dmaengine: bcm2835: Add backwards compatible handling until clients
    updated
  dmaengine: bcm2835: Use dma_map_resource to map addresses
  dmaengine: bcm2835: Read ranges if dma-ranges aren't mapped
  arm: dt: Add dma-ranges to the bcm283x platforms
  dmaengine: bcm2835: Revert the workaround for DMA addresses

Phil Elwell (4):
  mmc: bcm2835: Use phys addresses for slave DMA config
  spi: bcm2835: Use phys addresses for slave DMA config
  drm/vc4: Use phys addresses for slave DMA config
  ASoC: bcm2835-i2s: Use phys addresses for DAI DMA

Serge Semin (1):
  dma-direct: take dma-ranges/offsets into account in resource mapping

Stefan Wahren (6):
  dmaengine: bcm2835: Support common dma-channel-mask
  dmaengine: bcm2835: move CB info generation into separate function
  dmaengine: bcm2835: move CB final extra info generation into function
  dmaengine: bcm2835: make address increment platform independent
  dmaengine: bcm2385: drop info parameters
  dmaengine: bcm2835: pass dma_chan to generic functions

 arch/arm/boot/dts/broadcom/bcm2711.dtsi       |  14 +-
 .../arm/boot/dts/broadcom/bcm2835-common.dtsi |   2 +-
 arch/arm/boot/dts/broadcom/bcm2835.dtsi       |   3 +-
 arch/arm/boot/dts/broadcom/bcm2836.dtsi       |   3 +-
 arch/arm/boot/dts/broadcom/bcm2837.dtsi       |   3 +-
 drivers/dma/bcm2835-dma.c                     | 432 ++++++++++++++----
 drivers/gpu/drm/vc4/vc4_hdmi.c                |  15 +-
 drivers/mmc/host/bcm2835.c                    |  17 +-
 drivers/spi/spi-bcm2835.c                     |  23 +-
 kernel/dma/direct.c                           |   2 +-
 sound/soc/bcm/bcm2835-i2s.c                   |  18 +-
 11 files changed, 383 insertions(+), 149 deletions(-)

Comments

Christoph Hellwig May 28, 2024, 6:33 a.m. UTC | #1
On Fri, May 24, 2024 at 07:26:45PM +0100, Dave Stevenson wrote:
> From: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> 
> A basic device-specific linear memory mapping was introduced back in
> commit ("dma: Take into account dma_pfn_offset") as a single-valued offset
> preserved in the device.dma_pfn_offset field, which was initialized for
> instance by means of the "dma-ranges" DT property. Afterwards the
> functionality was extended to support more than one device-specific region
> defined in the device.dma_range_map list of maps. But all of these
> improvements concerned a single pointer, page or sg DMA-mapping methods,
> while the system resource mapping function turned to miss the
> corresponding modification. Thus the dma_direct_map_resource() method now
> just casts the CPU physical address to the device DMA address with no
> dma-ranges-based mapping taking into account, which is obviously wrong.
> Let's fix it by using the phys_to_dma_direct() method to get the
> device-specific bus address from the passed memory resource for the case
> of the directly mapped DMA.

My memory is getting a little bad, but as dma_direct_map_resource is
mostly used for (non-PCIe) peer to peer transfers, any kind of mapping
from the host address should be excluded.

(dma_direct_map_resource in general is a horrible interface and I'd
prefer everyone to switch to the map_sg based P2P support, but we
have plenty of users for it unfortunately)
Florian Fainelli June 5, 2024, 12:22 p.m. UTC | #2
On 5/24/2024 8:26 PM, Dave Stevenson wrote:
> Now the driver looks for the common dma-channel-mask property
> rather than the vendor-specific brcm,dma-channel-mask, update
> the dt files to follow suit.
> 
> Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>

Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Florian Fainelli June 5, 2024, 12:23 p.m. UTC | #3
On 5/24/2024 8:26 PM, 'Dave Stevenson' via BCM-KERNEL-FEEDBACK-LIST,PDL 
wrote:
> In order to use the dma_map_resource for mappings, add the
> dma-ranges to the relevant DT files.

Subject should be "arm: dts: " prefixed to be consistent with patch #2 
and prior submissions to those files. With that:

Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Florian Fainelli June 5, 2024, 12:24 p.m. UTC | #4
On 5/24/2024 8:26 PM, Dave Stevenson wrote:
> Hi All
> 
> This series initially cleans up the BCM2835 DMA driver in preparation for
> supporting the 40bit version. It then fixes up the incorrect mapping behaviour
> we've had to date.
> 
> The cleanups are based on Stefan Wahren's RFC [1], with a couple of minor bugs
> fixed, but stopping before actually adding the 40bit support. If we can sort
> the mapping issue, it avoids having to have workarounds in the 40bit support.
> 
> The mapping issues were discussed in [2].
> Up until this point all DMA users have been passing in dma addresses rather than
> CPU physical addresses, and the DMA driver has been using those directly rather
> than using dma_map_resource() to map them.
> The DT has also been missing some of the required mappings in "dma-ranges", but
> they have been present in "ranges". I've therefore duplicated the minimum amount
> of of_dma_get_range and translate_phys_to_dma to be able to use "ranges" as
> discussed in that thread. I'm assuming that sort of code is not desirable in the
> core code as it shouldn't be necessary, so keeping it contained within a driver
> is the better solution.
> 
> When Andrea posted our downstream patches in [3], Robin Murphy stated that
> dma_map_resource is the correct API, but as it currently doesn't check the
> dma_range_map we need Sergey Semin's patch [4].
> There seemed to be no follow up over the implications of it. I've therefore
> included it in the series at least for discussion. If it's not acceptable then
> I'm not sure of the route forward in fixing this mapping issue.
> 
> I'm expecting there to be some discussion, but also acknowledge that merging this
> will need to be phased with the patches 1-13 needing to be merged before any of
> 14-17, and then 18 merged last to remove the workaround. I suspect that's the
> least of my worries though.
> 
> 
> I will apologise in advance if I don't respond immediately to comments - I'm
> out of the office for the next week, but do appreciate any feedback.

Those patches should be routed via the dmaengine tree, including the DTS 
files to minimize the possibility of introducing regressions if people 
happen to bisect changes. I don't expect conflicts when these changes 
reach linux-next.
Frank Li June 5, 2024, 4:18 p.m. UTC | #5
On Fri, May 24, 2024 at 07:26:49PM +0100, Dave Stevenson wrote:
> From: Stefan Wahren <stefan.wahren@i2se.com>
> 
> Similar to the info generation, generate the final extra info with a
> separate function. This is necessary to introduce other platforms
> with different info bits.

Each patch commit is independent. 

Introduce common help function to generate the final extra info to reduce
duplicate codes in each DMA operation.


> 
> Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
> Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
> ---
>  drivers/dma/bcm2835-dma.c | 34 ++++++++++++++++++++++++++++------
>  1 file changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index 7cef7ff89575..ef452ebb3c15 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -229,6 +229,29 @@ static u32 bcm2835_dma_prepare_cb_info(struct bcm2835_chan *c,
>  	return result;
>  }
>  
> +static u32 bcm2835_dma_prepare_cb_extra(struct bcm2835_chan *c,
> +					enum dma_transfer_direction direction,
> +					bool cyclic, bool final,
> +					unsigned long flags)
> +{
> +	u32 result = 0;
> +
> +	if (cyclic) {
> +		if (flags & DMA_PREP_INTERRUPT)
> +			result |= BCM2835_DMA_INT_EN;
> +	} else {
> +		if (!final)
> +			return 0;
> +
> +		result |= BCM2835_DMA_INT_EN;
> +
> +		if (direction == DMA_MEM_TO_MEM)
> +			result |= BCM2835_DMA_WAIT_RESP;
> +	}


move if (direction == DMA_MEM_TO_MEM) outof else branch. 
DMA_MEM_TO_MEM is impossible for cyclic. Reduce if level can help
easy to follow up.


	if (cyclic)
		...
	else
		...

	if (direction == DMA_MEM_TO_MEM)
		result |= BCM2835_DMA_WAIT_RESP; 



> +
> +	return result;
> +}
> +
>  static void bcm2835_dma_free_cb_chain(struct bcm2835_desc *desc)
>  {
>  	size_t i;
> @@ -644,7 +667,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_memcpy(
>  	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	struct bcm2835_desc *d;
>  	u32 info = bcm2835_dma_prepare_cb_info(c, DMA_MEM_TO_MEM, false);
> -	u32 extra = BCM2835_DMA_INT_EN | BCM2835_DMA_WAIT_RESP;
> +	u32 extra = bcm2835_dma_prepare_cb_extra(c, DMA_MEM_TO_MEM, false,
> +						 true, 0);
>  	size_t max_len = bcm2835_dma_max_frame_length(c);
>  	size_t frames;
>  
> @@ -675,7 +699,7 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
>  	struct bcm2835_desc *d;
>  	dma_addr_t src = 0, dst = 0;
>  	u32 info = bcm2835_dma_prepare_cb_info(c, direction, false);
> -	u32 extra = BCM2835_DMA_INT_EN;
> +	u32 extra = bcm2835_dma_prepare_cb_extra(c, direction, false, true, 0);
>  	size_t frames;
>  
>  	if (!is_slave_direction(direction)) {
> @@ -723,7 +747,7 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
>  	dma_addr_t src, dst;
>  	u32 info = bcm2835_dma_prepare_cb_info(c, direction,
>  					       buf_addr == od->zero_page);
> -	u32 extra = 0;
> +	u32 extra = bcm2835_dma_prepare_cb_extra(c, direction, true, true, 0);
>  	size_t max_len = bcm2835_dma_max_frame_length(c);
>  	size_t frames;
>  
> @@ -739,9 +763,7 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
>  		return NULL;
>  	}
>  
> -	if (flags & DMA_PREP_INTERRUPT)
> -		extra |= BCM2835_DMA_INT_EN;
> -	else
> +	if (!(flags & DMA_PREP_INTERRUPT))
>  		period_len = buf_len;
>  
>  	/*
> -- 
> 2.34.1
>
Frank Li June 5, 2024, 6 p.m. UTC | #6
On Fri, May 24, 2024 at 07:26:51PM +0100, Dave Stevenson wrote:
> From: Stefan Wahren <stefan.wahren@i2se.com>
> 
> The parameters info and finalextrainfo are platform specific. So drop
> them by generating them within bcm2835_dma_create_cb_chain().

Drop 'info' and 'finalextrainfo' because these can be generated by 
bcm2835_dma_create_cb_chain().

> 
> Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
> Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
> ---
>  drivers/dma/bcm2835-dma.c | 83 +++++++++++++++++++--------------------
>  1 file changed, 40 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index d6c5a2762a46..e2f9c8692e6b 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -287,13 +287,11 @@ static void bcm2835_dma_desc_free(struct virt_dma_desc *vd)
>  		container_of(vd, struct bcm2835_desc, vd));
>  }
>  
> -static void bcm2835_dma_create_cb_set_length(
> -	struct bcm2835_chan *chan,
> -	struct bcm2835_dma_cb *control_block,
> -	size_t len,
> -	size_t period_len,
> -	size_t *total_len,
> -	u32 finalextrainfo)
> +static bool
> +bcm2835_dma_create_cb_set_length(struct bcm2835_chan *chan,
> +				 struct bcm2835_dma_cb *control_block,
> +				 size_t len, size_t period_len,
> +				 size_t *total_len)

Can you document this function, what's return value means? look like if
need extrainfo?

>  {
>  	size_t max_len = bcm2835_dma_max_frame_length(chan);
>  
> @@ -302,7 +300,7 @@ static void bcm2835_dma_create_cb_set_length(
>  
>  	/* finished if we have no period_length */
>  	if (!period_len)
> -		return;
> +		return false;
>  
>  	/*
>  	 * period_len means: that we need to generate
> @@ -316,7 +314,7 @@ static void bcm2835_dma_create_cb_set_length(
>  	if (*total_len + control_block->length < period_len) {
>  		/* update number of bytes in this period so far */
>  		*total_len += control_block->length;
> -		return;
> +		return false;
>  	}
>  
>  	/* calculate the length that remains to reach period_length */
> @@ -325,8 +323,7 @@ static void bcm2835_dma_create_cb_set_length(
>  	/* reset total_length for next period */
>  	*total_len = 0;
>  
> -	/* add extrainfo bits in info */
> -	control_block->info |= finalextrainfo;
> +	return true;
>  }
>  
>  static inline size_t bcm2835_dma_count_frames_for_sg(
> @@ -352,7 +349,6 @@ static inline size_t bcm2835_dma_count_frames_for_sg(
>   * @chan:           the @dma_chan for which we run this
>   * @direction:      the direction in which we transfer
>   * @cyclic:         it is a cyclic transfer
> - * @info:           the default info bits to apply per controlblock
>   * @frames:         number of controlblocks to allocate
>   * @src:            the src address to assign
>   * @dst:            the dst address to assign
> @@ -360,22 +356,24 @@ static inline size_t bcm2835_dma_count_frames_for_sg(
>   * @period_len:     the period length when to apply @finalextrainfo
>   *                  in addition to the last transfer
>   *                  this will also break some control-blocks early
> - * @finalextrainfo: additional bits in last controlblock
> - *                  (or when period_len is reached in case of cyclic)
>   * @gfp:            the GFP flag to use for allocation
> + * @flags
>   */
>  static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
>  	struct dma_chan *chan, enum dma_transfer_direction direction,
> -	bool cyclic, u32 info, u32 finalextrainfo, size_t frames,
> -	dma_addr_t src, dma_addr_t dst, size_t buf_len,
> -	size_t period_len, gfp_t gfp)
> +	bool cyclic, size_t frames, dma_addr_t src, dma_addr_t dst,
> +	size_t buf_len,	size_t period_len, gfp_t gfp, unsigned long flags)
>  {
> +	struct bcm2835_dmadev *od = to_bcm2835_dma_dev(chan->device);
>  	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	size_t len = buf_len, total_len;
>  	size_t frame;
>  	struct bcm2835_desc *d;
>  	struct bcm2835_cb_entry *cb_entry;
>  	struct bcm2835_dma_cb *control_block;
> +	u32 extrainfo = bcm2835_dma_prepare_cb_extra(c, direction, cyclic,
> +						     false, flags);
> +	bool zero_page = false;
>  
>  	if (!frames)
>  		return NULL;
> @@ -389,6 +387,14 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
>  	d->dir = direction;
>  	d->cyclic = cyclic;
>  
> +	switch (direction) {
> +	case DMA_MEM_TO_MEM:
> +	case DMA_DEV_TO_MEM:
> +		break;
> +	default:
> +		zero_page = src == od->zero_page;
> +	}
> +
>  	/*
>  	 * Iterate over all frames, create a control block
>  	 * for each frame and link them together.
> @@ -402,7 +408,8 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
>  
>  		/* fill in the control block */
>  		control_block = cb_entry->cb;
> -		control_block->info = info;
> +		control_block->info = bcm2835_dma_prepare_cb_info(c, direction,
> +								  zero_page);
>  		control_block->src = src;
>  		control_block->dst = dst;
>  		control_block->stride = 0;
> @@ -410,10 +417,12 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
>  		/* set up length in control_block if requested */
>  		if (buf_len) {
>  			/* calculate length honoring period_length */
> -			bcm2835_dma_create_cb_set_length(
> -				c, control_block,
> -				len, period_len, &total_len,
> -				cyclic ? finalextrainfo : 0);
> +			if (bcm2835_dma_create_cb_set_length(c, control_block,
> +							     len, period_len,
> +							     &total_len)) {
> +				/* add extrainfo bits in info */
> +				control_block->info |= extrainfo;
> +			}
>  
>  			/* calculate new remaining length */
>  			len -= control_block->length;
> @@ -434,7 +443,9 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
>  	}
>  
>  	/* the last frame requires extra flags */
> -	d->cb_list[d->frames - 1].cb->info |= finalextrainfo;
> +	extrainfo = bcm2835_dma_prepare_cb_extra(c, direction, cyclic, true,
> +						 flags);
> +	d->cb_list[d->frames - 1].cb->info |= extrainfo;
>  
>  	/* detect a size missmatch */
>  	if (buf_len && (d->size != buf_len))
> @@ -682,9 +693,6 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_memcpy(
>  {
>  	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	struct bcm2835_desc *d;
> -	u32 info = bcm2835_dma_prepare_cb_info(c, DMA_MEM_TO_MEM, false);
> -	u32 extra = bcm2835_dma_prepare_cb_extra(c, DMA_MEM_TO_MEM, false,
> -						 true, 0);
>  	size_t max_len = bcm2835_dma_max_frame_length(c);
>  	size_t frames;
>  
> @@ -696,9 +704,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_memcpy(
>  	frames = bcm2835_dma_frames_for_length(len, max_len);
>  
>  	/* allocate the CB chain - this also fills in the pointers */
> -	d = bcm2835_dma_create_cb_chain(chan, DMA_MEM_TO_MEM, false,
> -					info, extra, frames,
> -					src, dst, len, 0, GFP_KERNEL);
> +	d = bcm2835_dma_create_cb_chain(chan, DMA_MEM_TO_MEM, false, frames,
> +					src, dst, len, 0, GFP_KERNEL, 0);
>  	if (!d)
>  		return NULL;
>  
> @@ -714,8 +721,6 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
>  	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	struct bcm2835_desc *d;
>  	dma_addr_t src = 0, dst = 0;
> -	u32 info = bcm2835_dma_prepare_cb_info(c, direction, false);
> -	u32 extra = bcm2835_dma_prepare_cb_extra(c, direction, false, true, 0);
>  	size_t frames;
>  
>  	if (!is_slave_direction(direction)) {
> @@ -738,10 +743,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
>  	frames = bcm2835_dma_count_frames_for_sg(c, sgl, sg_len);
>  
>  	/* allocate the CB chain */
> -	d = bcm2835_dma_create_cb_chain(chan, direction, false,
> -					info, extra,
> -					frames, src, dst, 0, 0,
> -					GFP_NOWAIT);
> +	d = bcm2835_dma_create_cb_chain(chan, direction, false, frames, src,
> +					dst, 0, 0, GFP_NOWAIT, 0);
>  	if (!d)
>  		return NULL;
>  
> @@ -757,13 +760,9 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
>  	size_t period_len, enum dma_transfer_direction direction,
>  	unsigned long flags)
>  {
> -	struct bcm2835_dmadev *od = to_bcm2835_dma_dev(chan->device);
>  	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	struct bcm2835_desc *d;
>  	dma_addr_t src, dst;
> -	u32 info = bcm2835_dma_prepare_cb_info(c, direction,
> -					       buf_addr == od->zero_page);
> -	u32 extra = bcm2835_dma_prepare_cb_extra(c, direction, true, true, 0);
>  	size_t max_len = bcm2835_dma_max_frame_length(c);
>  	size_t frames;
>  
> @@ -814,10 +813,8 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
>  	 * note that we need to use GFP_NOWAIT, as the ALSA i2s dmaengine
>  	 * implementation calls prep_dma_cyclic with interrupts disabled.
>  	 */
> -	d = bcm2835_dma_create_cb_chain(chan, direction, true,
> -					info, extra,
> -					frames, src, dst, buf_len,
> -					period_len, GFP_NOWAIT);
> +	d = bcm2835_dma_create_cb_chain(chan, direction, true, frames, src, dst,
> +					buf_len, period_len, GFP_NOWAIT, flags);
>  	if (!d)
>  		return NULL;
>  
> -- 
> 2.34.1
>
Frank Li June 5, 2024, 6:05 p.m. UTC | #7
On Fri, May 24, 2024 at 07:26:52PM +0100, Dave Stevenson wrote:
> From: Stefan Wahren <stefan.wahren@i2se.com>
> 
> In preparation to support more platforms pass the dma_chan to the
> generic functions. This provides access to the DMA device and possible
> platform specific data.

why need this change? you can easy convert between dma_chan and
bcm2835_chan.


> 
> Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
> Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
> ---
>  drivers/dma/bcm2835-dma.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index e2f9c8692e6b..aefaa1f01d7f 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -288,12 +288,13 @@ static void bcm2835_dma_desc_free(struct virt_dma_desc *vd)
>  }
>  
>  static bool
> -bcm2835_dma_create_cb_set_length(struct bcm2835_chan *chan,
> +bcm2835_dma_create_cb_set_length(struct dma_chan *chan,
>  				 struct bcm2835_dma_cb *control_block,
>  				 size_t len, size_t period_len,
>  				 size_t *total_len)
>  {
> -	size_t max_len = bcm2835_dma_max_frame_length(chan);
> +	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
> +	size_t max_len = bcm2835_dma_max_frame_length(c);
>  
>  	/* set the length taking lite-channel limitations into account */
>  	control_block->length = min_t(u32, len, max_len);
> @@ -417,7 +418,7 @@ static struct bcm2835_desc *bcm2835_dma_create_cb_chain(
>  		/* set up length in control_block if requested */
>  		if (buf_len) {
>  			/* calculate length honoring period_length */
> -			if (bcm2835_dma_create_cb_set_length(c, control_block,
> +			if (bcm2835_dma_create_cb_set_length(chan, control_block,
>  							     len, period_len,
>  							     &total_len)) {
>  				/* add extrainfo bits in info */
> @@ -485,8 +486,9 @@ static void bcm2835_dma_fill_cb_chain_with_sg(
>  	}
>  }
>  
> -static void bcm2835_dma_abort(struct bcm2835_chan *c)
> +static void bcm2835_dma_abort(struct dma_chan *chan)
>  {
> +	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	void __iomem *chan_base = c->chan_base;
>  	long int timeout = 10000;
>  
> @@ -513,8 +515,9 @@ static void bcm2835_dma_abort(struct bcm2835_chan *c)
>  	writel(BCM2835_DMA_RESET, chan_base + BCM2835_DMA_CS);
>  }
>  
> -static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
> +static void bcm2835_dma_start_desc(struct dma_chan *chan)
>  {
> +	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	struct virt_dma_desc *vd = vchan_next_desc(&c->vc);
>  	struct bcm2835_desc *d;
>  
> @@ -533,7 +536,8 @@ static void bcm2835_dma_start_desc(struct bcm2835_chan *c)
>  
>  static irqreturn_t bcm2835_dma_callback(int irq, void *data)
>  {
> -	struct bcm2835_chan *c = data;
> +	struct dma_chan *chan = data;
> +	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
>  	struct bcm2835_desc *d;
>  	unsigned long flags;
>  
> @@ -566,7 +570,7 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
>  			vchan_cyclic_callback(&d->vd);
>  		} else if (!readl(c->chan_base + BCM2835_DMA_ADDR)) {
>  			vchan_cookie_complete(&c->desc->vd);
> -			bcm2835_dma_start_desc(c);
> +			bcm2835_dma_start_desc(chan);
>  		}
>  	}
>  
> @@ -594,7 +598,7 @@ static int bcm2835_dma_alloc_chan_resources(struct dma_chan *chan)
>  	}
>  
>  	return request_irq(c->irq_number, bcm2835_dma_callback,
> -			   c->irq_flags, "DMA IRQ", c);
> +			   c->irq_flags, "DMA IRQ", chan);
>  }
>  
>  static void bcm2835_dma_free_chan_resources(struct dma_chan *chan)
> @@ -682,7 +686,7 @@ static void bcm2835_dma_issue_pending(struct dma_chan *chan)
>  
>  	spin_lock_irqsave(&c->vc.lock, flags);
>  	if (vchan_issue_pending(&c->vc) && !c->desc)
> -		bcm2835_dma_start_desc(c);
> +		bcm2835_dma_start_desc(chan);
>  
>  	spin_unlock_irqrestore(&c->vc.lock, flags);
>  }
> @@ -846,7 +850,7 @@ static int bcm2835_dma_terminate_all(struct dma_chan *chan)
>  	if (c->desc) {
>  		vchan_terminate_vdesc(&c->desc->vd);
>  		c->desc = NULL;
> -		bcm2835_dma_abort(c);
> +		bcm2835_dma_abort(chan);
>  	}
>  
>  	vchan_get_all_descriptors(&c->vc, &head);
> -- 
> 2.34.1
>
Frank Li June 5, 2024, 6:13 p.m. UTC | #8
On Fri, May 24, 2024 at 07:26:53PM +0100, Dave Stevenson wrote:
> The code handling DMA mapping is currently incorrect and
> needs a sequence of fixups.

Can you descript what incorrect here? 

> Move the mapping out into a separate function and structure
> to allow for those fixes to be applied more cleanly.
> 
> Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
> ---
>  drivers/dma/bcm2835-dma.c | 46 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 38 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index aefaa1f01d7f..ef1d95bae84e 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -65,6 +65,10 @@ struct bcm2835_cb_entry {
>  	dma_addr_t paddr;
>  };
>  
> +struct bcm2835_dma_chan_map {
> +	dma_addr_t addr;
> +};
> +
>  struct bcm2835_chan {
>  	struct virt_dma_chan vc;
>  
> @@ -74,6 +78,7 @@ struct bcm2835_chan {
>  	int ch;
>  	struct bcm2835_desc *desc;
>  	struct dma_pool *cb_pool;
> +	struct bcm2835_dma_chan_map map;

I suppose map should in bcm2835_desc.  if put in chan, how about client
driver create two desc by bcm2835_dma_prep_slave_sg()?

prep_slave_sg()
submit()
prep_savle_sg()
submit()
issue_pending()

Frank

>  
>  	void __iomem *chan_base;
>  	int irq_number;
> @@ -268,6 +273,19 @@ static inline bool need_dst_incr(enum dma_transfer_direction direction)
>  	}
>  
>  	return false;
> +};
> +
> +static int bcm2835_dma_map_slave_addr(struct dma_chan *chan,
> +				      phys_addr_t dev_addr,
> +				      size_t dev_size,
> +				      enum dma_data_direction dev_dir)
> +{
> +	struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
> +	struct bcm2835_dma_chan_map *map = &c->map;
> +
> +	map->addr = dev_addr;
> +
> +	return 0;
>  }
>  
>  static void bcm2835_dma_free_cb_chain(struct bcm2835_desc *desc)
> @@ -734,13 +752,19 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
>  	}
>  
>  	if (direction == DMA_DEV_TO_MEM) {
> -		if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
> +		if (bcm2835_dma_map_slave_addr(chan, c->cfg.src_addr,
> +					       c->cfg.src_addr_width,
> +					       DMA_TO_DEVICE))
>  			return NULL;
> -		src = c->cfg.src_addr;
> +
> +		src = c->map.addr;
>  	} else {
> -		if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
> +		if (bcm2835_dma_map_slave_addr(chan, c->cfg.dst_addr,
> +					       c->cfg.dst_addr_width,
> +					       DMA_FROM_DEVICE))
>  			return NULL;
> -		dst = c->cfg.dst_addr;
> +
> +		dst = c->map.addr;
>  	}
>  
>  	/* count frames in sg list */
> @@ -795,14 +819,20 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
>  			      __func__, buf_len, period_len);
>  
>  	if (direction == DMA_DEV_TO_MEM) {
> -		if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
> +		if (bcm2835_dma_map_slave_addr(chan, c->cfg.src_addr,
> +					       c->cfg.src_addr_width,
> +					       DMA_TO_DEVICE))
>  			return NULL;
> -		src = c->cfg.src_addr;
> +
> +		src = c->map.addr;
>  		dst = buf_addr;
>  	} else {
> -		if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
> +		if (bcm2835_dma_map_slave_addr(chan, c->cfg.dst_addr,
> +					       c->cfg.dst_addr_width,
> +					       DMA_FROM_DEVICE))
>  			return NULL;
> -		dst = c->cfg.dst_addr;
> +
> +		dst = c->map.addr;
>  		src = buf_addr;
>  	}
>  
> -- 
> 2.34.1
>
Dave Stevenson June 24, 2024, 6:27 p.m. UTC | #9
On Wed, 5 Jun 2024 at 19:13, Frank Li <Frank.li@nxp.com> wrote:
>
> On Fri, May 24, 2024 at 07:26:53PM +0100, Dave Stevenson wrote:
> > The code handling DMA mapping is currently incorrect and
> > needs a sequence of fixups.
>
> Can you descript what incorrect here?

Clients are passing in DMA addresses, not CPU physical addresses. I'll
update the commit message.

> > Move the mapping out into a separate function and structure
> > to allow for those fixes to be applied more cleanly.
> >
> > Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
> > ---
> >  drivers/dma/bcm2835-dma.c | 46 ++++++++++++++++++++++++++++++++-------
> >  1 file changed, 38 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> > index aefaa1f01d7f..ef1d95bae84e 100644
> > --- a/drivers/dma/bcm2835-dma.c
> > +++ b/drivers/dma/bcm2835-dma.c
> > @@ -65,6 +65,10 @@ struct bcm2835_cb_entry {
> >       dma_addr_t paddr;
> >  };
> >
> > +struct bcm2835_dma_chan_map {
> > +     dma_addr_t addr;
> > +};
> > +
> >  struct bcm2835_chan {
> >       struct virt_dma_chan vc;
> >
> > @@ -74,6 +78,7 @@ struct bcm2835_chan {
> >       int ch;
> >       struct bcm2835_desc *desc;
> >       struct dma_pool *cb_pool;
> > +     struct bcm2835_dma_chan_map map;
>
> I suppose map should in bcm2835_desc.  if put in chan, how about client
> driver create two desc by bcm2835_dma_prep_slave_sg()?
>
> prep_slave_sg()
> submit()
> prep_savle_sg()
> submit()
> issue_pending()

I'm basing this on rcar-dmac.c which has a similar mode of operation.

For devices such as HDMI audio, I2S, SPI, etc, the peripheral's
address is constant. Mapping and unmapping adds an overhead. Retaining
the mapping in the chan structure allows the mapping to be cached
whilst the address remains the same, and is released whenever the
address changes.

If the map is in bcm2835_desc then as the desc is freed after every
transfer you'd have to unmap.

  Dave

> Frank
>
> >
> >       void __iomem *chan_base;
> >       int irq_number;
> > @@ -268,6 +273,19 @@ static inline bool need_dst_incr(enum dma_transfer_direction direction)
> >       }
> >
> >       return false;
> > +};
> > +
> > +static int bcm2835_dma_map_slave_addr(struct dma_chan *chan,
> > +                                   phys_addr_t dev_addr,
> > +                                   size_t dev_size,
> > +                                   enum dma_data_direction dev_dir)
> > +{
> > +     struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
> > +     struct bcm2835_dma_chan_map *map = &c->map;
> > +
> > +     map->addr = dev_addr;
> > +
> > +     return 0;
> >  }
> >
> >  static void bcm2835_dma_free_cb_chain(struct bcm2835_desc *desc)
> > @@ -734,13 +752,19 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_slave_sg(
> >       }
> >
> >       if (direction == DMA_DEV_TO_MEM) {
> > -             if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
> > +             if (bcm2835_dma_map_slave_addr(chan, c->cfg.src_addr,
> > +                                            c->cfg.src_addr_width,
> > +                                            DMA_TO_DEVICE))
> >                       return NULL;
> > -             src = c->cfg.src_addr;
> > +
> > +             src = c->map.addr;
> >       } else {
> > -             if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
> > +             if (bcm2835_dma_map_slave_addr(chan, c->cfg.dst_addr,
> > +                                            c->cfg.dst_addr_width,
> > +                                            DMA_FROM_DEVICE))
> >                       return NULL;
> > -             dst = c->cfg.dst_addr;
> > +
> > +             dst = c->map.addr;
> >       }
> >
> >       /* count frames in sg list */
> > @@ -795,14 +819,20 @@ static struct dma_async_tx_descriptor *bcm2835_dma_prep_dma_cyclic(
> >                             __func__, buf_len, period_len);
> >
> >       if (direction == DMA_DEV_TO_MEM) {
> > -             if (c->cfg.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
> > +             if (bcm2835_dma_map_slave_addr(chan, c->cfg.src_addr,
> > +                                            c->cfg.src_addr_width,
> > +                                            DMA_TO_DEVICE))
> >                       return NULL;
> > -             src = c->cfg.src_addr;
> > +
> > +             src = c->map.addr;
> >               dst = buf_addr;
> >       } else {
> > -             if (c->cfg.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
> > +             if (bcm2835_dma_map_slave_addr(chan, c->cfg.dst_addr,
> > +                                            c->cfg.dst_addr_width,
> > +                                            DMA_FROM_DEVICE))
> >                       return NULL;
> > -             dst = c->cfg.dst_addr;
> > +
> > +             dst = c->map.addr;
> >               src = buf_addr;
> >       }
> >
> > --
> > 2.34.1
> >
Dave Stevenson June 25, 2024, 4:21 p.m. UTC | #10
Hi Christoph

Sorry for the delay in coming back to you.

On Tue, 28 May 2024 at 07:33, Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, May 24, 2024 at 07:26:45PM +0100, Dave Stevenson wrote:
> > From: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> >
> > A basic device-specific linear memory mapping was introduced back in
> > commit ("dma: Take into account dma_pfn_offset") as a single-valued offset
> > preserved in the device.dma_pfn_offset field, which was initialized for
> > instance by means of the "dma-ranges" DT property. Afterwards the
> > functionality was extended to support more than one device-specific region
> > defined in the device.dma_range_map list of maps. But all of these
> > improvements concerned a single pointer, page or sg DMA-mapping methods,
> > while the system resource mapping function turned to miss the
> > corresponding modification. Thus the dma_direct_map_resource() method now
> > just casts the CPU physical address to the device DMA address with no
> > dma-ranges-based mapping taking into account, which is obviously wrong.
> > Let's fix it by using the phys_to_dma_direct() method to get the
> > device-specific bus address from the passed memory resource for the case
> > of the directly mapped DMA.
>
> My memory is getting a little bad, but as dma_direct_map_resource is
> mostly used for (non-PCIe) peer to peer transfers, any kind of mapping
> from the host address should be excluded.

Could you elaborate on mapping from the host address being excluded?
On BCM283x DMA address != CPU physical address, so some mapping has to occur.

Robin Murphy directed us at dma_map_resource() in [1], and referenced
this patch as necessary because dma_map_resource() didn't currently
use dma-ranges mappings.
Mark Brown also hadn't corrected/objected to the statement that
dma_map_resource() was the correct call when I was querying how to
tackle this historic mismatch in [2].

I'll happily defer to the experts on DMA (I would never classify
myself as such), but I'm not clear on the direction you want here.

[1] https://lore.kernel.org/lkml/ee19a95d-fe1e-4f3f-bc81-bdef38475469@arm.com/
[2] https://lore.kernel.org/linux-arm-kernel/CAPY8ntBua=wPVUj+SM0WGcUL0fT56uEHo8YZUTMB8Z54X_aPRw@mail.gmail.com/T/

> (dma_direct_map_resource in general is a horrible interface and I'd
> prefer everyone to switch to the map_sg based P2P support, but we
> have plenty of users for it unfortunately)

Is that applicable for mapping device addresses with DMA_DEV_TO_MEM or
DMA_MEM_TO_DEV transfers?
Example use case on BCM283x is HDMI audio where the HDMI driver should
be passing in the CPU physical address of the audio FIFO, and that
needs to be mapped to the DMA address for the DMA controller. How do I
get a sglist for the peripheral address?

As noted in the cover letter for this series, if this isn't the
approved mechanism, then please let me know what is.

Many thanks
  Dave