Message ID | 20191209094332.4047-1-peter.ujfalusi@ti.com |
---|---|
Series | dmaengine/soc: Add Texas Instruments UDMA support |
On 09/12/2019 11:43, Peter Ujfalusi wrote:
> The system controller's resource manager has support for configuring the
> TDTYPE of TCHAN_CFG register on j721e.
> With this parameter the teardown completion can be controlled:
> TDTYPE == 0: Return without waiting for peer to complete the teardown
> TDTYPE == 1: Wait for peer to complete the teardown
>
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>

Hi Peter,

You somehow dropped my Reviewed-by tag from this patch; this appears
identical to the v6 one. So,

Reviewed-by: Tero Kristo <t-kristo@ti.com>

> ---
>  drivers/firmware/ti_sci.c              | 1 +
>  drivers/firmware/ti_sci.h              | 7 +++++++
>  include/linux/soc/ti/ti_sci_protocol.h | 2 ++
>  3 files changed, 10 insertions(+)
>
> diff --git a/drivers/firmware/ti_sci.c b/drivers/firmware/ti_sci.c
> index 4126be9e3216..f13e4a96f3b7 100644
> --- a/drivers/firmware/ti_sci.c
> +++ b/drivers/firmware/ti_sci.c
> @@ -2412,6 +2412,7 @@ static int ti_sci_cmd_rm_udmap_tx_ch_cfg(const struct ti_sci_handle *handle,
>          req->fdepth = params->fdepth;
>          req->tx_sched_priority = params->tx_sched_priority;
>          req->tx_burst_size = params->tx_burst_size;
> +        req->tx_tdtype = params->tx_tdtype;
>
>          ret = ti_sci_do_xfer(info, xfer);
>          if (ret) {
> diff --git a/drivers/firmware/ti_sci.h b/drivers/firmware/ti_sci.h
> index f0d068c03944..255327171dae 100644
> --- a/drivers/firmware/ti_sci.h
> +++ b/drivers/firmware/ti_sci.h
> @@ -910,6 +910,7 @@ struct rm_ti_sci_msg_udmap_rx_flow_opt_cfg {
>   * 12 - Valid bit for @ref ti_sci_msg_rm_udmap_tx_ch_cfg::tx_credit_count
>   * 13 - Valid bit for @ref ti_sci_msg_rm_udmap_tx_ch_cfg::fdepth
>   * 14 - Valid bit for @ref ti_sci_msg_rm_udmap_tx_ch_cfg::tx_burst_size
> + * 15 - Valid bit for @ref ti_sci_msg_rm_udmap_tx_ch_cfg::tx_tdtype
>   *
>   * @nav_id: SoC device ID of Navigator Subsystem where tx channel is located
>   *
> @@ -973,6 +974,11 @@ struct rm_ti_sci_msg_udmap_rx_flow_opt_cfg {
>   *
>   * @tx_burst_size: UDMAP transmit channel burst size configuration to be
>   * programmed into the tx_burst_size field of the TCHAN_TCFG register.
> + *
> + * @tx_tdtype: UDMAP transmit channel teardown type configuration to be
> + * programmed into the tdtype field of the TCHAN_TCFG register:
> + * 0 - Return immediately
> + * 1 - Wait for completion message from remote peer
>   */
>  struct ti_sci_msg_rm_udmap_tx_ch_cfg_req {
>          struct ti_sci_msg_hdr hdr;
> @@ -994,6 +1000,7 @@ struct ti_sci_msg_rm_udmap_tx_ch_cfg_req {
>          u16 fdepth;
>          u8 tx_sched_priority;
>          u8 tx_burst_size;
> +        u8 tx_tdtype;
>  } __packed;
>
>  /**
> diff --git a/include/linux/soc/ti/ti_sci_protocol.h b/include/linux/soc/ti/ti_sci_protocol.h
> index 9531ec823298..f3aed0b91564 100644
> --- a/include/linux/soc/ti/ti_sci_protocol.h
> +++ b/include/linux/soc/ti/ti_sci_protocol.h
> @@ -342,6 +342,7 @@ struct ti_sci_msg_rm_udmap_tx_ch_cfg {
>  #define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID    BIT(11)
>  #define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_CREDIT_COUNT_VALID  BIT(12)
>  #define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FDEPTH_VALID        BIT(13)
> +#define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_TDTYPE_VALID        BIT(15)
>          u16 nav_id;
>          u16 index;
>          u8 tx_pause_on_err;
> @@ -359,6 +360,7 @@ struct ti_sci_msg_rm_udmap_tx_ch_cfg {
>          u16 fdepth;
>          u8 tx_sched_priority;
>          u8 tx_burst_size;
> +        u8 tx_tdtype;
>  };
>
>  /**
>

--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
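
For readers following the patch above, here is a minimal user-space sketch of how a caller might request the "wait for peer" teardown type. It is an illustration only, not code from the series: `struct tx_ch_cfg_sketch`, the `TDTYPE_*` names and `tx_ch_cfg_set_tdtype()` are hypothetical stand-ins, and the `valid_params` field is assumed to be the mask that the `*_VALID` bits apply to. Only the bit position (15) and the `tx_tdtype` values 0/1 are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Bit position taken from the patch; everything else below is a stand-in. */
#define TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_TDTYPE_VALID BIT(15)

/* Hypothetical names for the two TDTYPE values described in the patch. */
enum tdtype_sketch {
	TDTYPE_RETURN_IMMEDIATELY = 0, /* do not wait for the peer */
	TDTYPE_WAIT_FOR_PEER = 1,      /* wait for the peer's completion message */
};

/* Hypothetical, trimmed-down stand-in for struct ti_sci_msg_rm_udmap_tx_ch_cfg. */
struct tx_ch_cfg_sketch {
	uint32_t valid_params; /* assumed: mask of *_VALID bits */
	uint16_t nav_id;
	uint16_t index;
	uint8_t tx_tdtype;
};

/* Set the teardown type and flag the field as valid in the request. */
static void tx_ch_cfg_set_tdtype(struct tx_ch_cfg_sketch *cfg,
				 enum tdtype_sketch tdtype)
{
	assert(cfg != NULL);
	cfg->valid_params |= TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_TDTYPE_VALID;
	cfg->tx_tdtype = (uint8_t)tdtype;
}
```

The point of the valid bit is that a field left unflagged is presumably ignored by the resource manager, so a caller on a platform without TDTYPE support would simply not set bit 15.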
On 09/12/19 3:13 pm, Peter Ujfalusi wrote:
> Hi,
>
> Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent
> upstream over a month ago:
> https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/
>
> I'm still waiting on its fate (Tero has given his r-b).
> The ti_sci patch did not make it to 5.5-rc1, but I included it in the series and
> let the maintainers decide if it can go via DMAengine for 5.6 or to later
> releases (5.6 probably for the ti_sci and 5.7 for the UDMA driver patch).

Tested this series for sa2ul crypto with AES & 3DES, which need the metadata
implementation provided by this series for sa2ul-specific functionality.

FWIW:
Tested-by: Keerthy <j-keerthy@ti.com>

>
> Changes since v6:
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=209455&state=*)
>
> - UDMAP DMAengine driver:
>   - Squashed the split patches
>   - Squashed the early TX completion handling update
>     (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=210713&state=*)
>   - Hard reset fix for RX channels to avoid channel lockdown
>   - Correct completed descriptor's residue value
>
> Changes since v5:
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=201051&state=*)
> - Based on 5.4
>
> - cppi5 header
>   - clear the bits before setting new value with '|='
>
> - UDMAP DT bindings:
>   - valid compatibles as single enum list
>
> - UDMAP DMAengine driver:
>   - Fix udma_is_chan_running()
>   - Use flags for acc32, burst support instead of a bool in udma_match_data
>     struct
>   - TDTYPE handling (teardown completion handling for j721e) is moved to separate
>     patch as the tisci core patch has not moved for over a month.
>     Both ti_sci and the iterative patch to udma are included in the series.
>
> Changes since v4
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=196619&state=*)
> - Based on 5.4-rc7
>
> - ringacc DT bindings:
>   - clarify the meaning of ti,sci-dev-id
>
> - ringacc driver:
>   - Remove 'default y' from Kconfig
>   - Fix struct comments
>   - Move try_module_get() earlier in k3_ringacc_request_ring()
>
> - PSI-L thread database:
>   - Add kernel style struct/enum documentation
>   - Add missing thread description for sa2ul second interface
>   - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
>
> - UDMAP DT bindings:
>   - move to dual license
>   - change compatible from const to enum
>   - items dropped for ti,sci-rm-ranges-*
>   - description text moved from literal block when it is sensible
>   - example fixed to compile cleanly
>   - added parent to provide correct address-cells
>   - navss is moved to simple-mfd from simple-bus
>
> - UDMAP DMAengine driver:
>   - move fd_ring/r_ring under rflow
>   - get rid of unused iomem for rflows
>   - Remove 'default y' from Kconfig
>   - Use defines for rflow src/dst tag selection
>   - Merge the udma_ring_callback() and udma_tr_event_callback() to their
>     corresponding interrupt handler
>   - Create new defines for tx/rx channel's tisci valid parameter flags
>   - Remove re-initialization to 0 of tisci request struct members
>   - Make sure that vchan tasklets are also stopped when removing the module
>   - Additional checkpatch --strict fixes when it made sense
>   - make W=1 was clean
>
> - UDMAP glue layer:
>   - Remove 'default y' from Kconfig
>   - commit message update for features needing the glue layer
>
> Changes since v3
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=180679&state=*):
> - Based on 5.4-rc5
> - Fixed typos pointed out by Tero
> - Added reviewed-by tags from Tero
>
> - ring accelerator driver
>   - TODO_GS is removed from the header
>   - pm_runtime removed as NAVSS and its components are always on
>   - Check validity of Message mode setup (element size > 8 bytes must use proxy)
>
> - cppi5 header
>   - add commit message
>
> - UDMAP DT bindings
>   - Drop the psil-config node use on the remote PSI-L side and use only one cell
>     which is the remote threadID:
>
>     dmas = <&main_udmap 0xc400>, <&main_udmap 0x4400>;
>     dma-names = "tx", "rx";
>
>   - The PSI-L thread configuration description is moved to kernel as a new module:
>     k3-psil/k3-psil-am654/k3-psil-j721e
>   - ti,psil-base has been removed and moved to kernel
>   - removed the no longer needed dt-bindings/dma/k3-udma.h
>   - Convert the document to schema (yaml)
>
> - NEW PSI-L endpoint configuration database
>   - a simple database holding the remote end's configuration needed for UDMAP
>     configuration. All previous parameters from DT have been moved here and merged
>     with the linux only tr mode channel flag.
>   - Client drivers can update the remote endpoint configuration as it can be
>     different based on system configuration and the endpoint itself is under the
>     control of the peripheral driver.
>   - database for am654 and j721e
>
> - UDMAP DMAengine driver
>   - pm_runtime removed as NAVSS and its components are always on
>   - rchan_oes_offset added to MSI domain allocation
>   - Use the new PSI-L endpoint database for UDMAP configuration
>   - Support for waiting for PDMA teardown completion on j721e instead of
>     returning right away. depends on:
>     https://lkml.org/lkml/2019/10/25/189
>     Not included in this series, but it is in the branch I have prepared.
>   - psil-base is moved from DT to be part of udma_match_data
>   - tr_thread maps is removed and using the PSI-L endpoint configuration for it
>
> - UDMAP glue layer
>   - pm_runtime removed as NAVSS and its components are always on
>   - Use the new PSI-L endpoint database for UDMAP configuration
>
> Changes since v2
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=152609&state=*)
> - Based on 5.4-rc1
> - Support for Flow only data transfer for the glue layer
>
> - cppi5 header
>   - comments converted to kernel-doc style
>   - Remove the excessive WARN_ONs and rely on the user for sanity
>   - new macro for checking TearDown Completion Message
>
> - ring accelerator driver
>   - fixed up the commit message (SoB, TI-SCI)
>   - fixed ring reset
>   - CONFIG_TI_K3_RINGACC_DEBUG is removed along with the dbg_write/read functions
>     and use dev_dbg()
>   - k3_ringacc_ring_dump() is moved to static
>   - step numbering removed from k3_ringacc_ring_reset_dma()
>   - Add clarification comment for shared ring usage in k3_ringacc_ring_cfg()
>   - Magic shift values in k3_ringacc_ring_cfg_proxy() got defined
>   - K3_RINGACC_RING_MODE_QM is removed as it is not supported
>
> - UDMAP DT bindings
>   - Fix property prefixing: s/pdma,/ti,pdma-
>   - Add ti,notdpkt property to suppress teardown completion message on tchan
>   - example updated accordingly
>
> - UDMAP DMAengine driver
>   - Change __raw_readl/writel to readl/writel
>   - Split up the udma_tisci_channel_config() into m2m, tx and rx tisci
>     configuration functions for clarity
>   - DT bindings change: s/pdma,/ti,pdma-
>   - Cleanup of udma_tx_status():
>     - residue calculation fix for m2m
>     - no need to read packet counter as it is not used
>     - peer byte counter only available in PDMAs
>   - Proper locking to avoid race with interrupt handler (polled m2m fix)
>   - Support for ti,notdpkt
>   - RFLOW management rework to support data movement without channel:
>     - the channel is not controlled by Linux but other core and we only have
>       rflows and rings to do the DMA transfers.
>       This mode is only supported by the Glue layer for now.
>
> - UDMAP glue layer
>   - Debug print improvements
>   - Support for rflow/ring only data movement
>
> Changes since v1
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=114105&state=*)
> - Added support for j721e
> - Based on 5.3-rc2
> - dropped ti_sci API patch for RM management as it is already upstream
> - dropped dmadev_get_slave_channel() patch, using __dma_request_channel()
> - Added Rob's Reviewed-by to ringacc DT binding document patch
> - DT bindings changes:
>   - linux,udma-mode is gone, I have a simple lookup table in the driver to flag
>     TR channels.
>   - Support for j721e
> - Fix bug in of_node_put() handling in xlate function
>
> Changes since RFC (https://patchwork.kernel.org/cover/10612465/):
> - Based on linux-next (20190506) which now has the ti_sci interrupt support
> - The series can be applied and the UDMA via DMAengine API will be functional
> - Included in the series: ti_sci Resource management API, cppi5 header and
>   driver for the ring accelerator.
> - The DMAengine core patches have been updated as per the review comments for
>   earlier submission.
> - The DMAengine driver patch is artificially split up to 6 smaller patches
>
> The k3-udma driver implements the Data Movement Architecture described in
> AM65x TRM (http://www.ti.com/lit/pdf/spruid7) and
> j721e TRM (http://www.ti.com/lit/pdf/spruil1)
>
> This DMA architecture is a big departure from 'traditional' architecture where
> we had either EDMA or sDMA as system DMA.
>
> Packet DMAs were used as dedicated DMAs to service only networking (Keystone2)
> or USB (am335x) while other peripherals were serviced by EDMA.
>
> In AM65x/j721e the UDMA (Unified DMA) is used for all data movement within the
> SoC, tasked to service all peripherals (UART, McSPI, McASP, networking, etc).
>
> The NAVSS/UDMA is built around CPPI5 (Communications Port Programming Interface)
> and it supports Packet mode (similar to CPPI4.1 in Keystone2 for networking) and
> TR mode (similar to EDMA descriptor).
> The data movement is done within a PSI-L fabric, peripherals (including the
> UDMA-P) are not addressed by their I/O register as with traditional DMAs but
> with their PSI-L thread ID.
>
> In AM65x/j721e we have two main types of peripherals:
> Legacy: McASP, McSPI, UART, etc.
>  to provide connectivity they are serviced by PDMA (Peripheral DMA)
>  PDMA threads are locked to service a given peripheral, for example PSI-L thread
>  0x4400/0xc400 is to service McASP0 rx/tx.
>  The PDMA configuration can be done via the UDMA Real Time Peer registers.
> Native: Networking, security accelerator
>  these peripherals have native support for PSI-L.
>
> To be able to use the DMA the following generic steps need to be taken:
> - configure a DMA channel (tchan for TX, rchan for RX)
>   - channel mode: Packet or TR mode
>   - for memcpy a tchan and rchan pair is used.
>   - for packet mode RX we also need to configure a receive flow to configure the
>     packet reception
> - the source and destination threads must be paired
> - at minimum one pair of rings needs to be configured:
>   - tx: transfer ring and transfer completion ring
>   - rx: free descriptor ring and receive ring
> - two interrupts: UDMA-P channel interrupt and ring interrupt for tc_ring/r_ring
>   - If the channel is in packet mode or configured to memcpy then we only need
>     one interrupt from the ring, events from UDMAP are not used.
>
> When the channel setup is completed we only interact with the rings:
> - TX: push a descriptor to t_ring and wait for it to be pushed to the tc_ring by
>   the UDMA-P
> - RX: push a descriptor to the fd_ring and wait for UDMA-P to push it back to
>   the r_ring.
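
The TX ring interaction described in the quoted text can be pictured with a toy user-space model. This is only a sketch under stated assumptions: `struct toy_ring` and the `toy_*` functions are hypothetical and are not the k3-ringacc API, the "descriptor" is just a 64-bit value standing in for a descriptor address, and `toy_udmap_process_one()` plays the role of the UDMA-P moving a finished descriptor from the transfer ring to the completion ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* power of two, so unsigned index wraparound stays consistent */

/* Hypothetical FIFO ring; the hardware rings are managed by the ring accelerator. */
struct toy_ring {
	uint64_t elems[TOY_RING_SIZE];
	unsigned int head; /* next element to pop */
	unsigned int tail; /* next free slot to push into */
};

static int toy_ring_is_full(const struct toy_ring *r)
{
	return r->tail - r->head == TOY_RING_SIZE;
}

/* Push one descriptor; returns 0 on success, -1 if the ring is full. */
static int toy_ring_push(struct toy_ring *r, uint64_t desc)
{
	assert(r != NULL);
	if (toy_ring_is_full(r))
		return -1;
	r->elems[r->tail++ % TOY_RING_SIZE] = desc;
	return 0;
}

/* Pop one descriptor; returns 0 on success, -1 if the ring is empty. */
static int toy_ring_pop(struct toy_ring *r, uint64_t *desc)
{
	assert(r != NULL && desc != NULL);
	if (r->tail == r->head)
		return -1;
	*desc = r->elems[r->head++ % TOY_RING_SIZE];
	return 0;
}

/*
 * Stand-in for the UDMA-P on the TX path: take one descriptor from the
 * transfer ring (t_ring) and hand it back on the completion ring (tc_ring).
 */
static int toy_udmap_process_one(struct toy_ring *t_ring, struct toy_ring *tc_ring)
{
	uint64_t desc;

	if (toy_ring_is_full(tc_ring))
		return -1; /* nowhere to return the descriptor yet */
	if (toy_ring_pop(t_ring, &desc))
		return -1; /* nothing queued */
	return toy_ring_push(tc_ring, desc);
}
```

In this model the client only ever pushes to `t_ring` and pops from `tc_ring`; the RX case is symmetric with `fd_ring` and `r_ring` in the same roles.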
>
> Since we have FIFOs in the DMA fabric (UDMA-P, PSI-L and PDMA) which was not the
> case in previous DMAs we need to report the amount of data held in these FIFOs
> to clients (delay calculation for ALSA, UART FIFO flush support).
>
> Metadata support:
> DMAengine user driver was posted upstream based/tested on the v1 of the UDMA
> series: https://lkml.org/lkml/2019/6/28/20
> SA2UL is using the metadata DMAengine API.
>
> Note on the last patch:
> In Keystone2 the networking had dedicated DMA (packet DMA) which is not the case
> anymore and the DMAengine API is currently missing support for the features we
> would need to support networking, things like
> - support for receive descriptor 'classification'
>   - we need to support several receive queues for a channel.
>   - the queues are used for packet priority handling for example, but they can be
>     used to have pools of descriptors for different sizes.
> - out of order completion of descriptors on a channel
>   - when we have several queues to handle different priority packets the
>     descriptors will be completed 'out-of-order'
> - NAPI type of operation (polling instead of interrupt driven transfer)
>   - without this we can not sustain gigabit speeds and we need to support NAPI
>   - not to limit this to networking, but other high performance operations
>
> It is my intention to work on these to be able to remove the 'glue' layer and
> switch to DMAengine API - or have an API aside of DMAengine to have a generic way
> to support networking, but given how controversial and non-trivial these changes
> are we need something to support networking.
>
> The series (+DT patches to enable DMA on AM65x and j721e) on top of 5.5-rc1 is
> available:
> https://github.com/omap-audio/linux-audio.git peter/udma/series_v7-5.5-rc1
>
> Regards,
> Peter
> ---
> Grygorii Strashko (3):
>   bindings: soc: ti: add documentation for k3 ringacc
>   soc: ti: k3: add navss ringacc driver
>   dmaengine: ti: k3-udma: Add glue layer for non DMAengine users
>
> Peter Ujfalusi (9):
>   dmaengine: doc: Add sections for per descriptor metadata support
>   dmaengine: Add metadata_ops for dma_async_tx_descriptor
>   dmaengine: Add support for reporting DMA cached data amount
>   dmaengine: ti: Add cppi5 header for K3 NAVSS/UDMA
>   dmaengine: ti: k3 PSI-L remote endpoint configuration
>   dt-bindings: dma: ti: Add document for K3 UDMA
>   dmaengine: ti: New driver for K3 UDMA
>   firmware: ti_sci: rm: Add support for tx_tdtype parameter for tx
>     channel
>   dmaengine: ti: k3-udma: Wait for peer teardown completion if supported
>
>  .../devicetree/bindings/dma/ti/k3-udma.yaml   |  185 +
>  .../devicetree/bindings/soc/ti/k3-ringacc.txt |   59 +
>  Documentation/driver-api/dmaengine/client.rst |   75 +
>  .../driver-api/dmaengine/provider.rst         |   46 +
>  drivers/dma/dmaengine.c                       |   73 +
>  drivers/dma/dmaengine.h                       |    8 +
>  drivers/dma/ti/Kconfig                        |   24 +
>  drivers/dma/ti/Makefile                       |    3 +
>  drivers/dma/ti/k3-psil-am654.c                |  175 +
>  drivers/dma/ti/k3-psil-j721e.c                |  222 ++
>  drivers/dma/ti/k3-psil-priv.h                 |   39 +
>  drivers/dma/ti/k3-psil.c                      |   97 +
>  drivers/dma/ti/k3-udma-glue.c                 | 1198 ++++++
>  drivers/dma/ti/k3-udma-private.c              |  133 +
>  drivers/dma/ti/k3-udma.c                      | 3452 +++++++++++++++++
>  drivers/dma/ti/k3-udma.h                      |  151 +
>  drivers/firmware/ti_sci.c                     |    1 +
>  drivers/firmware/ti_sci.h                     |    7 +
>  drivers/soc/ti/Kconfig                        |   11 +
>  drivers/soc/ti/Makefile                       |    1 +
>  drivers/soc/ti/k3-ringacc.c                   | 1180 ++++++
>  include/linux/dma/k3-psil.h                   |   71 +
>  include/linux/dma/k3-udma-glue.h              |  134 +
>  include/linux/dma/ti-cppi5.h                  | 1061 +++++
>  include/linux/dmaengine.h                     |  110 +
>  include/linux/soc/ti/k3-ringacc.h             |  244 ++
>  include/linux/soc/ti/ti_sci_protocol.h        |    2 +
>  27 files changed, 8762 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.yaml
>  create mode 100644 Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt
>  create mode 100644 drivers/dma/ti/k3-psil-am654.c
>  create mode 100644 drivers/dma/ti/k3-psil-j721e.c
>  create mode 100644 drivers/dma/ti/k3-psil-priv.h
>  create mode 100644 drivers/dma/ti/k3-psil.c
>  create mode 100644 drivers/dma/ti/k3-udma-glue.c
>  create mode 100644 drivers/dma/ti/k3-udma-private.c
>  create mode 100644 drivers/dma/ti/k3-udma.c
>  create mode 100644 drivers/dma/ti/k3-udma.h
>  create mode 100644 drivers/soc/ti/k3-ringacc.c
>  create mode 100644 include/linux/dma/k3-psil.h
>  create mode 100644 include/linux/dma/k3-udma-glue.h
>  create mode 100644 include/linux/dma/ti-cppi5.h
>  create mode 100644 include/linux/soc/ti/k3-ringacc.h
>
On 09/12/2019 11.43, Peter Ujfalusi wrote:
> Hi,
>
> Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent
> upstream over a month ago:
> https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/
>
> I'm still waiting on its fate (Tero has given his r-b).
> The ti_sci patch did not make it to 5.5-rc1, but I included it in the series and
> let the maintainers decide if it can go via DMAengine for 5.6 or to later
> releases (5.6 probably for the ti_sci and 5.7 for the UDMA driver patch).
>
> Changes since v6:
> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=209455&state=*)
>
> - UDMAP DMAengine driver:
>   - Squashed the split patches
>   - Squashed the early TX completion handling update
>     (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=210713&state=*)
>   - Hard reset fix for RX channels to avoid channel lockdown
>   - Correct completed descriptor's residue value

I got a build failure with allmodconfig:

ERROR: "devm_ti_sci_get_of_resource" [drivers/soc/ti/k3-ringacc.ko] undefined!
ERROR: "of_msi_get_domain" [drivers/soc/ti/k3-ringacc.ko] undefined!
ERROR: "devm_ti_sci_get_of_resource" [drivers/dma/ti/k3-udma.ko] undefined!
ERROR: "of_msi_get_domain" [drivers/dma/ti/k3-udma.ko] undefined!

These occur because both devm_ti_sci_get_of_resource and of_msi_get_domain are
missing EXPORT_SYMBOL_GPL(), so they can not be used from modules.

There were patches in the past to add it for of_msi_get_domain:
https://lore.kernel.org/patchwork/patch/668123/
https://lore.kernel.org/patchwork/patch/716046/

I can not find a reason why these were not merged.

Matthias's patch looks to be the earlier one; is it OK if I resend it within v8?
> Changes since v5: > (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=201051&state=*) > - Based on 5.4 > > - cppi5 header > - clear the bits before setting new value with '|=' > > - UDMAP DT bindings: > - valid compatibles as single enum list > > - UDMAP DMAengine driver: > - Fix udma_is_chan_running() > - Use flags for acc32, burst support instead of a bool in udma_match_data > struct > - TDTYPE handling (teardown completion handling for j721e) is moved to separate > patch as the tisci core patch has not moved for over a month. > Both ti_sci and the iterative patch to udma is included in the series. > > Changes since v4 > (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=196619&state=*) > - Based on 5.4-rc7 > > - ringacc DT bindings: > - clarify the meaning of ti,sci-dev-id > > - ringacc driver: > - Remove 'default y' from Kconfig > - Fix struct comments > - Move try_module_get() earlier in k3_ringacc_request_ring() > > - PSI-L thread database: > - Add kernel style struct/enum documentation > - Add missing thread description for sa2ul second interface > - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL > > - UDMAP DT bindings: > - move to dual license > - change compatible from const to enum > - items dropped for ti,sci-rm-ranges-* > - description text moved from literal block when it is sensible > - example fixed to compile cleanly > - added parent to provide correct address-cells > - navss is moved to simple-mfd from simple-bus > > - UDMAP DMAengine driver: > - move fd_ring/r_ring under rflow > - get rid of unused iomem for rflows > - Remove 'default y' from Kconfig > - Use defines for rflow src/dst tag selection > - Merge the udma_ring_callback() and udma_tr_event_callback() to their > corresponding interrupt handler > - Create new defines for tx/rx channel's tisci valid parameter flags > - Remove re-initialization to 0 of tisci request struct members > - Make sure that vchan tasklets are also stopped when removing the module > - 
Additional checkpatch --strict fixes when it made sense > - make W=1 was clean > > - UDMAP glue layer: > - Remove 'default y' from Kconfig > - commit message update for features needing the glue layer > > Changes since v3 > (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=180679&state=*): > - Based on 5.4-rc5 > - Fixed typos pointed out by Tero > - Added reviewed-by tags from Tero > > - ring accelerator driver > - TODO_GS is removed from the header > - pm_runtime removed as NAVSS and it's components are always on > - Check validity of Message mode setup (element size > 8 bytes must use proxy) > > - cppi5 header > - add commit message > > - UDMAP DT bindings > - Drop the psil-config node use on the remote PSI-L side and use only one cell > which is the remote threadID: > > dmas = <&main_udmap 0xc400>, <&main_udmap 0x4400>; > dma-names = "tx", "rx"; > > - The PSI-L thread configuration description is moved to kernel as a new module: > k3-psil/k3-psil-am654/k3-psil-j721e > - ti,psil-base has been removed and moved to kernel > - removed the no longer needed dt-bindings/dma/k3-udma.h > - Convert the document to schema (yaml) > > - NEW PSI-L endpoint configuration database > - a simple database holding the remote end's configuration needed for UDMAP > configuration. All previous parameters from DT has been moved here and merged > with the linux only tr mode channel flag. > - Client drivers can update the remote endpoint configuration as it can be > different based on system configuration and the endpoint itself is under the > control of the peripheral driver. > - database for am654 and j721e > > - UDMAP DMAengine driver > - pm_runtime removed as NAVSS and it's components are always on > - rchan_oes_offset added to MSI dommain allocation > - Use the new PSI-L endpoint database for UDMAP configuration > - Support for waiting for PDMA teardown completion on j721e instead of > returning right away. 
depends on: > https://lkml.org/lkml/2019/10/25/189 > Not included in this series, but it is in the branch I have prepared. > - psil-base is moved from DT to be part of udma_match_data > - tr_thread maps is removed and using the PSI-L endpoint configuration for it > > - UDMAP glue layer > - pm_runtime removed as NAVSS and it's components are always on > - Use the new PSI-L endpoint database for UDMAP configuration > > Changes since v2 > (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=152609&state=*) > - Based on 5.4-rc1 > - Support for Flow only data transfer for the glue layer > > - cppi5 header > - comments converted to kernel-doc style > - Remove the excessive WARN_ONs and rely on the user for sanity > - new macro for checking TearDown Completion Message > > - ring accelerator driver > - fixed up th commit message (SoB, TI-SCI) > - fixed ring reset > - CONFIG_TI_K3_RINGACC_DEBUG is removed along with the dbg_write/read functions > and use dev_dbg() > - k3_ringacc_ring_dump() is moved to static > - step numbering removed from k3_ringacc_ring_reset_dma() > - Add clarification comment for shared ring usage in k3_ringacc_ring_cfg() > - Magic shift values in k3_ringacc_ring_cfg_proxy() got defined > - K3_RINGACC_RING_MODE_QM is removed as it is not supported > > - UDMAP DT bindings > - Fix property prefixing: s/pdma,/ti,pdma- > - Add ti,notdpkt property to suppress teardown completion message on tchan > - example updated accordingly > > - UDMAP DMAengine driver > - Change __raw_readl/writel to readl/writel > - Split up the udma_tisci_channel_config() into m2m, tx and rx tisci > configuration functions for clarity > - DT bindings change: s/pdma,/ti,pdma- > - Cleanup of udma_tx_status(): > - residue calculation fix for m2m > - no need to read packet counter as it is not used > - peer byte counter only available in PDMAs > - Proper locking to avoid race with interrupt handler (polled m2m fix) > - Support for ti,notdpkt > - RFLOW management rework to 
support data movement without channel: > - the channel is not controlled by Linux but other core and we only have > rflows and rings to do the DMA transfers. > This mode is only supported by the Glue layer for now. > > - UDMAP glue layer > - Debug print improvements > - Support for rflow/ring only data movement > > Changes since v1 > (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=114105&state=*) > - Added support for j721e > - Based on 5.3-rc2 > - dropped ti_sci API patch for RM management as it is already upstream > - dropped dmadev_get_slave_channel() patch, using __dma_request_channel() > - Added Rob's Reviewed-by to ringacc DT binding document patch > - DT bindings changes: > - linux,udma-mode is gone, I have a simple lookup table in the driver to flag > TR channels. > - Support for j721e > - Fix bug in of_node_put() handling in xlate function > > Changes since RFC (https://patchwork.kernel.org/cover/10612465/): > - Based on linux-next (20190506) which now have the ti_sci interrupt support > - The series can be applied and the UDMA via DMAengine API will be functional > - Included in the series: ti_sci Resource management API, cppi5 header and > driver for the ring accelerator. > - The DMAengine core patches have been updated as per the review comments for > earlier submittion. > - The DMAengine driver patch is artificially split up to 6 smaller patches > > The k3-udma driver implements the Data Movement Architecture described in > AM65x TRM (http://www.ti.com/lit/pdf/spruid7) and > j721e TRM (http://www.ti.com/lit/pdf/spruil1) > > This DMA architecture is a big departure from 'traditional' architecture where > we had either EDMA or sDMA as system DMA. > > Packet DMAs were used as dedicated DMAs to service only networking (Kesytone2) > or USB (am335x) while other peripherals were serviced by EDMA. 
> > In AM65x/j721e the UDMA (Unified DMA) is used for all data movment within the > SoC, tasked to service all peripherals (UART, McSPI, McASP, networking, etc). > > The NAVSS/UDMA is built around CPPI5 (Communications Port Programming Interface) > and it supports Packet mode (similar to CPPI4.1 in Keystone2 for networking) and > TR mode (similar to EDMA descriptor). > The data movement is done within a PSI-L fabric, peripherals (including the > UDMA-P) are not addressed by their I/O register as with traditional DMAs but > with their PSI-L thread ID. > > In AM65x/j721e we have two main type of peripherals: > Legacy: McASP, McSPI, UART, etc. > to provide connectivity they are serviced by PDMA (Peripheral DMA) > PDMA threads are locked to service a given peripheral, for example PSI-L thread > 0x4400/0xc400 is to service McASP0 rx/tx. > The PDMa configuration can be done via the UDMA Real Time Peer registers. > Native: Networking, security accelerator > these peripherals have native support for PSI-L. > > To be able to use the DMA the following generic steps need to be taken: > - configure a DMA channel (tchan for TX, rchan for RX) > - channel mode: Packet or TR mode > - for memcpy a tchan and rchan pair is used. > - for packet mode RX we also need to configure a receive flow to configure the > packet receiption > - the source and destination threads must be paired > - at minimum one pair of rings need to be configured: > - tx: transfer ring and transfer completion ring > - rx: free descriptor ring and receive ring > - two interrupts: UDMA-P channel interrupt and ring interrupt for tc_ring/r_ring > - If the channel is in packet mode or configured to memcpy then we only need > one interrupt from the ring, events from UDMAP is not used. 
> > When the channel setup is completed we only interract with the rings: > - TX: push a descriptor to t_ring and wait for it to be pushed to the tc_ring by > the UDMA-P > - RX: push a descriptor to the fd_ring and waith for UDMA-P to push it back to > the r_ring. > > Since we have FIFOs in the DMA fabric (UDMA-P, PSI-L and PDMA) which was not the > case in previous DMAs we need to report the amount of data held in these FIFOs > to clients (delay calculation for ALSA, UART FIFO flush support). > > Metadata support: > DMAengine user driver was posted upstream based/tested on the v1 of the UDMA > series: https://lkml.org/lkml/2019/6/28/20 > SA2UL is using the metadata DMAengine API. > > Note on the last patch: > In Keystone2 the networking had dedicated DMA (packet DMA) which is not the case > anymore and the DMAengine API currently missing support for the features we > would need to support networking, things like > - support for receive descriptor 'classification' > - we need to support several receive queues for a channel. > - the queues are used for packet priority handling for example, but they can be > used to have pools of descriptors for different sizes. > - out of order completion of descriptors on a channel > - when we have several queues to handle different priority packets the > descriptors will be completed 'out-of-order' > - NAPI type of operation (polling instead of interrupt driven transfer) > - without this we can not sustain gigabit speeds and we need to support NAPI > - not to limit this to networking, but other high performance operations > > It is my intention to work on these to be able to remove the 'glue' layer and > switch to DMAengine API - or have an API aside of DMAengine to have generic way > to support networking, but given how controversial and not trivial these changes > are we need something to support networking. 
>
> The series (+DT patches to enable DMA on AM65x and j721e) on top of 5.5-rc1 is
> available:
> https://github.com/omap-audio/linux-audio.git peter/udma/series_v7-5.5-rc1
>
> Regards,
> Peter
> ---
> Grygorii Strashko (3):
>   bindings: soc: ti: add documentation for k3 ringacc
>   soc: ti: k3: add navss ringacc driver
>   dmaengine: ti: k3-udma: Add glue layer for non DMAengine users
>
> Peter Ujfalusi (9):
>   dmaengine: doc: Add sections for per descriptor metadata support
>   dmaengine: Add metadata_ops for dma_async_tx_descriptor
>   dmaengine: Add support for reporting DMA cached data amount
>   dmaengine: ti: Add cppi5 header for K3 NAVSS/UDMA
>   dmaengine: ti: k3 PSI-L remote endpoint configuration
>   dt-bindings: dma: ti: Add document for K3 UDMA
>   dmaengine: ti: New driver for K3 UDMA
>   firmware: ti_sci: rm: Add support for tx_tdtype parameter for tx
>     channel
>   dmaengine: ti: k3-udma: Wait for peer teardown completion if supported
>
>  .../devicetree/bindings/dma/ti/k3-udma.yaml   |  185 +
>  .../devicetree/bindings/soc/ti/k3-ringacc.txt |   59 +
>  Documentation/driver-api/dmaengine/client.rst |   75 +
>  .../driver-api/dmaengine/provider.rst         |   46 +
>  drivers/dma/dmaengine.c                       |   73 +
>  drivers/dma/dmaengine.h                       |    8 +
>  drivers/dma/ti/Kconfig                        |   24 +
>  drivers/dma/ti/Makefile                       |    3 +
>  drivers/dma/ti/k3-psil-am654.c                |  175 +
>  drivers/dma/ti/k3-psil-j721e.c                |  222 ++
>  drivers/dma/ti/k3-psil-priv.h                 |   39 +
>  drivers/dma/ti/k3-psil.c                      |   97 +
>  drivers/dma/ti/k3-udma-glue.c                 | 1198 ++++++
>  drivers/dma/ti/k3-udma-private.c              |  133 +
>  drivers/dma/ti/k3-udma.c                      | 3452 +++++++++++++++++
>  drivers/dma/ti/k3-udma.h                      |  151 +
>  drivers/firmware/ti_sci.c                     |    1 +
>  drivers/firmware/ti_sci.h                     |    7 +
>  drivers/soc/ti/Kconfig                        |   11 +
>  drivers/soc/ti/Makefile                       |    1 +
>  drivers/soc/ti/k3-ringacc.c                   | 1180 ++++++
>  include/linux/dma/k3-psil.h                   |   71 +
>  include/linux/dma/k3-udma-glue.h              |  134 +
>  include/linux/dma/ti-cppi5.h                  | 1061 +++++
>  include/linux/dmaengine.h                     |  110 +
>  include/linux/soc/ti/k3-ringacc.h             |  244 ++
>  include/linux/soc/ti/ti_sci_protocol.h        |    2 +
>  27 files changed, 8762 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.yaml
>  create mode 100644 Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt
>  create mode 100644 drivers/dma/ti/k3-psil-am654.c
>  create mode 100644 drivers/dma/ti/k3-psil-j721e.c
>  create mode 100644 drivers/dma/ti/k3-psil-priv.h
>  create mode 100644 drivers/dma/ti/k3-psil.c
>  create mode 100644 drivers/dma/ti/k3-udma-glue.c
>  create mode 100644 drivers/dma/ti/k3-udma-private.c
>  create mode 100644 drivers/dma/ti/k3-udma.c
>  create mode 100644 drivers/dma/ti/k3-udma.h
>  create mode 100644 drivers/soc/ti/k3-ringacc.c
>  create mode 100644 include/linux/dma/k3-psil.h
>  create mode 100644 include/linux/dma/k3-udma-glue.h
>  create mode 100644 include/linux/dma/ti-cppi5.h
>  create mode 100644 include/linux/soc/ti/k3-ringacc.h
>

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On 12/12/2019 10:46, Peter Ujfalusi wrote:
>
>
> On 09/12/2019 11.43, Peter Ujfalusi wrote:
>> Hi,
>>
>> Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent
>> upstream over a month ago:
>> https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/
>>
>> I'm still waiting on its fate (Tero has given his r-b).
>> The ti_sci patch did not make it to 5.5-rc1, but I included it in the series and
>> let the maintainers decide if it can go via DMAengine for 5.6 or to later
>> releases (5.6 probably for the ti_sci and 5.7 for the UDMA driver patch).
>>
>> Changes since v6:
>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=209455&state=*)
>>
>> - UDMAP DMAengine driver:
>>  - Squashed the split patches
>>  - Squashed the early TX completion handling update
>>    (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=210713&state=*)
>>  - Hard reset fix for RX channels to avoid channel lockdown
>>  - Correct completed descriptor's residue value
>
> I got build failure with allmodconfig:
>
> ERROR: "devm_ti_sci_get_of_resource" [drivers/soc/ti/k3-ringacc.ko]
> undefined!
> ERROR: "of_msi_get_domain" [drivers/soc/ti/k3-ringacc.ko] undefined!
> ERROR: "devm_ti_sci_get_of_resource" [drivers/dma/ti/k3-udma.ko] undefined!
> ERROR: "of_msi_get_domain" [drivers/dma/ti/k3-udma.ko] undefined!
>
> They are because both devm_ti_sci_get_of_resource and of_msi_get_domain
> are missing EXPORT_SYMBOL_GPL(), so they cannot be used from modules.
>
> There were patches in the past to add it for of_msi_get_domain:
> https://lore.kernel.org/patchwork/patch/668123/
> https://lore.kernel.org/patchwork/patch/716046/
>
> I cannot find a reason why these are not merged.
> Matthias's patch looks to be the earlier one, is it OK if I resend it
> within v8?

You can just send those two patches separately, I can apply them first before this series. No need to resend this series.
-Tero

>
>> Changes since v5:
>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=201051&state=*)
>> - Based on 5.4
>>
>> - cppi5 header
>>  - clear the bits before setting new value with '|='
>>
>> - UDMAP DT bindings:
>>  - valid compatibles as single enum list
>>
>> - UDMAP DMAengine driver:
>>  - Fix udma_is_chan_running()
>>  - Use flags for acc32, burst support instead of a bool in udma_match_data
>>    struct
>>  - TDTYPE handling (teardown completion handling for j721e) is moved to separate
>>    patch as the tisci core patch has not moved for over a month.
>>    Both ti_sci and the iterative patch to udma are included in the series.
>>
>> Changes since v4
>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=196619&state=*)
>> - Based on 5.4-rc7
>>
>> - ringacc DT bindings:
>>  - clarify the meaning of ti,sci-dev-id
>>
>> - ringacc driver:
>>  - Remove 'default y' from Kconfig
>>  - Fix struct comments
>>  - Move try_module_get() earlier in k3_ringacc_request_ring()
>>
>> - PSI-L thread database:
>>  - Add kernel style struct/enum documentation
>>  - Add missing thread description for sa2ul second interface
>>  - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
>>
>> - UDMAP DT bindings:
>>  - move to dual license
>>  - change compatible from const to enum
>>  - items dropped for ti,sci-rm-ranges-*
>>  - description text moved from literal block when it is sensible
>>  - example fixed to compile cleanly
>>   - added parent to provide correct address-cells
>>   - navss is moved to simple-mfd from simple-bus
>>
>> - UDMAP DMAengine driver:
>>  - move fd_ring/r_ring under rflow
>>  - get rid of unused iomem for rflows
>>  - Remove 'default y' from Kconfig
>>  - Use defines for rflow src/dst tag selection
>>  - Merge the udma_ring_callback() and udma_tr_event_callback() to their
>>    corresponding interrupt handler
>>  - Create new defines for tx/rx channel's tisci valid parameter flags
>>  - Remove re-initialization to 0 of tisci request struct members
>>  - Make sure that vchan tasklets are also stopped when removing the module
>>  - Additional checkpatch --strict fixes when it made sense
>>  - make W=1 was clean
>>
>> - UDMAP glue layer:
>>  - Remove 'default y' from Kconfig
>>  - commit message update for features needing the glue layer
>>
>> Changes since v3
>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=180679&state=*):
>> - Based on 5.4-rc5
>> - Fixed typos pointed out by Tero
>> - Added reviewed-by tags from Tero
>>
>> - ring accelerator driver
>>  - TODO_GS is removed from the header
>>  - pm_runtime removed as NAVSS and its components are always on
>>  - Check validity of Message mode setup (element size > 8 bytes must use proxy)
>>
>> - cppi5 header
>>  - add commit message
>>
>> - UDMAP DT bindings
>>  - Drop the psil-config node use on the remote PSI-L side and use only one cell
>>    which is the remote threadID:
>>
>>      dmas = <&main_udmap 0xc400>, <&main_udmap 0x4400>;
>>      dma-names = "tx", "rx";
>>
>>  - The PSI-L thread configuration description is moved to kernel as a new module:
>>    k3-psil/k3-psil-am654/k3-psil-j721e
>>  - ti,psil-base has been removed and moved to kernel
>>  - removed the no longer needed dt-bindings/dma/k3-udma.h
>>  - Convert the document to schema (yaml)
>>
>> - NEW PSI-L endpoint configuration database
>>  - a simple database holding the remote end's configuration needed for UDMAP
>>    configuration. All previous parameters from DT have been moved here and merged
>>    with the linux only tr mode channel flag.
>>  - Client drivers can update the remote endpoint configuration as it can be
>>    different based on system configuration and the endpoint itself is under the
>>    control of the peripheral driver.
>>  - database for am654 and j721e
>>
>> - UDMAP DMAengine driver
>>  - pm_runtime removed as NAVSS and its components are always on
>>  - rchan_oes_offset added to MSI domain allocation
>>  - Use the new PSI-L endpoint database for UDMAP configuration
>>  - Support for waiting for PDMA teardown completion on j721e instead of
>>    returning right away. Depends on:
>>    https://lkml.org/lkml/2019/10/25/189
>>    Not included in this series, but it is in the branch I have prepared.
>>  - psil-base is moved from DT to be part of udma_match_data
>>  - tr_thread maps are removed, the PSI-L endpoint configuration is used instead
>>
>> - UDMAP glue layer
>>  - pm_runtime removed as NAVSS and its components are always on
>>  - Use the new PSI-L endpoint database for UDMAP configuration
>>
>> Changes since v2
>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=152609&state=*)
>> - Based on 5.4-rc1
>> - Support for Flow only data transfer for the glue layer
>>
>> - cppi5 header
>>  - comments converted to kernel-doc style
>>  - Remove the excessive WARN_ONs and rely on the user for sanity
>>  - new macro for checking TearDown Completion Message
>>
>> - ring accelerator driver
>>  - fixed up the commit message (SoB, TI-SCI)
>>  - fixed ring reset
>>  - CONFIG_TI_K3_RINGACC_DEBUG is removed along with the dbg_write/read functions
>>    and use dev_dbg()
>>  - k3_ringacc_ring_dump() is moved to static
>>  - step numbering removed from k3_ringacc_ring_reset_dma()
>>  - Add clarification comment for shared ring usage in k3_ringacc_ring_cfg()
>>  - Magic shift values in k3_ringacc_ring_cfg_proxy() got defined
>>  - K3_RINGACC_RING_MODE_QM is removed as it is not supported
>>
>> - UDMAP DT bindings
>>  - Fix property prefixing: s/pdma,/ti,pdma-
>>  - Add ti,notdpkt property to suppress teardown completion message on tchan
>>  - example updated accordingly
>>
>> - UDMAP DMAengine driver
>>  - Change __raw_readl/writel to readl/writel
>>  - Split up the udma_tisci_channel_config() into m2m, tx and rx tisci
>>    configuration functions for clarity
>>  - DT bindings change: s/pdma,/ti,pdma-
>>  - Cleanup of udma_tx_status():
>>   - residue calculation fix for m2m
>>   - no need to read packet counter as it is not used
>>   - peer byte counter only available in PDMAs
>>  - Proper locking to avoid race with interrupt handler (polled m2m fix)
>>  - Support for ti,notdpkt
>>  - RFLOW management rework to support data movement without channel:
>>   - the channel is not controlled by Linux but other core and we only have
>>     rflows and rings to do the DMA transfers.
>>     This mode is only supported by the Glue layer for now.
>>
>> - UDMAP glue layer
>>  - Debug print improvements
>>  - Support for rflow/ring only data movement
>>
>> Changes since v1
>> (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=114105&state=*)
>> - Added support for j721e
>> - Based on 5.3-rc2
>> - dropped ti_sci API patch for RM management as it is already upstream
>> - dropped dmadev_get_slave_channel() patch, using __dma_request_channel()
>> - Added Rob's Reviewed-by to ringacc DT binding document patch
>> - DT bindings changes:
>>  - linux,udma-mode is gone, I have a simple lookup table in the driver to flag
>>    TR channels.
>>  - Support for j721e
>> - Fix bug in of_node_put() handling in xlate function
>>
>> Changes since RFC (https://patchwork.kernel.org/cover/10612465/):
>> - Based on linux-next (20190506) which now has the ti_sci interrupt support
>> - The series can be applied and the UDMA via DMAengine API will be functional
>> - Included in the series: ti_sci Resource management API, cppi5 header and
>>   driver for the ring accelerator.
>> - The DMAengine core patches have been updated as per the review comments for
>>   earlier submission.
>> - The DMAengine driver patch is artificially split up to 6 smaller patches
>>
>> The k3-udma driver implements the Data Movement Architecture described in
>> AM65x TRM (http://www.ti.com/lit/pdf/spruid7) and
>> j721e TRM (http://www.ti.com/lit/pdf/spruil1)
>>
>> This DMA architecture is a big departure from 'traditional' architecture where
>> we had either EDMA or sDMA as system DMA.
>>
>> Packet DMAs were used as dedicated DMAs to service only networking (Keystone2)
>> or USB (am335x) while other peripherals were serviced by EDMA.
On 12/12/2019 12:55, Tero Kristo wrote:
> On 12/12/2019 10:46, Peter Ujfalusi wrote:
>> I got build failure with allmodconfig:
>>
>> ERROR: "devm_ti_sci_get_of_resource" [drivers/soc/ti/k3-ringacc.ko]
>> undefined!
>> ERROR: "of_msi_get_domain" [drivers/soc/ti/k3-ringacc.ko] undefined!
>> ERROR: "devm_ti_sci_get_of_resource" [drivers/dma/ti/k3-udma.ko]
>> undefined!
>> ERROR: "of_msi_get_domain" [drivers/dma/ti/k3-udma.ko] undefined!
>>
>> They are because both devm_ti_sci_get_of_resource and of_msi_get_domain
>> are missing EXPORT_SYMBOL_GPL(), so they cannot be used from modules.
>>
>> There were patches in the past to add it for of_msi_get_domain:
>> https://lore.kernel.org/patchwork/patch/668123/
>> https://lore.kernel.org/patchwork/patch/716046/
>>
>> I cannot find a reason why these are not merged.
>> Matthias's patch looks to be the earlier one, is it OK if I resend it
>> within v8?
>
> You can just send those two patches separately, I can apply them first
> before this series. No need to resend this series.

Oops, sorry about the noise, I got confused with the internal mailing list and this one (trying to get it merged internally at the same time). Ignore my comment.

-Tero

>>> Regards,
>>> Peter
>>> ---
>>> Grygorii Strashko (3):
>>>   bindings: soc: ti: add documentation for k3 ringacc
>>>   soc: ti: k3: add navss ringacc driver
>>>   dmaengine: ti: k3-udma: Add glue layer for non DMAengine users
>>>
>>> Peter Ujfalusi (9):
>>>   dmaengine: doc: Add sections for per descriptor metadata support
>>>   dmaengine: Add metadata_ops for dma_async_tx_descriptor
>>>   dmaengine: Add support for reporting DMA cached data amount
>>>   dmaengine: ti: Add cppi5 header for K3 NAVSS/UDMA
>>>   dmaengine: ti: k3 PSI-L remote endpoint configuration
>>>   dt-bindings: dma: ti: Add document for K3 UDMA
>>>   dmaengine: ti: New driver for K3 UDMA
>>>   firmware: ti_sci: rm: Add support for tx_tdtype parameter for tx
>>>     channel
>>>   dmaengine: ti: k3-udma: Wait for peer teardown completion if
>>>     supported
>>>
>>>  .../devicetree/bindings/dma/ti/k3-udma.yaml   |  185 +
>>>  .../devicetree/bindings/soc/ti/k3-ringacc.txt |   59 +
>>>  Documentation/driver-api/dmaengine/client.rst |   75 +
>>>  .../driver-api/dmaengine/provider.rst         |   46 +
>>>  drivers/dma/dmaengine.c                       |   73
+ >>> drivers/dma/dmaengine.h | 8 + >>> drivers/dma/ti/Kconfig | 24 + >>> drivers/dma/ti/Makefile | 3 + >>> drivers/dma/ti/k3-psil-am654.c | 175 + >>> drivers/dma/ti/k3-psil-j721e.c | 222 ++ >>> drivers/dma/ti/k3-psil-priv.h | 39 + >>> drivers/dma/ti/k3-psil.c | 97 + >>> drivers/dma/ti/k3-udma-glue.c | 1198 ++++++ >>> drivers/dma/ti/k3-udma-private.c | 133 + >>> drivers/dma/ti/k3-udma.c | 3452 +++++++++++++++++ >>> drivers/dma/ti/k3-udma.h | 151 + >>> drivers/firmware/ti_sci.c | 1 + >>> drivers/firmware/ti_sci.h | 7 + >>> drivers/soc/ti/Kconfig | 11 + >>> drivers/soc/ti/Makefile | 1 + >>> drivers/soc/ti/k3-ringacc.c | 1180 ++++++ >>> include/linux/dma/k3-psil.h | 71 + >>> include/linux/dma/k3-udma-glue.h | 134 + >>> include/linux/dma/ti-cppi5.h | 1061 +++++ >>> include/linux/dmaengine.h | 110 + >>> include/linux/soc/ti/k3-ringacc.h | 244 ++ >>> include/linux/soc/ti/ti_sci_protocol.h | 2 + >>> 27 files changed, 8762 insertions(+) >>> create mode 100644 >>> Documentation/devicetree/bindings/dma/ti/k3-udma.yaml >>> create mode 100644 >>> Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt >>> create mode 100644 drivers/dma/ti/k3-psil-am654.c >>> create mode 100644 drivers/dma/ti/k3-psil-j721e.c >>> create mode 100644 drivers/dma/ti/k3-psil-priv.h >>> create mode 100644 drivers/dma/ti/k3-psil.c >>> create mode 100644 drivers/dma/ti/k3-udma-glue.c >>> create mode 100644 drivers/dma/ti/k3-udma-private.c >>> create mode 100644 drivers/dma/ti/k3-udma.c >>> create mode 100644 drivers/dma/ti/k3-udma.h >>> create mode 100644 drivers/soc/ti/k3-ringacc.c >>> create mode 100644 include/linux/dma/k3-psil.h >>> create mode 100644 include/linux/dma/k3-udma-glue.h >>> create mode 100644 include/linux/dma/ti-cppi5.h >>> create mode 100644 include/linux/soc/ti/k3-ringacc.h >>> >> >> - Péter >> >> > -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On 09/12/2019 11:43, Peter Ujfalusi wrote: > Hi, > > Vinod, Nishanth, Tero, Santosh: the ti_sci patch in this series was sent > upstream over a month ago: > https://lore.kernel.org/lkml/20191025084715.25098-1-peter.ujfalusi@ti.com/ > > I'm still waiting on its fate (Tero has given his r-b). > The ti_sci patch did not make it to 5.5-rc1, but I included it in the series and > let the maintainers decide if it can go via DMAengine for 5.6 or to later > releases (5.6 probably for the ti_sci and 5.7 for the UDMA driver patch). > > Changes since v6: > (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=209455&state=*) > > - UDMAP DMAengine driver: > - Squashed the split patches > - Squashed the early TX completion handling update > (https://patchwork.kernel.org/project/linux-dmaengine/list/?series=210713&state=*) > - Hard reset fix for RX channels to avoid channel lockdown > - Correct completed descriptor's residue value > Thank you Peter, Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> -- Best regards, grygorii
Hi Peter, On 09-12-19, 11:43, Peter Ujfalusi wrote: > + Optional: per descriptor metadata > + --------------------------------- > + DMAengine provides two ways for metadata support. > + > + DESC_METADATA_CLIENT > + > + The metadata buffer is allocated/provided by the client driver and it is > + attached to the descriptor. > + > + .. code-block:: c > + > + int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc, > + void *data, size_t len); > + > + DESC_METADATA_ENGINE > + > + The metadata buffer is allocated/managed by the DMA driver. The client and when would it be freed? > + driver can ask for the pointer, maximum size and the currently used size of > + the metadata and can directly update or read it. > + > + .. code-block:: c > + > + void *dmaengine_desc_get_metadata_ptr(struct dma_async_tx_descriptor *desc, > + size_t *payload_len, size_t *max_len); > + > + int dmaengine_desc_set_metadata_len(struct dma_async_tx_descriptor *desc, > + size_t payload_len); > + > + Client drivers can query if a given mode is supported with: > + > + .. code-block:: c > + > + bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan, > + enum dma_desc_metadata_mode mode); > + > + Depending on the used mode client drivers must follow different flow. > + > + DESC_METADATA_CLIENT > + > + - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM: > + 1. prepare the descriptor (dmaengine_prep_*) > + construct the metadata in the client's buffer > + 2. use dmaengine_desc_attach_metadata() to attach the buffer to the > + descriptor > + 3. submit the transfer This is simpler, txn finished the metadata would be freed up right? > + - DMA_DEV_TO_MEM: > + 1. prepare the descriptor (dmaengine_prep_*) > + 2. use dmaengine_desc_attach_metadata() to attach the buffer to the > + descriptor > + 3. submit the transfer > + 4. 
when the transfer is completed, the metadata should be available in the > + attached buffer and when and how would driver free that up :) > + > + DESC_METADATA_ENGINE > + > + - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM: > + 1. prepare the descriptor (dmaengine_prep_*) > + 2. use dmaengine_desc_get_metadata_ptr() to get the pointer to the > + engine's metadata area > + 3. update the metadata at the pointer > + 4. use dmaengine_desc_set_metadata_len() to tell the DMA engine the > + amount of data the client has placed into the metadata buffer > + 5. submit the transfer > + - DMA_DEV_TO_MEM: > + 1. prepare the descriptor (dmaengine_prep_*) > + 2. submit the transfer > + 3. on transfer completion, use dmaengine_desc_get_metadata_ptr() to get the > + pointer to the engine's metadata area > + 4. Read out the metadata from the pointer > + > + .. note:: > + > + Mixed use of DESC_METADATA_CLIENT / DESC_METADATA_ENGINE is not allowed, > + client drivers must use either of the modes per descriptor. We should check that if not done already! -- ~Vinod
Hi Peter, On 09-12-19, 11:43, Peter Ujfalusi wrote: > +int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc, > + void *data, size_t len) > +{ > + int ret; > + > + if (!desc) > + return -EINVAL; > + > + ret = desc_check_and_set_metadata_mode(desc, DESC_METADATA_CLIENT); > + if (ret) > + return ret; > + > + if (!desc->metadata_ops || !desc->metadata_ops->attach) > + return -ENOTSUPP; > + > + return desc->metadata_ops->attach(desc, data, len); this looks good to me, only thing is we should check if people are mixing the modes :) -- ~Vinod
On 09-12-19, 11:43, Peter Ujfalusi wrote: > A DMA hardware can have big cache or FIFO and the amount of data sitting in > the DMA fabric can be of interest to the clients. > > For example in audio we want to know the delay in the data flow and in case > the DMA has a significantly large FIFO/cache, it can affect the latency/delay > > Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> > Reviewed-by: Tero Kristo <t-kristo@ti.com> > --- > drivers/dma/dmaengine.h | 8 ++++++++ > include/linux/dmaengine.h | 2 ++ > 2 files changed, 10 insertions(+) > > diff --git a/drivers/dma/dmaengine.h b/drivers/dma/dmaengine.h > index 501c0b063f85..b0b97475707a 100644 > --- a/drivers/dma/dmaengine.h > +++ b/drivers/dma/dmaengine.h > @@ -77,6 +77,7 @@ static inline enum dma_status dma_cookie_status(struct dma_chan *chan, > state->last = complete; > state->used = used; > state->residue = 0; > + state->in_flight_bytes = 0; > } > return dma_async_is_complete(cookie, complete, used); > } > @@ -87,6 +88,13 @@ static inline void dma_set_residue(struct dma_tx_state *state, u32 residue) > state->residue = residue; > } > > +static inline void dma_set_in_flight_bytes(struct dma_tx_state *state, > + u32 in_flight_bytes) > +{ > + if (state) > + state->in_flight_bytes = in_flight_bytes; > +} This would be used by dmaengine drivers, right? So let's move it to drivers/dma/dmaengine.h. Let's not expose this to users :) -- ~Vinod
Hi Vinod, On 20/12/2019 10.32, Vinod Koul wrote: > Hi Peter, > > On 09-12-19, 11:43, Peter Ujfalusi wrote: > >> +int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc, >> + void *data, size_t len) >> +{ >> + int ret; >> + >> + if (!desc) >> + return -EINVAL; >> + >> + ret = desc_check_and_set_metadata_mode(desc, DESC_METADATA_CLIENT); >> + if (ret) >> + return ret; >> + >> + if (!desc->metadata_ops || !desc->metadata_ops->attach) >> + return -ENOTSUPP; >> + >> + return desc->metadata_ops->attach(desc, data, len); > > this looks good to me, only thing is we should check if people are > mixing the modes :) desc_check_and_set_metadata_mode() does the checking to make sure that the modes are not mixed. - Péter
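The check mentioned in the reply above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `mock_desc`, the `MODE_*` names and the "first call fixes the mode" rule are stand-ins invented for illustration, not the body of desc_check_and_set_metadata_mode() from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two metadata modes plus "not set yet". */
enum mock_metadata_mode {
	MODE_NONE = 0,
	MODE_CLIENT,
	MODE_ENGINE,
};

/* Hypothetical descriptor: it only remembers which mode was used first. */
struct mock_desc {
	enum mock_metadata_mode mode;
};

/*
 * Assumed rule: the first metadata call fixes the mode for the
 * descriptor, a later call asking for the other mode is rejected.
 */
static int mock_check_and_set_mode(struct mock_desc *desc,
				   enum mock_metadata_mode mode)
{
	if (!desc || mode == MODE_NONE)
		return -EINVAL;

	if (desc->mode == MODE_NONE) {
		desc->mode = mode;
		return 0;
	}

	return desc->mode == mode ? 0 : -EINVAL;
}
```

With a rule like this, a client that attaches its own buffer and later asks for the engine's metadata pointer on the same descriptor gets an error from the second call instead of silently mixing the modes.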
On 20/12/2019 11.57, Vinod Koul wrote: > On 20-12-19, 10:49, Peter Ujfalusi wrote: > >>>> +static inline void dma_set_in_flight_bytes(struct dma_tx_state *state, >>>> + u32 in_flight_bytes) >>>> +{ >>>> + if (state) >>>> + state->in_flight_bytes = in_flight_bytes; >>>> +} >>> >>> This would be used by dmaengine drivers right, so lets move it to drivers/dma/dmaengine.h >>> >>> lets not expose this to users :) >> >> I have put it where the dma_set_residue() was. >> I can add a patch first to move dma_set_residue() then add > > not sure I follow, but dma_set_residue() is already in drivers/dma/dmaengine.h and this patch adds dma_set_in_flight_bytes() to drivers/dma/dmaengine.h as well; in include/linux/dmaengine.h only the dma_tx_state struct is updated. - Péter
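To make the split between provider-side setters and the client-visible state concrete, here is a minimal userspace sketch. It assumes a cut-down `mock_tx_state` with only the two fields discussed here; `mock_delay_frames()` is a hypothetical client helper and not part of the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct dma_tx_state (two fields only). */
struct mock_tx_state {
	uint32_t residue;
	uint32_t in_flight_bytes;
};

/* Provider-side setters: both tolerate a NULL state pointer. */
static inline void mock_set_residue(struct mock_tx_state *state,
				    uint32_t residue)
{
	if (state)
		state->residue = residue;
}

static inline void mock_set_in_flight_bytes(struct mock_tx_state *state,
					    uint32_t in_flight_bytes)
{
	if (state)
		state->in_flight_bytes = in_flight_bytes;
}

/*
 * Hypothetical client-side use: convert the bytes still held in the
 * DMA FIFOs into audio frames for a delay estimate.
 */
static uint32_t mock_delay_frames(const struct mock_tx_state *state,
				  uint32_t frame_bytes)
{
	if (!state || !frame_bytes)
		return 0;

	return state->in_flight_bytes / frame_bytes;
}
```

Only the structure is visible to clients in this model; the setters would stay private to the DMA drivers, which matches the placement described in the reply.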
Hi Vinod, On 20/12/2019 11.54, Vinod Koul wrote: > On 09-12-19, 11:43, Peter Ujfalusi wrote: > >> +#define CPPI5_INFO2_DESC_RETPUSHPOLICY BIT(16) >> +#define CPPI5_INFO2_DESC_RETP_MASK GENMASK(18, 16) >> + >> +#define CPPI5_INFO2_DESC_RETQ_SHIFT (0) >> +#define CPPI5_INFO2_DESC_RETQ_MASK GENMASK(15, 0) >> + >> +#define CPPI5_INFO3_DESC_SRCTAG_SHIFT (16U) >> +#define CPPI5_INFO3_DESC_SRCTAG_MASK GENMASK(31, 16) >> +#define CPPI5_INFO3_DESC_DSTTAG_SHIFT (0) >> +#define CPPI5_INFO3_DESC_DSTTAG_MASK GENMASK(15, 0) >> + >> +#define CPPI5_BUFINFO1_HDESC_DATA_LEN_SHIFT (0) >> +#define CPPI5_BUFINFO1_HDESC_DATA_LEN_MASK GENMASK(27, 0) >> + >> +#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_SHIFT (0) >> +#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_MASK GENMASK(27, 0) > > I think you can remove the SHIFT defines and use ffs() to get the bit > position for shift Right. I'll convert to use ffs() > >> +static inline u32 cppi5_hdesc_calc_size(bool epib, u32 psdata_size, >> + u32 sw_data_size) >> +{ >> + u32 desc_size; >> + >> + if (psdata_size > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE) >> + return 0; >> + >> + desc_size = sizeof(struct cppi5_host_desc_t) + psdata_size + >> + sw_data_size; > > I think there was an API for this kind of mem allocation of struct and > buffer attached... The returned size is not only used when allocating memory or setting up the dma_pool, but for UDMAP's fetch size parameter. >> +static inline void cppi5_hdesc_reset_hbdesc(struct cppi5_host_desc_t *desc) >> +{ >> + desc->hdr = (struct cppi5_desc_hdr_t) { 0 }; >> + desc->next_desc = 0; > > would this not be superfluous? Or if you want a memset call? The intention is to reset the header and the next descriptor link but leave the backing buffer information intact. This allows the reuse of a descriptor+buffer and we only need to set the header bits + next descriptor pointer if any. 
>> +static inline u32 *cppi5_hdesc_get_psdata32(struct cppi5_host_desc_t *desc) >> +{ >> + return (u32 *)cppi5_hdesc_get_psdata(desc); > > you don't need casts away from void * Hrm, or just remove this, clients can use the cppi5_hdesc_get_psdata() directly. - Péter
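The partial reset discussed above (clear the header and the link, keep the buffer information) can be illustrated in userspace. The layout below is a simplified, assumed stand-in for the host descriptor; the field names are chosen for the example and do not claim to match the real cppi5 header field for field.

```c
#include <stdint.h>

/* Simplified, assumed descriptor header. */
struct mock_desc_hdr {
	uint32_t pkt_info0;
	uint32_t pkt_info1;
	uint32_t pkt_info2;
	uint32_t src_dst_tag;
};

/* Simplified, assumed host descriptor: header, link, buffer info. */
struct mock_host_desc {
	struct mock_desc_hdr hdr;
	uint64_t next_desc;
	uint64_t buf_ptr;
	uint32_t buf_len;
};

/*
 * Clear only the header and the next-descriptor link. The buffer
 * pointer and length are left alone so the descriptor and its buffer
 * can be reused without being set up again.
 */
static void mock_hdesc_reset(struct mock_host_desc *desc)
{
	desc->hdr = (struct mock_desc_hdr) { 0 };
	desc->next_desc = 0;
}
```

A memset() of the whole structure would also wipe `buf_ptr` and `buf_len`, which is exactly what this style of reset avoids.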
On 09-12-19, 11:43, Peter Ujfalusi wrote: > New binding document for > Texas Instruments K3 NAVSS Unified DMA – Peripheral Root Complex (UDMA-P). > > UDMA-P is introduced as part of the K3 architecture and can be found in > AM654 and j721e. > > Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> > Reviewed-by: Rob Herring <robh@kernel.org> > --- > .../devicetree/bindings/dma/ti/k3-udma.yaml | 185 ++++++++++++++++++ > 1 file changed, 185 insertions(+) > create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.yaml > > diff --git a/Documentation/devicetree/bindings/dma/ti/k3-udma.yaml b/Documentation/devicetree/bindings/dma/ti/k3-udma.yaml > new file mode 100644 > index 000000000000..77aef4a4abce > --- /dev/null > +++ b/Documentation/devicetree/bindings/dma/ti/k3-udma.yaml > @@ -0,0 +1,185 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/dma/ti/k3-udma.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Texas Instruments K3 NAVSS Unified DMA Device Tree Bindings > + > +maintainers: > + - Peter Ujfalusi <peter.ujfalusi@ti.com> > + > +description: | > + The UDMA-P is intended to perform similar (but significantly upgraded) > + functions as the packet-oriented DMA used on previous SoC devices. The UDMA-P > + module supports the transmission and reception of various packet types. > + The UDMA-P is architected to facilitate the segmentation and reassembly of How about: The UDMA-P architecture facilitates the segmentation... > + SoC DMA data structure compliant packets to/from smaller data blocks that are > + natively compatible with the specific requirements of each connected > + peripheral. > + Multiple Tx and Rx channels are provided within the DMA which allow multiple > + segmentation or reassembly operations to be ongoing. 
The DMA controller > + maintains state information for each of the channels which allows packet > + segmentation and reassembly operations to be time division multiplexed between > + channels in order to share the underlying DMA hardware. An external DMA > + scheduler is used to control the ordering and rate at which this multiplexing > + occurs for Transmit operations. The ordering and rate of Receive operations > + is indirectly controlled by the order in which blocks are pushed into the DMA > + on the Rx PSI-L interface. > + > + The UDMA-P also supports acting as both a UTC and UDMA-C for its internal > + channels. Channels in the UDMA-P can be configured to be either Packet-Based > + or Third-Party channels on a channel by channel basis. > + > + All transfers within NAVSS is done between PSI-L source and destination > + threads. > + The peripherals serviced by UDMA can be PSI-L native (sa2ul, cpsw, etc) or > + legacy, non PSI-L native peripherals. In the later case a special, small PDMA > + is tasked to act as a bridge between the PSI-L fabric and the legacy > + peripheral. > + > + PDMAs can be configured via UDMAP peer registers to match with the > + configuration of the legacy peripheral. > + > +allOf: > + - $ref: "../dma-controller.yaml#" > + > +properties: > + "#dma-cells": > + const: 1 > + description: | > + The cell is the PSI-L thread ID of the remote (to UDMAP) end. > + Valid ranges for thread ID depends on the data movement direction: > + for source thread IDs (rx): 0 - 0x7fff > + for destination thread IDs (tx): 0x8000 - 0xffff > + > + PLease refer to the device documentation for the PSI-L thread map and also s/PLease/Please -- ~Vinod
Hi Vinod, On 20/12/2019 12.42, Peter Ujfalusi wrote: > Hi Vinod, > > On 20/12/2019 11.54, Vinod Koul wrote: >> On 09-12-19, 11:43, Peter Ujfalusi wrote: >> >>> +#define CPPI5_INFO2_DESC_RETPUSHPOLICY BIT(16) >>> +#define CPPI5_INFO2_DESC_RETP_MASK GENMASK(18, 16) >>> + >>> +#define CPPI5_INFO2_DESC_RETQ_SHIFT (0) >>> +#define CPPI5_INFO2_DESC_RETQ_MASK GENMASK(15, 0) >>> + >>> +#define CPPI5_INFO3_DESC_SRCTAG_SHIFT (16U) >>> +#define CPPI5_INFO3_DESC_SRCTAG_MASK GENMASK(31, 16) >>> +#define CPPI5_INFO3_DESC_DSTTAG_SHIFT (0) >>> +#define CPPI5_INFO3_DESC_DSTTAG_MASK GENMASK(15, 0) >>> + >>> +#define CPPI5_BUFINFO1_HDESC_DATA_LEN_SHIFT (0) >>> +#define CPPI5_BUFINFO1_HDESC_DATA_LEN_MASK GENMASK(27, 0) >>> + >>> +#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_SHIFT (0) >>> +#define CPPI5_OBUFINFO0_HDESC_BUF_LEN_MASK GENMASK(27, 0) >> >> I think you can remove the SHIFT defines and use ffs() to get the bit >> position for shift > > Right. I'll convert to use ffs() I rather keep the defines. While ffs() is simple, it is going to have effect in speeds gigabit or beyond. >>> +static inline u32 cppi5_hdesc_calc_size(bool epib, u32 psdata_size, >>> + u32 sw_data_size) >>> +{ >>> + u32 desc_size; >>> + >>> + if (psdata_size > CPPI5_INFO0_HDESC_PSDATA_MAX_SIZE) >>> + return 0; >>> + >>> + desc_size = sizeof(struct cppi5_host_desc_t) + psdata_size + >>> + sw_data_size; >> >> I think there was an API for this kind of mem allocation of struct and >> buffer attached... > > The returned size is not only used when allocating memory or setting up > the dma_pool, but for UDMAP's fetch size parameter. > >>> +static inline void cppi5_hdesc_reset_hbdesc(struct cppi5_host_desc_t *desc) >>> +{ >>> + desc->hdr = (struct cppi5_desc_hdr_t) { 0 }; >>> + desc->next_desc = 0; >> >> would this not be superfluous? Or if you want a memset call? > > The intention is to reset the header and the next descriptor link but > leave the backing buffer information intact. 
This allows the reuse of a > descriptor+buffer and we only need to set the header bits + next > descriptor pointer if any. > >>> +static inline u32 *cppi5_hdesc_get_psdata32(struct cppi5_host_desc_t *desc) >>> +{ >>> + return (u32 *)cppi5_hdesc_get_psdata(desc); >> >> you dont need casts away from void * > > Hrm, or just remove this, clients can use the cppi5_hdesc_get_psdata() > directly. > > > - Péter > - Péter
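For the SHIFT define versus ffs() exchange above, the two ways of extracting a field can be put side by side. This is a userspace sketch: `MOCK_GENMASK` re-creates GENMASK() for 32-bit values, and the `DEMO_FIELD_*` names describe a made-up 3-bit field at bits 18..16, not one of the cppi5 fields. It shows only that both forms return the same value; whether the ffs() form costs anything at run time depends on whether the compiler folds it for a constant mask, which is not demonstrated here.

```c
#include <stdint.h>
#include <strings.h>	/* ffs() */

/* Userspace re-creation of GENMASK() for 32-bit values. */
#define MOCK_GENMASK(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Made-up example field occupying bits 18..16. */
#define DEMO_FIELD_SHIFT	16U
#define DEMO_FIELD_MASK		MOCK_GENMASK(18, 16)

/* Variant 1: explicit shift define next to the mask. */
static inline uint32_t demo_field_get_define(uint32_t word)
{
	return (word & DEMO_FIELD_MASK) >> DEMO_FIELD_SHIFT;
}

/* Variant 2: shift derived from the mask where it is used. */
static inline uint32_t demo_field_get_ffs(uint32_t word)
{
	/* ffs() is 1-based, so subtract 1 to get the bit position. */
	return (word & DEMO_FIELD_MASK) >> (ffs((int)DEMO_FIELD_MASK) - 1);
}
```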
On 09-12-19, 11:43, Peter Ujfalusi wrote: > +#include <linux/kernel.h> > +#include <linux/dmaengine.h> > +#include <linux/dma-mapping.h> > +#include <linux/dmapool.h> > +#include <linux/err.h> > +#include <linux/init.h> > +#include <linux/interrupt.h> > +#include <linux/list.h> > +#include <linux/module.h> > +#include <linux/platform_device.h> > +#include <linux/slab.h> > +#include <linux/spinlock.h> > +#include <linux/of.h> > +#include <linux/of_dma.h> > +#include <linux/of_device.h> > +#include <linux/of_irq.h> to many of headers, do we need all! > +static char *udma_get_dir_text(enum dma_transfer_direction dir) > +{ > + switch (dir) { > + case DMA_DEV_TO_MEM: > + return "DEV_TO_MEM"; > + case DMA_MEM_TO_DEV: > + return "MEM_TO_DEV"; > + case DMA_MEM_TO_MEM: > + return "MEM_TO_MEM"; > + case DMA_DEV_TO_DEV: > + return "DEV_TO_DEV"; > + default: > + break; > + } > + > + return "invalid"; > +} this seems generic which other ppl may need, can we move it to core. > + > +static void udma_reset_uchan(struct udma_chan *uc) > +{ > + uc->state = UDMA_CHAN_IS_IDLE; > + uc->remote_thread_id = -1; > + uc->dir = DMA_MEM_TO_MEM; > + uc->pkt_mode = false; > + uc->ep_type = PSIL_EP_NATIVE; > + uc->enable_acc32 = 0; > + uc->enable_burst = 0; > + uc->channel_tpl = 0; > + uc->psd_size = 0; > + uc->metadata_size = 0; > + uc->hdesc_size = 0; > + uc->notdpkt = 0; rather than do setting zero, why note memset and then set the nonzero members only? > +static void udma_reset_counters(struct udma_chan *uc) > +{ > + u32 val; > + > + if (uc->tchan) { > + val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_BCNT_REG); > + udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_BCNT_REG, val); so you read back from UDMA_TCHAN_RT_BCNT_REG and write same value to it?? 
> + > + val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG); > + udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG, val); > + > + val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PCNT_REG); > + udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PCNT_REG, val); > + > + val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG); > + udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG, val); > + } > + > + if (uc->rchan) { > + val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_BCNT_REG); > + udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_BCNT_REG, val); > + > + val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG); > + udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG, val); > + > + val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PCNT_REG); > + udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PCNT_REG, val); > + > + val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG); > + udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG, val); True for all of these, what am I missing :) > +static int udma_start(struct udma_chan *uc) > +{ > + struct virt_dma_desc *vd = vchan_next_desc(&uc->vc); > + > + if (!vd) { > + uc->desc = NULL; > + return -ENOENT; > + } > + > + list_del(&vd->node); > + > + uc->desc = to_udma_desc(&vd->tx); > + > + /* Channel is already running and does not need reconfiguration */ > + if (udma_is_chan_running(uc) && !udma_chan_needs_reconfiguration(uc)) { > + udma_start_desc(uc); > + goto out; How about the case where settings are different than the current one? 
> +static struct udma_desc *udma_alloc_tr_desc(struct udma_chan *uc, > + size_t tr_size, int tr_count, > + enum dma_transfer_direction dir) > +{ > + struct udma_hwdesc *hwdesc; > + struct cppi5_desc_hdr_t *tr_desc; > + struct udma_desc *d; > + u32 reload_count = 0; > + u32 ring_id; > + > + switch (tr_size) { > + case 16: > + case 32: > + case 64: > + case 128: > + break; > + default: > + dev_err(uc->ud->dev, "Unsupported TR size of %zu\n", tr_size); > + return NULL; > + } > + > + /* We have only one descriptor containing multiple TRs */ > + d = kzalloc(sizeof(*d) + sizeof(d->hwdesc[0]), GFP_ATOMIC); this is invoked from prep_ so should use GFP_NOWAIT, we dont use GFP_ATOMIC :) > +static struct udma_desc * > +udma_prep_slave_sg_tr(struct udma_chan *uc, struct scatterlist *sgl, > + unsigned int sglen, enum dma_transfer_direction dir, > + unsigned long tx_flags, void *context) > +{ > + enum dma_slave_buswidth dev_width; > + struct scatterlist *sgent; > + struct udma_desc *d; > + size_t tr_size; > + struct cppi5_tr_type1_t *tr_req = NULL; > + unsigned int i; > + u32 burst; > + > + if (dir == DMA_DEV_TO_MEM) { > + dev_width = uc->cfg.src_addr_width; > + burst = uc->cfg.src_maxburst; > + } else if (dir == DMA_MEM_TO_DEV) { > + dev_width = uc->cfg.dst_addr_width; > + burst = uc->cfg.dst_maxburst; > + } else { > + dev_err(uc->ud->dev, "%s: bad direction?\n", __func__); > + return NULL; > + } > + > + if (!burst) > + burst = 1; > + > + /* Now allocate and setup the descriptor. */ > + tr_size = sizeof(struct cppi5_tr_type1_t); > + d = udma_alloc_tr_desc(uc, tr_size, sglen, dir); > + if (!d) > + return NULL; > + > + d->sglen = sglen; > + > + tr_req = (struct cppi5_tr_type1_t *)d->hwdesc[0].tr_req_base; cast away from void *? 
> +static int udma_configure_statictr(struct udma_chan *uc, struct udma_desc *d, > + enum dma_slave_buswidth dev_width, > + u16 elcnt) > +{ > + if (uc->ep_type != PSIL_EP_PDMA_XY) > + return 0; > + > + /* Bus width translates to the element size (ES) */ > + switch (dev_width) { > + case DMA_SLAVE_BUSWIDTH_1_BYTE: > + d->static_tr.elsize = 0; > + break; > + case DMA_SLAVE_BUSWIDTH_2_BYTES: > + d->static_tr.elsize = 1; > + break; > + case DMA_SLAVE_BUSWIDTH_3_BYTES: > + d->static_tr.elsize = 2; > + break; > + case DMA_SLAVE_BUSWIDTH_4_BYTES: > + d->static_tr.elsize = 3; > + break; > + case DMA_SLAVE_BUSWIDTH_8_BYTES: > + d->static_tr.elsize = 4; seems like ffs(dev_width) to me? > +static struct udma_desc * > +udma_prep_slave_sg_pkt(struct udma_chan *uc, struct scatterlist *sgl, > + unsigned int sglen, enum dma_transfer_direction dir, > + unsigned long tx_flags, void *context) > +{ > + struct scatterlist *sgent; > + struct cppi5_host_desc_t *h_desc = NULL; > + struct udma_desc *d; > + u32 ring_id; > + unsigned int i; > + > + d = kzalloc(sizeof(*d) + sglen * sizeof(d->hwdesc[0]), GFP_ATOMIC); GFP_NOWAIT here and few other places > +static struct udma_desc * > +udma_prep_dma_cyclic_pkt(struct udma_chan *uc, dma_addr_t buf_addr, > + size_t buf_len, size_t period_len, > + enum dma_transfer_direction dir, unsigned long flags) > +{ > + struct udma_desc *d; > + u32 ring_id; > + int i; > + int periods = buf_len / period_len; > + > + if (periods > (K3_UDMA_DEFAULT_RING_SIZE - 1)) > + return NULL; > + > + if (period_len > 0x3FFFFF) Magic? > +static enum dma_status udma_tx_status(struct dma_chan *chan, > + dma_cookie_t cookie, > + struct dma_tx_state *txstate) > +{ > + struct udma_chan *uc = to_udma_chan(chan); > + enum dma_status ret; > + unsigned long flags; > + > + spin_lock_irqsave(&uc->vc.lock, flags); > + > + ret = dma_cookie_status(chan, cookie, txstate); > + > + if (!udma_is_chan_running(uc)) > + ret = DMA_COMPLETE; Even for paused, not started channel? 
Not sure what will be returned in those cases -- ~Vinod
On 23/12/2019 9.34, Vinod Koul wrote: > On 09-12-19, 11:43, Peter Ujfalusi wrote: > >> +#include <linux/kernel.h> >> +#include <linux/dmaengine.h> >> +#include <linux/dma-mapping.h> >> +#include <linux/dmapool.h> >> +#include <linux/err.h> >> +#include <linux/init.h> >> +#include <linux/interrupt.h> >> +#include <linux/list.h> >> +#include <linux/module.h> >> +#include <linux/platform_device.h> >> +#include <linux/slab.h> >> +#include <linux/spinlock.h> >> +#include <linux/of.h> >> +#include <linux/of_dma.h> >> +#include <linux/of_device.h> >> +#include <linux/of_irq.h> > > to many of headers, do we need all! I'll try to cut them back. >> +static char *udma_get_dir_text(enum dma_transfer_direction dir) >> +{ >> + switch (dir) { >> + case DMA_DEV_TO_MEM: >> + return "DEV_TO_MEM"; >> + case DMA_MEM_TO_DEV: >> + return "MEM_TO_DEV"; >> + case DMA_MEM_TO_MEM: >> + return "MEM_TO_MEM"; >> + case DMA_DEV_TO_DEV: >> + return "DEV_TO_DEV"; >> + default: >> + break; >> + } >> + >> + return "invalid"; >> +} > > this seems generic which other ppl may need, can we move it to core. dmaengine_get_direction_text() to include/linux/dmaengine.h This way client drivers can use it if they need it? >> +static void udma_reset_uchan(struct udma_chan *uc) >> +{ >> + uc->state = UDMA_CHAN_IS_IDLE; >> + uc->remote_thread_id = -1; >> + uc->dir = DMA_MEM_TO_MEM; >> + uc->pkt_mode = false; >> + uc->ep_type = PSIL_EP_NATIVE; >> + uc->enable_acc32 = 0; >> + uc->enable_burst = 0; >> + uc->channel_tpl = 0; >> + uc->psd_size = 0; >> + uc->metadata_size = 0; >> + uc->hdesc_size = 0; >> + uc->notdpkt = 0; > > rather than do setting zero, why note memset and then set the nonzero > members only? I have lots of other things in udma_chan which can not be memset, vchan struct, tasklet, name (for irq), etc. to use memset, I think I could move parameters under a new struct (udma_chan_params) keeping only the state in udma_chan. 
>> +static void udma_reset_counters(struct udma_chan *uc) >> +{ >> + u32 val; >> + >> + if (uc->tchan) { >> + val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_BCNT_REG); >> + udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_BCNT_REG, val); > > so you read back from UDMA_TCHAN_RT_BCNT_REG and write same value to > it?? Yes, that's correct. This is how we can reset it. The counter is decremented with the value you have written to the register. >> + >> + val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG); >> + udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_SBCNT_REG, val); >> + >> + val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PCNT_REG); >> + udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PCNT_REG, val); >> + >> + val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG); >> + udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_PEER_BCNT_REG, val); >> + } >> + >> + if (uc->rchan) { >> + val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_BCNT_REG); >> + udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_BCNT_REG, val); >> + >> + val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG); >> + udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_SBCNT_REG, val); >> + >> + val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PCNT_REG); >> + udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PCNT_REG, val); >> + >> + val = udma_rchanrt_read(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG); >> + udma_rchanrt_write(uc->rchan, UDMA_RCHAN_RT_PEER_BCNT_REG, val); > > True for all of these, what am I missing :) Decrement on write. 
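The read-then-write-back pattern makes sense once the register is modelled as decrement-on-write. The sketch below is a userspace model with a hypothetical `mock_counter_reg`; it assumes nothing else changes the counter between the read and the write.

```c
#include <stdint.h>

/* Hypothetical model of one real-time counter register. */
struct mock_counter_reg {
	uint32_t count;
};

static uint32_t mock_reg_read(const struct mock_counter_reg *reg)
{
	return reg->count;
}

/* Assumed hardware behaviour: a write subtracts the written value. */
static void mock_reg_write(struct mock_counter_reg *reg, uint32_t val)
{
	reg->count -= val;
}

/* Read the current count and write it back, which leaves zero. */
static void mock_reg_reset(struct mock_counter_reg *reg)
{
	mock_reg_write(reg, mock_reg_read(reg));
}
```

Writing zero to such a register would change nothing, which is why a plain "write 0 to clear" would not reset the counter in this model.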
> >> +static int udma_start(struct udma_chan *uc) >> +{ >> + struct virt_dma_desc *vd = vchan_next_desc(&uc->vc); >> + >> + if (!vd) { >> + uc->desc = NULL; >> + return -ENOENT; >> + } >> + >> + list_del(&vd->node); >> + >> + uc->desc = to_udma_desc(&vd->tx); >> + >> + /* Channel is already running and does not need reconfiguration */ >> + if (udma_is_chan_running(uc) && !udma_chan_needs_reconfiguration(uc)) { >> + udma_start_desc(uc); >> + goto out; > > How about the case where settings are different than the current one? udma_chan_needs_reconfiguration() is checking that. I only need to reconfigure UDMAP/PDMA if the settings have changed. >> +static struct udma_desc *udma_alloc_tr_desc(struct udma_chan *uc, >> + size_t tr_size, int tr_count, >> + enum dma_transfer_direction dir) >> +{ >> + struct udma_hwdesc *hwdesc; >> + struct cppi5_desc_hdr_t *tr_desc; >> + struct udma_desc *d; >> + u32 reload_count = 0; >> + u32 ring_id; >> + >> + switch (tr_size) { >> + case 16: >> + case 32: >> + case 64: >> + case 128: >> + break; >> + default: >> + dev_err(uc->ud->dev, "Unsupported TR size of %zu\n", tr_size); >> + return NULL; >> + } >> + >> + /* We have only one descriptor containing multiple TRs */ >> + d = kzalloc(sizeof(*d) + sizeof(d->hwdesc[0]), GFP_ATOMIC); > > this is invoked from prep_ so should use GFP_NOWAIT, we dont use > GFP_ATOMIC :) Ok. 
btw: EDMA and sDMA driver is using GFP_ATOMIC :o > >> +static struct udma_desc * >> +udma_prep_slave_sg_tr(struct udma_chan *uc, struct scatterlist *sgl, >> + unsigned int sglen, enum dma_transfer_direction dir, >> + unsigned long tx_flags, void *context) >> +{ >> + enum dma_slave_buswidth dev_width; >> + struct scatterlist *sgent; >> + struct udma_desc *d; >> + size_t tr_size; >> + struct cppi5_tr_type1_t *tr_req = NULL; >> + unsigned int i; >> + u32 burst; >> + >> + if (dir == DMA_DEV_TO_MEM) { >> + dev_width = uc->cfg.src_addr_width; >> + burst = uc->cfg.src_maxburst; >> + } else if (dir == DMA_MEM_TO_DEV) { >> + dev_width = uc->cfg.dst_addr_width; >> + burst = uc->cfg.dst_maxburst; >> + } else { >> + dev_err(uc->ud->dev, "%s: bad direction?\n", __func__); >> + return NULL; >> + } >> + >> + if (!burst) >> + burst = 1; >> + >> + /* Now allocate and setup the descriptor. */ >> + tr_size = sizeof(struct cppi5_tr_type1_t); >> + d = udma_alloc_tr_desc(uc, tr_size, sglen, dir); >> + if (!d) >> + return NULL; >> + >> + d->sglen = sglen; >> + >> + tr_req = (struct cppi5_tr_type1_t *)d->hwdesc[0].tr_req_base; > > cast away from void *? True, it is not needed. >> +static int udma_configure_statictr(struct udma_chan *uc, struct udma_desc *d, >> + enum dma_slave_buswidth dev_width, >> + u16 elcnt) >> +{ >> + if (uc->ep_type != PSIL_EP_PDMA_XY) >> + return 0; >> + >> + /* Bus width translates to the element size (ES) */ >> + switch (dev_width) { >> + case DMA_SLAVE_BUSWIDTH_1_BYTE: >> + d->static_tr.elsize = 0; >> + break; >> + case DMA_SLAVE_BUSWIDTH_2_BYTES: >> + d->static_tr.elsize = 1; >> + break; >> + case DMA_SLAVE_BUSWIDTH_3_BYTES: >> + d->static_tr.elsize = 2; >> + break; >> + case DMA_SLAVE_BUSWIDTH_4_BYTES: >> + d->static_tr.elsize = 3; >> + break; >> + case DMA_SLAVE_BUSWIDTH_8_BYTES: >> + d->static_tr.elsize = 4; > > seems like ffs(dev_width) to me? 
Not really:
ffs(DMA_SLAVE_BUSWIDTH_1_BYTE) = 1
ffs(DMA_SLAVE_BUSWIDTH_2_BYTES) = 2
ffs(DMA_SLAVE_BUSWIDTH_3_BYTES) = 1
ffs(DMA_SLAVE_BUSWIDTH_4_BYTES) = 3
ffs(DMA_SLAVE_BUSWIDTH_8_BYTES) = 4

>
>> +static struct udma_desc *
>> +udma_prep_slave_sg_pkt(struct udma_chan *uc, struct scatterlist *sgl,
>> +		       unsigned int sglen, enum dma_transfer_direction dir,
>> +		       unsigned long tx_flags, void *context)
>> +{
>> +	struct scatterlist *sgent;
>> +	struct cppi5_host_desc_t *h_desc = NULL;
>> +	struct udma_desc *d;
>> +	u32 ring_id;
>> +	unsigned int i;
>> +
>> +	d = kzalloc(sizeof(*d) + sglen * sizeof(d->hwdesc[0]), GFP_ATOMIC);
>
> GFP_NOWAIT here and few other places

Yes, I have fixed them up by this time.

>
>> +static struct udma_desc *
>> +udma_prep_dma_cyclic_pkt(struct udma_chan *uc, dma_addr_t buf_addr,
>> +			 size_t buf_len, size_t period_len,
>> +			 enum dma_transfer_direction dir, unsigned long flags)
>> +{
>> +	struct udma_desc *d;
>> +	u32 ring_id;
>> +	int i;
>> +	int periods = buf_len / period_len;
>> +
>> +	if (periods > (K3_UDMA_DEFAULT_RING_SIZE - 1))
>> +		return NULL;
>> +
>> +	if (period_len > 0x3FFFFF)
>
> Magic?

I'll add a define to cppi5. It is the packet length limit.

>
>> +static enum dma_status udma_tx_status(struct dma_chan *chan,
>> +				      dma_cookie_t cookie,
>> +				      struct dma_tx_state *txstate)
>> +{
>> +	struct udma_chan *uc = to_udma_chan(chan);
>> +	enum dma_status ret;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&uc->vc.lock, flags);
>> +
>> +	ret = dma_cookie_status(chan, cookie, txstate);
>> +
>> +	if (!udma_is_chan_running(uc))
>> +		ret = DMA_COMPLETE;
>
> Even for paused, not started channel? Not sure what will be returned
> in those cases

Hrm, if the channel is not started yet, then I think it should still be
DMA_IN_PROGRESS, right? The udma_is_chan_running() check can be dropped
from here.
I did miss the DMA_PAUSED state.
-	if (!udma_is_chan_running(uc))
-		ret = DMA_COMPLETE;
+	if (ret == DMA_IN_PROGRESS && udma_is_chan_paused(uc))
+		ret = DMA_PAUSED;

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
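
The status change proposed above can be tried out in isolation. The following is a minimal user-space sketch, not the driver code: `struct mock_chan` and `resolve_status()` are hypothetical names invented for illustration, and the enum is a local stand-in for the kernel's `enum dma_status`. It assumes the cookie status has already been obtained (in the driver that would come from `dma_cookie_status()`) and only shows the final decision: a transfer that is still in progress on a paused channel is reported as `DMA_PAUSED`, and every other status is passed through unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the kernel's enum dma_status (assumption: subset only). */
enum dma_status {
	DMA_COMPLETE,
	DMA_IN_PROGRESS,
	DMA_PAUSED,
	DMA_ERROR,
};

/* Hypothetical, simplified channel state; not the driver's struct udma_chan. */
struct mock_chan {
	bool paused;
};

/*
 * Mirror of the proposed hunk: only an in-progress cookie on a paused
 * channel is turned into DMA_PAUSED. A channel that has not been started
 * keeps reporting DMA_IN_PROGRESS, and a completed cookie stays complete.
 */
static enum dma_status resolve_status(enum dma_status cookie_status,
				      const struct mock_chan *chan)
{
	assert(chan != NULL);

	if (cookie_status == DMA_IN_PROGRESS && chan->paused)
		return DMA_PAUSED;

	return cookie_status;
}
```

With this shape a not-yet-started channel no longer reports `DMA_COMPLETE`, which is the case the review comment asked about.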
On 23-12-19, 10:59, Peter Ujfalusi wrote:

> >> +static void udma_reset_counters(struct udma_chan *uc)
> >> +{
> >> +	u32 val;
> >> +
> >> +	if (uc->tchan) {
> >> +		val = udma_tchanrt_read(uc->tchan, UDMA_TCHAN_RT_BCNT_REG);
> >> +		udma_tchanrt_write(uc->tchan, UDMA_TCHAN_RT_BCNT_REG, val);
> >
> > so you read back from UDMA_TCHAN_RT_BCNT_REG and write the same value
> > to it??
>
> Yes, that's correct. This is how we can reset it. The counter is
> decremented with the value you have written to the register.

aha, with so many read+write backs I would have added a helper.. Not a
big deal though, it can be updated later

> >> +static struct udma_desc *udma_alloc_tr_desc(struct udma_chan *uc,
> >> +					    size_t tr_size, int tr_count,
> >> +					    enum dma_transfer_direction dir)
> >> +{
> >> +	struct udma_hwdesc *hwdesc;
> >> +	struct cppi5_desc_hdr_t *tr_desc;
> >> +	struct udma_desc *d;
> >> +	u32 reload_count = 0;
> >> +	u32 ring_id;
> >> +
> >> +	switch (tr_size) {
> >> +	case 16:
> >> +	case 32:
> >> +	case 64:
> >> +	case 128:
> >> +		break;
> >> +	default:
> >> +		dev_err(uc->ud->dev, "Unsupported TR size of %zu\n", tr_size);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	/* We have only one descriptor containing multiple TRs */
> >> +	d = kzalloc(sizeof(*d) + sizeof(d->hwdesc[0]), GFP_ATOMIC);
> >
> > this is invoked from prep_ so should use GFP_NOWAIT, we don't use
> > GFP_ATOMIC :)
>
> Ok.
> btw: the EDMA and sDMA drivers are using GFP_ATOMIC :o

heh, we made sure to document this bit :)

> >> +static int udma_configure_statictr(struct udma_chan *uc, struct udma_desc *d,
> >> +				   enum dma_slave_buswidth dev_width,
> >> +				   u16 elcnt)
> >> +{
> >> +	if (uc->ep_type != PSIL_EP_PDMA_XY)
> >> +		return 0;
> >> +
> >> +	/* Bus width translates to the element size (ES) */
> >> +	switch (dev_width) {
> >> +	case DMA_SLAVE_BUSWIDTH_1_BYTE:
> >> +		d->static_tr.elsize = 0;
> >> +		break;
> >> +	case DMA_SLAVE_BUSWIDTH_2_BYTES:
> >> +		d->static_tr.elsize = 1;
> >> +		break;
> >> +	case DMA_SLAVE_BUSWIDTH_3_BYTES:
> >> +		d->static_tr.elsize = 2;
> >> +		break;
> >> +	case DMA_SLAVE_BUSWIDTH_4_BYTES:
> >> +		d->static_tr.elsize = 3;
> >> +		break;
> >> +	case DMA_SLAVE_BUSWIDTH_8_BYTES:
> >> +		d->static_tr.elsize = 4;
> >
> > seems like ffs(dev_width) to me?
>
> Not really:
> ffs(DMA_SLAVE_BUSWIDTH_1_BYTE) = 1
> ffs(DMA_SLAVE_BUSWIDTH_2_BYTES) = 2
> ffs(DMA_SLAVE_BUSWIDTH_3_BYTES) = 1

I missed this!

> ffs(DMA_SLAVE_BUSWIDTH_4_BYTES) = 3
> ffs(DMA_SLAVE_BUSWIDTH_8_BYTES) = 4

Otherwise you are ffs() - 1

-- 
~Vinod
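
The ffs() exchange above can be checked with a small user-space sketch. It assumes the `DMA_SLAVE_BUSWIDTH_*` constants equal the width in bytes (1, 2, 3, 4, 8), as in the kernel's `enum dma_slave_buswidth`. They are redefined locally here so the code builds outside the kernel. `elsize_from_switch()`, `elsize_from_ffs()` and `count_mismatches()` are hypothetical helper names, and the `-1` return for other widths is an illustration choice, since the quoted hunk does not show a default case. POSIX `ffs()` from `<strings.h>` stands in for the kernel helper of the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* POSIX ffs() */

/* Local copies of the bus width constants (assumed to equal the byte count). */
enum {
	DMA_SLAVE_BUSWIDTH_1_BYTE  = 1,
	DMA_SLAVE_BUSWIDTH_2_BYTES = 2,
	DMA_SLAVE_BUSWIDTH_3_BYTES = 3,
	DMA_SLAVE_BUSWIDTH_4_BYTES = 4,
	DMA_SLAVE_BUSWIDTH_8_BYTES = 8,
};

/* Element size mapping as quoted from the switch statement in the patch. */
static int elsize_from_switch(int width)
{
	switch (width) {
	case DMA_SLAVE_BUSWIDTH_1_BYTE:
		return 0;
	case DMA_SLAVE_BUSWIDTH_2_BYTES:
		return 1;
	case DMA_SLAVE_BUSWIDTH_3_BYTES:
		return 2;
	case DMA_SLAVE_BUSWIDTH_4_BYTES:
		return 3;
	case DMA_SLAVE_BUSWIDTH_8_BYTES:
		return 4;
	default:
		return -1;	/* not covered by the quoted hunk */
	}
}

/* The shortcut suggested in review: index of the lowest set bit, zero based. */
static int elsize_from_ffs(int width)
{
	assert(width > 0);
	return ffs(width) - 1;
}

/* Count the supported widths where the shortcut disagrees with the switch. */
static int count_mismatches(void)
{
	static const int widths[] = { 1, 2, 3, 4, 8 };
	int mismatches = 0;

	for (size_t i = 0; i < sizeof(widths) / sizeof(widths[0]); i++)
		if (elsize_from_ffs(widths[i]) != elsize_from_switch(widths[i]))
			mismatches++;

	return mismatches;
}
```

Under these assumptions `ffs() - 1` matches the switch only for the 1 and 2 byte widths. The 3 byte width breaks any bit-position formula because 3 is not a power of two, and because it takes encoding 2, the 4 and 8 byte encodings are shifted up by one as well, so the explicit switch remains the simplest way to express the mapping.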