Message ID | 20181115184904.27223-1-srinivas.kandagatla@linaro.org
---|---
State | Accepted
Commit | 1ac19ad799f880af58b9a8a4321334f6f6fc72e6
Series | qcom: apr: Make apr callbacks in non-atomic context
On Thu 31 Jan 02:44 PST 2019, Srinivas Kandagatla wrote:

>
> On 31/01/2019 01:16, Bjorn Andersson wrote:
> > On Thu 15 Nov 10:49 PST 2018, Srinivas Kandagatla wrote:
> >
> > > APR communication with the DSP is not atomic in nature; it is
> > > request-response type. Trying to pretend that these are atomic
> > > and invoking apr client callbacks directly in atomic/irq context
> > > has endless issues with the soundcard. It makes more sense to
> > > convert these to non-atomic calls. This also converts all the dais
> > > to be non-atomic.
> > >
> > Hi Srinivas,
> >
> > Sorry for not looking at this before.
>
> NP, thanks for the review!
>
> > Are you sure that you're meeting the latency requirements of
> > low-latency audio with this change?
>
> Low- and ultra-low-latency audio is not supported in the existing
> upstream qdsp drivers.

Sure, but we want the design to allow for that still, either in future
upstream or by additional downstream code.

> Also it depends on the definition of "latency": is it the latency of
> "filling the data" or the "latency between DSP command and response"?

I'm referring to the latency between the message from the DSP until we
give it a new buffer.

> In the former case, as long as we have more samples in our ring buffer
> there should be no latency in filling the data.
> In the latter case it should not really matter, as long as the former
> case is taken care of.
>
> Low-latency audio involves smaller sample sizes and no or minimal
> preprocessing in the DSP, so I am guessing that we should be okay with
> responses in a workqueue as long as we have a good-sized ring buffer.

Relying on more buffered data will increase the latency of the audio,
preventing you from doing really low-latency things.

Regards,
Bjorn
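[Editor's note: to put rough numbers on the buffering/latency trade-off
discussed above, here is a small userspace C sketch. The sample rate and
period size are illustrative assumptions of mine, not figures from the
thread.]

#include <stdio.h>

/*
 * Illustrative only: every extra period kept queued translates directly
 * into playback latency. Assumes a 48 kHz stream with 480-frame (10 ms)
 * periods; neither number comes from the discussion above.
 */
int main(void)
{
	const unsigned int rate_hz = 48000;
	const unsigned int period_frames = 480;	/* 10 ms per period */

	for (unsigned int periods = 2; periods <= 16; periods *= 2)
		printf("%2u periods queued -> %3u ms of buffering latency\n",
		       periods,
		       periods * period_frames * 1000 / rate_hz);
	return 0;
}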
On Thu 31 Jan 09:33 PST 2019, Srinivas Kandagatla wrote:

> On 31/01/2019 16:05, Bjorn Andersson wrote:
> > Sure, but we want the design to allow for that still, either in future
> > upstream or by additional downstream code.
>
> Yes, I agree; I don't have a solution for this at the moment.
> It will be interesting to see how Intel handles this kind of use case
> on their DSPs.
>
> The whole issue is that APR messaging is not really atomic in nature;
> it is basically request->response, but in the existing code the
> smd/glink callbacks run in interrupt context.
>
> Trying to pretend that APR is atomic in nature is a problem for audio.
>
> Audio dai-links can be marked as atomic or non-atomic depending on
> which bus they link with. For example, when a dai-link has to work with
> buses like SLIMbus, SoundWire or I2C, whose transactions can sleep, we
> mark it as non-atomic, which means that its functions can sleep.
> In that case, invoking any audio functions as part of the rpmsg
> callback is an issue.
>
> The only solution I found to address this is to handle the callbacks in
> a workqueue.

Okay, I think we should merge this, and once we have the means of doing
low-latency playback we can measure and, worst case, revisit this
decision.

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

> > > Also it depends on the definition of "latency": is it the latency of
> > > "filling the data" or the "latency between DSP command and response"?
> >
> > I'm referring to the latency between the message from the DSP until we
> > give it a new buffer.
> >
> > > In the former case, as long as we have more samples in our ring
> > > buffer there should be no latency in filling the data.
> > > In the latter case it should not really matter, as long as the
> > > former case is taken care of.
> > >
> > > Low-latency audio involves smaller sample sizes and no or minimal
> > > preprocessing in the DSP, so I am guessing that we should be okay
> > > with responses in a workqueue as long as we have a good-sized ring
> > > buffer.
> >
> > Relying on more buffered data will increase the latency of the audio,
> > preventing you from doing really low-latency things.

My bad! Yes, in the low-latency case we would have very few buffers!

srini
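[Editor's note: before the patch itself, here is a minimal self-contained
sketch of the pattern Srinivas describes — copy the message in atomic
context, defer dispatch to a workqueue running in process context. All
demo_* names are hypothetical; this is not the APR code, which follows
below.]

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/workqueue.h>

struct demo_msg {
	struct list_head node;
	int len;
	u8 data[];
};

struct demo_ctx {
	spinlock_t lock;		/* protects @pending */
	struct list_head pending;	/* messages copied in atomic context */
	struct work_struct work;	/* dispatches in process context */
};

/* Process context: callbacks invoked from here are allowed to sleep. */
static void demo_work_fn(struct work_struct *work)
{
	struct demo_ctx *ctx = container_of(work, struct demo_ctx, work);
	struct demo_msg *msg, *tmp;
	unsigned long flags;
	LIST_HEAD(local);

	/* Detach everything queued so far, then process it unlocked. */
	spin_lock_irqsave(&ctx->lock, flags);
	list_splice_tail_init(&ctx->pending, &local);
	spin_unlock_irqrestore(&ctx->lock, flags);

	list_for_each_entry_safe(msg, tmp, &local, node) {
		/* ...hand msg->data to a client callback that may sleep... */
		list_del(&msg->node);
		kfree(msg);
	}
}

/* Atomic (irq) context: copy the message and defer, never sleep. */
static int demo_rx_atomic(struct demo_ctx *ctx, const void *buf, int len)
{
	struct demo_msg *msg;
	unsigned long flags;

	msg = kzalloc(sizeof(*msg) + len, GFP_ATOMIC);	/* no sleeping */
	if (!msg)
		return -ENOMEM;

	msg->len = len;
	memcpy(msg->data, buf, len);

	spin_lock_irqsave(&ctx->lock, flags);
	list_add_tail(&msg->node, &ctx->pending);
	spin_unlock_irqrestore(&ctx->lock, flags);

	schedule_work(&ctx->work);	/* uses the system workqueue */
	return 0;
}

static void demo_init(struct demo_ctx *ctx)
{
	spin_lock_init(&ctx->lock);
	INIT_LIST_HEAD(&ctx->pending);
	INIT_WORK(&ctx->work, demo_work_fn);
}

The actual patch below follows the same shape, but with a dedicated
single-threaded workqueue so message ordering is preserved.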
diff --git a/drivers/soc/qcom/apr.c b/drivers/soc/qcom/apr.c
index 74f8b9607daa..8cfa825fce81 100644
--- a/drivers/soc/qcom/apr.c
+++ b/drivers/soc/qcom/apr.c
@@ -8,6 +8,7 @@
 #include <linux/spinlock.h>
 #include <linux/idr.h>
 #include <linux/slab.h>
+#include <linux/workqueue.h>
 #include <linux/of_device.h>
 #include <linux/soc/qcom/apr.h>
 #include <linux/rpmsg.h>
@@ -17,8 +18,18 @@ struct apr {
 	struct rpmsg_endpoint *ch;
 	struct device *dev;
 	spinlock_t svcs_lock;
+	spinlock_t rx_lock;
 	struct idr svcs_idr;
 	int dest_domain_id;
+	struct workqueue_struct *rxwq;
+	struct work_struct rx_work;
+	struct list_head rx_list;
+};
+
+struct apr_rx_buf {
+	struct list_head node;
+	int len;
+	uint8_t buf[];
 };
 
 /**
@@ -62,11 +73,7 @@ static int apr_callback(struct rpmsg_device *rpdev, void *buf,
 				  int len, void *priv, u32 addr)
 {
 	struct apr *apr = dev_get_drvdata(&rpdev->dev);
-	uint16_t hdr_size, msg_type, ver, svc_id;
-	struct apr_device *svc = NULL;
-	struct apr_driver *adrv = NULL;
-	struct apr_resp_pkt resp;
-	struct apr_hdr *hdr;
+	struct apr_rx_buf *abuf;
 	unsigned long flags;
 
 	if (len <= APR_HDR_SIZE) {
@@ -75,6 +82,34 @@ static int apr_callback(struct rpmsg_device *rpdev, void *buf,
 		return -EINVAL;
 	}
 
+	abuf = kzalloc(sizeof(*abuf) + len, GFP_ATOMIC);
+	if (!abuf)
+		return -ENOMEM;
+
+	abuf->len = len;
+	memcpy(abuf->buf, buf, len);
+
+	spin_lock_irqsave(&apr->rx_lock, flags);
+	list_add_tail(&abuf->node, &apr->rx_list);
+	spin_unlock_irqrestore(&apr->rx_lock, flags);
+
+	queue_work(apr->rxwq, &apr->rx_work);
+
+	return 0;
+}
+
+
+static int apr_do_rx_callback(struct apr *apr, struct apr_rx_buf *abuf)
+{
+	uint16_t hdr_size, msg_type, ver, svc_id;
+	struct apr_device *svc = NULL;
+	struct apr_driver *adrv = NULL;
+	struct apr_resp_pkt resp;
+	struct apr_hdr *hdr;
+	unsigned long flags;
+	void *buf = abuf->buf;
+	int len = abuf->len;
+
 	hdr = buf;
 	ver = APR_HDR_FIELD_VER(hdr->hdr_field);
 	if (ver > APR_PKT_VER + 1)
@@ -132,6 +167,23 @@ static int apr_callback(struct rpmsg_device *rpdev, void *buf,
 	return 0;
 }
 
+static void apr_rxwq(struct work_struct *work)
+{
+	struct apr *apr = container_of(work, struct apr, rx_work);
+	struct apr_rx_buf *abuf, *b;
+	unsigned long flags;
+
+	if (!list_empty(&apr->rx_list)) {
+		list_for_each_entry_safe(abuf, b, &apr->rx_list, node) {
+			apr_do_rx_callback(apr, abuf);
+			spin_lock_irqsave(&apr->rx_lock, flags);
+			list_del(&abuf->node);
+			spin_unlock_irqrestore(&apr->rx_lock, flags);
+			kfree(abuf);
+		}
+	}
+}
+
 static int apr_device_match(struct device *dev, struct device_driver *drv)
 {
 	struct apr_device *adev = to_apr_device(dev);
@@ -285,6 +337,14 @@ static int apr_probe(struct rpmsg_device *rpdev)
 	dev_set_drvdata(dev, apr);
 	apr->ch = rpdev->ept;
 	apr->dev = dev;
+	apr->rxwq = create_singlethread_workqueue("qcom_apr_rx");
+	if (!apr->rxwq) {
+		dev_err(apr->dev, "Failed to start Rx WQ\n");
+		return -ENOMEM;
+	}
+	INIT_WORK(&apr->rx_work, apr_rxwq);
+	INIT_LIST_HEAD(&apr->rx_list);
+	spin_lock_init(&apr->rx_lock);
 	spin_lock_init(&apr->svcs_lock);
 	idr_init(&apr->svcs_idr);
 	of_register_apr_devices(dev);
@@ -303,6 +363,10 @@ static int apr_remove_device(struct device *dev, void *null)
 
 static void apr_remove(struct rpmsg_device *rpdev)
 {
+	struct apr *apr = dev_get_drvdata(&rpdev->dev);
+
+	flush_workqueue(apr->rxwq);
+	destroy_workqueue(apr->rxwq);
 	device_for_each_child(&rpdev->dev, NULL, apr_remove_device);
 }
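[Editor's note: for contrast, here is what the change enables on the
client side — a hypothetical APR client callback that may now sleep,
since dispatch happens from the workqueue rather than irq context. The
callback signature is assumed from include/linux/soc/qcom/apr.h of this
era; the demo_* names are made up.]

#include <linux/mutex.h>
#include <linux/soc/qcom/apr.h>

static DEFINE_MUTEX(demo_lock);

static int demo_apr_callback(struct apr_device *adev,
			     struct apr_resp_pkt *data)
{
	/* Sleeping here would have been a bug under atomic dispatch. */
	mutex_lock(&demo_lock);
	/* ...parse data->payload, kick a non-atomic dai operation... */
	mutex_unlock(&demo_lock);

	return 0;
}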
APR communication with the DSP is not atomic in nature; it is
request-response type. Trying to pretend that these are atomic and
invoking apr client callbacks directly in atomic/irq context has
endless issues with the soundcard. It makes more sense to convert
these to non-atomic calls. This also converts all the dais to be
non-atomic. All the callbacks are now invoked as part of an rx
workqueue.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
---
 drivers/soc/qcom/apr.c | 74 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 69 insertions(+), 5 deletions(-)

-- 
2.19.1