diff mbox series

[1/2] usb: dwc3: core: Call cpu_relax() in registers polling busy loops

Message ID 20240820121501.3593245-2-quic_zhonhan@quicinc.com
State New
Headers show
Series usb: Call cpu_relax() when polling registers | expand

Commit Message

Zhongqiu Han Aug. 20, 2024, 12:15 p.m. UTC
Busy loops that poll on a register should call cpu_relax(). On some
architectures, it can lower CPU power consumption or yield to a
hyperthreaded twin processor. It also serves as a compiler barrier,
see Documentation/process/volatile-considered-harmful.rst. In addition,
if something goes wrong in the busy loop at least it can prevent things
from getting worse.

Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
---
 drivers/usb/dwc3/core.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Thinh Nguyen Aug. 20, 2024, 10:05 p.m. UTC | #1
On Tue, Aug 20, 2024, Zhongqiu Han wrote:
> Busy loops that poll on a register should call cpu_relax(). On some
> architectures, it can lower CPU power consumption or yield to a
> hyperthreaded twin processor. It also serves as a compiler barrier,
> see Documentation/process/volatile-considered-harmful.rst. In addition,
> if something goes wrong in the busy loop at least it can prevent things
> from getting worse.
> 
> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
> ---
>  drivers/usb/dwc3/core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 734de2a8bd21..498f08dbbdb5 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -2050,6 +2050,8 @@ static int dwc3_get_num_ports(struct dwc3 *dwc)
>  		if (!offset)
>  			break;
>  
> +		cpu_relax();
> +
>  		val = readl(base + offset);
>  		major_revision = XHCI_EXT_PORT_MAJOR(val);
>  
> -- 
> 2.25.1
> 

We're not polling on a register here. We're just traversing and reading
the next port capability. The loop in dwc3_get_num_ports() should not be
more than DWC3_USB2_MAX_PORTS + DWC3_USB3_MAX_PORTS.

What's really causing this busy loop you found?

If polling for a register is really a problem, then we would have that
problem everywhere else in dwc3. But why here?

Thanks,
Thinh
Zhongqiu Han Aug. 21, 2024, 1:59 p.m. UTC | #2
On 8/21/2024 6:05 AM, Thinh Nguyen wrote:
> On Tue, Aug 20, 2024, Zhongqiu Han wrote:
>> Busy loops that poll on a register should call cpu_relax(). On some
>> architectures, it can lower CPU power consumption or yield to a
>> hyperthreaded twin processor. It also serves as a compiler barrier,
>> see Documentation/process/volatile-considered-harmful.rst. In addition,
>> if something goes wrong in the busy loop at least it can prevent things
>> from getting worse.
>>
>> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
>> ---
>>   drivers/usb/dwc3/core.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
>> index 734de2a8bd21..498f08dbbdb5 100644
>> --- a/drivers/usb/dwc3/core.c
>> +++ b/drivers/usb/dwc3/core.c
>> @@ -2050,6 +2050,8 @@ static int dwc3_get_num_ports(struct dwc3 *dwc)
>>   		if (!offset)
>>   			break;
>>   
>> +		cpu_relax();
>> +
>>   		val = readl(base + offset);
>>   		major_revision = XHCI_EXT_PORT_MAJOR(val);
>>   
>> -- 
>> 2.25.1
>>
> 
> We're not polling on a register here. We're just traversing and reading
> the next port capability. The loop in dwc3_get_num_ports() should not be
> more than DWC3_USB2_MAX_PORTS + DWC3_USB3_MAX_PORTS.
> 
Hi Thinh,
Thanks a lot for the review~

yes, now i know that the iterations are limited so it wouldn't make
sense to relax here. I will be careful about this next time and sorry
for this.

> What's really causing this busy loop you found?
> 
actually no practical issue.
> If polling for a register is really a problem, then we would have that
> problem everywhere else in dwc3. But why here?
> 

I also think that polling for a register is not a problem, but if there
polling for a register in the potential infinite loop, It's better to
relax the cpu and as i saw, basically there are two types of
implementations in other codes for the relax cpu target,

1. use (u/m)sleep or (u/n)delay function or the iterations limited,
such as:

(1)
core.c- if (DWC3_VER_IS_WITHIN(DWC31, 190A, ANY) || DWC3_IP_IS(DWC32))
core.c-         retries = 10;
core.c-
core.c- do {
core.c:         reg = dwc3_readl(dwc->regs, DWC3_DCTL);
core.c-         if (!(reg & DWC3_DCTL_CSFTRST))
core.c-                 goto done;
core.c-
core.c-         if (DWC3_VER_IS_WITHIN(DWC31, 190A, ANY) ||
                     DWC3_IP_IS(DWC32))
core.c-                 msleep(20);
core.c-         else
core.c-                 udelay(1);
core.c- } while (--retries);

(2)
gadget.c-       /* poll until Link State changes to ON */
gadget.c-       retries = 20000;
gadget.c-
gadget.c-       while (retries--) {
gadget.c:               reg = dwc3_readl(dwc->regs, DWC3_DSTS);
gadget.c-
gadget.c-               /* in HS, means ON */
gadget.c-               if (DWC3_DSTS_USBLNKST(reg) ==
                                          DWC3_LINK_STATE_U0)
gadget.c-                       break;
gadget.c-       }

By the way, for (2) case, the retries is 20000, seems the value is large
without relax if break the loop only while retries is 0, but as we know
although if there need delay/relax, we cannot easily use (u/m)delay or m
u(sleep) functions because we need consider to avoid the "scheduling on
atomic/invalid context" BUG. Just shared my guess, unless there is
optimization comparison data after relax cpu or practical issue here.


2. use cpu_relax() to relax for busy loop, such as:

(1)
ulpi.c-static int dwc3_ulpi_busyloop(struct dwc3 *dwc, u8 addr, bool read)
..........................
ulpi.c-
ulpi.c- while (count--) {
ulpi.c-         ndelay(ns);
ulpi.c:         reg = dwc3_readl(dwc->regs, DWC3_GUSB2PHYACC(0));
ulpi.c-         if (reg & DWC3_GUSB2PHYACC_DONE)
ulpi.c-                 return 0;
ulpi.c-         cpu_relax();
ulpi.c- }

(2)
host/ohci-pxa27x.c:     while (__raw_readl(pxa_ohci->mmio_base + UHCHR) 
& UHCHR_FSBIR)
host/ohci-pxa27x.c-             cpu_relax();

(3)
gadget/udc/fsl_udc_core.c:      while (fsl_readl(&dr_regs->usbcmd) & 
USB_CMD_CTRL_RESET) {
gadget/udc/fsl_udc_core.c-              if (time_after(jiffies, timeout)) {
gadget/udc/fsl_udc_core.c- 
dev_err(&udc->gadget.dev, "udc reset timeout!\n");
gadget/udc/fsl_udc_core.c-                      return -ETIMEDOUT;
gadget/udc/fsl_udc_core.c-              }
gadget/udc/fsl_udc_core.c-              cpu_relax();
gadget/udc/fsl_udc_core.c-      }


Anyways, for current patch there the iterations are limited, thanks a
lot for the review and the discussion and i will be careful next time.

> Thanks,
> Thinh
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 734de2a8bd21..498f08dbbdb5 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -2050,6 +2050,8 @@  static int dwc3_get_num_ports(struct dwc3 *dwc)
 		if (!offset)
 			break;
 
+		cpu_relax();
+
 		val = readl(base + offset);
 		major_revision = XHCI_EXT_PORT_MAJOR(val);