diff mbox series

usb: gadget: aspeed: fixup vhub port irq handling

Message ID 20200528011154.30355-1-rentao.bupt@gmail.com
State New
Headers show
Series usb: gadget: aspeed: fixup vhub port irq handling | expand

Commit Message

Tao Ren May 28, 2020, 1:11 a.m. UTC
From: Tao Ren <rentao.bupt@gmail.com>

This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:
improve vhub port irq handling"): for_each_set_bit() is replaced with
simple for() loop because for() loop runs faster on ASPEED BMC.

Signed-off-by: Tao Ren <rentao.bupt@gmail.com>
---
 drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------
 drivers/usb/gadget/udc/aspeed-vhub/vhub.h |  3 +++
 2 files changed, 6 insertions(+), 7 deletions(-)

Comments

Felipe Balbi Aug. 17, 2020, 1:49 p.m. UTC | #1
Hi,

rentao.bupt@gmail.com writes:
> From: Tao Ren <rentao.bupt@gmail.com>

>

> This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:

> improve vhub port irq handling"): for_each_set_bit() is replaced with

> simple for() loop because for() loop runs faster on ASPEED BMC.

>

> Signed-off-by: Tao Ren <rentao.bupt@gmail.com>

> ---

>  drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------

>  drivers/usb/gadget/udc/aspeed-vhub/vhub.h |  3 +++

>  2 files changed, 6 insertions(+), 7 deletions(-)

>

> diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c

> index cdf96911e4b1..be7bb64e3594 100644

> --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c

> +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c

> @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)

>  

>  	/* Handle device interrupts */

>  	if (istat & vhub->port_irq_mask) {

> -		unsigned long bitmap = istat;

> -		int offset = VHUB_IRQ_DEV1_BIT;

> -		int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;

> -

> -		for_each_set_bit_from(offset, &bitmap, size) {

> -			i = offset - VHUB_IRQ_DEV1_BIT;

> -			ast_vhub_dev_irq(&vhub->ports[i].dev);

> +		for (i = 0; i < vhub->max_ports; i++) {

> +			if (istat & VHUB_DEV_IRQ(i))

> +				ast_vhub_dev_irq(&vhub->ports[i].dev);


how have you measured your statement above? for_each_set_bit() does
exactly what you did. Unless your architecture has an instruction which
helps finds the next set bit (like cls on ARM), which, then, makes it
much faster.

-- 
balbi
Tao Ren Aug. 17, 2020, 10:56 p.m. UTC | #2
On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote:
> 

> Hi,

> 

> rentao.bupt@gmail.com writes:

> > From: Tao Ren <rentao.bupt@gmail.com>

> >

> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:

> > improve vhub port irq handling"): for_each_set_bit() is replaced with

> > simple for() loop because for() loop runs faster on ASPEED BMC.

> >

> > Signed-off-by: Tao Ren <rentao.bupt@gmail.com>

> > ---

> >  drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------

> >  drivers/usb/gadget/udc/aspeed-vhub/vhub.h |  3 +++

> >  2 files changed, 6 insertions(+), 7 deletions(-)

> >

> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c

> > index cdf96911e4b1..be7bb64e3594 100644

> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c

> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c

> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)

> >  

> >  	/* Handle device interrupts */

> >  	if (istat & vhub->port_irq_mask) {

> > -		unsigned long bitmap = istat;

> > -		int offset = VHUB_IRQ_DEV1_BIT;

> > -		int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;

> > -

> > -		for_each_set_bit_from(offset, &bitmap, size) {

> > -			i = offset - VHUB_IRQ_DEV1_BIT;

> > -			ast_vhub_dev_irq(&vhub->ports[i].dev);

> > +		for (i = 0; i < vhub->max_ports; i++) {

> > +			if (istat & VHUB_DEV_IRQ(i))

> > +				ast_vhub_dev_irq(&vhub->ports[i].dev);

> 

> how have you measured your statement above? for_each_set_bit() does

> exactly what you did. Unless your architecture has an instruction which

> helps finds the next set bit (like cls on ARM), which, then, makes it

> much faster.


I did some testing and result shows for() loop runs faster than
for_each_set_bit() loop. Please refer to details below (discussion with
Benjamin in the original patch) and kindly let me know your suggestions.

> On Mon, 2020-04-06 at 23:02 -0700, Tao Ren wrote:

> > I ran some testing on my ast2400 and ast2500 BMC and looks like the

> > for() loop runs faster than for_each_set_bit_from() loop in my

> > environment. I'm not sure if something needs to be revised in my test

> > code, but please kindly share your suggestions:

> >

> > I use get_cycles() to calculate execution time of 2 different loops, and

> > ast_vhub_dev_irq() is replaced with barrier() to avoid "noise"; below

> > are the results:

> >

> >   - when downstream port number is 5 and only 1 irq bit is set, it takes

> >     ~30 cycles to finish for_each_set_bit() loop, and 20-25 cycles to

> >     finish the for() loop.

> >

> >   - if downstream port number is 5 and all 5 bits are set, then

> >     for_each_set_bit() loop takes ~50 cycles and for() loop takes ~25

> >     cycles.

> >

> >   - when I increase downsteam port number to 16 and set 1 irq bit, the

> >     for_each_set_bit() loop takes ~30 cycles and for() loop takes 25

> >     cycles. It's a little surprise to me because I thought for() loop

> >     would cost 60+ cycles (3 times of the value when port number is 5).

> >

> >   - if downstream port number is 16 and all irq status bits are set,

> >     then for_each_set_bit() loop takes 60-70 cycles and for() loop takes

> >     30+ cycles.



Cheers,

Tao
Felipe Balbi Aug. 31, 2020, 9:54 a.m. UTC | #3
Hi,

Tao Ren <rentao.bupt@gmail.com> writes:
> On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote:

>> 

>> Hi,

>> 

>> rentao.bupt@gmail.com writes:

>> > From: Tao Ren <rentao.bupt@gmail.com>

>> >

>> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:

>> > improve vhub port irq handling"): for_each_set_bit() is replaced with

>> > simple for() loop because for() loop runs faster on ASPEED BMC.

>> >

>> > Signed-off-by: Tao Ren <rentao.bupt@gmail.com>

>> > ---

>> >  drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------

>> >  drivers/usb/gadget/udc/aspeed-vhub/vhub.h |  3 +++

>> >  2 files changed, 6 insertions(+), 7 deletions(-)

>> >

>> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c

>> > index cdf96911e4b1..be7bb64e3594 100644

>> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c

>> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c

>> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)

>> >  

>> >  	/* Handle device interrupts */

>> >  	if (istat & vhub->port_irq_mask) {

>> > -		unsigned long bitmap = istat;

>> > -		int offset = VHUB_IRQ_DEV1_BIT;

>> > -		int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;

>> > -

>> > -		for_each_set_bit_from(offset, &bitmap, size) {

>> > -			i = offset - VHUB_IRQ_DEV1_BIT;

>> > -			ast_vhub_dev_irq(&vhub->ports[i].dev);

>> > +		for (i = 0; i < vhub->max_ports; i++) {

>> > +			if (istat & VHUB_DEV_IRQ(i))

>> > +				ast_vhub_dev_irq(&vhub->ports[i].dev);

>> 

>> how have you measured your statement above? for_each_set_bit() does

>> exactly what you did. Unless your architecture has an instruction which

>> helps finds the next set bit (like cls on ARM), which, then, makes it

>> much faster.

>

> I did some testing and result shows for() loop runs faster than

> for_each_set_bit() loop. Please refer to details below (discussion with

> Benjamin in the original patch) and kindly let me know your suggestions.


no strong feelings, just surprised that you're already worried about
20~40 cycles of cpu time ;-)

Patch applied for next merge window

-- 
balbi
Felipe Balbi Aug. 31, 2020, 9:54 a.m. UTC | #4
Hi,

Tao Ren <rentao.bupt@gmail.com> writes:
> On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote:

>> 

>> Hi,

>> 

>> rentao.bupt@gmail.com writes:

>> > From: Tao Ren <rentao.bupt@gmail.com>

>> >

>> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:

>> > improve vhub port irq handling"): for_each_set_bit() is replaced with

>> > simple for() loop because for() loop runs faster on ASPEED BMC.

>> >

>> > Signed-off-by: Tao Ren <rentao.bupt@gmail.com>

>> > ---

>> >  drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------

>> >  drivers/usb/gadget/udc/aspeed-vhub/vhub.h |  3 +++

>> >  2 files changed, 6 insertions(+), 7 deletions(-)

>> >

>> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c

>> > index cdf96911e4b1..be7bb64e3594 100644

>> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c

>> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c

>> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)

>> >  

>> >  	/* Handle device interrupts */

>> >  	if (istat & vhub->port_irq_mask) {

>> > -		unsigned long bitmap = istat;

>> > -		int offset = VHUB_IRQ_DEV1_BIT;

>> > -		int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;

>> > -

>> > -		for_each_set_bit_from(offset, &bitmap, size) {

>> > -			i = offset - VHUB_IRQ_DEV1_BIT;

>> > -			ast_vhub_dev_irq(&vhub->ports[i].dev);

>> > +		for (i = 0; i < vhub->max_ports; i++) {

>> > +			if (istat & VHUB_DEV_IRQ(i))

>> > +				ast_vhub_dev_irq(&vhub->ports[i].dev);

>> 

>> how have you measured your statement above? for_each_set_bit() does

>> exactly what you did. Unless your architecture has an instruction which

>> helps finds the next set bit (like cls on ARM), which, then, makes it

>> much faster.

>

> I did some testing and result shows for() loop runs faster than

> for_each_set_bit() loop. Please refer to details below (discussion with

> Benjamin in the original patch) and kindly let me know your suggestions.


no strong feelings, just surprised that you're already worried about
20~40 cycles of cpu time ;-)

Patch applied for next merge window

-- 
balbi
Felipe Balbi Aug. 31, 2020, 9:56 a.m. UTC | #5
Hi,

Tao Ren <rentao.bupt@gmail.com> writes:
>> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c

>> > index cdf96911e4b1..be7bb64e3594 100644

>> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c

>> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c

>> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)

>> >  

>> >  	/* Handle device interrupts */

>> >  	if (istat & vhub->port_irq_mask) {

>> > -		unsigned long bitmap = istat;

>> > -		int offset = VHUB_IRQ_DEV1_BIT;

>> > -		int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;

>> > -

>> > -		for_each_set_bit_from(offset, &bitmap, size) {

>> > -			i = offset - VHUB_IRQ_DEV1_BIT;

>> > -			ast_vhub_dev_irq(&vhub->ports[i].dev);

>> > +		for (i = 0; i < vhub->max_ports; i++) {

>> > +			if (istat & VHUB_DEV_IRQ(i))

>> > +				ast_vhub_dev_irq(&vhub->ports[i].dev);

>> 

>> how have you measured your statement above? for_each_set_bit() does

>> exactly what you did. Unless your architecture has an instruction which

>> helps finds the next set bit (like cls on ARM), which, then, makes it

>> much faster.

>

> I did some testing and result shows for() loop runs faster than

> for_each_set_bit() loop. Please refer to details below (discussion with

> Benjamin in the original patch) and kindly let me know your

> suggestions.


no strong feelings, just surprised you're already worried about 20~40
cycles of cpu time ;-)

patch queued for next merge window

-- 
balbi
Tao Ren Aug. 31, 2020, 11:26 p.m. UTC | #6
On Mon, Aug 31, 2020 at 12:54:57PM +0300, Felipe Balbi wrote:
> 

> Hi,

> 

> Tao Ren <rentao.bupt@gmail.com> writes:

> > On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote:

> >> 

> >> Hi,

> >> 

> >> rentao.bupt@gmail.com writes:

> >> > From: Tao Ren <rentao.bupt@gmail.com>

> >> >

> >> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:

> >> > improve vhub port irq handling"): for_each_set_bit() is replaced with

> >> > simple for() loop because for() loop runs faster on ASPEED BMC.

> >> >

> >> > Signed-off-by: Tao Ren <rentao.bupt@gmail.com>

> >> > ---

> >> >  drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------

> >> >  drivers/usb/gadget/udc/aspeed-vhub/vhub.h |  3 +++

> >> >  2 files changed, 6 insertions(+), 7 deletions(-)

> >> >

> >> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c

> >> > index cdf96911e4b1..be7bb64e3594 100644

> >> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c

> >> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c

> >> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)

> >> >  

> >> >  	/* Handle device interrupts */

> >> >  	if (istat & vhub->port_irq_mask) {

> >> > -		unsigned long bitmap = istat;

> >> > -		int offset = VHUB_IRQ_DEV1_BIT;

> >> > -		int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;

> >> > -

> >> > -		for_each_set_bit_from(offset, &bitmap, size) {

> >> > -			i = offset - VHUB_IRQ_DEV1_BIT;

> >> > -			ast_vhub_dev_irq(&vhub->ports[i].dev);

> >> > +		for (i = 0; i < vhub->max_ports; i++) {

> >> > +			if (istat & VHUB_DEV_IRQ(i))

> >> > +				ast_vhub_dev_irq(&vhub->ports[i].dev);

> >> 

> >> how have you measured your statement above? for_each_set_bit() does

> >> exactly what you did. Unless your architecture has an instruction which

> >> helps finds the next set bit (like cls on ARM), which, then, makes it

> >> much faster.

> >

> > I did some testing and result shows for() loop runs faster than

> > for_each_set_bit() loop. Please refer to details below (discussion with

> > Benjamin in the original patch) and kindly let me know your suggestions.

> 

> no strong feelings, just surprised that you're already worried about

> 20~40 cycles of cpu time ;-)

> 

> Patch applied for next merge window


Thanks Felipe. Ben had some concerns about interrupt handling cost on
AST2400 BMC (ARM9), hence we did the comparison and noticed the small
difference :)


Cheers,

Tao
diff mbox series

Patch

diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c
index cdf96911e4b1..be7bb64e3594 100644
--- a/drivers/usb/gadget/udc/aspeed-vhub/core.c
+++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c
@@ -135,13 +135,9 @@  static irqreturn_t ast_vhub_irq(int irq, void *data)
 
 	/* Handle device interrupts */
 	if (istat & vhub->port_irq_mask) {
-		unsigned long bitmap = istat;
-		int offset = VHUB_IRQ_DEV1_BIT;
-		int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;
-
-		for_each_set_bit_from(offset, &bitmap, size) {
-			i = offset - VHUB_IRQ_DEV1_BIT;
-			ast_vhub_dev_irq(&vhub->ports[i].dev);
+		for (i = 0; i < vhub->max_ports; i++) {
+			if (istat & VHUB_DEV_IRQ(i))
+				ast_vhub_dev_irq(&vhub->ports[i].dev);
 		}
 	}
 
diff --git a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h
index 2e5a1ef14a75..87a5dea12d3c 100644
--- a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h
+++ b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h
@@ -67,6 +67,9 @@ 
 #define VHUB_IRQ_HUB_EP0_SETUP			(1 << 0)
 #define VHUB_IRQ_ACK_ALL			0x1ff
 
+/* Downstream device IRQ mask. */
+#define VHUB_DEV_IRQ(n)				(VHUB_IRQ_DEVICE1 << (n))
+
 /* SW reset reg */
 #define VHUB_SW_RESET_EP_POOL			(1 << 9)
 #define VHUB_SW_RESET_DMA_CONTROLLER		(1 << 8)