diff mbox series

powerpc: reduce number of WATCHDOG_RESET calls from flush_cache

Message ID 20200604093039.26781-1-rasmus.villemoes@prevas.dk
State New
Headers show
Series powerpc: reduce number of WATCHDOG_RESET calls from flush_cache | expand

Commit Message

Rasmus Villemoes June 4, 2020, 9:30 a.m. UTC
Calling WATCHDOG_RESET for each and every cache line is overkill.

In our case, the kernel image is a little over 7MB, and the almost
500000 calls of WATCHDOG_RESET() adds about one second to the
boottime.

I very highly doubt there's any real hardware where flushing 64K
from cache to memory takes more than a few milliseconds, so this
should be completely safe. Since it reduces the number of
WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from
those is practically eliminated. (Just in case the range flushed is so
small that it doesn't cross a 64K boundary, add a single
WATCHDOG_RESET() between the loops).

64K is chosen because that's also the default chunk size used by the
hashing algorithms, and when, say, a sha256 digest of a kernel image of
a few MB is being verified, that's almost guaranteed to be cache-cold,
so apart from the computations being done, the hashing is also bounded
by memory speed - so if 64K works for those cases, it should certainly
also work when memory access is the only thing being done.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk>
---
 arch/powerpc/lib/cache.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Stefan Roese June 4, 2020, 10:31 a.m. UTC | #1
On 04.06.20 11:30, Rasmus Villemoes wrote:
> Calling WATCHDOG_RESET for each and every cache line is overkill.
> 
> In our case, the kernel image is a little over 7MB, and the almost
> 500000 calls of WATCHDOG_RESET() adds about one second to the
> boottime.
> 
> I very highly doubt there's any real hardware where flushing 64K
> from cache to memory takes more than a few milliseconds, so this
> should be completely safe. Since it reduces the number of
> WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from
> those is practically eliminated. (Just in case the range flushed is so
> small that it doesn't cross a 64K boundary, add a single
> WATCHDOG_RESET() between the loops).
> 
> 64K is chosen because that's also the default chunk size used by the
> hashing algorithms, and when, say, a sha256 digest of a kernel image of
> a few MB is being verified, that's almost guaranteed to be cache-cold,
> so apart from the computations being done, the hashing is also bounded
> by memory speed - so if 64K works for those cases, it should certainly
> also work when memory access is the only thing being done.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk>

Reviewed-by: Stefan Roese <sr at denx.de>

Thanks,
Stefan

> ---
>   arch/powerpc/lib/cache.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c
> index 528361e972..df2310f4e2 100644
> --- a/arch/powerpc/lib/cache.c
> +++ b/arch/powerpc/lib/cache.c
> @@ -8,6 +8,7 @@
>   #include <cpu_func.h>
>   #include <asm/cache.h>
>   #include <watchdog.h>
> +#include <linux/sizes.h>
>   
>   void flush_cache(ulong start_addr, ulong size)
>   {
> @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size)
>   	for (addr = start; (addr <= end) && (addr >= start);
>   			addr += CONFIG_SYS_CACHELINE_SIZE) {
>   		asm volatile("dcbst 0,%0" : : "r" (addr) : "memory");
> -		WATCHDOG_RESET();
> +		if ((addr & (SZ_64K - 1)) == 0)
> +			WATCHDOG_RESET();
>   	}
>   	/* wait for all dcbst to complete on bus */
>   	asm volatile("sync" : : : "memory");
> +	WATCHDOG_RESET();
>   
>   	for (addr = start; (addr <= end) && (addr >= start);
>   			addr += CONFIG_SYS_CACHELINE_SIZE) {
>   		asm volatile("icbi 0,%0" : : "r" (addr) : "memory");
> -		WATCHDOG_RESET();
> +		if ((addr & (SZ_64K - 1)) == 0)
> +			WATCHDOG_RESET();
>   	}
>   	asm volatile("sync" : : : "memory");
>   	/* flush prefetch queue */
>
Rasmus Villemoes Sept. 18, 2020, 8:20 a.m. UTC | #2
On 04/06/2020 12.31, Stefan Roese wrote:
> On 04.06.20 11:30, Rasmus Villemoes wrote:

>> Calling WATCHDOG_RESET for each and every cache line is overkill.

>>

>> In our case, the kernel image is a little over 7MB, and the almost

>> 500000 calls of WATCHDOG_RESET() adds about one second to the

>> boottime.

>>

>> I very highly doubt there's any real hardware where flushing 64K

>> from cache to memory takes more than a few milliseconds, so this

>> should be completely safe. Since it reduces the number of

>> WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from

>> those is practically eliminated. (Just in case the range flushed is so

>> small that it doesn't cross a 64K boundary, add a single

>> WATCHDOG_RESET() between the loops).

>>

>> 64K is chosen because that's also the default chunk size used by the

>> hashing algorithms, and when, say, a sha256 digest of a kernel image of

>> a few MB is being verified, that's almost guaranteed to be cache-cold,

>> so apart from the computations being done, the hashing is also bounded

>> by memory speed - so if 64K works for those cases, it should certainly

>> also work when memory access is the only thing being done.

>>

>> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

> 

> Reviewed-by: Stefan Roese <sr@denx.de>


Ping.

> Thanks,

> Stefan

> 

>> ---

>>   arch/powerpc/lib/cache.c | 8 ++++++--

>>   1 file changed, 6 insertions(+), 2 deletions(-)

>>

>> diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c

>> index 528361e972..df2310f4e2 100644

>> --- a/arch/powerpc/lib/cache.c

>> +++ b/arch/powerpc/lib/cache.c

>> @@ -8,6 +8,7 @@

>>   #include <cpu_func.h>

>>   #include <asm/cache.h>

>>   #include <watchdog.h>

>> +#include <linux/sizes.h>

>>     void flush_cache(ulong start_addr, ulong size)

>>   {

>> @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size)

>>       for (addr = start; (addr <= end) && (addr >= start);

>>               addr += CONFIG_SYS_CACHELINE_SIZE) {

>>           asm volatile("dcbst 0,%0" : : "r" (addr) : "memory");

>> -        WATCHDOG_RESET();

>> +        if ((addr & (SZ_64K - 1)) == 0)

>> +            WATCHDOG_RESET();

>>       }

>>       /* wait for all dcbst to complete on bus */

>>       asm volatile("sync" : : : "memory");

>> +    WATCHDOG_RESET();

>>         for (addr = start; (addr <= end) && (addr >= start);

>>               addr += CONFIG_SYS_CACHELINE_SIZE) {

>>           asm volatile("icbi 0,%0" : : "r" (addr) : "memory");

>> -        WATCHDOG_RESET();

>> +        if ((addr & (SZ_64K - 1)) == 0)

>> +            WATCHDOG_RESET();

>>       }

>>       asm volatile("sync" : : : "memory");

>>       /* flush prefetch queue */

>>



-- 
Rasmus Villemoes
Software Developer
Prevas A/S
Hedeager 3
DK-8200 Aarhus N
+45 51210274
rasmus.villemoes@prevas.dk
www.prevas.dk
Tom Rini Sept. 18, 2020, 2:50 p.m. UTC | #3
On Fri, Sep 18, 2020 at 10:20:46AM +0200, Rasmus Villemoes wrote:
> On 04/06/2020 12.31, Stefan Roese wrote:

> > On 04.06.20 11:30, Rasmus Villemoes wrote:

> >> Calling WATCHDOG_RESET for each and every cache line is overkill.

> >>

> >> In our case, the kernel image is a little over 7MB, and the almost

> >> 500000 calls of WATCHDOG_RESET() adds about one second to the

> >> boottime.

> >>

> >> I very highly doubt there's any real hardware where flushing 64K

> >> from cache to memory takes more than a few milliseconds, so this

> >> should be completely safe. Since it reduces the number of

> >> WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from

> >> those is practically eliminated. (Just in case the range flushed is so

> >> small that it doesn't cross a 64K boundary, add a single

> >> WATCHDOG_RESET() between the loops).

> >>

> >> 64K is chosen because that's also the default chunk size used by the

> >> hashing algorithms, and when, say, a sha256 digest of a kernel image of

> >> a few MB is being verified, that's almost guaranteed to be cache-cold,

> >> so apart from the computations being done, the hashing is also bounded

> >> by memory speed - so if 64K works for those cases, it should certainly

> >> also work when memory access is the only thing being done.

> >>

> >> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

> > 

> > Reviewed-by: Stefan Roese <sr@denx.de>

> 

> Ping.


This too should end up in -next "soon".  Thanks.

-- 
Tom
Wolfgang Denk Sept. 23, 2020, 12:48 p.m. UTC | #4
Dear Rasmus,

In message <afea5640-9d95-bf92-316a-0d674a5129ce@prevas.dk> you wrote:
> On 04/06/2020 12.31, Stefan Roese wrote:

> > On 04.06.20 11:30, Rasmus Villemoes wrote:

> >> Calling WATCHDOG_RESET for each and every cache line is overkill.

> >>

> >> In our case, the kernel image is a little over 7MB, and the almost

> >> 500000 calls of WATCHDOG_RESET() adds about one second to the

> >> boottime.

> >>

> >> I very highly doubt there's any real hardware where flushing 64K

> >> from cache to memory takes more than a few milliseconds, so this

> >> should be completely safe. Since it reduces the number of


Maybe it is, maybe it is not.  I remember boards where the watchdog
trigger had to happen in pretty tight intervals, something like min
20, max 60 milliseconds.

> >> +        if ((addr & (SZ_64K - 1)) == 0)


In any case, please write readable code.  The use of SZ_ here is not
a good idea.  Also, you might want to make the actual parameter
configurable, so boards with specific timing requirements can adjust
this as needed.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Perfection is reached, not when there is no longer anything  to  add,
but when there is no longer anything to take away.
                                           - Antoine de Saint-Exupery
Priyanka Jain (OSS) Sept. 23, 2020, 1:28 p.m. UTC | #5
>-----Original Message-----

>From: U-Boot <u-boot-bounces@lists.denx.de> On Behalf Of Rasmus Villemoes

>Sent: Friday, September 18, 2020 1:51 PM

>To: Stefan Roese <sr@denx.de>; u-boot@lists.denx.de

>Cc: Christophe Leroy <christophe.leroy@c-s.fr>; Wolfgang Denk

><wd@denx.de>; Simon Glass <sjg@chromium.org>; Tom Rini

><trini@konsulko.com>

>Subject: Re: [PATCH] powerpc: reduce number of WATCHDOG_RESET calls

>from flush_cache

>

>On 04/06/2020 12.31, Stefan Roese wrote:

>> On 04.06.20 11:30, Rasmus Villemoes wrote:

>>> Calling WATCHDOG_RESET for each and every cache line is overkill.

>>>

>>> In our case, the kernel image is a little over 7MB, and the almost

>>> 500000 calls of WATCHDOG_RESET() adds about one second to the

>>> boottime.

>>>

>>> I very highly doubt there's any real hardware where flushing 64K from

>>> cache to memory takes more than a few milliseconds, so this should be

>>> completely safe. Since it reduces the number of

>>> WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from

>>> those is practically eliminated. (Just in case the range flushed is

>>> so small that it doesn't cross a 64K boundary, add a single

>>> WATCHDOG_RESET() between the loops).

>>>

>>> 64K is chosen because that's also the default chunk size used by the

>>> hashing algorithms, and when, say, a sha256 digest of a kernel image

>>> of a few MB is being verified, that's almost guaranteed to be

>>> cache-cold, so apart from the computations being done, the hashing is

>>> also bounded by memory speed - so if 64K works for those cases, it

>>> should certainly also work when memory access is the only thing being

>done.

>>>

>>> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

>>

>> Reviewed-by: Stefan Roese <sr@denx.de>

>

<snip>
Applied to u-boot-mpc85xx. Awaiting upstream

Thanks
Priyanka
Wolfgang Denk Sept. 24, 2020, 10:57 a.m. UTC | #6
Dear Priyanka,

In message <VE1PR04MB649416C255B3379B43E7F7FDE6380@VE1PR04MB6494.eurprd04.prod.outlook.com> you wrote:
>

> >>> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

> >>

> >> Reviewed-by: Stefan Roese <sr@denx.de>

> >

> <snip>

> Applied to u-boot-mpc85xx. Awaiting upstream


I object.  This must at least be configurable, so that boards with
tight timing requirements can still work.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The human mind  ordinarily  operates  at  only  ten  percent  of  its
capacity. The rest is overhead for the operating system.
Tom Rini Sept. 24, 2020, 12:32 p.m. UTC | #7
On Thu, Sep 24, 2020 at 12:57:36PM +0200, Wolfgang Denk wrote:
> Dear Priyanka,

> 

> In message <VE1PR04MB649416C255B3379B43E7F7FDE6380@VE1PR04MB6494.eurprd04.prod.outlook.com> you wrote:

> >

> > >>> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

> > >>

> > >> Reviewed-by: Stefan Roese <sr@denx.de>

> > >

> > <snip>

> > Applied to u-boot-mpc85xx. Awaiting upstream

> 

> I object.  This must at least be configurable, so that boards with

> tight timing requirements can still work.


I'm removing just this patch from the -next pull request.

-- 
Tom
Tom Rini Sept. 24, 2020, 12:32 p.m. UTC | #8
On Wed, Sep 23, 2020 at 02:48:18PM +0200, Wolfgang Denk wrote:
> Dear Rasmus,

> 

> In message <afea5640-9d95-bf92-316a-0d674a5129ce@prevas.dk> you wrote:

> > On 04/06/2020 12.31, Stefan Roese wrote:

> > > On 04.06.20 11:30, Rasmus Villemoes wrote:

> > >> Calling WATCHDOG_RESET for each and every cache line is overkill.

> > >>

> > >> In our case, the kernel image is a little over 7MB, and the almost

> > >> 500000 calls of WATCHDOG_RESET() adds about one second to the

> > >> boottime.

> > >>

> > >> I very highly doubt there's any real hardware where flushing 64K

> > >> from cache to memory takes more than a few milliseconds, so this

> > >> should be completely safe. Since it reduces the number of

> 

> Maybe it is, maybe it is not.  I remember boards where the watchdog

> trigger had to happen in pretty tight intervals, something like min

> 20, max 60 milliseconds.

> 

> > >> +        if ((addr & (SZ_64K - 1)) == 0)

> 

> In any case, please write readable code.  The use of SZ_ here is not

> a good idea.  Also, you might want to make the actual parameter

> configurable, so boards with specific timing requirements can adjust

> this as needed.


We should add a comment somewhere here to match the commit message, but
since we're flushing on 64K chunks, SZ_64K makes sense.  If we make it
build-time configurable then it's not a concern.  I assume we don't want
to make this run time configurable.

-- 
Tom
diff mbox series

Patch

diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c
index 528361e972..df2310f4e2 100644
--- a/arch/powerpc/lib/cache.c
+++ b/arch/powerpc/lib/cache.c
@@ -8,6 +8,7 @@ 
 #include <cpu_func.h>
 #include <asm/cache.h>
 #include <watchdog.h>
+#include <linux/sizes.h>
 
 void flush_cache(ulong start_addr, ulong size)
 {
@@ -21,15 +22,18 @@  void flush_cache(ulong start_addr, ulong size)
 	for (addr = start; (addr <= end) && (addr >= start);
 			addr += CONFIG_SYS_CACHELINE_SIZE) {
 		asm volatile("dcbst 0,%0" : : "r" (addr) : "memory");
-		WATCHDOG_RESET();
+		if ((addr & (SZ_64K - 1)) == 0)
+			WATCHDOG_RESET();
 	}
 	/* wait for all dcbst to complete on bus */
 	asm volatile("sync" : : : "memory");
+	WATCHDOG_RESET();
 
 	for (addr = start; (addr <= end) && (addr >= start);
 			addr += CONFIG_SYS_CACHELINE_SIZE) {
 		asm volatile("icbi 0,%0" : : "r" (addr) : "memory");
-		WATCHDOG_RESET();
+		if ((addr & (SZ_64K - 1)) == 0)
+			WATCHDOG_RESET();
 	}
 	asm volatile("sync" : : : "memory");
 	/* flush prefetch queue */