Message ID | 20200604093039.26781-1-rasmus.villemoes@prevas.dk |
---|---|
State | New |
Headers | show |
Series | powerpc: reduce number of WATCHDOG_RESET calls from flush_cache | expand |
On 04.06.20 11:30, Rasmus Villemoes wrote: > Calling WATCHDOG_RESET for each and every cache line is overkill. > > In our case, the kernel image is a little over 7MB, and the almost > 500000 calls of WATCHDOG_RESET() adds about one second to the > boottime. > > I very highly doubt there's any real hardware where flushing 64K > from cache to memory takes more than a few milliseconds, so this > should be completely safe. Since it reduces the number of > WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from > those is practically eliminated. (Just in case the range flushed is so > small that it doesn't cross a 64K boundary, add a single > WATCHDOG_RESET() between the loops). > > 64K is chosen because that's also the default chunk size used by the > hashing algorithms, and when, say, a sha256 digest of a kernel image of > a few MB is being verified, that's almost guaranteed to be cache-cold, > so apart from the computations being done, the hashing is also bounded > by memory speed - so if 64K works for those cases, it should certainly > also work when memory access is the only thing being done. > > Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk> Reviewed-by: Stefan Roese <sr at denx.de> Thanks, Stefan > --- > arch/powerpc/lib/cache.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c > index 528361e972..df2310f4e2 100644 > --- a/arch/powerpc/lib/cache.c > +++ b/arch/powerpc/lib/cache.c > @@ -8,6 +8,7 @@ > #include <cpu_func.h> > #include <asm/cache.h> > #include <watchdog.h> > +#include <linux/sizes.h> > > void flush_cache(ulong start_addr, ulong size) > { > @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size) > for (addr = start; (addr <= end) && (addr >= start); > addr += CONFIG_SYS_CACHELINE_SIZE) { > asm volatile("dcbst 0,%0" : : "r" (addr) : "memory"); > - WATCHDOG_RESET(); > + if ((addr & (SZ_64K - 1)) == 0) > + WATCHDOG_RESET(); > } > /* wait for all dcbst to complete on bus */ > asm volatile("sync" : : : "memory"); > + WATCHDOG_RESET(); > > for (addr = start; (addr <= end) && (addr >= start); > addr += CONFIG_SYS_CACHELINE_SIZE) { > asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); > - WATCHDOG_RESET(); > + if ((addr & (SZ_64K - 1)) == 0) > + WATCHDOG_RESET(); > } > asm volatile("sync" : : : "memory"); > /* flush prefetch queue */ >
On 04/06/2020 12.31, Stefan Roese wrote: > On 04.06.20 11:30, Rasmus Villemoes wrote: >> Calling WATCHDOG_RESET for each and every cache line is overkill. >> >> In our case, the kernel image is a little over 7MB, and the almost >> 500000 calls of WATCHDOG_RESET() adds about one second to the >> boottime. >> >> I very highly doubt there's any real hardware where flushing 64K >> from cache to memory takes more than a few milliseconds, so this >> should be completely safe. Since it reduces the number of >> WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from >> those is practically eliminated. (Just in case the range flushed is so >> small that it doesn't cross a 64K boundary, add a single >> WATCHDOG_RESET() between the loops). >> >> 64K is chosen because that's also the default chunk size used by the >> hashing algorithms, and when, say, a sha256 digest of a kernel image of >> a few MB is being verified, that's almost guaranteed to be cache-cold, >> so apart from the computations being done, the hashing is also bounded >> by memory speed - so if 64K works for those cases, it should certainly >> also work when memory access is the only thing being done. >> >> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> > > Reviewed-by: Stefan Roese <sr@denx.de> Ping. > Thanks, > Stefan > >> --- >> arch/powerpc/lib/cache.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c >> index 528361e972..df2310f4e2 100644 >> --- a/arch/powerpc/lib/cache.c >> +++ b/arch/powerpc/lib/cache.c >> @@ -8,6 +8,7 @@ >> #include <cpu_func.h> >> #include <asm/cache.h> >> #include <watchdog.h> >> +#include <linux/sizes.h> >> void flush_cache(ulong start_addr, ulong size) >> { >> @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size) >> for (addr = start; (addr <= end) && (addr >= start); >> addr += CONFIG_SYS_CACHELINE_SIZE) { >> asm volatile("dcbst 0,%0" : : "r" (addr) : "memory"); >> - WATCHDOG_RESET(); >> + if ((addr & (SZ_64K - 1)) == 0) >> + WATCHDOG_RESET(); >> } >> /* wait for all dcbst to complete on bus */ >> asm volatile("sync" : : : "memory"); >> + WATCHDOG_RESET(); >> for (addr = start; (addr <= end) && (addr >= start); >> addr += CONFIG_SYS_CACHELINE_SIZE) { >> asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); >> - WATCHDOG_RESET(); >> + if ((addr & (SZ_64K - 1)) == 0) >> + WATCHDOG_RESET(); >> } >> asm volatile("sync" : : : "memory"); >> /* flush prefetch queue */ >> -- Rasmus Villemoes Software Developer Prevas A/S Hedeager 3 DK-8200 Aarhus N +45 51210274 rasmus.villemoes@prevas.dk www.prevas.dk
On Fri, Sep 18, 2020 at 10:20:46AM +0200, Rasmus Villemoes wrote: > On 04/06/2020 12.31, Stefan Roese wrote: > > On 04.06.20 11:30, Rasmus Villemoes wrote: > >> Calling WATCHDOG_RESET for each and every cache line is overkill. > >> > >> In our case, the kernel image is a little over 7MB, and the almost > >> 500000 calls of WATCHDOG_RESET() adds about one second to the > >> boottime. > >> > >> I very highly doubt there's any real hardware where flushing 64K > >> from cache to memory takes more than a few milliseconds, so this > >> should be completely safe. Since it reduces the number of > >> WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from > >> those is practically eliminated. (Just in case the range flushed is so > >> small that it doesn't cross a 64K boundary, add a single > >> WATCHDOG_RESET() between the loops). > >> > >> 64K is chosen because that's also the default chunk size used by the > >> hashing algorithms, and when, say, a sha256 digest of a kernel image of > >> a few MB is being verified, that's almost guaranteed to be cache-cold, > >> so apart from the computations being done, the hashing is also bounded > >> by memory speed - so if 64K works for those cases, it should certainly > >> also work when memory access is the only thing being done. > >> > >> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> > > > > Reviewed-by: Stefan Roese <sr@denx.de> > > Ping. This too should end up in -next "soon". Thanks. -- Tom
Dear Rasmus, In message <afea5640-9d95-bf92-316a-0d674a5129ce@prevas.dk> you wrote: > On 04/06/2020 12.31, Stefan Roese wrote: > > On 04.06.20 11:30, Rasmus Villemoes wrote: > >> Calling WATCHDOG_RESET for each and every cache line is overkill. > >> > >> In our case, the kernel image is a little over 7MB, and the almost > >> 500000 calls of WATCHDOG_RESET() adds about one second to the > >> boottime. > >> > >> I very highly doubt there's any real hardware where flushing 64K > >> from cache to memory takes more than a few milliseconds, so this > >> should be completely safe. Since it reduces the number of Maybe it is, maybe it is not. I remember boards where the watchdog trigger had to happen in pretty tight intervals, something like min 20, max 60 milliseconds. > >> + if ((addr & (SZ_64K - 1)) == 0) In any case, please write readable code. The use of SZ_ here is not a good idea. Also, you might want to make the actual parameter configurable, so boards with specific timing requirements can adjust this as needed. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away. - Antoine de Saint-Exupery
>-----Original Message----- >From: U-Boot <u-boot-bounces@lists.denx.de> On Behalf Of Rasmus Villemoes >Sent: Friday, September 18, 2020 1:51 PM >To: Stefan Roese <sr@denx.de>; u-boot@lists.denx.de >Cc: Christophe Leroy <christophe.leroy@c-s.fr>; Wolfgang Denk ><wd@denx.de>; Simon Glass <sjg@chromium.org>; Tom Rini ><trini@konsulko.com> >Subject: Re: [PATCH] powerpc: reduce number of WATCHDOG_RESET calls >from flush_cache > >On 04/06/2020 12.31, Stefan Roese wrote: >> On 04.06.20 11:30, Rasmus Villemoes wrote: >>> Calling WATCHDOG_RESET for each and every cache line is overkill. >>> >>> In our case, the kernel image is a little over 7MB, and the almost >>> 500000 calls of WATCHDOG_RESET() adds about one second to the >>> boottime. >>> >>> I very highly doubt there's any real hardware where flushing 64K from >>> cache to memory takes more than a few milliseconds, so this should be >>> completely safe. Since it reduces the number of >>> WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from >>> those is practically eliminated. (Just in case the range flushed is >>> so small that it doesn't cross a 64K boundary, add a single >>> WATCHDOG_RESET() between the loops). >>> >>> 64K is chosen because that's also the default chunk size used by the >>> hashing algorithms, and when, say, a sha256 digest of a kernel image >>> of a few MB is being verified, that's almost guaranteed to be >>> cache-cold, so apart from the computations being done, the hashing is >>> also bounded by memory speed - so if 64K works for those cases, it >>> should certainly also work when memory access is the only thing being >done. >>> >>> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> >> >> Reviewed-by: Stefan Roese <sr@denx.de> > <snip> Applied to u-boot-mpc85xx. Awaiting upstream Thanks Priyanka
Dear Priyanka, In message <VE1PR04MB649416C255B3379B43E7F7FDE6380@VE1PR04MB6494.eurprd04.prod.outlook.com> you wrote: > > >>> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> > >> > >> Reviewed-by: Stefan Roese <sr@denx.de> > > > <snip> > Applied to u-boot-mpc85xx. Awaiting upstream I object. This must at least be configurable, so that boards with tight timing requirements can still work. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de The human mind ordinarily operates at only ten percent of its capacity. The rest is overhead for the operating system.
On Thu, Sep 24, 2020 at 12:57:36PM +0200, Wolfgang Denk wrote: > Dear Priyanka, > > In message <VE1PR04MB649416C255B3379B43E7F7FDE6380@VE1PR04MB6494.eurprd04.prod.outlook.com> you wrote: > > > > >>> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> > > >> > > >> Reviewed-by: Stefan Roese <sr@denx.de> > > > > > <snip> > > Applied to u-boot-mpc85xx. Awaiting upstream > > I object. This must at least be configurable, so that boards with > tight timing requirements can still work. I'm removing just this patch from the -next pull request. -- Tom
On Wed, Sep 23, 2020 at 02:48:18PM +0200, Wolfgang Denk wrote: > Dear Rasmus, > > In message <afea5640-9d95-bf92-316a-0d674a5129ce@prevas.dk> you wrote: > > On 04/06/2020 12.31, Stefan Roese wrote: > > > On 04.06.20 11:30, Rasmus Villemoes wrote: > > >> Calling WATCHDOG_RESET for each and every cache line is overkill. > > >> > > >> In our case, the kernel image is a little over 7MB, and the almost > > >> 500000 calls of WATCHDOG_RESET() adds about one second to the > > >> boottime. > > >> > > >> I very highly doubt there's any real hardware where flushing 64K > > >> from cache to memory takes more than a few milliseconds, so this > > >> should be completely safe. Since it reduces the number of > > Maybe it is, maybe it is not. I remember boards where the watchdog > trigger had to happen in pretty tight intervals, something like min > 20, max 60 milliseconds. > > > >> + if ((addr & (SZ_64K - 1)) == 0) > > In any case, please write readable code. The use of SZ_ here is not > a good idea. Also, you might want to make the actual parameter > configurable, so boards with specific timing requirements can adjust > this as needed. We should add a comment somewhere here to match the commit message, but since we're flushing on 64K chunks, SZ_64K makes sense. If we make it build-time configurable then it's not a concern. I assume we don't want to make this run time configurable. -- Tom
diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c index 528361e972..df2310f4e2 100644 --- a/arch/powerpc/lib/cache.c +++ b/arch/powerpc/lib/cache.c @@ -8,6 +8,7 @@ #include <cpu_func.h> #include <asm/cache.h> #include <watchdog.h> +#include <linux/sizes.h> void flush_cache(ulong start_addr, ulong size) { @@ -21,15 +22,18 @@ void flush_cache(ulong start_addr, ulong size) for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("dcbst 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } /* wait for all dcbst to complete on bus */ asm volatile("sync" : : : "memory"); + WATCHDOG_RESET(); for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + if ((addr & (SZ_64K - 1)) == 0) + WATCHDOG_RESET(); } asm volatile("sync" : : : "memory"); /* flush prefetch queue */
Calling WATCHDOG_RESET for each and every cache line is overkill. In our case, the kernel image is a little over 7MB, and the almost 500000 calls of WATCHDOG_RESET() adds about one second to the boottime. I very highly doubt there's any real hardware where flushing 64K from cache to memory takes more than a few milliseconds, so this should be completely safe. Since it reduces the number of WATCHDOG_RESET() calls by roughly a factor of 1000, the overhead from those is practically eliminated. (Just in case the range flushed is so small that it doesn't cross a 64K boundary, add a single WATCHDOG_RESET() between the loops). 64K is chosen because that's also the default chunk size used by the hashing algorithms, and when, say, a sha256 digest of a kernel image of a few MB is being verified, that's almost guaranteed to be cache-cold, so apart from the computations being done, the hashing is also bounded by memory speed - so if 64K works for those cases, it should certainly also work when memory access is the only thing being done. Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk> --- arch/powerpc/lib/cache.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)