| Message ID | alpine.LFD.2.03.1307221353360.15022@syhkavp.arg |
|---|---|
| State | New |
On Mon, Jul 22, 2013 at 01:58:12PM -0400, Nicolas Pitre wrote:
> On Fri, 19 Jul 2013, Lorenzo Pieralisi wrote:
>
> > On Fri, Jul 19, 2013 at 11:28:49AM +0100, Dave Martin wrote:
> > > On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> > > > On Thu, 18 Jul 2013, Dave Martin wrote:
> > > >
> > > > > I had other names in mind, like "coherency_exit" or "cache_exit".
> > > > > Those are not very intelligible, but that might at least make people
> > > > > pause and think before blindly using it.
> > > >
> > > > Good point. It should still embody the architecture name for which it
> > > > is valid though.
> > >
> > > Sure, I was assuming something would be pasted on the start of the name.
> >
> > v7 :-) with a comment describing the assumptions (in particular related
> > as Dave mentioned to the SMP bit behaviour) ?
>
> OK... What about this then:
>
> ----- >8
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Thu, 18 Jul 2013 13:12:48 -0400
> Subject: [PATCH] ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling
>  code
>
> This code is becoming duplicated in many places. So let's consolidate
> it into a handy macro that is known to be right and available for reuse.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index 17d0ae8672..8f4e2297e2 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -436,4 +436,44 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
>  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
>  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
>
> +/*
> + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> + * To do so we must:
> + *
> + * - Clear the SCTLR.C bit to prevent further cache allocations
> + * - Flush the desired level of cache
> + * - Clear the ACTLR "SMP" bit to disable local coherency
> + *
> + * ... and so without any intervening memory access in between those steps,
> + * not even to the stack.
> + *
> + * WARNING -- After this has been called:
> + *
> + * - No ldr/str exclusive must be used.

Maybe write ldrex/strex (I misread that the first time around as "no
ldr/str ... must be used", which seemed a bit drastic)

> + * - The CPU is obviously no longer coherent with the other CPUs.
> + *
> + * Further considerations:
> + *
> + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> + *   documented in the ARMv7 TRM. Vendor implementations that deviate from

Sorry to be pedantic here, but there is no "ARMv7 TRM". The SMP bit is
not part of ARMv7 at all. Also, it seems that A9 isn't precisely the
same: two ACTLR bits need to be twiddled. R-class CPUs are generally
not the same either.

This is why I preferred to treat the whole sequence as specific to a
particular CPU implementation. The similarity between A7 and A15
might be viewed as a happy coincidence (it also makes life easier in
big.LITTLE land).

(Also, you mix the names "ACTLR" and "AUXCR", but people will generally
be able to guess those are the same thing)

> + *   that will need their own procedure.
> + * - This is unlikely to work if Linux is running non-secure.
> + */
> +#define v7_exit_coherency_flush(level) \

... hence:

#define cortex_a15_exit_coherency(level) ...

#define cortex_a7_exit_coherency(level) cortex_a15_exit_coherency(level)

What do you think?

Cheers
---Dave

> +	asm volatile( \
> +	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
> +	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
> +	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
> +	"isb	\n\t" \
> +	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
> +	"clrex	\n\t" \
> +	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
> +	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
> +	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
> +	"isb	\n\t" \
> +	"dsb	" \
> +	/* The clobber list is dictated by the call to v7_flush_dcache_* */ \
> +	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
> +	      "r9","r10","r11","lr","memory" )
> +
>  #endif
> diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
> index 85fffa702f..14d4996887 100644
> --- a/arch/arm/mach-vexpress/dcscb.c
> +++ b/arch/arm/mach-vexpress/dcscb.c
> @@ -133,32 +133,8 @@ static void dcscb_power_down(void)
>  	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
>  		arch_spin_unlock(&dcscb_lock);
>
> -		/*
> -		 * Flush all cache levels for this cluster.
> -		 *
> -		 * To do so we do:
> -		 * - Clear the SCTLR.C bit to prevent further cache allocations
> -		 * - Flush the whole cache
> -		 * - Clear the ACTLR "SMP" bit to disable local coherency
> -		 *
> -		 * Let's do it in the safest possible way i.e. with
> -		 * no memory access within the following sequence
> -		 * including to the stack.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_all \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		/* Flush all cache levels for this cluster. */
> +		v7_exit_coherency_flush(all);
>
>  		/*
>  		 * This is a harmless no-op. On platforms with a real
> @@ -177,24 +153,8 @@ static void dcscb_power_down(void)
>  	} else {
>  		arch_spin_unlock(&dcscb_lock);
>
> -		/*
> -		 * Flush the local CPU cache.
> -		 * Let's do it in the safest possible way as above.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_louis \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		/* Disable and flush the local CPU cache. */
> +		v7_exit_coherency_flush(louis);
>  	}
>
>  	__mcpm_cpu_down(cpu, cluster);
> diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c
> index dfb55d45b6..5940f1e317 100644
> --- a/arch/arm/mach-vexpress/tc2_pm.c
> +++ b/arch/arm/mach-vexpress/tc2_pm.c
> @@ -134,26 +134,7 @@ static void tc2_pm_down(u64 residency)
>  		: : "r" (0x400) );
>  	}
>
> -	/*
> -	 * We need to disable and flush the whole (L1 and L2) cache.
> -	 * Let's do it in the safest possible way i.e. with
> -	 * no memory access within the following sequence
> -	 * including the stack.
> -	 */
> -	asm volatile(
> -	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -	"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -	"isb	\n\t"
> -	"bl	v7_flush_dcache_all \n\t"
> -	"clrex	\n\t"
> -	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -	"isb	\n\t"
> -	"dsb	"
> -	: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -	      "r9","r10","r11","lr","memory");
> +	v7_exit_coherency_flush(all);
>
>  	cci_disable_port_by_cpu(mpidr);
>
> @@ -169,24 +150,7 @@ static void tc2_pm_down(u64 residency)
>
>  	arch_spin_unlock(&tc2_pm_lock);
>
> -	/*
> -	 * We need to disable and flush only the L1 cache.
> -	 * Let's do it in the safest possible way as above.
> -	 */
> -	asm volatile(
> -	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -	"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -	"isb	\n\t"
> -	"bl	v7_flush_dcache_louis \n\t"
> -	"clrex	\n\t"
> -	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -	"isb	\n\t"
> -	"dsb	"
> -	: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -	      "r9","r10","r11","lr","memory");
> +	v7_exit_coherency_flush(louis);
>  	}
>
>  	__mcpm_cpu_down(cpu, cluster);
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
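For reference, the `level` argument of the macro is pasted into the branch target through `__stringify`, so the two call sites introduced by the patch select different flush routines while the surrounding coherency-exit steps stay identical. A minimal sketch of the expansion, using only names that appear in the patch itself:

```c
/* Sketch: what "bl v7_flush_dcache_"__stringify(level) resolves to
 * for the two levels used by the vexpress backends. */
v7_exit_coherency_flush(all);   /* bl v7_flush_dcache_all: flush all cache
                                   levels visible to this CPU (L1 + L2 here) */
v7_exit_coherency_flush(louis); /* bl v7_flush_dcache_louis: flush only to the
                                   Level of Unification Inner Shareable (L1) */
```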
On Tue, 23 Jul 2013, Dave Martin wrote:

> On Mon, Jul 22, 2013 at 01:58:12PM -0400, Nicolas Pitre wrote:
> > On Fri, 19 Jul 2013, Lorenzo Pieralisi wrote:
> >
> > > On Fri, Jul 19, 2013 at 11:28:49AM +0100, Dave Martin wrote:
> > > > On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> > > > > On Thu, 18 Jul 2013, Dave Martin wrote:
> > > > >
> > > > > > I had other names in mind, like "coherency_exit" or "cache_exit".
> > > > > > Those are not very intelligible, but that might at least make people
> > > > > > pause and think before blindly using it.
> > > > >
> > > > > Good point. It should still embody the architecture name for which it
> > > > > is valid though.
> > > >
> > > > Sure, I was assuming something would be pasted on the start of the name.
> > >
> > > v7 :-) with a comment describing the assumptions (in particular related
> > > as Dave mentioned to the SMP bit behaviour) ?
> >
> > OK... What about this then:
> >
> > ----- >8
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Date: Thu, 18 Jul 2013 13:12:48 -0400
> > Subject: [PATCH] ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling
> >  code
> >
> > This code is becoming duplicated in many places. So let's consolidate
> > it into a handy macro that is known to be right and available for reuse.
> >
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> >
> > diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> > index 17d0ae8672..8f4e2297e2 100644
> > --- a/arch/arm/include/asm/cacheflush.h
> > +++ b/arch/arm/include/asm/cacheflush.h
> > @@ -436,4 +436,44 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
> >  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
> >  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
> >
> > +/*
> > + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> > + * To do so we must:
> > + *
> > + * - Clear the SCTLR.C bit to prevent further cache allocations
> > + * - Flush the desired level of cache
> > + * - Clear the ACTLR "SMP" bit to disable local coherency
> > + *
> > + * ... and so without any intervening memory access in between those steps,
> > + * not even to the stack.
> > + *
> > + * WARNING -- After this has been called:
> > + *
> > + * - No ldr/str exclusive must be used.
>
> Maybe write ldrex/strex (I misread that the first time around as "no
> ldr/str ... must be used", which seemed a bit drastic)

OK.

> > + * - The CPU is obviously no longer coherent with the other CPUs.
> > + *
> > + * Further considerations:
> > + *
> > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > + *   documented in the ARMv7 TRM. Vendor implementations that deviate from
>
> Sorry to be pedantic here, but there is no "ARMv7 TRM". The SMP bit is
> not part of ARMv7 at all.

Well, I just copied Lorenzo's words here, trusting he knew more about it
than I do.

> Also, it seems that A9 isn't precisely the
> same: two ACTLR bits need to be twiddled. R-class CPUs are generally
> not the same either.
>
> This is why I preferred to treat the whole sequence as specific to a
> particular CPU implementation. The similarity between A7 and A15
> might be viewed as a happy coincidence (it also makes life easier in
> big.LITTLE land).

Fair enough.

> (Also, you mix the names "ACTLR" and "AUXCR", but people will generally
> be able to guess those are the same thing)

We apparently use both in the kernel already. Which one is the canonical
one?

> > + *   that will need their own procedure.
> > + * - This is unlikely to work if Linux is running non-secure.
> > + */
> > +#define v7_exit_coherency_flush(level) \
>
> ... hence:
>
> #define cortex_a15_exit_coherency(level) ...
>
> #define cortex_a7_exit_coherency(level) cortex_a15_exit_coherency(level)
>
> What do you think?

To be pedantic again, we don't really exit coherency at some cache
level. Hence my usage of "flush" indicating that the level argument
applies only to that and not to the coherency status.


Nicolas
On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:

[...]

> > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > + *
> > > + * Further considerations:
> > > + *
> > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > + *   documented in the ARMv7 TRM. Vendor implementations that deviate from
> >
> > Sorry to be pedantic here, but there is no "ARMv7 TRM". The SMP bit is
> > not part of ARMv7 at all.
>
> Well, I just copied Lorenzo's words here, trusting he knew more about it
> than I do.
>
> > Also, it seems that A9 isn't precisely the
> > same: two ACTLR bits need to be twiddled. R-class CPUs are generally
> > not the same either.

If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not
need to clear it, clearing the SMP bit is enough.

See, Dave has a point, there is no explicit "unified v7 TRM disable
clean and exit coherency procedure" even though the designers end goal
is to have one and that's the one you wrote. The code you posted is
perfectly ok on all v7 implementations in the kernel I am aware of, I
stand to be corrected but to the best of my knowledge that's the case.

> > This is why I preferred to treat the whole sequence as specific to a
> > particular CPU implementation. The similarity between A7 and A15
> > might be viewed as a happy coincidence (it also makes life easier in
> > big.LITTLE land).
>
> Fair enough.

I disagree on the happy coincidence but the point is taken. I am not
sure about what we should do, but I reiterate my point, the sequence as
it stands is OK on all NS v7 implementations I am aware of. We can add
macros to differentiate processors when we need them, but again that's
just my opinion, as correct as it can be.

Lorenzo
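To make the A9 point concrete (bit positions assumed from the Cortex-A9 TRM, not stated anywhere in this thread): FW is ACTLR bit 0 and SMP is ACTLR bit 6, so the stricter two-bit variant Dave alludes to would differ from the posted macro only in the single `bic`, while Lorenzo's position is that clearing SMP alone suffices. A hypothetical, untested sketch of that one step:

```c
/* Hypothetical A9-flavoured ACTLR step (sketch only, untested).
 * Assumed bit positions: SMP = bit 6, FW (maintenance broadcast) = bit 0.
 * Per Lorenzo, clearing SMP alone is believed to be sufficient. */
"mrc	p15, 0, r0, c1, c0, 1	@ get ACTLR \n\t"
"bic	r0, r0, #((1 << 6) | (1 << 0))	@ clear SMP and FW \n\t"
"mcr	p15, 0, r0, c1, c0, 1	@ set ACTLR \n\t"
```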
On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
>
> [...]
>
> > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > + *
> > > > + * Further considerations:
> > > > + *
> > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > + *   documented in the ARMv7 TRM. Vendor implementations that deviate from
> > >
> > > Sorry to be pedantic here, but there is no "ARMv7 TRM". The SMP bit is
> > > not part of ARMv7 at all.
> >
> > Well, I just copied Lorenzo's words here, trusting he knew more about it
> > than I do.
> >
> > > Also, it seems that A9 isn't precisely the
> > > same: two ACTLR bits need to be twiddled. R-class CPUs are generally
> > > not the same either.
>
> If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> clear it, clearing the SMP bit is enough.
>
> See, Dave has a point, there is no explicit "unified v7 TRM disable
> clean and exit coherency procedure" even though the designers end goal is to
> have one and that's the one you wrote. The code you posted is perfectly ok on
> all v7 implementations in the kernel I am aware of, I stand to be corrected
> but to the best of my knowledge that's the case.

What I'm really concerned about here is not Cortex-*, but the vendors
who have their own CPU implementations. I don't think I've ever seen
a TRM for one of those.

> > > This is why I preferred to treat the whole sequence as specific to a
> > > particular CPU implementation. The similarity between A7 and A15
> > > might be viewed as a happy coincidence (it also makes life easier in
> > > big.LITTLE land).
> >
> > Fair enough.
>
> I disagree on the happy coincidence but the point is taken. I am not

I confess to exaggeration! There was obviously an effort to align A15
and A7, but we know there are differences at the system level, and there
is always the possibility of b.L pairings which differ on things like
the SMP bit, or b.L pairings of CPUs not originally designed to pair but
which are "similar enough" to be pairable.

> sure about what we should do, but I reiterate my point, the sequence as
> it stands is OK on all NS v7 implementations I am aware of. We can add
> macros to differentiate processors when we need them, but again that's
> just my opinion, as correct as it can be.

I still vote for naming them a15_* a7_* or similar.

For big.LITTLE pairings, I think we should assume that the big CPU's
macro is usable on the LITTLE CPU too. Assuming sane CPU designs, that
will be true, and we know it works for A15+A7. So we would use the a15
macro on TC2, but the a7 macro on an a7-only platform.

The other way to abstract this stuff would be to add a new per-CPU
proc_fn, but that may be overkill, particularly since we know the whole
mcpm backend may tend to be platform-specific, not just the CPU
coherency exit part.

Cheers
---Dave
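One way to realize the naming Dave is voting for here, assuming the shared body is simply the v7 sequence from the patch (the identifiers come from his earlier sketch; the thread never settles on them):

```c
/* Hypothetical per-CPU aliases (sketch): both expand to the same sequence
 * today, but each CPU gets a name under which its sequence could diverge. */
#define cortex_a15_exit_coherency(level)  v7_exit_coherency_flush(level)
#define cortex_a7_exit_coherency(level)   cortex_a15_exit_coherency(level)
```

Under this scheme, TC2 would use the a15 name on both clusters, while an A7-only platform would use the a7 name.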
On Thu, 25 Jul 2013, Dave Martin wrote:

> On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> > On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> >
> > [...]
> >
> > > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > > + *
> > > > > + * Further considerations:
> > > > > + *
> > > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > > + *   documented in the ARMv7 TRM. Vendor implementations that deviate from
> > > >
> > > > Sorry to be pedantic here, but there is no "ARMv7 TRM". The SMP bit is
> > > > not part of ARMv7 at all.
> > >
> > > Well, I just copied Lorenzo's words here, trusting he knew more about it
> > > than I do.
> > >
> > > > Also, it seems that A9 isn't precisely the
> > > > same: two ACTLR bits need to be twiddled. R-class CPUs are generally
> > > > not the same either.
> >
> > If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> > clear it, clearing the SMP bit is enough.
> >
> > See, Dave has a point, there is no explicit "unified v7 TRM disable
> > clean and exit coherency procedure" even though the designers end goal is to
> > have one and that's the one you wrote. The code you posted is perfectly ok on
> > all v7 implementations in the kernel I am aware of, I stand to be corrected
> > but to the best of my knowledge that's the case.
>
> What I'm really concerned about here is not Cortex-*, but the vendors
> who have their own CPU implementations. I don't think I've ever seen
> a TRM for one of those.

Could we wait until they materialize before worrying about possible
incompatibility issues? After all, this is not a user space ABI that
cannot be changed afterwards. Remember this is Linux internal code we
have the ability to change when needed, hence we should resist the
temptation to over-engineer.

> I confess to exaggeration! There was obviously an effort to align A15
> and A7, but we know there are differences at the system level, and there
> is always the possibility of b.L pairings which differ on things like
> the SMP bit, or b.L pairings of CPUs not originally designed to pair but
> which are "similar enough" to be pairable.
>
> > sure about what we should do, but I reiterate my point, the sequence as
> > it stands is OK on all NS v7 implementations I am aware of. We can add
> > macros to differentiate processors when we need them, but again that's
> > just my opinion, as correct as it can be.
>
> I still vote for naming them a15_* a7_* or similar.
>
> For big.LITTLE pairings, I think we should assume that the big CPU's
> macro is usable on the LITTLE CPU too. Assuming sane CPU designs, that
> will be true, and we know it works for A15+A7.

The MCPM backend doesn't have to assume any kind of pairing. The TC2
backend already has to test whether or not it is running on an A15 in
order to turn off L2 prefetching, which is an A15-only feature. This is
done outside of this macro exactly to keep the macro generic to all
known (so far) v7 implementations.


Nicolas
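The A15-only test Nicolas mentions is only partially visible in the tc2_pm.c context above (the hunk opens just after `: : "r" (0x400) );`). A sketch of its shape, hedged because the guard itself falls outside the quoted hunks; `read_cpuid_part_number()` and `ARM_CPU_PART_CORTEX_A15` are assumed to be the kernel helpers of that era:

```c
/* Sketch of the CPU-specific step kept outside the generic macro. */
if (read_cpuid_part_number() == ARM_CPU_PART_CORTEX_A15) {
	/* Disable L2 prefetching on the A15 before flushing. */
	asm volatile(
	"mcr	p15, 1, %0, c15, c0, 3 \n\t"
	"isb	\n\t"
	"dsb	"
	: : "r" (0x400));
}
v7_exit_coherency_flush(all);	/* generic across the v7 cores discussed */
```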
On Fri, Jul 26, 2013 at 10:58:27AM -0400, Nicolas Pitre wrote:
> On Thu, 25 Jul 2013, Dave Martin wrote:
>
> > On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> > > On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> > >
> > > [...]
> > >
> > > > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > > > + *
> > > > > > + * Further considerations:
> > > > > > + *
> > > > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > > > + *   documented in the ARMv7 TRM. Vendor implementations that deviate from
> > > > >
> > > > > Sorry to be pedantic here, but there is no "ARMv7 TRM". The SMP bit is
> > > > > not part of ARMv7 at all.
> > > >
> > > > Well, I just copied Lorenzo's words here, trusting he knew more about it
> > > > than I do.
> > > >
> > > > > Also, it seems that A9 isn't precisely the
> > > > > same: two ACTLR bits need to be twiddled. R-class CPUs are generally
> > > > > not the same either.
> > >
> > > If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> > > clear it, clearing the SMP bit is enough.
> > >
> > > See, Dave has a point, there is no explicit "unified v7 TRM disable
> > > clean and exit coherency procedure" even though the designers end goal is to
> > > have one and that's the one you wrote. The code you posted is perfectly ok on
> > > all v7 implementations in the kernel I am aware of, I stand to be corrected
> > > but to the best of my knowledge that's the case.
> >
> > What I'm really concerned about here is not Cortex-*, but the vendors
> > who have their own CPU implementations. I don't think I've ever seen
> > a TRM for one of those.
>
> Could we wait until they materialize before worrying about possible
> incompatibility issues? After all, this is not a user space ABI that
> cannot be changed afterwards. Remember this is Linux internal code we
> have the ability to change when needed, hence we should resist the
> temptation to over-engineer.

If I could think of a better name, I would suggest it ... but unless we
call it "cortexa" or something, I don't know what it should be. I'm not
sure how much I like that either. Strictly speaking, that's still not
correct.

So I guess this comes down to being clear about which CPUs we expect it
to work for.

Right now, we know it works for A7 and A15.

It sounds like we believe it might work for A9. Lorenzo might know about
A5, A7 etc. But I'm not sure this exact sequence has been tested on any
of those CPUs(?)

We shouldn't expect it to work for any non ARM Ltd CPU, but on platforms
where firmware does the power control it's not Linux's responsibility
anyhow. (Probably the case for most of those vendors.)

> > I confess to exaggeration! There was obviously an effort to align A15
> > and A7, but we know there are differences at the system level, and there
> > is always the possibility of b.L pairings which differ on things like
> > the SMP bit, or b.L pairings of CPUs not originally designed to pair but
> > which are "similar enough" to be pairable.
> >
> > > sure about what we should do, but I reiterate my point, the sequence as
> > > it stands is OK on all NS v7 implementations I am aware of. We can add
> > > macros to differentiate processors when we need them, but again that's
> > > just my opinion, as correct as it can be.
> >
> > I still vote for naming them a15_* a7_* or similar.
> >
> > For big.LITTLE pairings, I think we should assume that the big CPU's
> > macro is usable on the LITTLE CPU too. Assuming sane CPU designs, that
> > will be true, and we know it works for A15+A7.
>
> The MCPM backend doesn't have to assume any kind of pairing. The TC2
> backend already has to test whether or not it is running on an A15 in
> order to turn off L2 prefetching, which is an A15-only feature. This is
> done outside of this macro exactly to keep the macro generic to all
> known (so far) v7 implementations.

That's true.

Cheers
---Dave
On Fri, Jul 26, 2013 at 05:00:15PM +0100, Dave Martin wrote:
> On Fri, Jul 26, 2013 at 10:58:27AM -0400, Nicolas Pitre wrote:
> > On Thu, 25 Jul 2013, Dave Martin wrote:
> >
> > > On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> > > > On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> > > >
> > > > [...]
> > > >
> > > > > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > > > > + *
> > > > > > > + * Further considerations:
> > > > > > > + *
> > > > > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > > > > + *   documented in the ARMv7 TRM. Vendor implementations that deviate from
> > > > > >
> > > > > > Sorry to be pedantic here, but there is no "ARMv7 TRM". The SMP bit is
> > > > > > not part of ARMv7 at all.
> > > > >
> > > > > Well, I just copied Lorenzo's words here, trusting he knew more about it
> > > > > than I do.
> > > > >
> > > > > > Also, it seems that A9 isn't precisely the
> > > > > > same: two ACTLR bits need to be twiddled. R-class CPUs are generally
> > > > > > not the same either.
> > > >
> > > > If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> > > > clear it, clearing the SMP bit is enough.
> > > >
> > > > See, Dave has a point, there is no explicit "unified v7 TRM disable
> > > > clean and exit coherency procedure" even though the designers end goal is to
> > > > have one and that's the one you wrote. The code you posted is perfectly ok on
> > > > all v7 implementations in the kernel I am aware of, I stand to be corrected
> > > > but to the best of my knowledge that's the case.
> > >
> > > What I'm really concerned about here is not Cortex-*, but the vendors
> > > who have their own CPU implementations. I don't think I've ever seen
> > > a TRM for one of those.
> >
> > Could we wait until they materialize before worrying about possible
> > incompatibility issues? After all, this is not a user space ABI that
> > cannot be changed afterwards. Remember this is Linux internal code we
> > have the ability to change when needed, hence we should resist the
> > temptation to over-engineer.
>
> If I could think of a better name, I would suggest it ... but unless we call
> it "cortexa" or something, I don't know what it should be. I'm not sure
> how much I like that either. Strictly speaking, that's still not correct.
>
> So I guess this comes down to being clear about which CPUs we expect it to
> work for.
>
> Right now, we know it works for A7 and A15.
>
> It sounds like we believe it might work for A9. Lorenzo might know about
> A5, A7 etc. But I'm not sure this exact sequence has been tested on any
> of those CPUs(?)

I tested this sequence on A9 MP a while ago, even though it was obviously
not this same macro. I know for certain it has been tested on A5, again
not the same macro but the same sequence. On R7, honestly, I can't
certify.

What if we postpone the consolidation till MCPM for some A9 platforms
(e.g. Tegra) is posted (and in the meantime you leave the macro in
mach-vexpress to avoid duplication)?

Thank you,
Lorenzo
```diff
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 17d0ae8672..8f4e2297e2 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -436,4 +436,44 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
 #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
 #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
 
+/*
+ * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
+ * To do so we must:
+ *
+ * - Clear the SCTLR.C bit to prevent further cache allocations
+ * - Flush the desired level of cache
+ * - Clear the ACTLR "SMP" bit to disable local coherency
+ *
+ * ... and so without any intervening memory access in between those steps,
+ * not even to the stack.
+ *
+ * WARNING -- After this has been called:
+ *
+ * - No ldr/str exclusive must be used.
+ * - The CPU is obviously no longer coherent with the other CPUs.
+ *
+ * Further considerations:
+ *
+ * - This relies on the presence and behavior of the AUXCR.SMP bit as
+ *   documented in the ARMv7 TRM. Vendor implementations that deviate from
+ *   that will need their own procedure.
+ * - This is unlikely to work if Linux is running non-secure.
+ */
+#define v7_exit_coherency_flush(level) \
+	asm volatile( \
+	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
+	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
+	"isb	\n\t" \
+	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
+	"clrex	\n\t" \
+	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
+	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
+	"isb	\n\t" \
+	"dsb	" \
+	/* The clobber list is dictated by the call to v7_flush_dcache_* */ \
+	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
+	      "r9","r10","r11","lr","memory" )
+
 #endif
diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
index 85fffa702f..14d4996887 100644
--- a/arch/arm/mach-vexpress/dcscb.c
+++ b/arch/arm/mach-vexpress/dcscb.c
@@ -133,32 +133,8 @@ static void dcscb_power_down(void)
 	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
 		arch_spin_unlock(&dcscb_lock);
 
-		/*
-		 * Flush all cache levels for this cluster.
-		 *
-		 * To do so we do:
-		 * - Clear the SCTLR.C bit to prevent further cache allocations
-		 * - Flush the whole cache
-		 * - Clear the ACTLR "SMP" bit to disable local coherency
-		 *
-		 * Let's do it in the safest possible way i.e. with
-		 * no memory access within the following sequence
-		 * including to the stack.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_all \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		/* Flush all cache levels for this cluster. */
+		v7_exit_coherency_flush(all);
 
 		/*
 		 * This is a harmless no-op. On platforms with a real
@@ -177,24 +153,8 @@ static void dcscb_power_down(void)
 	} else {
 		arch_spin_unlock(&dcscb_lock);
 
-		/*
-		 * Flush the local CPU cache.
-		 * Let's do it in the safest possible way as above.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_louis \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		/* Disable and flush the local CPU cache. */
+		v7_exit_coherency_flush(louis);
 	}
 
 	__mcpm_cpu_down(cpu, cluster);
diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c
index dfb55d45b6..5940f1e317 100644
--- a/arch/arm/mach-vexpress/tc2_pm.c
+++ b/arch/arm/mach-vexpress/tc2_pm.c
@@ -134,26 +134,7 @@ static void tc2_pm_down(u64 residency)
 		: : "r" (0x400) );
 	}
 
-	/*
-	 * We need to disable and flush the whole (L1 and L2) cache.
-	 * Let's do it in the safest possible way i.e. with
-	 * no memory access within the following sequence
-	 * including the stack.
-	 */
-	asm volatile(
-	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-	"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-	"isb	\n\t"
-	"bl	v7_flush_dcache_all \n\t"
-	"clrex	\n\t"
-	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-	"isb	\n\t"
-	"dsb	"
-	: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-	      "r9","r10","r11","lr","memory");
+	v7_exit_coherency_flush(all);
 
 	cci_disable_port_by_cpu(mpidr);
 
@@ -169,24 +150,7 @@ static void tc2_pm_down(u64 residency)
 
 	arch_spin_unlock(&tc2_pm_lock);
 
-	/*
-	 * We need to disable and flush only the L1 cache.
-	 * Let's do it in the safest possible way as above.
-	 */
-	asm volatile(
-	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-	"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-	"isb	\n\t"
-	"bl	v7_flush_dcache_louis \n\t"
-	"clrex	\n\t"
-	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-	"isb	\n\t"
-	"dsb	"
-	: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-	      "r9","r10","r11","lr","memory");
+	v7_exit_coherency_flush(louis);
 	}
 
 	__mcpm_cpu_down(cpu, cluster);
```
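A practical reading of the ldrex/strex warning in the macro's comment, inferred from the call sites above rather than stated anywhere in the thread: ARMv7 kernel spinlocks are built on exclusives, so each backend drops its lock before invoking the macro, and only exclusive-free, cache-off-safe code runs afterwards. The ordering, distilled from the dcscb.c hunk:

```c
arch_spin_unlock(&dcscb_lock);   /* last ldrex/strex-based operation */
v7_exit_coherency_flush(louis);  /* no memory access inside the sequence;
                                    exclusives are unusable from here on */
__mcpm_cpu_down(cpu, cluster);   /* written to run safely in this state */
```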