| Message ID | 1395330365-9901-10-git-send-email-ian.campbell@citrix.com |
|---|---|
| State | New |
On 20/03/14 15:45, Ian Campbell wrote:
> The mem* primitives which I am about to import from Linux in a subsequent
> patch rely on the hardware handling misalignment.
>
> The benefits of an optimised memcpy etc oughtweigh the downsides.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/arch/arm/arm64/head.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 9547ef5..22d0030 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -241,7 +241,7 @@ skip_bss:
>           * I-cache enabled,
>           * Alignment checking enabled,

Is this comment still true?

~Andrew

>           * MMU translation disabled (for now). */
> -        ldr   x0, =(HSCTLR_BASE|SCTLR_A)
> +        ldr   x0, =(HSCTLR_BASE)
>          msr   SCTLR_EL2, x0
>
>  /* Rebuild the boot pagetable's first-level entries. The structure
On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> On 20/03/14 15:45, Ian Campbell wrote:
> > The mem* primitives which I am about to import from Linux in a subsequent
> > patch rely on the hardware handling misalignment.
> >
> > The benefits of an optimised memcpy etc oughtweigh the downsides.

Ahem, "outweigh".

> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > ---
> >  xen/arch/arm/arm64/head.S | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 9547ef5..22d0030 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -241,7 +241,7 @@ skip_bss:
> >           * I-cache enabled,
> >           * Alignment checking enabled,
>
> Is this comment still true?

Oh balls, no it is not. I had a meeting between deciding to make this
change and actually making it...

Ian.

> ~Andrew
>
> >           * MMU translation disabled (for now). */
> > -        ldr   x0, =(HSCTLR_BASE|SCTLR_A)
> > +        ldr   x0, =(HSCTLR_BASE)
> >          msr   SCTLR_EL2, x0
> >
> >  /* Rebuild the boot pagetable's first-level entries. The structure
On 2014-03-20 15:59, Ian Campbell wrote:
> On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
>> On 20/03/14 15:45, Ian Campbell wrote:
>> > The mem* primitives which I am about to import from Linux in a subsequent
>> > patch rely on the hardware handling misalignment.
>> >
>> > The benefits of an optimised memcpy etc oughtweigh the downsides.
>
> Ahem, "outweigh".

Just FYI, the slow-down from heavy unaligned accesses (with
hardware alignment fixup, you can't disable it using
/proc/cpu/alignment) on Cortex A15 is about 40x.

Most of the commonly used code has been fixed recently, but
there are still some packages that exhibit misaligned access
traps during their test suites and/or normal operation.

Whether the hardware alignment fixup is less overheady on
ARM64, I don't know - I haven't been able to get my hands
on the hardware yet.

Gordan
On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
> On 2014-03-20 15:59, Ian Campbell wrote:
> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> >> On 20/03/14 15:45, Ian Campbell wrote:
> >> > The mem* primitives which I am about to import from Linux in a subsequent
> >> > patch rely on the hardware handling misalignment.
> >> >
> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> >
> > Ahem, "outweigh".
>
> Just FYI, the slow-down from heavy unaligned accesses (with
> hardware alignment fixup, you can't disable it using
> /proc/cpu/alignment) on Cortex A15 is about 40x.

That's pretty staggering -- are you positive this wasn't the kernel
doing the fixups?

> Most of the commonly used code has been fixed recently, but
> there are still some packages that exhibit misaligned access
> traps during their test suites and/or normal operation.
>
> Whether the hardware alignment fixup is less overheady on
> ARM64, I don't know - I haven't been able to get my hands
> on the hardware yet.

arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
taking it on trust that whoever implemented memcpy.S etc found that
memcpy.S with hardware alignment was better than the dumb loop, even if
it wasn't as good as a clever memcpy.S which avoided the alignments.

Ian.
On 2014-03-20 16:27, Ian Campbell wrote:
> On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
>> On 2014-03-20 15:59, Ian Campbell wrote:
>> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
>> >> On 20/03/14 15:45, Ian Campbell wrote:
>> >> > The mem* primitives which I am about to import from Linux in a subsequent
>> >> > patch rely on the hardware handling misalignment.
>> >> >
>> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
>> >
>> > Ahem, "outweigh".
>>
>> Just FYI, the slow-down from heavy unaligned accesses (with
>> hardware alignment fixup, you can't disable it using
>> /proc/cpu/alignment) on Cortex A15 is about 40x.
>
> That's pretty staggering -- are you positive this wasn't the kernel
> doing the fixups?

I'm not sure if this is easily checkable:

# echo 0 > /proc/cpu/alignment
# cat /proc/cpu/alignment
User:           0
System:         631040
Skipped:        0
Half:           0
Word:           631040
DWord:          0
Multi:          0
User faults:    2 (fixup)

i.e. I can't disable it.

This is on a Samsung Exynos Chromebook with the
standard ChromeOS kernel.

Here is a recent thread from the Fedora ARM mailing list
which contains links to a simple test program that can
be used to test the alignment related slowdown:

http://www.mail-archive.com/arm@lists.fedoraproject.org/msg06121.html

> Most of the commonly used code has been fixed recently, but
> there are still some packages that exhibit misaligned access
> traps during their test suites and/or normal operation.
>
> Whether the hardware alignment fixup is less overheady on
> ARM64, I don't know - I haven't been able to get my hands
> on the hardware yet.
>
> arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
> taking it on trust that whoever implemented memcpy.S etc found that
> memcpy.S with hardware alignment was better than the dumb loop, even if
> it wasn't as good as a clever memcpy.S which avoided the alignments.

I am inclined to agree - it shouldn't be the job of the kernel or the
hypervisor to do this. It is up to the application developers to know
what they are doing and not do things that introduce misaligned
accesses. Unfortunately, there is far too little push-back on buggy
code because most developers have only ever used x86 and have no idea
that until recently everything else wasn't forgiving of such things.

Gordan
On Thu, 2014-03-20 at 16:43 +0000, Gordan Bobic wrote:
> On 2014-03-20 16:27, Ian Campbell wrote:
> > On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
> >> On 2014-03-20 15:59, Ian Campbell wrote:
> >> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> >> >> On 20/03/14 15:45, Ian Campbell wrote:
> >> >> > The mem* primitives which I am about to import from Linux in a subsequent
> >> >> > patch rely on the hardware handling misalignment.
> >> >> >
> >> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> >> >
> >> > Ahem, "outweigh".
> >>
> >> Just FYI, the slow-down from heavy unaligned accesses (with
> >> hardware alignment fixup, you can't disable it using
> >> /proc/cpu/alignment) on Cortex A15 is about 40x.
> >
> > That's pretty staggering -- are you positive this wasn't the kernel
> > doing the fixups?
>
> I'm not sure if this is easily checkable:
>
> # echo 0 > /proc/cpu/alignment
> # cat /proc/cpu/alignment
> User:           0
> System:         631040
> Skipped:        0
> Half:           0
> Word:           631040
> DWord:          0
> Multi:          0
> User faults:    2 (fixup)
>
> i.e. I can't disable it.

That "fixup" implies to me that the kernel will be fixing things up.
linux/Documentation/arm/mem_alignment describes what happens here.

> This is on a Samsung Exynos Chromebook with the
> standard ChromeOS kernel.

I've no idea if this sets SCTLR.A but it sounds like it does.

> Here is a recent thread from the Fedora ARM mailing list
> which contains links to a simple test program that can
> be used to test the alignment related slowdown:
>
> http://www.mail-archive.com/arm@lists.fedoraproject.org/msg06121.html
>
> >> Most of the commonly used code has been fixed recently, but
> >> there are still some packages that exhibit misaligned access
> >> traps during their test suites and/or normal operation.
> >>
> >> Whether the hardware alignment fixup is less overheady on
> >> ARM64, I don't know - I haven't been able to get my hands
> >> on the hardware yet.
> >
> > arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
> > taking it on trust that whoever implemented memcpy.S etc found that
> > memcpy.S with hardware alignment was better than the dumb loop, even if
> > it wasn't as good as a clever memcpy.S which avoided the alignments.
>
> I am inclined to agree - it shouldn't be the job of the kernel or the
> hypervisor to do this.

This patch is only changing the alignment trap behaviour for the
hypervisor itself, it has no impact on either guest kernel or userspace,
which have their own control bits for operation in those modes.

Ian.
Hi Ian,

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> The mem* primitives which I am about to import from Linux in a subsequent
> patch rely on the hardware handling misalignment.
>
> The benefits of an optimised memcpy etc oughtweigh the downsides.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

With both minor changes that Andrew and you spotted:

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 9547ef5..22d0030 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -241,7 +241,7 @@ skip_bss:
          * I-cache enabled,
          * Alignment checking enabled,
          * MMU translation disabled (for now). */
-        ldr   x0, =(HSCTLR_BASE|SCTLR_A)
+        ldr   x0, =(HSCTLR_BASE)
         msr   SCTLR_EL2, x0

 /* Rebuild the boot pagetable's first-level entries. The structure
The mem* primitives which I am about to import from Linux in a subsequent
patch rely on the hardware handling misalignment.

The benefits of an optimised memcpy etc oughtweigh the downsides.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/arm64/head.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)