Message ID | ed676fc3-3c20-308a-d765-5c5a60d7ea08@foss.arm.com |
---|---|
State | New |
On 11/11/16 14:45, Thomas Preudhomme wrote:
> Hi,
>
> To fix PR69714, code was added to disable bswap when the resulting symbolic
> expression (a load or load + byte swap) is smaller than the source expression
> (e.g. some of the bytes accessed in the source code get bitwise ANDed with 0).
> As explained in [1], there were already two pieces of code written
> independently in bswap to deal with that case, and it is the interaction of
> the two that caused the bug.
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00948.html
>
> PR69714 proves that this pattern does occur in real code, so this patch sets
> out to re-enable the optimization and remove the big endian adjustment in
> bswap_replace: the change in find_bswap_or_nop ensures that either we cancel
> the optimization or we don't, and there is no need for an offset adjustment.
> As explained in [2], the current code only supports loss of bytes at the
> highest addresses because there is no code to adjust the address of the load.
> However, on little and big endian targets the bytes at the highest addresses
> translate into different byte significance in the result. This patch first
> separates the cmpxchg and cmpnop adjustment into two steps and then deals
> with endianness correctly in the second step.
>
> [2] https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00119.html
>
> Ideally we would still want to be able to do the adjustment to deal with a
> load or load + bswap at an offset from the byte at the lowest memory address
> accessed, but this would require more code to recognize it properly for both
> little endian and big endian targets and will thus have to wait for GCC 8
> stage 1.
>
> ChangeLog entry is as follows:
>
> *** gcc/ChangeLog ***
>
> 2016-11-10  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in cmpxchg
>         and cmpnop in two steps: first the ones not accessed in the original
>         gimple expression in an endian-independent way and then the ones not
>         accessed in the final result in an endian-specific way.
>         (bswap_replace): Stop doing big endian adjustment.
>
> The testsuite does not show any regression on an armeb-none-eabi GCC
> cross-compiler targeting ARM Cortex-M3 and on an x86_64-linux-gnu bootstrapped
> native GCC compiler. Bootstrap on powerpc in progress.
>
> Is this ok for trunk provided that the powerpc bootstrap succeeds?

Bootstrap for C, C++ and Fortran succeeded.

Best regards,

Thomas
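To make the case discussed above concrete, here is a schematic illustration. The functions `read32` and `read32_drop_top` are hypothetical examples written for this summary, not the PR69714 testcase: the bswap pass can turn a byte-by-byte read like `read32` into a single 32-bit load (or a load plus byte swap, depending on target endianness), while in `read32_drop_top` one of the bytes read is ANDed with 0, so the resulting symbolic expression covers only three of the four bytes accessed, i.e. the "smaller than the source expression" situation the patch deals with.

```c
#include <stdint.h>

/* Byte-by-byte little-endian read; a candidate for a single 32-bit
   load (or load + byte swap on a big endian target).  */
uint32_t
read32 (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}

/* One of the bytes read is ANDed with 0, so only three of the four
   bytes accessed survive in the result.  */
uint32_t
read32_drop_top (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | (((uint32_t) p[3] & 0) << 24);
}
```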
On Fri, 11 Nov 2016, Thomas Preudhomme wrote:
> [...]
>
> Is this ok for trunk provided that the powerpc bootstrap succeeds?

Ok.

Thanks,
Richard.
```diff
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c315da88ce4feea1196a0416e4ea02e2a75a4377..b28c808c55489ae1ae16c173d66c561c1897e6ab 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2504,9 +2504,11 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number *n, int limit)
 static gimple *
 find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
 {
-  /* The number which the find_bswap_or_nop_1 result should match in order
-     to have a full byte swap.  The number is shifted to the right
-     according to the size of the symbolic number before using it.  */
+  unsigned rsize;
+  uint64_t tmpn, mask;
+/* The number which the find_bswap_or_nop_1 result should match in order
+   to have a full byte swap.  The number is shifted to the right
+   according to the size of the symbolic number before using it.  */
   uint64_t cmpxchg = CMPXCHG;
   uint64_t cmpnop = CMPNOP;
 
@@ -2527,28 +2529,38 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
 
   /* Find real size of result (highest non-zero byte).  */
   if (n->base_addr)
-    {
-      unsigned HOST_WIDE_INT rsize;
-      uint64_t tmpn;
-
-      for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
-      if (BYTES_BIG_ENDIAN && n->range != rsize)
-        /* This implies an offset, which is currently not handled by
-           bswap_replace.  */
-        return NULL;
-      n->range = rsize;
-    }
+    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
+  else
+    rsize = n->range;
 
-  /* Zero out the extra bits of N and CMP*.  */
+  /* Zero out the bits corresponding to untouched bytes in original gimple
+     expression.  */
   if (n->range < (int) sizeof (int64_t))
     {
-      uint64_t mask;
-
       mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
       cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
       cmpnop &= mask;
     }
 
+  /* Zero out the bits corresponding to unused bytes in the result of the
+     gimple expression.  */
+  if (rsize < n->range)
+    {
+      if (BYTES_BIG_ENDIAN)
+        {
+          mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
+          cmpxchg &= mask;
+          cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
+        }
+      else
+        {
+          mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
+          cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
+          cmpnop &= mask;
+        }
+      n->range = rsize;
+    }
+
   /* A complete byte swap should make the symbolic number to start with
      the largest digit in the highest order byte. Unchanged symbolic
      number indicates a read with same endianness as target architecture.  */
@@ -2636,26 +2648,6 @@ bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
       HOST_WIDE_INT load_offset = 0;
 
       align = get_object_alignment (src);
-      /* If the new access is smaller than the original one, we need
-         to perform big endian adjustment.  */
-      if (BYTES_BIG_ENDIAN)
-        {
-          HOST_WIDE_INT bitsize, bitpos;
-          machine_mode mode;
-          int unsignedp, reversep, volatilep;
-          tree offset;
-
-          get_inner_reference (src, &bitsize, &bitpos, &offset, &mode,
-                               &unsignedp, &reversep, &volatilep);
-          if (n->range < (unsigned HOST_WIDE_INT) bitsize)
-            {
-              load_offset = (bitsize - n->range) / BITS_PER_UNIT;
-              unsigned HOST_WIDE_INT l
-                = (load_offset * BITS_PER_UNIT) & (align - 1);
-              if (l)
-                align = least_bit_hwi (l);
-            }
-        }
 
       if (bswap
          && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
```
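As a side note, the two-step zeroing that the patch introduces in find_bswap_or_nop can be illustrated with a small standalone program. This is only a sketch written for this page: `adjust`, `main`, and the comparison constants are assumptions mirroring the pass's one-marker-per-byte scheme and its CMPXCHG/CMPNOP values, not GCC code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define BITS_PER_MARKER 8   /* one 8-bit marker per byte of the value */

/* Mimic the patch's two-step adjustment of the comparison constants:
   step 1 drops marker slots beyond the size of the original expression
   (endian independent); step 2 drops the slots for bytes that do not
   reach the result, which sit at opposite ends of the two constants
   depending on target endianness.  */
static void
adjust (uint64_t *cmpxchg, uint64_t *cmpnop,
        unsigned range, unsigned rsize, bool big_endian)
{
  uint64_t mask;

  /* Step 1: zero out markers for bytes not accessed by the original
     gimple expression (anything beyond RANGE bytes).  */
  if (range < sizeof (int64_t))
    {
      mask = ((uint64_t) 1 << (range * BITS_PER_MARKER)) - 1;
      *cmpxchg >>= (64 / BITS_PER_MARKER - range) * BITS_PER_MARKER;
      *cmpnop &= mask;
    }

  /* Step 2: zero out markers for bytes dropped from the result
     (RSIZE < RANGE), on the end dictated by endianness.  */
  if (rsize < range)
    {
      mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
      if (big_endian)
        {
          *cmpxchg &= mask;
          *cmpnop >>= (range - rsize) * BITS_PER_MARKER;
        }
      else
        {
          *cmpxchg >>= (range - rsize) * BITS_PER_MARKER;
          *cmpnop &= mask;
        }
    }
}

int
main (void)
{
  /* Marker layouts in the spirit of the pass's CMPXCHG/CMPNOP.  */
  uint64_t cmpxchg = 0x0102030405060708ULL;
  uint64_t cmpnop  = 0x0807060504030201ULL;

  /* A 4-byte source expression whose result keeps only 3 bytes,
     on a little endian target.  */
  adjust (&cmpxchg, &cmpnop, 4, 3, false);
  printf ("cmpxchg=%#llx cmpnop=%#llx\n",
          (unsigned long long) cmpxchg, (unsigned long long) cmpnop);
  return 0;
}
```

The endian-specific branch is visible in which end of each constant loses its markers: on little endian the dropped-byte markers are shifted out of the bottom of cmpxchg and masked off the top of cmpnop, while on big endian the two constants swap roles.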