Message ID: d6e24bfd-fc3e-4160-fe27-db3eda5b857c@linux.vnet.ibm.com
State: New
On Wed, Dec 7, 2016 at 5:14 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type)
>> produces (uint64_t)uint32 + -1U + 1.  This simply means that we cannot ignore
>> overflow of the inner operation and for some reason your change
>> to extract_range_from_binary_expr didn't catch this.  That is _8 + 4294967295
>> overflows but we ignored that.
>
> In this case the range of _8 was [1,1] so no overflow.
> I think the problem is rather about the interpretation of the inner
> constant.  I tried discerning two cases now, a range split (i.e. when the
> single range of a binop variable becomes an anti range or two ranges
> after combining it with the binop's other range) and an overflow of the
> range's min and max.
> If the range didn't split, we can perform the simplification.  If there
> was a min and max overflow, we have to interpret the inner constant as
> signed and perform a sign extension before converting it to the outer
> type.  Without overflow we can use TYPE_SIGN (inner_type).
> Does this make sense?

I'm not sure there is anything to "interpret" -- the operation is unsigned
and overflow is when the operation may wrap around zero.  There might
be clever ways of re-writing the expression to

  (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)

avoiding the overflow and thus allowing the transform but I'm not sure
that's good.

A related thing would be canonicalizing unsigned X plus CST to
unsigned X minus CST' if CST' has a smaller absolute value than CST.
I think currently we simply canonicalize to 'plus CST' but also only in
fold-const.c, not in match.pd (ah we do, but only in a simplified manner).

That said, can we leave that "trick" out of the patch?  I think your more
complicated "overflows" result from extract_range_from_binary_expr_1
doesn't apply to all ops (like MULT_EXPR where more complicated cases
can arise).

Richard.

> Included the remarks and attached the new version.
>
> Regards
> Robin
Perhaps I'm still missing how some cases are handled or not handled,
sorry for the noise.

> I'm not sure there is anything to "interpret" -- the operation is unsigned
> and overflow is when the operation may wrap around zero.  There might
> be clever ways of re-writing the expression to
>   (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)
> avoiding the overflow and thus allowing the transform but I'm not sure
> that's good.

The extra work I introduced was to discern between

  (uint64_t)(a + UINT_MAX) + 1 -> (uint64_t)(a),
  (uint64_t)(a + UINT_MAX) + 1 -> (uint64_t)(a) + (uint64_t)(UINT_MAX + 1).

For a's range of [1,1] there is an overflow in both cases.
We still want to simplify the first case by combining UINT_MAX + 1 -> 0.
If "interpreting" UINT_MAX as -1 is not the correct thing to do, perhaps
(uint64_t)((uint32_t)(UINT_MAX + 1)) is?  This fails, however, if the
outer constant is larger than UINT_MAX.  What else can we do here?
Do we see cases like the second one at all?  If it's not needed, the
extra work is likely not needed.

> A related thing would be canonicalizing unsigned X plus CST to
> unsigned X minus CST'
> if CST' has a smaller absolute value than CST.  I think currently we
> simply canonicalize
> to 'plus CST' but also only in fold-const.c, not in match.pd (ah we
> do, but only in a simplified manner).

I can imagine this could simplify the treatment of some cases, yet I'm
already a bit lost with the current cases :)

> That said, can we leave that "trick" out of the patch?  I think your
> more complicated
> "overflows" result from extract_range_from_binary_expr_1 doesn't apply to all
> ops (like MULT_EXPR where more complicated cases can arise).

There is certainly additional work to be done for MULT_EXPR, I
disregarded it so far.  For this patch, I'd rather conservatively assume
overflow.

Regards
Robin
Ping.

To put it shortly, I'm not sure how to differentiate between:

  example range of a: [3,3]
  (ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(-1 + 1), sign extend

  example range of a: [0,0]
  (ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(UINT_MAX + 1), no sign extend

In this case, there is an "ok" overflow in the first example's range and
no overflow in the second, but I don't think this suffices as a criterion.
As said, your proposed canonicalization ("unsigned X plus CST to unsigned X
minus CST' if CST' has a smaller absolute value than CST") might help here,
but I'm unsure how to do it currently.
On Tue, Jan 10, 2017 at 2:32 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote: > Perhaps I'm still missing how some cases are handled or not handled, > sorry for the noise. > >> I'm not sure there is anything to "interpret" -- the operation is unsigned >> and overflow is when the operation may wrap around zero. There might >> be clever ways of re-writing the expression to >> (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1) >> avoiding the overflow and thus allowing the transform but I'm not sure that's >> good. > > The extra work I introduced was to discern between > > (uint64_t)(a + UINT_MAX) + 1 -> (uint64_t)(a), > (uint64_t)(a + UINT_MAX) + 1 -> (uint64_t)(a) + (uint64_t)(UINT_MAX + 1), > > For a's range of [1,1] there is an overflow in both cases. > We still want to simplify the first case by combining UINT_MAX + 1 -> 0. So there's the case where a == 0 where (uint64_t)(0 + UINT_MAX) + 1 is not zero. I think we're still searching for the proper condition on when it is allowed to re-write (uint64_t)(uint32_t + uint32_t-CST) + uint64_t-CST to (uint64_t)(uint32_t) + (uint64_t)uint32_t-CST + uint64_t-CST or to (uint64_t)(uint32_t) + (uint64_t)(uint32_t-CST + (uint32_t)uint64_t-CST) > If "interpreting" UINT_MAX as -1 is not the correct thing to do, perhaps > (uint64_t)((uint32_t)(UINT_MAX + 1)) is? This fails, however, if the > outer constant is larger than UINT_MAX. What else can we do here? > Do we see cases like the second one at all? If it's not needed, the > extra work is likely not needed. We do have the need to associate and simplfy across conversions but of course we have to do it conservatively correct. >> A related thing would be canonicalizing unsigned X plus CST to >> unsigned X minus CST' >> if CST' has a smaller absolute value than CST. I think currently we >> simply canonicalize >> to 'plus CST' but also only in fold-const.c, not in match.pd (ah we >> do, but only in a simplified manner). 
>
> I can imagine this could simplify the treatment of some cases, yet I'm
> already a bit lost with the current cases :)

Yes, but I am lost a bit as well.  I don't know the correct conditions to
test off-hand -- I know we may not introduce new undefined overflow ops and
of course we need to not compute wrong numbers either.

>> That said, can we leave that "trick" out of the patch?  I think your
>> more complicated
>> "overflows" result from extract_range_from_binary_expr_1 doesn't apply to all
>> ops (like MULT_EXPR where more complicated cases can arise).
>
> There is certainly additional work to be done for MULT_EXPR, I
> disregarded it so far.  For this patch, I'd rather conservatively assume
> overflow.

Ok...

Richard.

> Regards
> Robin
>
I skimmed through the code to see where transformations like
(a - 1) -> (a + UINT_MAX) are performed.  It seems there are only two
places, match.pd (/* A - B -> A + (-B) if B is easily negatable. */) and
fold-const.c.

In order to be able to reliably know whether to zero-extend or to
sign-extend the constant (1) in

  (ulong)(a - 1) + 1 -> (ulong)(a) + 0
  or
  -> (ulong)(a) + (ulong)(UINT_MAX + 1)

we'd have to know if the constant was converted by a transformation like
the above.  I first tried removing the two transformations altogether but
this causes around 20 new regressions on s390x and I didn't go through all
of them to see whether they can be fixed.  Is there a rationale for
applying the transformations other than canonicalization?  I saw some
optimizations that only apply to PLUS_EXPRs but those can easily be changed
to also include MINUS_EXPR.

My other attempt entails an additional flag TREE_WAS_SIGNED, for the lack
of a better naming idea.  It is set when the above transformation takes
place.  I used (NODE)->base.deprecated_flag but there may be better
choices.  Tentative/example patch attached for reference.  Using this, the
type extension in my original patch can be changed to

  w1 = w1.from (w1, TYPE_PRECISION (type),
		TREE_WAS_SIGNED (@1) ? SIGNED : TYPE_SIGN (inner_type));

which works for me and does not introduce regressions on s390x.  Is this a
viable approach?
Regards
Robin

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index c649e54..cbb7469 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9648,9 +9648,15 @@ fold_binary_loc (location_t loc,
 	      && (TREE_CODE (op1) != REAL_CST
 		  || REAL_VALUE_NEGATIVE (TREE_REAL_CST (op1)))
 	      || INTEGRAL_TYPE_P (type)))
-	return fold_build2_loc (loc, PLUS_EXPR, type,
-				fold_convert_loc (loc, type, arg0),
-				negate_expr (op1));
+	{
+	  tree negtem = negate_expr (op1);
+	  if (CONSTANT_CLASS_P (negtem))
+	    TREE_WAS_SIGNED (negtem) = 1;
+	  tree tem = fold_build2_loc (loc, PLUS_EXPR, type,
+				      fold_convert_loc (loc, type, arg0),
+				      negtem);
+	  return tem;
+	}

       /* Fold &a[i] - &a[j] to i-j.  */
       if (TREE_CODE (arg0) == ADDR_EXPR
@@ -13620,6 +13626,7 @@ fold_negate_const (tree arg0, tree type)
       t = force_fit_type (type, val, 1,
			  (overflow | TREE_OVERFLOW (arg0))
			  && !TYPE_UNSIGNED (type));
+      TREE_WAS_SIGNED (t) = TREE_WAS_SIGNED (arg0);
       break;
     }

diff --git a/gcc/match.pd b/gcc/match.pd
index b3f2063..e791942 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -870,7 +870,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (minus @0 negate_expr_p@1)
  (if (!FIXED_POINT_TYPE_P (type))
-  (plus @0 (negate @1))))
+  (with {
+     if (CONSTANT_CLASS_P (@1))
+       TREE_WAS_SIGNED (@1) = 1;
+   }
+   (plus @0 (negate @1)))))

 /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
    when profitable.
@@ -1223,8 +1227,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
      /* Extend @1 to TYPE.  Perform sign extension if the range
	 overflowed but did not split.  */
-     w1 = w1.from (w1, TYPE_PRECISION (type), ovf.ovf ? SIGNED :
-		   TYPE_SIGN (inner_type));
+     w1 = w1.from (w1, TYPE_PRECISION (type), TREE_WAS_SIGNED (@1)
+		   ? SIGNED : TYPE_SIGN (inner_type));
      /* w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type)); */

diff --git a/gcc/tree.h b/gcc/tree.h
index 62cd7bb..1e7efd9 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -735,11 +735,18 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)

+/* Foo.  */
+
+#define TREE_WAS_SIGNED(NODE) (CST_CHECK (NODE)->base.deprecated_flag)
+
 /* TREE_OVERFLOW can only be true for EXPR of CONSTANT_CLASS_P.  */

 #define TREE_OVERFLOW_P(EXPR) \
  (CONSTANT_CLASS_P (EXPR) && TREE_OVERFLOW (EXPR))

+#define TREE_WAS_SIGNED_P(EXPR) \
+ (CONSTANT_CLASS_P (EXPR) && TREE_WAS_SIGNED (EXPR))
+
 /* In a VAR_DECL, FUNCTION_DECL, NAMESPACE_DECL or TYPE_DECL, nonzero
    means name is to be accessible from outside this translation unit.
    In an IDENTIFIER_NODE, nonzero means an external declaration
Ping.
On Tue, Jan 17, 2017 at 9:48 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Jan 10, 2017 at 2:32 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> Perhaps I'm still missing how some cases are handled or not handled,
>> sorry for the noise.
>>
>>> I'm not sure there is anything to "interpret" -- the operation is unsigned
>>> and overflow is when the operation may wrap around zero.  There might
>>> be clever ways of re-writing the expression to
>>> (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)
>>> avoiding the overflow and thus allowing the transform but I'm not sure that's
>>> good.
>>
>> The extra work I introduced was to discern between
>>
>> (uint64_t)(a + UINT_MAX) + 1 -> (uint64_t)(a),
>> (uint64_t)(a + UINT_MAX) + 1 -> (uint64_t)(a) + (uint64_t)(UINT_MAX + 1).
>>
>> For a's range of [1,1] there is an overflow in both cases.
>> We still want to simplify the first case by combining UINT_MAX + 1 -> 0.
>
> So there's the case where a == 0 where (uint64_t)(0 + UINT_MAX) + 1 is not zero.
>
> I think we're still searching for the proper condition on when it is allowed to
> re-write
>
> (uint64_t)(uint32_t + uint32_t-CST) + uint64_t-CST
>
> to
>
> (uint64_t)(uint32_t) + (uint64_t)uint32_t-CST + uint64_t-CST

Hmm, won't "(uint32_t + uint32_t-CST) doesn't overflow" be a sufficient
condition for such a transformation?

>
> or to
>
> (uint64_t)(uint32_t) + (uint64_t)(uint32_t-CST + (uint32_t)uint64_t-CST)

We don't actually need to prove this?  What complicates this is when it's
safe to transform:

  (uint64_t)(uint32_t + uint32_t-CST) + uint64_t-CST

into

  (uint64_t)(uint32_t + uint32_t-CSTx)

where uint32_t-CSTx = uint32_t-CST + (uint32_t)uint64_t-CST.

Am I misunderstanding the whole thing?

Thanks,
bin

>
>> If "interpreting" UINT_MAX as -1 is not the correct thing to do, perhaps
>> (uint64_t)((uint32_t)(UINT_MAX + 1)) is?  This fails, however, if the
>> outer constant is larger than UINT_MAX.  What else can we do here?
>> Do we see cases like the second one at all?  If it's not needed, the
>> extra work is likely not needed.
>
> We do have the need to associate and simplify across conversions but of
> course we have to do it conservatively correct.
>
>>> A related thing would be canonicalizing unsigned X plus CST to
>>> unsigned X minus CST'
>>> if CST' has a smaller absolute value than CST.  I think currently we
>>> simply canonicalize
>>> to 'plus CST' but also only in fold-const.c, not in match.pd (ah we
>>> do, but only in a simplified manner).
>>
>> I can imagine this could simplify the treatment of some cases, yet I'm
>> already a bit lost with the current cases :)
>
> Yes, but I am lost a bit as well.  I don't know the correct conditions to
> test off-hand -- I know we may not introduce new undefined overflow ops and
> of course we need to not compute wrong numbers either.
>
>>> That said, can we leave that "trick" out of the patch?  I think your
>>> more complicated
>>> "overflows" result from extract_range_from_binary_expr_1 doesn't apply to all
>>> ops (like MULT_EXPR where more complicated cases can arise).
>>
>> There is certainly additional work to be done for MULT_EXPR, I
>> disregarded it so far.  For this patch, I'd rather conservatively assume
>> overflow.
>
> Ok...
>
> Richard.
>
>> Regards
>> Robin
>>
> Hmm, won't (uint32_t + uint32_t-CST) doesn't overflow be sufficient
> condition for such transformation?

Yes, in principle this should suffice.  What we're actually looking for is
something like a "proper" (or no) overflow, i.e. an overflow in both min
and max of the value range.  In (a + cst1) + cst2 an overflow of (a + cst1)
will still produce a valid range as long as

  overflow(a_min + cst1) == overflow(a_max + cst1).

When max overflows but min does not, we must not simplify.  Currently, I'm
checking if the range of (a + cst1) is still a VR_RANGE, for now
disregarding ANTI_RANGEs, which could most likely be dealt with as well.

A major discussion point in my last try was that it wrongly always used
sign extension when extending cst1 to the outer type.  This is now changed
to use sign extension when (a + cst1) "properly" overflows, as in

  ASSUME (a > 0);
  (unsigned long)(a + UINT_MAX) + 1;

resulting in (unsigned long)(a) + (unsigned long)0, or to use the type sign
of the constant, as in

  ASSUME (a < 2);
  (unsigned long)(a + 4294967294u) + 10;

resulting in (unsigned long)(a) + (unsigned long)((unsigned long)4294967294
+ (unsigned long)10).

The additional flag from the last patch is not necessary.  Test suite is
clean on s390x and x86, bootstraps successful.

Regards
Robin
On Thu, May 18, 2017 at 3:47 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote: > match.pd part of the patch. > > gcc/ChangeLog: > > 2017-05-18 Robin Dapp <rdapp@linux.vnet.ibm.com> > > * match.pd: Simplify wrapped binary operations. > * tree-vrp.c (extract_range_from_binary_expr_1): Add overflow > parameter. > (extract_range_from_binary_expr): Likewise. > * tree-vrp.h: Export. Hi, I didn't follow this issue from the beginning, so might asking stupid questions. > diff --git a/gcc/match.pd b/gcc/match.pd > index 80a17ba..3fa18b9 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -1290,6 +1290,85 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (cst && !TREE_OVERFLOW (cst)) > (plus { cst; } @0)))) > > +/* ((T)(A +- CST)) +- CST -> (T)(A) +- CST) */ > +#if GIMPLE > + (for outer_op (plus minus) > + (for inner_op (plus minus) > + (simplify > + (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2) > + (if (TREE_CODE (type) == INTEGER_TYPE > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3))) > + (with > + { > + bool ovf = true; > + > + tree cst = NULL_TREE; > + tree inner_type = TREE_TYPE (@3); > + value_range vr = VR_INITIALIZER; > + > + /* Convert combined constant to tree of outer type if > + there was no overflow in the original operation. */ > + wide_int minv, maxv; > + if (TYPE_OVERFLOW_UNDEFINED (inner_type) > + || (extract_range_from_binary_expr (&vr, inner_op, > + inner_type, @0, @1, &ovf), vr.type == VR_RANGE)) Any reason to expose tree-vrp.c internal interface here? The function looks quite expensive. Overflow check can be done by get_range_info and simple wi::cmp calls. Existing code like in tree-ssa-loop-niters.c already does that. Also could you avoid using comma expressions in condition please? It only makes the code harder to be read. Thanks, bin
> Any reason to expose tree-vrp.c internal interface here?  The function
> looks quite expensive.  Overflow check can be done by get_range_info
> and simple wi::cmp calls.  Existing code like in
> tree-ssa-loop-niters.c already does that.  Also could you avoid using
> comma expressions in condition please?  It only makes the code harder
> to be read.

I tried get_range_info () as well and got a FAIL in
gcc.c-torture/execute/bitfld-5.c, where get_range_info () returns a
VR_RANGE but extract...() gives VR_VARYING.  The test case relies on not
simplifying, i.e. would expect a VR_VARYING here, but I didn't look into
it more.

Also, is there a possibility to know if there was an "ok" overflow or
not from get_range_info ()'s output?  Would I have to compare the result
with the involved variable's range?

Regards
Robin
On Thu, May 18, 2017 at 5:08 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote: >> Any reason to expose tree-vrp.c internal interface here? The function >> looks quite expensive. Overflow check can be done by get_range_info >> and simple wi::cmp calls. Existing code like in >> tree-ssa-loop-niters.c already does that. Also could you avoid using >> comma expressions in condition please? It only makes the code harder >> to be read. > > I tried get_range_info () as well and got a FAIL in > gcc.c-torture/execute/bitfld-5.c. > where get_range_info () returns a VR_RANGE but extract...() gives > VR_VARYING. The test case relies on not simplifying, i.e. would expect > a VR_VARYING here but I didn't look into it more. I can guess what is happening here. It's a 40 bits unsigned long long field, (s.b-8) will be like: _1 = s.b _2 = _1 + 0xfffffffff8 Also get_range_info returns value range [0, 0xFFFFFFFFFF] for _1. You'd need to check if _1(with range [0, 0xFFFFFFFFFF]) + 0xfffffffff8 overflows against precision of the bit-field which is 40 bits precision. The failure might because overflowness is checked against unsigned long long's precision which is 64 bits. > > Also, is there a possibility to know if there was an "ok" overflow or > not from get_range_info ()'s output? Would I have to compare the result > with the involved variable's range? I think you have to check it manually against max/min value of that type precision. Thanks, bin > > Regards > Robin >
> I can guess what is happening here. It's a 40 bits unsigned long long > field, (s.b-8) will be like: > _1 = s.b > _2 = _1 + 0xfffffffff8 > Also get_range_info returns value range [0, 0xFFFFFFFFFF] for _1. > You'd need to check if _1(with range [0, 0xFFFFFFFFFF]) + 0xfffffffff8 > overflows against precision of the bit-field which is 40 bits > precision. The failure might because overflowness is checked against > unsigned long long's precision which is 64 bits. >> Also, is there a possibility to know if there was an "ok" overflow or >> not from get_range_info ()'s output? Would I have to compare the result >> with the involved variable's range? > I think you have to check it manually against max/min value of that > type precision. Currently, extract_... () does that all that for me, is it really too expensive to call? I guess, using get_range_info first and calling extract when get_range_info returns a VR_RANGE is not really a favorable thing to do either? :) Regards Robin
On Fri, May 19, 2017 at 11:09 AM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote: >> I can guess what is happening here. It's a 40 bits unsigned long long >> field, (s.b-8) will be like: >> _1 = s.b >> _2 = _1 + 0xfffffffff8 >> Also get_range_info returns value range [0, 0xFFFFFFFFFF] for _1. >> You'd need to check if _1(with range [0, 0xFFFFFFFFFF]) + 0xfffffffff8 >> overflows against precision of the bit-field which is 40 bits >> precision. The failure might because overflowness is checked against >> unsigned long long's precision which is 64 bits. > >>> Also, is there a possibility to know if there was an "ok" overflow or >>> not from get_range_info ()'s output? Would I have to compare the result >>> with the involved variable's range? >> I think you have to check it manually against max/min value of that >> type precision. > > Currently, extract_... () does that all that for me, is it really too > expensive to call? I guess, using get_range_info first and calling > extract when get_range_info returns a VR_RANGE is not really a favorable > thing to do either? :) Not only the cost, we should avoid introducing more interfaces while old ones can do the work. Anyway, it's Richard's call here. Thanks, bin > > Regards > Robin >
On Fri, May 19, 2017 at 12:13 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, May 19, 2017 at 11:09 AM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>>> I can guess what is happening here.  It's a 40 bits unsigned long long
>>> field, (s.b-8) will be like:
>>> _1 = s.b
>>> _2 = _1 + 0xfffffffff8
>>> Also get_range_info returns value range [0, 0xFFFFFFFFFF] for _1.
>>> You'd need to check if _1 (with range [0, 0xFFFFFFFFFF]) + 0xfffffffff8
>>> overflows against precision of the bit-field which is 40 bits
>>> precision.  The failure might be because overflowness is checked against
>>> unsigned long long's precision which is 64 bits.
>>
>>>> Also, is there a possibility to know if there was an "ok" overflow or
>>>> not from get_range_info ()'s output?  Would I have to compare the result
>>>> with the involved variable's range?
>>> I think you have to check it manually against max/min value of that
>>> type precision.
>>
>> Currently, extract_... () does all that for me, is it really too
>> expensive to call?  I guess, using get_range_info first and calling
>> extract when get_range_info returns a VR_RANGE is not really a favorable
>> thing to do either? :)
> Not only the cost, we should avoid introducing more interfaces while
> old ones can do the work.  Anyway, it's Richard's call here.

Using get_range_info and wi:: is preferred.  I didn't look into the issue
you are running into but wi:: does have proper bit-precision tracking.
Maybe the overflow checks are not implemented correctly there though.

Richard.

> Thanks,
> bin
>>
>> Regards
>> Robin
>>
>> Currently, extract_... () does that all that for me, is it really too >> expensive to call? I guess, using get_range_info first and calling >> extract when get_range_info returns a VR_RANGE is not really a favorable >> thing to do either? :) > Not only the cost, we should avoid introducing more interfaces while > old ones can do the work. Anyway, it's Richard's call here. I rewrote the match.pd patterns to use get_range_info () now, keeping track of an "ok" overflow (both min and max overflow) and one which does not allow us to continue (min xor max overflow, split/anti range). Test suite on s390x has no regressions, bootstrap is ok, x86 running. Regards Robin -- gcc/ChangeLog: 2017-06-19 Robin Dapp <rdapp@linux.vnet.ibm.com> * match.pd: Simplify wrapped binary operations.diff --git a/gcc/match.pd b/gcc/match.pd index 80a17ba..66c37f6 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1290,6 +1290,128 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (cst && !TREE_OVERFLOW (cst)) (plus { cst; } @0)))) +/* ((T)(A +- CST)) +- CST -> (T)(A) +- CST) */ +#if GIMPLE + (for outer_op (plus minus) + (for inner_op (plus minus) + (simplify + (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (TREE_CODE (type) == INTEGER_TYPE + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3))) + (with + { + tree cst = NULL_TREE; + tree inner_type = TREE_TYPE (@3); + wide_int wmin, wmax; + wide_int wmin0, wmax0; + + bool ovf = true; + bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type); + + enum value_range_type vr_outer = + get_range_info (@3, &wmin, &wmax); + enum value_range_type vr0 = + get_range_info (@0, &wmin0, &wmax0); + + /* Convert combined constant to tree of outer type if + there was no overflow in the original operation. 
*/ + if (ovf_undef || vr_outer == VR_RANGE) + { + wide_int w1 = @1; + wide_int w2 = @2; + + if (ovf_undef || vr0 == VR_RANGE) + { + if (inner_op == MINUS_EXPR) + w1 = wi::neg (w1); + + if (outer_op == MINUS_EXPR) + w2 = wi::neg (w2); + + bool split_range = true; + + if (!ovf_undef && vr0 == VR_RANGE) + { + int max_ovf = 0; + int min_ovf = 0; + + signop sgn = TYPE_SIGN (inner_type); + + wmin = wi::add (wmin0, w1); + min_ovf = wi::cmp (wmin, w1, sgn) < 0; + + wmax = wi::add (wmax0, w1); + max_ovf = wi::cmp (wmax, w1, sgn) < 0; + + ovf = min_ovf || max_ovf; + + split_range = ((min_ovf && !max_ovf) + || (!min_ovf && max_ovf)); + } + + if (ovf_undef || !split_range) + { + /* Extend @1 to TYPE. */ + w1 = w1.from (w1, TYPE_PRECISION (type), + ovf ? SIGNED : TYPE_SIGN (TREE_TYPE (@1))); + + /* Combine in outer, larger type. */ + wide_int combined_cst; + combined_cst = wi::add (w1, w2); + + cst = wide_int_to_tree (type, combined_cst); + } + } + } + } +(if (cst) + (outer_op (convert @0) { cst; })) + ))))) +#endif + +/* ((T)(A)) +- CST -> (T)(A +- CST) */ +#if GIMPLE + (for outer_op (plus minus) + (simplify + (outer_op (convert SSA_NAME@0) INTEGER_CST@2) + (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE + && TREE_CODE (type) == INTEGER_TYPE) + /* Perform binary operation inside the cast if the constant fits + and there is no overflow. 
*/ + (with + { + bool split_range = true; + tree cst_inner = NULL_TREE; + enum value_range_type vr = VR_VARYING; + tree inner_type = TREE_TYPE (@0); + + if (int_fits_type_p (@2, inner_type)) + { + cst_inner = fold_convert (inner_type, @2); + + wide_int wmin0, wmax0; + wide_int w1 = cst_inner; + signop sgn = TYPE_SIGN (inner_type); + vr = get_range_info (@0, &wmin0, &wmax0); + + if (vr == VR_RANGE) + { + wide_int wmin = wi::add (wmin0, w1); + bool min_ovf = wi::cmp (wmin, w1, sgn) < 0; + + wide_int wmax = wi::add (wmax0, w1); + bool max_ovf = wi::cmp (wmax, w1, sgn) < 0; + + split_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf); + } + } + } + (if (cst_inner && !split_range) + (convert (outer_op @0 { cst_inner; }))) + )))) +#endif + /* ~A + A -> -1 */ (simplify (plus:c (bit_not @0) @0)
On Tue, Jun 20, 2017 at 3:08 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote: >>> Currently, extract_... () does that all that for me, is it really too >>> expensive to call? I guess, using get_range_info first and calling >>> extract when get_range_info returns a VR_RANGE is not really a favorable >>> thing to do either? :) >> Not only the cost, we should avoid introducing more interfaces while >> old ones can do the work. Anyway, it's Richard's call here. > > I rewrote the match.pd patterns to use get_range_info () now, keeping > track of an "ok" overflow (both min and max overflow) and one which does > not allow us to continue (min xor max overflow, split/anti range). Test > suite on s390x has no regressions, bootstrap is ok, x86 running. + (if (TREE_CODE (type) == INTEGER_TYPE + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3))) + (with use INTEGRAL_TYPE_P. + bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type); + so this is overflow behavior of the inner op. + /* Convert combined constant to tree of outer type if + there was no overflow in the original operation. */ "in the original inner operation." you then go on and use ovf_undef also for the outer operation: + if (ovf_undef || vr_outer == VR_RANGE) + { but you do not actually _use_ vr_outer. Do you think that if vr_outer is a VR_RANGE then the outer operation may not possibly have wrapped? That's a false conclusion. But I don't see how overflow in the original outer operation matters and the code lacks comments as to explaining that as well. So if you have a vr0 then you can compute whether the inner operation cannot overflow. 
You do this here: + if (!ovf_undef && vr0 == VR_RANGE) + { + int max_ovf = 0; + int min_ovf = 0; + + signop sgn = TYPE_SIGN (inner_type); + + wmin = wi::add (wmin0, w1); + min_ovf = wi::cmp (wmin, w1, sgn) < 0; + + wmax = wi::add (wmax0, w1); + max_ovf = wi::cmp (wmax, w1, sgn) < 0; + + ovf = min_ovf || max_ovf; + + split_range = ((min_ovf && !max_ovf) + || (!min_ovf && max_ovf)); ah, here's the use of the outer value-range. This lacks a comment (and it looks fishy given the outer value-range is a conservative approximation and thus could be [-INF, +INF]). Why's this not using the wi::add overload with the overflow flag? ISTR you want to handle "negative" unsigned constants somehow, but then I don't see how the above works. I'd say if wmin/wmax interpreted as signed are positive and then using a signed op to add w1 results in a still positive number you're fine (you don't seem to restrict the widening cast to either zero- or sign-extending). + if (ovf_undef || !split_range) + { + /* Extend @1 to TYPE. */ + w1 = w1.from (w1, TYPE_PRECISION (type), + ovf ? SIGNED : TYPE_SIGN (TREE_TYPE (@1))); ideally you could always interpret w1 as signed? + /* Combine in outer, larger type. */ + wide_int combined_cst; + combined_cst = wi::add (w1, w2); +(if (cst) + (outer_op (convert @0) { cst; })) + ))))) bogus indent. +/* ((T)(A)) +- CST -> (T)(A +- CST) */ +#if GIMPLE + (for outer_op (plus minus) + (simplify + (outer_op (convert SSA_NAME@0) INTEGER_CST@2) + (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE + && TREE_CODE (type) == INTEGER_TYPE) INTEGRAL_TYPE_P and do that first before looking at TYPE_PRECISION. 
+ if (vr == VR_RANGE) + { + wide_int wmin = wi::add (wmin0, w1); + bool min_ovf = wi::cmp (wmin, w1, sgn) < 0; + + wide_int wmax = wi::add (wmax0, w1); + bool max_ovf = wi::cmp (wmax, w1, sgn) < 0; + + split_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf); similar why not use wi:add overload with the overflow flag? Btw, I find (with { tree x = NULL; if (...) x = non-NULL; } (if (x) (....)))) ugly. Use (with { ... } (if (...) (... { non-NULL } ) or sth like that which makes control flow more easily visible. Richard. > Regards > Robin > > -- > > gcc/ChangeLog: > > 2017-06-19 Robin Dapp <rdapp@linux.vnet.ibm.com> > > * match.pd: Simplify wrapped binary operations.
> use INTEGRAL_TYPE_P. Done. > but you do not actually _use_ vr_outer. Do you think that if > vr_outer is a VR_RANGE then the outer operation may not > possibly have wrapped? That's a false conclusion. These were remains of a previous version. vr_outer is indeed not needed anymore; removed. > wi::add overload with the overflow flag? ISTR you want to handle "negative" > unsigned constants somehow, but then I don't see how the above works. > I'd say if wmin/wmax interpreted as signed are positive and then using > a signed op to add w1 results in a still positive number you're fine > (you don't seem > to restrict the widening cast to either zero- or sign-extending). Changed to using wi:add overload now. In essence, three cases are being handled: - wrapped_range --> do not simplify - !wrapped_range && ovf ("negative" unsigned) --> simplify and combine with sign extension in the outer type - !wrapped_range && !ovf ("positive" unsigned) --> simplify and combine with zero extension in the outer type. Regards Robindiff --git a/gcc/match.pd b/gcc/match.pd index 80a17ba..ec1af69 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1290,6 +1290,116 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (cst && !TREE_OVERFLOW (cst)) (plus { cst; } @0)))) +/* ((T)(A +- CST)) +- CST -> (T)(A) +- CST) */ +#if GIMPLE + (for outer_op (plus minus) + (for inner_op (plus minus) + (simplify + (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3))) + (with + { + tree cst; + tree inner_type = TREE_TYPE (@3); + wide_int wmin0, wmax0; + + bool ovf = true; + bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type); + + enum value_range_type vr0 = + get_range_info (@0, &wmin0, &wmax0); + + bool wrapped_range = true; + + /* Convert combined constant to tree of outer type if + there was no overflow in the original inner operation. 
*/ + if (ovf_undef || vr0 == VR_RANGE) + { + wide_int w1 = @1; + wide_int w2 = @2; + + if (inner_op == MINUS_EXPR) + w1 = wi::neg (w1); + + if (outer_op == MINUS_EXPR) + w2 = wi::neg (w2); + + bool ovf; + + if (!ovf_undef && vr0 == VR_RANGE) + { + bool max_ovf; + bool min_ovf; + + signop sgn = TYPE_SIGN (inner_type); + wi::add (wmin0, w1, sgn, &min_ovf); + wi::add (wmax0, w1, sgn, &max_ovf); + + ovf = min_ovf || max_ovf; + wrapped_range = ((min_ovf && !max_ovf) + || (!min_ovf && max_ovf)); + } + + /* Extend @1 to TYPE. */ + w1 = w1.from (w1, TYPE_PRECISION (type), + ovf ? SIGNED : TYPE_SIGN (inner_type)); + + /* Combine in outer, larger type. */ + wide_int combined_cst; + combined_cst = wi::add (w1, w2); + + cst = wide_int_to_tree (type, combined_cst); + } + } + (if (ovf_undef || !wrapped_range) + (outer_op (convert @0) { cst; })) + ))))) +#endif + +/* ((T)(A)) +- CST -> (T)(A +- CST) */ +#if GIMPLE + (for outer_op (plus minus) + (simplify + (outer_op (convert SSA_NAME@0) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))) + /* Perform binary operation inside the cast if the constant fits + and there is no overflow. */ + (with + { + bool wrapped_range = true; + tree cst_inner = NULL_TREE; + enum value_range_type vr = VR_VARYING; + tree inner_type = TREE_TYPE (@0); + + if (int_fits_type_p (@2, inner_type)) + { + cst_inner = fold_convert (inner_type, @2); + + wide_int wmin0, wmax0; + wide_int w1 = cst_inner; + signop sgn = TYPE_SIGN (inner_type); + vr = get_range_info (@0, &wmin0, &wmax0); + + if (vr == VR_RANGE) + { + bool min_ovf; + wi::add (wmin0, w1, sgn, &min_ovf); + + bool max_ovf; + wi::add (wmax0, w1, sgn, &max_ovf); + + wrapped_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf); + } + } + } + (if (cst_inner && !wrapped_range) + (convert (outer_op @0 { cst_inner; }))) + )))) +#endif + /* ~A + A -> -1 */ (simplify (plus:c (bit_not @0) @0)
Ping.
On Wed, Jun 21, 2017 at 1:44 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote: >> use INTEGRAL_TYPE_P. > > Done. > >> but you do not actually _use_ vr_outer. Do you think that if >> vr_outer is a VR_RANGE then the outer operation may not >> possibly have wrapped? That's a false conclusion. > > These were remains of a previous version. vr_outer is indeed not needed > anymore; removed. > >> wi::add overload with the overflow flag? ISTR you want to handle "negative" >> unsigned constants somehow, but then I don't see how the above works. >> I'd say if wmin/wmax interpreted as signed are positive and then using >> a signed op to add w1 results in a still positive number you're fine >> (you don't seem >> to restrict the widening cast to either zero- or sign-extending). > > Changed to using wi:add overload now. > > In essence, three cases are being handled: > - wrapped_range --> do not simplify > - !wrapped_range && ovf ("negative" unsigned) --> simplify and combine > with sign extension in the outer type > - !wrapped_range && !ovf ("positive" unsigned) --> simplify and combine > with zero extension in the outer type. Let's split this and look at the simpler case: +/* ((T)(A)) +- CST -> (T)(A +- CST) */ +#if GIMPLE + (for outer_op (plus minus) + (simplify + (outer_op (convert SSA_NAME@0) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))) + /* Perform binary operation inside the cast if the constant fits + and there is no overflow. */ + (with + { + bool wrapped_range = true; + tree cst_inner = NULL_TREE; + enum value_range_type vr = VR_VARYING; + tree inner_type = TREE_TYPE (@0); + + if (int_fits_type_p (@2, inner_type)) do && get_range_info (...) == VR_RANGE) here. That avoids vr and its initialization and you get all of the "work" when you know it will eventually succeed. 
+      {
+        cst_inner = fold_convert (inner_type, @2);

ideally you'd use a wide-int here and defer the tree allocation to the
result

  wide_int wi = wi::from (@2, TYPE_PRECISION (inner_type),
                          TYPE_SIGN (inner_type));

+        wide_int wmin0, wmax0;
+        wide_int w1 = cst_inner;
+        signop sgn = TYPE_SIGN (inner_type);
+        vr = get_range_info (@0, &wmin0, &wmax0);
+
+        if (vr == VR_RANGE)
+          {
+            bool min_ovf;
+            wi::add (wmin0, w1, sgn, &min_ovf);
+
+            bool max_ovf;
+            wi::add (wmax0, w1, sgn, &max_ovf);

So I guess we never run into the outer_op == minus case as the above is
clearly wrong for that?

The comment above says "if there is no overflow" but below you allow
min_ovf && max_ovf without any further explanation.

+            wrapped_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);
+          }
+      }
+    }
+    (if (cst_inner && !wrapped_range)
+     (convert (outer_op @0 { cst_inner; })))

thus

  (if ((min_ovf && !max_ovf) || ....)
   (convert (outer_op @0 { wide_int_to_tree (inner_type, w1); } ))))

try to keep vertical spacing in patterns minimal -- I believe that
patterns should be small enough to fit in a terminal window (24 lines).

Richard.

+    ))))
+#endif

> Regards
> Robin
> ideally you'd use a wide-int here and defer the tree allocation to the result Did that in the attached version. > So I guess we never run into the outer_op == minus case as the above is > clearly wrong for that? Right, damn, not only was the treatment for this missing but it was bogus in the other pattern as well. Since we are mostly dealing with PLUS_EXPR anyways it's probably better to defer the MINUS_EXPR case for now. This will also slim down the patterns a bit. > try to keep vertical spacing in patterns minimal -- I belive that patterns > should be small enough to fit in a terminal window (24 lines). I find using the expanded wrapped_range condition in the simplification somewhat cumbersome, especially because I need the condition to evaluate to true by default making the initialization unintuitive. Yet, I guess setting wrapped_range = true was not terribly intuitive either... Regards Robindiff --git a/gcc/match.pd b/gcc/match.pd index 80a17ba..ed497cf 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1290,6 +1290,79 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (cst && !TREE_OVERFLOW (cst)) (plus { cst; } @0)))) +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */ +#if GIMPLE + (simplify + (plus (convert (plus@3 @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3))) + /* Combine CST1 and CST2 to CST and convert to outer type if + (A + CST1)'s range does not wrap. 
*/ + (with + { + tree inner_type = TREE_TYPE (@3); + wide_int wmin0, wmax0; + wide_int w1 = @1; + wide_int w2 = @2; + wide_int combined_cst; + + bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type); + bool min_ovf = true, max_ovf = false; + + enum value_range_type vr0 = + get_range_info (@0, &wmin0, &wmax0); + + if (ovf_undef || vr0 == VR_RANGE) + { + bool ovf = true; + if (!ovf_undef && vr0 == VR_RANGE) + { + wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf); + wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf); + ovf = min_ovf || max_ovf; + } + + /* Extend CST1 to TYPE. */ + w1 = w1.from (w1, TYPE_PRECISION (type), + ovf ? SIGNED : TYPE_SIGN (inner_type)); + } + } + (if (ovf_undef || !((min_ovf && !max_ovf) || (!min_ovf && max_ovf))) + (plus (convert @0) { wide_int_to_tree (type, wi::add (w1, w2)); } + ))))) +#endif + +/* ((T)(A)) + CST -> (T)(A + CST) */ +#if GIMPLE + (simplify + (plus (convert SSA_NAME@0) INTEGER_CST@1) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && int_fits_type_p (@1, TREE_TYPE (@0))) + /* Perform binary operation inside the cast if the constant fits + and (A + CST)'s range does not wrap. */ + (with + { + bool min_ovf = true, max_ovf = false; + tree inner_type = TREE_TYPE (@0); + + wide_int w1 = @1; + w1 = w1.from (w1, TYPE_PRECISION (inner_type), TYPE_SIGN + (inner_type)); + + wide_int wmin0, wmax0; + if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE) + { + wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf); + wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf); + } + } + (if (!((min_ovf && !max_ovf) || (!min_ovf && max_ovf)) ) + (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; }))) + ))) +#endif + /* ~A + A -> -1 */ (simplify (plus:c (bit_not @0) @0)
On Wed, Jun 28, 2017 at 4:34 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> ideally you'd use a wide-int here and defer the tree allocation to the result
>
> Did that in the attached version.
>
>> So I guess we never run into the outer_op == minus case as the above is
>> clearly wrong for that?
>
> Right, damn, not only was the treatment for this missing but it was
> bogus in the other pattern as well. Since we are mostly dealing with
> PLUS_EXPR anyways it's probably better to defer the MINUS_EXPR case for
> now. This will also slim down the patterns a bit.
>
>> try to keep vertical spacing in patterns minimal -- I believe that patterns
>> should be small enough to fit in a terminal window (24 lines).
>
> I find using the expanded wrapped_range condition in the simplification
> somewhat cumbersome, especially because I need the condition to evaluate
> to true by default making the initialization unintuitive. Yet, I guess
> setting wrapped_range = true was not terribly intuitive either...

+  /* Perform binary operation inside the cast if the constant fits
+     and (A + CST)'s range does not wrap.  */
+  (with
+   {
+     bool min_ovf = true, max_ovf = false;

While the initialization value doesn't matter (wi::add will overwrite it)
better initialize both to false ;)  Ah, you mean because we want to
transform only if get_range_info returned VR_RANGE.  Indeed somewhat
unintuitive (but still the best variant for now).

+     wide_int w1 = @1;
+     w1 = w1.from (w1, TYPE_PRECISION (inner_type), TYPE_SIGN
+     (inner_type));

I think wi::from (@1, ....) should work as well.

+   (if (!((min_ovf && !max_ovf) || (!min_ovf && max_ovf)) )
+    (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))

so I'm still missing a comment on why min_ovf && max_ovf is ok.
The simple-minded would have written

  (if (! min_ovf && ! max_ovf)
   ...

I'd like to see testcase(s) with this patch, preferably exactly also for
the case of min_ovf && max_ovf.
That said, consider (long)[0xfffffffe, 0xffffffff] + 2, which should have
both min_ovf and max_ovf set: it results in [0x0, 0x1] when computed in
type unsigned int but in [0x100000000, 0x100000001] in type long.

Richard.

> Regards
> Robin
> While the initialization value doesn't matter (wi::add will overwrite it) > better initialize both to false ;) Ah, you mean because we want to > transform only if get_range_info returned VR_RANGE. Indeed somewhat > unintuitive (but still the best variant for now). > so I'm still missing a comment on why min_ovf && max_ovf is ok. > The simple-minded would have written [...] I suppose it's more a matter of considering too many things at the same time for me... I was still thinking of including more cases than necessary for the regression. Guess the attached version will do as well and should not contain any more surprises. If needed, I'll add additional cases some time. Tests in a followup message. Regards Robindiff --git a/gcc/match.pd b/gcc/match.pd index 80a17ba..3acf8be 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1290,6 +1290,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (cst && !TREE_OVERFLOW (cst)) (plus { cst; } @0)))) +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */ +#if GIMPLE + (simplify + (plus (convert (plus@3 @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3))) + /* Combine CST1 and CST2 to CST and convert to outer type if + (A + CST1)'s range does not overflow. 
*/ + (with + { + tree inner_type = TREE_TYPE (@3); + wide_int wmin0, wmax0; + wide_int w1 = @1; + + bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type); + bool min_ovf = true, max_ovf = true; + + enum value_range_type vr0 = get_range_info (@0, &wmin0, &wmax0); + + if (ovf_undef || vr0 == VR_RANGE) + { + if (!ovf_undef && vr0 == VR_RANGE) + { + wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf); + wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf); + } + w1 = w1.from (@1, TYPE_PRECISION (type), TYPE_SIGN (inner_type)); + } + } + (if (ovf_undef || !(min_ovf || max_ovf)) + (plus (convert @0) { wide_int_to_tree (type, wi::add (w1, @2)); } + ))))) +#endif + +/* ((T)(A)) + CST -> (T)(A + CST) */ +#if GIMPLE + (simplify + (plus (convert SSA_NAME@0) INTEGER_CST@1) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && int_fits_type_p (@1, TREE_TYPE (@0))) + /* Perform binary operation inside the cast if the constant fits + and (A + CST)'s range does not overflow. */ + (with + { + bool min_ovf = true, max_ovf = true; + tree inner_type = TREE_TYPE (@0); + + wide_int w1 = w1.from (@1, TYPE_PRECISION (inner_type), TYPE_SIGN + (inner_type)); + + wide_int wmin0, wmax0; + if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE) + { + wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf); + wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf); + } + } + (if (!min_ovf && !max_ovf) + (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; }))) + ))) +#endif + /* ~A + A -> -1 */ (simplify (plus:c (bit_not @0) @0)
[3/3] Tests -- gcc/testsuite/ChangeLog: 2017-07-05 Robin Dapp <rdapp@linux.vnet.ibm.com> * gcc.dg/wrapped-binop-simplify-signed-1.c: New test. * gcc.dg/wrapped-binop-simplify-signed-2.c: New test. * gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test. * gcc.dg/wrapped-binop-simplify-unsigned-2.c: New test.diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c new file mode 100644 index 0000000..2571a07 --- /dev/null +++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c @@ -0,0 +1,65 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ccp1-details" } */ +/* { dg-final { scan-tree-dump-times "gimple_simplified to" 12 "ccp1" } } */ + +#include <limits.h> + +long foo(int a) +{ + return (long)(a - 2) + 1; +} + +long bar(int a) +{ + return (long)(a + 3) - 1; +} + +long baz(int a) +{ + return (long)(a - 1) + 2; +} + +long baf(int a) +{ + return (long)(a + 1) - 2; +} + +long bak(int a) +{ + return (long)(a + 1) + 3; +} + +long bal(int a) +{ + return (long)(a - 7) - 4; +} + +long bam(int a) +{ + return (long)(a - 1) - INT_MAX; +} + +long bam2(int a) +{ + return (long)(a + 1) + INT_MAX; +} + +long ban(int a) +{ + return (long)(a - 1) + INT_MIN; +} + +long ban2(int a) +{ + return (long)(a + 1) - INT_MIN; +} + +unsigned long baq(int a) +{ + return (unsigned long)(a + 1) - 1; +} + +unsigned long baq2(int a) +{ + return (unsigned long)(a - 2) + 1; +} diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-2.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-2.c new file mode 100644 index 0000000..5c897ba --- /dev/null +++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-2.c @@ -0,0 +1,39 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include <assert.h> +#include <limits.h> + +int aa = -3; + +__attribute__((noinline)) +long foo (int a) +{ + return (long)(a - INT_MIN) + 1; +} + +__attribute__((noinline)) +long foo2 (int a) +{ + if (a > -10 && a < 10) + return (long)(a + 
2) - 1; +} + +__attribute__((noinline)) +long foo3 (int a) +{ + if (a > -10 && a < 10) + return (long)(a) - 3; +} + +int main() +{ + volatile long h = foo (aa); + assert (h == 2147483646); + + volatile long i = foo2 (aa); + assert (i == -2); + + volatile long j = foo3 (aa); + assert (j == -6); +} diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c new file mode 100644 index 0000000..04a7ca49 --- /dev/null +++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp-details -fdump-tree-ccp2-details -fdump-tree-vrp1-details" } */ +/* { dg-final { scan-tree-dump-times "gimple_simplified to" 2 "evrp" } } */ +/* { dg-final { scan-tree-dump-times "Match-and-simplified" 2 "ccp2" } } */ +/* { dg-final { scan-tree-dump-times "gimple_simplified to" 3 "vrp1" } } */ + +#include <limits.h> + +unsigned long oof2(unsigned int a) +{ + if (a > 0) + return (unsigned long)(a - 1) + 1; +} + +unsigned long bah (unsigned int a) +{ + if (a > 0) + return (unsigned long)(a - 1) - 1; +} + +long baq3(unsigned int a) +{ + if (a > 0) + return (long)(a - 1) + 1; +} + +unsigned long bap(unsigned int a) +{ + if (a < UINT_MAX) + return (unsigned long)(a + 1) + ULONG_MAX; +} + +unsigned long bar3(unsigned int a) +{ + if (a < UINT_MAX) + return (unsigned long)(a + 1) - 5; +} + +unsigned long bar4(unsigned int a) +{ + if (a < UINT_MAX) + return (unsigned long)(a + 1) - 6; +} diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-2.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-2.c new file mode 100644 index 0000000..46290e7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-2.c @@ -0,0 +1,125 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include <assert.h> +#include <limits.h> + +unsigned int a = 3; +int aa = 3; +int bb = 1; +int cc = 4; +unsigned int dd = 0; +unsigned int ee = 4294967294u; 
+ +__attribute__((noinline)) +unsigned long foo1 (unsigned int a) +{ + return (unsigned long)(UINT_MAX + 1) - 1; +} + +__attribute__((noinline)) +unsigned long foo2 (unsigned int a) +{ + if (a < 4) + return (unsigned long)(a - 4) + 1; +} + +__attribute__((noinline)) +unsigned long foo3 (unsigned int a) +{ + if (a > 2) + return (unsigned long)(a + UINT_MAX - 4) + 2; +} + +__attribute__((noinline)) +unsigned long foo4 (unsigned int a) +{ + if (a > 2) + return (unsigned long)(a - UINT_MAX) + UINT_MAX; +} + +__attribute__((noinline)) +unsigned long foo5 (unsigned int a) +{ + if (a > 2) + return (unsigned long)(a + UINT_MAX) - UINT_MAX; +} + +__attribute__((noinline)) +long foo6 (unsigned int a) +{ + if (a > 2) + return (long)(a - 4) + 1; +} + +__attribute__((noinline)) +long foo7 (unsigned int a) +{ + if (a > 2) + return (long)(a + UINT_MAX) + 1; +} + +__attribute__((noinline)) +unsigned long foo8 (unsigned int a) +{ + if (a < 2) + return (unsigned long)(a + 4294967294u) + 5000000000; +} + +__attribute__((noinline)) +unsigned long foo9 (unsigned int a) +{ + if (a > 2) + return (unsigned long)(a + 4294967294u) + 8000000000; +} + +__attribute__((noinline)) +unsigned long foo10 (unsigned int a) +{ + if (a < 2) + return (unsigned long)(a + 4294967294u) + 2; +} + +__attribute__((noinline)) +unsigned long foo11 (unsigned int a) +{ + if (a > 4294967293u) + return (unsigned long)(a + 2u) + 2; +} + + +int main() +{ + unsigned long b = foo1 (UINT_MAX); + assert (b == 18446744073709551615ul); + + unsigned long c = foo2 (a); + assert (c == 4294967296u); + + unsigned long d = foo3 (a); + assert (d == 4294967296ul); + + unsigned long e = foo4 (a); + assert (e == 4294967299ul); + + unsigned long f = foo5 (a); + assert (f == 18446744069414584323ul); + + long g = foo6 (a); + assert (g == 4294967296ul); + + long h = foo7 (aa); + assert (h == 3); + + unsigned long i = foo8 (bb); + assert (i == 9294967295ul); + + unsigned long j = foo9 (cc); + assert (j == 8000000002); + + unsigned long k 
= foo10 (dd); + assert (k == 0x100000000); + + unsigned long l = foo11 (ee); + assert (l == 2); +}
On Wed, 5 Jul 2017, Robin Dapp wrote:

>> While the initialization value doesn't matter (wi::add will overwrite it)
>> better initialize both to false ;) Ah, you mean because we want to
>> transform only if get_range_info returned VR_RANGE. Indeed somewhat
>> unintuitive (but still the best variant for now).
>
>> so I'm still missing a comment on why min_ovf && max_ovf is ok.
>> The simple-minded would have written [...]
>
> I suppose it's more a matter of considering too many things at the same
> time for me... I was still thinking of including more cases than
> necessary for the regression. Guess the attached version will do as
> well and should not contain any more surprises. If needed, I'll add
> additional cases some time.

What happens for (long)(X+10)+LONG_MAX where X has type int and is in
[-30, -20]? It looks like wi::add will overflow and you will generate
X+negative which overflows at runtime.

(It looks like you don't need to name @3, you could just use the type
of @0 instead.)

--
Marc Glisse
diff --git a/gcc/tree-ssa-propagate.c b/gcc/tree-ssa-propagate.c
index fbe7e13..110587d 100644
--- a/gcc/tree-ssa-propagate.c
+++ b/gcc/tree-ssa-propagate.c
@@ -1065,13 +1065,15 @@ substitute_and_fold_dom_walker::before_dom_children (basic_block bb)
 
       /* Replace real uses in the statement.  */
       did_replace |= replace_uses_in (stmt, get_value_fn);
 
+      if (did_replace)
+	gimple_set_modified (stmt, true);
 
       /* If we made a replacement, fold the statement.  */
-      if (did_replace)
+      if (fold_stmt (&i, follow_single_use_edges))
 	{
-	  fold_stmt (&i, follow_single_use_edges);
 	  stmt = gsi_stmt (i);
 	  gimple_set_modified (stmt, true);
+	  did_replace = true;
 	}
 
       /* Some statements may be simplified using propagator