Message ID | CAAgBjMmtn1_bTRbKv=LdaqVioaDTObjs+ubnS+QSvcqR6fecNg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Thu, Aug 03, 2017 at 12:58:06PM +0530, Prathamesh Kulkarni wrote: > --- a/gcc/tree-vrp.c > +++ b/gcc/tree-vrp.c > @@ -3778,6 +3778,19 @@ extract_range_basic (value_range *vr, gimple *stmt) > return; > } > break; > + case CFN_BUILT_IN_TOUPPER: > + case CFN_BUILT_IN_TOLOWER: > + if (tree lhs = gimple_call_lhs (stmt)) > + { > + tree type = TREE_TYPE (lhs); > + unsigned char min = (cfn == CFN_BUILT_IN_TOUPPER) ? 'a' : 'A'; > + unsigned char max = (cfn == CFN_BUILT_IN_TOUPPER) ? 'z' : 'Z'; > + tree range_min = build_int_cstu (type, min); > + tree range_max = build_int_cstu (type, max); > + set_value_range (vr, VR_ANTI_RANGE, range_min, range_max, NULL); > + return; You are hardcoding here host characters and using it for target. I think you need to use lang_hooks.to_target_charset (really no idea how it works or doesn't in LTO, but gimple-fold.c is already using it among others). Also, you're assuming that the 'a'-'z' and 'A'-'Z' ranges are without gaps, which isn't the case for e.g. EBCDIC. So I think you'd need to verify (once?) that the target charset has this property. Jakub
On 3 August 2017 at 13:21, Jakub Jelinek <jakub@redhat.com> wrote: > On Thu, Aug 03, 2017 at 12:58:06PM +0530, Prathamesh Kulkarni wrote: >> --- a/gcc/tree-vrp.c >> +++ b/gcc/tree-vrp.c >> @@ -3778,6 +3778,19 @@ extract_range_basic (value_range *vr, gimple *stmt) >> return; >> } >> break; >> + case CFN_BUILT_IN_TOUPPER: >> + case CFN_BUILT_IN_TOLOWER: >> + if (tree lhs = gimple_call_lhs (stmt)) >> + { >> + tree type = TREE_TYPE (lhs); >> + unsigned char min = (cfn == CFN_BUILT_IN_TOUPPER) ? 'a' : 'A'; >> + unsigned char max = (cfn == CFN_BUILT_IN_TOUPPER) ? 'z' : 'Z'; >> + tree range_min = build_int_cstu (type, min); >> + tree range_max = build_int_cstu (type, max); >> + set_value_range (vr, VR_ANTI_RANGE, range_min, range_max, NULL); >> + return; > > You are hardcoding here host characters and using it for target. > I think you need to use > lang_hooks.to_target_charset > (really no idea how it works or doesn't in LTO, but gimple-fold.c is already > using it among others). > Also, you're assuming that the 'a'-'z' and 'A'-'Z' ranges are without gaps, > which isn't the case for e.g. EBCDIC. So I think you'd need to verify > (once?) that the target charset has this property. Hi Jakub, Thanks for the suggestions, I wasn't aware about lang_hooks.to_target_charset. Would following be a correct check that target has ascii charset and only then set range to ~[a, z] ? target_a = lang_hooks.to_target_charset ('a'); target_z = lang_hooks.to_target_charset('z'); if (target_a == 'a' && target_z == 'z') { // set vr to ~['a', 'z'] } Thanks, Prathamesh > > Jakub
On Thu, Aug 03, 2017 at 04:13:38PM +0530, Prathamesh Kulkarni wrote: > > You are hardcoding here host characters and using it for target. > > I think you need to use > > lang_hooks.to_target_charset > > (really no idea how it works or doesn't in LTO, but gimple-fold.c is already > > using it among others). > > Also, you're assuming that the 'a'-'z' and 'A'-'Z' ranges are without gaps, > > which isn't the case for e.g. EBCDIC. So I think you'd need to verify > > (once?) that the target charset has this property. > Hi Jakub, > Thanks for the suggestions, I wasn't aware about lang_hooks.to_target_charset. > Would following be a correct check that target has ascii charset and only then > set range to ~[a, z] ? > > target_a = lang_hooks.to_target_charset ('a'); > target_z = lang_hooks.to_target_charset('z'); > if (target_a == 'a' && target_z == 'z') > { > // set vr to ~['a', 'z'] > } No. E.g. if host == target charset is EBCDIC, then the condition is true, but it isn't a consecutive range. A rough check that would maybe work (at least would be false for EBCDIC and true for ASCII) could be something like: if (target_z > target_a && target_z - target_a == 25) (which is not exact, some strange charset could have say '0'-'9' block in the middle of 'a'-'z' block, and say 'h'-'r' outside of the range), but perhaps good enough for real-world charsets. In any case, you should probably investigate all the locales say in glibc or some other big locale repository whether tolower/toupper have the expected properties there. Plus of course, the set vr to ~[ ] needs to use target_a/target_z. Jakub
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > In any case, you should probably investigate all the locales say in glibc or > some other big locale repository whether tolower/toupper have the expected > properties there. They don't. In tr_TR.UTF-8, toupper ('i') == 'i', because 'İ', the correct uppercase version (as returned in locale tr_TR.ISO-8859-9), is a multibyte character and toupper can only return single-byte characters. -- Joseph S. Myers joseph@codesourcery.com
On Thu, Aug 03, 2017 at 02:38:35PM +0000, Joseph Myers wrote: > On Thu, 3 Aug 2017, Jakub Jelinek wrote: > > > In any case, you should probably investigate all the locales say in glibc or > > some other big locale repository whether tolower/toupper have the expected > > properties there. > > They don't. In tr_TR.UTF-8, toupper ('i') == 'i', because 'İ', the > correct uppercase version (as returned in locale tr_TR.ISO-8859-9), is a > multibyte character and toupper can only return single-byte characters. Indeed, #include <ctype.h> #include <locale.h> int main () { setlocale (LC_ALL, ""); int i; for (i = -1000; i < 1000; i++) if (tolower (i) >= 'A' && tolower (i) <= 'Z') __builtin_abort (); else if (toupper (i) >= 'a' && toupper (i) <= 'z') __builtin_abort (); return 0; } fails for LC_ALL=tr_TR.UTF-8, because tolower ('I') is 'I'. Not to mention that the result is unspecified if the functions are called with a value outside of the range of unsigned char or EOF. Therefore, this optimization is invalid. Jakub
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr78888.c b/gcc/testsuite/gcc.dg/tree-ssa/pr78888.c new file mode 100644 index 00000000000..2bcddf1f2c3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr78888.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp-slim" } */ + +void g (int x) +{ + if (__builtin_toupper ((unsigned char)x) == 'a') + __builtin_abort (); +} + +void h (int x) +{ + if (__builtin_tolower ((unsigned char)x) == 'A') + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_abort" "evrp" } } */ diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 79a29bf0efb..7137a4c52ec 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -3778,6 +3778,19 @@ extract_range_basic (value_range *vr, gimple *stmt) return; } break; + case CFN_BUILT_IN_TOUPPER: + case CFN_BUILT_IN_TOLOWER: + if (tree lhs = gimple_call_lhs (stmt)) + { + tree type = TREE_TYPE (lhs); + unsigned char min = (cfn == CFN_BUILT_IN_TOUPPER) ? 'a' : 'A'; + unsigned char max = (cfn == CFN_BUILT_IN_TOUPPER) ? 'z' : 'Z'; + tree range_min = build_int_cstu (type, min); + tree range_max = build_int_cstu (type, max); + set_value_range (vr, VR_ANTI_RANGE, range_min, range_max, NULL); + return; + } + break; default: break; }