Message ID | CABXYE2X4OhMvyK-z+yQvDkcT0TgKUc2Nv+YdhXWZnJDNDMeJqg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Jul 14, 2015 at 9:13 AM, Richard Earnshaw <Richard.Earnshaw@foss.arm.com> wrote: > We went through this a couple of weeks back. The backend documentation > for PROMOTE_MODE says: I disagree that this is a fully resolved issue. I see clear problems with how the ARM port uses PROMOTE_MODE. > " For most machines, the macro definition does not change @var{unsignedp}. > However, some machines, have instructions that preferentially handle > either signed or unsigned quantities of certain modes. For example, on > the DEC Alpha, 32-bit loads from memory and 32-bit add instructions > sign-extend the result to 64 bits. On such machines, set > @var{unsignedp} according to which kind of extension is more efficient." The Alpha case is different than the ARM case. The Alpha only changes sign for 32-bit values in 64-bit registers. The alpha happens to have a nice set of instructions that operates on 32-bit values, that accepts these wrong-signed values and handle them correctly. Thus on the alpha, it appears that there are no extra zero/sign extends required, and everything is the same speed or faster with the wrong sign. The MIPS port works the same way. This is not true on the arm though. The arm port is changing sign on 8 and 16 bit value, but does not have instructions that directly operate on 8 and 16 bit values. This requires the compiler to emit extra zero/sign extend instructions that affect the performance of the code. Other than the thumb1 8-bit load, and the pre-armv6 use of andi for 8-bit zero-extend, I haven't seen anything that is faster in the wrong sign, and there are a number of operations that are actually slower because of the extra zero/sign extends required by the arm PROMOTE_MODE definition. I've given some examples. I have since noticed that the relatively new pass to optimize away unnecessary zero/sign extensions is actually fixing some of the damage caused by the ARM PROMOTE_MODE definition. But there are still some cases that won't get fixed, as per my earlier examples. It is better to emit the fast code at the beginning than to rely on an optimization pass to convert the slow code into fast code. So I think the ARM PROMOTE_MODE definition still needs to be fixed. > So clearly it anticipates that all permitted extensions should work, and > in particular it makes no mention of this having to match some > abi-mandated promotions. That makes this a MI bug not a target one. If you look at older gcc releases, like gcc-2.95.3, you will see that there is only PROMOTE_MODE and a boolean PROMOTE_FUNCTION_ARGS which says whether PROMOTE_MODE is applied to function arguments or not. You can't have different zero/sign extensions for parameters and locals in gcc-2.95.3. The fact that you can have it today is more an accident than a deliberate choice. When PROMOTE_FUNCTION_ARGS was hookized, it was made into a separate function, and for the ARM port, the code for PROMOTE_MODE was not correctly copied into the new hook, resulting in the accidental difference for parameter and local zero/sign promotion for ARM. The PROMOTE_MODE docs were written back when different sign/zero extensions for parms/locals weren't possible, and hence did not consider the case. Now that we do have the problem, we can't fix it without an ARM port ABI change, which is undesirable, so we may have to fix it with a MI change. There were two MI changes suggested, one was fixing the out-of-ssa pass to handle SUBREG_PROMOTED_P promotions. The other was to disallow creating PHI nodes between parms and locals. I haven't had a chance to try implementing the second one yet; I hope to work on that today. Jim
On Wed, Jul 15, 2015 at 6:04 AM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Tue, 14 Jul 2015, Jim Wilson wrote: > >> Now that we do have the problem, we can't fix it without an ARM port ABI >> change, which is undesirable, so we may have to fix it with a MI change. > > What's the ABI implication of fixing the inconsistency? Currently signed chars and signed shorts are passed sign-extended. If we make TARGET_PROMOTE_FUNCTION_MODE work the same as PROMOTE_MODE, then they will be passed zero-extended. Given the testcase: int sub (int) __attribute__ ((noinline)); int sub2 (signed char) __attribute__ ((noinline)); int sub (int i) { return sub2 (i); } int sub2 (signed char c) { return c & 0xff; } Currently sub will do a char sign-extend to convert the int to signed char, and sub2 will do a char zero-extend for the and. With the change, sub will do a char zero-extend to convert the int to unsigned char, and sub2 will do nothing. If you compile sub without the change and sub2 with the change, then you lose the and operation and get a sign-extended char at the end. >> There were two MI changes suggested, one was fixing the out-of-ssa pass >> to handle SUBREG_PROMOTED_P promotions. The other was to disallow >> creating PHI nodes between parms and locals. I haven't had a chance to >> try implementing the second one yet; I hope to work on that today. > > Don't bother with the latter, it doesn't have a chance of being accepted. I tried looking at it anyways, as I need to learn more about this stuff. It didn't seem feasible without changing a lot of optimization passes which doesn't seem reasonable. > If the terrible hack in outof-ssa really will be necessary (and I really > really hope it won't) then I think I prefer the approach you partly tried > in comment #12 of PR 65932 already. Let partition_to_pseudo[] refer to > the promoted subreg and deal with that situation in emit_partition_copy; > I'd then hope that the unsignedsrcp parameter could go away (unfortunately > the sizeexp will have to stay). Yes, I think that is a cleaner way to do it, but I had trouble getting that to work as I don't know enough about the code yet. Doing it directly in emit_partition_copy was easier, just to prove it could work. I can go back and try to make this work again. Jim
Index: tree-outof-ssa.c =================================================================== --- tree-outof-ssa.c (revision 225477) +++ tree-outof-ssa.c (working copy) @@ -230,11 +230,32 @@ set_location_for_edge (edge e) SRC/DEST might be BLKmode memory locations SIZEEXP is a tree from which we deduce the size to copy in that case. */ -static inline rtx_insn * -emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp) +rtx_insn * +emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp, + tree var2 ATTRIBUTE_UNUSED) { start_sequence (); + /* If var2 is set, then sizeexp is the src decl and var2 is the dest decl. */ + if (var2) + { + tree src_var = (TREE_CODE (sizeexp) == SSA_NAME + ? SSA_NAME_VAR (sizeexp) : sizeexp); + tree dest_var = (TREE_CODE (var2) == SSA_NAME + ? SSA_NAME_VAR (var2) : var2); + int src_unsignedp = TYPE_UNSIGNED (TREE_TYPE (src_var)); + int dest_unsignedp = TYPE_UNSIGNED (TREE_TYPE (dest_var)); + machine_mode src_mode = promote_decl_mode (src_var, &src_unsignedp); + machine_mode dest_mode = promote_decl_mode (dest_var, &dest_unsignedp); + if (src_unsignedp != dest_unsignedp + && src_mode != DECL_MODE (src_var) + && dest_mode != DECL_MODE (dest_var)) + { + src = gen_lowpart_common (DECL_MODE (src_var), src); + unsignedsrcp = dest_unsignedp; + } + } + if (GET_MODE (src) != VOIDmode && GET_MODE (src) != GET_MODE (dest)) src = convert_to_mode (GET_MODE (dest), src, unsignedsrcp); if (GET_MODE (src) == BLKmode) @@ -256,7 +277,7 @@ emit_partition_copy (rtx dest, rtx src, static void insert_partition_copy_on_edge (edge e, int dest, int src, source_location locus) { - tree var; + tree var, var2; if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, @@ -276,10 +297,11 @@ insert_partition_copy_on_edge (edge e, i set_curr_insn_location (locus); var = partition_to_var (SA.map, src); + var2 = partition_to_var (SA.map, dest); rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]), copy_rtx (SA.partition_to_pseudo[src]), TYPE_UNSIGNED (TREE_TYPE (var)), - var); + var, var2); insert_insn_on_edge (seq, e); } @@ -373,7 +395,8 @@ insert_rtx_to_part_on_edge (edge e, int involved), so it doesn't matter. */ rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]), src, unsignedsrcp, - partition_to_var (SA.map, dest)); + partition_to_var (SA.map, dest), 0); + insert_insn_on_edge (seq, e); } @@ -406,7 +429,7 @@ insert_part_to_rtx_on_edge (edge e, rtx rtx_insn *seq = emit_partition_copy (dest, copy_rtx (SA.partition_to_pseudo[src]), TYPE_UNSIGNED (TREE_TYPE (var)), - var); + var, 0); insert_insn_on_edge (seq, e); }