Message ID | 20221020233836.2341671-1-richard.henderson@linaro.org |
---|---|
State | Accepted |
Commit | 9b246685b3dbbf21800e3a9a09f8bed384a1fb37 |
Series | tcg/riscv: Fix reg overlap case in tcg_out_addsub2 |
On Fri, Oct 21, 2022 at 9:47 AM Richard Henderson <richard.henderson@linaro.org> wrote:
>
> There was a typo using opc_addi instead of opc_add with the
> two registers. While we're at it, simplify the gating test
> to al == bl to improve dynamic scheduling even when the
> output register does not overlap the inputs.
>
> Reported-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
> Supersedes: 20221020104154.4276-4-zhiwei_liu@linux.alibaba.com
> ("[RFC PATCH 3/3] tcg/riscv: Remove a wrong optimization for addsub2")
> ---
> tcg/riscv/tcg-target.c.inc | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
> index 81a83e45b1..1cdaf7b57b 100644
> --- a/tcg/riscv/tcg-target.c.inc
> +++ b/tcg/riscv/tcg-target.c.inc
> @@ -687,9 +687,15 @@ static void tcg_out_addsub2(TCGContext *s,
>          if (cbl) {
>              tcg_out_opc_imm(s, opc_addi, rl, al, bl);
>              tcg_out_opc_imm(s, OPC_SLTIU, TCG_REG_TMP0, rl, bl);
> -        } else if (rl == al && rl == bl) {
> +        } else if (al == bl) {
> +            /*
> +             * If the input regs overlap, this is a simple doubling
> +             * and carry-out is the input msb. This special case is
> +             * required when the output reg overlaps the input,
> +             * but we might as well use it always.
> +             */
>              tcg_out_opc_imm(s, OPC_SLTI, TCG_REG_TMP0, al, 0);
> -            tcg_out_opc_reg(s, opc_addi, rl, al, bl);
> +            tcg_out_opc_reg(s, opc_add, rl, al, al);
>          } else {
>              tcg_out_opc_reg(s, opc_add, rl, al, bl);
>              tcg_out_opc_reg(s, OPC_SLTU, TCG_REG_TMP0,
> --
> 2.34.1
>
>
On Fri, Oct 21, 2022 at 9:47 AM Richard Henderson <richard.henderson@linaro.org> wrote:
>
> There was a typo using opc_addi instead of opc_add with the
> two registers. While we're at it, simplify the gating test
> to al == bl to improve dynamic scheduling even when the
> output register does not overlap the inputs.
>
> Reported-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Thanks!

Applied to riscv-to-apply.next

Alistair
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 81a83e45b1..1cdaf7b57b 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -687,9 +687,15 @@ static void tcg_out_addsub2(TCGContext *s,
         if (cbl) {
             tcg_out_opc_imm(s, opc_addi, rl, al, bl);
             tcg_out_opc_imm(s, OPC_SLTIU, TCG_REG_TMP0, rl, bl);
-        } else if (rl == al && rl == bl) {
+        } else if (al == bl) {
+            /*
+             * If the input regs overlap, this is a simple doubling
+             * and carry-out is the input msb. This special case is
+             * required when the output reg overlaps the input,
+             * but we might as well use it always.
+             */
             tcg_out_opc_imm(s, OPC_SLTI, TCG_REG_TMP0, al, 0);
-            tcg_out_opc_reg(s, opc_addi, rl, al, bl);
+            tcg_out_opc_reg(s, opc_add, rl, al, al);
         } else {
             tcg_out_opc_reg(s, opc_add, rl, al, bl);
             tcg_out_opc_reg(s, OPC_SLTU, TCG_REG_TMP0,
There was a typo using opc_addi instead of opc_add with the
two registers. While we're at it, simplify the gating test
to al == bl to improve dynamic scheduling even when the
output register does not overlap the inputs.

Reported-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
Supersedes: 20221020104154.4276-4-zhiwei_liu@linux.alibaba.com
("[RFC PATCH 3/3] tcg/riscv: Remove a wrong optimization for addsub2")
---
 tcg/riscv/tcg-target.c.inc | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
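
For readers who want to check the reasoning behind the al == bl branch, here is a rough sketch in plain C. It is not QEMU code. It models registers as slots in an array and covers only the 64-bit case. The register names (R_X, R_Y, R_Z), the helper names, and the choice of compare operand in the general sequence are assumptions made for illustration; the hunk above cuts off before the SLTU operands, so that part in particular is a guess at the shape, not a quote.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file; the names are illustrative, not QEMU's. */
enum { R_TMP0, R_X, R_Y, R_Z, NREGS };

/* Reference: carry out of the 64-bit addition a + b. */
unsigned carry_ref(uint64_t a, uint64_t b)
{
    return (uint64_t)(a + b) < a;
}

/*
 * General sequence: add rl, al, bl ; sltu tmp0, rl, <input>.
 * Assumption: the compare reads whichever input register was not
 * overwritten by the add, when there is one.
 */
unsigned addlo_general(uint64_t *r, int rl, int al, int bl)
{
    int cmp = (rl == bl) ? al : bl;
    r[rl] = r[al] + r[bl];
    r[R_TMP0] = r[rl] < r[cmp];
    return (unsigned)r[R_TMP0];
}

/* Doubling sequence from the patch: slti tmp0, al, 0 ; add rl, al, al. */
unsigned addlo_doubling(uint64_t *r, int rl, int al)
{
    r[R_TMP0] = (r[al] >> 63) & 1;  /* "al < 0" signed is true iff the msb is set */
    r[rl] = r[al] + r[al];
    return (unsigned)r[R_TMP0];
}
```

With all three operands in the same register, addlo_general compares the sum against itself and always reports no carry, which is the overlap case the special branch exists for. addlo_doubling reads the msb before the add overwrites the input, so it gives the right carry whether or not rl overlaps al.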