[RFC,v3,00/34] Hexagon patch series

Message ID	1597765847-16637-1-git-send-email-tsimpson@quicinc.com
Headers	show Return-Path: <SRS0=kYy5=B4=nongnu.org=qemu-devel-bounces+qemu-devel=archiver.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 03BDE20888 From: Taylor Simpson <tsimpson@quicinc.com> To: qemu-devel@nongnu.org Subject: [RFC PATCH v3 00/34] Hexagon patch series Date: Tue, 18 Aug 2020 10:50:13 -0500 Message-Id: <1597765847-16637-1-git-send-email-tsimpson@quicinc.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Received-SPF: pass client-ip=199.106.114.38; envelope-from=tsimpson@qualcomm.com; helo=alexa-out-sd-01.qualcomm.com Precedence: list Cc: ale@rev.ng, riku.voipio@iki.fi, richard.henderson@linaro.org, laurent@vivier.eu, tsimpson@quicinc.com, philmd@redhat.com, aleksandar.m.mail@gmail.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>
Series	Hexagon patch series \| expand [RFC,v3,00/34] Hexagon patch series [RFC,v3,04/34] Hexagon (target/hexagon) scalar core definition [RFC,v3,05/34] Hexagon (target/hexagon) register names [RFC,v3,07/34] Hexagon (target/hexagon) scalar core helpers [RFC,v3,10/34] Hexagon (target/hexagon) instruction and packet types [RFC,v3,12/34] Hexagon (target/hexagon) instruction attributes [RFC,v3,13/34] Hexagon (target/hexagon) register map [RFC,v3,14/34] Hexagon (target/hexagon) instruction/packet decode [RFC,v3,17/34] Hexagon (target/hexagon/imported) arch import - macro definitions [RFC,v3,19/34] Hexagon (target/hexagon/imported) arch import - instruction encoding [RFC,v3,21/34] Hexagon (target/hexagon) generator phase 2 - generate header files [RFC,v3,22/34] Hexagon (target/hexagon) generator phase 3 - C preprocessor for decode tree [RFC,v3,23/34] Hexagon (target/hexagon) generater phase 4 - decode tree [RFC,v3,25/34] Hexagon (target/hexagon) macros to interface with the generator [RFC,v3,31/34] Hexagon (target/hexagon) translation [RFC,v3,33/34] Hexagon (tests/tcg/hexagon) TCG tests [RFC,v3,34/34] Hexagon build infrastructure

Taylor Simpson Aug. 18, 2020, 3:50 p.m. UTC

This series adds support for the Hexagon processor with Linux user support

See patch 02/34 Hexagon README for detailed information.

Once the series is applied, the Hexagon port will pass "make check-tcg".
The series also includes Hexagon-specific tests in tcg/tests/hexagon.

We have a parallel effort to make the Hexagon Linux toolchain inside a docker
container publically available.

*** Future items under consideration ***
Use qemu softfloat
Use qemu decodetree

*** Known checkpatch issues ***

The following are known checkpatch errors in the series
    target/hexagon/reg_fields.h         Complex macro
    target/hexagon/attribs.h            Complex macro
    target/hexagon/decode.c             Complex macro
    target/hexagon/q6v_decode.c         Macro needs do - while
    target/hexagon/printinsn.c          Macro needs do - while
    target/hexagon/gen_semantics.c      Suspicious ; after while (0)
    target/hexagon/gen_dectree_import.c Complex macro
    target/hexagon/gen_dectree_import.c Suspicious ; after while (0)
    target/hexagon/opcodes.c            Complex macro
    target/hexagon/iclass.h             Complex macro
    scripts/qemu-binfmt-conf.sh         Line over 90 characters

The following are known checkpatch warnings in the series
    target/hexagon/fma_emu.c            Comments inside macro definition
    scripts/qemu-binfmt-conf.sh         Line over 80 characters

*** Changes in v3 ***
Remove substantial portions of the code to facilitate review
- Plan to submit subsequent patches
- Hexagon Vector eXtensions (HVX)
- Circular and bit-reverse addressiong
- Add/sub-with-carry
- Unused insn_t and pkt_t fields
- Unused instruction attributes
- All TCG overrides except instructions with multiple definitions
- Unused macros
- Unused reg fields
- COUNT_HEX_HELPERS
Use Laurent's gensyscall.sh script to generate linux-user/hexagon/syscall_nr.h
Handle mem_noshuf
Remove "RsV = RsV" per review feedback
Simplify include file structure
Add directed tests in <qemu>/tests/tcg/hexagon
Rework the python scripts to generate one header file at a time
Change fWRAP_* macros to fGEN_TCG_*

*** Changes in v2 ***
- Use scripts/git.orderfile

Taylor Simpson (34):
  Hexagon Update MAINTAINERS file
  Hexagon (target/hexagon) README
  Hexagon (include/elf.h) ELF machine definition
  Hexagon (target/hexagon) scalar core definition
  Hexagon (target/hexagon) register names
  Hexagon (disas) disassembler
  Hexagon (target/hexagon) scalar core helpers
  Hexagon (target/hexagon) GDB Stub
  Hexagon (target/hexagon) architecture types
  Hexagon (target/hexagon) instruction and packet types
  Hexagon (target/hexagon) register fields
  Hexagon (target/hexagon) instruction attributes
  Hexagon (target/hexagon) register map
  Hexagon (target/hexagon) instruction/packet decode
  Hexagon (target/hexagon) instruction printing
  Hexagon (target/hexagon) utility functions
  Hexagon (target/hexagon/imported) arch import - macro definitions
  Hexagon (target/hexagon/imported) arch import - instruction semantics
  Hexagon (target/hexagon/imported) arch import - instruction encoding
  Hexagon (target/hexagon) generator phase 1 - C preprocessor for
    semantics
  Hexagon (target/hexagon) generator phase 2 - generate header files
  Hexagon (target/hexagon) generator phase 3 - C preprocessor for decode
    tree
  Hexagon (target/hexagon) generater phase 4 - decode tree
  Hexagon (target/hexagon) opcode data structures
  Hexagon (target/hexagon) macros to interface with the generator
  Hexagon (target/hexagon) macros referenced in instruction semantics
  Hexagon (target/hexagon) instruction classes
  Hexagon (target/hexagon) TCG generation helpers
  Hexagon (target/hexagon) TCG generation
  Hexagon (target/hexagon) TCG for instructions with multiple
    definitions
  Hexagon (target/hexagon) translation
  Hexagon (linux-user/hexagon) Linux user emulation
  Hexagon (tests/tcg/hexagon) TCG tests
  Hexagon build infrastructure

 configure                                  |    9 +
 default-configs/hexagon-linux-user.mak     |    1 +
 include/disas/dis-asm.h                    |    1 +
 include/elf.h                              |    2 +
 linux-user/hexagon/sockbits.h              |   18 +
 linux-user/hexagon/syscall_nr.h            |  343 +++++
 linux-user/hexagon/target_cpu.h            |   44 +
 linux-user/hexagon/target_elf.h            |   40 +
 linux-user/hexagon/target_fcntl.h          |   18 +
 linux-user/hexagon/target_signal.h         |   34 +
 linux-user/hexagon/target_structs.h        |   46 +
 linux-user/hexagon/target_syscall.h        |   32 +
 linux-user/hexagon/termbits.h              |   18 +
 linux-user/syscall_defs.h                  |   33 +
 target/hexagon/arch.h                      |   42 +
 target/hexagon/attribs.h                   |   32 +
 target/hexagon/attribs_def.h               |   98 ++
 target/hexagon/conv_emu.h                  |   50 +
 target/hexagon/cpu-param.h                 |   26 +
 target/hexagon/cpu.h                       |  164 +++
 target/hexagon/cpu_bits.h                  |   34 +
 target/hexagon/decode.h                    |   39 +
 target/hexagon/fma_emu.h                   |   27 +
 target/hexagon/gen_tcg.h                   |  198 +++
 target/hexagon/genptr.h                    |   25 +
 target/hexagon/genptr_helpers.h            |  244 ++++
 target/hexagon/helper.h                    |   35 +
 target/hexagon/hex_arch_types.h            |   43 +
 target/hexagon/hex_regs.h                  |   83 ++
 target/hexagon/iclass.h                    |   46 +
 target/hexagon/insn.h                      |   75 +
 target/hexagon/internal.h                  |   42 +
 target/hexagon/macros.h                    |  982 +++++++++++++
 target/hexagon/opcodes.h                   |   66 +
 target/hexagon/printinsn.h                 |   26 +
 target/hexagon/reg_fields.h                |   40 +
 target/hexagon/reg_fields_def.h            |   78 +
 target/hexagon/regmap.h                    |   38 +
 target/hexagon/translate.h                 |  103 ++
 disas/hexagon.c                            |   62 +
 linux-user/elfload.c                       |   16 +
 linux-user/hexagon/cpu_loop.c              |   99 ++
 linux-user/hexagon/signal.c                |  276 ++++
 linux-user/syscall.c                       |    2 +
 target/hexagon/arch.c                      |  354 +++++
 target/hexagon/conv_emu.c                  |  369 +++++
 target/hexagon/cpu.c                       |  318 +++++
 target/hexagon/decode.c                    |  593 ++++++++
 target/hexagon/fma_emu.c                   |  781 ++++++++++
 target/hexagon/gdbstub.c                   |   49 +
 target/hexagon/gen_dectree_import.c        |  191 +++
 target/hexagon/gen_semantics.c             |   88 ++
 target/hexagon/genptr.c                    |   56 +
 target/hexagon/iclass.c                    |   88 ++
 target/hexagon/op_helper.c                 |  365 +++++
 target/hexagon/opcodes.c                   |  211 +++
 target/hexagon/printinsn.c                 |   94 ++
 target/hexagon/q6v_decode.c                |  373 +++++
 target/hexagon/reg_fields.c                |   28 +
 target/hexagon/translate.c                 |  730 ++++++++++
 tests/tcg/hexagon/atomics.c                |  122 ++
 tests/tcg/hexagon/clrtnew.c                |   56 +
 tests/tcg/hexagon/dual_stores.c            |   60 +
 tests/tcg/hexagon/exec_counters.c          |   57 +
 tests/tcg/hexagon/mem_noshuf.c             |  291 ++++
 tests/tcg/hexagon/misc.c                   |  293 ++++
 tests/tcg/hexagon/preg_alias.c             |  106 ++
 tests/tcg/hexagon/pthread_cancel.c         |   43 +
 tests/tcg/hexagon/sfminmax.c               |   62 +
 MAINTAINERS                                |    8 +
 disas/Makefile.objs                        |    1 +
 scripts/gensyscalls.sh                     |    3 +-
 scripts/qemu-binfmt-conf.sh                |    6 +-
 target/hexagon/Makefile.objs               |  203 +++
 target/hexagon/README                      |  254 ++++
 target/hexagon/dectree.py                  |  352 +++++
 target/hexagon/gen_helper_funcs.py         |  230 +++
 target/hexagon/gen_helper_protos.py        |  158 +++
 target/hexagon/gen_op_attribs.py           |   46 +
 target/hexagon/gen_op_regs.py              |  119 ++
 target/hexagon/gen_opcodes_def.py          |   43 +
 target/hexagon/gen_printinsn.py            |  182 +++
 target/hexagon/gen_shortcode.py            |   71 +
 target/hexagon/gen_tcg_funcs.py            |  301 ++++
 target/hexagon/hex_common.py               |  204 +++
 target/hexagon/imported/allidefs.def       |   30 +
 target/hexagon/imported/alu.idef           | 1259 +++++++++++++++++
 target/hexagon/imported/branch.idef        |  328 +++++
 target/hexagon/imported/compare.idef       |  621 ++++++++
 target/hexagon/imported/encode.def         |  125 ++
 target/hexagon/imported/encode_pp.def      | 2110 ++++++++++++++++++++++++++++
 target/hexagon/imported/encode_subinsn.def |  150 ++
 target/hexagon/imported/float.idef         |  313 +++++
 target/hexagon/imported/iclass.def         |   52 +
 target/hexagon/imported/ldst.idef          |  286 ++++
 target/hexagon/imported/macros.def         | 1529 ++++++++++++++++++++
 target/hexagon/imported/mpy.idef           | 1212 ++++++++++++++++
 target/hexagon/imported/shift.idef         | 1067 ++++++++++++++
 target/hexagon/imported/subinsns.idef      |  152 ++
 target/hexagon/imported/system.idef        |   69 +
 tests/tcg/configure.sh                     |    4 +-
 tests/tcg/hexagon/Makefile.target          |   49 +
 tests/tcg/hexagon/first.S                  |   57 +
 tests/tcg/hexagon/float_convs.ref          |  748 ++++++++++
 tests/tcg/hexagon/float_madds.ref          |  768 ++++++++++
 105 files changed, 22615 insertions(+), 3 deletions(-)
 create mode 100644 default-configs/hexagon-linux-user.mak
 create mode 100644 linux-user/hexagon/sockbits.h
 create mode 100644 linux-user/hexagon/syscall_nr.h
 create mode 100644 linux-user/hexagon/target_cpu.h
 create mode 100644 linux-user/hexagon/target_elf.h
 create mode 100644 linux-user/hexagon/target_fcntl.h
 create mode 100644 linux-user/hexagon/target_signal.h
 create mode 100644 linux-user/hexagon/target_structs.h
 create mode 100644 linux-user/hexagon/target_syscall.h
 create mode 100644 linux-user/hexagon/termbits.h
 create mode 100644 target/hexagon/arch.h
 create mode 100644 target/hexagon/attribs.h
 create mode 100644 target/hexagon/attribs_def.h
 create mode 100644 target/hexagon/conv_emu.h
 create mode 100644 target/hexagon/cpu-param.h
 create mode 100644 target/hexagon/cpu.h
 create mode 100644 target/hexagon/cpu_bits.h
 create mode 100644 target/hexagon/decode.h
 create mode 100644 target/hexagon/fma_emu.h
 create mode 100644 target/hexagon/gen_tcg.h
 create mode 100644 target/hexagon/genptr.h
 create mode 100644 target/hexagon/genptr_helpers.h
 create mode 100644 target/hexagon/helper.h
 create mode 100644 target/hexagon/hex_arch_types.h
 create mode 100644 target/hexagon/hex_regs.h
 create mode 100644 target/hexagon/iclass.h
 create mode 100644 target/hexagon/insn.h
 create mode 100644 target/hexagon/internal.h
 create mode 100644 target/hexagon/macros.h
 create mode 100644 target/hexagon/opcodes.h
 create mode 100644 target/hexagon/printinsn.h
 create mode 100644 target/hexagon/reg_fields.h
 create mode 100644 target/hexagon/reg_fields_def.h
 create mode 100644 target/hexagon/regmap.h
 create mode 100644 target/hexagon/translate.h
 create mode 100644 disas/hexagon.c
 create mode 100644 linux-user/hexagon/cpu_loop.c
 create mode 100644 linux-user/hexagon/signal.c
 create mode 100644 target/hexagon/arch.c
 create mode 100644 target/hexagon/conv_emu.c
 create mode 100644 target/hexagon/cpu.c
 create mode 100644 target/hexagon/decode.c
 create mode 100644 target/hexagon/fma_emu.c
 create mode 100644 target/hexagon/gdbstub.c
 create mode 100644 target/hexagon/gen_dectree_import.c
 create mode 100644 target/hexagon/gen_semantics.c
 create mode 100644 target/hexagon/genptr.c
 create mode 100644 target/hexagon/iclass.c
 create mode 100644 target/hexagon/op_helper.c
 create mode 100644 target/hexagon/opcodes.c
 create mode 100644 target/hexagon/printinsn.c
 create mode 100644 target/hexagon/q6v_decode.c
 create mode 100644 target/hexagon/reg_fields.c
 create mode 100644 target/hexagon/translate.c
 create mode 100644 tests/tcg/hexagon/atomics.c
 create mode 100644 tests/tcg/hexagon/clrtnew.c
 create mode 100644 tests/tcg/hexagon/dual_stores.c
 create mode 100644 tests/tcg/hexagon/exec_counters.c
 create mode 100644 tests/tcg/hexagon/mem_noshuf.c
 create mode 100644 tests/tcg/hexagon/misc.c
 create mode 100644 tests/tcg/hexagon/preg_alias.c
 create mode 100644 tests/tcg/hexagon/pthread_cancel.c
 create mode 100644 tests/tcg/hexagon/sfminmax.c
 create mode 100644 target/hexagon/Makefile.objs
 create mode 100644 target/hexagon/README
 create mode 100755 target/hexagon/dectree.py
 create mode 100755 target/hexagon/gen_helper_funcs.py
 create mode 100755 target/hexagon/gen_helper_protos.py
 create mode 100755 target/hexagon/gen_op_attribs.py
 create mode 100755 target/hexagon/gen_op_regs.py
 create mode 100755 target/hexagon/gen_opcodes_def.py
 create mode 100755 target/hexagon/gen_printinsn.py
 create mode 100755 target/hexagon/gen_shortcode.py
 create mode 100755 target/hexagon/gen_tcg_funcs.py
 create mode 100755 target/hexagon/hex_common.py
 create mode 100644 target/hexagon/imported/allidefs.def
 create mode 100644 target/hexagon/imported/alu.idef
 create mode 100644 target/hexagon/imported/branch.idef
 create mode 100644 target/hexagon/imported/compare.idef
 create mode 100644 target/hexagon/imported/encode.def
 create mode 100644 target/hexagon/imported/encode_pp.def
 create mode 100644 target/hexagon/imported/encode_subinsn.def
 create mode 100644 target/hexagon/imported/float.idef
 create mode 100644 target/hexagon/imported/iclass.def
 create mode 100644 target/hexagon/imported/ldst.idef
 create mode 100755 target/hexagon/imported/macros.def
 create mode 100644 target/hexagon/imported/mpy.idef
 create mode 100644 target/hexagon/imported/shift.idef
 create mode 100644 target/hexagon/imported/subinsns.idef
 create mode 100644 target/hexagon/imported/system.idef
 create mode 100644 tests/tcg/hexagon/Makefile.target
 create mode 100644 tests/tcg/hexagon/first.S
 create mode 100644 tests/tcg/hexagon/float_convs.ref
 create mode 100644 tests/tcg/hexagon/float_madds.ref

Richard Henderson Aug. 29, 2020, 1:37 a.m. UTC | #1

On 8/18/20 8:50 AM, Taylor Simpson wrote:
> +} iclass_t;
...
> +extern const char *find_iclass_slots(opcode_t opcode, int itype);
...
> +typedef struct {
> +    const char * const slots;
> +} iclass_info_t;

I'll note that you aren't following our CODING_STYLE for types.  Which of these
need to match imported/ and which can you fix.

> +typedef enum {
> +
> +#define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS)    ICLASS_FROM_TYPE(TYPE),
> +#define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS)    /* nothing */
> +#include "imported/iclass.def"
> +#undef DEF_PP_ICLASS32
> +#undef DEF_EE_ICLASS32
> +
> +#define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS)    ICLASS_FROM_TYPE(TYPE),
> +#define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS)    /* nothing */
> +#include "imported/iclass.def"
> +#undef DEF_PP_ICLASS32
> +#undef DEF_EE_ICLASS32

Any particular reason why you're defining one as nothing, and doing this dance
twice?

> +const char *find_iclass_slots(opcode_t opcode, int itype)
> +{
> +    /* There are some exceptions to what the iclass dictates */
> +    if (GET_ATTRIB(opcode, A_ICOP)) {
> +        return "2";

Why are you returning a string and not a simple bitmask?

> +    [ICLASS_FROM_TYPE(TYPE)] = { .slots = #SLOTS },

This could be converted to the bitmask with

enum {
    SLOTS_0  = (1 << 0),
    SLOTS_1  = (1 << 1),
    SLOTS_23 = (1 << 2) | (1 << 3),
    ...
};

static const uint8_t iclass_info[] = {

#define DEF_ICLASS(TYPE, SLOTS) \
    [ICLASS_##TYPE] = SLOTS_##SLOTS
#define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS) \
    DEF_ICLASS(TYPE, SLOTS)
#define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS) \
    DEF_ICLASS(TYPE, SLOTS)

At which point the ultimate consumer,

>     for (i = 0, slot = 3; i < pkt->num_insns; i++) {
>         valid_slot_str = get_valid_slot_str(pkt, i);
> 
>         while (strchr(valid_slot_str, '0' + slot) == NULL) {
>             slot--;
>         }
>         pkt->insn[i].slot = slot;

becomes a simple integer mask check.


r~

Richard Henderson Aug. 29, 2020, 1:58 a.m. UTC | #2

On 8/18/20 8:50 AM, Taylor Simpson wrote:
> +/* Fill in the table with NULLs because not all the opcodes have DEF_QEMU */
> +semantic_insn_t opcode_genptr[] = {
> +#define OPCODE(X)                              NULL
> +#include "opcodes_def_generated.h"
> +    NULL
> +#undef OPCODE
> +};
> +
> +/* This function overwrites the NULL entries where we have a DEF_QEMU */
> +void init_genptr(void)
> +{
> +#define DEF_TCG_FUNC(TAG, GENFN) \
> +    opcode_genptr[TAG] = generate_##TAG;
> +#include "tcg_funcs_generated.h"
> +#undef DEF_TCG_FUNC
> +}

Just size the array properly to start.

const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = {

#define DEF_TCG_FUNC(TAG, GENFN) \
    [TAG] = generate_##TAG,
#include "tcg_funcs_generated.h"

};

Richard Henderson Aug. 29, 2020, 2:59 a.m. UTC | #3

On 8/18/20 8:50 AM, Taylor Simpson wrote:
> Implementation of Linux user emulation for Hexagon
> Some common files modified in addition to new files in linux-user/hexagon
> 
> Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> ---

Looks plausible.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

Taylor Simpson Aug. 30, 2020, 7:49 p.m. UTC | #4

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Friday, August 28, 2020 7:58 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 29/34] Hexagon (target/hexagon) TCG

> generation

>

> On 8/18/20 8:50 AM, Taylor Simpson wrote:

> > +/* Fill in the table with NULLs because not all the opcodes have

> DEF_QEMU */

> > +semantic_insn_t opcode_genptr[] = {

> > +#define OPCODE(X)                              NULL

> > +#include "opcodes_def_generated.h"

> > +    NULL

> > +#undef OPCODE

> > +};

> > +

> > +/* This function overwrites the NULL entries where we have a DEF_QEMU

> */

> > +void init_genptr(void)

> > +{

> > +#define DEF_TCG_FUNC(TAG, GENFN) \

> > +    opcode_genptr[TAG] = generate_##TAG;

> > +#include "tcg_funcs_generated.h"

> > +#undef DEF_TCG_FUNC

> > +}

>

> Just size the array properly to start.

>

> const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = {

>

> #define DEF_TCG_FUNC(TAG, GENFN) \

>     [TAG] = generate_##TAG,

> #include "tcg_funcs_generated.h"

>

> };


OK

Taylor Simpson Aug. 30, 2020, 7:53 p.m. UTC | #5

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Friday, August 28, 2020 7:49 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 28/34] Hexagon (target/hexagon) TCG

> generation helpers

>

> On 8/18/20 8:50 AM, Taylor Simpson wrote:

> > Helpers for reading and writing registers

> > Helpers for load-locked/store-conditional

> >

> > Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>

> > ---

> >  target/hexagon/genptr_helpers.h | 244

> ++++++++++++++++++++++++++++++++++++++++

> >  target/hexagon/op_helper.c      |  18 +++

> >  2 files changed, 262 insertions(+)

> >  create mode 100644 target/hexagon/genptr_helpers.h

> >

> > diff --git a/target/hexagon/genptr_helpers.h

> b/target/hexagon/genptr_helpers.h

> > new file mode 100644

> > index 0000000..ffcb1e3

> > --- /dev/null

> > +++ b/target/hexagon/genptr_helpers.h

> > @@ -0,0 +1,244 @@

> > +

> > +static inline void gen_log_reg_write(int rnum, TCGv val, int slot,

> > +                                     int is_predicated)

>

> These are quite large.  Why are they marked inline?


Since this is a header file, it prevents the compiler from complaining when they aren't used.

>

> > +        /* Low word */

> > +        tcg_gen_extrl_i64_i32(val32, val);

> > +        tcg_gen_mov_tl(hex_new_value[rnum], val32);

>

> Why are you extracting into a temporary?

> This could be done with

>

>     tcg_gen_extr_i64_i32(hex_new_value[rnum],

>                          hex_new_value[rnum + 1], val);


OK

> > +static inline void gen_read_p3_0(TCGv control_reg)

> > +{

> > +    TCGv pval = tcg_temp_new();

> > +    int i;

> > +    tcg_gen_movi_tl(control_reg, 0);

> > +    for (i = NUM_PREGS - 1; i >= 0; i--) {

> > +        tcg_gen_shli_tl(control_reg, control_reg, 8);

> > +        tcg_gen_andi_tl(pval, hex_pred[i], 0xff);

> > +        tcg_gen_or_tl(control_reg, control_reg, pval);

>

> tcg_gen_deposit_tl(control_reg, control_reg,

>                    hex_pred[i], i * 8, 8);


OK

> > +    for (i = 0; i < NUM_PREGS; i++) {

> > +        tcg_gen_andi_tl(pred_val, control_reg, 0xff);

> > +        tcg_gen_mov_tl(hex_pred[i], pred_val);

> > +        tcg_gen_shri_tl(control_reg, control_reg, 8);

>

> tcg_gen_extract_tl(hex_pred[i], control_reg, i * 8, 8);


OK

> > +static inline void log_store32(CPUHexagonState *env, target_ulong addr,

> > +                               int32_t val, int width, int slot)

> > +{

> > +    HEX_DEBUG_LOG("log_store%d(0x%x, %d [0x%x])\n", width, addr, val,

> val);

> > +    env->mem_log_stores[slot].va = addr;

> > +    env->mem_log_stores[slot].width = width;

> > +    env->mem_log_stores[slot].data32 = val;

> > +}

> > +

> > +static inline void log_store64(CPUHexagonState *env, target_ulong addr,

> > +                               int64_t val, int width, int slot)

> > +{

> > +    HEX_DEBUG_LOG("log_store%d(0x%x, %ld [0x%lx])\n", width, addr,

> val, val);

> > +    env->mem_log_stores[slot].va = addr;

> > +    env->mem_log_stores[slot].width = width;

> > +    env->mem_log_stores[slot].data64 = val;

> > +}

>

> ... or fold this re-addition back into where it was accidentally removed.  ;-)


Could you elaborate?

Taylor Simpson Aug. 30, 2020, 8:23 p.m. UTC | #6

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Friday, August 28, 2020 7:17 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros

> referenced in instruction semantics

>

> On 8/18/20 8:50 AM, Taylor Simpson wrote:

> > + * For qemu, we look for a load in slot 0 when there is  a store in slot 1

> > + * in the same packet.  When we see this, we call a helper that merges the

> > + * bytes from the store buffer with the value loaded from memory.

> > + */

> > +#define CHECK_NOSHUF(DST, VA, SZ, SIGN) \

> > +    do { \

> > +        if (insn->slot == 0 && pkt->pkt_has_store_s1) { \

> > +            gen_helper_merge_inflight_store##SZ##SIGN(DST, cpu_env, VA,

> DST); \

> > +        } \

> > +    } while (0)

>

> Ah, so I see what you're trying to do with the merge thing, which had the

> host-endian problems.

>

> I think the merge stuff is a mistake.  I think you can get the semantics that

> you want with

>

> probe_read(ld_addr, ld_len)

> qemu_st(st_value, st_addr)

> qemu_ld(ld_value, ld_addr)

>

> In this way, all exceptions are recognized before the store is complete, the

> normal memory operations handle any possible overlap.


So, do this inside the helper?  Or is there a way to generate TCG code?

>

> > +#define fINSERT_BITS(REG, WIDTH, OFFSET, INVAL) \

> > +    do { \

> > +        REG = ((REG) & ~(((fCONSTLL(1) << (WIDTH)) - 1) << (OFFSET))) | \

> > +           (((INVAL) & ((fCONSTLL(1) << (WIDTH)) - 1)) << (OFFSET)); \

> > +    } while (0)

>

> reg = deposit32(reg, offset, width, inval)


OK

> > +#define fEXTRACTU_BITS(INREG, WIDTH, OFFSET) \

> > +    (fZXTN(WIDTH, 32, (INREG >> OFFSET)))

> > +#define fEXTRACTU_BIDIR(INREG, WIDTH, OFFSET) \

> > +    (fZXTN(WIDTH, 32, fBIDIR_LSHIFTR((INREG), (OFFSET), 4_8)))

> > +#define fEXTRACTU_RANGE(INREG, HIBIT, LOWBIT) \

> > +    (fZXTN((HIBIT - LOWBIT + 1), 32, (INREG >> LOWBIT)))

>

> extract32(inreg, offset, width)


OK

> > +#define fZXTN(N, M, VAL) ((VAL) & ((1LL << (N)) - 1))

>

> extract32(VAL, 0, n)


OK

> > +#define fSXTN(N, M, VAL) \

> > +    ((fZXTN(N, M, VAL) ^ (1LL << ((N) - 1))) - (1LL << ((N) - 1)))

>

> sextract32(val, 0, n)


OK

> > +#define fRND(A) (((A) + 1) >> 1)

>

> Does A have a type that won't overflow?

> For Arm we write this as

>

>     (A >> 1) + (A & 1)


Will investigate

> > +#define fDCFETCH(REG) do { REG = REG; } while (0) /* Nothing to do in

> qemu */

> > +#define fICINVA(REG) do { REG = REG; } while (0) /* Nothing to do in

> qemu */

> > +#define fL2FETCH(ADDR, HEIGHT, WIDTH, STRIDE, FLAGS)

> > +#define fDCCLEANA(REG) do { REG = REG; } while (0) /* Nothing to do in

> qemu */

> > +#define fDCCLEANINVA(REG) \

> > +    do { REG = REG; } while (0) /* Nothing to do in qemu */

>

> Is this "R=R" thing trying to avoid a compiler warning?

> Perhaps "(void)R" would be sufficient to avoid that?


Yes, it avoids a compiler warning.  Will change to (void)

> > -static inline void log_store32(CPUHexagonState *env, target_ulong addr,

> > -                               target_ulong val, int width, int slot)

> > -{

> > -    HEX_DEBUG_LOG("log_store%d(0x" TARGET_FMT_lx ", "

> TARGET_FMT_ld

> > -                  " [0x" TARGET_FMT_lx "])\n",

> > -                  width, addr, val, val);

> > -    env->mem_log_stores[slot].va = addr;

> > -    env->mem_log_stores[slot].width = width;

> > -    env->mem_log_stores[slot].data32 = val;

> > -}

> > -

> > -static inline void log_store64(CPUHexagonState *env, target_ulong addr,

> > -                               int64_t val, int width, int slot)

> > -{

> > -    HEX_DEBUG_LOG("log_store%d(0x" TARGET_FMT_lx ", %ld [0x%lx])\n",

> > -                   width, addr, val, val);

> > -    env->mem_log_stores[slot].va = addr;

> > -    env->mem_log_stores[slot].width = width;

> > -    env->mem_log_stores[slot].data64 = val;

> > -}

> > -

>

> Fold this back to wherever it came from.  Clearly no need to introduce it in

> the first place.


These are invoked by the MEM_STORE macros.  Are you saying to put this code there?

Richard Henderson Aug. 30, 2020, 8:43 p.m. UTC | #7

On 8/30/20 1:04 PM, Taylor Simpson wrote:
> So, this should be CamelCase?  I should be able to fix all of them.

Yes, they should.  Thanks.


r~

Taylor Simpson Aug. 30, 2020, 8:47 p.m. UTC | #8

Richard,

Thank you so much for the feedback.  I really appreciate it.

I'll get to work addressing the issues.  Since some of the items will take longer than others, please advise whether it's preferred to send intermediate updates or wait until they are all addressed.

Taylor


> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Friday, August 28, 2020 9:27 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 00/34] Hexagon patch series

>

> On 8/18/20 8:50 AM, Taylor Simpson wrote:

> > This series adds support for the Hexagon processor with Linux user support

> >

> > See patch 02/34 Hexagon README for detailed information.

> >

> > Once the series is applied, the Hexagon port will pass "make check-tcg".

> > The series also includes Hexagon-specific tests in tcg/tests/hexagon.

> >

> > We have a parallel effort to make the Hexagon Linux toolchain inside a

> docker

> > container publically available.

>

> Oh, excellent.

>

> > *** Future items under consideration ***

> > Use qemu softfloat

>

> This is a blocker.  It's definitely not hard.

>

> > Use qemu decodetree

>

> This would certainly clean up all of the string processing that I mentioned.

> It seems like it would be just as easy to convert into the decodetree format

> as

> what you're currently doing for dectree_generated.h.  Indeed, easier, since

> you

> only need the ones that are TERMINAL().  All of the other things labeled

> TABLE_LINK are handled by decodetree itself.

>

> Anyway, I've completed what review I planed to do against this version.

>

>

> r~

Richard Henderson Aug. 30, 2020, 8:52 p.m. UTC | #9

On 8/30/20 12:53 PM, Taylor Simpson wrote:
>>> +++ b/target/hexagon/genptr_helpers.h
>>> @@ -0,0 +1,244 @@
>>> +
>>> +static inline void gen_log_reg_write(int rnum, TCGv val, int slot,
>>> +                                     int is_predicated)
>>
>> These are quite large.  Why are they marked inline?
> 
> Since this is a header file, it prevents the compiler from complaining when they aren't used.

Ok, why are they in a header file?
Why would they be unused, come to that?

The header file is used exactly once, by genptr.c.  Seems to me they could just
as well be *in* genptr.c.

If the functions are not used, just remove them?


>>> +static inline void log_store32(CPUHexagonState *env, target_ulong addr,
>>> +                               int32_t val, int width, int slot)
>>> +{
>>> +    HEX_DEBUG_LOG("log_store%d(0x%x, %d [0x%x])\n", width, addr, val,
>> val);
>>> +    env->mem_log_stores[slot].va = addr;
>>> +    env->mem_log_stores[slot].width = width;
>>> +    env->mem_log_stores[slot].data32 = val;
>>> +}
>>> +
>>> +static inline void log_store64(CPUHexagonState *env, target_ulong addr,
>>> +                               int64_t val, int width, int slot)
>>> +{
>>> +    HEX_DEBUG_LOG("log_store%d(0x%x, %ld [0x%lx])\n", width, addr,
>> val, val);
>>> +    env->mem_log_stores[slot].va = addr;
>>> +    env->mem_log_stores[slot].width = width;
>>> +    env->mem_log_stores[slot].data64 = val;
>>> +}
>>
>> ... or fold this re-addition back into where it was accidentally removed.  ;-)
> 
> Could you elaborate?

You added this code in one patch (didn't check which), removed it in patch 26,
and re-added it here in patch 28.


r~

Richard Henderson Aug. 30, 2020, 9:06 p.m. UTC | #10

On 8/30/20 1:23 PM, Taylor Simpson wrote:
>> I think the merge stuff is a mistake.  I think you can get the semantics that
>> you want with
>>
>> probe_read(ld_addr, ld_len)
>> qemu_st(st_value, st_addr)
>> qemu_ld(ld_value, ld_addr)
>>
>> In this way, all exceptions are recognized before the store is complete, the
>> normal memory operations handle any possible overlap.
> 
> So, do this inside the helper?  Or is there a way to generate TCG code?

I was thinking TCG code, where you can look at the packet before any code gen,
and decide whether or not this situation actually applies.

The probe is a simple helper call, for which the generic machinery exists
(probe_access, probe_write, probe_read).  All you have to do is write the wrapper.

The loads and stores are, well, normal loads and stores.

r~

Taylor Simpson Aug. 30, 2020, 9:38 p.m. UTC | #11

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Sunday, August 30, 2020 2:52 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 28/34] Hexagon (target/hexagon) TCG

> generation helpers

>

> On 8/30/20 12:53 PM, Taylor Simpson wrote:

> >>> +++ b/target/hexagon/genptr_helpers.h

> >>> @@ -0,0 +1,244 @@

> >>> +

> >>> +static inline void gen_log_reg_write(int rnum, TCGv val, int slot,

> >>> +                                     int is_predicated)

> >>

> >> These are quite large.  Why are they marked inline?

> >

> > Since this is a header file, it prevents the compiler from complaining when

> they aren't used.

>

> Ok, why are they in a header file?

> Why would they be unused, come to that?

>

> The header file is used exactly once, by genptr.c.  Seems to me they could

> just

> as well be *in* genptr.c.

>

> If the functions are not used, just remove them?


I could have sworn it was included in more than one C file.  I'll move the contents to genptr.c.


> >>> +static inline void log_store32(CPUHexagonState *env, target_ulong

> addr,

> >>> +                               int32_t val, int width, int slot)

> >>> +{

> >>> +    HEX_DEBUG_LOG("log_store%d(0x%x, %d [0x%x])\n", width, addr,

> val,

> >> val);

> >>> +    env->mem_log_stores[slot].va = addr;

> >>> +    env->mem_log_stores[slot].width = width;

> >>> +    env->mem_log_stores[slot].data32 = val;

> >>> +}

> >>> +

> >>> +static inline void log_store64(CPUHexagonState *env, target_ulong

> addr,

> >>> +                               int64_t val, int width, int slot)

> >>> +{

> >>> +    HEX_DEBUG_LOG("log_store%d(0x%x, %ld [0x%lx])\n", width, addr,

> >> val, val);

> >>> +    env->mem_log_stores[slot].va = addr;

> >>> +    env->mem_log_stores[slot].width = width;

> >>> +    env->mem_log_stores[slot].data64 = val;

> >>> +}

> >>

> >> ... or fold this re-addition back into where it was accidentally removed.  ;-)

> >

> > Could you elaborate?

>

> You added this code in one patch (didn't check which), removed it in patch

> 26,

> and re-added it here in patch 28.


My apologies, this is my screwing up the git rebase.  I'll fix it.

>

>

> r~

Richard Henderson Aug. 30, 2020, 11:33 p.m. UTC | #12

On 8/30/20 1:47 PM, Taylor Simpson wrote:
> Richard,
> 
> Thank you so much for the feedback.  I really appreciate it.
> 
> I'll get to work addressing the issues.  Since some of the items will take longer than others, please advise whether it's preferred to send intermediate updates or wait until they are all addressed.

I don't mind intermediate updates.  Just keep a list in the cover letter of the
things that are still on the to-do list, and I'll not focus on those.

We could also talk about what portions of the to-do list are blocker, and what
can be done via normal development.  Because neither you nor I want to carry
around this huge patch set forever.

r~

Taylor Simpson Aug. 31, 2020, 5:08 p.m. UTC | #13

> >> -----Original Message-----

> >> From: Richard Henderson <richard.henderson@linaro.org>

> >> Sent: Sunday, August 30, 2020 3:13 PM

> >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> >> aleksandar.m.mail@gmail.com; ale@rev.ng

> >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for

> >> instructions with multiple definitions

> >>

> >> On 8/30/20 12:48 PM, Taylor Simpson wrote:

> >>> I'll add the following comment to indicate what's going on

> >>>

> >>> /*

> >>>   * Each of the generated helpers is wrapped with #ifndef

> >> fGEN_TCG_<tag>.

> >>>   * For example

> >>>    *     #ifndef fGEN_TCG_A2_add

> >>>    *     DEF_HELPER_3(A2_add, s32, env, s32, s32)

> >>>    *     #endif

> >>>   * We include gen_tcg.h here to provide definitions of fGEN_TCG for

> any

> >> instructions that

> >>>   * are overridden.

> >>>   *

> >>>   * This prevents unused helpers from taking up space in the executable.

> >>>   */

> >>

> >> Ah, hum.  Well.

> >>

> >> How about we figure out a way to communicate to the generator scripts

> >> which

> >> functions have been implemented "directly", and don't need to be

> >> generated at all?

> >>

> >> I don't see why we have to wait until the c preprocessor to remove this

> >> unused

> >> code, and the less we have of it, the better.

> >>

> >

> > A few reasons

> > - Makes it easy to override instruction helpers.  All one has to do is #define

> fGEN_TCG_<tag>.

>

> If the generator can examine, say, genptr_override.c.inc, then you don't

> even

> have to add a #define.  Just add the code.

>

> Perhaps something like

>

> DEFINE_FGEN(tag)

> {

>     // some code

> }

>

> where DEFINE_FGEN expands to the appropriate function declaration.  The

> generator then need only grep '^DEFINE_FGEN' and extract the list of

> overridden

> tags.

>

>

> > - When debugging the override, sometimes you want to quickly revert back

> to the helper.  Or if you've written a bunch of overrides in one shot and then

> find that a test case is failing, you can binary search which one to turn off and

> get the test to pass.  This is the one with the bug to fix.

>

> No difference there, just binary searching on different text.

>

> > - Reduces time for an incremental build.  When we add or delete an

> override, we don't have to re-run the generator.

>

> About this I care not at all.  I can't imagine that more than fractions of a

> second are at stake.


I can modify the generator to skip instructions with overrides.

There are some assumptions here I'd like to clarify.  When I started this discussion, there seemed to be general resistance to the concept of a generator.  Because of this, I made the generator as simple as possible and pushed the complexity and optimization into code that consumes the output of the generator.  Also, I assumed people wouldn't read the output of the generator, so I didn't focus on making it readable.

Now, it sounds like my assumptions were not correct.  You pointed out two things to do in the generator
- Expand the macros inline
- Skip the instructions that have overrides
I addition, there other things I should do if we want the generated files to be more readable
- Indent the code
- Only generate one of the fGEN_TCG_<tag> or gen_helper_<tag>
- Generate the declaration of the generate_<tag> function instead of just the body

Thoughts?

Thanks,
Taylor

Richard Henderson Aug. 31, 2020, 5:29 p.m. UTC | #14

On 8/31/20 10:08 AM, Taylor Simpson wrote:
> There are some assumptions here I'd like to clarify.  When I started this
> discussion, there seemed to be general resistance to the concept of a
> generator.  Because of this, I made the generator as simple as possible and
> pushed the complexity and optimization into code that consumes the output of
> the generator.  Also, I assumed people wouldn't read the output of the
> generator, so I didn't focus on making it readable.

Except, at some point, one has to debug this code.
If the code is unreadable, how do you figure out what's broken?

> Now, it sounds like my assumptions were not correct.  You pointed out two
> things to do in the generator> - Expand the macros inline
> - Skip the instructions that have overrides

Yes please.

> I addition, there other things I should do if we want the generated files to be more readable
> - Indent the code

Helpful, yes.

I wouldn't worry about nested statements within the *.def files too much,
except to put each ";" terminated statement on a new line, so that gdb's step
command goes to the next statement instead of skipping everything all at once.

> - Only generate one of the fGEN_TCG_<tag> or gen_helper_<tag>

That would be part of "skip the instructions that have overrides", would it not?

> - Generate the declaration of the generate_<tag> function instead of just the body

I'm not quite sure what this means.

Aren't the "generate_<tag>" functions private to genptr.c?  Why would we need a
separate declaration of them, as opposed to just a definition?

r~

Taylor Simpson Aug. 31, 2020, 6:14 p.m. UTC | #15

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Monday, August 31, 2020 11:29 AM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for

> instructions with multiple definitions

>

> On 8/31/20 10:08 AM, Taylor Simpson wrote:

> > There are some assumptions here I'd like to clarify.  When I started this

> > discussion, there seemed to be general resistance to the concept of a

> > generator.  Because of this, I made the generator as simple as possible and

> > pushed the complexity and optimization into code that consumes the

> output of

> > the generator.  Also, I assumed people wouldn't read the output of the

> > generator, so I didn't focus on making it readable.

>

> Except, at some point, one has to debug this code.

> If the code is unreadable, how do you figure out what's broken?

>

> > Now, it sounds like my assumptions were not correct.  You pointed out two

> > things to do in the generator> - Expand the macros inline

> > - Skip the instructions that have overrides

>

> Yes please.

>

> > I addition, there other things I should do if we want the generated files to

> be more readable

> > - Indent the code

>

> Helpful, yes.

>

> I wouldn't worry about nested statements within the *.def files too much,

> except to put each ";" terminated statement on a new line, so that gdb's step

> command goes to the next statement instead of skipping everything all at

> once.

>

> > - Only generate one of the fGEN_TCG_<tag> or gen_helper_<tag>

>

> That would be part of "skip the instructions that have overrides", would it

> not?


Just to be explicit, the code that generates code is generated as
    #ifdef fGEN_TCG_A2_add
    fGEN_TCG_A2_add({ RdV=RsV+RtV;});
    #else
    do {
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
    } while (0);
    #endif
If we're going to have the generator know if there is an override, this would be more readable as either
    fGEN_TCG_A2_add({ RdV=RsV+RtV;});
or
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);


> > - Generate the declaration of the generate_<tag> function instead of just

> the body

>

> I'm not quite sure what this means.

>

> Aren't the "generate_<tag>" functions private to genptr.c?  Why would we

> need a

> separate declaration of them, as opposed to just a definition?


In genptr.c, there is
    #define DEF_TCG_FUNC(TAG, GENFN) \
    static void generate_##TAG(CPUHexagonState *env, DisasContext *ctx, \
                               insn_t *insn, packet_t *pkt) \
    { \
        GENFN \
    }
    #include "tcg_funcs_generated.h"
    #undef DEF_TCG_FUNC
The generated code only has the body of the function.  It would be more readable to move the static void generate_##TAG ... into the generated code.

Richard Henderson Aug. 31, 2020, 8:43 p.m. UTC | #16

On 8/31/20 10:57 AM, Taylor Simpson wrote:
> OK, here's the list of items.  Let me know if I missed anything.  I'll 
> indicate which ones can be done quickly and which ones would take more time.
> I added a column for blocker if you or anyone else has input on that.
> 
> PatchItemEffortBlocker
> Use qemu softfloat??Yes

Hmm, this table didn't render.  Below, yes/no for blocker column.

> Use qemu decodetree.py??

no

> SeveralUse const when appropriatesmall

yes

> SeveralRemove anything after g_assert_not_reachedsmall

yes

> SeveralFix log_store32/64 add/remove/add in patch seriessmall

yes

> SeveralFollow naming guidelines for structs and enumssmall

yes

> 04Move decls to cpu-param.hsmall

Yes.  The only reason this even compiled is that you don't do system mode yet.  ;-)

> 04Remove CONFIG_USER_ONLY ifdef'ssmall

yes

> 04Remove DEBUG_HEXAGONsmall

yes

> 04Remove stack pointer modification hack, use qemu mechanismsmall

yes

> 04Add property x-lldb-compat to control output in logsmall

yes

> 06Include instruction and raw bytes in disassemblysmall

yes

> 07Use DEF_HELPER_FLAGSsmall

no, but should be easy.

> 07, 26Endianness of merge_bytessmall

Yes, one way or another; hopefully by removing all of the merge_bytes and using
probe_read.

> 07Fix overlap testsmall

yes

> 07Remove HELPER(debug_value)/HELPER(debug_value_i64)small

yes

> 09Include "qemu/osdep.h" instead of <stdint.h>small

yes

> 10 (and others)Stick with stdint.h types except in imported filessmall

yes

> 11Remove description from reg field definitionssmall

yes

> 13Move regmap.h into decode.csmall

yes

> 14, 27Use bit mask instead of strings in decodingsmall

no, but should be easy.

> 14Add comments to decodersmall

yes

> 16Use qemu/int128.hmedium

no

> 17Squash patches dealing with imported filessmall

yes

> 24Use qemu/bitops.h for instruction attributessmall

no

> 24Fix initialization of opcode_short_semanticssmall

yes

> 24Change if (p == NULL) { g_assert_not_reached(); } to assert(p != NULL)small

no

> 25Expand DECL/READ/WRITE/FREE macros into generated codesmall

Yes.

In the end I think some of these will in the end want to be helper functions.
As I was thinking how to best write A2_add, I was thinking

/* TODO: You currently have an "offset" parameter to
   DECL_REG.  I can't see that it is ever used?
   I would *hope* that this could be resolved earlier,
   so that by this time insn->regno[*] is absolute.  */
static int resolve_regno(Insn *insn, int slot, int off);

/* Return hex_new_value[regno]; log the write. */
static TCGv reg_for_write(DisasContext *ctx, int regno);

/* Return hex_reg[regno]; force up-to-date value for PC. */
static TCGv reg_for_read(DisasContext *ctx, int regno);

/* if (preg) hex_new_value[regno] = hex_reg[regno],
   or !preg if !test_positive.
   Leaves hex_new_value[] canonical for gen_reg_writes,
   no extra temporary required. */
static void gen_cancel_reg(DisasContext *ctx, int preg,
                           int rreg, bool test_positive);

DEF_TCG_FUNC(A2_add)
{
    int rd = resolve_regno(insn, 0, 0);
    int rs = resolve_regno(insn, 1, 0);
    int rt = resolve_regno(insn, 2, 0);

    tcg_gen_add_tl(reg_for_write(ctx, rd),
                   reg_for_read(ctx, rs),
                   reg_for_read(ctx, rt));
}

DEF_TCG_FUNC(A2_paddit)
{
    int pu = resolve_regno(insn, 0, 0);
    int rd = resolve_regno(insn, 1, 0);
    int rs = resolve_regno(insn, 2, 0);
    int rt = resolve_regno(insn, 3, 0);

    tcg_gen_add_tl(reg_for_write(ctx, rd),
                   reg_for_read(ctx, rs),
                   reg_for_read(ctx, rt));
    gen_cancel_reg(ctx, insn, rd, pu, true);
}

However, I don't think we have to have a comprehensive set of these now.  We
can expand everything into the generator to start, then adjust the generator as
we add helper functions and use them within the overrides.

> 26Rewrite fINSERT*, fEXTRACT*, f?XTN macrossmall

yes

> 26Investigate fRND macrosmall

no, but should be easy.

> 26Change REG = REG to (VOID)REG to suppress compiler warningsmall

yes

> 27Remove multiple includes of imported/iclass.defsmall

yes

> 28Move genptr_helpers.h into genptr.csmall

yes

> 28Remove unneeded tempssmall

no

> 28Use tcg_gen_deposit_tl and tcg_gen_extract_tl when dealing with p3_0small

no

> 29Size opcode_genptr[] properly and initialize with [TAG] = generate_##TAGsmall

yes; i think this will fall out of other changes.

> 30Don't generate helpers for instructions that are overriddensmall

yes

> Don't include "gen_tcg.h" in helper.h

yes

> 31Use bitmask for ctx->reg_log instead of an arraysmall

yes

> 31Use tcg_gen_extract_i32 for gen_slot_cancelled_checksmall

yes

> 31Properly deal with reading instructions across a page boundary and toomedium
> many instructions before finding end-of-packet

Yes, this should be easy.  Unless there's some surprise in the code I have
already suggested code.

> 31Don't set PC at the beginning of every packetmedium

no

> 31Don't set slot_cancelled unless neededsmall

no

> 31Don't set hex_pred_written unless neededmedium

no

> 31Change cancelled variable to not localsmall

yes

> 31Remove unnecessary tempsmall

yes

> 31Let tcg_gen_addi_tl check for zerosmall

yes

> 31Move gen_exec_counters to end of TB instead of every packetmedium

no

> 31Move end of TB handling to hexagon_tr_tb_stopsmall

yes


r~

Taylor Simpson Aug. 31, 2020, 11:10 p.m. UTC | #17

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Monday, August 31, 2020 1:20 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for

> instructions with multiple definitions

>

> The fGEN_TCG_A2_add macro does not require nor use that {...} argument.


The fGEN_TCG_A2_add macro does need that argument, but there are cases that do need it.  Here's an example from gen_tcg.h
    #define fGEN_TCG_L2_loadrub_pr(SHORTCODE)      SHORTCODE
This is explained in the README, but basically the argument is useful if we can properly define the macros that it contains to generate TCG.


> What it *does* need are the same arguments as are given to generate_<rtag>.  I

> assume you are using those arguments implicitly in your current

> fGEN_TCG_<rtag>

> instances?


Yes

>

> It would be cleanest to only have the generate_* functions.

>

> Either they are written by hand (replacing the current fGEN_TCG_*), or they

> are

> generated.  In either case, there's just the one level of indirection from

> opcode_genptr[].

>

> I'd imagine

>

> --- genptr.c

>

> #define DEF_TCG_FUNC(TAG) \

> static void generate_##TAG(CPUHexagonState *env, \

>     DisasContext *ctx, insn_t *insn, packet_t *pkt)

>

> /*

>  * All IIDs with an explicit implementation,

>  * overriding the auto-generated helper functions.

>  */

>

> DEF_TCG_FUNC(A2_add)

> {

>     /* { RdV=RsV+RtV;} */

>     tcg_gen_add_tl(args...);

> }


There's additional generated code before and after the tcg_gen_add_tl.  IMO, we don't want the person who writes an override having to reproduce the generated code.  Assuming we have a definition of fGEN_TCG_A2_add and we have the generator intelligently expanding the macros, this is what will be generated.

static void generate_A2_add(CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt)
{
/* A2_add */
int RdN =insn->regno[0];
TCGv RdV = tcg_temp_local_new();
int RsN = insn->regno[1];
TCGv RsV = hex_gpr[RsN];
int RtN = insn->regno[2];
TCGv RtV = hex_gpr[RtN];

fGEN_TCG_A2_add({ RdV=RsV+RtV;});

gen_log_reg_write(RdN, RdV, insn->slot, 0);
ctx_log_reg_write(ctx, RdN);

tcg_temp_free(RdV);
/* A2_add */
}

If there weren't an override, we'd get this

static void generate_A2_add(CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt)
{
/* A2_add */
int RdN =insn->regno[0];
TCGv RdV = tcg_temp_local_new();
int RsN = insn->regno[1];
TCGv RsV = hex_gpr[RsN];
int RtN = insn->regno[2];
TCGv RtV = hex_gpr[RtN];

gen_helper_A2_add(RdV, cpu_env, RsV, RtV);                    /* Only difference is this line */

gen_log_reg_write(RdN, RdV, insn->slot, 0);
ctx_log_reg_write(ctx, RdN);

tcg_temp_free(RdV);
/* A2_add */
}

The fGEN_TCG_<tag> macro can also mention the operands of the instruction (RdV, RsV, RtV in this example).

Unlike the generate_<tag> functions that all have the same signature.  The overrides would have different signatures.  This would be more defensive programming because you know exactly where the variables come from but more verbose when writing the overrides by hand.  Also, note that these need to be macros in order to take advantage of the SHORTCODE.

In other words, instead of
#define fGEN_TCG_A2_add(SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV)

We would write
#define fGEN_TCG_A2_add(env, ctx, insn, pkt, RdV, RsV, RtV, SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV);

Personally, I prefer the former, but will change to the latter if you feel strongly.

I'm not married to the fGEN_TCG_<tag> name.  DEF_TCG_<tag> would also be fine.

>

> /*

>  * Generate calls to the auto-generate helpers,

>  * and slot everything into the opcode_genptr table.

>  */

> #include "genptr_generated.c.inc"

>

> --- genptr_generated.c.inc

>

> DEF_TCG_FUNC(A4_tlbmatch)

> {

>    gen_helper_A4_tlbmatch(args...);

> }

>

> // etc

>

> const SemanticInsn opcode_genptr[] = {

>     // All IID's, generated or not.

> };

>

> ---

>

> This leaves genptr.c as the file to grep for '^DEF_TCG_FUNC'.

>

>

> r~

Taylor Simpson Aug. 31, 2020, 11:48 p.m. UTC | #18

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Monday, August 31, 2020 2:44 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 00/34] Hexagon patch series

>

> On 8/31/20 10:57 AM, Taylor Simpson wrote:

> > OK, here's the list of items.  Let me know if I missed anything.  I'll

> > indicate which ones can be done quickly and which ones would take more

> time.

> > I added a column for blocker if you or anyone else has input on that.

> >

> > PatchItemEffortBlocker

> > Use qemu softfloat??Yes

>

> Hmm, this table didn't render.  Below, yes/no for blocker column.


Sorry about that - not sure what happened.

I will work all those you marked "yes" or "no, but should be easy".

> > 25Expand DECL/READ/WRITE/FREE macros into generated codesmall

>

> Yes.

>

> In the end I think some of these will in the end want to be helper functions.

> As I was thinking how to best write A2_add, I was thinking


See my response to the thread on patch 30/34.

Since you mention A2_paddit, here's what it would look like assuming it is overridden.

static void generate_A2_paddt(CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt)
{
/* A2_paddit */
int PuN = insn->regno[0];
TCGv PuV = hex_pred[PuN];
Int RdN = insn->regno[1];
TCGv RdV = tcg_temp_local_new();
if (!is_preloaded(ctx, RdN)) {
    tcg_gen_mov_tl(hex_new_value[RdN], hex_gpr[RdN]);
}
int RsN = insn->regno[2];
TCGv RsV = hex_gpr[RsN];
int siV = insn->immed[0];

fGEN_TCG_A2_paddit({if(fLSBOLD(PuV)){fIMMEXT(siV); RdV=RsV+siV;} else {CANCEL;}});

gen_log_reg_write(RdN, RdV, insn->slot, 1);   /* Only does the write if we haven't cancelled */
ctx_log_reg_write(ctx, RdN);

tcg_temp_free(RdV);
/* A2_paddit */
}

Here's what the override looks like (there are a bunch of these, so we have a helper macro which could also be a function)
/* Predicated add instructions */
#define GEN_TCG_padd(PRED, ADD) \
    do { \
        TCGv LSB = tcg_temp_new(); \
        TCGv mask = tcg_temp_new(); \
        TCGv zero = tcg_const_tl(0); \
        PRED; \
        ADD; \
        tcg_gen_movi_tl(mask, 1 << insn->slot); \
        tcg_gen_or_tl(mask, hex_slot_cancelled, mask); \
        tcg_gen_movcond_tl(TCG_COND_NE, hex_slot_cancelled, LSB, zero, \
                                               hex_slot_cancelled, mask); \
        tcg_temp_free(LSB); \
        tcg_temp_free(mask); \
        tcg_temp_free(zero); \
    } while (0)

#define fGEN_TCG_A2_paddit(SHORTCODE) \
    GEN_TCG_padd(fLSBOLD(PuV), tcg_gen_addi_tl(RdV, RsV, siV))

Richard Henderson Sept. 1, 2020, 2:40 a.m. UTC | #19

On 8/31/20 4:10 PM, Taylor Simpson wrote:
> 
> 
>> -----Original Message-----
>> From: Richard Henderson <richard.henderson@linaro.org>
>> Sent: Monday, August 31, 2020 1:20 PM
>> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
>> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
>> aleksandar.m.mail@gmail.com; ale@rev.ng
>> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
>> instructions with multiple definitions
>>
>> The fGEN_TCG_A2_add macro does not require nor use that {...} argument.
> 
> The fGEN_TCG_A2_add macro does need that argument, but there are cases that
> do need it.  Here's an example from gen_tcg.h
>     #define fGEN_TCG_L2_loadrub_pr(SHORTCODE)      SHORTCODE
> This is explained in the README, but basically the argument is useful if we
> can properly define the macros that it contains to generate TCG.
We're certainly not going to be able to handle e.g. "+" or "if", so it is going
to work only for the most trivial of SHORTCODE.

Though in fact loadrub_pr makes that grade...

> IMO, we don't want the person who writes an override having to reproduce the 
> generated code.  Assuming we have a definition of fGEN_TCG_A2_add and we
> have the generator intelligently expanding the macros, this is what will be
> generated.
You need to give me a better example that A2_add, then.  Because I see that
being exactly one line, calling a helper that handles all instructions of the
same format, passing tcg_gen_add_tl as a callback.

Have a browse through my recent microblaze decodetree conversion.  Note that
the basic logical operations are implemented with exactly one source line.

> Unlike the generate_<tag> functions that all have the same signature.  The overrides would have different signatures.  This would be more defensive programming because you know exactly where the variables come from but more verbose when writing the overrides by hand.  Also, note that these need to be macros in order to take advantage of the SHORTCODE.
> 
> In other words, instead of
> #define fGEN_TCG_A2_add(SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV)
> 
> We would write
> #define fGEN_TCG_A2_add(env, ctx, insn, pkt, RdV, RsV, RtV, SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV);
> 
> Personally, I prefer the former, but will change to the latter if you feel strongly.

This comes from trying to handle instructions in different ways, but represent
them all the same.

I guess I see the attraction of the magic non-parameters -- you get a
compilation error if the variable is not present, but are not tied to
positional parameters.

Ho hum.  Maybe I'm trying to overthink this too much before tackling the
ultimate goal of full parsing of the SHORTCODE.

Perhaps the only thing for the short term is to have the generator grep
genptr.c for "#define fGEN", to choose between the two alternatives: inline
generation or out-of-line helper generation.


r~

Taylor Simpson Sept. 1, 2020, 4:17 a.m. UTC | #20

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Monday, August 31, 2020 8:41 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for

> instructions with multiple definitions

>

> On 8/31/20 4:10 PM, Taylor Simpson wrote:

> >

> >

> >> -----Original Message-----

> >> From: Richard Henderson <richard.henderson@linaro.org>

> >> Sent: Monday, August 31, 2020 1:20 PM

> >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> >> aleksandar.m.mail@gmail.com; ale@rev.ng

> >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for

> >> instructions with multiple definitions

> >>

> >> The fGEN_TCG_A2_add macro does not require nor use that {...}

> argument.

> >

> > The fGEN_TCG_A2_add macro does need that argument, but there are

> cases that

> > do need it.  Here's an example from gen_tcg.h

> >     #define fGEN_TCG_L2_loadrub_pr(SHORTCODE)      SHORTCODE

> > This is explained in the README, but basically the argument is useful if we

> > can properly define the macros that it contains to generate TCG.

> We're certainly not going to be able to handle e.g. "+" or "if", so it is going

> to work only for the most trivial of SHORTCODE.

>

> Though in fact loadrub_pr makes that grade...


The prior version of this series included all the overrides I've written to date.  To reduce the size of this version, I removed most of them and only left the ones that are essential for correct execution.  I plan to submit the others in subsequent series.  Anyway, there are >50 overrides of load/store instructions that leverage SHORTCODE.

> > IMO, we don't want the person who writes an override having to

> reproduce the

> > generated code.  Assuming we have a definition of fGEN_TCG_A2_add and

> we

> > have the generator intelligently expanding the macros, this is what will be

> > generated.

> You need to give me a better example that A2_add, then.  Because I see that

> being exactly one line, calling a helper that handles all instructions of the

> same format, passing tcg_gen_add_tl as a callback.


Here's a more complicated example for a predicated post-increment load

Static void generate_L2_ploadrit_pi(CPUHexagonState *env, DisasContext*cts, insn_t *insn, packet_t *pkt)
{
/* L2_ploadrit_pi */
TCGv EA = tcg_temp_local_new();
int PtN = insn->regno[0];
TCGv PtV = hex_pred[PtN];
int RdN = insn->regno[1];
TCGv RdV = tcg_temp_local_new();
if (!is_preloaded(ctx, RdN)) {
    tcg_gen_mov_tl(hex_hew_value[RdN], hex_gpr[RdN]);
}
int RxN = insn->regno[2];
TCGv RxV = tcg_temp_local_new();
if (!is_preloaded(ctx, RxN)) {
    tcg_gen_mov_tl(hex_new_value[RdN], hex_gpr[RxN]);
}
int siV = insn->immed[0];
tcg_gen_mov_tl(RxV, hex_gpr[RxN]);
fGEN_TCG_L2_ploadrit_pi({fEA_REG(RxV); if(fLSBOLD(PtV)){ fPM_I(RxV,siV); fLOAD(1,4,u,EA,RdV);} else {LOAD_CANCEL(EA);}});
gen_log_reg_write(RdN, RdV, insn->slot, 1);
gen_log_reg_write(RxN, RxV, insn->slot, 1);
tcg_temp_free(EA);
tcg_temp_free(RdV);
tcg_temp_free(RxV);
/* L2_ploadrit_pi */
}


> Have a browse through my recent microblaze decodetree conversion.  Note

> that

> the basic logical operations are implemented with exactly one source line.


With a helper function, our compares are all one line also

static inline void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2)
{
    TCGv one = tcg_const_tl(0xff);
    TCGv zero = tcg_const_tl(0);

    tcg_gen_movcond_tl(cond, res, arg1, arg2, one, zero);

    tcg_temp_free(one);
    tcg_temp_free(zero);
}

/* Compare instructions */
#define fGEN_TCG_C2_cmpeq(SHORTCODE) \
    gen_compare(TCG_COND_EQ, PdV, RsV, RtV)
#define fGEN_TCG_C4_cmpneq(SHORTCODE) \
    gen_compare(TCG_COND_NE, PdV, RsV, RtV)
#define fGEN_TCG_C2_cmpgt(SHORTCODE) \
    gen_compare(TCG_COND_GT, PdV, RsV, RtV)
#define fGEN_TCG_C2_cmpgtu(SHORTCODE) \
    gen_compare(TCG_COND_GTU, PdV, RsV, RtV)
...



> > Unlike the generate_<tag> functions that all have the same signature.  The

> overrides would have different signatures.  This would be more defensive

> programming because you know exactly where the variables come from but

> more verbose when writing the overrides by hand.  Also, note that these

> need to be macros in order to take advantage of the SHORTCODE.

> >

> > In other words, instead of

> > #define fGEN_TCG_A2_add(SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV)

> >

> > We would write

> > #define fGEN_TCG_A2_add(env, ctx, insn, pkt, RdV, RsV, RtV,

> SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV);

> >

> > Personally, I prefer the former, but will change to the latter if you feel

> strongly.

>

> This comes from trying to handle instructions in different ways, but

> represent

> them all the same.

>

> I guess I see the attraction of the magic non-parameters -- you get a

> compilation error if the variable is not present, but are not tied to

> positional parameters.

>

> Ho hum.  Maybe I'm trying to overthink this too much before tackling the

> ultimate goal of full parsing of the SHORTCODE.


Alessandro (ale@rev.ng) and Niccolo (nizzo@rev.ng) are working on this 😊

> Perhaps the only thing for the short term is to have the generator grep

> genptr.c for "#define fGEN", to choose between the two alternatives: inline

> generation or out-of-line helper generation.


That's certainly doable.  It will also be good to implement some of your other ideas
- Have the generator expand the DECL/READ/WRITE/FREE macros will make the generated code more readable and we can specialize them for predicated vs non-predicated instructions which will make translation faster.
- Generate the entire generate_<tag> function instead of just the body will make the generated code more readable.

Brian Cain Sept. 15, 2020, 12:41 a.m. UTC | #21

> -----Original Message-----

> From: Qemu-devel <qemu-devel-bounces+bcain=quicinc.com@nongnu.org>

> On Behalf Of Rob Landley

...
> 

> On 8/30/20 3:47 PM, Taylor Simpson wrote:

> > Richard,

> >

> > Thank you so much for the feedback.  I really appreciate it.

> >

> > I'll get to work addressing the issues.  Since some of the items will take longer

> than others, please advise whether it's preferred to send intermediate updates

> or wait until they are all addressed.

> >

> > Taylor

> 

> Which branch of https://github.com/quic/qemu do I pull to try this without

> scraping 30 patches out of the mailing list?


IIRC this patch series was "small_series_v3"

> >>> Once the series is applied, the Hexagon port will pass "make check-tcg".

> >>> The series also includes Hexagon-specific tests in tcg/tests/hexagon.

> >>>

> >>> We have a parallel effort to make the Hexagon Linux toolchain inside

> >>> a

> >> docker

> >>> container publically available.

> >>

> >> Oh, excellent.

> 

> Is that a follow-up to https://www.spinics.net/lists/linux-

> hexagon/msg01706.html

> and is this a clang toolchain or a gcc toolchain?


It's a clang toolchain.

> I've noticed gcc 9.2 has hexagon in config.guess and config.sub, but the only

> other file outside of the test suite containing the word "hexagon" in a case

> insensitive match is the Changelog saying Linas Veptas added it to config.guess

> and config.sub in 2011. And while https://github.com/quic has a musl fork it

> doesn't have any compiler or linker forks...


clang and lld support for hexagon exists in the upstream llvm-project repo.

-Brian

Taylor Simpson Sept. 24, 2020, 2:56 a.m. UTC | #22

> > On 8/31/20 4:10 PM, Taylor Simpson wrote:

> > >

> > >

> > >> -----Original Message-----

> > >> From: Richard Henderson <richard.henderson@linaro.org>

> > >> Sent: Monday, August 31, 2020 1:20 PM

> > >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> > >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> > >> aleksandar.m.mail@gmail.com; ale@rev.ng

> > >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for

> > >> instructions with multiple definitions

> > >>

> > Ho hum.  Maybe I'm trying to overthink this too much before tackling the

> > ultimate goal of full parsing of the SHORTCODE.

> > Perhaps the only thing for the short term is to have the generator grep

> > genptr.c for "#define fGEN", to choose between the two alternatives:

> inline

> > generation or out-of-line helper generation.

>

> That's certainly doable.  It will also be good to implement some of your other

> ideas

> - Have the generator expand the DECL/READ/WRITE/FREE macros will make

> the generated code more readable and we can specialize them for

> predicated vs non-predicated instructions which will make translation faster.

> - Generate the entire generate_<tag> function instead of just the body will

> make the generated code more readable.


I've made these changes to the generator.  I hope you like the results.  As an example, here is what we generate for the add instruction

DEF_TCG_FUNC(A2_add,
static void generate_A2_add(
                CPUHexagonState *env,
                DisasContext *ctx,
                insn_t *insn,
                packet_t *pkt)
{
    TCGv RdV = tcg_temp_local_new();
    const int RdN = insn->regno[0];
    TCGv RsV = hex_gpr[insn->regno[1]];
    TCGv RtV = hex_gpr[insn->regno[2]];
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
    gen_log_reg_write(RdN, RdV);
    ctx_log_reg_write(ctx, RdN);
    tcg_temp_free(RdV);
})

And here is how the generated file gets used in genptr.c

#define DEF_TCG_FUNC(TAG, GENFN) \
    GENFN
#include "tcg_funcs_generated.h"
#undef DEF_TCG_FUNC

/*
 * Not all opcodes have generate_<tag> functions, so initialize
 * the table from the tcg_funcs_generated.h file.
 */
const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = {
#define DEF_TCG_FUNC(TAG, GENFN) \
    [TAG] = generate_##TAG,
#include "tcg_funcs_generated.h"
#undef DEF_TCG_FUNC
};

I've also addressed several of the items from Richard's review, so I'll resubmit the series once I figure out how to get "make check-tcg" working under meson.

Thanks,
Taylor

Richard Henderson Sept. 24, 2020, 3:03 p.m. UTC | #23

On 9/23/20 7:56 PM, Taylor Simpson wrote:
> 

> 

>>> On 8/31/20 4:10 PM, Taylor Simpson wrote:

>>>>

>>>>

>>>>> -----Original Message-----

>>>>> From: Richard Henderson <richard.henderson@linaro.org>

>>>>> Sent: Monday, August 31, 2020 1:20 PM

>>>>> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

>>>>> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

>>>>> aleksandar.m.mail@gmail.com; ale@rev.ng

>>>>> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for

>>>>> instructions with multiple definitions

>>>>>

>>> Ho hum.  Maybe I'm trying to overthink this too much before tackling the

>>> ultimate goal of full parsing of the SHORTCODE.

>>> Perhaps the only thing for the short term is to have the generator grep

>>> genptr.c for "#define fGEN", to choose between the two alternatives:

>> inline

>>> generation or out-of-line helper generation.

>>

>> That's certainly doable.  It will also be good to implement some of your other

>> ideas

>> - Have the generator expand the DECL/READ/WRITE/FREE macros will make

>> the generated code more readable and we can specialize them for

>> predicated vs non-predicated instructions which will make translation faster.

>> - Generate the entire generate_<tag> function instead of just the body will

>> make the generated code more readable.

> 

> I've made these changes to the generator.  I hope you like the results.  As an example, here is what we generate for the add instruction

> 

> DEF_TCG_FUNC(A2_add,

> static void generate_A2_add(

>                 CPUHexagonState *env,

>                 DisasContext *ctx,

>                 insn_t *insn,

>                 packet_t *pkt)

> {

>     TCGv RdV = tcg_temp_local_new();

>     const int RdN = insn->regno[0];

>     TCGv RsV = hex_gpr[insn->regno[1]];

>     TCGv RtV = hex_gpr[insn->regno[2]];

>     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);

>     gen_log_reg_write(RdN, RdV);

>     ctx_log_reg_write(ctx, RdN);

>     tcg_temp_free(RdV);

> })


I would be happier if the entire function body were not in a macro.  Have you
tried to set a gdb breakpoint in one of these?  Once upon a time at least, this
would have resulted in all lines of the function becoming one "source line" in
the debug info.

I also think the full function prototype is unnecessary, and the replication of
"A2_add" undesirable.

I would prefer the function prototype itself to be macro-ized.

E.g.

DEF_TCG_FUNC(A2_add)
{
    TCGv RdV = tcg_temp_local_new();
    const int RdN = insn->regno[0];
    TCGv RsV = hex_gpr[insn->regno[1]];
    TCGv RtV = hex_gpr[insn->regno[2]];
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
    gen_log_reg_write(RdN, RdV);
    ctx_log_reg_write(ctx, RdN);
    tcg_temp_free(RdV);
}

with

#define DEF_TCG_FUNC(TAG)                             \
    static void generate_##TAG(CPUHexagonState *env,  \
                               DisasContext *ctx,     \
                               insn_t *insn,          \
                               packet_t *pkt)

> And here is how the generated file gets used in genptr.c

> 

> #define DEF_TCG_FUNC(TAG, GENFN) \

>     GENFN

> #include "tcg_funcs_generated.h"

> #undef DEF_TCG_FUNC

> 

> /*

>  * Not all opcodes have generate_<tag> functions, so initialize

>  * the table from the tcg_funcs_generated.h file.

>  */

> const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = {

> #define DEF_TCG_FUNC(TAG, GENFN) \

>     [TAG] = generate_##TAG,

> #include "tcg_funcs_generated.h"

> #undef DEF_TCG_FUNC

> };


Obviously, the macro I propose above cannot be directly reused, as you do here.
 But I also think we should not try to do so.

You've got a script generating stuff.  It can just as easily generate two
different lists.  You're trying to do too much with the C preprocessor and too
little with python.

At some point in the v3 thread, I had suggested grepping for some macro in
order to indicate to the python script which tags are implemented manually.  My
definition above is easy to look for: exactly one thing on the line, easy regexp.

> I've also addressed several of the items from Richard's review, so I'll resubmit the series once I figure out how to get "make check-tcg" working under meson.


Yes, make check-tcg is currently broken, as are a few other check-foo.  I've
not yet had the courage to look into it, hoping that someone else will do it first.


r~

Taylor Simpson Sept. 24, 2020, 5:18 p.m. UTC | #24

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Thursday, September 24, 2020 9:04 AM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for

> instructions with multiple definitions

>

> >

> > I've made these changes to the generator.  I hope you like the results.  As

> an example, here is what we generate for the add instruction

> >

> > DEF_TCG_FUNC(A2_add,

> > static void generate_A2_add(

> >                 CPUHexagonState *env,

> >                 DisasContext *ctx,

> >                 insn_t *insn,

> >                 packet_t *pkt)

> > {

> >     TCGv RdV = tcg_temp_local_new();

> >     const int RdN = insn->regno[0];

> >     TCGv RsV = hex_gpr[insn->regno[1]];

> >     TCGv RtV = hex_gpr[insn->regno[2]];

> >     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);

> >     gen_log_reg_write(RdN, RdV);

> >     ctx_log_reg_write(ctx, RdN);

> >     tcg_temp_free(RdV);

> > })

>

> I would be happier if the entire function body were not in a macro.  Have you

> tried to set a gdb breakpoint in one of these?  Once upon a time at least, this

> would have resulted in all lines of the function becoming one "source line" in

> the debug info.


Good point.  It still comes out as a single line.

> I also think the full function prototype is unnecessary, and the replication of

> "A2_add" undesirable.

>

> I would prefer the function prototype itself to be macro-ized.

>

> E.g.

>

> DEF_TCG_FUNC(A2_add)

> {

>     TCGv RdV = tcg_temp_local_new();

>     const int RdN = insn->regno[0];

>     TCGv RsV = hex_gpr[insn->regno[1]];

>     TCGv RtV = hex_gpr[insn->regno[2]];

>     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);

>     gen_log_reg_write(RdN, RdV);

>     ctx_log_reg_write(ctx, RdN);

>     tcg_temp_free(RdV);

> }

>

> with

>

> #define DEF_TCG_FUNC(TAG)                             \

>     static void generate_##TAG(CPUHexagonState *env,  \

>                                DisasContext *ctx,     \

>                                insn_t *insn,          \

>                                packet_t *pkt)

>

> > And here is how the generated file gets used in genptr.c

> >

> > #define DEF_TCG_FUNC(TAG, GENFN) \

> >     GENFN

> > #include "tcg_funcs_generated.h"

> > #undef DEF_TCG_FUNC

> >

> > /*

> >  * Not all opcodes have generate_<tag> functions, so initialize

> >  * the table from the tcg_funcs_generated.h file.

> >  */

> > const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = {

> > #define DEF_TCG_FUNC(TAG, GENFN) \

> >     [TAG] = generate_##TAG,

> > #include "tcg_funcs_generated.h"

> > #undef DEF_TCG_FUNC

> > };

>

> Obviously, the macro I propose above cannot be directly reused, as you do

> here.

>  But I also think we should not try to do so.

>

> You've got a script generating stuff.  It can just as easily generate two

> different lists.  You're trying to do too much with the C preprocessor and too

> little with python.


Sure, generating two lists was also suggested by Alessandro (ale@rev.ng).  Although doing more in python and less with the C preprocessor would lead to the conclusion we shouldn't define the function prototype in a macro.  If we generate two lists, what is the advantage of putting the function signature in a macro vs generating?

>

> At some point in the v3 thread, I had suggested grepping for some macro in

> order to indicate to the python script which tags are implemented manually.

> My

> definition above is easy to look for: exactly one thing on the line, easy

> regexp.


This is already done as well.  As you may recall, we were previously generating
    #ifdef fGEN_TCG_A2_add
    fGEN_TCG_A2_add({ RdV=RsV+RtV;});
    #else
    do {
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
    } while (0);

The generator now searches for "#define fGEN_TCG_<tag>" and generates different code depending on what it finds.  This version of the series only contains overrides that are required for correct execution.  So, A2_add isn't there.  When we do override it, the generator produces this

static void generate_A2_add(
                CPUHexagonState *env,
                DisasContext *ctx,
                insn_t *insn,
                packet_t *pkt)
{
    TCGv RdV = tcg_temp_local_new();
    const int RdN = insn->regno[0];
    TCGv RsV = hex_gpr[insn->regno[1]];
    TCGv RtV = hex_gpr[insn->regno[2]];
    fGEN_TCG_A2_add({ RdV=RsV+RtV;});                  <---- This line is different
    gen_log_reg_write(RdN, RdV);
    ctx_log_reg_write(ctx, RdN);
    tcg_temp_free(RdV);
}

Also, if it finds the override, it doesn't generate the DEF_HELPER or the helper function.  That means we don't include tcg_gen.h in helper.h as you suggested.


Thanks,
Taylor

Richard Henderson Sept. 24, 2020, 7:04 p.m. UTC | #25

On 9/24/20 10:18 AM, Taylor Simpson wrote:
>> You've got a script generating stuff.  It can just as easily generate two

>> different lists.  You're trying to do too much with the C preprocessor and too

>> little with python.

> 

> Sure, generating two lists was also suggested by Alessandro (ale@rev.ng).  Although doing more in python and less with the C preprocessor would lead to the conclusion we shouldn't define the function prototype in a macro.  If we generate two lists, what is the advantage of putting the function signature in a macro vs generating?


None, because...

>> At some point in the v3 thread, I had suggested grepping for some macro in

>> order to indicate to the python script which tags are implemented manually.

>> My

>> definition above is easy to look for: exactly one thing on the line, easy

>> regexp.

> 

> This is already done as well.  As you may recall, we were previously generating

>     #ifdef fGEN_TCG_A2_add

>     fGEN_TCG_A2_add({ RdV=RsV+RtV;});

>     #else

>     do {

>     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);

>     } while (0);

> 

> The generator now searches for "#define fGEN_TCG_<tag>" ...


... I'd forgotten that they were two different macros.

> Also, if it finds the override, it doesn't generate the DEF_HELPER or the helper function.  That means we don't include tcg_gen.h in helper.h as you suggested.


Excellent.


r~

Taylor Simpson Oct. 8, 2020, 3 p.m. UTC | #26

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Friday, August 28, 2020 7:17 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros

> referenced in instruction semantics

>

>

> Ah, so I see what you're trying to do with the merge thing, which had the

> host-endian problems.

>

> I think the merge stuff is a mistake.  I think you can get the semantics that

> you want with

>

> probe_read(ld_addr, ld_len)

> qemu_st(st_value, st_addr)

> qemu_ld(ld_value, ld_addr)

>

> In this way, all exceptions are recognized before the store is complete, the

> normal memory operations handle any possible overlap.


Following up on this...

1) We don't need to do the probe_read because the load has already happened at this point.

2) I'm not familiar with qemu_st/qemu_ld.  Are these shorthand for tcg_gen_qemu_st*/tcg_gen_qemu_ld*?  We can't actually do the store at this point because it would alter the memory before we are sure the packet will commit (i.e., there might be still be an exception raised by another instruction in the packet).

3) How important is it to support big endian hosts?  Would it be OK to put something like this to declare it isn't supported for Hexagon?
#if defined(HOST_WORDS_BIGENDIAN)
#error Hexagon guest not supported on big endian host
#endif

4) If the above is not OK, are the macros in bswap.h the correct ones to use?  Is this pseudo-code correct?
store_val = le32_to_cpu(store_val);
load_val = le32_to_cpu(load_val);
<merge bytes>
/* store_val is dead so no need to convert back */
load_val = cpu_to_le32(load_val)


Thanks,
Taylor

Richard Henderson Oct. 8, 2020, 5:30 p.m. UTC | #27

On 10/8/20 10:00 AM, Taylor Simpson wrote:
> 
> 
>> -----Original Message-----
>> From: Richard Henderson <richard.henderson@linaro.org>
>> Sent: Friday, August 28, 2020 7:17 PM
>> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
>> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
>> aleksandar.m.mail@gmail.com; ale@rev.ng
>> Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros
>> referenced in instruction semantics
>>
>>
>> Ah, so I see what you're trying to do with the merge thing, which had the
>> host-endian problems.
>>
>> I think the merge stuff is a mistake.  I think you can get the semantics that
>> you want with
>>
>> probe_read(ld_addr, ld_len)
>> qemu_st(st_value, st_addr)
>> qemu_ld(ld_value, ld_addr)
>>
>> In this way, all exceptions are recognized before the store is complete, the
>> normal memory operations handle any possible overlap.
> 
> Following up on this...
> 
> 1) We don't need to do the probe_read because the load has already happened at this point.


What do you mean "has already happened"?  How can it have done without doing
the merging by hand.  Which this operation ordering is intended to make
unnecessary.

I think you're missing the point.


> 2) I'm not familiar with qemu_st/qemu_ld.  Are these shorthand for tcg_gen_qemu_st*/tcg_gen_qemu_ld*?

Yes.

> We can't actually do the store at this point because it would alter the memory before we are sure the packet will commit (i.e., there might be still be an exception raised by another instruction in the packet).

What other instruction?  Give me a concrete example so that I can give you a
concrete answer.

I think you may need to do more preprocessing of the entire packet so that you
can answer this sort of question (is there any other possible exception) when
generating code.

> 3) How important is it to support big endian hosts?  Would it be OK to put something like this to declare it isn't supported for Hexagon?
> #if defined(HOST_WORDS_BIGENDIAN)
> #error Hexagon guest not supported on big endian host
> #endif

That would make ./configure && make fail on such a host.
So, no, you can't do that.

> 
> 4) If the above is not OK, are the macros in bswap.h the correct ones to use?  Is this pseudo-code correct?
> store_val = le32_to_cpu(store_val);
> load_val = le32_to_cpu(load_val);
> <merge bytes>
> /* store_val is dead so no need to convert back */
> load_val = cpu_to_le32(load_val)

There's some misuse of cpu_to_le32 vs le32_to_cpu there (I've never liked those
names, but it helps to think about what form the data is already in).


r~

Taylor Simpson Oct. 8, 2020, 6:51 p.m. UTC | #28

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Thursday, October 8, 2020 11:31 AM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros

> referenced in instruction semantics

>

> >> Ah, so I see what you're trying to do with the merge thing, which had the

> >> host-endian problems.

> >>

> >> I think the merge stuff is a mistake.  I think you can get the semantics that

> >> you want with

> >>

> >> probe_read(ld_addr, ld_len)

> >> qemu_st(st_value, st_addr)

> >> qemu_ld(ld_value, ld_addr)

> >>

> >> In this way, all exceptions are recognized before the store is complete,

> the

> >> normal memory operations handle any possible overlap.

> >

> > Following up on this...

> >

> > 1) We don't need to do the probe_read because the load has already

> happened at this point.

>

>

> What do you mean "has already happened"?

> How can it have done without doing the merging by hand.  Which this operation ordering is intended to make unnecessary.

>

> I think you're missing the point.


Sorry I wasn't clear.  We have done the load from memory as it was at the beginning of the packet.  The result of the store is in mem_log_stores in CPUHexagonState.  This code updates the bytes that were loaded with any overlapping bytes from the store that hasn't been committed yet.

>

>

> > 2) I'm not familiar with qemu_st/qemu_ld.  Are these shorthand for

> tcg_gen_qemu_st*/tcg_gen_qemu_ld*?

>

> Yes.

>

> > We can't actually do the store at this point because it would alter the

> memory before we are sure the packet will commit (i.e., there might be still

> be an exception raised by another instruction in the packet).

>

> What other instruction?  Give me a concrete example so that I can give you a

> concrete answer.


Remember, there can be 4 instructions in a packet.  This code is only dealing with two of them (a load and a store).  Here's an example where a different instruction in the packet can fault.

    67f8:       c0 40 21 1f     1f2140c0 {      v0.uh = vsat(v0.uw,v1.uw)
    67fc:       00 45 02 a1     a1024500        memb(r2+#0) = r5
    6800:       02 c0 04 91     9104c002        r2 = memb(r4+#0) }

The vsat instruction requires a vector context.  If the thread doesn't have a vector context, an exception will be raised.  The OS can provide a context and replay the packet.

> I think you may need to do more preprocessing of the entire packet so that

> you

> can answer this sort of question (is there any other possible exception) when

> generating code.

>

> > 3) How important is it to support big endian hosts?  Would it be OK to put

> something like this to declare it isn't supported for Hexagon?

> > #if defined(HOST_WORDS_BIGENDIAN)

> > #error Hexagon guest not supported on big endian host

> > #endif

>

> That would make ./configure && make fail on such a host.

> So, no, you can't do that.

>

> >

> > 4) If the above is not OK, are the macros in bswap.h the correct ones to

> use?  Is this pseudo-code correct?

> > store_val = le32_to_cpu(store_val);

> > load_val = le32_to_cpu(load_val);

> > <merge bytes>

> > /* store_val is dead so no need to convert back */

> > load_val = cpu_to_le32(load_val)

>

> There's some misuse of cpu_to_le32 vs le32_to_cpu there (I've never liked

> those

> names, but it helps to think about what form the data is already in).


So, what is the right sequence?


Thanks,
Taylor

Richard Henderson Oct. 8, 2020, 8:02 p.m. UTC | #29

On 10/8/20 1:51 PM, Taylor Simpson wrote:
>> How can it have done without doing the merging by hand.  Which this operation ordering is intended to make unnecessary.
>>
>> I think you're missing the point.
> 
> Sorry I wasn't clear.  We have done the load from memory as it was at the beginning of the packet.  The result of the store is in mem_log_stores in CPUHexagonState.  This code updates the bytes that were loaded with any overlapping bytes from the store that hasn't been committed yet.

Right, so you *are* missing the point.

The point is to *not* do the load earlier, but only probe the memory for
readability so that any exception is recognized before the store commits.

Then, after the store, actually perform the load.  Thus any overlapping bytes
get the values that they should.

Voila, no by-hand merging.

>     67f8:       c0 40 21 1f     1f2140c0 {      v0.uh = vsat(v0.uw,v1.uw)
>     67fc:       00 45 02 a1     a1024500        memb(r2+#0) = r5
>     6800:       02 c0 04 91     9104c002        r2 = memb(r4+#0) }
> 
> The vsat instruction requires a vector context.  If the thread doesn't have a vector context, an exception will be raised.  The OS can provide a context and replay the packet.

Sure.

Is there any per-packet exception priority that would require a fault from the
store to be provided in preference to the fault for the vector context?

Anyway, I'm suggesting ordering the operations within the packet to be one
that's most convenient for us.

>>> store_val = le32_to_cpu(store_val);
>>> load_val = le32_to_cpu(load_val);
>>> <merge bytes>
>>> /* store_val is dead so no need to convert back */
>>> load_val = cpu_to_le32(load_val)
>>
>> There's some misuse of cpu_to_le32 vs le32_to_cpu there (I've never liked
>> those
>> names, but it helps to think about what form the data is already in).
> 
> So, what is the right sequence?

Well, <merge_bytes> wants to operate on a le ordering, so the final load_val
assignment should use le32_to_cpu.  Think about this in terms of units, like
Fahrenheit vs Celsius.

As for the other two, it depends on where the values come from.  Probably they
should be cpu_to_le32, but I can't tell without extra context.

r~

Taylor Simpson Oct. 8, 2020, 8:54 p.m. UTC | #30

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Thursday, October 8, 2020 2:02 PM

> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org

> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;

> aleksandar.m.mail@gmail.com; ale@rev.ng

> Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros

> referenced in instruction semantics

>

> On 10/8/20 1:51 PM, Taylor Simpson wrote:

> >> How can it have done without doing the merging by hand.  Which this

> operation ordering is intended to make unnecessary.

> >>

> >> I think you're missing the point.

> >

> > Sorry I wasn't clear.  We have done the load from memory as it was at the

> beginning of the packet.  The result of the store is in mem_log_stores in

> CPUHexagonState.  This code updates the bytes that were loaded with any

> overlapping bytes from the store that hasn't been committed yet.

>

> Right, so you *are* missing the point.

>

> The point is to *not* do the load earlier, but only probe the memory for

> readability so that any exception is recognized before the store commits.

>

> Then, after the store, actually perform the load.  Thus any overlapping bytes

> get the values that they should.

That's what I coded originally, but after sleeping on it decided it could lead to problems.  See below...

> Voila, no by-hand merging.

>

> >     67f8:       c0 40 21 1f     1f2140c0 {      v0.uh = vsat(v0.uw,v1.uw)

> >     67fc:       00 45 02 a1     a1024500        memb(r2+#0) = r5

> >     6800:       02 c0 04 91     9104c002        r2 = memb(r4+#0) }

> >

> > The vsat instruction requires a vector context.  If the thread doesn't have a

> vector context, an exception will be raised.  The OS can provide a context and

> replay the packet.

>

> Sure.

>

> Is there any per-packet exception priority that would require a fault from the

> store to be provided in preference to the fault for the vector context?

I don't think there's a priority for the exceptions.  In this example, any of the 3 instructions could generate an exception.  The thing I'm worried about isn't that the store could generate an exception.  The problem is letting the store change the machine state before we're sure none of the instructions will raise an exception.  Maybe I'm worrying about something that would never result in different behavior - if the packet gets replayed, we'll just store the same value again.

I'll reach out to the architects and ask if there is a case where doing the store first could be a problem - and also about the priority of exceptions.

> Anyway, I'm suggesting ordering the operations within the packet to be one

> that's most convenient for us.

This may or may not be possible.  We already have to reorder to put .new consumers after the producers, and we have to keep the change-of-flow instructions in the original order.  So, I'm reluctant to go down this path worrying that there will be a long tail of corner case bugs to engineer out.

Richard Henderson Oct. 9, 2020, 12:59 p.m. UTC | #31

On 10/8/20 3:54 PM, Taylor Simpson wrote:
> I'll reach out to the architects and ask if there is a case where doing the store first could be a problem - and also about the priority of exceptions.


Of course there are cases where doing the store first will cause problems.

> This may or may not be possible.  We already have to reorder to put .new consumers after the producers, and we have to keep the change-of-flow instructions in the original order.  So, I'm reluctant to go down this path worrying that there will be a long tail of corner case bugs to engineer out.


Well, do as you like, I suppose, but I think the merging thing that you were
trying to do is just as fraught with peril.


r~

Taylor Simpson Oct. 9, 2020, 4:02 p.m. UTC | #32

> -----Original Message-----

> From: Richard Henderson <richard.henderson@linaro.org>

> Sent: Friday, October 9, 2020 6:59 AM

>

> On 10/8/20 3:54 PM, Taylor Simpson wrote:

> > I'll reach out to the architects and ask if there is a case where doing the

> store first could be a problem - and also about the priority of exceptions.

>

> Of course there are cases where doing the store first will cause problems.

>

> > This may or may not be possible.  We already have to reorder to put .new

> consumers after the producers, and we have to keep the change-of-flow

> instructions in the original order.  So, I'm reluctant to go down this path

> worrying that there will be a long tail of corner case bugs to engineer out.

>

> Well, do as you like, I suppose, but I think the merging thing that you were

> trying to do is just as fraught with peril.

Thanks for your patience on this.  After speaking to the architects and putting more comments in the decoder (per your feedback),  I'm convinced that doing the store first won't lead to problems.  So, I'll move forward with your suggestion.

In other news, this change and the switch to qemu softfloat are the only blockers from your review.  Once these are done, I'll submit v5 of the series.  Thanks for your feedback, it's made our code significantly better.

Thanks
Taylor