Message ID | 1597765847-16637-1-git-send-email-tsimpson@quicinc.com |
---|---|
Headers | show |
Series | Hexagon patch series | expand |
On 8/18/20 8:50 AM, Taylor Simpson wrote: > +} iclass_t; ... > +extern const char *find_iclass_slots(opcode_t opcode, int itype); ... > +typedef struct { > + const char * const slots; > +} iclass_info_t; I'll note that you aren't following our CODING_STYLE for types. Which of these need to match imported/ and which can you fix. > +typedef enum { > + > +#define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS) ICLASS_FROM_TYPE(TYPE), > +#define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS) /* nothing */ > +#include "imported/iclass.def" > +#undef DEF_PP_ICLASS32 > +#undef DEF_EE_ICLASS32 > + > +#define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS) ICLASS_FROM_TYPE(TYPE), > +#define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS) /* nothing */ > +#include "imported/iclass.def" > +#undef DEF_PP_ICLASS32 > +#undef DEF_EE_ICLASS32 Any particular reason why you're defining one as nothing, and doing this dance twice? > +const char *find_iclass_slots(opcode_t opcode, int itype) > +{ > + /* There are some exceptions to what the iclass dictates */ > + if (GET_ATTRIB(opcode, A_ICOP)) { > + return "2"; Why are you returning a string and not a simple bitmask? > + [ICLASS_FROM_TYPE(TYPE)] = { .slots = #SLOTS }, This could be converted to the bitmask with enum { SLOTS_0 = (1 << 0), SLOTS_1 = (1 << 1), SLOTS_23 = (1 << 2) | (1 << 3), ... }; static const uint8_t iclass_info[] = { #define DEF_ICLASS(TYPE, SLOTS) \ [ICLASS_##TYPE] = SLOTS_##SLOTS #define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS) \ DEF_ICLASS(TYPE, SLOTS) #define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS) \ DEF_ICLASS(TYPE, SLOTS) At which point the ultimate consumer, > for (i = 0, slot = 3; i < pkt->num_insns; i++) { > valid_slot_str = get_valid_slot_str(pkt, i); > > while (strchr(valid_slot_str, '0' + slot) == NULL) { > slot--; > } > pkt->insn[i].slot = slot; becomes a simple integer mask check. r~
On 8/18/20 8:50 AM, Taylor Simpson wrote: > +/* Fill in the table with NULLs because not all the opcodes have DEF_QEMU */ > +semantic_insn_t opcode_genptr[] = { > +#define OPCODE(X) NULL > +#include "opcodes_def_generated.h" > + NULL > +#undef OPCODE > +}; > + > +/* This function overwrites the NULL entries where we have a DEF_QEMU */ > +void init_genptr(void) > +{ > +#define DEF_TCG_FUNC(TAG, GENFN) \ > + opcode_genptr[TAG] = generate_##TAG; > +#include "tcg_funcs_generated.h" > +#undef DEF_TCG_FUNC > +} Just size the array properly to start. const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = { #define DEF_TCG_FUNC(TAG, GENFN) \ [TAG] = generate_##TAG, #include "tcg_funcs_generated.h" };
On 8/18/20 8:50 AM, Taylor Simpson wrote: > Implementation of Linux user emulation for Hexagon > Some common files modified in addition to new files in linux-user/hexagon > > Signed-off-by: Taylor Simpson <tsimpson@quicinc.com> > --- Looks plausible. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Friday, August 28, 2020 7:58 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 29/34] Hexagon (target/hexagon) TCG > generation > > On 8/18/20 8:50 AM, Taylor Simpson wrote: > > +/* Fill in the table with NULLs because not all the opcodes have > DEF_QEMU */ > > +semantic_insn_t opcode_genptr[] = { > > +#define OPCODE(X) NULL > > +#include "opcodes_def_generated.h" > > + NULL > > +#undef OPCODE > > +}; > > + > > +/* This function overwrites the NULL entries where we have a DEF_QEMU > */ > > +void init_genptr(void) > > +{ > > +#define DEF_TCG_FUNC(TAG, GENFN) \ > > + opcode_genptr[TAG] = generate_##TAG; > > +#include "tcg_funcs_generated.h" > > +#undef DEF_TCG_FUNC > > +} > > Just size the array properly to start. > > const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = { > > #define DEF_TCG_FUNC(TAG, GENFN) \ > [TAG] = generate_##TAG, > #include "tcg_funcs_generated.h" > > }; OK
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Friday, August 28, 2020 7:49 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 28/34] Hexagon (target/hexagon) TCG > generation helpers > > On 8/18/20 8:50 AM, Taylor Simpson wrote: > > Helpers for reading and writing registers > > Helpers for load-locked/store-conditional > > > > Signed-off-by: Taylor Simpson <tsimpson@quicinc.com> > > --- > > target/hexagon/genptr_helpers.h | 244 > ++++++++++++++++++++++++++++++++++++++++ > > target/hexagon/op_helper.c | 18 +++ > > 2 files changed, 262 insertions(+) > > create mode 100644 target/hexagon/genptr_helpers.h > > > > diff --git a/target/hexagon/genptr_helpers.h > b/target/hexagon/genptr_helpers.h > > new file mode 100644 > > index 0000000..ffcb1e3 > > --- /dev/null > > +++ b/target/hexagon/genptr_helpers.h > > @@ -0,0 +1,244 @@ > > + > > +static inline void gen_log_reg_write(int rnum, TCGv val, int slot, > > + int is_predicated) > > These are quite large. Why are they marked inline? Since this is a header file, it prevents the compiler from complaining when they aren't used. > > > + /* Low word */ > > + tcg_gen_extrl_i64_i32(val32, val); > > + tcg_gen_mov_tl(hex_new_value[rnum], val32); > > Why are you extracting into a temporary? > This could be done with > > tcg_gen_extr_i64_i32(hex_new_value[rnum], > hex_new_value[rnum + 1], val); OK > > +static inline void gen_read_p3_0(TCGv control_reg) > > +{ > > + TCGv pval = tcg_temp_new(); > > + int i; > > + tcg_gen_movi_tl(control_reg, 0); > > + for (i = NUM_PREGS - 1; i >= 0; i--) { > > + tcg_gen_shli_tl(control_reg, control_reg, 8); > > + tcg_gen_andi_tl(pval, hex_pred[i], 0xff); > > + tcg_gen_or_tl(control_reg, control_reg, pval); > > tcg_gen_deposit_tl(control_reg, control_reg, > hex_pred[i], i * 8, 8); OK > > + for (i = 0; i < NUM_PREGS; i++) { > > + tcg_gen_andi_tl(pred_val, control_reg, 0xff); > > + tcg_gen_mov_tl(hex_pred[i], pred_val); > > + tcg_gen_shri_tl(control_reg, control_reg, 8); > > tcg_gen_extract_tl(hex_pred[i], control_reg, i * 8, 8); OK > > +static inline void log_store32(CPUHexagonState *env, target_ulong addr, > > + int32_t val, int width, int slot) > > +{ > > + HEX_DEBUG_LOG("log_store%d(0x%x, %d [0x%x])\n", width, addr, val, > val); > > + env->mem_log_stores[slot].va = addr; > > + env->mem_log_stores[slot].width = width; > > + env->mem_log_stores[slot].data32 = val; > > +} > > + > > +static inline void log_store64(CPUHexagonState *env, target_ulong addr, > > + int64_t val, int width, int slot) > > +{ > > + HEX_DEBUG_LOG("log_store%d(0x%x, %ld [0x%lx])\n", width, addr, > val, val); > > + env->mem_log_stores[slot].va = addr; > > + env->mem_log_stores[slot].width = width; > > + env->mem_log_stores[slot].data64 = val; > > +} > > ... or fold this re-addition back into where it was accidentally removed. ;-) Could you elaborate?
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Friday, August 28, 2020 7:17 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros > referenced in instruction semantics > > On 8/18/20 8:50 AM, Taylor Simpson wrote: > > + * For qemu, we look for a load in slot 0 when there is a store in slot 1 > > + * in the same packet. When we see this, we call a helper that merges the > > + * bytes from the store buffer with the value loaded from memory. > > + */ > > +#define CHECK_NOSHUF(DST, VA, SZ, SIGN) \ > > + do { \ > > + if (insn->slot == 0 && pkt->pkt_has_store_s1) { \ > > + gen_helper_merge_inflight_store##SZ##SIGN(DST, cpu_env, VA, > DST); \ > > + } \ > > + } while (0) > > Ah, so I see what you're trying to do with the merge thing, which had the > host-endian problems. > > I think the merge stuff is a mistake. I think you can get the semantics that > you want with > > probe_read(ld_addr, ld_len) > qemu_st(st_value, st_addr) > qemu_ld(ld_value, ld_addr) > > In this way, all exceptions are recognized before the store is complete, the > normal memory operations handle any possible overlap. So, do this inside the helper? Or is there a way to generate TCG code? > > > +#define fINSERT_BITS(REG, WIDTH, OFFSET, INVAL) \ > > + do { \ > > + REG = ((REG) & ~(((fCONSTLL(1) << (WIDTH)) - 1) << (OFFSET))) | \ > > + (((INVAL) & ((fCONSTLL(1) << (WIDTH)) - 1)) << (OFFSET)); \ > > + } while (0) > > reg = deposit32(reg, offset, width, inval) OK > > +#define fEXTRACTU_BITS(INREG, WIDTH, OFFSET) \ > > + (fZXTN(WIDTH, 32, (INREG >> OFFSET))) > > +#define fEXTRACTU_BIDIR(INREG, WIDTH, OFFSET) \ > > + (fZXTN(WIDTH, 32, fBIDIR_LSHIFTR((INREG), (OFFSET), 4_8))) > > +#define fEXTRACTU_RANGE(INREG, HIBIT, LOWBIT) \ > > + (fZXTN((HIBIT - LOWBIT + 1), 32, (INREG >> LOWBIT))) > > extract32(inreg, offset, width) OK > > +#define fZXTN(N, M, VAL) ((VAL) & ((1LL << (N)) - 1)) > > extract32(VAL, 0, n) OK > > +#define fSXTN(N, M, VAL) \ > > + ((fZXTN(N, M, VAL) ^ (1LL << ((N) - 1))) - (1LL << ((N) - 1))) > > sextract32(val, 0, n) OK > > +#define fRND(A) (((A) + 1) >> 1) > > Does A have a type that won't overflow? > For Arm we write this as > > (A >> 1) + (A & 1) Will investigate > > +#define fDCFETCH(REG) do { REG = REG; } while (0) /* Nothing to do in > qemu */ > > +#define fICINVA(REG) do { REG = REG; } while (0) /* Nothing to do in > qemu */ > > +#define fL2FETCH(ADDR, HEIGHT, WIDTH, STRIDE, FLAGS) > > +#define fDCCLEANA(REG) do { REG = REG; } while (0) /* Nothing to do in > qemu */ > > +#define fDCCLEANINVA(REG) \ > > + do { REG = REG; } while (0) /* Nothing to do in qemu */ > > Is this "R=R" thing trying to avoid a compiler warning? > Perhaps "(void)R" would be sufficient to avoid that? Yes, it avoids a compiler warning. Will change to (void) > > -static inline void log_store32(CPUHexagonState *env, target_ulong addr, > > - target_ulong val, int width, int slot) > > -{ > > - HEX_DEBUG_LOG("log_store%d(0x" TARGET_FMT_lx ", " > TARGET_FMT_ld > > - " [0x" TARGET_FMT_lx "])\n", > > - width, addr, val, val); > > - env->mem_log_stores[slot].va = addr; > > - env->mem_log_stores[slot].width = width; > > - env->mem_log_stores[slot].data32 = val; > > -} > > - > > -static inline void log_store64(CPUHexagonState *env, target_ulong addr, > > - int64_t val, int width, int slot) > > -{ > > - HEX_DEBUG_LOG("log_store%d(0x" TARGET_FMT_lx ", %ld [0x%lx])\n", > > - width, addr, val, val); > > - env->mem_log_stores[slot].va = addr; > > - env->mem_log_stores[slot].width = width; > > - env->mem_log_stores[slot].data64 = val; > > -} > > - > > Fold this back to wherever it came from. Clearly no need to introduce it in > the first place. These are invoked by the MEM_STORE macros. Are you saying to put this code there?
On 8/30/20 1:04 PM, Taylor Simpson wrote:
> So, this should be CamelCase? I should be able to fix all of them.
Yes, they should. Thanks.
r~
Richard, Thank you so much for the feedback. I really appreciate it. I'll get to work addressing the issues. Since some of the items will take longer than others, please advise whether it's preferred to send intermediate updates or wait until they are all addressed. Taylor > -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Friday, August 28, 2020 9:27 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 00/34] Hexagon patch series > > On 8/18/20 8:50 AM, Taylor Simpson wrote: > > This series adds support for the Hexagon processor with Linux user support > > > > See patch 02/34 Hexagon README for detailed information. > > > > Once the series is applied, the Hexagon port will pass "make check-tcg". > > The series also includes Hexagon-specific tests in tcg/tests/hexagon. > > > > We have a parallel effort to make the Hexagon Linux toolchain inside a > docker > > container publically available. > > Oh, excellent. > > > *** Future items under consideration *** > > Use qemu softfloat > > This is a blocker. It's definitely not hard. > > > Use qemu decodetree > > This would certainly clean up all of the string processing that I mentioned. > It seems like it would be just as easy to convert into the decodetree format > as > what you're currently doing for dectree_generated.h. Indeed, easier, since > you > only need the ones that are TERMINAL(). All of the other things labeled > TABLE_LINK are handled by decodetree itself. > > Anyway, I've completed what review I planed to do against this version. > > > r~
On 8/30/20 12:53 PM, Taylor Simpson wrote: >>> +++ b/target/hexagon/genptr_helpers.h >>> @@ -0,0 +1,244 @@ >>> + >>> +static inline void gen_log_reg_write(int rnum, TCGv val, int slot, >>> + int is_predicated) >> >> These are quite large. Why are they marked inline? > > Since this is a header file, it prevents the compiler from complaining when they aren't used. Ok, why are they in a header file? Why would they be unused, come to that? The header file is used exactly once, by genptr.c. Seems to me they could just as well be *in* genptr.c. If the functions are not used, just remove them? >>> +static inline void log_store32(CPUHexagonState *env, target_ulong addr, >>> + int32_t val, int width, int slot) >>> +{ >>> + HEX_DEBUG_LOG("log_store%d(0x%x, %d [0x%x])\n", width, addr, val, >> val); >>> + env->mem_log_stores[slot].va = addr; >>> + env->mem_log_stores[slot].width = width; >>> + env->mem_log_stores[slot].data32 = val; >>> +} >>> + >>> +static inline void log_store64(CPUHexagonState *env, target_ulong addr, >>> + int64_t val, int width, int slot) >>> +{ >>> + HEX_DEBUG_LOG("log_store%d(0x%x, %ld [0x%lx])\n", width, addr, >> val, val); >>> + env->mem_log_stores[slot].va = addr; >>> + env->mem_log_stores[slot].width = width; >>> + env->mem_log_stores[slot].data64 = val; >>> +} >> >> ... or fold this re-addition back into where it was accidentally removed. ;-) > > Could you elaborate? You added this code in one patch (didn't check which), removed it in patch 26, and re-added it here in patch 28. r~
On 8/30/20 1:23 PM, Taylor Simpson wrote: >> I think the merge stuff is a mistake. I think you can get the semantics that >> you want with >> >> probe_read(ld_addr, ld_len) >> qemu_st(st_value, st_addr) >> qemu_ld(ld_value, ld_addr) >> >> In this way, all exceptions are recognized before the store is complete, the >> normal memory operations handle any possible overlap. > > So, do this inside the helper? Or is there a way to generate TCG code? I was thinking TCG code, where you can look at the packet before any code gen, and decide whether or not this situation actually applies. The probe is a simple helper call, for which the generic machinery exists (probe_access, probe_write, probe_read). All you have to do is write the wrapper. The loads and stores are, well, normal loads and stores. r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Sunday, August 30, 2020 2:52 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 28/34] Hexagon (target/hexagon) TCG > generation helpers > > On 8/30/20 12:53 PM, Taylor Simpson wrote: > >>> +++ b/target/hexagon/genptr_helpers.h > >>> @@ -0,0 +1,244 @@ > >>> + > >>> +static inline void gen_log_reg_write(int rnum, TCGv val, int slot, > >>> + int is_predicated) > >> > >> These are quite large. Why are they marked inline? > > > > Since this is a header file, it prevents the compiler from complaining when > they aren't used. > > Ok, why are they in a header file? > Why would they be unused, come to that? > > The header file is used exactly once, by genptr.c. Seems to me they could > just > as well be *in* genptr.c. > > If the functions are not used, just remove them? I could have sworn it was included in more than one C file. I'll move the contents to genptr.c. > >>> +static inline void log_store32(CPUHexagonState *env, target_ulong > addr, > >>> + int32_t val, int width, int slot) > >>> +{ > >>> + HEX_DEBUG_LOG("log_store%d(0x%x, %d [0x%x])\n", width, addr, > val, > >> val); > >>> + env->mem_log_stores[slot].va = addr; > >>> + env->mem_log_stores[slot].width = width; > >>> + env->mem_log_stores[slot].data32 = val; > >>> +} > >>> + > >>> +static inline void log_store64(CPUHexagonState *env, target_ulong > addr, > >>> + int64_t val, int width, int slot) > >>> +{ > >>> + HEX_DEBUG_LOG("log_store%d(0x%x, %ld [0x%lx])\n", width, addr, > >> val, val); > >>> + env->mem_log_stores[slot].va = addr; > >>> + env->mem_log_stores[slot].width = width; > >>> + env->mem_log_stores[slot].data64 = val; > >>> +} > >> > >> ... or fold this re-addition back into where it was accidentally removed. ;-) > > > > Could you elaborate? > > You added this code in one patch (didn't check which), removed it in patch > 26, > and re-added it here in patch 28. My apologies, this is my screwing up the git rebase. I'll fix it. > > > r~
On 8/30/20 1:47 PM, Taylor Simpson wrote: > Richard, > > Thank you so much for the feedback. I really appreciate it. > > I'll get to work addressing the issues. Since some of the items will take longer than others, please advise whether it's preferred to send intermediate updates or wait until they are all addressed. I don't mind intermediate updates. Just keep a list in the cover letter of the things that are still on the to-do list, and I'll not focus on those. We could also talk about what portions of the to-do list are blocker, and what can be done via normal development. Because neither you nor I want to carry around this huge patch set forever. r~
> >> -----Original Message----- > >> From: Richard Henderson <richard.henderson@linaro.org> > >> Sent: Sunday, August 30, 2020 3:13 PM > >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > >> aleksandar.m.mail@gmail.com; ale@rev.ng > >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for > >> instructions with multiple definitions > >> > >> On 8/30/20 12:48 PM, Taylor Simpson wrote: > >>> I'll add the following comment to indicate what's going on > >>> > >>> /* > >>> * Each of the generated helpers is wrapped with #ifndef > >> fGEN_TCG_<tag>. > >>> * For example > >>> * #ifndef fGEN_TCG_A2_add > >>> * DEF_HELPER_3(A2_add, s32, env, s32, s32) > >>> * #endif > >>> * We include gen_tcg.h here to provide definitions of fGEN_TCG for > any > >> instructions that > >>> * are overridden. > >>> * > >>> * This prevents unused helpers from taking up space in the executable. > >>> */ > >> > >> Ah, hum. Well. > >> > >> How about we figure out a way to communicate to the generator scripts > >> which > >> functions have been implemented "directly", and don't need to be > >> generated at all? > >> > >> I don't see why we have to wait until the c preprocessor to remove this > >> unused > >> code, and the less we have of it, the better. > >> > > > > A few reasons > > - Makes it easy to override instruction helpers. All one has to do is #define > fGEN_TCG_<tag>. > > If the generator can examine, say, genptr_override.c.inc, then you don't > even > have to add a #define. Just add the code. > > Perhaps something like > > DEFINE_FGEN(tag) > { > // some code > } > > where DEFINE_FGEN expands to the appropriate function declaration. The > generator then need only grep '^DEFINE_FGEN' and extract the list of > overridden > tags. > > > > - When debugging the override, sometimes you want to quickly revert back > to the helper. Or if you've written a bunch of overrides in one shot and then > find that a test case is failing, you can binary search which one to turn off and > get the test to pass. This is the one with the bug to fix. > > No difference there, just binary searching on different text. > > > - Reduces time for an incremental build. When we add or delete an > override, we don't have to re-run the generator. > > About this I care not at all. I can't imagine that more than fractions of a > second are at stake. I can modify the generator to skip instructions with overrides. There are some assumptions here I'd like to clarify. When I started this discussion, there seemed to be general resistance to the concept of a generator. Because of this, I made the generator as simple as possible and pushed the complexity and optimization into code that consumes the output of the generator. Also, I assumed people wouldn't read the output of the generator, so I didn't focus on making it readable. Now, it sounds like my assumptions were not correct. You pointed out two things to do in the generator - Expand the macros inline - Skip the instructions that have overrides I addition, there other things I should do if we want the generated files to be more readable - Indent the code - Only generate one of the fGEN_TCG_<tag> or gen_helper_<tag> - Generate the declaration of the generate_<tag> function instead of just the body Thoughts? Thanks, Taylor
On 8/31/20 10:08 AM, Taylor Simpson wrote: > There are some assumptions here I'd like to clarify. When I started this > discussion, there seemed to be general resistance to the concept of a > generator. Because of this, I made the generator as simple as possible and > pushed the complexity and optimization into code that consumes the output of > the generator. Also, I assumed people wouldn't read the output of the > generator, so I didn't focus on making it readable. Except, at some point, one has to debug this code. If the code is unreadable, how do you figure out what's broken? > Now, it sounds like my assumptions were not correct. You pointed out two > things to do in the generator> - Expand the macros inline > - Skip the instructions that have overrides Yes please. > I addition, there other things I should do if we want the generated files to be more readable > - Indent the code Helpful, yes. I wouldn't worry about nested statements within the *.def files too much, except to put each ";" terminated statement on a new line, so that gdb's step command goes to the next statement instead of skipping everything all at once. > - Only generate one of the fGEN_TCG_<tag> or gen_helper_<tag> That would be part of "skip the instructions that have overrides", would it not? > - Generate the declaration of the generate_<tag> function instead of just the body I'm not quite sure what this means. Aren't the "generate_<tag>" functions private to genptr.c? Why would we need a separate declaration of them, as opposed to just a definition? r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Monday, August 31, 2020 11:29 AM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for > instructions with multiple definitions > > On 8/31/20 10:08 AM, Taylor Simpson wrote: > > There are some assumptions here I'd like to clarify. When I started this > > discussion, there seemed to be general resistance to the concept of a > > generator. Because of this, I made the generator as simple as possible and > > pushed the complexity and optimization into code that consumes the > output of > > the generator. Also, I assumed people wouldn't read the output of the > > generator, so I didn't focus on making it readable. > > Except, at some point, one has to debug this code. > If the code is unreadable, how do you figure out what's broken? > > > Now, it sounds like my assumptions were not correct. You pointed out two > > things to do in the generator> - Expand the macros inline > > - Skip the instructions that have overrides > > Yes please. > > > I addition, there other things I should do if we want the generated files to > be more readable > > - Indent the code > > Helpful, yes. > > I wouldn't worry about nested statements within the *.def files too much, > except to put each ";" terminated statement on a new line, so that gdb's step > command goes to the next statement instead of skipping everything all at > once. > > > - Only generate one of the fGEN_TCG_<tag> or gen_helper_<tag> > > That would be part of "skip the instructions that have overrides", would it > not? Just to be explicit, the code that generates code is generated as #ifdef fGEN_TCG_A2_add fGEN_TCG_A2_add({ RdV=RsV+RtV;}); #else do { gen_helper_A2_add(RdV, cpu_env, RsV, RtV); } while (0); #endif If we're going to have the generator know if there is an override, this would be more readable as either fGEN_TCG_A2_add({ RdV=RsV+RtV;}); or gen_helper_A2_add(RdV, cpu_env, RsV, RtV); > > - Generate the declaration of the generate_<tag> function instead of just > the body > > I'm not quite sure what this means. > > Aren't the "generate_<tag>" functions private to genptr.c? Why would we > need a > separate declaration of them, as opposed to just a definition? In genptr.c, there is #define DEF_TCG_FUNC(TAG, GENFN) \ static void generate_##TAG(CPUHexagonState *env, DisasContext *ctx, \ insn_t *insn, packet_t *pkt) \ { \ GENFN \ } #include "tcg_funcs_generated.h" #undef DEF_TCG_FUNC The generated code only has the body of the function. It would be more readable to move the static void generate_##TAG ... into the generated code.
On 8/31/20 10:57 AM, Taylor Simpson wrote: > OK, here's the list of items. Let me know if I missed anything. I'll > indicate which ones can be done quickly and which ones would take more time. > I added a column for blocker if you or anyone else has input on that. > > PatchItemEffortBlocker > Use qemu softfloat??Yes Hmm, this table didn't render. Below, yes/no for blocker column. > Use qemu decodetree.py?? no > SeveralUse const when appropriatesmall yes > SeveralRemove anything after g_assert_not_reachedsmall yes > SeveralFix log_store32/64 add/remove/add in patch seriessmall yes > SeveralFollow naming guidelines for structs and enumssmall yes > 04Move decls to cpu-param.hsmall Yes. The only reason this even compiled is that you don't do system mode yet. ;-) > 04Remove CONFIG_USER_ONLY ifdef'ssmall yes > 04Remove DEBUG_HEXAGONsmall yes > 04Remove stack pointer modification hack, use qemu mechanismsmall yes > 04Add property x-lldb-compat to control output in logsmall yes > 06Include instruction and raw bytes in disassemblysmall yes > 07Use DEF_HELPER_FLAGSsmall no, but should be easy. > 07, 26Endianness of merge_bytessmall Yes, one way or another; hopefully by removing all of the merge_bytes and using probe_read. > 07Fix overlap testsmall yes > 07Remove HELPER(debug_value)/HELPER(debug_value_i64)small yes > 09Include "qemu/osdep.h" instead of <stdint.h>small yes > 10 (and others)Stick with stdint.h types except in imported filessmall yes > 11Remove description from reg field definitionssmall yes > 13Move regmap.h into decode.csmall yes > 14, 27Use bit mask instead of strings in decodingsmall no, but should be easy. > 14Add comments to decodersmall yes > 16Use qemu/int128.hmedium no > 17Squash patches dealing with imported filessmall yes > 24Use qemu/bitops.h for instruction attributessmall no > 24Fix initialization of opcode_short_semanticssmall yes > 24Change if (p == NULL) { g_assert_not_reached(); } to assert(p != NULL)small no > 25Expand DECL/READ/WRITE/FREE macros into generated codesmall Yes. In the end I think some of these will in the end want to be helper functions. As I was thinking how to best write A2_add, I was thinking /* TODO: You currently have an "offset" parameter to DECL_REG. I can't see that it is ever used? I would *hope* that this could be resolved earlier, so that by this time insn->regno[*] is absolute. */ static int resolve_regno(Insn *insn, int slot, int off); /* Return hex_new_value[regno]; log the write. */ static TCGv reg_for_write(DisasContext *ctx, int regno); /* Return hex_reg[regno]; force up-to-date value for PC. */ static TCGv reg_for_read(DisasContext *ctx, int regno); /* if (preg) hex_new_value[regno] = hex_reg[regno], or !preg if !test_positive. Leaves hex_new_value[] canonical for gen_reg_writes, no extra temporary required. */ static void gen_cancel_reg(DisasContext *ctx, int preg, int rreg, bool test_positive); DEF_TCG_FUNC(A2_add) { int rd = resolve_regno(insn, 0, 0); int rs = resolve_regno(insn, 1, 0); int rt = resolve_regno(insn, 2, 0); tcg_gen_add_tl(reg_for_write(ctx, rd), reg_for_read(ctx, rs), reg_for_read(ctx, rt)); } DEF_TCG_FUNC(A2_paddit) { int pu = resolve_regno(insn, 0, 0); int rd = resolve_regno(insn, 1, 0); int rs = resolve_regno(insn, 2, 0); int rt = resolve_regno(insn, 3, 0); tcg_gen_add_tl(reg_for_write(ctx, rd), reg_for_read(ctx, rs), reg_for_read(ctx, rt)); gen_cancel_reg(ctx, insn, rd, pu, true); } However, I don't think we have to have a comprehensive set of these now. We can expand everything into the generator to start, then adjust the generator as we add helper functions and use them within the overrides. > 26Rewrite fINSERT*, fEXTRACT*, f?XTN macrossmall yes > 26Investigate fRND macrosmall no, but should be easy. > 26Change REG = REG to (VOID)REG to suppress compiler warningsmall yes > 27Remove multiple includes of imported/iclass.defsmall yes > 28Move genptr_helpers.h into genptr.csmall yes > 28Remove unneeded tempssmall no > 28Use tcg_gen_deposit_tl and tcg_gen_extract_tl when dealing with p3_0small no > 29Size opcode_genptr[] properly and initialize with [TAG] = generate_##TAGsmall yes; i think this will fall out of other changes. > 30Don't generate helpers for instructions that are overriddensmall yes > Don't include "gen_tcg.h" in helper.h yes > 31Use bitmask for ctx->reg_log instead of an arraysmall yes > 31Use tcg_gen_extract_i32 for gen_slot_cancelled_checksmall yes > 31Properly deal with reading instructions across a page boundary and toomedium > many instructions before finding end-of-packet Yes, this should be easy. Unless there's some surprise in the code I have already suggested code. > 31Don't set PC at the beginning of every packetmedium no > 31Don't set slot_cancelled unless neededsmall no > 31Don't set hex_pred_written unless neededmedium no > 31Change cancelled variable to not localsmall yes > 31Remove unnecessary tempsmall yes > 31Let tcg_gen_addi_tl check for zerosmall yes > 31Move gen_exec_counters to end of TB instead of every packetmedium no > 31Move end of TB handling to hexagon_tr_tb_stopsmall yes r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Monday, August 31, 2020 1:20 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for > instructions with multiple definitions > > The fGEN_TCG_A2_add macro does not require nor use that {...} argument. The fGEN_TCG_A2_add macro does need that argument, but there are cases that do need it. Here's an example from gen_tcg.h #define fGEN_TCG_L2_loadrub_pr(SHORTCODE) SHORTCODE This is explained in the README, but basically the argument is useful if we can properly define the macros that it contains to generate TCG. > What it *does* need are the same arguments as are given to generate_<rtag>. I > assume you are using those arguments implicitly in your current > fGEN_TCG_<rtag> > instances? Yes > > It would be cleanest to only have the generate_* functions. > > Either they are written by hand (replacing the current fGEN_TCG_*), or they > are > generated. In either case, there's just the one level of indirection from > opcode_genptr[]. > > I'd imagine > > --- genptr.c > > #define DEF_TCG_FUNC(TAG) \ > static void generate_##TAG(CPUHexagonState *env, \ > DisasContext *ctx, insn_t *insn, packet_t *pkt) > > /* > * All IIDs with an explicit implementation, > * overriding the auto-generated helper functions. > */ > > DEF_TCG_FUNC(A2_add) > { > /* { RdV=RsV+RtV;} */ > tcg_gen_add_tl(args...); > } There's additional generated code before and after the tcg_gen_add_tl. IMO, we don't want the person who writes an override having to reproduce the generated code. Assuming we have a definition of fGEN_TCG_A2_add and we have the generator intelligently expanding the macros, this is what will be generated. static void generate_A2_add(CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt) { /* A2_add */ int RdN =insn->regno[0]; TCGv RdV = tcg_temp_local_new(); int RsN = insn->regno[1]; TCGv RsV = hex_gpr[RsN]; int RtN = insn->regno[2]; TCGv RtV = hex_gpr[RtN]; fGEN_TCG_A2_add({ RdV=RsV+RtV;}); gen_log_reg_write(RdN, RdV, insn->slot, 0); ctx_log_reg_write(ctx, RdN); tcg_temp_free(RdV); /* A2_add */ } If there weren't an override, we'd get this static void generate_A2_add(CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt) { /* A2_add */ int RdN =insn->regno[0]; TCGv RdV = tcg_temp_local_new(); int RsN = insn->regno[1]; TCGv RsV = hex_gpr[RsN]; int RtN = insn->regno[2]; TCGv RtV = hex_gpr[RtN]; gen_helper_A2_add(RdV, cpu_env, RsV, RtV); /* Only difference is this line */ gen_log_reg_write(RdN, RdV, insn->slot, 0); ctx_log_reg_write(ctx, RdN); tcg_temp_free(RdV); /* A2_add */ } The fGEN_TCG_<tag> macro can also mention the operands of the instruction (RdV, RsV, RtV in this example). Unlike the generate_<tag> functions that all have the same signature. The overrides would have different signatures. This would be more defensive programming because you know exactly where the variables come from but more verbose when writing the overrides by hand. Also, note that these need to be macros in order to take advantage of the SHORTCODE. In other words, instead of #define fGEN_TCG_A2_add(SHORTCODE) tcg_gen_add_tl(RdV, RsV, RtV) We would write #define fGEN_TCG_A2_add(env, ctx, insn, pkt, RdV, RsV, RtV, SHORTCODE) tcg_gen_add_tl(RdV, RsV, RtV); Personally, I prefer the former, but will change to the latter if you feel strongly. I'm not married to the fGEN_TCG_<tag> name. DEF_TCG_<tag> would also be fine. > > /* > * Generate calls to the auto-generate helpers, > * and slot everything into the opcode_genptr table. > */ > #include "genptr_generated.c.inc" > > --- genptr_generated.c.inc > > DEF_TCG_FUNC(A4_tlbmatch) > { > gen_helper_A4_tlbmatch(args...); > } > > // etc > > const SemanticInsn opcode_genptr[] = { > // All IID's, generated or not. > }; > > --- > > This leaves genptr.c as the file to grep for '^DEF_TCG_FUNC'. > > > r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Monday, August 31, 2020 2:44 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 00/34] Hexagon patch series > > On 8/31/20 10:57 AM, Taylor Simpson wrote: > > OK, here's the list of items. Let me know if I missed anything. I'll > > indicate which ones can be done quickly and which ones would take more > time. > > I added a column for blocker if you or anyone else has input on that. > > > > PatchItemEffortBlocker > > Use qemu softfloat??Yes > > Hmm, this table didn't render. Below, yes/no for blocker column. Sorry about that - not sure what happened. I will work all those you marked "yes" or "no, but should be easy". > > 25Expand DECL/READ/WRITE/FREE macros into generated codesmall > > Yes. > > In the end I think some of these will in the end want to be helper functions. > As I was thinking how to best write A2_add, I was thinking See my response to the thread on patch 30/34. Since you mention A2_paddit, here's what it would look like assuming it is overridden. static void generate_A2_paddt(CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt) { /* A2_paddit */ int PuN = insn->regno[0]; TCGv PuV = hex_pred[PuN]; Int RdN = insn->regno[1]; TCGv RdV = tcg_temp_local_new(); if (!is_preloaded(ctx, RdN)) { tcg_gen_mov_tl(hex_new_value[RdN], hex_gpr[RdN]); } int RsN = insn->regno[2]; TCGv RsV = hex_gpr[RsN]; int siV = insn->immed[0]; fGEN_TCG_A2_paddit({if(fLSBOLD(PuV)){fIMMEXT(siV); RdV=RsV+siV;} else {CANCEL;}}); gen_log_reg_write(RdN, RdV, insn->slot, 1); /* Only does the write if we haven't cancelled */ ctx_log_reg_write(ctx, RdN); tcg_temp_free(RdV); /* A2_paddit */ } Here's what the override looks like (there are a bunch of these, so we have a helper macro which could also be a function) /* Predicated add instructions */ #define GEN_TCG_padd(PRED, ADD) \ do { \ TCGv LSB = tcg_temp_new(); \ TCGv mask = tcg_temp_new(); \ TCGv zero = tcg_const_tl(0); \ PRED; \ ADD; \ tcg_gen_movi_tl(mask, 1 << insn->slot); \ tcg_gen_or_tl(mask, hex_slot_cancelled, mask); \ tcg_gen_movcond_tl(TCG_COND_NE, hex_slot_cancelled, LSB, zero, \ hex_slot_cancelled, mask); \ tcg_temp_free(LSB); \ tcg_temp_free(mask); \ tcg_temp_free(zero); \ } while (0) #define fGEN_TCG_A2_paddit(SHORTCODE) \ GEN_TCG_padd(fLSBOLD(PuV), tcg_gen_addi_tl(RdV, RsV, siV))
On 8/31/20 4:10 PM, Taylor Simpson wrote: > > >> -----Original Message----- >> From: Richard Henderson <richard.henderson@linaro.org> >> Sent: Monday, August 31, 2020 1:20 PM >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; >> aleksandar.m.mail@gmail.com; ale@rev.ng >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for >> instructions with multiple definitions >> >> The fGEN_TCG_A2_add macro does not require nor use that {...} argument. > > The fGEN_TCG_A2_add macro does need that argument, but there are cases that > do need it. Here's an example from gen_tcg.h > #define fGEN_TCG_L2_loadrub_pr(SHORTCODE) SHORTCODE > This is explained in the README, but basically the argument is useful if we > can properly define the macros that it contains to generate TCG. We're certainly not going to be able to handle e.g. "+" or "if", so it is going to work only for the most trivial of SHORTCODE. Though in fact loadrub_pr makes that grade... > IMO, we don't want the person who writes an override having to reproduce the > generated code. Assuming we have a definition of fGEN_TCG_A2_add and we > have the generator intelligently expanding the macros, this is what will be > generated. You need to give me a better example that A2_add, then. Because I see that being exactly one line, calling a helper that handles all instructions of the same format, passing tcg_gen_add_tl as a callback. Have a browse through my recent microblaze decodetree conversion. Note that the basic logical operations are implemented with exactly one source line. > Unlike the generate_<tag> functions that all have the same signature. The overrides would have different signatures. This would be more defensive programming because you know exactly where the variables come from but more verbose when writing the overrides by hand. Also, note that these need to be macros in order to take advantage of the SHORTCODE. > > In other words, instead of > #define fGEN_TCG_A2_add(SHORTCODE) tcg_gen_add_tl(RdV, RsV, RtV) > > We would write > #define fGEN_TCG_A2_add(env, ctx, insn, pkt, RdV, RsV, RtV, SHORTCODE) tcg_gen_add_tl(RdV, RsV, RtV); > > Personally, I prefer the former, but will change to the latter if you feel strongly. This comes from trying to handle instructions in different ways, but represent them all the same. I guess I see the attraction of the magic non-parameters -- you get a compilation error if the variable is not present, but are not tied to positional parameters. Ho hum. Maybe I'm trying to overthink this too much before tackling the ultimate goal of full parsing of the SHORTCODE. Perhaps the only thing for the short term is to have the generator grep genptr.c for "#define fGEN", to choose between the two alternatives: inline generation or out-of-line helper generation. r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Monday, August 31, 2020 8:41 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for > instructions with multiple definitions > > On 8/31/20 4:10 PM, Taylor Simpson wrote: > > > > > >> -----Original Message----- > >> From: Richard Henderson <richard.henderson@linaro.org> > >> Sent: Monday, August 31, 2020 1:20 PM > >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > >> aleksandar.m.mail@gmail.com; ale@rev.ng > >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for > >> instructions with multiple definitions > >> > >> The fGEN_TCG_A2_add macro does not require nor use that {...} > argument. > > > > The fGEN_TCG_A2_add macro does need that argument, but there are > cases that > > do need it. Here's an example from gen_tcg.h > > #define fGEN_TCG_L2_loadrub_pr(SHORTCODE) SHORTCODE > > This is explained in the README, but basically the argument is useful if we > > can properly define the macros that it contains to generate TCG. > We're certainly not going to be able to handle e.g. "+" or "if", so it is going > to work only for the most trivial of SHORTCODE. > > Though in fact loadrub_pr makes that grade... The prior version of this series included all the overrides I've written to date. To reduce the size of this version, I removed most of them and only left the ones that are essential for correct execution. I plan to submit the others in subsequent series. Anyway, there are >50 overrides of load/store instructions that leverage SHORTCODE. > > IMO, we don't want the person who writes an override having to > reproduce the > > generated code. Assuming we have a definition of fGEN_TCG_A2_add and > we > > have the generator intelligently expanding the macros, this is what will be > > generated. > You need to give me a better example that A2_add, then. Because I see that > being exactly one line, calling a helper that handles all instructions of the > same format, passing tcg_gen_add_tl as a callback. Here's a more complicated example for a predicated post-increment load Static void generate_L2_ploadrit_pi(CPUHexagonState *env, DisasContext*cts, insn_t *insn, packet_t *pkt) { /* L2_ploadrit_pi */ TCGv EA = tcg_temp_local_new(); int PtN = insn->regno[0]; TCGv PtV = hex_pred[PtN]; int RdN = insn->regno[1]; TCGv RdV = tcg_temp_local_new(); if (!is_preloaded(ctx, RdN)) { tcg_gen_mov_tl(hex_hew_value[RdN], hex_gpr[RdN]); } int RxN = insn->regno[2]; TCGv RxV = tcg_temp_local_new(); if (!is_preloaded(ctx, RxN)) { tcg_gen_mov_tl(hex_new_value[RdN], hex_gpr[RxN]); } int siV = insn->immed[0]; tcg_gen_mov_tl(RxV, hex_gpr[RxN]); fGEN_TCG_L2_ploadrit_pi({fEA_REG(RxV); if(fLSBOLD(PtV)){ fPM_I(RxV,siV); fLOAD(1,4,u,EA,RdV);} else {LOAD_CANCEL(EA);}}); gen_log_reg_write(RdN, RdV, insn->slot, 1); gen_log_reg_write(RxN, RxV, insn->slot, 1); tcg_temp_free(EA); tcg_temp_free(RdV); tcg_temp_free(RxV); /* L2_ploadrit_pi */ } > Have a browse through my recent microblaze decodetree conversion. Note > that > the basic logical operations are implemented with exactly one source line. With a helper function, our compares are all one line also static inline void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2) { TCGv one = tcg_const_tl(0xff); TCGv zero = tcg_const_tl(0); tcg_gen_movcond_tl(cond, res, arg1, arg2, one, zero); tcg_temp_free(one); tcg_temp_free(zero); } /* Compare instructions */ #define fGEN_TCG_C2_cmpeq(SHORTCODE) \ gen_compare(TCG_COND_EQ, PdV, RsV, RtV) #define fGEN_TCG_C4_cmpneq(SHORTCODE) \ gen_compare(TCG_COND_NE, PdV, RsV, RtV) #define fGEN_TCG_C2_cmpgt(SHORTCODE) \ gen_compare(TCG_COND_GT, PdV, RsV, RtV) #define fGEN_TCG_C2_cmpgtu(SHORTCODE) \ gen_compare(TCG_COND_GTU, PdV, RsV, RtV) ... > > Unlike the generate_<tag> functions that all have the same signature. The > overrides would have different signatures. This would be more defensive > programming because you know exactly where the variables come from but > more verbose when writing the overrides by hand. Also, note that these > need to be macros in order to take advantage of the SHORTCODE. > > > > In other words, instead of > > #define fGEN_TCG_A2_add(SHORTCODE) tcg_gen_add_tl(RdV, RsV, RtV) > > > > We would write > > #define fGEN_TCG_A2_add(env, ctx, insn, pkt, RdV, RsV, RtV, > SHORTCODE) tcg_gen_add_tl(RdV, RsV, RtV); > > > > Personally, I prefer the former, but will change to the latter if you feel > strongly. > > This comes from trying to handle instructions in different ways, but > represent > them all the same. > > I guess I see the attraction of the magic non-parameters -- you get a > compilation error if the variable is not present, but are not tied to > positional parameters. > > Ho hum. Maybe I'm trying to overthink this too much before tackling the > ultimate goal of full parsing of the SHORTCODE. Alessandro (ale@rev.ng) and Niccolo (nizzo@rev.ng) are working on this 😊 > Perhaps the only thing for the short term is to have the generator grep > genptr.c for "#define fGEN", to choose between the two alternatives: inline > generation or out-of-line helper generation. That's certainly doable. It will also be good to implement some of your other ideas - Have the generator expand the DECL/READ/WRITE/FREE macros will make the generated code more readable and we can specialize them for predicated vs non-predicated instructions which will make translation faster. - Generate the entire generate_<tag> function instead of just the body will make the generated code more readable.
> -----Original Message----- > From: Qemu-devel <qemu-devel-bounces+bcain=quicinc.com@nongnu.org> > On Behalf Of Rob Landley ... > > On 8/30/20 3:47 PM, Taylor Simpson wrote: > > Richard, > > > > Thank you so much for the feedback. I really appreciate it. > > > > I'll get to work addressing the issues. Since some of the items will take longer > than others, please advise whether it's preferred to send intermediate updates > or wait until they are all addressed. > > > > Taylor > > Which branch of https://github.com/quic/qemu do I pull to try this without > scraping 30 patches out of the mailing list? IIRC this patch series was "small_series_v3" > >>> Once the series is applied, the Hexagon port will pass "make check-tcg". > >>> The series also includes Hexagon-specific tests in tcg/tests/hexagon. > >>> > >>> We have a parallel effort to make the Hexagon Linux toolchain inside > >>> a > >> docker > >>> container publically available. > >> > >> Oh, excellent. > > Is that a follow-up to https://www.spinics.net/lists/linux- > hexagon/msg01706.html > and is this a clang toolchain or a gcc toolchain? It's a clang toolchain. > I've noticed gcc 9.2 has hexagon in config.guess and config.sub, but the only > other file outside of the test suite containing the word "hexagon" in a case > insensitive match is the Changelog saying Linas Veptas added it to config.guess > and config.sub in 2011. And while https://github.com/quic has a musl fork it > doesn't have any compiler or linker forks... clang and lld support for hexagon exists in the upstream llvm-project repo. -Brian
> > On 8/31/20 4:10 PM, Taylor Simpson wrote: > > > > > > > > >> -----Original Message----- > > >> From: Richard Henderson <richard.henderson@linaro.org> > > >> Sent: Monday, August 31, 2020 1:20 PM > > >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > > >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > > >> aleksandar.m.mail@gmail.com; ale@rev.ng > > >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for > > >> instructions with multiple definitions > > >> > > Ho hum. Maybe I'm trying to overthink this too much before tackling the > > ultimate goal of full parsing of the SHORTCODE. > > Perhaps the only thing for the short term is to have the generator grep > > genptr.c for "#define fGEN", to choose between the two alternatives: > inline > > generation or out-of-line helper generation. > > That's certainly doable. It will also be good to implement some of your other > ideas > - Have the generator expand the DECL/READ/WRITE/FREE macros will make > the generated code more readable and we can specialize them for > predicated vs non-predicated instructions which will make translation faster. > - Generate the entire generate_<tag> function instead of just the body will > make the generated code more readable. I've made these changes to the generator. I hope you like the results. As an example, here is what we generate for the add instruction DEF_TCG_FUNC(A2_add, static void generate_A2_add( CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt) { TCGv RdV = tcg_temp_local_new(); const int RdN = insn->regno[0]; TCGv RsV = hex_gpr[insn->regno[1]]; TCGv RtV = hex_gpr[insn->regno[2]]; gen_helper_A2_add(RdV, cpu_env, RsV, RtV); gen_log_reg_write(RdN, RdV); ctx_log_reg_write(ctx, RdN); tcg_temp_free(RdV); }) And here is how the generated file gets used in genptr.c #define DEF_TCG_FUNC(TAG, GENFN) \ GENFN #include "tcg_funcs_generated.h" #undef DEF_TCG_FUNC /* * Not all opcodes have generate_<tag> functions, so initialize * the table from the tcg_funcs_generated.h file. */ const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = { #define DEF_TCG_FUNC(TAG, GENFN) \ [TAG] = generate_##TAG, #include "tcg_funcs_generated.h" #undef DEF_TCG_FUNC }; I've also addressed several of the items from Richard's review, so I'll resubmit the series once I figure out how to get "make check-tcg" working under meson. Thanks, Taylor
On 9/23/20 7:56 PM, Taylor Simpson wrote: > > >>> On 8/31/20 4:10 PM, Taylor Simpson wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Richard Henderson <richard.henderson@linaro.org> >>>>> Sent: Monday, August 31, 2020 1:20 PM >>>>> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org >>>>> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; >>>>> aleksandar.m.mail@gmail.com; ale@rev.ng >>>>> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for >>>>> instructions with multiple definitions >>>>> >>> Ho hum. Maybe I'm trying to overthink this too much before tackling the >>> ultimate goal of full parsing of the SHORTCODE. >>> Perhaps the only thing for the short term is to have the generator grep >>> genptr.c for "#define fGEN", to choose between the two alternatives: >> inline >>> generation or out-of-line helper generation. >> >> That's certainly doable. It will also be good to implement some of your other >> ideas >> - Have the generator expand the DECL/READ/WRITE/FREE macros will make >> the generated code more readable and we can specialize them for >> predicated vs non-predicated instructions which will make translation faster. >> - Generate the entire generate_<tag> function instead of just the body will >> make the generated code more readable. > > I've made these changes to the generator. I hope you like the results. As an example, here is what we generate for the add instruction > > DEF_TCG_FUNC(A2_add, > static void generate_A2_add( > CPUHexagonState *env, > DisasContext *ctx, > insn_t *insn, > packet_t *pkt) > { > TCGv RdV = tcg_temp_local_new(); > const int RdN = insn->regno[0]; > TCGv RsV = hex_gpr[insn->regno[1]]; > TCGv RtV = hex_gpr[insn->regno[2]]; > gen_helper_A2_add(RdV, cpu_env, RsV, RtV); > gen_log_reg_write(RdN, RdV); > ctx_log_reg_write(ctx, RdN); > tcg_temp_free(RdV); > }) I would be happier if the entire function body were not in a macro. Have you tried to set a gdb breakpoint in one of these? Once upon a time at least, this would have resulted in all lines of the function becoming one "source line" in the debug info. I also think the full function prototype is unnecessary, and the replication of "A2_add" undesirable. I would prefer the function prototype itself to be macro-ized. E.g. DEF_TCG_FUNC(A2_add) { TCGv RdV = tcg_temp_local_new(); const int RdN = insn->regno[0]; TCGv RsV = hex_gpr[insn->regno[1]]; TCGv RtV = hex_gpr[insn->regno[2]]; gen_helper_A2_add(RdV, cpu_env, RsV, RtV); gen_log_reg_write(RdN, RdV); ctx_log_reg_write(ctx, RdN); tcg_temp_free(RdV); } with #define DEF_TCG_FUNC(TAG) \ static void generate_##TAG(CPUHexagonState *env, \ DisasContext *ctx, \ insn_t *insn, \ packet_t *pkt) > And here is how the generated file gets used in genptr.c > > #define DEF_TCG_FUNC(TAG, GENFN) \ > GENFN > #include "tcg_funcs_generated.h" > #undef DEF_TCG_FUNC > > /* > * Not all opcodes have generate_<tag> functions, so initialize > * the table from the tcg_funcs_generated.h file. > */ > const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = { > #define DEF_TCG_FUNC(TAG, GENFN) \ > [TAG] = generate_##TAG, > #include "tcg_funcs_generated.h" > #undef DEF_TCG_FUNC > }; Obviously, the macro I propose above cannot be directly reused, as you do here. But I also think we should not try to do so. You've got a script generating stuff. It can just as easily generate two different lists. You're trying to do too much with the C preprocessor and too little with python. At some point in the v3 thread, I had suggested grepping for some macro in order to indicate to the python script which tags are implemented manually. My definition above is easy to look for: exactly one thing on the line, easy regexp. > I've also addressed several of the items from Richard's review, so I'll resubmit the series once I figure out how to get "make check-tcg" working under meson. Yes, make check-tcg is currently broken, as are a few other check-foo. I've not yet had the courage to look into it, hoping that someone else will do it first. r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Thursday, September 24, 2020 9:04 AM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for > instructions with multiple definitions > > > > > I've made these changes to the generator. I hope you like the results. As > an example, here is what we generate for the add instruction > > > > DEF_TCG_FUNC(A2_add, > > static void generate_A2_add( > > CPUHexagonState *env, > > DisasContext *ctx, > > insn_t *insn, > > packet_t *pkt) > > { > > TCGv RdV = tcg_temp_local_new(); > > const int RdN = insn->regno[0]; > > TCGv RsV = hex_gpr[insn->regno[1]]; > > TCGv RtV = hex_gpr[insn->regno[2]]; > > gen_helper_A2_add(RdV, cpu_env, RsV, RtV); > > gen_log_reg_write(RdN, RdV); > > ctx_log_reg_write(ctx, RdN); > > tcg_temp_free(RdV); > > }) > > I would be happier if the entire function body were not in a macro. Have you > tried to set a gdb breakpoint in one of these? Once upon a time at least, this > would have resulted in all lines of the function becoming one "source line" in > the debug info. Good point. It still comes out as a single line. > I also think the full function prototype is unnecessary, and the replication of > "A2_add" undesirable. > > I would prefer the function prototype itself to be macro-ized. > > E.g. > > DEF_TCG_FUNC(A2_add) > { > TCGv RdV = tcg_temp_local_new(); > const int RdN = insn->regno[0]; > TCGv RsV = hex_gpr[insn->regno[1]]; > TCGv RtV = hex_gpr[insn->regno[2]]; > gen_helper_A2_add(RdV, cpu_env, RsV, RtV); > gen_log_reg_write(RdN, RdV); > ctx_log_reg_write(ctx, RdN); > tcg_temp_free(RdV); > } > > with > > #define DEF_TCG_FUNC(TAG) \ > static void generate_##TAG(CPUHexagonState *env, \ > DisasContext *ctx, \ > insn_t *insn, \ > packet_t *pkt) > > > And here is how the generated file gets used in genptr.c > > > > #define DEF_TCG_FUNC(TAG, GENFN) \ > > GENFN > > #include "tcg_funcs_generated.h" > > #undef DEF_TCG_FUNC > > > > /* > > * Not all opcodes have generate_<tag> functions, so initialize > > * the table from the tcg_funcs_generated.h file. > > */ > > const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = { > > #define DEF_TCG_FUNC(TAG, GENFN) \ > > [TAG] = generate_##TAG, > > #include "tcg_funcs_generated.h" > > #undef DEF_TCG_FUNC > > }; > > Obviously, the macro I propose above cannot be directly reused, as you do > here. > But I also think we should not try to do so. > > You've got a script generating stuff. It can just as easily generate two > different lists. You're trying to do too much with the C preprocessor and too > little with python. Sure, generating two lists was also suggested by Alessandro (ale@rev.ng). Although doing more in python and less with the C preprocessor would lead to the conclusion we shouldn't define the function prototype in a macro. If we generate two lists, what is the advantage of putting the function signature in a macro vs generating? > > At some point in the v3 thread, I had suggested grepping for some macro in > order to indicate to the python script which tags are implemented manually. > My > definition above is easy to look for: exactly one thing on the line, easy > regexp. This is already done as well. As you may recall, we were previously generating #ifdef fGEN_TCG_A2_add fGEN_TCG_A2_add({ RdV=RsV+RtV;}); #else do { gen_helper_A2_add(RdV, cpu_env, RsV, RtV); } while (0); The generator now searches for "#define fGEN_TCG_<tag>" and generates different code depending on what it finds. This version of the series only contains overrides that are required for correct execution. So, A2_add isn't there. When we do override it, the generator produces this static void generate_A2_add( CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt) { TCGv RdV = tcg_temp_local_new(); const int RdN = insn->regno[0]; TCGv RsV = hex_gpr[insn->regno[1]]; TCGv RtV = hex_gpr[insn->regno[2]]; fGEN_TCG_A2_add({ RdV=RsV+RtV;}); <---- This line is different gen_log_reg_write(RdN, RdV); ctx_log_reg_write(ctx, RdN); tcg_temp_free(RdV); } Also, if it finds the override, it doesn't generate the DEF_HELPER or the helper function. That means we don't include tcg_gen.h in helper.h as you suggested. Thanks, Taylor
On 9/24/20 10:18 AM, Taylor Simpson wrote: >> You've got a script generating stuff. It can just as easily generate two >> different lists. You're trying to do too much with the C preprocessor and too >> little with python. > > Sure, generating two lists was also suggested by Alessandro (ale@rev.ng). Although doing more in python and less with the C preprocessor would lead to the conclusion we shouldn't define the function prototype in a macro. If we generate two lists, what is the advantage of putting the function signature in a macro vs generating? None, because... >> At some point in the v3 thread, I had suggested grepping for some macro in >> order to indicate to the python script which tags are implemented manually. >> My >> definition above is easy to look for: exactly one thing on the line, easy >> regexp. > > This is already done as well. As you may recall, we were previously generating > #ifdef fGEN_TCG_A2_add > fGEN_TCG_A2_add({ RdV=RsV+RtV;}); > #else > do { > gen_helper_A2_add(RdV, cpu_env, RsV, RtV); > } while (0); > > The generator now searches for "#define fGEN_TCG_<tag>" ... ... I'd forgotten that they were two different macros. > Also, if it finds the override, it doesn't generate the DEF_HELPER or the helper function. That means we don't include tcg_gen.h in helper.h as you suggested. Excellent. r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Friday, August 28, 2020 7:17 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros > referenced in instruction semantics > > > Ah, so I see what you're trying to do with the merge thing, which had the > host-endian problems. > > I think the merge stuff is a mistake. I think you can get the semantics that > you want with > > probe_read(ld_addr, ld_len) > qemu_st(st_value, st_addr) > qemu_ld(ld_value, ld_addr) > > In this way, all exceptions are recognized before the store is complete, the > normal memory operations handle any possible overlap. Following up on this... 1) We don't need to do the probe_read because the load has already happened at this point. 2) I'm not familiar with qemu_st/qemu_ld. Are these shorthand for tcg_gen_qemu_st*/tcg_gen_qemu_ld*? We can't actually do the store at this point because it would alter the memory before we are sure the packet will commit (i.e., there might be still be an exception raised by another instruction in the packet). 3) How important is it to support big endian hosts? Would it be OK to put something like this to declare it isn't supported for Hexagon? #if defined(HOST_WORDS_BIGENDIAN) #error Hexagon guest not supported on big endian host #endif 4) If the above is not OK, are the macros in bswap.h the correct ones to use? Is this pseudo-code correct? store_val = le32_to_cpu(store_val); load_val = le32_to_cpu(load_val); <merge bytes> /* store_val is dead so no need to convert back */ load_val = cpu_to_le32(load_val) Thanks, Taylor
On 10/8/20 10:00 AM, Taylor Simpson wrote: > > >> -----Original Message----- >> From: Richard Henderson <richard.henderson@linaro.org> >> Sent: Friday, August 28, 2020 7:17 PM >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; >> aleksandar.m.mail@gmail.com; ale@rev.ng >> Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros >> referenced in instruction semantics >> >> >> Ah, so I see what you're trying to do with the merge thing, which had the >> host-endian problems. >> >> I think the merge stuff is a mistake. I think you can get the semantics that >> you want with >> >> probe_read(ld_addr, ld_len) >> qemu_st(st_value, st_addr) >> qemu_ld(ld_value, ld_addr) >> >> In this way, all exceptions are recognized before the store is complete, the >> normal memory operations handle any possible overlap. > > Following up on this... > > 1) We don't need to do the probe_read because the load has already happened at this point. What do you mean "has already happened"? How can it have done without doing the merging by hand. Which this operation ordering is intended to make unnecessary. I think you're missing the point. > 2) I'm not familiar with qemu_st/qemu_ld. Are these shorthand for tcg_gen_qemu_st*/tcg_gen_qemu_ld*? Yes. > We can't actually do the store at this point because it would alter the memory before we are sure the packet will commit (i.e., there might be still be an exception raised by another instruction in the packet). What other instruction? Give me a concrete example so that I can give you a concrete answer. I think you may need to do more preprocessing of the entire packet so that you can answer this sort of question (is there any other possible exception) when generating code. > 3) How important is it to support big endian hosts? Would it be OK to put something like this to declare it isn't supported for Hexagon? > #if defined(HOST_WORDS_BIGENDIAN) > #error Hexagon guest not supported on big endian host > #endif That would make ./configure && make fail on such a host. So, no, you can't do that. > > 4) If the above is not OK, are the macros in bswap.h the correct ones to use? Is this pseudo-code correct? > store_val = le32_to_cpu(store_val); > load_val = le32_to_cpu(load_val); > <merge bytes> > /* store_val is dead so no need to convert back */ > load_val = cpu_to_le32(load_val) There's some misuse of cpu_to_le32 vs le32_to_cpu there (I've never liked those names, but it helps to think about what form the data is already in). r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Thursday, October 8, 2020 11:31 AM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros > referenced in instruction semantics > > >> Ah, so I see what you're trying to do with the merge thing, which had the > >> host-endian problems. > >> > >> I think the merge stuff is a mistake. I think you can get the semantics that > >> you want with > >> > >> probe_read(ld_addr, ld_len) > >> qemu_st(st_value, st_addr) > >> qemu_ld(ld_value, ld_addr) > >> > >> In this way, all exceptions are recognized before the store is complete, > the > >> normal memory operations handle any possible overlap. > > > > Following up on this... > > > > 1) We don't need to do the probe_read because the load has already > happened at this point. > > > What do you mean "has already happened"? > How can it have done without doing the merging by hand. Which this operation ordering is intended to make unnecessary. > > I think you're missing the point. Sorry I wasn't clear. We have done the load from memory as it was at the beginning of the packet. The result of the store is in mem_log_stores in CPUHexagonState. This code updates the bytes that were loaded with any overlapping bytes from the store that hasn't been committed yet. > > > > 2) I'm not familiar with qemu_st/qemu_ld. Are these shorthand for > tcg_gen_qemu_st*/tcg_gen_qemu_ld*? > > Yes. > > > We can't actually do the store at this point because it would alter the > memory before we are sure the packet will commit (i.e., there might be still > be an exception raised by another instruction in the packet). > > What other instruction? Give me a concrete example so that I can give you a > concrete answer. Remember, there can be 4 instructions in a packet. This code is only dealing with two of them (a load and a store). Here's an example where a different instruction in the packet can fault. 67f8: c0 40 21 1f 1f2140c0 { v0.uh = vsat(v0.uw,v1.uw) 67fc: 00 45 02 a1 a1024500 memb(r2+#0) = r5 6800: 02 c0 04 91 9104c002 r2 = memb(r4+#0) } The vsat instruction requires a vector context. If the thread doesn't have a vector context, an exception will be raised. The OS can provide a context and replay the packet. > I think you may need to do more preprocessing of the entire packet so that > you > can answer this sort of question (is there any other possible exception) when > generating code. > > > 3) How important is it to support big endian hosts? Would it be OK to put > something like this to declare it isn't supported for Hexagon? > > #if defined(HOST_WORDS_BIGENDIAN) > > #error Hexagon guest not supported on big endian host > > #endif > > That would make ./configure && make fail on such a host. > So, no, you can't do that. > > > > > 4) If the above is not OK, are the macros in bswap.h the correct ones to > use? Is this pseudo-code correct? > > store_val = le32_to_cpu(store_val); > > load_val = le32_to_cpu(load_val); > > <merge bytes> > > /* store_val is dead so no need to convert back */ > > load_val = cpu_to_le32(load_val) > > There's some misuse of cpu_to_le32 vs le32_to_cpu there (I've never liked > those > names, but it helps to think about what form the data is already in). So, what is the right sequence? Thanks, Taylor
On 10/8/20 1:51 PM, Taylor Simpson wrote: >> How can it have done without doing the merging by hand. Which this operation ordering is intended to make unnecessary. >> >> I think you're missing the point. > > Sorry I wasn't clear. We have done the load from memory as it was at the beginning of the packet. The result of the store is in mem_log_stores in CPUHexagonState. This code updates the bytes that were loaded with any overlapping bytes from the store that hasn't been committed yet. Right, so you *are* missing the point. The point is to *not* do the load earlier, but only probe the memory for readability so that any exception is recognized before the store commits. Then, after the store, actually perform the load. Thus any overlapping bytes get the values that they should. Voila, no by-hand merging. > 67f8: c0 40 21 1f 1f2140c0 { v0.uh = vsat(v0.uw,v1.uw) > 67fc: 00 45 02 a1 a1024500 memb(r2+#0) = r5 > 6800: 02 c0 04 91 9104c002 r2 = memb(r4+#0) } > > The vsat instruction requires a vector context. If the thread doesn't have a vector context, an exception will be raised. The OS can provide a context and replay the packet. Sure. Is there any per-packet exception priority that would require a fault from the store to be provided in preference to the fault for the vector context? Anyway, I'm suggesting ordering the operations within the packet to be one that's most convenient for us. >>> store_val = le32_to_cpu(store_val); >>> load_val = le32_to_cpu(load_val); >>> <merge bytes> >>> /* store_val is dead so no need to convert back */ >>> load_val = cpu_to_le32(load_val) >> >> There's some misuse of cpu_to_le32 vs le32_to_cpu there (I've never liked >> those >> names, but it helps to think about what form the data is already in). > > So, what is the right sequence? Well, <merge_bytes> wants to operate on a le ordering, so the final load_val assignment should use le32_to_cpu. Think about this in terms of units, like Fahrenheit vs Celsius. As for the other two, it depends on where the values come from. Probably they should be cpu_to_le32, but I can't tell without extra context. r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Thursday, October 8, 2020 2:02 PM > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org > Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi; > aleksandar.m.mail@gmail.com; ale@rev.ng > Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros > referenced in instruction semantics > > On 10/8/20 1:51 PM, Taylor Simpson wrote: > >> How can it have done without doing the merging by hand. Which this > operation ordering is intended to make unnecessary. > >> > >> I think you're missing the point. > > > > Sorry I wasn't clear. We have done the load from memory as it was at the > beginning of the packet. The result of the store is in mem_log_stores in > CPUHexagonState. This code updates the bytes that were loaded with any > overlapping bytes from the store that hasn't been committed yet. > > Right, so you *are* missing the point. > > The point is to *not* do the load earlier, but only probe the memory for > readability so that any exception is recognized before the store commits. > > Then, after the store, actually perform the load. Thus any overlapping bytes > get the values that they should. That's what I coded originally, but after sleeping on it decided it could lead to problems. See below... > Voila, no by-hand merging. > > > 67f8: c0 40 21 1f 1f2140c0 { v0.uh = vsat(v0.uw,v1.uw) > > 67fc: 00 45 02 a1 a1024500 memb(r2+#0) = r5 > > 6800: 02 c0 04 91 9104c002 r2 = memb(r4+#0) } > > > > The vsat instruction requires a vector context. If the thread doesn't have a > vector context, an exception will be raised. The OS can provide a context and > replay the packet. > > Sure. > > Is there any per-packet exception priority that would require a fault from the > store to be provided in preference to the fault for the vector context? I don't think there's a priority for the exceptions. In this example, any of the 3 instructions could generate an exception. The thing I'm worried about isn't that the store could generate an exception. The problem is letting the store change the machine state before we're sure none of the instructions will raise an exception. Maybe I'm worrying about something that would never result in different behavior - if the packet gets replayed, we'll just store the same value again. I'll reach out to the architects and ask if there is a case where doing the store first could be a problem - and also about the priority of exceptions. > Anyway, I'm suggesting ordering the operations within the packet to be one > that's most convenient for us. This may or may not be possible. We already have to reorder to put .new consumers after the producers, and we have to keep the change-of-flow instructions in the original order. So, I'm reluctant to go down this path worrying that there will be a long tail of corner case bugs to engineer out.
On 10/8/20 3:54 PM, Taylor Simpson wrote: > I'll reach out to the architects and ask if there is a case where doing the store first could be a problem - and also about the priority of exceptions. Of course there are cases where doing the store first will cause problems. > This may or may not be possible. We already have to reorder to put .new consumers after the producers, and we have to keep the change-of-flow instructions in the original order. So, I'm reluctant to go down this path worrying that there will be a long tail of corner case bugs to engineer out. Well, do as you like, I suppose, but I think the merging thing that you were trying to do is just as fraught with peril. r~
> -----Original Message----- > From: Richard Henderson <richard.henderson@linaro.org> > Sent: Friday, October 9, 2020 6:59 AM > > On 10/8/20 3:54 PM, Taylor Simpson wrote: > > I'll reach out to the architects and ask if there is a case where doing the > store first could be a problem - and also about the priority of exceptions. > > Of course there are cases where doing the store first will cause problems. > > > This may or may not be possible. We already have to reorder to put .new > consumers after the producers, and we have to keep the change-of-flow > instructions in the original order. So, I'm reluctant to go down this path > worrying that there will be a long tail of corner case bugs to engineer out. > > Well, do as you like, I suppose, but I think the merging thing that you were > trying to do is just as fraught with peril. Thanks for your patience on this. After speaking to the architects and putting more comments in the decoder (per your feedback), I'm convinced that doing the store first won't lead to problems. So, I'll move forward with your suggestion. In other news, this change and the switch to qemu softfloat are the only blockers from your review. Once these are done, I'll submit v5 of the series. Thanks for your feedback, it's made our code significantly better. Thanks Taylor