[2/4] BRIG (HSAIL) frontend: The FE itself.

Hi Martin,

Thanks for the comments and suggestions. Replies inline:

On Thu, Oct 20, 2016 at 6:10 PM, Martin Jambor <mjambor@suse.cz> wrote:
> - Still quite few things need to be documented better, e.g.:

>   + brig_to_generic::get_mangled_name_tmpl and to a lesser extent

>     brig_to_generic::get_mangled_name.  It should be clear what is the

>     intended difference in usage of the two (specially since the

>     former is a template and so the parameter does not give that much

>     of a hint to the reader)

Added more comments.

>   + the visitor classes need some description so that the first time

>     readers see them, they understand what they are for and what they

>     visit (i.e. what "visiting" even means).

This is an adaptation of the classic gang of four Visitor design pattern.
I added a reference to it in a comment.

> - I know it was me who told you to use gcc_assert and gcc_unreachable

>   instead of internal_error.  The thing is, often the error is clearly

>   not an internal error but an error in input.  I think that we should

>   plan to handle these cases differently and report the issues better

>   to the user, give a meaningful error message together width the

>   section and the offset there when it was encountered.  I am not

>   asking you to audit all asserts now and convert those in this

>   category but it would be nice to have a mechanism to easily do so

>   (and convert a few obvious places), so that we can convert these as

>   we bump into them.

I'm not sure about this. BRIG FE is a rather special case as we
assume HSAILasm has been used to parse and error check the original
HSAIL text to the binary BRIG format which it consumes.

Of course HSAILasm can have bugs, but how much we should produce human
readable error messages to help debugging HSAILasm is another thing.
In case the BRIG FE
fails to consume the input, it means either the BRIG is corrupted for
a reason or another,
but typically is not a human error (those should be caught be HSAILasm).

"File format not recognized" error is one that might be useful though.
I added a check for the BRIG magic number and the supported version (1.0).

Perhaps we should add error printouts later on case by case basis when we
see which error cases can be useful and worth reporting in a human readable
graceful manner? It can be as easy as converting the internal_error
to fatal_error or similar in that case.

> - A very minor suggestion: In GCC it is customary to write TODO as one

>   word.  We generally do not use "TO OPTIMIZE", that is just a TODO

>   (as opposed to a FIXME, which hints that something is at least a bit

>   wrong here).  I think you can keep your way if you want but for

>   example I do have emacs highlighting set up for the traditional

>   formats.

Converted all to TODO.

> - You do not seem to handle BRIG_OPCODE_ICALL, or have I missed it?

>   That is fine, I don't think anybody else does that now anyway, I

>   just got curious reading the ipa analysis.

Right.

> - brig-lang.c:

>   + x_flag_whole_program = 0; - talk about this with Honza

I guess this is note to yourself? As we agreed I didn't try whole program
optimizations yet.  I might later when optimizing for a target. It will
be really useful, I agree. Now that it used the proper builtin way, it
might work more easily.

>   + brig_langhook_type_for_size: has several issues.  Either always

>     return a type like go or return error_mark_node instead of NULL.

>     Also, do not count on long being 64-bit.  I would just copy what

>     go does.  Or lto.

Copied the go's version, it should work with BRIG too.

>   + brig_langhook_type_for_mode: Also please do not depend on knowing

>     that long is 8 bytes, or int being 4 bytes long.  For complex

>     modes, the correct thing seems to be to return NULL instead of

>     void_type_node.  In any case, it would be better to return

>     error_mark_node rather than void_type_node.

Fixed.

>   + convert: did you avoid using convert_to_vector deliberately?  The

>     size check seems genuinely useful.

BRIG/HSAIL is a bit special case due to its "untyped" variables (registers),
I use bit level casts in a lot of places to avoid accidental type conversions
or sign extensions.

> - brigfrontend/brig-to-generic.cc:

>   + brig_to_generic::parse: You seem to be handling LDA instruction

>     (or more generally, BRIG_KIND_INST_ADDR instructions, but there is

>     just the one) with copy_move_inst_handler, which seems just wrong.

What do you mean by 'wrong'? The handlers do not map to instruction types
directly, but often more towards the specification chapters.
It comes straight from:
http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/18_BRIG/BRIG_syntax_copy_move.htm

>     Its operator() blindly casts the instruction to

>     BrigInstSourceType, interpreting the segment as if it was a source

>     type... am I missing something?

I believe you are confused by the BRIG struct name: BrigInstSourceType
is a struct
for "instructions that have different types for their destination and
source operands"
http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/18_BRIG/BrigInstSourceType.htm

That is, it's not a "source type" object, but an instruction type object.

>   + build_reinterpret_cast: gcc_unreachable followed by return

>     NULL_TREE does not make sense, please convert the whole thing to

>     an assert.  More importantly, I think that you really need to

Converted it to an assert.

>     solve the case of mismatching sizes differently, and not create a

>     (single) V_C_E for it.  For register types, you can V_C_E them to

>     an unsigned int of the same size, then do a widening NOP_EXPR to

>     unsigned type with the same size as the destination and then do a

>     V_C_E to whatever you need.  But perhaps this can be solved better

>     at the caller side somewhere?

I added the extra conversion step through unsigned ints.
I hadn't noticed that V_C_E is not guaranteed to work for this case.

>   + brig_to_generic::finish_function: the ifdef'ed debug_functions

>     should not be a part of the final submission.  Checking

Removed.

>     m_cf->m_is_kernel twice looks ugly.

Removed the extra check.

>   + brig_to_generic::append_group_variable: Don't put a statement

>     guarded with an if statement on the same line as the condition.  I

>     also believe align_padding, when the offset is not already

>     aligned, should be:

>        alignment - m_next_group_offset % alignment

>   + brig_to_generic::append_private_variable: The same as above.

Good catch!

>   + I would suggest that you change brig_to_generic::dump_function to

>     a standalone function taking a file parameter.  It is then much

>     easier to call it for example from debugger.

Done.

> - brig-code-entry-handler.cc

>   + brig_code_entry_handler::build_tree_operand: The unreachable

>     return at the end looks misleading, just removing it should be

>     fine.  Ditto for the breaks after returns, the upcoming

>     fall-through warning will notify us if we ever get a potentially

>     wrong fall-through.

Done.

>   + brig_code_entry_handler::get_tree_cst_for_hsa_operand: GCC coding

>     standard mandates that if a branch has only one statement in it,

>     it should not be encapsulated in braces.

Done.

>     If, in the condition

>

>       if (type != NULL && TREE_CODE (type) == ARRAY_TYPE)

>

>     type can ever actually be NULL, then the function will segfault

>     just before ending, when again checking whether type is an array.

>     If it cannot be NULL, then please gcc_checking_assert it instead.

It cannot be NULL anymore at that point. Removed the misleading check.

>   + brig_code_entry_handler::build_address_operand: I must say I

>     really dislike how the end of the function is structured, it is

>     terrible difficult to read given that it is not doing anything that

>     complicated and I think it does not handle correctly an (LDA of a)

>     NULL address, which unfortunately I believe is valid for private

>     and group segment addresses.  Dealing with the most complex case

>     by converting symbol to size_type looks exactly backwards,

>     especially given that you have converted the base of the

>     POINTER_PLUS_EXPR only few lines before.  I think the code would

>     be a lot nicer and easier to comprehend if you clearly

>     distinguished the various cases (symbol_base != NULL (and sub

>     cases when ptr_base is or is not NULL), ptrbase != NULL and simple

>     constant, even NULL constant, which you do not handle but fail an

>     assert, I think) and handled them separately, including all type

>     conversions.  ptr_base is an unfortunate name, IMHO, in many cases

>     it has the role of a variable offset rather than a base.

>     Similarly, ptr_ofset is really a constant_offset.

Renamed ptr_base to var_offset and ptr_offset to const_offset which
indeed are more descriptive.

I also cleaned the if..else mess at the end of the function and made the
different cases very explicit, instead of the spaghetti mess previous
version. It looks much better now to me at least.

>   + brig_code_entry_handler::get_tree_type_for_hsa_type: I believe it

>     is better to and the type with BRIG_TYPE_PACK_MASK if you want to

>     determine whether you are looking at a vector (packed) HSAIL

Done.

>     instruction.  I also think that putting that test into a separate

>     function and calling it from all places when you do this would be

>     more future-proof (something like hsa_type_packed_p in gcc/hsa.c).

I just used that function instead and called if from everywhere.

>     For the (inner_brig_type == BRIG_TYPE_F16) vector case, you do not

>     end up calling build_type_variant (why is it necessary in any

>     case?) but for other vector types you do.  Is that intentional?

I don't remember why the const stripping code was there.
Maybe a leftover of some earlier hack/workaround in a previous
gcc version. I removed it and it seems to work fine.

>   + brig_code_entry_handler::expand_or_call_builtin: If I am correct

>     that the operands parameter contains only input operands at this

>     point, please state so in the function comment.  Since we only

>     allow ourselves 80 characters wide code, it is customary not to

>     put code into else branches if the if branch returns (or

>     breaks/continues) anyway.

Done.

>     Given that this function builds vectors only to pile them

>     element-wise to arguments of variable-argument-length function

>     call_builtin, which then builds vectors out if its

>     elements... have you considered having an overloaded

>     implementation of call_builtin that would not do this?  It seems

>     particularly wasteful.

Optimized by inlining the essential builtin build code to the call site.

>   + brig_code_entry_handler::build_operands: Please make the return

>     type consistent with the one in class definition (tree_stl_vec

>     instead of explicit std::vector<tree>, assuming you prefer the

>     former).

Done.

>     GCC coding standard mandates that if a branch has only one

>     statement in it, it should not be encapsulated in braces.

Cleaned up.

>     Please replace

>       if (operand == NULL_TREE)

>             gcc_unreachable ();

>     with gcc_assert (operand);

Done.

>     Please rewrite conditions like !(TREE_CODE (operand) == TREE_VEC)

>     as (TREE_CODE (operand) != TREE_VEC)

Did.

>     Again you are creating VIEW_CONVERT_EXPR for an operand that is of

>     a different size than the result type.  For scalar types, this is

>     bound to cause trouble sooner or later and I really think you need

>     to avoid it.

Fixed by delegating the conversion to the (now updated)
build_reinterpret_cast().

> - brig-basic-inst-handler.cc:

>   + scalarized_sat_arithmetics::scalarized_sat_arithmetics: Do not

>     undef macros preemptively.  Instead, undef them right after using

>     them, after the include of a .def file.

This seems to be an idiom with the builtin import mechanism elsewhere
also.  builtins.def defines a default macro which one must undef if not
wanting to do anything with that builtin type in that particular import
location.

>   + brig_basic_inst_handler::must_be_scalarized: I am intrigued by the

>     (elements < 16) part of the condition.  This function would also

>     benefit from a comment.

This function black listed the known cases of MULHI with vectors
that have been tested to break with AMD64/x86-64. It seems the
support for vector MULT_HIGHPART_EXPR is flaky and undertested.

The robust thing to do here is to force scalarization always with MULHI for now,
until these issues are  debugged further. I added an exception for 2x64b
MULT_HIGHPART_EXPR to avoid the need for 128b scalar arithmetics,
and as it seems to work for the CPUs I've tested.

The decision should not IMO belong to the frontend, but there'd be
better a step where the vector operations are optionally scalarized if the
target prefers scalarized operations which should be caught by this
step also. Something to fix during optimization work.

>   + scalarized_sat_arithmetics::builtin - please prefix with m_, ditto

>     for brig_inst_ (why the trailing underscore?)

Done.

>   + brig_basic_inst_handler::must_be_scalarized needs a comment

>     explaining what it is for.

Removed the method as unneeded for now (see the comment above about
MULHI).

>   + brig_basic_inst_handler::get_raw_type: Unless "raw" is some HSA

>     term I have missed, I would strongly suggest that you rename the

>     function to something more immediately obvious, like uint_for_type

>     or something like that.  Also, don't use literal 8 but

Renamed to get_unsigned_int_type().

>     BITS_PER_UNIT.  Also, do we really need both this function and

Done.

>     brig_code_entry_handler::get_raw_tree_type ?

Nope.

>   + build_shuffle, build_unpack, build_pack, build_unpack_lo_or_hi and

>     build_lower_element_broadcast: I admit that so far I have only

>     very briefly skimmed through these functions.  In any way, use

>     BITS_PER_UNIT instead of 8.

Done.

>   + brig_basic_inst_handler::build_instr_expr: Please remove the 'r'

>     and make it build_inst_expr for the sake of consistency.  If I

Done.

>     understand the code correctly, the operands parameter contains

>     only input operands.  In that case, please state so in the

>     function comment and remove the local variable first_input, it has

>     no purpose but to confuse.  Also, please move the definition and

Done.

>     assignments to local variable input_count (and possibly also

>     output_count) down to where it is used.

Done.

>   + brig_basic_inst_handler::operator (): It seems that the opcode

>     local variable is only used to identify the return brig

>     instructions which seems wasteful.  Generally, it would be nice to

It's also used to catch MULHI which is generated from multiple brig
opcodes.

>     clean this function up a little by moving assignments to some of

>     the very many local variables down, as close to their first use as

>     reasonable.  Surprisingly often, you'd remove the need to compute

>     them in many cases at all, e.g. look at element_count and

>     element_size_bits.

Moved element_count and is_fp16_operation. element_size_bits is now
used for catching mulhis for 64b elements.

>     Extra points for a function comment explaining how work is divided

>     in between operator() itself and its main helpers such as

>     build_instr_expr.

I added a method comment, but the truth is that the division of work
is a bit artificial, mostly the build_inst_expr() call is there to split
a complex if..else structure to two functions to improve readability.

>   + brig_basic_inst_handler::get_tree_code_for_hsa_opcode: The comment

>     says the special value returned when it is necessary to use a

>     chain of tree expressions or an builtin is NULL_TREE, but the

>     function itself returns TREE_LIST or CALL_EXPR.

Corrected.

> - brig-cmp-inst-handler.cc:

>   + brig_cmp_inst_handler::operator (): the neg_expr seems to be

>     something left from earlier times?  Use BITS_PER_UNIT instead of

Correct. Removed.

>     8, having both result_width and element_width seems unnecessary

Removed element_width and moved result_width definition
closer to its use.

>     (and speaking of elements, is that actually even a vector case?),

>     and should be initialized only in the case when it is used.  In

>     case of vector results, please build either all_ones or all_zeros,

>     it is wasteful to allocate both.

They both are used to produce the HSA required all_ones/all_zeros
output.

> - brig-mem-inst-handler.cc: I believe that using the alignment

>   modifier is something that we should try to get done as soon as

>   possible.

I agree. Probably one of the first things that will pop up during optimizing
the performance.

> - brig-inst-mod-handler.cc: This seems like something that we should

>   at least warn about (in case when effectively an unsupported

>   operation is requested).

If there will be an upgrade for the frontend to support the 'full' profile (it's
only supporting 'base' now) with all the rounding modifiers, a better way might
be found than injecting fesetround() calls around all float
expressions.  Probably
in that case all float ops must be converted to builtin calls that
ensure the wanted
rounding (ungh!).

> - brig-seg-inst-handler.cc: At this point I'm trying to read quickly

>   but it seems to me you do not support conversion between flat and

>   global segment... how come?

Global address is already a flat address. Check the description tab in
http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/05_Arithmetic/segment_conversion.htm?Highlight=global

> - brig-copy-move-inst-handler.cc:

>   + brig_copy_move_inst_handler::operator (): The function definitely

>     should not cast LDA instructions to BrigInstSourceType*!

Yes it should. Like I explained above, the struct name is misleading,
but it's actually an instruction calls not a source type.

> - brig-branch-inst-handler.cc: I believe that as long as the builtins

>   representing barriers are not pure, they will not be hoisted out of

>   a loop.  Nonduplication might indeed be a problem, although short of

>   whole function cloning, I could not think of a transformation gcc

>   performs that might pause a problem.  Nevertheless, we probably

>   should introduce an attribute for it and look for it in

>   gimple_can_duplicate_bb_p (and in cfg_layout_can_duplicate_bb_p?).

>   An important issue, but hopefully for later.

Agreed.

> - brig-variable-handler.cc:

>   + brig_directive_variable_handler::operator (): Please use

>     BITS_PER_UNIT instead of 8.

Done.

>   + build_variable: Likewise.  I am a bit concerned that unlike in

>     operator(), you do not make the alignment at least as big as

>     natural one, which means that in theory (and probably only on

>     malformed BRIG, I suppose), the two functions might disagree about

>     alignment?  I think it would be nice to outline the extraction of

>     alignment to an independent function and use that from both

>     places.

Done.

> - brig-function-handler.cc:

>   + brig_directive_function_handler::operator: Please use gcc_assert

>     instead of assert.  (Well, in this case it is clearly input error

>     which, eventually, we will want to give nice errors about.  But at

>     least do not use assert.)

Converted to gcc_assert() for now.

> - brig-function.cc:

>   + brig_function::analyze_calls: The first if condition should be

>     terminated by a newline

Done.

>   + brig_function::add_wi_loop:  Is the second TODO now obsolete?

Yes, removed.

>   + brig_function::build_launcher_and_metadata: The ASM directive is

>     really an ugly hack.  It is isolated so I am not that much

>     concerned, but building a structure and filling it with data (like

>     we do for example in hsa_output_libgomp_mapping) seems cleaner.

Hmm. I don't generate any metadata structs to data section, but a separate
custom ELF section per kernel. I agree the ASM directive is not ideal
in general,
but I don't think there's a generic way to add custom ELF sections.

I'm not sure how much building the structure field by field would be a better
approach in comparison to a raw dump as the point is to just transfer the
metadata to the HSA runtime with the finalized binary. The runtime should use
exactly the same struct layout, otherwise it won't work anyways. If I
do a raw dump,
at least I ensure that if the struct is updated I won't forget to update the
serialization code.

> - brig-util.cc:

>   + gccbrig_is_raw_operation: I think that calling the operations

>     "bit" operations instead of "raw" would make life of readers of

>     the code slightly but noticeably easier.

Renamed.

>   + gccbrig_hsa_type_bit_size: If possible, please make the default

>     case be gcc_unreachable().  (If zero is expected in some cases,

>     then all callers should check for it, so that we for example do

>     not divide by zero in

>     brig_code_entry_handler::get_tree_type_for_hsa_type.)

Made it call gcc_unreachable ().

>   + might_be_host_defined_var: there is no need for the returned

>     expression to start on a new line.

Fixed.

> - gcc/builtins.def: In the added DEF_HSAIL_*_BUILTIN macros, please

>   arrange it so that they only pass true in the last argument of

>   DEF_BUILTIN when gcc is configured with BRIG FE.  Builtins are not

>   free and should not be added needlessly.

It doesn't even include the hsail-builtins.def in case BRIG FE not
enabled now. I suppose that's even better than executing those macros
for nothing.

> Overall, the code has improved significantly.  As far as I am

> concerned, the only real issue I see are the VIEW_CONVERT_EXPRs with

> mismatched operands.  They are asking for trouble, only Ada produces

> those (although it is acknowledged it should not) and Ada only does it

> for aggregates.

These should be now fixed.

> If I understood you correctly, both you and your sponsor have already

> signed the Copyright assignment, right?  If that is so, I'll ask the

> steering committee to approve the intention and then ask a global

> reviewer to also peek at it.

Correct, and I see you already did. Thanks!

> Thanks for your patience,

Thank a lot for the comments. I know how much patience it requires to wade
through a big bunch of someone else's boring code.

New BRIG FE patch set attached.

BR,
Pekka

[2/4] BRIG (HSAIL) frontend: The FE itself.

Commit Message

Comments

Patch