diff mbox

[2/4] BRIG (HSAIL) frontend: The FE itself.

Message ID CAJk11WCr6iv_jF-KAhZOhqUVbapmJPJcoRNzkeGS40AFmOORMQ@mail.gmail.com
State New
Headers show

Commit Message

Pekka Jääskeläinen Oct. 29, 2016, 11:58 a.m. UTC
Hi Martin,

Thanks for the comments and suggestions. Replies inline:

On Thu, Oct 20, 2016 at 6:10 PM, Martin Jambor <mjambor@suse.cz> wrote:
> - Still quite few things need to be documented better, e.g.:

>   + brig_to_generic::get_mangled_name_tmpl and to a lesser extent

>     brig_to_generic::get_mangled_name.  It should be clear what is the

>     intended difference in usage of the two (specially since the

>     former is a template and so the parameter does not give that much

>     of a hint to the reader)


Added more comments.

>   + the visitor classes need some description so that the first time

>     readers see them, they understand what they are for and what they

>     visit (i.e. what "visiting" even means).


This is an adaptation of the classic gang of four Visitor design pattern.
I added a reference to it in a comment.

> - I know it was me who told you to use gcc_assert and gcc_unreachable

>   instead of internal_error.  The thing is, often the error is clearly

>   not an internal error but an error in input.  I think that we should

>   plan to handle these cases differently and report the issues better

>   to the user, give a meaningful error message together width the

>   section and the offset there when it was encountered.  I am not

>   asking you to audit all asserts now and convert those in this

>   category but it would be nice to have a mechanism to easily do so

>   (and convert a few obvious places), so that we can convert these as

>   we bump into them.


I'm not sure about this. BRIG FE is a rather special case as we
assume HSAILasm has been used to parse and error check the original
HSAIL text to the binary BRIG format which it consumes.

Of course HSAILasm can have bugs, but how much we should produce human
readable error messages to help debugging HSAILasm is another thing.
In case the BRIG FE
fails to consume the input, it means either the BRIG is corrupted for
a reason or another,
but typically is not a human error (those should be caught be HSAILasm).

"File format not recognized" error is one that might be useful though.
I added a check for the BRIG magic number and the supported version (1.0).

Perhaps we should add error printouts later on case by case basis when we
see which error cases can be useful and worth reporting in a human readable
graceful manner? It can be as easy as converting the internal_error
to fatal_error or similar in that case.

> - A very minor suggestion: In GCC it is customary to write TODO as one

>   word.  We generally do not use "TO OPTIMIZE", that is just a TODO

>   (as opposed to a FIXME, which hints that something is at least a bit

>   wrong here).  I think you can keep your way if you want but for

>   example I do have emacs highlighting set up for the traditional

>   formats.


Converted all to TODO.

> - You do not seem to handle BRIG_OPCODE_ICALL, or have I missed it?

>   That is fine, I don't think anybody else does that now anyway, I

>   just got curious reading the ipa analysis.


Right.

> - brig-lang.c:

>   + x_flag_whole_program = 0; - talk about this with Honza


I guess this is note to yourself? As we agreed I didn't try whole program
optimizations yet.  I might later when optimizing for a target. It will
be really useful, I agree. Now that it used the proper builtin way, it
might work more easily.

>   + brig_langhook_type_for_size: has several issues.  Either always

>     return a type like go or return error_mark_node instead of NULL.

>     Also, do not count on long being 64-bit.  I would just copy what

>     go does.  Or lto.


Copied the go's version, it should work with BRIG too.

>   + brig_langhook_type_for_mode: Also please do not depend on knowing

>     that long is 8 bytes, or int being 4 bytes long.  For complex

>     modes, the correct thing seems to be to return NULL instead of

>     void_type_node.  In any case, it would be better to return

>     error_mark_node rather than void_type_node.


Fixed.

>   + convert: did you avoid using convert_to_vector deliberately?  The

>     size check seems genuinely useful.


BRIG/HSAIL is a bit special case due to its "untyped" variables (registers),
I use bit level casts in a lot of places to avoid accidental type conversions
or sign extensions.

> - brigfrontend/brig-to-generic.cc:

>   + brig_to_generic::parse: You seem to be handling LDA instruction

>     (or more generally, BRIG_KIND_INST_ADDR instructions, but there is

>     just the one) with copy_move_inst_handler, which seems just wrong.


What do you mean by 'wrong'? The handlers do not map to instruction types
directly, but often more towards the specification chapters.
It comes straight from:
http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/18_BRIG/BRIG_syntax_copy_move.htm

>     Its operator() blindly casts the instruction to

>     BrigInstSourceType, interpreting the segment as if it was a source

>     type... am I missing something?


I believe you are confused by the BRIG struct name: BrigInstSourceType
is a struct
for "instructions that have different types for their destination and
source operands"
http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/18_BRIG/BrigInstSourceType.htm

That is, it's not a "source type" object, but an instruction type object.

>   + build_reinterpret_cast: gcc_unreachable followed by return

>     NULL_TREE does not make sense, please convert the whole thing to

>     an assert.  More importantly, I think that you really need to


Converted it to an assert.

>     solve the case of mismatching sizes differently, and not create a

>     (single) V_C_E for it.  For register types, you can V_C_E them to

>     an unsigned int of the same size, then do a widening NOP_EXPR to

>     unsigned type with the same size as the destination and then do a

>     V_C_E to whatever you need.  But perhaps this can be solved better

>     at the caller side somewhere?


I added the extra conversion step through unsigned ints.
I hadn't noticed that V_C_E is not guaranteed to work for this case.

>   + brig_to_generic::finish_function: the ifdef'ed debug_functions

>     should not be a part of the final submission.  Checking


Removed.

>     m_cf->m_is_kernel twice looks ugly.


Removed the extra check.

>   + brig_to_generic::append_group_variable: Don't put a statement

>     guarded with an if statement on the same line as the condition.  I

>     also believe align_padding, when the offset is not already

>     aligned, should be:

>        alignment - m_next_group_offset % alignment

>   + brig_to_generic::append_private_variable: The same as above.


Good catch!

>   + I would suggest that you change brig_to_generic::dump_function to

>     a standalone function taking a file parameter.  It is then much

>     easier to call it for example from debugger.


Done.

> - brig-code-entry-handler.cc

>   + brig_code_entry_handler::build_tree_operand: The unreachable

>     return at the end looks misleading, just removing it should be

>     fine.  Ditto for the breaks after returns, the upcoming

>     fall-through warning will notify us if we ever get a potentially

>     wrong fall-through.


Done.

>   + brig_code_entry_handler::get_tree_cst_for_hsa_operand: GCC coding

>     standard mandates that if a branch has only one statement in it,

>     it should not be encapsulated in braces.


Done.

>     If, in the condition

>

>       if (type != NULL && TREE_CODE (type) == ARRAY_TYPE)

>

>     type can ever actually be NULL, then the function will segfault

>     just before ending, when again checking whether type is an array.

>     If it cannot be NULL, then please gcc_checking_assert it instead.


It cannot be NULL anymore at that point. Removed the misleading check.

>   + brig_code_entry_handler::build_address_operand: I must say I

>     really dislike how the end of the function is structured, it is

>     terrible difficult to read given that it is not doing anything that

>     complicated and I think it does not handle correctly an (LDA of a)

>     NULL address, which unfortunately I believe is valid for private

>     and group segment addresses.  Dealing with the most complex case

>     by converting symbol to size_type looks exactly backwards,

>     especially given that you have converted the base of the

>     POINTER_PLUS_EXPR only few lines before.  I think the code would

>     be a lot nicer and easier to comprehend if you clearly

>     distinguished the various cases (symbol_base != NULL (and sub

>     cases when ptr_base is or is not NULL), ptrbase != NULL and simple

>     constant, even NULL constant, which you do not handle but fail an

>     assert, I think) and handled them separately, including all type

>     conversions.  ptr_base is an unfortunate name, IMHO, in many cases

>     it has the role of a variable offset rather than a base.

>     Similarly, ptr_ofset is really a constant_offset.


Renamed ptr_base to var_offset and ptr_offset to const_offset which
indeed are more descriptive.

I also cleaned the if..else mess at the end of the function and made the
different cases very explicit, instead of the spaghetti mess previous
version. It looks much better now to me at least.

>   + brig_code_entry_handler::get_tree_type_for_hsa_type: I believe it

>     is better to and the type with BRIG_TYPE_PACK_MASK if you want to

>     determine whether you are looking at a vector (packed) HSAIL


Done.

>     instruction.  I also think that putting that test into a separate

>     function and calling it from all places when you do this would be

>     more future-proof (something like hsa_type_packed_p in gcc/hsa.c).


I just used that function instead and called if from everywhere.

>     For the (inner_brig_type == BRIG_TYPE_F16) vector case, you do not

>     end up calling build_type_variant (why is it necessary in any

>     case?) but for other vector types you do.  Is that intentional?


I don't remember why the const stripping code was there.
Maybe a leftover of some earlier hack/workaround in a previous
gcc version. I removed it and it seems to work fine.

>   + brig_code_entry_handler::expand_or_call_builtin: If I am correct

>     that the operands parameter contains only input operands at this

>     point, please state so in the function comment.  Since we only

>     allow ourselves 80 characters wide code, it is customary not to

>     put code into else branches if the if branch returns (or

>     breaks/continues) anyway.


Done.

>     Given that this function builds vectors only to pile them

>     element-wise to arguments of variable-argument-length function

>     call_builtin, which then builds vectors out if its

>     elements... have you considered having an overloaded

>     implementation of call_builtin that would not do this?  It seems

>     particularly wasteful.


Optimized by inlining the essential builtin build code to the call site.

>   + brig_code_entry_handler::build_operands: Please make the return

>     type consistent with the one in class definition (tree_stl_vec

>     instead of explicit std::vector<tree>, assuming you prefer the

>     former).


Done.

>     GCC coding standard mandates that if a branch has only one

>     statement in it, it should not be encapsulated in braces.


Cleaned up.

>     Please replace

>       if (operand == NULL_TREE)

>             gcc_unreachable ();

>     with gcc_assert (operand);


Done.

>     Please rewrite conditions like !(TREE_CODE (operand) == TREE_VEC)

>     as (TREE_CODE (operand) != TREE_VEC)


Did.

>     Again you are creating VIEW_CONVERT_EXPR for an operand that is of

>     a different size than the result type.  For scalar types, this is

>     bound to cause trouble sooner or later and I really think you need

>     to avoid it.


Fixed by delegating the conversion to the (now updated)
build_reinterpret_cast().

> - brig-basic-inst-handler.cc:

>   + scalarized_sat_arithmetics::scalarized_sat_arithmetics: Do not

>     undef macros preemptively.  Instead, undef them right after using

>     them, after the include of a .def file.


This seems to be an idiom with the builtin import mechanism elsewhere
also.  builtins.def defines a default macro which one must undef if not
wanting to do anything with that builtin type in that particular import
location.

>   + brig_basic_inst_handler::must_be_scalarized: I am intrigued by the

>     (elements < 16) part of the condition.  This function would also

>     benefit from a comment.


This function black listed the known cases of MULHI with vectors
that have been tested to break with AMD64/x86-64. It seems the
support for vector MULT_HIGHPART_EXPR is flaky and undertested.

The robust thing to do here is to force scalarization always with MULHI for now,
until these issues are  debugged further. I added an exception for 2x64b
MULT_HIGHPART_EXPR to avoid the need for 128b scalar arithmetics,
and as it seems to work for the CPUs I've tested.

The decision should not IMO belong to the frontend, but there'd be
better a step where the vector operations are optionally scalarized if the
target prefers scalarized operations which should be caught by this
step also. Something to fix during optimization work.

>   + scalarized_sat_arithmetics::builtin - please prefix with m_, ditto

>     for brig_inst_ (why the trailing underscore?)


Done.


>   + brig_basic_inst_handler::must_be_scalarized needs a comment

>     explaining what it is for.


Removed the method as unneeded for now (see the comment above about
MULHI).

>   + brig_basic_inst_handler::get_raw_type: Unless "raw" is some HSA

>     term I have missed, I would strongly suggest that you rename the

>     function to something more immediately obvious, like uint_for_type

>     or something like that.  Also, don't use literal 8 but


Renamed to get_unsigned_int_type().

>     BITS_PER_UNIT.  Also, do we really need both this function and


Done.

>     brig_code_entry_handler::get_raw_tree_type ?


Nope.

>   + build_shuffle, build_unpack, build_pack, build_unpack_lo_or_hi and

>     build_lower_element_broadcast: I admit that so far I have only

>     very briefly skimmed through these functions.  In any way, use

>     BITS_PER_UNIT instead of 8.


Done.

>   + brig_basic_inst_handler::build_instr_expr: Please remove the 'r'

>     and make it build_inst_expr for the sake of consistency.  If I


Done.

>     understand the code correctly, the operands parameter contains

>     only input operands.  In that case, please state so in the

>     function comment and remove the local variable first_input, it has

>     no purpose but to confuse.  Also, please move the definition and


Done.

>     assignments to local variable input_count (and possibly also

>     output_count) down to where it is used.


Done.

>   + brig_basic_inst_handler::operator (): It seems that the opcode

>     local variable is only used to identify the return brig

>     instructions which seems wasteful.  Generally, it would be nice to


It's also used to catch MULHI which is generated from multiple brig
opcodes.

>     clean this function up a little by moving assignments to some of

>     the very many local variables down, as close to their first use as

>     reasonable.  Surprisingly often, you'd remove the need to compute

>     them in many cases at all, e.g. look at element_count and

>     element_size_bits.


Moved element_count and is_fp16_operation. element_size_bits is now
used for catching mulhis for 64b elements.

>     Extra points for a function comment explaining how work is divided

>     in between operator() itself and its main helpers such as

>     build_instr_expr.


I added a method comment, but the truth is that the division of work
is a bit artificial, mostly the build_inst_expr() call is there to split
a complex if..else structure to two functions to improve readability.

>   + brig_basic_inst_handler::get_tree_code_for_hsa_opcode: The comment

>     says the special value returned when it is necessary to use a

>     chain of tree expressions or an builtin is NULL_TREE, but the

>     function itself returns TREE_LIST or CALL_EXPR.


Corrected.

> - brig-cmp-inst-handler.cc:

>   + brig_cmp_inst_handler::operator (): the neg_expr seems to be

>     something left from earlier times?  Use BITS_PER_UNIT instead of


Correct. Removed.

>     8, having both result_width and element_width seems unnecessary


Removed element_width and moved result_width definition
closer to its use.

>     (and speaking of elements, is that actually even a vector case?),

>     and should be initialized only in the case when it is used.  In

>     case of vector results, please build either all_ones or all_zeros,

>     it is wasteful to allocate both.


They both are used to produce the HSA required all_ones/all_zeros
output.

> - brig-mem-inst-handler.cc: I believe that using the alignment

>   modifier is something that we should try to get done as soon as

>   possible.


I agree. Probably one of the first things that will pop up during optimizing
the performance.

> - brig-inst-mod-handler.cc: This seems like something that we should

>   at least warn about (in case when effectively an unsupported

>   operation is requested).


If there will be an upgrade for the frontend to support the 'full' profile (it's
only supporting 'base' now) with all the rounding modifiers, a better way might
be found than injecting fesetround() calls around all float
expressions.  Probably
in that case all float ops must be converted to builtin calls that
ensure the wanted
rounding (ungh!).

> - brig-seg-inst-handler.cc: At this point I'm trying to read quickly

>   but it seems to me you do not support conversion between flat and

>   global segment... how come?


Global address is already a flat address. Check the description tab in
http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/05_Arithmetic/segment_conversion.htm?Highlight=global

> - brig-copy-move-inst-handler.cc:

>   + brig_copy_move_inst_handler::operator (): The function definitely

>     should not cast LDA instructions to BrigInstSourceType*!


Yes it should. Like I explained above, the struct name is misleading,
but it's actually an instruction calls not a source type.

> - brig-branch-inst-handler.cc: I believe that as long as the builtins

>   representing barriers are not pure, they will not be hoisted out of

>   a loop.  Nonduplication might indeed be a problem, although short of

>   whole function cloning, I could not think of a transformation gcc

>   performs that might pause a problem.  Nevertheless, we probably

>   should introduce an attribute for it and look for it in

>   gimple_can_duplicate_bb_p (and in cfg_layout_can_duplicate_bb_p?).

>   An important issue, but hopefully for later.


Agreed.

> - brig-variable-handler.cc:

>   + brig_directive_variable_handler::operator (): Please use

>     BITS_PER_UNIT instead of 8.


Done.

>   + build_variable: Likewise.  I am a bit concerned that unlike in

>     operator(), you do not make the alignment at least as big as

>     natural one, which means that in theory (and probably only on

>     malformed BRIG, I suppose), the two functions might disagree about

>     alignment?  I think it would be nice to outline the extraction of

>     alignment to an independent function and use that from both

>     places.


Done.

> - brig-function-handler.cc:

>   + brig_directive_function_handler::operator: Please use gcc_assert

>     instead of assert.  (Well, in this case it is clearly input error

>     which, eventually, we will want to give nice errors about.  But at

>     least do not use assert.)


Converted to gcc_assert() for now.

> - brig-function.cc:

>   + brig_function::analyze_calls: The first if condition should be

>     terminated by a newline


Done.

>   + brig_function::add_wi_loop:  Is the second TODO now obsolete?


Yes, removed.

>   + brig_function::build_launcher_and_metadata: The ASM directive is

>     really an ugly hack.  It is isolated so I am not that much

>     concerned, but building a structure and filling it with data (like

>     we do for example in hsa_output_libgomp_mapping) seems cleaner.


Hmm. I don't generate any metadata structs to data section, but a separate
custom ELF section per kernel. I agree the ASM directive is not ideal
in general,
but I don't think there's a generic way to add custom ELF sections.

I'm not sure how much building the structure field by field would be a better
approach in comparison to a raw dump as the point is to just transfer the
metadata to the HSA runtime with the finalized binary. The runtime should use
exactly the same struct layout, otherwise it won't work anyways. If I
do a raw dump,
at least I ensure that if the struct is updated I won't forget to update the
serialization code.

> - brig-util.cc:

>   + gccbrig_is_raw_operation: I think that calling the operations

>     "bit" operations instead of "raw" would make life of readers of

>     the code slightly but noticeably easier.


Renamed.

>   + gccbrig_hsa_type_bit_size: If possible, please make the default

>     case be gcc_unreachable().  (If zero is expected in some cases,

>     then all callers should check for it, so that we for example do

>     not divide by zero in

>     brig_code_entry_handler::get_tree_type_for_hsa_type.)


Made it call gcc_unreachable ().

>   + might_be_host_defined_var: there is no need for the returned

>     expression to start on a new line.


Fixed.

> - gcc/builtins.def: In the added DEF_HSAIL_*_BUILTIN macros, please

>   arrange it so that they only pass true in the last argument of

>   DEF_BUILTIN when gcc is configured with BRIG FE.  Builtins are not

>   free and should not be added needlessly.


It doesn't even include the hsail-builtins.def in case BRIG FE not
enabled now. I suppose that's even better than executing those macros
for nothing.

> Overall, the code has improved significantly.  As far as I am

> concerned, the only real issue I see are the VIEW_CONVERT_EXPRs with

> mismatched operands.  They are asking for trouble, only Ada produces

> those (although it is acknowledged it should not) and Ada only does it

> for aggregates.


These should be now fixed.

> If I understood you correctly, both you and your sponsor have already

> signed the Copyright assignment, right?  If that is so, I'll ask the

> steering committee to approve the intention and then ask a global

> reviewer to also peek at it.


Correct, and I see you already did. Thanks!

> Thanks for your patience,


Thank a lot for the comments. I know how much patience it requires to wade
through a big bunch of someone else's boring code.

New BRIG FE patch set attached.

BR,
Pekka

Comments

Martin Jambor Nov. 4, 2016, 2:58 p.m. UTC | #1
Hi Pekka,

On Sat, Oct 29, 2016 at 02:58:29PM +0300, Pekka Jääskeläinen wrote:
> Hi Martin,

> 

> Thanks for the comments and suggestions. Replies inline:

> 

> On Thu, Oct 20, 2016 at 6:10 PM, Martin Jambor <mjambor@suse.cz> wrote:

> > - Still quite few things need to be documented better, e.g.:

> >   + brig_to_generic::get_mangled_name_tmpl and to a lesser extent

> >     brig_to_generic::get_mangled_name.  It should be clear what is the

> >     intended difference in usage of the two (specially since the

> >     former is a template and so the parameter does not give that much

> >     of a hint to the reader)

> 

> Added more comments.


thanks.

> 

> >   + the visitor classes need some description so that the first time

> >     readers see them, they understand what they are for and what they

> >     visit (i.e. what "visiting" even means).

> 

> This is an adaptation of the classic gang of four Visitor design pattern.

> I added a reference to it in a comment.


Thanks.  As you can see, I am not a C++ person and do not know names
of common patterns.  For me, "Gang of Four" is a reference to history
of China and not something one would want to emulate.

> 

> > - I know it was me who told you to use gcc_assert and gcc_unreachable

> >   instead of internal_error.  The thing is, often the error is clearly

> >   not an internal error but an error in input.  I think that we should

> >   plan to handle these cases differently and report the issues better

> >   to the user, give a meaningful error message together width the

> >   section and the offset there when it was encountered.  I am not

> >   asking you to audit all asserts now and convert those in this

> >   category but it would be nice to have a mechanism to easily do so

> >   (and convert a few obvious places), so that we can convert these as

> >   we bump into them.

> 

> I'm not sure about this. BRIG FE is a rather special case as we

> assume HSAILasm has been used to parse and error check the original

> HSAIL text to the binary BRIG format which it consumes.

> 

> Of course HSAILasm can have bugs, but how much we should produce human

> readable error messages to help debugging HSAILasm is another thing.

> In case the BRIG FE

> fails to consume the input, it means either the BRIG is corrupted for

> a reason or another,

> but typically is not a human error (those should be caught be HSAILasm).


I personally primarily want this for debugging purposes, and we should
try to eventually report errors in a more comprehensible way than
HSAILasm.  But more generally, and more importantly, if the input,
whether human readable or not, is incorrect, the compiler should not
produce an "internal error."  If that happens, users will file bugs
against gcc when in fact it is the generating tool that is broken.
One you maintain the FE, you personally will not want this :-)

> 

> "File format not recognized" error is one that might be useful though.

> I added a check for the BRIG magic number and the supported version (1.0).

> 

> Perhaps we should add error printouts later on case by case basis when we

> see which error cases can be useful and worth reporting in a human readable

> graceful manner? It can be as easy as converting the internal_error

> to fatal_error or similar in that case.


Yeah, that is what I meant, together with printing some information
about offsets in different sections where the error is.

> 

> > - A very minor suggestion: In GCC it is customary to write TODO as one

> >   word.  We generally do not use "TO OPTIMIZE", that is just a TODO

> >   (as opposed to a FIXME, which hints that something is at least a bit

> >   wrong here).  I think you can keep your way if you want but for

> >   example I do have emacs highlighting set up for the traditional

> >   formats.

> 

> Converted all to TODO.

> 

> > - You do not seem to handle BRIG_OPCODE_ICALL, or have I missed it?

> >   That is fine, I don't think anybody else does that now anyway, I

> >   just got curious reading the ipa analysis.

> 

> Right.

> 

> > - brig-lang.c:

> >   + x_flag_whole_program = 0; - talk about this with Honza

> 

> I guess this is note to yourself? As we agreed I didn't try whole program

> optimizations yet.  I might later when optimizing for a target. It will

> be really useful, I agree. Now that it used the proper builtin way, it

> might work more easily.


Right, it was a note for myself.

> 

> >   + brig_langhook_type_for_size: has several issues.  Either always

> >     return a type like go or return error_mark_node instead of NULL.

> >     Also, do not count on long being 64-bit.  I would just copy what

> >     go does.  Or lto.

> 

> Copied the go's version, it should work with BRIG too.


Great.

> 

> >   + convert: did you avoid using convert_to_vector deliberately?  The

> >     size check seems genuinely useful.

> 

> BRIG/HSAIL is a bit special case due to its "untyped" variables (registers),

> I use bit level casts in a lot of places to avoid accidental type conversions

> or sign extensions.


I was wondering why in case of VECTOR_TYPE, instead of
    return build1 (VIEW_CONVERT_EXPR, type, expr);
you do not do
    return fold (convert_to_vector (type, expr));

it seemed to me natural given how you handle the other cases and that
the function internally also builds a V_C_E, after two checks.  So I
thought the checks (matching overall size and only converting from an
integer or another vector-type) were the problem and was wondering
which one.  Especially the size one is something I believe you should
never violate too.

> 

> > - brigfrontend/brig-to-generic.cc:

> >   + brig_to_generic::parse: You seem to be handling LDA instruction

> >     (or more generally, BRIG_KIND_INST_ADDR instructions, but there is

> >     just the one) with copy_move_inst_handler, which seems just wrong.

> 

> What do you mean by 'wrong'?


copy_move_inst_handler operator() starts by type-casting whatever it
gets to BrigInstSourceType* and immediately dereferencing it, loading
field sourceType, even though, in case of an LDA, what it is actually
looking at is not a BrigInstSourceType but a BrigInstAddr, which is an
entirely different structure that does not have that field.  So you
read the 8-bit segment and the first reserved field and happily pass
that to gccbrig_tree_type_for_hsa_type, hoping it does not explode on
random data.  That is wrong.

Later on in the operator() you actually figure out that you are
looking at BrigInstSourceType but that is too late.

> The handlers do not map to instruction types

> directly, but often more towards the specification chapters.

> It comes straight from:

> http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/18_BRIG/BRIG_syntax_copy_move.htm


Ah, so that chapter is why you decided to handle type conversions and
LDA together, it made little sense to me.  Still, I do not think it is
a very good idea implementation-wise, but if the operator() is written
correctly, it certainly can be done.

> 

> >     Its operator() blindly casts the instruction to

> >     BrigInstSourceType, interpreting the segment as if it was a source

> >     type... am I missing something?

> 

> I believe you are confused by the BRIG struct name: BrigInstSourceType

> is a struct

> for "instructions that have different types for their destination and

> source operands"

> http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/18_BRIG/BrigInstSourceType.htm

> 

> That is, it's not a "source type" object, but an instruction type object.


Ehm, no, I think I understand that, the invalid typecast/dereference
is what I am pointing at.

> 

> >   + build_reinterpret_cast: gcc_unreachable followed by return

> >     NULL_TREE does not make sense, please convert the whole thing to

> >     an assert.  More importantly, I think that you really need to

> 

> Converted it to an assert.

> 

> >     solve the case of mismatching sizes differently, and not create a

> >     (single) V_C_E for it.  For register types, you can V_C_E them to

> >     an unsigned int of the same size, then do a widening NOP_EXPR to

> >     unsigned type with the same size as the destination and then do a

> >     V_C_E to whatever you need.  But perhaps this can be solved better

> >     at the caller side somewhere?

> 

> I added the extra conversion step through unsigned ints.

> I hadn't noticed that V_C_E is not guaranteed to work for this case.


Well, Ada also creates mismatched V_C_Es, so we do not verify the
sizes match and it mostly works.  However, after hunting down
expansion bugs caused by a combination of that and SRA, I really
really do not want to avoid adding more such situations, it still
seems quite fragile to me.

So thanks for doing this.

> >   + brig_code_entry_handler::build_address_operand: I must say I

> >     really dislike how the end of the function is structured, it is

> >     terrible difficult to read given that it is not doing anything that

> >     complicated and I think it does not handle correctly an (LDA of a)

> >     NULL address, which unfortunately I believe is valid for private

> >     and group segment addresses.  Dealing with the most complex case

> >     by converting symbol to size_type looks exactly backwards,

> >     especially given that you have converted the base of the

> >     POINTER_PLUS_EXPR only few lines before.  I think the code would

> >     be a lot nicer and easier to comprehend if you clearly

> >     distinguished the various cases (symbol_base != NULL (and sub

> >     cases when ptr_base is or is not NULL), ptrbase != NULL and simple

> >     constant, even NULL constant, which you do not handle but fail an

> >     assert, I think) and handled them separately, including all type

> >     conversions.  ptr_base is an unfortunate name, IMHO, in many cases

> >     it has the role of a variable offset rather than a base.

> >     Similarly, ptr_ofset is really a constant_offset.

> 

> Renamed ptr_base to var_offset and ptr_offset to const_offset which

> indeed are more descriptive.

> 

> I also cleaned the if..else mess at the end of the function and made the

> different cases very explicit, instead of the spaghetti mess previous

> version. It looks much better now to me at least.


Indeed it does, thanks.  However, I think that at the point when you
do:

  if (offs > 0)
    const_offset = build_int_cst (size_type_node, offs);

const_offset can already be non-NULL you may lose whatever value it
had before.  You might want to keep the constant offset as integer
until the very moment when you may want to use it.

(Also, single statements should not have braces around them ;-)

> > - brig-basic-inst-handler.cc:

> >   + scalarized_sat_arithmetics::scalarized_sat_arithmetics: Do not

> >     undef macros preemptively.  Instead, undef them right after using

> >     them, after the include of a .def file.

> 

> This seems to be an idiom with the builtin import mechanism elsewhere

> also.  builtins.def defines a default macro which one must undef if not

> wanting to do anything with that builtin type in that particular import

> location.


I see, OK.

> 

> >   + brig_basic_inst_handler::must_be_scalarized: I am intrigued by the

> >     (elements < 16) part of the condition.  This function would also

> >     benefit from a comment.

> 

> This function black listed the known cases of MULHI with vectors

> that have been tested to break with AMD64/x86-64. It seems the

> support for vector MULT_HIGHPART_EXPR is flaky and undertested.

> 

> The robust thing to do here is to force scalarization always with MULHI for now,

> until these issues are  debugged further. I added an exception for 2x64b

> MULT_HIGHPART_EXPR to avoid the need for 128b scalar arithmetics,

> and as it seems to work for the CPUs I've tested.

> 

> The decision should not IMO belong to the frontend, but there'd be

> better a step where the vector operations are optionally scalarized if the

> target prefers scalarized operations which should be caught by this

> step also. Something to fix during optimization work.


I see.  If you think that MULT_HIGHPART_EXPR is broken, it may be
worth filing a bug report (probably during next stage1).

> >   + brig_basic_inst_handler::operator (): It seems that the opcode

> >     local variable is only used to identify the return brig

> >     instructions which seems wasteful.  Generally, it would be nice to

> 

> It's also used to catch MULHI which is generated from multiple brig

> opcodes.

> 

> >     clean this function up a little by moving assignments to some of

> >     the very many local variables down, as close to their first use as

> >     reasonable.  Surprisingly often, you'd remove the need to compute

> >     them in many cases at all, e.g. look at element_count and

> >     element_size_bits.

> 

> Moved element_count and is_fp16_operation. element_size_bits is now

> used for catching mulhis for 64b elements.

> 

> >     Extra points for a function comment explaining how work is divided

> >     in between operator() itself and its main helpers such as

> >     build_instr_expr.

> 

> I added a method comment, but the truth is that the division of work

> is a bit artificial, mostly the build_inst_expr() call is there to split

> a complex if..else structure to two functions to improve readability.


Yeah, I thought so.  No worries.

> > - brig-cmp-inst-handler.cc:

> >   + brig_cmp_inst_handler::operator (): the neg_expr seems to be

> >     something left from earlier times?  Use BITS_PER_UNIT instead of

> 

> Correct. Removed.

> 

> >     8, having both result_width and element_width seems unnecessary

> 

> Removed element_width and moved result_width definition

> closer to its use.

> 

> >     (and speaking of elements, is that actually even a vector case?),

> >     and should be initialized only in the case when it is used.  In

> >     case of vector results, please build either all_ones or all_zeros,

> >     it is wasteful to allocate both.

> 

> They both are used to produce the HSA required all_ones/all_zeros

> output.


We already discussed this on IRC, this was my oversight.

> 

> > - brig-mem-inst-handler.cc: I believe that using the alignment

> >   modifier is something that we should try to get done as soon as

> >   possible.

> 

> I agree. Probably one of the first things that will pop up during optimizing

> the performance.

> 

> > - brig-inst-mod-handler.cc: This seems like something that we should

> >   at least warn about (in case when effectively an unsupported

> >   operation is requested).

> 

> If there will be an upgrade for the frontend to support the 'full' profile (it's

> only supporting 'base' now) with all the rounding modifiers, a better way might

> be found than injecting fesetround() calls around all float

> expressions.  Probably

> in that case all float ops must be converted to builtin calls that

> ensure the wanted

> rounding (ungh!).


Yay.  OK.

> 

> > - brig-seg-inst-handler.cc: At this point I'm trying to read quickly

> >   but it seems to me you do not support conversion between flat and

> >   global segment... how come?

> 

> Global address is already a flat address. Check the description tab in

> http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/05_Arithmetic/segment_conversion.htm?Highlight=global


Oh, I did not realize that HSAIL does not allow stof_global and
ftos_global variants.  Interesting.  OK then.

> 

> > - brig-copy-move-inst-handler.cc:

> >   + brig_copy_move_inst_handler::operator (): The function definitely

> >     should not cast LDA instructions to BrigInstSourceType*!

> 

> Yes it should. Like I explained above, the struct name is misleading,

> but it's actually an instruction calls not a source type.


but surely LDA is encoded as struct BrigInstAddr, not
BrigInstSourceType ???

> 

> > - brig-branch-inst-handler.cc: I believe that as long as the builtins

> >   representing barriers are not pure, they will not be hoisted out of

> >   a loop.  Nonduplication might indeed be a problem, although short of

> >   whole function cloning, I could not think of a transformation gcc

> >   performs that might pause a problem.  Nevertheless, we probably

> >   should introduce an attribute for it and look for it in

> >   gimple_can_duplicate_bb_p (and in cfg_layout_can_duplicate_bb_p?).

> >   An important issue, but hopefully for later.

> 

> Agreed.

> 


> 

> > - brig-function-handler.cc:

> >   + brig_directive_function_handler::operator: Please use gcc_assert

> >     instead of assert.  (Well, in this case it is clearly input error

> >     which, eventually, we will want to give nice errors about.  But at

> >     least do not use assert.)

> 

> Converted to gcc_assert() for now.

> 


> >   + brig_function::build_launcher_and_metadata: The ASM directive is

> >     really an ugly hack.  It is isolated so I am not that much

> >     concerned, but building a structure and filling it with data (like

> >     we do for example in hsa_output_libgomp_mapping) seems cleaner.

> 

> Hmm. I don't generate any metadata structs to data section, but a separate

> custom ELF section per kernel. I agree the ASM directive is not ideal

> in general,

> but I don't think there's a generic way to add custom ELF sections.

> 

> I'm not sure how much building the structure field by field would be a better

> approach in comparison to a raw dump as the point is to just transfer the

> metadata to the HSA runtime with the finalized binary. The runtime should use

> exactly the same struct layout, otherwise it won't work anyways. If I

> do a raw dump,

> at least I ensure that if the struct is updated I won't forget to update the

> serialization code.


Well, hopefully nobody will object too strongly against this.  As I
wrote before, it is isolated enough for me.

> > - gcc/builtins.def: In the added DEF_HSAIL_*_BUILTIN macros, please

> >   arrange it so that they only pass true in the last argument of

> >   DEF_BUILTIN when gcc is configured with BRIG FE.  Builtins are not

> >   free and should not be added needlessly.

> 

> It doesn't even include the hsail-builtins.def in case BRIG FE not

> enabled now. I suppose that's even better than executing those macros

> for nothing.


Oh, I have missed that.  Sure.

> > If I understood you correctly, both you and your sponsor have already

> > signed the Copyright assignment, right?  If that is so, I'll ask the

> > steering committee to approve the intention and then ask a global

> > reviewer to also peek at it.

> 

> Correct, and I see you already did. Thanks!

> 

> > Thanks for your patience,

> 

> Thank a lot for the comments. I know how much patience it requires to wade

> through a big bunch of someone else's boring code.

> 

> New BRIG FE patch set attached.


This week I have checked out the updated tree from github and looked
at only a few changed functions there.  Hopefully the steering
committee will decide soon and we'll get attention of a global
reviewer during the next few weeks too.

Thanks for addressing so many of my comments,

Martin
Pekka Jääskeläinen Nov. 9, 2016, 7:13 p.m. UTC | #2
Hi Martin,

On Fri, Nov 4, 2016 at 4:58 PM, Martin Jambor <mjambor@suse.cz> wrote:
> I personally primarily want this for debugging purposes, and we should

> try to eventually report errors in a more comprehensible way than

> HSAILasm.  But more generally, and more importantly, if the input,

> whether human readable or not, is incorrect, the compiler should not

> produce an "internal error."  If that happens, users will file bugs

> against gcc when in fact it is the generating tool that is broken.

> One you maintain the FE, you personally will not want this :-)


Yes, I will add better error messages as soon asissues are reported.

> I was wondering why in case of VECTOR_TYPE, instead of

>     return build1 (VIEW_CONVERT_EXPR, type, expr);

> you do not do

>     return fold (convert_to_vector (type, expr));

>

> it seemed to me natural given how you handle the other cases and that

> the function internally also builds a V_C_E, after two checks.  So I

> thought the checks (matching overall size and only converting from an

> integer or another vector-type) were the problem and was wondering

> which one.  Especially the size one is something I believe you should

> never violate too.


I see. I cannot recall the origins of this, but my guess is that the code is
copied from an older Go frontend version. I tested with your
suggestion and it seems
to work fine.

> copy_move_inst_handler operator() starts by type-casting whatever it

> gets to BrigInstSourceType* and immediately dereferencing it, loading

> field sourceType, even though, in case of an LDA, what it is actually

> looking at is not a BrigInstSourceType but a BrigInstAddr, which is an

> entirely different structure that does not have that field.  So you

> read the 8-bit segment and the first reserved field and happily pass

> that to gccbrig_tree_type_for_hsa_type, hoping it does not explode on

> random data.  That is wrong.

>

> Later on in the operator() you actually figure out that you are

> looking at BrigInstSourceType but that is too late.


Ah, I see the issue now. It worked as with LDA the input type was
forced and handled explicitly. I now made the code more explicit
regarding this.

> Indeed it does, thanks.  However, I think that at the point when you

> do:

>

>   if (offs > 0)

>     const_offset = build_int_cst (size_type_node, offs);

>

> const_offset can already be non-NULL you may lose whatever value it

> had before.  You might want to keep the constant offset as integer

> until the very moment when you may want to use it.


Great catch! There indeed was a major memory corruption with
group and private variable which was not caught by the PRM conformance
suite I use for testing. Should be now fixed.

> (Also, single statements should not have braces around them ;-)


...also fixed these, once again ;)

> I see.  If you think that MULT_HIGHPART_EXPR is broken, it may be

> worth filing a bug report (probably during next stage1).


It can be considered broken or not just fully implemented for all targets
and various vector types (especially those not directly supported by the
target).  Yes, I rather look closer and report this after the patch has been
merged to be able to provide BRIG test cases that work with
upstream code base.

> This week I have checked out the updated tree from github and looked

> at only a few changed functions there.  Hopefully the steering

> committee will decide soon and we'll get attention of a global

> reviewer during the next few weeks too.

>

> Thanks for addressing so many of my comments,


Thanks again, an updated FE patch attached.

BR,
Pekka
diff mbox

Patch

This patch set adds a BRIG (HSAIL) frontend. It can be used as a core
for an HSAIL finalizer implementation for processors with gcc backends.

It is a bit unusual frontend as the consumed format is a binary
representation.  The textual HSAIL can be compiled to it with a 
separate assembler tool.

The frontend has been mostly tested with the HSA PRM conformance suite which
it now passes. The accompanied GENERIC-scanning test suite is supposed to be
only a smoke test. 

libhsail-rt implements HSAIL specific builtins and includes a simple runtime
that implements SPMD execution via either Pth-based fibers or loops to 
execute multiple work-item work groups without SPMD/SIMD-default hardware.

I've split it to 4 patches:

001 - the configuration file changes and misc.
002 - the frontend itself
003 - libhsail-rt
004 - the smoke test suite

The diffstat is as follows:

 .gitignore                                        |     2 +-
 Makefile.def                                      |     3 +
 Makefile.in                                       |   489 +
 configure                                         |     1 +
 configure.ac                                      |     1 +
 gcc/brig/Make-lang.in                             |   247 +
 gcc/brig/brig-builtins.h                          |    99 +
 gcc/brig/brig-c.h                                 |    66 +
 gcc/brig/brig-lang.c                              |   770 +
 gcc/brig/brigfrontend/brig-arg-block-handler.cc   |    66 +
 gcc/brig/brigfrontend/brig-atomic-inst-handler.cc |   265 +
 gcc/brig/brigfrontend/brig-basic-inst-handler.cc  |   865 +
 gcc/brig/brigfrontend/brig-branch-inst-handler.cc |   221 +
 gcc/brig/brigfrontend/brig-cmp-inst-handler.cc    |   198 +
 gcc/brig/brigfrontend/brig-code-entry-handler.cc  |  1712 ++
 gcc/brig/brigfrontend/brig-code-entry-handler.h   |   422 +
 gcc/brig/brigfrontend/brig-comment-handler.cc     |    39 +
 gcc/brig/brigfrontend/brig-control-handler.cc     |   108 +
 .../brigfrontend/brig-copy-move-inst-handler.cc   |    58 +
 gcc/brig/brigfrontend/brig-cvt-inst-handler.cc    |   260 +
 gcc/brig/brigfrontend/brig-fbarrier-handler.cc    |    44 +
 gcc/brig/brigfrontend/brig-function-handler.cc    |   373 +
 gcc/brig/brigfrontend/brig-function.cc            |   719 +
 gcc/brig/brigfrontend/brig-function.h             |   213 +
 gcc/brig/brigfrontend/brig-inst-mod-handler.cc    |    58 +
 gcc/brig/brigfrontend/brig-label-handler.cc       |    37 +
 gcc/brig/brigfrontend/brig-lane-inst-handler.cc   |    84 +
 gcc/brig/brigfrontend/brig-machine.c              |    44 +
 gcc/brig/brigfrontend/brig-machine.h              |    33 +
 gcc/brig/brigfrontend/brig-mem-inst-handler.cc    |   180 +
 gcc/brig/brigfrontend/brig-module-handler.cc      |    41 +
 gcc/brig/brigfrontend/brig-queue-inst-handler.cc  |    93 +
 gcc/brig/brigfrontend/brig-seg-inst-handler.cc    |   146 +
 gcc/brig/brigfrontend/brig-signal-inst-handler.cc |    42 +
 gcc/brig/brigfrontend/brig-to-generic.cc          |   812 +
 gcc/brig/brigfrontend/brig-to-generic.h           |   226 +
 gcc/brig/brigfrontend/brig-util.cc                |   446 +
 gcc/brig/brigfrontend/brig-util.h                 |    53 +
 gcc/brig/brigfrontend/brig-variable-handler.cc    |   263 +
 gcc/brig/brigfrontend/phsa.h                      |    69 +
 gcc/brig/brigspec.c                               |   135 +
 gcc/brig/config-lang.in                           |    41 +
 gcc/brig/lang-specs.h                             |    28 +
 gcc/brig/lang.opt                                 |    41 +
 gcc/builtin-types.def                             |    80 +-
 gcc/builtins.def                                  |    41 +
 gcc/config.in                                     |     6 +
 gcc/configure                                     |    10 +-
 gcc/configure.ac                                  |     5 +
 gcc/doc/frontends.texi                            |     2 +-
 gcc/doc/invoke.texi                               |     4 +
 gcc/doc/standards.texi                            |     8 +
 gcc/hsail-builtins.def                            |   659 +
 gcc/testsuite/brig.dg/README                      |    10 +
 gcc/testsuite/brig.dg/dg.exp                      |    27 +
 gcc/testsuite/brig.dg/test/gimple/alloca.hsail    |    37 +
 gcc/testsuite/brig.dg/test/gimple/atomics.hsail   |    33 +
 gcc/testsuite/brig.dg/test/gimple/branches.hsail  |    58 +
 gcc/testsuite/brig.dg/test/gimple/fbarrier.hsail  |    74 +
 .../brig.dg/test/gimple/function_calls.hsail      |    59 +
 gcc/testsuite/brig.dg/test/gimple/kernarg.hsail   |    25 +
 gcc/testsuite/brig.dg/test/gimple/mem.hsail       |    39 +
 gcc/testsuite/brig.dg/test/gimple/mulhi.hsail     |    33 +
 gcc/testsuite/brig.dg/test/gimple/packed.hsail    |    78 +
 .../brig.dg/test/gimple/smoke_test.hsail          |    91 +
 gcc/testsuite/brig.dg/test/gimple/variables.hsail |   124 +
 gcc/testsuite/brig.dg/test/gimple/vector.hsail    |    57 +
 gcc/testsuite/lib/brig-dg.exp                     |    29 +
 gcc/testsuite/lib/brig.exp                        |    40 +
 include/hsa-interface.h                           |   630 +
 libhsail-rt/Makefile.am                           |   124 +
 libhsail-rt/Makefile.in                           |   740 +
 libhsail-rt/README                                |     4 +
 libhsail-rt/aclocal.m4                            |   978 +
 libhsail-rt/config.h.in                           |   217 +
 libhsail-rt/configure                             | 17016 ++++++++++++++++++
 libhsail-rt/configure.ac                          |   151 +
 libhsail-rt/include/internal/fibers.h             |    95 +
 .../include/internal/phsa-queue-interface.h       |    60 +
 libhsail-rt/include/internal/phsa-rt.h            |    94 +
 libhsail-rt/include/internal/workitems.h          |   107 +
 libhsail-rt/m4/libtool.m4                         |  7997 ++++++++
 libhsail-rt/m4/ltoptions.m4                       |   384 +
 libhsail-rt/m4/ltsugar.m4                         |   123 +
 libhsail-rt/m4/ltversion.m4                       |    23 +
 libhsail-rt/m4/lt~obsolete.m4                     |    98 +
 libhsail-rt/rt/arithmetic.c                       |   475 +
 libhsail-rt/rt/atomics.c                          |   115 +
 libhsail-rt/rt/bitstring.c                        |   190 +
 libhsail-rt/rt/fbarrier.c                         |    87 +
 libhsail-rt/rt/fibers.c                           |   212 +
 libhsail-rt/rt/fp16.c                             |   135 +
 libhsail-rt/rt/misc.c                             |    89 +
 libhsail-rt/rt/multimedia.c                       |   135 +
 libhsail-rt/rt/queue.c                            |    71 +
 libhsail-rt/rt/sat_arithmetic.c                   |   299 +
 libhsail-rt/rt/segment.c                          |    57 +
 libhsail-rt/rt/workitems.c                        |   952 +
 libhsail-rt/target-config.h.in                    |    68 +
 99 files changed, 43463 insertions(+), 5 deletions(-)