Message ID | 1419326012-10666-1-git-send-email-maxim.uvarov@linaro.org |
---|---|
State | New |
Headers | show |
On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org> wrote: > Odp atomic operations based on compiler build-ins. Make > sure that compiler supports such operation at configure > stage. > > Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org> > --- > v3: make atomic checks platfrom specific > > configure.ac | 4 +++- > platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++ > 2 files changed, 21 insertions(+), 1 deletion(-) > create mode 100644 platform/linux-generic/m4/configure.m4 > > diff --git a/configure.ac b/configure.ac > index 377e8be..f55beca 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -47,7 +47,9 @@ AC_ARG_WITH([platform], > [AS_HELP_STRING([--with-platform=prefix], > [Select platform to be used, default linux-generic])], > [], > - [with_platform=linux-generic]) > + [with_platform=linux-generic > + m4_include([./platform/linux-generic/m4/configure.m4]) > + ]) > > AC_SUBST([with_platform]) > > diff --git a/platform/linux-generic/m4/configure.m4 b/platform/linux-generic/m4/configure.m4 > new file mode 100644 > index 0000000..bfb1fb0 > --- /dev/null > +++ b/platform/linux-generic/m4/configure.m4 > @@ -0,0 +1,18 @@ > +AC_MSG_CHECKING(for GCC atomic builtins) > +AC_LINK_IFELSE( > + [AC_LANG_SOURCE( > + [[#include <stdint.h> > + int main() { > + uint32_t v = 1; > + __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED); > + __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED); > + __atomic_store_n(&v, 1, __ATOMIC_RELAXED); > + __atomic_load_n(&v, __ATOMIC_RELAXED); > + return 0; > + } > + ]])], > + AC_MSG_RESULT(yes), > + AC_MSG_RESULT(no) > + echo "Atomic operation are not supported by your compiller." > + echo "Use newer version. For gcc > 4.7.3" Why >4.7.3? Minor releases do not introduce major new features. __atomic support was introduced on GCC 4.7.0. > + exit -1) > -- > 1.8.5.1.163.gd7aced9 > > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/lng-odp
On 12/23/2014 04:26 PM, Ola Liljedahl wrote: > On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org> wrote: >> Odp atomic operations based on compiler build-ins. Make >> sure that compiler supports such operation at configure >> stage. >> >> Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org> >> --- >> v3: make atomic checks platfrom specific >> >> configure.ac | 4 +++- >> platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++ >> 2 files changed, 21 insertions(+), 1 deletion(-) >> create mode 100644 platform/linux-generic/m4/configure.m4 >> >> diff --git a/configure.ac b/configure.ac >> index 377e8be..f55beca 100644 >> --- a/configure.ac >> +++ b/configure.ac >> @@ -47,7 +47,9 @@ AC_ARG_WITH([platform], >> [AS_HELP_STRING([--with-platform=prefix], >> [Select platform to be used, default linux-generic])], >> [], >> - [with_platform=linux-generic]) >> + [with_platform=linux-generic >> + m4_include([./platform/linux-generic/m4/configure.m4]) >> + ]) >> >> AC_SUBST([with_platform]) >> >> diff --git a/platform/linux-generic/m4/configure.m4 b/platform/linux-generic/m4/configure.m4 >> new file mode 100644 >> index 0000000..bfb1fb0 >> --- /dev/null >> +++ b/platform/linux-generic/m4/configure.m4 >> @@ -0,0 +1,18 @@ >> +AC_MSG_CHECKING(for GCC atomic builtins) >> +AC_LINK_IFELSE( >> + [AC_LANG_SOURCE( >> + [[#include <stdint.h> >> + int main() { >> + uint32_t v = 1; >> + __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED); >> + __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED); >> + __atomic_store_n(&v, 1, __ATOMIC_RELAXED); >> + __atomic_load_n(&v, __ATOMIC_RELAXED); >> + return 0; >> + } >> + ]])], >> + AC_MSG_RESULT(yes), >> + AC_MSG_RESULT(no) >> + echo "Atomic operation are not supported by your compiller." >> + echo "Use newer version. For gcc > 4.7.3" > Why >4.7.3? Minor releases do not introduce major new features. > __atomic support was introduced on GCC 4.7.0. I have ubuntu 12.04 and there is 4.7.3. And it definatelly works there. So I just set tested value. I'm ok to change it to 4.7.0. Maxim. > >> + exit -1) >> -- >> 1.8.5.1.163.gd7aced9 >> >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> http://lists.linaro.org/mailman/listinfo/lng-odp
On 23 December 2014 at 16:16, Maxim Uvarov <maxim.uvarov@linaro.org> wrote: > On 12/23/2014 04:26 PM, Ola Liljedahl wrote: >> >> On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org> >> wrote: >>> >>> Odp atomic operations based on compiler build-ins. Make >>> sure that compiler supports such operation at configure >>> stage. >>> >>> Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org> >>> --- >>> v3: make atomic checks platfrom specific >>> >>> configure.ac | 4 +++- >>> platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++ >>> 2 files changed, 21 insertions(+), 1 deletion(-) >>> create mode 100644 platform/linux-generic/m4/configure.m4 >>> >>> diff --git a/configure.ac b/configure.ac >>> index 377e8be..f55beca 100644 >>> --- a/configure.ac >>> +++ b/configure.ac >>> @@ -47,7 +47,9 @@ AC_ARG_WITH([platform], >>> [AS_HELP_STRING([--with-platform=prefix], >>> [Select platform to be used, default linux-generic])], >>> [], >>> - [with_platform=linux-generic]) >>> + [with_platform=linux-generic >>> + m4_include([./platform/linux-generic/m4/configure.m4]) >>> + ]) >>> >>> AC_SUBST([with_platform]) >>> >>> diff --git a/platform/linux-generic/m4/configure.m4 >>> b/platform/linux-generic/m4/configure.m4 >>> new file mode 100644 >>> index 0000000..bfb1fb0 >>> --- /dev/null >>> +++ b/platform/linux-generic/m4/configure.m4 >>> @@ -0,0 +1,18 @@ >>> +AC_MSG_CHECKING(for GCC atomic builtins) >>> +AC_LINK_IFELSE( >>> + [AC_LANG_SOURCE( >>> + [[#include <stdint.h> >>> + int main() { >>> + uint32_t v = 1; You don't need to use a uint32_t, a plain "int" should suffice. Thus no need to include <stdint.h> either which should be more robust. >>> + __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED); >>> + __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED); >>> + __atomic_store_n(&v, 1, __ATOMIC_RELAXED); >>> + __atomic_load_n(&v, __ATOMIC_RELAXED); >>> + return 0; >>> + } >>> + ]])], >>> + AC_MSG_RESULT(yes), >>> + AC_MSG_RESULT(no) >>> + echo "Atomic operation are not supported by your compiller." "compiller" speling erorr. I think the message should be more detailed. echo "GCC-style __atomic builtins not supported by the compiler" >>> + echo "Use newer version. For gcc > 4.7.3" >> >> Why >4.7.3? Minor releases do not introduce major new features. >> __atomic support was introduced on GCC 4.7.0. > > I have ubuntu 12.04 and there is 4.7.3. And it definatelly works there. So I > just set tested value. > I'm ok to change it to 4.7.0. Good. > > Maxim. > >> >>> + exit -1) >>> -- >>> 1.8.5.1.163.gd7aced9 >>> >>> >>> _______________________________________________ >>> lng-odp mailing list >>> lng-odp@lists.linaro.org >>> http://lists.linaro.org/mailman/listinfo/lng-odp > >
how will this work, Octeon does not care about __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED); and yet you will fail configure I assume static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) { #if defined __OCTEON__ >-------uint32_t ret; >-------__asm__ __volatile__ ("syncws"); >-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) : >------->------->------- "r" (atom)); >-------return ret; #else >-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED); #endif } On 23 December 2014 at 10:26, Ola Liljedahl <ola.liljedahl@linaro.org> wrote: > On 23 December 2014 at 16:16, Maxim Uvarov <maxim.uvarov@linaro.org> > wrote: > > On 12/23/2014 04:26 PM, Ola Liljedahl wrote: > >> > >> On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org> > >> wrote: > >>> > >>> Odp atomic operations based on compiler build-ins. Make > >>> sure that compiler supports such operation at configure > >>> stage. > >>> > >>> Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org> > >>> --- > >>> v3: make atomic checks platfrom specific > >>> > >>> configure.ac | 4 +++- > >>> platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++ > >>> 2 files changed, 21 insertions(+), 1 deletion(-) > >>> create mode 100644 platform/linux-generic/m4/configure.m4 > >>> > >>> diff --git a/configure.ac b/configure.ac > >>> index 377e8be..f55beca 100644 > >>> --- a/configure.ac > >>> +++ b/configure.ac > >>> @@ -47,7 +47,9 @@ AC_ARG_WITH([platform], > >>> [AS_HELP_STRING([--with-platform=prefix], > >>> [Select platform to be used, default linux-generic])], > >>> [], > >>> - [with_platform=linux-generic]) > >>> + [with_platform=linux-generic > >>> + m4_include([./platform/linux-generic/m4/configure.m4]) > >>> + ]) > >>> > >>> AC_SUBST([with_platform]) > >>> > >>> diff --git a/platform/linux-generic/m4/configure.m4 > >>> b/platform/linux-generic/m4/configure.m4 > >>> new file mode 100644 > >>> index 0000000..bfb1fb0 > >>> --- /dev/null > >>> +++ b/platform/linux-generic/m4/configure.m4 > >>> @@ -0,0 +1,18 @@ > >>> +AC_MSG_CHECKING(for GCC atomic builtins) > >>> +AC_LINK_IFELSE( > >>> + [AC_LANG_SOURCE( > >>> + [[#include <stdint.h> > >>> + int main() { > >>> + uint32_t v = 1; > You don't need to use a uint32_t, a plain "int" should suffice. Thus > no need to include <stdint.h> either which should be more robust. > > >>> + __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED); > >>> + __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED); > >>> + __atomic_store_n(&v, 1, __ATOMIC_RELAXED); > >>> + __atomic_load_n(&v, __ATOMIC_RELAXED); > >>> + return 0; > >>> + } > >>> + ]])], > >>> + AC_MSG_RESULT(yes), > >>> + AC_MSG_RESULT(no) > >>> + echo "Atomic operation are not supported by your compiller." > "compiller" speling erorr. > > I think the message should be more detailed. > echo "GCC-style __atomic builtins not supported by the compiler" > > > >>> + echo "Use newer version. For gcc > 4.7.3" > >> > >> Why >4.7.3? Minor releases do not introduce major new features. > >> __atomic support was introduced on GCC 4.7.0. > > > > I have ubuntu 12.04 and there is 4.7.3. And it definatelly works there. > So I > > just set tested value. > > I'm ok to change it to 4.7.0. > Good. > > > > > Maxim. > > > >> > >>> + exit -1) > >>> -- > >>> 1.8.5.1.163.gd7aced9 > >>> > >>> > >>> _______________________________________________ > >>> lng-odp mailing list > >>> lng-odp@lists.linaro.org > >>> http://lists.linaro.org/mailman/listinfo/lng-odp > > > > > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/lng-odp >
We need and arch directory that ./configure can pull from for those things that are not platform specific but are core core specific. ./configure then needs different rules to accommodate the above cases, presumable the different rules can be pulled depending on something common like $ARCH similar to the linux compile. On 23 December 2014 at 10:28, Mike Holmes <mike.holmes@linaro.org> wrote: > how will this work, Octeon does not care about __atomic_fetch_add(&v, > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume > > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) > { > #if defined __OCTEON__ > >-------uint32_t ret; > >-------__asm__ __volatile__ ("syncws"); > >-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) : > >------->------->------- "r" (atom)); > >-------return ret; > #else > >-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED); > #endif > } > > > On 23 December 2014 at 10:26, Ola Liljedahl <ola.liljedahl@linaro.org> > wrote: > >> On 23 December 2014 at 16:16, Maxim Uvarov <maxim.uvarov@linaro.org> >> wrote: >> > On 12/23/2014 04:26 PM, Ola Liljedahl wrote: >> >> >> >> On 23 December 2014 at 10:13, Maxim Uvarov <maxim.uvarov@linaro.org> >> >> wrote: >> >>> >> >>> Odp atomic operations based on compiler build-ins. Make >> >>> sure that compiler supports such operation at configure >> >>> stage. >> >>> >> >>> Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org> >> >>> --- >> >>> v3: make atomic checks platfrom specific >> >>> >> >>> configure.ac | 4 +++- >> >>> platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++ >> >>> 2 files changed, 21 insertions(+), 1 deletion(-) >> >>> create mode 100644 platform/linux-generic/m4/configure.m4 >> >>> >> >>> diff --git a/configure.ac b/configure.ac >> >>> index 377e8be..f55beca 100644 >> >>> --- a/configure.ac >> >>> +++ b/configure.ac >> >>> @@ -47,7 +47,9 @@ AC_ARG_WITH([platform], >> >>> [AS_HELP_STRING([--with-platform=prefix], >> >>> [Select platform to be used, default linux-generic])], >> >>> [], >> >>> - [with_platform=linux-generic]) >> >>> + [with_platform=linux-generic >> >>> + m4_include([./platform/linux-generic/m4/configure.m4]) >> >>> + ]) >> >>> >> >>> AC_SUBST([with_platform]) >> >>> >> >>> diff --git a/platform/linux-generic/m4/configure.m4 >> >>> b/platform/linux-generic/m4/configure.m4 >> >>> new file mode 100644 >> >>> index 0000000..bfb1fb0 >> >>> --- /dev/null >> >>> +++ b/platform/linux-generic/m4/configure.m4 >> >>> @@ -0,0 +1,18 @@ >> >>> +AC_MSG_CHECKING(for GCC atomic builtins) >> >>> +AC_LINK_IFELSE( >> >>> + [AC_LANG_SOURCE( >> >>> + [[#include <stdint.h> >> >>> + int main() { >> >>> + uint32_t v = 1; >> You don't need to use a uint32_t, a plain "int" should suffice. Thus >> no need to include <stdint.h> either which should be more robust. >> >> >>> + __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED); >> >>> + __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED); >> >>> + __atomic_store_n(&v, 1, __ATOMIC_RELAXED); >> >>> + __atomic_load_n(&v, __ATOMIC_RELAXED); >> >>> + return 0; >> >>> + } >> >>> + ]])], >> >>> + AC_MSG_RESULT(yes), >> >>> + AC_MSG_RESULT(no) >> >>> + echo "Atomic operation are not supported by your compiller." >> "compiller" speling erorr. >> >> I think the message should be more detailed. >> echo "GCC-style __atomic builtins not supported by the compiler" >> >> >> >>> + echo "Use newer version. For gcc > 4.7.3" >> >> >> >> Why >4.7.3? Minor releases do not introduce major new features. >> >> __atomic support was introduced on GCC 4.7.0. >> > >> > I have ubuntu 12.04 and there is 4.7.3. And it definatelly works there. >> So I >> > just set tested value. >> > I'm ok to change it to 4.7.0. >> Good. >> >> > >> > Maxim. >> > >> >> >> >>> + exit -1) >> >>> -- >> >>> 1.8.5.1.163.gd7aced9 >> >>> >> >>> >> >>> _______________________________________________ >> >>> lng-odp mailing list >> >>> lng-odp@lists.linaro.org >> >>> http://lists.linaro.org/mailman/listinfo/lng-odp >> > >> > >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> http://lists.linaro.org/mailman/listinfo/lng-odp >> > > > > -- > *Mike Holmes* > Linaro Sr Technical Manager > LNG - ODP >
On 12/23/2014 05:28 PM, Mike Holmes wrote: > how will this work, Octeon does not care about __atomic_fetch_add(&v, > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume > > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) > { > #if defined __OCTEON__ >>-------uint32_t ret; >>-------__asm__ __volatile__ ("syncws"); >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) : >>------->------->------- "r" (atom)); >>-------return ret; > #else >>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED); > #endif > } Linux-generic shouldn't have arch specific parts, because it is *generic*. Instead it can fall back to strong __sync functions if __atomic are not available.
On 23 December 2014 at 11:15, Taras Kondratiuk <taras.kondratiuk@linaro.org> wrote: > On 12/23/2014 05:28 PM, Mike Holmes wrote: > > how will this work, Octeon does not care about __atomic_fetch_add(&v, > > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume > > > > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) > > { > > #if defined __OCTEON__ > >>-------uint32_t ret; > >>-------__asm__ __volatile__ ("syncws"); > >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) : > >>------->------->------- "r" (atom)); > >>-------return ret; > > #else > >>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED); > > #endif > > } > > Linux-generic shouldn't have arch specific parts, because it is > *generic*. Instead it can fall back to strong __sync functions if > __atomic are not available. > > Linux kernel is generic but has core specifics - is this not the same ? What about these cases ? platform/linux-generic/odp_system_info.c:static int cpuinfo_octeon(FILE *file, odp_system_info_t *sysinfo) platform/linux-generic/odp_system_info.c: .cpu_arch_str = "octeon", platform/linux-generic/odp_system_info.c: .cpuinfo_parser = cpuinfo_octeon platform/linux-generic/include/api/odp_align.h:#if defined __x86_64__ || defined __i386__ platform/linux-generic/include/api/odp_sync.h:#if defined __x86_64__ || defined __i386__ platform/linux-generic/include/odp_spin_internal.h:#if defined __x86_64__ || defined __i386__ platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined __i386__ || defined __OCTEON__ || \ platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined __i386__ platform/linux-generic/odp_system_info.c:static int cpuinfo_x86(FILE *file, odp_system_info_t *sysinfo) platform/linux-generic/odp_system_info.c: #if defined __x86_64__ || defined __i386__ platform/linux-generic/odp_system_info.c: .cpu_arch_str = "x86", platform/linux-generic/odp_system_info.c: .cpuinfo_parser = cpuinfo_x86 platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined __i386__ || defined __OCTEON__ || \ platform/linux-generic/odp_time.c:#if defined __x86_64__ || defined __i386__ platform/linux-generic/include/api/odp_align.h:#elif defined __arm__ || defined __aarch64__ platform/linux-generic/include/api/odp_sync.h:#elif defined(__arm__) platform/linux-generic/include/odp_spin_internal.h:#elif defined __arm__ platform/linux-generic/odp_system_info.c:#elif defined __arm__ || defined __aarch64__ platform/linux-generic/odp_system_info.c:static int cpuinfo_arm(FILE *file ODP_UNUSED, platform/linux-generic/odp_system_info.c: #elif defined __arm__ || defined __aarch64__ platform/linux-generic/odp_system_info.c: .cpu_arch_str = "arm", platform/linux-generic/odp_system_info.c: .cpuinfo_parser = cpuinfo_arm > > Taras Kondratiuk >
I had some discussions with Bala and we think the GCC generates the appropriate code for OCTEON so there should be no need for any OCTEON-specific inline assembler in odp_atomic.h. I just didn't want to remove this piece of code when I redesigned the atomic support because I didn't understand why there is a SYNCW instruction *before* the load. Load-acquire type of loads may need sync or barrier instructions but they come after the load, not before. I think we should trust GCC to generate better code for OCTEON than us mere humans, at least until proven wrong. So please remove this OCTEON-specific snippet. -- Ola On 23 December 2014 at 18:01, Mike Holmes <mike.holmes@linaro.org> wrote: > > > On 23 December 2014 at 11:15, Taras Kondratiuk <taras.kondratiuk@linaro.org> > wrote: >> >> On 12/23/2014 05:28 PM, Mike Holmes wrote: >> > how will this work, Octeon does not care about __atomic_fetch_add(&v, >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume >> > >> > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) >> > { >> > #if defined __OCTEON__ >> >>-------uint32_t ret; >> >>-------__asm__ __volatile__ ("syncws"); >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) : >> >>------->------->------- "r" (atom)); >> >>-------return ret; >> > #else >> >>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED); >> > #endif >> > } >> >> Linux-generic shouldn't have arch specific parts, because it is >> *generic*. Instead it can fall back to strong __sync functions if >> __atomic are not available. >> > > Linux kernel is generic but has core specifics - is this not the same ? > > What about these cases ? > > platform/linux-generic/odp_system_info.c:static int cpuinfo_octeon(FILE > *file, odp_system_info_t *sysinfo) > platform/linux-generic/odp_system_info.c: .cpu_arch_str = "octeon", > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = > cpuinfo_octeon > platform/linux-generic/include/api/odp_align.h:#if defined __x86_64__ || > defined __i386__ > platform/linux-generic/include/api/odp_sync.h:#if defined __x86_64__ || > defined __i386__ > platform/linux-generic/include/odp_spin_internal.h:#if defined __x86_64__ || > defined __i386__ > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined > __i386__ || defined __OCTEON__ || \ > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined > __i386__ > platform/linux-generic/odp_system_info.c:static int cpuinfo_x86(FILE *file, > odp_system_info_t *sysinfo) > platform/linux-generic/odp_system_info.c: #if defined __x86_64__ || > defined __i386__ > platform/linux-generic/odp_system_info.c: .cpu_arch_str = "x86", > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = > cpuinfo_x86 > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || defined > __i386__ || defined __OCTEON__ || \ > platform/linux-generic/odp_time.c:#if defined __x86_64__ || defined __i386__ > platform/linux-generic/include/api/odp_align.h:#elif defined __arm__ || > defined __aarch64__ > platform/linux-generic/include/api/odp_sync.h:#elif defined(__arm__) > platform/linux-generic/include/odp_spin_internal.h:#elif defined __arm__ > platform/linux-generic/odp_system_info.c:#elif defined __arm__ || defined > __aarch64__ > platform/linux-generic/odp_system_info.c:static int cpuinfo_arm(FILE *file > ODP_UNUSED, > platform/linux-generic/odp_system_info.c: #elif defined __arm__ || > defined __aarch64__ > platform/linux-generic/odp_system_info.c: .cpu_arch_str = "arm", > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = > cpuinfo_arm > > >> >> >> Taras Kondratiuk > > > > > -- > Mike Holmes > Linaro Sr Technical Manager > LNG - ODP
That's good to know and certainly the code is cleaner that way. I think we should adopt the practice that any architecture-specific changes to linux-generic code (adds, updates, or removals) must either originate from or have a Reviewed-by a representative of that architecture (in this case, Bala). That way there's no confusion as to whether the known experts agree with the change, since these sort of things can have subtle implications that are not obvious to the layman. On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl <ola.liljedahl@linaro.org> wrote: > > I had some discussions with Bala and we think the GCC generates the > appropriate code for OCTEON so there should be no need for any > OCTEON-specific inline assembler in odp_atomic.h. I just didn't want > to remove this piece of code when I redesigned the atomic support > because I didn't understand why there is a SYNCW instruction *before* > the load. Load-acquire type of loads may need sync or barrier > instructions but they come after the load, not before. > > I think we should trust GCC to generate better code for OCTEON than us > mere humans, at least until proven wrong. So please remove this > OCTEON-specific snippet. > > -- Ola > > > On 23 December 2014 at 18:01, Mike Holmes <mike.holmes@linaro.org> wrote: > > > > > > On 23 December 2014 at 11:15, Taras Kondratiuk < > taras.kondratiuk@linaro.org> > > wrote: > >> > >> On 12/23/2014 05:28 PM, Mike Holmes wrote: > >> > how will this work, Octeon does not care about > __atomic_fetch_add(&v, > >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume > >> > > >> > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t > *atom) > >> > { > >> > #if defined __OCTEON__ > >> >>-------uint32_t ret; > >> >>-------__asm__ __volatile__ ("syncws"); > >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) : > >> >>------->------->------- "r" (atom)); > >> >>-------return ret; > >> > #else > >> >>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED); > >> > #endif > >> > } > >> > >> Linux-generic shouldn't have arch specific parts, because it is > >> *generic*. Instead it can fall back to strong __sync functions if > >> __atomic are not available. > >> > > > > Linux kernel is generic but has core specifics - is this not the same ? > > > > What about these cases ? > > > > platform/linux-generic/odp_system_info.c:static int cpuinfo_octeon(FILE > > *file, odp_system_info_t *sysinfo) > > platform/linux-generic/odp_system_info.c: .cpu_arch_str = "octeon", > > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = > > cpuinfo_octeon > > platform/linux-generic/include/api/odp_align.h:#if defined __x86_64__ || > > defined __i386__ > > platform/linux-generic/include/api/odp_sync.h:#if defined __x86_64__ || > > defined __i386__ > > platform/linux-generic/include/odp_spin_internal.h:#if defined > __x86_64__ || > > defined __i386__ > > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || > defined > > __i386__ || defined __OCTEON__ || \ > > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || > defined > > __i386__ > > platform/linux-generic/odp_system_info.c:static int cpuinfo_x86(FILE > *file, > > odp_system_info_t *sysinfo) > > platform/linux-generic/odp_system_info.c: #if defined __x86_64__ || > > defined __i386__ > > platform/linux-generic/odp_system_info.c: .cpu_arch_str = "x86", > > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = > > cpuinfo_x86 > > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || > defined > > __i386__ || defined __OCTEON__ || \ > > platform/linux-generic/odp_time.c:#if defined __x86_64__ || defined > __i386__ > > platform/linux-generic/include/api/odp_align.h:#elif defined __arm__ || > > defined __aarch64__ > > platform/linux-generic/include/api/odp_sync.h:#elif defined(__arm__) > > platform/linux-generic/include/odp_spin_internal.h:#elif defined __arm__ > > platform/linux-generic/odp_system_info.c:#elif defined __arm__ || defined > > __aarch64__ > > platform/linux-generic/odp_system_info.c:static int cpuinfo_arm(FILE > *file > > ODP_UNUSED, > > platform/linux-generic/odp_system_info.c: #elif defined __arm__ || > > defined __aarch64__ > > platform/linux-generic/odp_system_info.c: .cpu_arch_str = "arm", > > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = > > cpuinfo_arm > > > > > >> > >> > >> Taras Kondratiuk > > > > > > > > > > -- > > Mike Holmes > > Linaro Sr Technical Manager > > LNG - ODP > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/lng-odp >
On 23 December 2014 at 13:49, Bill Fischofer <bill.fischofer@linaro.org> wrote: > That's good to know and certainly the code is cleaner that way. I think > we should adopt the practice that any architecture-specific changes to > linux-generic code (adds, updates, or removals) must either originate from > or have a Reviewed-by a representative of that architecture (in this case, > Bala). That way there's no confusion as to whether the known experts agree > with the change, since these sort of things can have subtle implications > that are not obvious to the layman. > I think linux-generic should be devoid of any per core optimizations in its default state. There could be in an arch directory and it would be supported as you say Bill by someone who will care for that that variant, this also avoids filling linux-generic with a possibly large list of preprocessor flags that we can't test. In this case we don't have public access to the octeon to even know if this code works so separating or deletion feels right to me, although Cavium are adding their board in this case. > > On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl <ola.liljedahl@linaro.org> > wrote: > >> I had some discussions with Bala and we think the GCC generates the >> appropriate code for OCTEON so there should be no need for any >> OCTEON-specific inline assembler in odp_atomic.h. I just didn't want >> to remove this piece of code when I redesigned the atomic support >> because I didn't understand why there is a SYNCW instruction *before* >> the load. Load-acquire type of loads may need sync or barrier >> instructions but they come after the load, not before. >> >> I think we should trust GCC to generate better code for OCTEON than us >> mere humans, at least until proven wrong. So please remove this >> OCTEON-specific snippet. >> >> -- Ola >> >> >> On 23 December 2014 at 18:01, Mike Holmes <mike.holmes@linaro.org> wrote: >> > >> > >> > On 23 December 2014 at 11:15, Taras Kondratiuk < >> taras.kondratiuk@linaro.org> >> > wrote: >> >> >> >> On 12/23/2014 05:28 PM, Mike Holmes wrote: >> >> > how will this work, Octeon does not care about >> __atomic_fetch_add(&v, >> >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I assume >> >> > >> >> > static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t >> *atom) >> >> > { >> >> > #if defined __OCTEON__ >> >> >>-------uint32_t ret; >> >> >>-------__asm__ __volatile__ ("syncws"); >> >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (atom) >> : >> >> >>------->------->------- "r" (atom)); >> >> >>-------return ret; >> >> > #else >> >> >>-------return __atomic_fetch_add(&atom->v, 1, __ATOMIC_RELAXED); >> >> > #endif >> >> > } >> >> >> >> Linux-generic shouldn't have arch specific parts, because it is >> >> *generic*. Instead it can fall back to strong __sync functions if >> >> __atomic are not available. >> >> >> > >> > Linux kernel is generic but has core specifics - is this not the same ? >> > >> > What about these cases ? >> > >> > platform/linux-generic/odp_system_info.c:static int cpuinfo_octeon(FILE >> > *file, odp_system_info_t *sysinfo) >> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >> "octeon", >> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = >> > cpuinfo_octeon >> > platform/linux-generic/include/api/odp_align.h:#if defined __x86_64__ || >> > defined __i386__ >> > platform/linux-generic/include/api/odp_sync.h:#if defined __x86_64__ || >> > defined __i386__ >> > platform/linux-generic/include/odp_spin_internal.h:#if defined >> __x86_64__ || >> > defined __i386__ >> > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || >> defined >> > __i386__ || defined __OCTEON__ || \ >> > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || >> defined >> > __i386__ >> > platform/linux-generic/odp_system_info.c:static int cpuinfo_x86(FILE >> *file, >> > odp_system_info_t *sysinfo) >> > platform/linux-generic/odp_system_info.c: #if defined __x86_64__ >> || >> > defined __i386__ >> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = "x86", >> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = >> > cpuinfo_x86 >> > platform/linux-generic/odp_system_info.c:#if defined __x86_64__ || >> defined >> > __i386__ || defined __OCTEON__ || \ >> > platform/linux-generic/odp_time.c:#if defined __x86_64__ || defined >> __i386__ >> > platform/linux-generic/include/api/odp_align.h:#elif defined __arm__ || >> > defined __aarch64__ >> > platform/linux-generic/include/api/odp_sync.h:#elif defined(__arm__) >> > platform/linux-generic/include/odp_spin_internal.h:#elif defined __arm__ >> > platform/linux-generic/odp_system_info.c:#elif defined __arm__ || >> defined >> > __aarch64__ >> > platform/linux-generic/odp_system_info.c:static int cpuinfo_arm(FILE >> *file >> > ODP_UNUSED, >> > platform/linux-generic/odp_system_info.c: #elif defined __arm__ || >> > defined __aarch64__ >> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = "arm", >> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = >> > cpuinfo_arm >> > >> > >> >> >> >> >> >> Taras Kondratiuk >> > >> > >> > >> > >> > -- >> > Mike Holmes >> > Linaro Sr Technical Manager >> > LNG - ODP >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> http://lists.linaro.org/mailman/listinfo/lng-odp >> >
On 12/23/2014 07:15 PM, Taras Kondratiuk wrote: > Linux-generic shouldn't have arch specific parts, because it is > *generic*. Instead it can fall back to strong __sync functions if > __atomic are not available. Taras I think that even linux-generic should have it's own requirements. It might be compiler version. Dependence on libssl. Kernel version and etc. We do not say that linux-generic should work on any linux, even 20 years old. We just define needed requirements to compile it. The same is with atomics. Instead of creating 20 branches for all gcc and not gcc version we just stick with more fresh gcc and say that it's minimal requirement to run linux-generic. Maxim.
On 12/23/2014 10:40 PM, Mike Holmes wrote: > > > On 23 December 2014 at 13:49, Bill Fischofer > <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote: > > That's good to know and certainly the code is cleaner that way. I > think we should adopt the practice that any architecture-specific > changes to linux-generic code (adds, updates, or removals) must > either originate from or have a Reviewed-by a representative of > that architecture (in this case, Bala). That way there's no > confusion as to whether the known experts agree with the change, > since these sort of things can have subtle implications that are > not obvious to the layman. > > > I think linux-generic should be devoid of any per core optimizations > in its default state. > There could be in an arch directory and it would be supported as you > say Bill by someone who will care for that that variant, this also > avoids filling linux-generic with a possibly large list of > preprocessor flags that we can't test. > > In this case we don't have public access to the octeon to even know if > this code works so separating or deletion feels right to me, although > Cavium are adding their board in this case. > To have optimizations we should understand which per arch optimization we need. I see that if newer gcc supports good optimizations then we need to use that gcc. Most of people use Open Embedded and Linaro gcc which is almost latest version. So if there is no optimization in standard environment (gcc, kernel, libc) then arch implementation should go to platform specific git variant. If it's covered by standard environment then we simple updating minimal version requirement for required component. Maxim. > > On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl > <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> wrote: > > I had some discussions with Bala and we think the GCC > generates the > appropriate code for OCTEON so there should be no need for any > OCTEON-specific inline assembler in odp_atomic.h. I just > didn't want > to remove this piece of code when I redesigned the atomic support > because I didn't understand why there is a SYNCW instruction > *before* > the load. Load-acquire type of loads may need sync or barrier > instructions but they come after the load, not before. > > I think we should trust GCC to generate better code for OCTEON > than us > mere humans, at least until proven wrong. So please remove this > OCTEON-specific snippet. > > -- Ola > > > On 23 December 2014 at 18:01, Mike Holmes > <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote: > > > > > > On 23 December 2014 at 11:15, Taras Kondratiuk > <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@linaro.org>> > > wrote: > >> > >> On 12/23/2014 05:28 PM, Mike Holmes wrote: > >> > how will this work, Octeon does not care about > __atomic_fetch_add(&v, > >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I > assume > >> > > >> > static inline uint32_t > odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) > >> > { > >> > #if defined __OCTEON__ > >> >>-------uint32_t ret; > >> >>-------__asm__ __volatile__ ("syncws"); > >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), > "+m" (atom) : > >> >>------->------->------- "r" (atom)); > >> >>-------return ret; > >> > #else > >> >>-------return __atomic_fetch_add(&atom->v, 1, > __ATOMIC_RELAXED); > >> > #endif > >> > } > >> > >> Linux-generic shouldn't have arch specific parts, because it is > >> *generic*. Instead it can fall back to strong __sync > functions if > >> __atomic are not available. > >> > > > > Linux kernel is generic but has core specifics - is this not > the same ? > > > > What about these cases ? > > > > platform/linux-generic/odp_system_info.c:static int > cpuinfo_octeon(FILE > > *file, odp_system_info_t *sysinfo) > > platform/linux-generic/odp_system_info.c: .cpu_arch_str = > "octeon", > > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = > > cpuinfo_octeon > > platform/linux-generic/include/api/odp_align.h:#if defined > __x86_64__ || > > defined __i386__ > > platform/linux-generic/include/api/odp_sync.h:#if defined > __x86_64__ || > > defined __i386__ > > platform/linux-generic/include/odp_spin_internal.h:#if > defined __x86_64__ || > > defined __i386__ > > platform/linux-generic/odp_system_info.c:#if defined > __x86_64__ || defined > > __i386__ || defined __OCTEON__ || \ > > platform/linux-generic/odp_system_info.c:#if defined > __x86_64__ || defined > > __i386__ > > platform/linux-generic/odp_system_info.c:static int > cpuinfo_x86(FILE *file, > > odp_system_info_t *sysinfo) > > platform/linux-generic/odp_system_info.c: #if defined > __x86_64__ || > > defined __i386__ > > platform/linux-generic/odp_system_info.c: .cpu_arch_str = > "x86", > > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = > > cpuinfo_x86 > > platform/linux-generic/odp_system_info.c:#if defined > __x86_64__ || defined > > __i386__ || defined __OCTEON__ || \ > > platform/linux-generic/odp_time.c:#if defined __x86_64__ || > defined __i386__ > > platform/linux-generic/include/api/odp_align.h:#elif defined > __arm__ || > > defined __aarch64__ > > platform/linux-generic/include/api/odp_sync.h:#elif > defined(__arm__) > > platform/linux-generic/include/odp_spin_internal.h:#elif > defined __arm__ > > platform/linux-generic/odp_system_info.c:#elif defined > __arm__ || defined > > __aarch64__ > > platform/linux-generic/odp_system_info.c:static int > cpuinfo_arm(FILE *file > > ODP_UNUSED, > > platform/linux-generic/odp_system_info.c: #elif defined > __arm__ || > > defined __aarch64__ > > platform/linux-generic/odp_system_info.c: .cpu_arch_str = > "arm", > > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = > > cpuinfo_arm > > > > > >> > >> > >> Taras Kondratiuk > > > > > > > > > > -- > > Mike Holmes > > Linaro Sr Technical Manager > > LNG - ODP > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> > http://lists.linaro.org/mailman/listinfo/lng-odp > > > > > -- > *Mike Holmes* > Linaro Sr Technical Manager > LNG - ODP > > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/lng-odp
On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org> wrote: > On 12/23/2014 10:40 PM, Mike Holmes wrote: > >> >> >> On 23 December 2014 at 13:49, Bill Fischofer <bill.fischofer@linaro.org >> <mailto:bill.fischofer@linaro.org>> wrote: >> >> That's good to know and certainly the code is cleaner that way. I >> think we should adopt the practice that any architecture-specific >> changes to linux-generic code (adds, updates, or removals) must >> either originate from or have a Reviewed-by a representative of >> that architecture (in this case, Bala). That way there's no >> confusion as to whether the known experts agree with the change, >> since these sort of things can have subtle implications that are >> not obvious to the layman. >> >> >> I think linux-generic should be devoid of any per core optimizations in >> its default state. >> There could be in an arch directory and it would be supported as you say >> Bill by someone who will care for that that variant, this also avoids >> filling linux-generic with a possibly large list of preprocessor flags that >> we can't test. >> >> In this case we don't have public access to the octeon to even know if >> this code works so separating or deletion feels right to me, although >> Cavium are adding their board in this case. >> >> To have optimizations we should understand which per arch optimization > we need. I see that if newer gcc supports good optimizations then we need > to use that gcc. Most of people use Open Embedded and Linaro gcc which is > almost latest version. So if there is no optimization in standard > environment (gcc, kernel, libc) then arch implementation should go to > platform specific git variant. If it's covered by standard environment then > we simple updating minimal version requirement for required component. > > Lets take an example ODP API file static inline void odp_sync_stores(void) { #if defined __x86_64__ || defined __i386__ >-------__asm__ __volatile__ ("sfence\n" : : : "memory"); #elif defined(__arm__) #if __ARM_ARCH == 6 >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \ >------->------->-------: : "r" (0) : "memory"); #elif __ARM_ARCH >= 7 || defined __aarch64__ >-------__asm__ __volatile__ ("dmb st" : : : "memory"); #else >-------__asm__ __volatile__ ("" : : : "memory"); #endif #elif defined __OCTEON__ >-------__asm__ __volatile__ ("syncws\n" : : : "memory"); #else >-------__sync_synchronize(); #endif How do we accommodate this without resorting to architecture specific processor flags / assembly? I would prefer to see platform/linux-generic/include/api/odp_sync.h changed to platform/linux-generic/include/api/x86/odp_sync.h platform/linux-generic/include/api/arm/odp_sync.h platform/linux-generic/include/api/octeon/odp_sync.h This assumes that there is currently no pure c code that can be used that is acceptable on all three platforms. Or since linux-generic is not a performance platform but rather a reference whenever we need to make a trade off between simplicity and performance, we alter this to be a regular function and move to:- platform/linux-generic/include/api/odp_sync.h <- now a pure forward declaration of the API with doxygen documentation and implementations correctly broken down by arch arch/x86/odp_sync.h arch/arm/odp_sync.h arch/octeon/odp_sync.h > Maxim. > > > >> On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl >> <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> wrote: >> >> I had some discussions with Bala and we think the GCC >> generates the >> appropriate code for OCTEON so there should be no need for any >> OCTEON-specific inline assembler in odp_atomic.h. I just >> didn't want >> to remove this piece of code when I redesigned the atomic support >> because I didn't understand why there is a SYNCW instruction >> *before* >> the load. Load-acquire type of loads may need sync or barrier >> instructions but they come after the load, not before. >> >> I think we should trust GCC to generate better code for OCTEON >> than us >> mere humans, at least until proven wrong. So please remove this >> OCTEON-specific snippet. >> >> -- Ola >> >> >> On 23 December 2014 at 18:01, Mike Holmes >> <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote: >> > >> > >> > On 23 December 2014 at 11:15, Taras Kondratiuk >> <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@linaro.org >> >> >> >> > wrote: >> >> >> >> On 12/23/2014 05:28 PM, Mike Holmes wrote: >> >> > how will this work, Octeon does not care about >> __atomic_fetch_add(&v, >> >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I >> assume >> >> > >> >> > static inline uint32_t >> odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) >> >> > { >> >> > #if defined __OCTEON__ >> >> >>-------uint32_t ret; >> >> >>-------__asm__ __volatile__ ("syncws"); >> >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), >> "+m" (atom) : >> >> >>------->------->------- "r" (atom)); >> >> >>-------return ret; >> >> > #else >> >> >>-------return __atomic_fetch_add(&atom->v, 1, >> __ATOMIC_RELAXED); >> >> > #endif >> >> > } >> >> >> >> Linux-generic shouldn't have arch specific parts, because it is >> >> *generic*. Instead it can fall back to strong __sync >> functions if >> >> __atomic are not available. >> >> >> > >> > Linux kernel is generic but has core specifics - is this not >> the same ? >> > >> > What about these cases ? >> > >> > platform/linux-generic/odp_system_info.c:static int >> cpuinfo_octeon(FILE >> > *file, odp_system_info_t *sysinfo) >> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >> "octeon", >> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = >> > cpuinfo_octeon >> > platform/linux-generic/include/api/odp_align.h:#if defined >> __x86_64__ || >> > defined __i386__ >> > platform/linux-generic/include/api/odp_sync.h:#if defined >> __x86_64__ || >> > defined __i386__ >> > platform/linux-generic/include/odp_spin_internal.h:#if >> defined __x86_64__ || >> > defined __i386__ >> > platform/linux-generic/odp_system_info.c:#if defined >> __x86_64__ || defined >> > __i386__ || defined __OCTEON__ || \ >> > platform/linux-generic/odp_system_info.c:#if defined >> __x86_64__ || defined >> > __i386__ >> > platform/linux-generic/odp_system_info.c:static int >> cpuinfo_x86(FILE *file, >> > odp_system_info_t *sysinfo) >> > platform/linux-generic/odp_system_info.c: #if defined >> __x86_64__ || >> > defined __i386__ >> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >> "x86", >> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = >> > cpuinfo_x86 >> > platform/linux-generic/odp_system_info.c:#if defined >> __x86_64__ || defined >> > __i386__ || defined __OCTEON__ || \ >> > platform/linux-generic/odp_time.c:#if defined __x86_64__ || >> defined __i386__ >> > platform/linux-generic/include/api/odp_align.h:#elif defined >> __arm__ || >> > defined __aarch64__ >> > platform/linux-generic/include/api/odp_sync.h:#elif >> defined(__arm__) >> > platform/linux-generic/include/odp_spin_internal.h:#elif >> defined __arm__ >> > platform/linux-generic/odp_system_info.c:#elif defined >> __arm__ || defined >> > __aarch64__ >> > platform/linux-generic/odp_system_info.c:static int >> cpuinfo_arm(FILE *file >> > ODP_UNUSED, >> > platform/linux-generic/odp_system_info.c: #elif defined >> __arm__ || >> > defined __aarch64__ >> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >> "arm", >> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = >> > cpuinfo_arm >> > >> > >> >> >> >> >> >> Taras Kondratiuk >> > >> > >> > >> > >> > -- >> > Mike Holmes >> > Linaro Sr Technical Manager >> > LNG - ODP >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> >> http://lists.linaro.org/mailman/listinfo/lng-odp >> >> >> >> >> -- >> *Mike Holmes* >> Linaro Sr Technical Manager >> LNG - ODP >> >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> http://lists.linaro.org/mailman/listinfo/lng-odp >> > > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/lng-odp >
That makes sense. The only problem is GCC's problems with forward declarations and inlining. IMO a design issue with the compiler, but it is what it is. But we do need a general solution to this problem in 2015 since proper inlining support will be critical for production-grade implementations. On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes <mike.holmes@linaro.org> wrote: > > > On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org> > wrote: > >> On 12/23/2014 10:40 PM, Mike Holmes wrote: >> >>> >>> >>> On 23 December 2014 at 13:49, Bill Fischofer <bill.fischofer@linaro.org >>> <mailto:bill.fischofer@linaro.org>> wrote: >>> >>> That's good to know and certainly the code is cleaner that way. I >>> think we should adopt the practice that any architecture-specific >>> changes to linux-generic code (adds, updates, or removals) must >>> either originate from or have a Reviewed-by a representative of >>> that architecture (in this case, Bala). That way there's no >>> confusion as to whether the known experts agree with the change, >>> since these sort of things can have subtle implications that are >>> not obvious to the layman. >>> >>> >>> I think linux-generic should be devoid of any per core optimizations in >>> its default state. >>> There could be in an arch directory and it would be supported as you say >>> Bill by someone who will care for that that variant, this also avoids >>> filling linux-generic with a possibly large list of preprocessor flags that >>> we can't test. >>> >>> In this case we don't have public access to the octeon to even know if >>> this code works so separating or deletion feels right to me, although >>> Cavium are adding their board in this case. >>> >>> To have optimizations we should understand which per arch optimization >> we need. I see that if newer gcc supports good optimizations then we need >> to use that gcc. Most of people use Open Embedded and Linaro gcc which is >> almost latest version. So if there is no optimization in standard >> environment (gcc, kernel, libc) then arch implementation should go to >> platform specific git variant. If it's covered by standard environment then >> we simple updating minimal version requirement for required component. >> >> > Lets take an example ODP API file > > static inline void odp_sync_stores(void) > { > #if defined __x86_64__ || defined __i386__ > > >-------__asm__ __volatile__ ("sfence\n" : : : "memory"); > > #elif defined(__arm__) > #if __ARM_ARCH == 6 > >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \ > >------->------->-------: : "r" (0) : "memory"); > #elif __ARM_ARCH >= 7 || defined __aarch64__ > > >-------__asm__ __volatile__ ("dmb st" : : : "memory"); > #else > >-------__asm__ __volatile__ ("" : : : "memory"); > #endif > > #elif defined __OCTEON__ > > >-------__asm__ __volatile__ ("syncws\n" : : : "memory"); > > #else > >-------__sync_synchronize(); > #endif > > > How do we accommodate this without resorting to architecture specific > processor flags / assembly? > > I would prefer to see platform/linux-generic/include/api/odp_sync.h > changed to > > platform/linux-generic/include/api/x86/odp_sync.h > platform/linux-generic/include/api/arm/odp_sync.h > platform/linux-generic/include/api/octeon/odp_sync.h > > This assumes that there is currently no pure c code that can be used that > is acceptable on all three platforms. > > Or since linux-generic is not a performance platform but rather a > reference whenever we need to make a trade off between simplicity and > performance, we alter this to be a regular function and move to:- > > platform/linux-generic/include/api/odp_sync.h <- now a pure forward > declaration of the API with doxygen documentation > and implementations correctly broken down by arch > > arch/x86/odp_sync.h > arch/arm/odp_sync.h > arch/octeon/odp_sync.h > > > >> Maxim. >> >> >> >>> On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl >>> <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> wrote: >>> >>> I had some discussions with Bala and we think the GCC >>> generates the >>> appropriate code for OCTEON so there should be no need for any >>> OCTEON-specific inline assembler in odp_atomic.h. I just >>> didn't want >>> to remove this piece of code when I redesigned the atomic support >>> because I didn't understand why there is a SYNCW instruction >>> *before* >>> the load. Load-acquire type of loads may need sync or barrier >>> instructions but they come after the load, not before. >>> >>> I think we should trust GCC to generate better code for OCTEON >>> than us >>> mere humans, at least until proven wrong. So please remove this >>> OCTEON-specific snippet. >>> >>> -- Ola >>> >>> >>> On 23 December 2014 at 18:01, Mike Holmes >>> <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote: >>> > >>> > >>> > On 23 December 2014 at 11:15, Taras Kondratiuk >>> <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@linaro.org >>> >> >>> >>> > wrote: >>> >> >>> >> On 12/23/2014 05:28 PM, Mike Holmes wrote: >>> >> > how will this work, Octeon does not care about >>> __atomic_fetch_add(&v, >>> >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I >>> assume >>> >> > >>> >> > static inline uint32_t >>> odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) >>> >> > { >>> >> > #if defined __OCTEON__ >>> >> >>-------uint32_t ret; >>> >> >>-------__asm__ __volatile__ ("syncws"); >>> >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), >>> "+m" (atom) : >>> >> >>------->------->------- "r" (atom)); >>> >> >>-------return ret; >>> >> > #else >>> >> >>-------return __atomic_fetch_add(&atom->v, 1, >>> __ATOMIC_RELAXED); >>> >> > #endif >>> >> > } >>> >> >>> >> Linux-generic shouldn't have arch specific parts, because it >>> is >>> >> *generic*. Instead it can fall back to strong __sync >>> functions if >>> >> __atomic are not available. >>> >> >>> > >>> > Linux kernel is generic but has core specifics - is this not >>> the same ? >>> > >>> > What about these cases ? >>> > >>> > platform/linux-generic/odp_system_info.c:static int >>> cpuinfo_octeon(FILE >>> > *file, odp_system_info_t *sysinfo) >>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >>> "octeon", >>> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = >>> > cpuinfo_octeon >>> > platform/linux-generic/include/api/odp_align.h:#if defined >>> __x86_64__ || >>> > defined __i386__ >>> > platform/linux-generic/include/api/odp_sync.h:#if defined >>> __x86_64__ || >>> > defined __i386__ >>> > platform/linux-generic/include/odp_spin_internal.h:#if >>> defined __x86_64__ || >>> > defined __i386__ >>> > platform/linux-generic/odp_system_info.c:#if defined >>> __x86_64__ || defined >>> > __i386__ || defined __OCTEON__ || \ >>> > platform/linux-generic/odp_system_info.c:#if defined >>> __x86_64__ || defined >>> > __i386__ >>> > platform/linux-generic/odp_system_info.c:static int >>> cpuinfo_x86(FILE *file, >>> > odp_system_info_t *sysinfo) >>> > platform/linux-generic/odp_system_info.c: #if defined >>> __x86_64__ || >>> > defined __i386__ >>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >>> "x86", >>> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = >>> > cpuinfo_x86 >>> > platform/linux-generic/odp_system_info.c:#if defined >>> __x86_64__ || defined >>> > __i386__ || defined __OCTEON__ || \ >>> > platform/linux-generic/odp_time.c:#if defined __x86_64__ || >>> defined __i386__ >>> > platform/linux-generic/include/api/odp_align.h:#elif defined >>> __arm__ || >>> > defined __aarch64__ >>> > platform/linux-generic/include/api/odp_sync.h:#elif >>> defined(__arm__) >>> > platform/linux-generic/include/odp_spin_internal.h:#elif >>> defined __arm__ >>> > platform/linux-generic/odp_system_info.c:#elif defined >>> __arm__ || defined >>> > __aarch64__ >>> > platform/linux-generic/odp_system_info.c:static int >>> cpuinfo_arm(FILE *file >>> > ODP_UNUSED, >>> > platform/linux-generic/odp_system_info.c: #elif defined >>> __arm__ || >>> > defined __aarch64__ >>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >>> "arm", >>> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser = >>> > cpuinfo_arm >>> > >>> > >>> >> >>> >> >>> >> Taras Kondratiuk >>> > >>> > >>> > >>> > >>> > -- >>> > Mike Holmes >>> > Linaro Sr Technical Manager >>> > LNG - ODP >>> >>> _______________________________________________ >>> lng-odp mailing list >>> lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> >>> http://lists.linaro.org/mailman/listinfo/lng-odp >>> >>> >>> >>> >>> -- >>> *Mike Holmes* >>> Linaro Sr Technical Manager >>> LNG - ODP >>> >>> >>> _______________________________________________ >>> lng-odp mailing list >>> lng-odp@lists.linaro.org >>> http://lists.linaro.org/mailman/listinfo/lng-odp >>> >> >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> http://lists.linaro.org/mailman/listinfo/lng-odp >> > > > > -- > *Mike Holmes* > Linaro Sr Technical Manager > LNG - ODP > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/lng-odp > >
I suspect the long-term answer will be to use the autotools to construct the actual include hierarchy (and possibly the files themelves) needed for a given --with-platform specification. That would mean the canonical ODP API files may need to be meta-files that are used as input to ./configure to generate the actual ones used. On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer <bill.fischofer@linaro.org> wrote: > That makes sense. The only problem is GCC's problems with forward > declarations and inlining. IMO a design issue with the compiler, but it is > what it is. But we do need a general solution to this problem in 2015 > since proper inlining support will be critical for production-grade > implementations. > > On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes <mike.holmes@linaro.org> > wrote: > >> >> >> On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org> >> wrote: >> >>> On 12/23/2014 10:40 PM, Mike Holmes wrote: >>> >>>> >>>> >>>> On 23 December 2014 at 13:49, Bill Fischofer <bill.fischofer@linaro.org >>>> <mailto:bill.fischofer@linaro.org>> wrote: >>>> >>>> That's good to know and certainly the code is cleaner that way. I >>>> think we should adopt the practice that any architecture-specific >>>> changes to linux-generic code (adds, updates, or removals) must >>>> either originate from or have a Reviewed-by a representative of >>>> that architecture (in this case, Bala). That way there's no >>>> confusion as to whether the known experts agree with the change, >>>> since these sort of things can have subtle implications that are >>>> not obvious to the layman. >>>> >>>> >>>> I think linux-generic should be devoid of any per core optimizations in >>>> its default state. >>>> There could be in an arch directory and it would be supported as you >>>> say Bill by someone who will care for that that variant, this also avoids >>>> filling linux-generic with a possibly large list of preprocessor flags that >>>> we can't test. >>>> >>>> In this case we don't have public access to the octeon to even know if >>>> this code works so separating or deletion feels right to me, although >>>> Cavium are adding their board in this case. >>>> >>>> To have optimizations we should understand which per arch optimization >>> we need. I see that if newer gcc supports good optimizations then we need >>> to use that gcc. Most of people use Open Embedded and Linaro gcc which is >>> almost latest version. So if there is no optimization in standard >>> environment (gcc, kernel, libc) then arch implementation should go to >>> platform specific git variant. If it's covered by standard environment then >>> we simple updating minimal version requirement for required component. >>> >>> >> Lets take an example ODP API file >> >> static inline void odp_sync_stores(void) >> { >> #if defined __x86_64__ || defined __i386__ >> >> >-------__asm__ __volatile__ ("sfence\n" : : : "memory"); >> >> #elif defined(__arm__) >> #if __ARM_ARCH == 6 >> >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \ >> >------->------->-------: : "r" (0) : "memory"); >> #elif __ARM_ARCH >= 7 || defined __aarch64__ >> >> >-------__asm__ __volatile__ ("dmb st" : : : "memory"); >> #else >> >-------__asm__ __volatile__ ("" : : : "memory"); >> #endif >> >> #elif defined __OCTEON__ >> >> >-------__asm__ __volatile__ ("syncws\n" : : : "memory"); >> >> #else >> >-------__sync_synchronize(); >> #endif >> >> >> How do we accommodate this without resorting to architecture specific >> processor flags / assembly? >> >> I would prefer to see platform/linux-generic/include/api/odp_sync.h >> changed to >> >> platform/linux-generic/include/api/x86/odp_sync.h >> platform/linux-generic/include/api/arm/odp_sync.h >> platform/linux-generic/include/api/octeon/odp_sync.h >> >> This assumes that there is currently no pure c code that can be used that >> is acceptable on all three platforms. >> >> Or since linux-generic is not a performance platform but rather a >> reference whenever we need to make a trade off between simplicity and >> performance, we alter this to be a regular function and move to:- >> >> platform/linux-generic/include/api/odp_sync.h <- now a pure forward >> declaration of the API with doxygen documentation >> and implementations correctly broken down by arch >> >> arch/x86/odp_sync.h >> arch/arm/odp_sync.h >> arch/octeon/odp_sync.h >> >> >> >>> Maxim. >>> >>> >>> >>>> On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl >>>> <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> wrote: >>>> >>>> I had some discussions with Bala and we think the GCC >>>> generates the >>>> appropriate code for OCTEON so there should be no need for any >>>> OCTEON-specific inline assembler in odp_atomic.h. I just >>>> didn't want >>>> to remove this piece of code when I redesigned the atomic >>>> support >>>> because I didn't understand why there is a SYNCW instruction >>>> *before* >>>> the load. Load-acquire type of loads may need sync or barrier >>>> instructions but they come after the load, not before. >>>> >>>> I think we should trust GCC to generate better code for OCTEON >>>> than us >>>> mere humans, at least until proven wrong. So please remove this >>>> OCTEON-specific snippet. >>>> >>>> -- Ola >>>> >>>> >>>> On 23 December 2014 at 18:01, Mike Holmes >>>> <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote: >>>> > >>>> > >>>> > On 23 December 2014 at 11:15, Taras Kondratiuk >>>> <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@ >>>> linaro.org>> >>>> >>>> > wrote: >>>> >> >>>> >> On 12/23/2014 05:28 PM, Mike Holmes wrote: >>>> >> > how will this work, Octeon does not care about >>>> __atomic_fetch_add(&v, >>>> >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I >>>> assume >>>> >> > >>>> >> > static inline uint32_t >>>> odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) >>>> >> > { >>>> >> > #if defined __OCTEON__ >>>> >> >>-------uint32_t ret; >>>> >> >>-------__asm__ __volatile__ ("syncws"); >>>> >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), >>>> "+m" (atom) : >>>> >> >>------->------->------- "r" (atom)); >>>> >> >>-------return ret; >>>> >> > #else >>>> >> >>-------return __atomic_fetch_add(&atom->v, 1, >>>> __ATOMIC_RELAXED); >>>> >> > #endif >>>> >> > } >>>> >> >>>> >> Linux-generic shouldn't have arch specific parts, because it >>>> is >>>> >> *generic*. Instead it can fall back to strong __sync >>>> functions if >>>> >> __atomic are not available. >>>> >> >>>> > >>>> > Linux kernel is generic but has core specifics - is this not >>>> the same ? >>>> > >>>> > What about these cases ? >>>> > >>>> > platform/linux-generic/odp_system_info.c:static int >>>> cpuinfo_octeon(FILE >>>> > *file, odp_system_info_t *sysinfo) >>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >>>> "octeon", >>>> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser >>>> = >>>> > cpuinfo_octeon >>>> > platform/linux-generic/include/api/odp_align.h:#if defined >>>> __x86_64__ || >>>> > defined __i386__ >>>> > platform/linux-generic/include/api/odp_sync.h:#if defined >>>> __x86_64__ || >>>> > defined __i386__ >>>> > platform/linux-generic/include/odp_spin_internal.h:#if >>>> defined __x86_64__ || >>>> > defined __i386__ >>>> > platform/linux-generic/odp_system_info.c:#if defined >>>> __x86_64__ || defined >>>> > __i386__ || defined __OCTEON__ || \ >>>> > platform/linux-generic/odp_system_info.c:#if defined >>>> __x86_64__ || defined >>>> > __i386__ >>>> > platform/linux-generic/odp_system_info.c:static int >>>> cpuinfo_x86(FILE *file, >>>> > odp_system_info_t *sysinfo) >>>> > platform/linux-generic/odp_system_info.c: #if defined >>>> __x86_64__ || >>>> > defined __i386__ >>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >>>> "x86", >>>> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser >>>> = >>>> > cpuinfo_x86 >>>> > platform/linux-generic/odp_system_info.c:#if defined >>>> __x86_64__ || defined >>>> > __i386__ || defined __OCTEON__ || \ >>>> > platform/linux-generic/odp_time.c:#if defined __x86_64__ || >>>> defined __i386__ >>>> > platform/linux-generic/include/api/odp_align.h:#elif defined >>>> __arm__ || >>>> > defined __aarch64__ >>>> > platform/linux-generic/include/api/odp_sync.h:#elif >>>> defined(__arm__) >>>> > platform/linux-generic/include/odp_spin_internal.h:#elif >>>> defined __arm__ >>>> > platform/linux-generic/odp_system_info.c:#elif defined >>>> __arm__ || defined >>>> > __aarch64__ >>>> > platform/linux-generic/odp_system_info.c:static int >>>> cpuinfo_arm(FILE *file >>>> > ODP_UNUSED, >>>> > platform/linux-generic/odp_system_info.c: #elif defined >>>> __arm__ || >>>> > defined __aarch64__ >>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >>>> "arm", >>>> > platform/linux-generic/odp_system_info.c: .cpuinfo_parser >>>> = >>>> > cpuinfo_arm >>>> > >>>> > >>>> >> >>>> >> >>>> >> Taras Kondratiuk >>>> > >>>> > >>>> > >>>> > >>>> > -- >>>> > Mike Holmes >>>> > Linaro Sr Technical Manager >>>> > LNG - ODP >>>> >>>> _______________________________________________ >>>> lng-odp mailing list >>>> lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> >>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>> >>>> >>>> >>>> >>>> -- >>>> *Mike Holmes* >>>> Linaro Sr Technical Manager >>>> LNG - ODP >>>> >>>> >>>> _______________________________________________ >>>> lng-odp mailing list >>>> lng-odp@lists.linaro.org >>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>> >>> >>> >>> _______________________________________________ >>> lng-odp mailing list >>> lng-odp@lists.linaro.org >>> http://lists.linaro.org/mailman/listinfo/lng-odp >>> >> >> >> >> -- >> *Mike Holmes* >> Linaro Sr Technical Manager >> LNG - ODP >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> http://lists.linaro.org/mailman/listinfo/lng-odp >> >> >
On 24 December 2014 at 10:05, Bill Fischofer <bill.fischofer@linaro.org> wrote: > I suspect the long-term answer will be to use the autotools to construct > the actual include hierarchy (and possibly the files themelves) needed for > a given --with-platform specification. > Let me check I have the same terminology in my head - we should have this defined in the arch doc but I did not look. platform = a physical board with interfaces etc and a core of some architecture as a complete solution arch = a cpu core for example octeon, arm, x86 In this case I then see a need to pick up arch independently from --with-platform so that the optimizations can be reused by all platforms with that architecture core. If we dont I would see a need for platform/linux-x86-generic platform/linux-arm-generic platform/linux-octeon-generic > That would mean the canonical ODP API files may need to be meta-files that > are used as input to ./configure to generate the actual ones used. > If the work Taras and Anders are doing to clean out the API and move it back to the root succeeds, and it looks good to me, then it opens the door to some more wriggle room in solving this issue I think. But I could see the configure step helping take a single API definition and attaching it to the required optimizations as a solution if it cannot be done within C / GCC > > On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer <bill.fischofer@linaro.org > > wrote: > >> That makes sense. The only problem is GCC's problems with forward >> declarations and inlining. IMO a design issue with the compiler, but it is >> what it is. But we do need a general solution to this problem in 2015 >> since proper inlining support will be critical for production-grade >> implementations. >> >> On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes <mike.holmes@linaro.org> >> wrote: >> >>> >>> >>> On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org> >>> wrote: >>> >>>> On 12/23/2014 10:40 PM, Mike Holmes wrote: >>>> >>>>> >>>>> >>>>> On 23 December 2014 at 13:49, Bill Fischofer < >>>>> bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote: >>>>> >>>>> That's good to know and certainly the code is cleaner that way. I >>>>> think we should adopt the practice that any architecture-specific >>>>> changes to linux-generic code (adds, updates, or removals) must >>>>> either originate from or have a Reviewed-by a representative of >>>>> that architecture (in this case, Bala). That way there's no >>>>> confusion as to whether the known experts agree with the change, >>>>> since these sort of things can have subtle implications that are >>>>> not obvious to the layman. >>>>> >>>>> >>>>> I think linux-generic should be devoid of any per core optimizations >>>>> in its default state. >>>>> There could be in an arch directory and it would be supported as you >>>>> say Bill by someone who will care for that that variant, this also avoids >>>>> filling linux-generic with a possibly large list of preprocessor flags that >>>>> we can't test. >>>>> >>>>> In this case we don't have public access to the octeon to even know if >>>>> this code works so separating or deletion feels right to me, although >>>>> Cavium are adding their board in this case. >>>>> >>>>> To have optimizations we should understand which per arch >>>> optimization we need. I see that if newer gcc supports good optimizations >>>> then we need to use that gcc. Most of people use Open Embedded and Linaro >>>> gcc which is almost latest version. So if there is no optimization in >>>> standard environment (gcc, kernel, libc) then arch implementation should go >>>> to platform specific git variant. If it's covered by standard environment >>>> then we simple updating minimal version requirement for required component. >>>> >>>> >>> Lets take an example ODP API file >>> >>> static inline void odp_sync_stores(void) >>> { >>> #if defined __x86_64__ || defined __i386__ >>> >>> >-------__asm__ __volatile__ ("sfence\n" : : : "memory"); >>> >>> #elif defined(__arm__) >>> #if __ARM_ARCH == 6 >>> >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \ >>> >------->------->-------: : "r" (0) : "memory"); >>> #elif __ARM_ARCH >= 7 || defined __aarch64__ >>> >>> >-------__asm__ __volatile__ ("dmb st" : : : "memory"); >>> #else >>> >-------__asm__ __volatile__ ("" : : : "memory"); >>> #endif >>> >>> #elif defined __OCTEON__ >>> >>> >-------__asm__ __volatile__ ("syncws\n" : : : "memory"); >>> >>> #else >>> >-------__sync_synchronize(); >>> #endif >>> >>> >>> How do we accommodate this without resorting to architecture specific >>> processor flags / assembly? >>> >>> I would prefer to see platform/linux-generic/include/api/odp_sync.h >>> changed to >>> >>> platform/linux-generic/include/api/x86/odp_sync.h >>> platform/linux-generic/include/api/arm/odp_sync.h >>> platform/linux-generic/include/api/octeon/odp_sync.h >>> >>> This assumes that there is currently no pure c code that can be used >>> that is acceptable on all three platforms. >>> >>> Or since linux-generic is not a performance platform but rather a >>> reference whenever we need to make a trade off between simplicity and >>> performance, we alter this to be a regular function and move to:- >>> >>> platform/linux-generic/include/api/odp_sync.h <- now a pure forward >>> declaration of the API with doxygen documentation >>> and implementations correctly broken down by arch >>> >>> arch/x86/odp_sync.h >>> arch/arm/odp_sync.h >>> arch/octeon/odp_sync.h >>> >>> >>> >>>> Maxim. >>>> >>>> >>>> >>>>> On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl >>>>> <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> >>>>> wrote: >>>>> >>>>> I had some discussions with Bala and we think the GCC >>>>> generates the >>>>> appropriate code for OCTEON so there should be no need for any >>>>> OCTEON-specific inline assembler in odp_atomic.h. I just >>>>> didn't want >>>>> to remove this piece of code when I redesigned the atomic >>>>> support >>>>> because I didn't understand why there is a SYNCW instruction >>>>> *before* >>>>> the load. Load-acquire type of loads may need sync or barrier >>>>> instructions but they come after the load, not before. >>>>> >>>>> I think we should trust GCC to generate better code for OCTEON >>>>> than us >>>>> mere humans, at least until proven wrong. So please remove this >>>>> OCTEON-specific snippet. >>>>> >>>>> -- Ola >>>>> >>>>> >>>>> On 23 December 2014 at 18:01, Mike Holmes >>>>> <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> >>>>> wrote: >>>>> > >>>>> > >>>>> > On 23 December 2014 at 11:15, Taras Kondratiuk >>>>> <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@ >>>>> linaro.org>> >>>>> >>>>> > wrote: >>>>> >> >>>>> >> On 12/23/2014 05:28 PM, Mike Holmes wrote: >>>>> >> > how will this work, Octeon does not care about >>>>> __atomic_fetch_add(&v, >>>>> >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I >>>>> assume >>>>> >> > >>>>> >> > static inline uint32_t >>>>> odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) >>>>> >> > { >>>>> >> > #if defined __OCTEON__ >>>>> >> >>-------uint32_t ret; >>>>> >> >>-------__asm__ __volatile__ ("syncws"); >>>>> >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), >>>>> "+m" (atom) : >>>>> >> >>------->------->------- "r" (atom)); >>>>> >> >>-------return ret; >>>>> >> > #else >>>>> >> >>-------return __atomic_fetch_add(&atom->v, 1, >>>>> __ATOMIC_RELAXED); >>>>> >> > #endif >>>>> >> > } >>>>> >> >>>>> >> Linux-generic shouldn't have arch specific parts, because >>>>> it is >>>>> >> *generic*. Instead it can fall back to strong __sync >>>>> functions if >>>>> >> __atomic are not available. >>>>> >> >>>>> > >>>>> > Linux kernel is generic but has core specifics - is this not >>>>> the same ? >>>>> > >>>>> > What about these cases ? >>>>> > >>>>> > platform/linux-generic/odp_system_info.c:static int >>>>> cpuinfo_octeon(FILE >>>>> > *file, odp_system_info_t *sysinfo) >>>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >>>>> "octeon", >>>>> > platform/linux-generic/odp_system_info.c: >>>>> .cpuinfo_parser = >>>>> > cpuinfo_octeon >>>>> > platform/linux-generic/include/api/odp_align.h:#if defined >>>>> __x86_64__ || >>>>> > defined __i386__ >>>>> > platform/linux-generic/include/api/odp_sync.h:#if defined >>>>> __x86_64__ || >>>>> > defined __i386__ >>>>> > platform/linux-generic/include/odp_spin_internal.h:#if >>>>> defined __x86_64__ || >>>>> > defined __i386__ >>>>> > platform/linux-generic/odp_system_info.c:#if defined >>>>> __x86_64__ || defined >>>>> > __i386__ || defined __OCTEON__ || \ >>>>> > platform/linux-generic/odp_system_info.c:#if defined >>>>> __x86_64__ || defined >>>>> > __i386__ >>>>> > platform/linux-generic/odp_system_info.c:static int >>>>> cpuinfo_x86(FILE *file, >>>>> > odp_system_info_t *sysinfo) >>>>> > platform/linux-generic/odp_system_info.c: #if defined >>>>> __x86_64__ || >>>>> > defined __i386__ >>>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >>>>> "x86", >>>>> > platform/linux-generic/odp_system_info.c: >>>>> .cpuinfo_parser = >>>>> > cpuinfo_x86 >>>>> > platform/linux-generic/odp_system_info.c:#if defined >>>>> __x86_64__ || defined >>>>> > __i386__ || defined __OCTEON__ || \ >>>>> > platform/linux-generic/odp_time.c:#if defined __x86_64__ || >>>>> defined __i386__ >>>>> > platform/linux-generic/include/api/odp_align.h:#elif defined >>>>> __arm__ || >>>>> > defined __aarch64__ >>>>> > platform/linux-generic/include/api/odp_sync.h:#elif >>>>> defined(__arm__) >>>>> > platform/linux-generic/include/odp_spin_internal.h:#elif >>>>> defined __arm__ >>>>> > platform/linux-generic/odp_system_info.c:#elif defined >>>>> __arm__ || defined >>>>> > __aarch64__ >>>>> > platform/linux-generic/odp_system_info.c:static int >>>>> cpuinfo_arm(FILE *file >>>>> > ODP_UNUSED, >>>>> > platform/linux-generic/odp_system_info.c: #elif defined >>>>> __arm__ || >>>>> > defined __aarch64__ >>>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str = >>>>> "arm", >>>>> > platform/linux-generic/odp_system_info.c: >>>>> .cpuinfo_parser = >>>>> > cpuinfo_arm >>>>> > >>>>> > >>>>> >> >>>>> >> >>>>> >> Taras Kondratiuk >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > -- >>>>> > Mike Holmes >>>>> > Linaro Sr Technical Manager >>>>> > LNG - ODP >>>>> >>>>> _______________________________________________ >>>>> lng-odp mailing list >>>>> lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> >>>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> *Mike Holmes* >>>>> Linaro Sr Technical Manager >>>>> LNG - ODP >>>>> >>>>> >>>>> _______________________________________________ >>>>> lng-odp mailing list >>>>> lng-odp@lists.linaro.org >>>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>>> >>>> >>>> >>>> _______________________________________________ >>>> lng-odp mailing list >>>> lng-odp@lists.linaro.org >>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>> >>> >>> >>> >>> -- >>> *Mike Holmes* >>> Linaro Sr Technical Manager >>> LNG - ODP >>> >>> _______________________________________________ >>> lng-odp mailing list >>> lng-odp@lists.linaro.org >>> http://lists.linaro.org/mailman/listinfo/lng-odp >>> >>> >> >
Getting the ODP API files out of platform/include/api is the first step. But based on earlier attempts to deal with inlining Taras kept running into issues because GCC insists that functions to be inlined must have inline in their forward declarations. There's no simple way to do that without a lot of macro ugliness, but I suspect autotools could help with that. Each ODP API specified in a common header may need to have a platform-specific inline specification attached to it, depending on what the implementation needs. If this were an all-or-nothing thing we could do that with a simple macro that gets set in odp_platform_types.h (or a similar file). Call it ODP_API and it's either set to null or to 'static inline'. Then the common ODP API files would say: ODP_API <rest of function prototype> for example, instead of today saying void *odp_buffer_addr(odp_buffer_t buf); the canonical form would be written as ODP_API void *odp_buffer_addr(odp_buffer_t buf); and we'd get the needed static inline (or not) depending on how the implementation is written. But it's not that simple because each API is independently inlineable, so a single macro won't do. We could have a separate macro for each API, but that gets unwieldy very quickly. Instead, the API file could contain some common symbols that get macro-expanded based on ./configure options so that what gets generated into platform/include/api is what's needed for that platform/arch/etc. This might also be a way to easily capture platform-specific doxygen since if the API files are being custom generated then they can have specific documentation attached to them as well. That might be a cleaner way of addressing Petri's earlier concern about odp_config.h containing linux-generic stuff mixed in with general ODP documentation. On Wed, Dec 24, 2014 at 9:28 AM, Mike Holmes <mike.holmes@linaro.org> wrote: > > > On 24 December 2014 at 10:05, Bill Fischofer <bill.fischofer@linaro.org> > wrote: > >> I suspect the long-term answer will be to use the autotools to construct >> the actual include hierarchy (and possibly the files themelves) needed for >> a given --with-platform specification. >> > > Let me check I have the same terminology in my head - we should have this > defined in the arch doc but I did not look. > platform = a physical board with interfaces etc and a core of some > architecture as a complete solution > arch = a cpu core for example octeon, arm, x86 > > In this case I then see a need to pick up arch independently > from --with-platform so that the optimizations can be reused by all > platforms with that architecture core. If we dont I would see a need for > platform/linux-x86-generic > platform/linux-arm-generic > platform/linux-octeon-generic > > >> That would mean the canonical ODP API files may need to be meta-files >> that are used as input to ./configure to generate the actual ones used. >> > > If the work Taras and Anders are doing to clean out the API and move it > back to the root succeeds, and it looks good to me, then it opens the door > to some more wriggle room in solving this issue I think. > But I could see the configure step helping take a single API definition > and attaching it to the required optimizations as a solution if it cannot > be done within C / GCC > > > > >> >> On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer < >> bill.fischofer@linaro.org> wrote: >> >>> That makes sense. The only problem is GCC's problems with forward >>> declarations and inlining. IMO a design issue with the compiler, but it is >>> what it is. But we do need a general solution to this problem in 2015 >>> since proper inlining support will be critical for production-grade >>> implementations. >>> >>> On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes <mike.holmes@linaro.org> >>> wrote: >>> >>>> >>>> >>>> On 24 December 2014 at 06:19, Maxim Uvarov <maxim.uvarov@linaro.org> >>>> wrote: >>>> >>>>> On 12/23/2014 10:40 PM, Mike Holmes wrote: >>>>> >>>>>> >>>>>> >>>>>> On 23 December 2014 at 13:49, Bill Fischofer < >>>>>> bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote: >>>>>> >>>>>> That's good to know and certainly the code is cleaner that way. I >>>>>> think we should adopt the practice that any architecture-specific >>>>>> changes to linux-generic code (adds, updates, or removals) must >>>>>> either originate from or have a Reviewed-by a representative of >>>>>> that architecture (in this case, Bala). That way there's no >>>>>> confusion as to whether the known experts agree with the change, >>>>>> since these sort of things can have subtle implications that are >>>>>> not obvious to the layman. >>>>>> >>>>>> >>>>>> I think linux-generic should be devoid of any per core optimizations >>>>>> in its default state. >>>>>> There could be in an arch directory and it would be supported as you >>>>>> say Bill by someone who will care for that that variant, this also avoids >>>>>> filling linux-generic with a possibly large list of preprocessor flags that >>>>>> we can't test. >>>>>> >>>>>> In this case we don't have public access to the octeon to even know >>>>>> if this code works so separating or deletion feels right to me, although >>>>>> Cavium are adding their board in this case. >>>>>> >>>>>> To have optimizations we should understand which per arch >>>>> optimization we need. I see that if newer gcc supports good optimizations >>>>> then we need to use that gcc. Most of people use Open Embedded and Linaro >>>>> gcc which is almost latest version. So if there is no optimization in >>>>> standard environment (gcc, kernel, libc) then arch implementation should go >>>>> to platform specific git variant. If it's covered by standard environment >>>>> then we simple updating minimal version requirement for required component. >>>>> >>>>> >>>> Lets take an example ODP API file >>>> >>>> static inline void odp_sync_stores(void) >>>> { >>>> #if defined __x86_64__ || defined __i386__ >>>> >>>> >-------__asm__ __volatile__ ("sfence\n" : : : "memory"); >>>> >>>> #elif defined(__arm__) >>>> #if __ARM_ARCH == 6 >>>> >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \ >>>> >------->------->-------: : "r" (0) : "memory"); >>>> #elif __ARM_ARCH >= 7 || defined __aarch64__ >>>> >>>> >-------__asm__ __volatile__ ("dmb st" : : : "memory"); >>>> #else >>>> >-------__asm__ __volatile__ ("" : : : "memory"); >>>> #endif >>>> >>>> #elif defined __OCTEON__ >>>> >>>> >-------__asm__ __volatile__ ("syncws\n" : : : "memory"); >>>> >>>> #else >>>> >-------__sync_synchronize(); >>>> #endif >>>> >>>> >>>> How do we accommodate this without resorting to architecture specific >>>> processor flags / assembly? >>>> >>>> I would prefer to see platform/linux-generic/include/api/odp_sync.h >>>> changed to >>>> >>>> platform/linux-generic/include/api/x86/odp_sync.h >>>> platform/linux-generic/include/api/arm/odp_sync.h >>>> platform/linux-generic/include/api/octeon/odp_sync.h >>>> >>>> This assumes that there is currently no pure c code that can be used >>>> that is acceptable on all three platforms. >>>> >>>> Or since linux-generic is not a performance platform but rather a >>>> reference whenever we need to make a trade off between simplicity and >>>> performance, we alter this to be a regular function and move to:- >>>> >>>> platform/linux-generic/include/api/odp_sync.h <- now a pure forward >>>> declaration of the API with doxygen documentation >>>> and implementations correctly broken down by arch >>>> >>>> arch/x86/odp_sync.h >>>> arch/arm/odp_sync.h >>>> arch/octeon/odp_sync.h >>>> >>>> >>>> >>>>> Maxim. >>>>> >>>>> >>>>> >>>>>> On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl >>>>>> <ola.liljedahl@linaro.org <mailto:ola.liljedahl@linaro.org>> >>>>>> wrote: >>>>>> >>>>>> I had some discussions with Bala and we think the GCC >>>>>> generates the >>>>>> appropriate code for OCTEON so there should be no need for any >>>>>> OCTEON-specific inline assembler in odp_atomic.h. I just >>>>>> didn't want >>>>>> to remove this piece of code when I redesigned the atomic >>>>>> support >>>>>> because I didn't understand why there is a SYNCW instruction >>>>>> *before* >>>>>> the load. Load-acquire type of loads may need sync or barrier >>>>>> instructions but they come after the load, not before. >>>>>> >>>>>> I think we should trust GCC to generate better code for OCTEON >>>>>> than us >>>>>> mere humans, at least until proven wrong. So please remove >>>>>> this >>>>>> OCTEON-specific snippet. >>>>>> >>>>>> -- Ola >>>>>> >>>>>> >>>>>> On 23 December 2014 at 18:01, Mike Holmes >>>>>> <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> >>>>>> wrote: >>>>>> > >>>>>> > >>>>>> > On 23 December 2014 at 11:15, Taras Kondratiuk >>>>>> <taras.kondratiuk@linaro.org <mailto:taras.kondratiuk@ >>>>>> linaro.org>> >>>>>> >>>>>> > wrote: >>>>>> >> >>>>>> >> On 12/23/2014 05:28 PM, Mike Holmes wrote: >>>>>> >> > how will this work, Octeon does not care about >>>>>> __atomic_fetch_add(&v, >>>>>> >> > 1, __ATOMIC_RELAXED); and yet you will fail configure I >>>>>> assume >>>>>> >> > >>>>>> >> > static inline uint32_t >>>>>> odp_atomic_fetch_inc_u32(odp_atomic_u32_t *atom) >>>>>> >> > { >>>>>> >> > #if defined __OCTEON__ >>>>>> >> >>-------uint32_t ret; >>>>>> >> >>-------__asm__ __volatile__ ("syncws"); >>>>>> >> >>-------__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), >>>>>> "+m" (atom) : >>>>>> >> >>------->------->------- "r" (atom)); >>>>>> >> >>-------return ret; >>>>>> >> > #else >>>>>> >> >>-------return __atomic_fetch_add(&atom->v, 1, >>>>>> __ATOMIC_RELAXED); >>>>>> >> > #endif >>>>>> >> > } >>>>>> >> >>>>>> >> Linux-generic shouldn't have arch specific parts, because >>>>>> it is >>>>>> >> *generic*. Instead it can fall back to strong __sync >>>>>> functions if >>>>>> >> __atomic are not available. >>>>>> >> >>>>>> > >>>>>> > Linux kernel is generic but has core specifics - is this not >>>>>> the same ? >>>>>> > >>>>>> > What about these cases ? >>>>>> > >>>>>> > platform/linux-generic/odp_system_info.c:static int >>>>>> cpuinfo_octeon(FILE >>>>>> > *file, odp_system_info_t *sysinfo) >>>>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str >>>>>> = >>>>>> "octeon", >>>>>> > platform/linux-generic/odp_system_info.c: >>>>>> .cpuinfo_parser = >>>>>> > cpuinfo_octeon >>>>>> > platform/linux-generic/include/api/odp_align.h:#if defined >>>>>> __x86_64__ || >>>>>> > defined __i386__ >>>>>> > platform/linux-generic/include/api/odp_sync.h:#if defined >>>>>> __x86_64__ || >>>>>> > defined __i386__ >>>>>> > platform/linux-generic/include/odp_spin_internal.h:#if >>>>>> defined __x86_64__ || >>>>>> > defined __i386__ >>>>>> > platform/linux-generic/odp_system_info.c:#if defined >>>>>> __x86_64__ || defined >>>>>> > __i386__ || defined __OCTEON__ || \ >>>>>> > platform/linux-generic/odp_system_info.c:#if defined >>>>>> __x86_64__ || defined >>>>>> > __i386__ >>>>>> > platform/linux-generic/odp_system_info.c:static int >>>>>> cpuinfo_x86(FILE *file, >>>>>> > odp_system_info_t *sysinfo) >>>>>> > platform/linux-generic/odp_system_info.c: #if defined >>>>>> __x86_64__ || >>>>>> > defined __i386__ >>>>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str >>>>>> = >>>>>> "x86", >>>>>> > platform/linux-generic/odp_system_info.c: >>>>>> .cpuinfo_parser = >>>>>> > cpuinfo_x86 >>>>>> > platform/linux-generic/odp_system_info.c:#if defined >>>>>> __x86_64__ || defined >>>>>> > __i386__ || defined __OCTEON__ || \ >>>>>> > platform/linux-generic/odp_time.c:#if defined __x86_64__ || >>>>>> defined __i386__ >>>>>> > platform/linux-generic/include/api/odp_align.h:#elif >>>>>> defined >>>>>> __arm__ || >>>>>> > defined __aarch64__ >>>>>> > platform/linux-generic/include/api/odp_sync.h:#elif >>>>>> defined(__arm__) >>>>>> > platform/linux-generic/include/odp_spin_internal.h:#elif >>>>>> defined __arm__ >>>>>> > platform/linux-generic/odp_system_info.c:#elif defined >>>>>> __arm__ || defined >>>>>> > __aarch64__ >>>>>> > platform/linux-generic/odp_system_info.c:static int >>>>>> cpuinfo_arm(FILE *file >>>>>> > ODP_UNUSED, >>>>>> > platform/linux-generic/odp_system_info.c: #elif defined >>>>>> __arm__ || >>>>>> > defined __aarch64__ >>>>>> > platform/linux-generic/odp_system_info.c: .cpu_arch_str >>>>>> = >>>>>> "arm", >>>>>> > platform/linux-generic/odp_system_info.c: >>>>>> .cpuinfo_parser = >>>>>> > cpuinfo_arm >>>>>> > >>>>>> > >>>>>> >> >>>>>> >> >>>>>> >> Taras Kondratiuk >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > -- >>>>>> > Mike Holmes >>>>>> > Linaro Sr Technical Manager >>>>>> > LNG - ODP >>>>>> >>>>>> _______________________________________________ >>>>>> lng-odp mailing list >>>>>> lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> >>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> *Mike Holmes* >>>>>> Linaro Sr Technical Manager >>>>>> LNG - ODP >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> lng-odp mailing list >>>>>> lng-odp@lists.linaro.org >>>>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> lng-odp mailing list >>>>> lng-odp@lists.linaro.org >>>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>>> >>>> >>>> >>>> >>>> -- >>>> *Mike Holmes* >>>> Linaro Sr Technical Manager >>>> LNG - ODP >>>> >>>> _______________________________________________ >>>> lng-odp mailing list >>>> lng-odp@lists.linaro.org >>>> http://lists.linaro.org/mailman/listinfo/lng-odp >>>> >>>> >>> >> > > > -- > *Mike Holmes* > Linaro Sr Technical Manager > LNG - ODP >
There is no need to make it so complex. At least until 'static' before 'not static' functions is forbidden by some post-C11 standard). Me and Anders are making another try to separate clean API. In a previous try common (clean) API headers were public. They had <plat/...> include hooks to pull platform specific definitions. That appeared to be not flexible enough. Current approach is to make common API headers private in a sense that application won't include them directly, but instead implementation will provide public headers and embedded clean API headers there. Please check [1] for an example of this approach tried for a few headers (atomic, byteorder). It seems to be working fine for now. I'm reworking Keystone implementation based on this approach. [1] https://git.linaro.org/people/taras.kondratiuk/odp.git/shortlog/refs/heads/api/change_namespace On 12/24/2014 05:56 PM, Bill Fischofer wrote: > Getting the ODP API files out of platform/include/api is the first > step. But based on earlier attempts to deal with inlining Taras kept > running into issues because GCC insists that functions to be inlined > must have inline in their forward declarations. There's no simple way > to do that without a lot of macro ugliness, but I suspect autotools > could help with that. > > Each ODP API specified in a common header may need to have a > platform-specific inline specification attached to it, depending on what > the implementation needs. If this were an all-or-nothing thing we could > do that with a simple macro that gets set in odp_platform_types.h (or a > similar file). Call it ODP_API and it's either set to null or to > 'static inline'. > > Then the common ODP API files would say: > > ODP_API <rest of function prototype> > > for example, instead of today saying > > void *odp_buffer_addr(odp_buffer_t buf); > > the canonical form would be written as > > ODP_API void *odp_buffer_addr(odp_buffer_t buf); > > and we'd get the needed static inline (or not) depending on how the > implementation is written. But it's not that simple because each API is > independently inlineable, so a single macro won't do. We could have a > separate macro for each API, but that gets unwieldy very quickly. > > Instead, the API file could contain some common symbols that get > macro-expanded based on ./configure options so that what gets generated > into platform/include/api is what's needed for that platform/arch/etc. > > This might also be a way to easily capture platform-specific doxygen > since if the API files are being custom generated then they can have > specific documentation attached to them as well. That might be a > cleaner way of addressing Petri's earlier concern about odp_config.h > containing linux-generic stuff mixed in with general ODP documentation. > > On Wed, Dec 24, 2014 at 9:28 AM, Mike Holmes <mike.holmes@linaro.org > <mailto:mike.holmes@linaro.org>> wrote: > > > > On 24 December 2014 at 10:05, Bill Fischofer > <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote: > > I suspect the long-term answer will be to use the autotools to > construct the actual include hierarchy (and possibly the files > themelves) needed for a given --with-platform specification. > > > Let me check I have the same terminology in my head - we should have > this defined in the arch doc but I did not look. > platform = a physical board with interfaces etc and a core of some > architecture as a complete solution > arch = a cpu core for example octeon, arm, x86 > > In this case I then see a need to pick up arch independently > from --with-platform so that the optimizations can be reused by all > platforms with that architecture core. If we dont I would see a need > for > platform/linux-x86-generic > platform/linux-arm-generic > platform/linux-octeon-generic > > > That would mean the canonical ODP API files may need to be > meta-files that are used as input to ./configure to generate the > actual ones used. > > > If the work Taras and Anders are doing to clean out the API and move > it back to the root succeeds, and it looks good to me, then it opens > the door to some more wriggle room in solving this issue I think. > But I could see the configure step helping take a single API > definition and attaching it to the required optimizations as a > solution if it cannot be done within C / GCC > > > > > > On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer > <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> > wrote: > > That makes sense. The only problem is GCC's problems with > forward declarations and inlining. IMO a design issue with > the compiler, but it is what it is. But we do need a > general solution to this problem in 2015 since proper > inlining support will be critical for production-grade > implementations. > > On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes > <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote: > > > > On 24 December 2014 at 06:19, Maxim Uvarov > <maxim.uvarov@linaro.org > <mailto:maxim.uvarov@linaro.org>> wrote: > > On 12/23/2014 10:40 PM, Mike Holmes wrote: > > > > On 23 December 2014 at 13:49, Bill Fischofer > <bill.fischofer@linaro.org > <mailto:bill.fischofer@linaro.org> > <mailto:bill.fischofer@linaro.__org > <mailto:bill.fischofer@linaro.org>>> wrote: > > That's good to know and certainly the code > is cleaner that way. I > think we should adopt the practice that any > architecture-specific > changes to linux-generic code (adds, > updates, or removals) must > either originate from or have a Reviewed-by > a representative of > that architecture (in this case, Bala). > That way there's no > confusion as to whether the known experts > agree with the change, > since these sort of things can have subtle > implications that are > not obvious to the layman. > > > I think linux-generic should be devoid of any > per core optimizations in its default state. > There could be in an arch directory and it would > be supported as you say Bill by someone who will > care for that that variant, this also avoids > filling linux-generic with a possibly large list > of preprocessor flags that we can't test. > > In this case we don't have public access to the > octeon to even know if this code works so > separating or deletion feels right to me, > although Cavium are adding their board in this case. > > To have optimizations we should understand which per > arch optimization we need. I see that if newer gcc > supports good optimizations then we need to use that > gcc. Most of people use Open Embedded and Linaro gcc > which is almost latest version. So if there is no > optimization in standard environment (gcc, kernel, > libc) then arch implementation should go to platform > specific git variant. If it's covered by standard > environment then we simple updating minimal version > requirement for required component. > > > Lets take an example ODP API file > > static inline void odp_sync_stores(void) > { > #if defined __x86_64__ || defined __i386__ > > >-------__asm__ __volatile__ ("sfence\n" : : : "memory"); > > #elif defined(__arm__) > #if __ARM_ARCH == 6 > >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \ > >------->------->-------: : "r" (0) : "memory"); > #elif __ARM_ARCH >= 7 || defined __aarch64__ > > >-------__asm__ __volatile__ ("dmb st" : : : "memory"); > #else > >-------__asm__ __volatile__ ("" : : : "memory"); > #endif > > #elif defined __OCTEON__ > > >-------__asm__ __volatile__ ("syncws\n" : : : "memory"); > > #else > >-------__sync_synchronize(); > #endif > > > How do we accommodate this without resorting to > architecture specific processor flags / assembly? > > I would prefer to > see platform/linux-generic/include/api/odp_sync.h changed to > > platform/linux-generic/include/api/x86/odp_sync.h > platform/linux-generic/include/api/arm/odp_sync.h > platform/linux-generic/include/api/octeon/odp_sync.h > > This assumes that there is currently no pure c code that > can be used that is acceptable on all three platforms. > > Or since linux-generic is not a performance platform but > rather a reference whenever we need to make a trade off > between simplicity and performance, we alter this to be > a regular function and move to:- > > platform/linux-generic/include/api/odp_sync.h <- now a > pure forward declaration of the API with doxygen > documentation > and implementations correctly broken down by arch > > arch/x86/odp_sync.h > arch/arm/odp_sync.h > arch/octeon/odp_sync.h > > > > Maxim. > > > > On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl > <ola.liljedahl@linaro.org > <mailto:ola.liljedahl@linaro.org> > <mailto:ola.liljedahl@linaro.__org > <mailto:ola.liljedahl@linaro.org>>> wrote: > > I had some discussions with Bala and we > think the GCC > generates the > appropriate code for OCTEON so there > should be no need for any > OCTEON-specific inline assembler in > odp_atomic.h. I just > didn't want > to remove this piece of code when I > redesigned the atomic support > because I didn't understand why there is > a SYNCW instruction > *before* > the load. Load-acquire type of loads may > need sync or barrier > instructions but they come after the > load, not before. > > I think we should trust GCC to generate > better code for OCTEON > than us > mere humans, at least until proven > wrong. So please remove this > OCTEON-specific snippet. > > -- Ola > > > On 23 December 2014 at 18:01, Mike Holmes > <mike.holmes@linaro.org > <mailto:mike.holmes@linaro.org> > <mailto:mike.holmes@linaro.org > <mailto:mike.holmes@linaro.org>__>> wrote: > > > > > > On 23 December 2014 at 11:15, Taras > Kondratiuk > <taras.kondratiuk@linaro.org > <mailto:taras.kondratiuk@linaro.org> > <mailto:taras.kondratiuk@__linaro.org > <mailto:taras.kondratiuk@linaro.org>>> > > > wrote: > >> > >> On 12/23/2014 05:28 PM, Mike Holmes > wrote: > >> > how will this work, Octeon does not > care about > __atomic_fetch_add(&v, > >> > 1, __ATOMIC_RELAXED); and yet you > will fail configure I > assume > >> > > >> > static inline uint32_t > > odp_atomic_fetch_inc_u32(odp___atomic_u32_t *atom) > >> > { > >> > #if defined __OCTEON__ > >> >>-------uint32_t ret; > >> >>-------__asm__ __volatile__ ("syncws"); > >> >>-------__asm__ __volatile__ ("lai > %0,(%2)" : "=r" (ret), > "+m" (atom) : > >> >>------->------->------- "r" (atom)); > >> >>-------return ret; > >> > #else > >> >>-------return > __atomic_fetch_add(&atom->v, 1, > __ATOMIC_RELAXED); > >> > #endif > >> > } > >> > >> Linux-generic shouldn't have arch > specific parts, because it is > >> *generic*. Instead it can fall back > to strong __sync > functions if > >> __atomic are not available. > >> > > > > Linux kernel is generic but has core > specifics - is this not > the same ? > > > > What about these cases ? > > > > > platform/linux-generic/odp___system_info.c:static int > cpuinfo_octeon(FILE > > *file, odp_system_info_t *sysinfo) > > > platform/linux-generic/odp___system_info.c: > .cpu_arch_str = > "octeon", > > > platform/linux-generic/odp___system_info.c: > .cpuinfo_parser = > > cpuinfo_octeon > > > platform/linux-generic/__include/api/odp_align.h:#if > defined > __x86_64__ || > > defined __i386__ > > > platform/linux-generic/__include/api/odp_sync.h:#if > defined > __x86_64__ || > > defined __i386__ > > > platform/linux-generic/__include/odp_spin_internal.h:#__if > defined __x86_64__ || > > defined __i386__ > > > platform/linux-generic/odp___system_info.c:#if > defined > __x86_64__ || defined > > __i386__ || defined __OCTEON__ || \ > > > platform/linux-generic/odp___system_info.c:#if > defined > __x86_64__ || defined > > __i386__ > > > platform/linux-generic/odp___system_info.c:static int > cpuinfo_x86(FILE *file, > > odp_system_info_t *sysinfo) > > > platform/linux-generic/odp___system_info.c: > #if defined > __x86_64__ || > > defined __i386__ > > > platform/linux-generic/odp___system_info.c: > .cpu_arch_str = > "x86", > > > platform/linux-generic/odp___system_info.c: > .cpuinfo_parser = > > cpuinfo_x86 > > > platform/linux-generic/odp___system_info.c:#if > defined > __x86_64__ || defined > > __i386__ || defined __OCTEON__ || \ > > > platform/linux-generic/odp___time.c:#if defined > __x86_64__ || > defined __i386__ > > > platform/linux-generic/__include/api/odp_align.h:#elif > defined > __arm__ || > > defined __aarch64__ > > > platform/linux-generic/__include/api/odp_sync.h:#elif > defined(__arm__) > > > platform/linux-generic/__include/odp_spin_internal.h:#__elif > defined __arm__ > > > platform/linux-generic/odp___system_info.c:#elif > defined > __arm__ || defined > > __aarch64__ > > > platform/linux-generic/odp___system_info.c:static int > cpuinfo_arm(FILE *file > > ODP_UNUSED, > > > platform/linux-generic/odp___system_info.c: > #elif defined > __arm__ || > > defined __aarch64__ > > > platform/linux-generic/odp___system_info.c: > .cpu_arch_str = > "arm", > > > platform/linux-generic/odp___system_info.c: > .cpuinfo_parser = > > cpuinfo_arm > > > > > >> > >> > >> Taras Kondratiuk > > > > > > > > > > -- > > Mike Holmes > > Linaro Sr Technical Manager > > LNG - ODP > > > _________________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > <mailto:lng-odp@lists.linaro.org> > <mailto:lng-odp@lists.linaro.__org > <mailto:lng-odp@lists.linaro.org>> > > http://lists.linaro.org/__mailman/listinfo/lng-odp > <http://lists.linaro.org/mailman/listinfo/lng-odp> > > > > > -- > *Mike Holmes* > Linaro Sr Technical Manager > LNG - ODP > > > _________________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > <mailto:lng-odp@lists.linaro.org> > http://lists.linaro.org/__mailman/listinfo/lng-odp > <http://lists.linaro.org/mailman/listinfo/lng-odp> > > > > _________________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > <mailto:lng-odp@lists.linaro.org> > http://lists.linaro.org/__mailman/listinfo/lng-odp > <http://lists.linaro.org/mailman/listinfo/lng-odp> > > > > > -- > *Mike Holmes* > Linaro Sr Technical Manager > LNG - ODP > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> > http://lists.linaro.org/mailman/listinfo/lng-odp > > > > > > > -- > *Mike Holmes* > Linaro Sr Technical Manager > LNG - ODP > > > > > _______________________________________________ > lng-odp mailing list > lng-odp@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/lng-odp >
On 12/24/2014 12:21 PM, Maxim Uvarov wrote: > On 12/23/2014 07:15 PM, Taras Kondratiuk wrote: >> Linux-generic shouldn't have arch specific parts, because it is >> *generic*. Instead it can fall back to strong __sync functions if >> __atomic are not available. > > Taras I think that even linux-generic should have it's own requirements. > It might be compiler version. > Dependence on libssl. Kernel version and etc. > > We do not say that linux-generic should work on any linux, even 20 years > old. We just define needed requirements > to compile it. The same is with atomics. Instead of creating 20 branches > for all gcc and not gcc version we just > stick with more fresh gcc and say that it's minimal requirement to run > linux-generic. I don't think it worth to make such a tight restriction to use only gcc 4.7+. Having fallback to __sync_* function will relax it to something like gcc 4.1+. Sure performance will be worse, but it should be ok for linux-generic.
On 12/25/2014 12:34 PM, Taras Kondratiuk wrote: > On 12/24/2014 12:21 PM, Maxim Uvarov wrote: >> On 12/23/2014 07:15 PM, Taras Kondratiuk wrote: >>> Linux-generic shouldn't have arch specific parts, because it is >>> *generic*. Instead it can fall back to strong __sync functions if >>> __atomic are not available. >> Taras I think that even linux-generic should have it's own requirements. >> It might be compiler version. >> Dependence on libssl. Kernel version and etc. >> >> We do not say that linux-generic should work on any linux, even 20 years >> old. We just define needed requirements >> to compile it. The same is with atomics. Instead of creating 20 branches >> for all gcc and not gcc version we just >> stick with more fresh gcc and say that it's minimal requirement to run >> linux-generic. > I don't think it worth to make such a tight restriction to use only gcc > 4.7+. Having fallback to __sync_* function will relax it to something > like gcc 4.1+. Sure performance will be worse, but it should be ok for > linux-generic. > Why it is ok? If possible we need to make linux-generic version fast. Support of old gcc is not required. If it's needed for somebody he can do this in his own platform. Maxim.
On 12/25/2014 12:15 PM, Taras Kondratiuk wrote: > There is no need to make it so complex. At least until 'static' > before 'not static' functions is forbidden by some post-C11 standard). > > Me and Anders are making another try to separate clean API. In a > previous try common (clean) API headers were public. They had > <plat/...> include hooks to pull platform specific definitions. That > appeared to be not flexible enough. Current approach is to make common > API headers private in a sense that application won't include them > directly, but instead implementation will provide public headers and > embedded clean API headers there. > > Please check [1] for an example of this approach tried for a few > headers (atomic, byteorder). It seems to be working fine for now. I'm > reworking Keystone implementation based on this approach. > > [1] https://git.linaro.org/people/taras.kondratiuk/odp.git/shortlog/refs/heads/api/change_namespace I see from your patch you are doing: rename from platform/linux-generic/include/api/odp/buffer.h rename to include/api/odp/buffer.h "api" looks like is not needed and it can be simple include/buffer.h btw, what should I do with thread patch? Should I send updated version? Maxim. > On 12/24/2014 05:56 PM, Bill Fischofer wrote: >> Getting the ODP API files out of platform/include/api is the first >> step. But based on earlier attempts to deal with inlining Taras kept >> running into issues because GCC insists that functions to be inlined >> must have inline in their forward declarations. There's no simple way >> to do that without a lot of macro ugliness, but I suspect autotools >> could help with that. >> >> Each ODP API specified in a common header may need to have a >> platform-specific inline specification attached to it, depending on what >> the implementation needs. If this were an all-or-nothing thing we could >> do that with a simple macro that gets set in odp_platform_types.h (or a >> similar file). Call it ODP_API and it's either set to null or to >> 'static inline'. >> >> Then the common ODP API files would say: >> >> ODP_API <rest of function prototype> >> >> for example, instead of today saying >> >> void *odp_buffer_addr(odp_buffer_t buf); >> >> the canonical form would be written as >> >> ODP_API void *odp_buffer_addr(odp_buffer_t buf); >> >> and we'd get the needed static inline (or not) depending on how the >> implementation is written. But it's not that simple because each API is >> independently inlineable, so a single macro won't do. We could have a >> separate macro for each API, but that gets unwieldy very quickly. >> >> Instead, the API file could contain some common symbols that get >> macro-expanded based on ./configure options so that what gets generated >> into platform/include/api is what's needed for that platform/arch/etc. >> >> This might also be a way to easily capture platform-specific doxygen >> since if the API files are being custom generated then they can have >> specific documentation attached to them as well. That might be a >> cleaner way of addressing Petri's earlier concern about odp_config.h >> containing linux-generic stuff mixed in with general ODP documentation. >> >> On Wed, Dec 24, 2014 at 9:28 AM, Mike Holmes <mike.holmes@linaro.org >> <mailto:mike.holmes@linaro.org>> wrote: >> >> >> >> On 24 December 2014 at 10:05, Bill Fischofer >> <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> wrote: >> >> I suspect the long-term answer will be to use the autotools to >> construct the actual include hierarchy (and possibly the files >> themelves) needed for a given --with-platform specification. >> >> >> Let me check I have the same terminology in my head - we should have >> this defined in the arch doc but I did not look. >> platform = a physical board with interfaces etc and a core of some >> architecture as a complete solution >> arch = a cpu core for example octeon, arm, x86 >> >> In this case I then see a need to pick up arch independently >> from --with-platform so that the optimizations can be reused by all >> platforms with that architecture core. If we dont I would see a need >> for >> platform/linux-x86-generic >> platform/linux-arm-generic >> platform/linux-octeon-generic >> >> >> That would mean the canonical ODP API files may need to be >> meta-files that are used as input to ./configure to generate the >> actual ones used. >> >> >> If the work Taras and Anders are doing to clean out the API and move >> it back to the root succeeds, and it looks good to me, then it opens >> the door to some more wriggle room in solving this issue I think. >> But I could see the configure step helping take a single API >> definition and attaching it to the required optimizations as a >> solution if it cannot be done within C / GCC >> >> >> >> >> >> On Wed, Dec 24, 2014 at 9:03 AM, Bill Fischofer >> <bill.fischofer@linaro.org <mailto:bill.fischofer@linaro.org>> >> wrote: >> >> That makes sense. The only problem is GCC's problems with >> forward declarations and inlining. IMO a design issue with >> the compiler, but it is what it is. But we do need a >> general solution to this problem in 2015 since proper >> inlining support will be critical for production-grade >> implementations. >> >> On Wed, Dec 24, 2014 at 8:59 AM, Mike Holmes >> <mike.holmes@linaro.org <mailto:mike.holmes@linaro.org>> wrote: >> >> >> >> On 24 December 2014 at 06:19, Maxim Uvarov >> <maxim.uvarov@linaro.org >> <mailto:maxim.uvarov@linaro.org>> wrote: >> >> On 12/23/2014 10:40 PM, Mike Holmes wrote: >> >> >> >> On 23 December 2014 at 13:49, Bill Fischofer >> <bill.fischofer@linaro.org >> <mailto:bill.fischofer@linaro.org> >> <mailto:bill.fischofer@linaro.__org >> <mailto:bill.fischofer@linaro.org>>> wrote: >> >> That's good to know and certainly the code >> is cleaner that way. I >> think we should adopt the practice that any >> architecture-specific >> changes to linux-generic code (adds, >> updates, or removals) must >> either originate from or have a Reviewed-by >> a representative of >> that architecture (in this case, Bala). >> That way there's no >> confusion as to whether the known experts >> agree with the change, >> since these sort of things can have subtle >> implications that are >> not obvious to the layman. >> >> >> I think linux-generic should be devoid of any >> per core optimizations in its default state. >> There could be in an arch directory and it would >> be supported as you say Bill by someone who will >> care for that that variant, this also avoids >> filling linux-generic with a possibly large list >> of preprocessor flags that we can't test. >> >> In this case we don't have public access to the >> octeon to even know if this code works so >> separating or deletion feels right to me, >> although Cavium are adding their board in this case. >> >> To have optimizations we should understand which per >> arch optimization we need. I see that if newer gcc >> supports good optimizations then we need to use that >> gcc. Most of people use Open Embedded and Linaro gcc >> which is almost latest version. So if there is no >> optimization in standard environment (gcc, kernel, >> libc) then arch implementation should go to platform >> specific git variant. If it's covered by standard >> environment then we simple updating minimal version >> requirement for required component. >> >> >> Lets take an example ODP API file >> >> static inline void odp_sync_stores(void) >> { >> #if defined __x86_64__ || defined __i386__ >> >> >-------__asm__ __volatile__ ("sfence\n" : : : "memory"); >> >> #elif defined(__arm__) >> #if __ARM_ARCH == 6 >> >-------__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \ >> >------->------->-------: : "r" (0) : "memory"); >> #elif __ARM_ARCH >= 7 || defined __aarch64__ >> >> >-------__asm__ __volatile__ ("dmb st" : : : "memory"); >> #else >> >-------__asm__ __volatile__ ("" : : : "memory"); >> #endif >> >> #elif defined __OCTEON__ >> >> >-------__asm__ __volatile__ ("syncws\n" : : : "memory"); >> >> #else >> >-------__sync_synchronize(); >> #endif >> >> >> How do we accommodate this without resorting to >> architecture specific processor flags / assembly? >> >> I would prefer to >> see platform/linux-generic/include/api/odp_sync.h changed to >> >> platform/linux-generic/include/api/x86/odp_sync.h >> platform/linux-generic/include/api/arm/odp_sync.h >> platform/linux-generic/include/api/octeon/odp_sync.h >> >> This assumes that there is currently no pure c code that >> can be used that is acceptable on all three platforms. >> >> Or since linux-generic is not a performance platform but >> rather a reference whenever we need to make a trade off >> between simplicity and performance, we alter this to be >> a regular function and move to:- >> >> platform/linux-generic/include/api/odp_sync.h <- now a >> pure forward declaration of the API with doxygen >> documentation >> and implementations correctly broken down by arch >> >> arch/x86/odp_sync.h >> arch/arm/odp_sync.h >> arch/octeon/odp_sync.h >> >> >> >> Maxim. >> >> >> >> On Tue, Dec 23, 2014 at 12:38 PM, Ola Liljedahl >> <ola.liljedahl@linaro.org >> <mailto:ola.liljedahl@linaro.org> >> <mailto:ola.liljedahl@linaro.__org >> <mailto:ola.liljedahl@linaro.org>>> wrote: >> >> I had some discussions with Bala and we >> think the GCC >> generates the >> appropriate code for OCTEON so there >> should be no need for any >> OCTEON-specific inline assembler in >> odp_atomic.h. I just >> didn't want >> to remove this piece of code when I >> redesigned the atomic support >> because I didn't understand why there is >> a SYNCW instruction >> *before* >> the load. Load-acquire type of loads may >> need sync or barrier >> instructions but they come after the >> load, not before. >> >> I think we should trust GCC to generate >> better code for OCTEON >> than us >> mere humans, at least until proven >> wrong. So please remove this >> OCTEON-specific snippet. >> >> -- Ola >> >> >> On 23 December 2014 at 18:01, Mike Holmes >> <mike.holmes@linaro.org >> <mailto:mike.holmes@linaro.org> >> <mailto:mike.holmes@linaro.org >> <mailto:mike.holmes@linaro.org>__>> wrote: >> > >> > >> > On 23 December 2014 at 11:15, Taras >> Kondratiuk >> <taras.kondratiuk@linaro.org >> <mailto:taras.kondratiuk@linaro.org> >> <mailto:taras.kondratiuk@__linaro.org >> <mailto:taras.kondratiuk@linaro.org>>> >> >> > wrote: >> >> >> >> On 12/23/2014 05:28 PM, Mike Holmes >> wrote: >> >> > how will this work, Octeon does not >> care about >> __atomic_fetch_add(&v, >> >> > 1, __ATOMIC_RELAXED); and yet you >> will fail configure I >> assume >> >> > >> >> > static inline uint32_t >> >> odp_atomic_fetch_inc_u32(odp___atomic_u32_t *atom) >> >> > { >> >> > #if defined __OCTEON__ >> >> >>-------uint32_t ret; >> >> >>-------__asm__ __volatile__ ("syncws"); >> >> >>-------__asm__ __volatile__ ("lai >> %0,(%2)" : "=r" (ret), >> "+m" (atom) : >> >> >>------->------->------- "r" (atom)); >> >> >>-------return ret; >> >> > #else >> >> >>-------return >> __atomic_fetch_add(&atom->v, 1, >> __ATOMIC_RELAXED); >> >> > #endif >> >> > } >> >> >> >> Linux-generic shouldn't have arch >> specific parts, because it is >> >> *generic*. Instead it can fall back >> to strong __sync >> functions if >> >> __atomic are not available. >> >> >> > >> > Linux kernel is generic but has core >> specifics - is this not >> the same ? >> > >> > What about these cases ? >> > >> > >> platform/linux-generic/odp___system_info.c:static int >> cpuinfo_octeon(FILE >> > *file, odp_system_info_t *sysinfo) >> > >> platform/linux-generic/odp___system_info.c: >> .cpu_arch_str = >> "octeon", >> > >> platform/linux-generic/odp___system_info.c: >> .cpuinfo_parser = >> > cpuinfo_octeon >> > >> platform/linux-generic/__include/api/odp_align.h:#if >> defined >> __x86_64__ || >> > defined __i386__ >> > >> platform/linux-generic/__include/api/odp_sync.h:#if >> defined >> __x86_64__ || >> > defined __i386__ >> > >> platform/linux-generic/__include/odp_spin_internal.h:#__if >> defined __x86_64__ || >> > defined __i386__ >> > >> platform/linux-generic/odp___system_info.c:#if >> defined >> __x86_64__ || defined >> > __i386__ || defined __OCTEON__ || \ >> > >> platform/linux-generic/odp___system_info.c:#if >> defined >> __x86_64__ || defined >> > __i386__ >> > >> platform/linux-generic/odp___system_info.c:static int >> cpuinfo_x86(FILE *file, >> > odp_system_info_t *sysinfo) >> > >> platform/linux-generic/odp___system_info.c: >> #if defined >> __x86_64__ || >> > defined __i386__ >> > >> platform/linux-generic/odp___system_info.c: >> .cpu_arch_str = >> "x86", >> > >> platform/linux-generic/odp___system_info.c: >> .cpuinfo_parser = >> > cpuinfo_x86 >> > >> platform/linux-generic/odp___system_info.c:#if >> defined >> __x86_64__ || defined >> > __i386__ || defined __OCTEON__ || \ >> > >> platform/linux-generic/odp___time.c:#if defined >> __x86_64__ || >> defined __i386__ >> > >> platform/linux-generic/__include/api/odp_align.h:#elif >> defined >> __arm__ || >> > defined __aarch64__ >> > >> platform/linux-generic/__include/api/odp_sync.h:#elif >> defined(__arm__) >> > >> platform/linux-generic/__include/odp_spin_internal.h:#__elif >> defined __arm__ >> > >> platform/linux-generic/odp___system_info.c:#elif >> defined >> __arm__ || defined >> > __aarch64__ >> > >> platform/linux-generic/odp___system_info.c:static int >> cpuinfo_arm(FILE *file >> > ODP_UNUSED, >> > >> platform/linux-generic/odp___system_info.c: >> #elif defined >> __arm__ || >> > defined __aarch64__ >> > >> platform/linux-generic/odp___system_info.c: >> .cpu_arch_str = >> "arm", >> > >> platform/linux-generic/odp___system_info.c: >> .cpuinfo_parser = >> > cpuinfo_arm >> > >> > >> >> >> >> >> >> Taras Kondratiuk >> > >> > >> > >> > >> > -- >> > Mike Holmes >> > Linaro Sr Technical Manager >> > LNG - ODP >> >> >> _________________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> <mailto:lng-odp@lists.linaro.org> >> <mailto:lng-odp@lists.linaro.__org >> <mailto:lng-odp@lists.linaro.org>> >> >> http://lists.linaro.org/__mailman/listinfo/lng-odp >> <http://lists.linaro.org/mailman/listinfo/lng-odp> >> >> >> >> >> -- >> *Mike Holmes* >> Linaro Sr Technical Manager >> LNG - ODP >> >> >> _________________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> <mailto:lng-odp@lists.linaro.org> >> http://lists.linaro.org/__mailman/listinfo/lng-odp >> <http://lists.linaro.org/mailman/listinfo/lng-odp> >> >> >> >> _________________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> <mailto:lng-odp@lists.linaro.org> >> http://lists.linaro.org/__mailman/listinfo/lng-odp >> <http://lists.linaro.org/mailman/listinfo/lng-odp> >> >> >> >> >> -- >> *Mike Holmes* >> Linaro Sr Technical Manager >> LNG - ODP >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org <mailto:lng-odp@lists.linaro.org> >> http://lists.linaro.org/mailman/listinfo/lng-odp >> >> >> >> >> >> >> -- >> *Mike Holmes* >> Linaro Sr Technical Manager >> LNG - ODP >> >> >> >> >> _______________________________________________ >> lng-odp mailing list >> lng-odp@lists.linaro.org >> http://lists.linaro.org/mailman/listinfo/lng-odp >> >
On 12/25/2014 11:53 AM, Maxim Uvarov wrote: > On 12/25/2014 12:34 PM, Taras Kondratiuk wrote: >> On 12/24/2014 12:21 PM, Maxim Uvarov wrote: >>> On 12/23/2014 07:15 PM, Taras Kondratiuk wrote: >>>> Linux-generic shouldn't have arch specific parts, because it is >>>> *generic*. Instead it can fall back to strong __sync functions if >>>> __atomic are not available. >>> Taras I think that even linux-generic should have it's own requirements. >>> It might be compiler version. >>> Dependence on libssl. Kernel version and etc. >>> >>> We do not say that linux-generic should work on any linux, even 20 years >>> old. We just define needed requirements >>> to compile it. The same is with atomics. Instead of creating 20 branches >>> for all gcc and not gcc version we just >>> stick with more fresh gcc and say that it's minimal requirement to run >>> linux-generic. >> I don't think it worth to make such a tight restriction to use only gcc >> 4.7+. Having fallback to __sync_* function will relax it to something >> like gcc 4.1+. Sure performance will be worse, but it should be ok for >> linux-generic. >> > Why it is ok? If possible we need to make linux-generic version fast. > Support of old gcc is not required. If it's needed for somebody he can > do this in his own platform. Because performance is not the main target for linux-generic. I'm a bit puzzled with your arguments so want to clarify your point. I see three options here: 1. __atomic_* only, so we don't support older compilers. 2. __atomic_* with fallback to __sync_*. 3. __atomic_* and inline assembler for some architectures. Currently we have #3 in the code. As far as I understand you are for #1. Right? I'd support #2.
On 12/25/2014 12:23 PM, Maxim Uvarov wrote: > On 12/25/2014 12:15 PM, Taras Kondratiuk wrote: >> There is no need to make it so complex. At least until 'static' >> before 'not static' functions is forbidden by some post-C11 standard). >> >> Me and Anders are making another try to separate clean API. In a >> previous try common (clean) API headers were public. They had >> <plat/...> include hooks to pull platform specific definitions. That >> appeared to be not flexible enough. Current approach is to make common >> API headers private in a sense that application won't include them >> directly, but instead implementation will provide public headers and >> embedded clean API headers there. >> >> Please check [1] for an example of this approach tried for a few >> headers (atomic, byteorder). It seems to be working fine for now. I'm >> reworking Keystone implementation based on this approach. >> >> [1] https://git.linaro.org/people/taras.kondratiuk/odp.git/shortlog/refs/heads/api/change_namespace > > I see from your patch you are doing: > > rename from platform/linux-generic/include/api/odp/buffer.h > rename to include/api/odp/buffer.h > > "api" looks like is not needed and it can be simple include/buffer.h The exact path can be changed. It is just a demo of the approach. > > btw, what should I do with thread patch? Should I send updated version? I think atomic code should be fixed first not to use inline assembler, before you can add checker in configure.
On 12/25/2014 01:55 PM, Taras Kondratiuk wrote: > On 12/25/2014 11:53 AM, Maxim Uvarov wrote: >> On 12/25/2014 12:34 PM, Taras Kondratiuk wrote: >>> On 12/24/2014 12:21 PM, Maxim Uvarov wrote: >>>> On 12/23/2014 07:15 PM, Taras Kondratiuk wrote: >>>>> Linux-generic shouldn't have arch specific parts, because it is >>>>> *generic*. Instead it can fall back to strong __sync functions if >>>>> __atomic are not available. >>>> Taras I think that even linux-generic should have it's own requirements. >>>> It might be compiler version. >>>> Dependence on libssl. Kernel version and etc. >>>> >>>> We do not say that linux-generic should work on any linux, even 20 years >>>> old. We just define needed requirements >>>> to compile it. The same is with atomics. Instead of creating 20 branches >>>> for all gcc and not gcc version we just >>>> stick with more fresh gcc and say that it's minimal requirement to run >>>> linux-generic. >>> I don't think it worth to make such a tight restriction to use only gcc >>> 4.7+. Having fallback to __sync_* function will relax it to something >>> like gcc 4.1+. Sure performance will be worse, but it should be ok for >>> linux-generic. >>> >> Why it is ok? If possible we need to make linux-generic version fast. >> Support of old gcc is not required. If it's needed for somebody he can >> do this in his own platform. > Because performance is not the main target for linux-generic. > > I'm a bit puzzled with your arguments so want to clarify your point. > I see three options here: > 1. __atomic_* only, so we don't support older compilers. > 2. __atomic_* with fallback to __sync_*. > 3. __atomic_* and inline assembler for some architectures. > > Currently we have #3 in the code. As far as I understand you are for #1. > Right? I'd support #2. > Yes, currently we have #3. And what I understood from all discussions is to go with #1. At lease fall-back was never discussed. Ola's patch bellow replaced __sync with __atomic. If we need #2 fall-back then we need restore original __sync code. But before doing this we need to figure out who actually requires this. commit 8363d4608e67ecdd000fb77b440ec1707d1cc6f6 Author: Ola Liljedahl <ola.liljedahl@linaro.org> Date: Mon Nov 24 23:38:49 2014 +0100 api: odp_atomic.h: struct type, relaxed mm, missing funcs, use __atomic Maxim.
diff --git a/configure.ac b/configure.ac index 377e8be..f55beca 100644 --- a/configure.ac +++ b/configure.ac @@ -47,7 +47,9 @@ AC_ARG_WITH([platform], [AS_HELP_STRING([--with-platform=prefix], [Select platform to be used, default linux-generic])], [], - [with_platform=linux-generic]) + [with_platform=linux-generic + m4_include([./platform/linux-generic/m4/configure.m4]) + ]) AC_SUBST([with_platform]) diff --git a/platform/linux-generic/m4/configure.m4 b/platform/linux-generic/m4/configure.m4 new file mode 100644 index 0000000..bfb1fb0 --- /dev/null +++ b/platform/linux-generic/m4/configure.m4 @@ -0,0 +1,18 @@ +AC_MSG_CHECKING(for GCC atomic builtins) +AC_LINK_IFELSE( + [AC_LANG_SOURCE( + [[#include <stdint.h> + int main() { + uint32_t v = 1; + __atomic_fetch_add(&v, 1, __ATOMIC_RELAXED); + __atomic_fetch_sub(&v, 1, __ATOMIC_RELAXED); + __atomic_store_n(&v, 1, __ATOMIC_RELAXED); + __atomic_load_n(&v, __ATOMIC_RELAXED); + return 0; + } + ]])], + AC_MSG_RESULT(yes), + AC_MSG_RESULT(no) + echo "Atomic operation are not supported by your compiller." + echo "Use newer version. For gcc > 4.7.3" + exit -1)
Odp atomic operations based on compiler build-ins. Make sure that compiler supports such operation at configure stage. Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org> --- v3: make atomic checks platfrom specific configure.ac | 4 +++- platform/linux-generic/m4/configure.m4 | 18 ++++++++++++++++++ 2 files changed, 21 insertions(+), 1 deletion(-) create mode 100644 platform/linux-generic/m4/configure.m4