Message ID | 20240208130840.533348-3-adhemerval.zanella@linaro.org |
---|---|
State | Accepted |
Commit | 272708884cb750f12f5c74a00e6620c19dc6d567 |
Series | x86: Improve ERMS usage on Zen3+ |

On Thu, Feb 8, 2024 at 5:08 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
> For AMD Zen3+ architecture, the performance of the vectorized loop is
> slightly better than ERMS.
>
> Checked on x86_64-linux-gnu on Zen3.
> ---
>  sysdeps/x86/dl-cacheinfo.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> index f34d12846c..5a98f70364 100644
> --- a/sysdeps/x86/dl-cacheinfo.h
> +++ b/sysdeps/x86/dl-cacheinfo.h
> @@ -1021,6 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>       minimum value is fixed. */
>    rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
>                                       long int, NULL);
> +  if (cpu_features->basic.kind == arch_kind_amd
> +      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
> +    /* For AMD Zen3+ architecture, the performance of the vectorized loop is
> +       slightly better than ERMS. */
> +    rep_stosb_threshold = SIZE_MAX;
>
>    TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
>    TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);
> --
> 2.34.1
>

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index f34d12846c..5a98f70364 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1021,6 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      minimum value is fixed. */
   rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
                                      long int, NULL);
+  if (cpu_features->basic.kind == arch_kind_amd
+      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* For AMD Zen3+ architecture, the performance of the vectorized loop is
+       slightly better than ERMS. */
+    rep_stosb_threshold = SIZE_MAX;
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);
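
For readers unfamiliar with how the threshold is consumed, the effect of the hunk can be sketched in standalone C. This is only an illustration, not glibc code: `enum cpu_kind`, `struct tunable_state`, `pick_rep_stosb_threshold` and `uses_rep_stosb` are hypothetical stand-ins for `cpu_features->basic.kind`, the tunables machinery and the memset dispatch, and the exact comparison used by the real assembly may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for glibc internals; not the real definitions.  */
enum cpu_kind { KIND_INTEL, KIND_AMD, KIND_OTHER };

struct tunable_state
{
  bool initialized;   /* True if the user set the tunable explicitly.  */
  size_t value;       /* Generic default, or the user's value.  */
};

/* Mirror the selection made by the patch: on AMD, when the user did not
   supply a value, raise the threshold to SIZE_MAX.  */
static size_t
pick_rep_stosb_threshold (enum cpu_kind kind, const struct tunable_state *t)
{
  size_t threshold = t->value;
  if (kind == KIND_AMD && !t->initialized)
    threshold = SIZE_MAX;
  return threshold;
}

/* Simplified dispatch (an assumption): 'rep stosb' is chosen only for
   lengths at or above the threshold, so a SIZE_MAX threshold leaves the
   vectorized loop in use for every practical length.  */
static bool
uses_rep_stosb (size_t len, size_t threshold)
{
  return len >= threshold;
}
```

With an illustrative generic default of 2048 (an assumed value for this sketch), an AMD CPU with the tunable unset gets `SIZE_MAX` and effectively never takes the `rep stosb` path, while an explicit user setting is still honored on AMD and other CPU kinds keep whatever value the tunable holds.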