Message ID | 20240208130840.533348-4-adhemerval.zanella@linaro.org |
---|---|
State | Accepted |
Commit | 491e55beab7457ed310a4a47496f4a333c5d1032 |
Series | x86: Improve ERMS usage on Zen3+ |
On Thu, Feb 8, 2024 at 5:08 AM Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>
> ---
>  sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> index 9984c3ca0f..97839a2248 100644
> --- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> +++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> @@ -21,7 +21,9 @@
>     2. If size is less than VEC, use integer register stores.
>     3. If size is from VEC_SIZE to 2 * VEC_SIZE, use 2 VEC stores.
>     4. If size is from 2 * VEC_SIZE to 4 * VEC_SIZE, use 4 VEC stores.
> -   5. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
> +   5. On machines ERMS feature, if size is greater or equal than
> +      __x86_rep_stosb_threshold then REP STOSB will be used.
> +   6. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
>        4 VEC stores and store 4 * VEC at a time until done.  */
>
>  #include <sysdep.h>
> --
> 2.34.1
>

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.
diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 9984c3ca0f..97839a2248 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -21,7 +21,9 @@
    2. If size is less than VEC, use integer register stores.
    3. If size is from VEC_SIZE to 2 * VEC_SIZE, use 2 VEC stores.
    4. If size is from 2 * VEC_SIZE to 4 * VEC_SIZE, use 4 VEC stores.
-   5. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
+   5. On machines ERMS feature, if size is greater or equal than
+      __x86_rep_stosb_threshold then REP STOSB will be used.
+   6. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
       4 VEC stores and store 4 * VEC at a time until done.  */
 
 #include <sysdep.h>
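For readers who want the size classes from the updated comment in executable form, here is a rough C sketch of the selection order as the comment lists it (steps 2 to 6). It is only one reading of the comment, not the control flow of the assembly. The enum, the function name `classify_memset_size`, and its parameters are hypothetical names introduced for illustration. How the boundary sizes (exactly 2 * VEC_SIZE or 4 * VEC_SIZE) are assigned is also an assumption, since the comment does not say which side is inclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the strategies named in the comment. */
enum memset_path
{
  PATH_INT_STORES,  /* step 2: integer register stores */
  PATH_2_VEC,       /* step 3: 2 VEC stores */
  PATH_4_VEC,       /* step 4: 4 VEC stores */
  PATH_REP_STOSB,   /* step 5: REP STOSB (ERMS only) */
  PATH_4_VEC_LOOP   /* step 6: align, then 4 * VEC per iteration */
};

/* Pick a strategy for a fill of N bytes.  VEC_SIZE is the vector width
   in bytes; REP_STOSB_THRESHOLD stands in for __x86_rep_stosb_threshold.
   Checks follow the order of the comment; inclusive upper bounds for
   steps 3 and 4 are an assumption.  */
static enum memset_path
classify_memset_size (size_t n, size_t vec_size, bool has_erms,
                      size_t rep_stosb_threshold)
{
  assert (vec_size > 0);
  if (n < vec_size)
    return PATH_INT_STORES;
  if (n <= 2 * vec_size)
    return PATH_2_VEC;
  if (n <= 4 * vec_size)
    return PATH_4_VEC;
  if (has_erms && n >= rep_stosb_threshold)
    return PATH_REP_STOSB;
  return PATH_4_VEC_LOOP;
}
```

With an illustrative VEC_SIZE of 32 and a threshold of 2048 (example values only), a 1024-byte fill lands in the 4 * VEC loop, while a 2048-byte fill on an ERMS machine would take the REP STOSB path.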