Message ID | 20230831025729.1194388-1-richard.henderson@linaro.org |
---|---|
Series | tcg: Optimize loads and stores to env |
Ping.

r~

On 8/30/23 22:57, Richard Henderson wrote:
> This is aimed at improving gvec generated code, which involves large
> numbers of loads and stores to the env slots of the guest cpu vector
> registers.  The final patch helps eliminate redundant zero-extensions
> that can appear with e.g. avx2 and sve.
>
> From the small amount of timing that I have done, there is no change.
> But of course as we all know, x86 is very good with redundant memory.
> And frankly, I haven't found a good test case for measuring.
> What I need is an algorithm with lots of integer vector code that can
> be expanded with gvec.  Most of what I've found is either fp (out of
> line) or too simple (small translation blocks with little scope for
> optimization).
>
> That said, it appears to be simple enough, and does eliminate some
> redundant operations, even in places that I didn't expect.
>
>
> r~
>
>
> Richard Henderson (4):
>   tcg: Don't free vector results
>   tcg/optimize: Pipe OptContext into reset_ts
>   tcg: Optimize env memory operations
>   tcg: Eliminate duplicate env store operations
>
>  tcg/optimize.c    | 226 ++++++++++++++++++++++++++++++++++++++++++++--
>  tcg/tcg-op-gvec.c |  39 ++------
>  2 files changed, 225 insertions(+), 40 deletions(-)
>
Ping 2.

On 9/28/23 15:45, Richard Henderson wrote:
> Ping.
>
> r~
On 2023/8/31 at 10:57 AM, Richard Henderson wrote:
> Richard Henderson (4):
>   tcg: Don't free vector results
>   tcg/optimize: Pipe OptContext into reset_ts
>   tcg: Optimize env memory operations
>   tcg: Eliminate duplicate env store operations
>
>  tcg/optimize.c    | 226 ++++++++++++++++++++++++++++++++++++++++++++--
>  tcg/tcg-op-gvec.c |  39 ++------
>  2 files changed, 225 insertions(+), 40 deletions(-)

For Patch 1 and Patch 3: sed -i "s/cpu_env/tcg_env/g"

Reviewed-by: Song Gao <gaosong@loongson.cn>

Thanks.
Song Gao
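
For readers following the thread, the kind of redundancy the cover letter describes (repeated loads and stores to the same env slot within one translation block) can be illustrated with a small standalone sketch. This is not the code from tcg/optimize.c: the IR, the `ToyOp` type, and `toy_optimize_env` are hypothetical names invented for the example, and the pass assumes a single straight-line block in which temps are written only by loads and moves.

```c
#include <stddef.h>

/* Toy IR for illustration only; none of these names come from QEMU. */
enum { N_SLOTS = 8, NO_TEMP = -1 };

typedef enum { OP_LOAD, OP_STORE, OP_MOV, OP_NOP } ToyOpc;

typedef struct {
    ToyOpc opc;
    int temp;   /* LOAD/MOV: destination temp; STORE: source temp */
    int arg;    /* LOAD/STORE: env slot index (< N_SLOTS); MOV: source temp */
} ToyOp;

/* Temp t is about to be overwritten: drop every slot that relied on it. */
static void forget_temp(int *slot_copy, int t)
{
    for (int s = 0; s < N_SLOTS; s++) {
        if (slot_copy[s] == t) {
            slot_copy[s] = NO_TEMP;
        }
    }
}

/*
 * Forward pass over one straight-line block.  slot_copy[s] names a temp
 * known to hold the current contents of env slot s, or NO_TEMP.
 * Returns the number of ops that were rewritten.
 */
static int toy_optimize_env(ToyOp *ops, size_t n)
{
    int slot_copy[N_SLOTS];
    int changed = 0;

    for (int s = 0; s < N_SLOTS; s++) {
        slot_copy[s] = NO_TEMP;
    }

    for (size_t i = 0; i < n; i++) {
        ToyOp *op = &ops[i];

        switch (op->opc) {
        case OP_LOAD:
            if (slot_copy[op->arg] == op->temp) {
                op->opc = OP_NOP;            /* temp already holds the slot */
                changed++;
                break;
            }
            forget_temp(slot_copy, op->temp);
            if (slot_copy[op->arg] != NO_TEMP) {
                op->opc = OP_MOV;            /* reuse the earlier copy */
                op->arg = slot_copy[op->arg];
                changed++;
            } else {
                slot_copy[op->arg] = op->temp;
            }
            break;
        case OP_STORE:
            if (slot_copy[op->arg] == op->temp) {
                op->opc = OP_NOP;            /* slot already has this value */
                changed++;
            } else {
                slot_copy[op->arg] = op->temp;
            }
            break;
        case OP_MOV:
            forget_temp(slot_copy, op->temp);
            break;
        case OP_NOP:
            break;
        }
    }
    return changed;
}
```

In this sketch a second load of a slot becomes a register move, and a store that writes back a value the slot is already known to hold is dropped. A real optimizer also has to account for helper calls, branches, and partially overlapping accesses of different sizes, none of which the toy models.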