[0/4] tcg: Optimize loads and stores to env


Message ID

20230831025729.1194388-1-richard.henderson@linaro.org

Headers

Received-SPF: pass (google.com: domain of
 qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as
 permitted sender) client-ip=209.51.188.17;
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Subject: [PATCH 0/4] tcg: Optimize loads and stores to env
Date: Wed, 30 Aug 2023 19:57:25 -0700
Message-Id: <20230831025729.1194388-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::1033;
 envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x1033.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org

Series

tcg: Optimize loads and stores to env

Message

Richard Henderson Aug. 31, 2023, 2:57 a.m. UTC

This is aimed at improving gvec generated code, which involves large
numbers of loads and stores to the env slots of the guest cpu vector
registers.  The final patch helps eliminate redundant zero-extensions
that can appear with e.g. avx2 and sve.

From the small amount of timing that I have done, there is no change.
But of course as we all know, x86 is very good with redundant memory.
And frankly, I haven't found a good test case for measuring.
What I need is an algorithm with lots of integer vector code that can
be expanded with gvec.  Most of what I've found is either fp (out of
line) or too simple (small translation blocks with little scope for
optimization).

That said, it appears to be simple enough, and does eliminate some
redundant operations, even in places that I didn't expect.


r~


Richard Henderson (4):
  tcg: Don't free vector results
  tcg/optimize: Pipe OptContext into reset_ts
  tcg: Optimize env memory operations
  tcg: Eliminate duplicate env store operations

 tcg/optimize.c    | 226 ++++++++++++++++++++++++++++++++++++++++++++--
 tcg/tcg-op-gvec.c |  39 ++------
 2 files changed, 225 insertions(+), 40 deletions(-)
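
As a rough illustration of the kind of redundancy this series targets (this is not the code from tcg/optimize.c), the sketch below forwards values through env slots in a toy IR. Every name in it (toy_op, TOY_LD, toy_forward_env_loads, the fixed slot count) is hypothetical, and it assumes equal-sized, non-overlapping slots, slot indexes below TOY_NSLOTS, and a single straight-line block with no helper calls.

```c
#include <assert.h>
#include <stddef.h>

/* Toy IR for illustration only; this is NOT QEMU's TCGOp. */
enum toy_opc { TOY_LD, TOY_ST, TOY_MOV, TOY_ADD };

struct toy_op {
    enum toy_opc opc;
    int dst;   /* temp written by LD, MOV, ADD; unused for ST */
    int src;   /* temp read by ST, MOV, and first operand of ADD */
    int src2;  /* second operand of ADD */
    int slot;  /* env slot index for LD and ST */
};

#define TOY_NSLOTS 16
#define TOY_NONE   (-1)

/* Forget every slot whose known copy lives in temp t. */
static void toy_kill_temp(int *copy, int t)
{
    for (int i = 0; i < TOY_NSLOTS; i++) {
        if (copy[i] == t) {
            copy[i] = TOY_NONE;
        }
    }
}

/*
 * Rewrite loads from env slots whose value is already held in a temp
 * into moves from that temp.  Returns the number of loads rewritten.
 */
static int toy_forward_env_loads(struct toy_op *ops, size_t n)
{
    int copy[TOY_NSLOTS];  /* copy[s] = temp holding env slot s, or TOY_NONE */
    int rewritten = 0;

    for (int i = 0; i < TOY_NSLOTS; i++) {
        copy[i] = TOY_NONE;
    }

    for (size_t i = 0; i < n; i++) {
        struct toy_op *op = &ops[i];

        switch (op->opc) {
        case TOY_ST:
            /* After the store, the slot and the source temp agree. */
            copy[op->slot] = op->src;
            break;
        case TOY_LD:
            if (copy[op->slot] != TOY_NONE) {
                int held = copy[op->slot];
                /* Value already in a temp: no memory access needed. */
                op->opc = TOY_MOV;
                op->src = held;
                rewritten++;
                if (held != op->dst) {
                    toy_kill_temp(copy, op->dst);
                }
            } else {
                /* dst is overwritten, then becomes the slot's copy. */
                toy_kill_temp(copy, op->dst);
                copy[op->slot] = op->dst;
            }
            break;
        default:
            /* Any other op overwrites dst, so stale copies must go. */
            toy_kill_temp(copy, op->dst);
            break;
        }
    }
    return rewritten;
}
```

Given the sequence t0 = ld env[3]; t1 = t0 + t0; st t1, env[4]; t2 = ld env[4]; t3 = ld env[3], this pass would rewrite the last two loads as mov t2, t1 and mov t3, t0. A real optimizer also has to cope with overlapping accesses of different sizes and with anything that may clobber env, which this sketch ignores.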

Comments

Richard Henderson Sept. 28, 2023, 10:45 p.m. UTC | #1

Ping.

r~

On 8/30/23 22:57, Richard Henderson wrote:
> This is aimed at improving gvec generated code, which involves large
> numbers of loads and stores to the env slots of the guest cpu vector
> registers.  The final patch helps eliminate redundant zero-extensions
> that can appear with e.g. avx2 and sve.
> 
> From the small amount of timing that I have done, there is no change.
> But of course as we all know, x86 is very good with redundant memory.
> And frankly, I haven't found a good test case for measuring.
> What I need is an algorithm with lots of integer vector code that can
> be expanded with gvec.  Most of what I've found is either fp (out of
> line) or too simple (small translation blocks with little scope for
> optimization).
> 
> That said, it appears to be simple enough, and does eliminate some
> redundant operations, even in places that I didn't expect.
> 
> 
> r~
> 
> 
> Richard Henderson (4):
>    tcg: Don't free vector results
>    tcg/optimize: Pipe OptContext into reset_ts
>    tcg: Optimize env memory operations
>    tcg: Eliminate duplicate env store operations
> 
>   tcg/optimize.c    | 226 ++++++++++++++++++++++++++++++++++++++++++++--
>   tcg/tcg-op-gvec.c |  39 ++------
>   2 files changed, 225 insertions(+), 40 deletions(-)
>

Richard Henderson Oct. 13, 2023, 5:40 p.m. UTC | #2

Ping 2.

On 9/28/23 15:45, Richard Henderson wrote:
> Ping.
> 
> r~
> 
> On 8/30/23 22:57, Richard Henderson wrote:
>> This is aimed at improving gvec generated code, which involves large
>> numbers of loads and stores to the env slots of the guest cpu vector
>> registers.  The final patch helps eliminate redundant zero-extensions
>> that can appear with e.g. avx2 and sve.
>>
>> From the small amount of timing that I have done, there is no change.
>> But of course as we all know, x86 is very good with redundant memory.
>> And frankly, I haven't found a good test case for measuring.
>> What I need is an algorithm with lots of integer vector code that can
>> be expanded with gvec.  Most of what I've found is either fp (out of
>> line) or too simple (small translation blocks with little scope for
>> optimization).
>>
>> That said, it appears to be simple enough, and does eliminate some
>> redundant operations, even in places that I didn't expect.
>>
>>
>> r~
>>
>>
>> Richard Henderson (4):
>>    tcg: Don't free vector results
>>    tcg/optimize: Pipe OptContext into reset_ts
>>    tcg: Optimize env memory operations
>>    tcg: Eliminate duplicate env store operations
>>
>>   tcg/optimize.c    | 226 ++++++++++++++++++++++++++++++++++++++++++++--
>>   tcg/tcg-op-gvec.c |  39 ++------
>>   2 files changed, 225 insertions(+), 40 deletions(-)
>>
>

Song Gao Oct. 16, 2023, 3:01 a.m. UTC | #3

On 2023/8/31 10:57 AM, Richard Henderson wrote:
> This is aimed at improving gvec generated code, which involves large
> numbers of loads and stores to the env slots of the guest cpu vector
> registers.  The final patch helps eliminate redundant zero-extensions
> that can appear with e.g. avx2 and sve.
>
> From the small amount of timing that I have done, there is no change.
> But of course as we all know, x86 is very good with redundant memory.
> And frankly, I haven't found a good test case for measuring.
> What I need is an algorithm with lots of integer vector code that can
> be expanded with gvec.  Most of what I've found is either fp (out of
> line) or too simple (small translation blocks with little scope for
> optimization).
>
> That said, it appears to be simple enough, and does eliminate some
> redundant operations, even in places that I didn't expect.
>
>
> r~
>
>
> Richard Henderson (4):
>    tcg: Don't free vector results
>    tcg/optimize: Pipe OptContext into reset_ts
>    tcg: Optimize env memory operations
>    tcg: Eliminate duplicate env store operations
>
>   tcg/optimize.c    | 226 ++++++++++++++++++++++++++++++++++++++++++++--
>   tcg/tcg-op-gvec.c |  39 ++------
>   2 files changed, 225 insertions(+), 40 deletions(-)
>
Patch 1 and Patch 3 need s/cpu_env/tcg_env/g (e.g. sed -i "s/cpu_env/tcg_env/g").

Reviewed-by: Song Gao <gaosong@loongson.cn>

Thanks.
Song Gao