diff mbox series

[RFC] semihosting/arm-compat: remove heuristic softmmu SYS_HEAPINFO

Message ID 20210601090715.22330-1-alex.bennee@linaro.org
State New
Headers show
Series [RFC] semihosting/arm-compat: remove heuristic softmmu SYS_HEAPINFO | expand

Commit Message

Alex Bennée June 1, 2021, 9:07 a.m. UTC
The previous numbers were a guess at best. While we could extract the
information from a loaded ELF file via -kernel we could still get
tripped up by self decompressing or relocating code. Besides sane
library code has access to the same symbols in run time to make a
determination of the location of the heap.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Cc: Andrew <astrauss11@gmail.com>
---
 semihosting/arm-compat-semi.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

-- 
2.20.1

Comments

Peter Maydell June 1, 2021, 9:14 a.m. UTC | #1
On Tue, 1 Jun 2021 at 10:12, Alex Bennée <alex.bennee@linaro.org> wrote:
>

> The previous numbers were a guess at best. While we could extract the

> information from a loaded ELF file via -kernel we could still get

> tripped up by self decompressing or relocating code. Besides sane

> library code has access to the same symbols in run time to make a

> determination of the location of the heap.

>

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> Cc: Andrew <astrauss11@gmail.com>

> ---

>  semihosting/arm-compat-semi.c | 19 ++++++++++---------

>  1 file changed, 10 insertions(+), 9 deletions(-)


This seems like a pretty good candidate for breaking existing
working binaries. How much testing against different varieties of
guest-code-using-semihosting have you done ?

thanks
-- PMM
Alex Bennée June 1, 2021, 10:04 a.m. UTC | #2
Peter Maydell <peter.maydell@linaro.org> writes:

> On Tue, 1 Jun 2021 at 10:12, Alex Bennée <alex.bennee@linaro.org> wrote:

>>

>> The previous numbers were a guess at best. While we could extract the

>> information from a loaded ELF file via -kernel we could still get

>> tripped up by self decompressing or relocating code. Besides sane

>> library code has access to the same symbols in run time to make a

>> determination of the location of the heap.

>>

>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>> Cc: Andrew <astrauss11@gmail.com>

>> ---

>>  semihosting/arm-compat-semi.c | 19 ++++++++++---------

>>  1 file changed, 10 insertions(+), 9 deletions(-)

>

> This seems like a pretty good candidate for breaking existing

> working binaries. How much testing against different varieties of

> guest-code-using-semihosting have you done ?


None, which is why it's an RFC - but at least one user reported newlib
attempts to use the numbers we gave it rather than falling back to
numbers it knew from the build and getting it wrong. I suspect any code
that doesn't have a fallback path is getting it right more by luck than
judgement though. I'd be curious to hear of code that relies on the
numbers it gets from QEMU.

>

> thanks

> -- PMM



-- 
Alex Bennée
Peter Maydell June 1, 2021, 10:30 a.m. UTC | #3
On Tue, 1 Jun 2021 at 11:07, Alex Bennée <alex.bennee@linaro.org> wrote:
>

>

> Peter Maydell <peter.maydell@linaro.org> writes:

>

> > On Tue, 1 Jun 2021 at 10:12, Alex Bennée <alex.bennee@linaro.org> wrote:

> >>

> >> The previous numbers were a guess at best. While we could extract the

> >> information from a loaded ELF file via -kernel we could still get

> >> tripped up by self decompressing or relocating code. Besides sane

> >> library code has access to the same symbols in run time to make a

> >> determination of the location of the heap.

> >>

> >> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> >> Cc: Andrew <astrauss11@gmail.com>

> >> ---

> >>  semihosting/arm-compat-semi.c | 19 ++++++++++---------

> >>  1 file changed, 10 insertions(+), 9 deletions(-)

> >

> > This seems like a pretty good candidate for breaking existing

> > working binaries. How much testing against different varieties of

> > guest-code-using-semihosting have you done ?

>

> None, which is why it's an RFC - but at least one user reported newlib

> attempts to use the numbers we gave it rather than falling back to

> numbers it knew from the build and getting it wrong. I suspect any code

> that doesn't have a fallback path is getting it right more by luck than

> judgement though. I'd be curious to hear of code that relies on the

> numbers it gets from QEMU.


Well, newlib is the main one I had in mind -- does it have a fallback
codepath?

-- PMM
Andrew Strauss June 1, 2021, 10:52 a.m. UTC | #4
Yeah, newlib falls back to using the symbols __stack and end to figure out
where to start the stack and heap if 0 is returned by SYS_HEAPINFO.

-- Andrew Strauss

On Tue, Jun 1, 2021 at 6:31 AM Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Tue, 1 Jun 2021 at 11:07, Alex Bennée <alex.bennee@linaro.org> wrote:

> >

> >

> > Peter Maydell <peter.maydell@linaro.org> writes:

> >

> > > On Tue, 1 Jun 2021 at 10:12, Alex Bennée <alex.bennee@linaro.org>

> wrote:

> > >>

> > >> The previous numbers were a guess at best. While we could extract the

> > >> information from a loaded ELF file via -kernel we could still get

> > >> tripped up by self decompressing or relocating code. Besides sane

> > >> library code has access to the same symbols in run time to make a

> > >> determination of the location of the heap.

> > >>

> > >> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> > >> Cc: Andrew <astrauss11@gmail.com>

> > >> ---

> > >>  semihosting/arm-compat-semi.c | 19 ++++++++++---------

> > >>  1 file changed, 10 insertions(+), 9 deletions(-)

> > >

> > > This seems like a pretty good candidate for breaking existing

> > > working binaries. How much testing against different varieties of

> > > guest-code-using-semihosting have you done ?

> >

> > None, which is why it's an RFC - but at least one user reported newlib

> > attempts to use the numbers we gave it rather than falling back to

> > numbers it knew from the build and getting it wrong. I suspect any code

> > that doesn't have a fallback path is getting it right more by luck than

> > judgement though. I'd be curious to hear of code that relies on the

> > numbers it gets from QEMU.

>

> Well, newlib is the main one I had in mind -- does it have a fallback

> codepath?

>

> -- PMM

>
<div dir="ltr"><div dir="ltr">Yeah, newlib falls back to using the symbols __stack and end to figure out where to start the stack and heap if 0 is returned by SYS_HEAPINFO.</div><div dir="ltr"><br></div><div>-- Andrew Strauss<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 1, 2021 at 6:31 AM Peter Maydell &lt;<a href="mailto:peter.maydell@linaro.org">peter.maydell@linaro.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Tue, 1 Jun 2021 at 11:07, Alex Bennée &lt;<a href="mailto:alex.bennee@linaro.org" target="_blank">alex.bennee@linaro.org</a>&gt; wrote:<br>
&gt;<br>
&gt;<br>
&gt; Peter Maydell &lt;<a href="mailto:peter.maydell@linaro.org" target="_blank">peter.maydell@linaro.org</a>&gt; writes:<br>
&gt;<br>
&gt; &gt; On Tue, 1 Jun 2021 at 10:12, Alex Bennée &lt;<a href="mailto:alex.bennee@linaro.org" target="_blank">alex.bennee@linaro.org</a>&gt; wrote:<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; The previous numbers were a guess at best. While we could extract the<br>
&gt; &gt;&gt; information from a loaded ELF file via -kernel we could still get<br>
&gt; &gt;&gt; tripped up by self decompressing or relocating code. Besides sane<br>
&gt; &gt;&gt; library code has access to the same symbols in run time to make a<br>
&gt; &gt;&gt; determination of the location of the heap.<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; Signed-off-by: Alex Bennée &lt;<a href="mailto:alex.bennee@linaro.org" target="_blank">alex.bennee@linaro.org</a>&gt;<br>
&gt; &gt;&gt; Cc: Andrew &lt;<a href="mailto:astrauss11@gmail.com" target="_blank">astrauss11@gmail.com</a>&gt;<br>
&gt; &gt;&gt; ---<br>
&gt; &gt;&gt;  semihosting/arm-compat-semi.c | 19 ++++++++++---------<br>
&gt; &gt;&gt;  1 file changed, 10 insertions(+), 9 deletions(-)<br>
&gt; &gt;<br>
&gt; &gt; This seems like a pretty good candidate for breaking existing<br>
&gt; &gt; working binaries. How much testing against different varieties of<br>
&gt; &gt; guest-code-using-semihosting have you done ?<br>
&gt;<br>
&gt; None, which is why it&#39;s an RFC - but at least one user reported newlib<br>
&gt; attempts to use the numbers we gave it rather than falling back to<br>
&gt; numbers it knew from the build and getting it wrong. I suspect any code<br>
&gt; that doesn&#39;t have a fallback path is getting it right more by luck than<br>
&gt; judgement though. I&#39;d be curious to hear of code that relies on the<br>
&gt; numbers it gets from QEMU.<br>
<br>
Well, newlib is the main one I had in mind -- does it have a fallback<br>
codepath?<br>
<br>
-- PMM<br>
</blockquote></div></div>
Andrew Strauss June 1, 2021, 10:54 a.m. UTC | #5
Tested-by: Andrew Strauss <astrauss11@gmail.com>

Reviewed-by: Andrew Strauss <astrauss11@gmail.com>


On Tue, Jun 1, 2021 at 5:07 AM Alex Bennée <alex.bennee@linaro.org> wrote:

> The previous numbers were a guess at best. While we could extract the

> information from a loaded ELF file via -kernel we could still get

> tripped up by self decompressing or relocating code. Besides sane

> library code has access to the same symbols in run time to make a

> determination of the location of the heap.

>

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> Cc: Andrew <astrauss11@gmail.com>

> ---

>  semihosting/arm-compat-semi.c | 19 ++++++++++---------

>  1 file changed, 10 insertions(+), 9 deletions(-)

>

> diff --git a/semihosting/arm-compat-semi.c b/semihosting/arm-compat-semi.c

> index 1c29146dcf..041b4f6c04 100644

> --- a/semihosting/arm-compat-semi.c

> +++ b/semihosting/arm-compat-semi.c

> @@ -1165,12 +1165,10 @@ target_ulong do_common_semihosting(CPUState *cs)

>      case TARGET_SYS_HEAPINFO:

>          {

>              target_ulong retvals[4];

> -            target_ulong limit;

>              int i;

>  #ifdef CONFIG_USER_ONLY

> +            target_ulong limit;

>              TaskState *ts = cs->opaque;

> -#else

> -            target_ulong rambase = common_semi_rambase(cs);

>  #endif

>

>              GET_ARG(0);

> @@ -1201,12 +1199,15 @@ target_ulong do_common_semihosting(CPUState *cs)

>              retvals[2] = ts->stack_base;

>              retvals[3] = 0; /* Stack limit.  */

>  #else

> -            limit = current_machine->ram_size;

> -            /* TODO: Make this use the limit of the loaded application.

> */

> -            retvals[0] = rambase + limit / 2;

> -            retvals[1] = rambase + limit;

> -            retvals[2] = rambase + limit; /* Stack base */

> -            retvals[3] = rambase; /* Stack limit.  */

> +            /*

> +             * Reporting 0 indicates we couldn't calculate the real

> +             * values which should force most software to fall back to

> +             * using information it has.

> +             */

> +            retvals[0] = 0; /* Heap Base */

> +            retvals[1] = 0; /* Heap Limit */

> +            retvals[2] = 0; /* Stack base */

> +            retvals[3] = 0; /* Stack limit.  */

>  #endif

>

>              for (i = 0; i < ARRAY_SIZE(retvals); i++) {

> --

> 2.20.1

>

>
<div dir="ltr"><div>Tested-by: Andrew Strauss &lt;<a href="mailto:astrauss11@gmail.com">astrauss11@gmail.com</a>&gt;</div><div>Reviewed-by: Andrew Strauss &lt;<a href="mailto:astrauss11@gmail.com">astrauss11@gmail.com</a>&gt;<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 1, 2021 at 5:07 AM Alex Bennée &lt;<a href="mailto:alex.bennee@linaro.org">alex.bennee@linaro.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The previous numbers were a guess at best. While we could extract the<br>
information from a loaded ELF file via -kernel we could still get<br>
tripped up by self decompressing or relocating code. Besides sane<br>
library code has access to the same symbols in run time to make a<br>
determination of the location of the heap.<br>
<br>
Signed-off-by: Alex Bennée &lt;<a href="mailto:alex.bennee@linaro.org" target="_blank">alex.bennee@linaro.org</a>&gt;<br>

Cc: Andrew &lt;<a href="mailto:astrauss11@gmail.com" target="_blank">astrauss11@gmail.com</a>&gt;<br>
---<br>
 semihosting/arm-compat-semi.c | 19 ++++++++++---------<br>
 1 file changed, 10 insertions(+), 9 deletions(-)<br>
<br>
diff --git a/semihosting/arm-compat-semi.c b/semihosting/arm-compat-semi.c<br>
index 1c29146dcf..041b4f6c04 100644<br>
--- a/semihosting/arm-compat-semi.c<br>
+++ b/semihosting/arm-compat-semi.c<br>
@@ -1165,12 +1165,10 @@ target_ulong do_common_semihosting(CPUState *cs)<br>
     case TARGET_SYS_HEAPINFO:<br>
         {<br>
             target_ulong retvals[4];<br>
-            target_ulong limit;<br>
             int i;<br>
 #ifdef CONFIG_USER_ONLY<br>
+            target_ulong limit;<br>
             TaskState *ts = cs-&gt;opaque;<br>
-#else<br>
-            target_ulong rambase = common_semi_rambase(cs);<br>
 #endif<br>
<br>
             GET_ARG(0);<br>
@@ -1201,12 +1199,15 @@ target_ulong do_common_semihosting(CPUState *cs)<br>
             retvals[2] = ts-&gt;stack_base;<br>
             retvals[3] = 0; /* Stack limit.  */<br>
 #else<br>
-            limit = current_machine-&gt;ram_size;<br>
-            /* TODO: Make this use the limit of the loaded application.  */<br>
-            retvals[0] = rambase + limit / 2;<br>
-            retvals[1] = rambase + limit;<br>
-            retvals[2] = rambase + limit; /* Stack base */<br>
-            retvals[3] = rambase; /* Stack limit.  */<br>
+            /*<br>
+             * Reporting 0 indicates we couldn&#39;t calculate the real<br>
+             * values which should force most software to fall back to<br>
+             * using information it has.<br>
+             */<br>
+            retvals[0] = 0; /* Heap Base */<br>
+            retvals[1] = 0; /* Heap Limit */<br>
+            retvals[2] = 0; /* Stack base */<br>
+            retvals[3] = 0; /* Stack limit.  */<br>
 #endif<br>
<br>
             for (i = 0; i &lt; ARRAY_SIZE(retvals); i++) {<br>
-- <br>
2.20.1<br>
<br>
</blockquote></div>
diff mbox series

Patch

diff --git a/semihosting/arm-compat-semi.c b/semihosting/arm-compat-semi.c
index 1c29146dcf..041b4f6c04 100644
--- a/semihosting/arm-compat-semi.c
+++ b/semihosting/arm-compat-semi.c
@@ -1165,12 +1165,10 @@  target_ulong do_common_semihosting(CPUState *cs)
     case TARGET_SYS_HEAPINFO:
         {
             target_ulong retvals[4];
-            target_ulong limit;
             int i;
 #ifdef CONFIG_USER_ONLY
+            target_ulong limit;
             TaskState *ts = cs->opaque;
-#else
-            target_ulong rambase = common_semi_rambase(cs);
 #endif
 
             GET_ARG(0);
@@ -1201,12 +1199,15 @@  target_ulong do_common_semihosting(CPUState *cs)
             retvals[2] = ts->stack_base;
             retvals[3] = 0; /* Stack limit.  */
 #else
-            limit = current_machine->ram_size;
-            /* TODO: Make this use the limit of the loaded application.  */
-            retvals[0] = rambase + limit / 2;
-            retvals[1] = rambase + limit;
-            retvals[2] = rambase + limit; /* Stack base */
-            retvals[3] = rambase; /* Stack limit.  */
+            /*
+             * Reporting 0 indicates we couldn't calculate the real
+             * values which should force most software to fall back to
+             * using information it has.
+             */
+            retvals[0] = 0; /* Heap Base */
+            retvals[1] = 0; /* Heap Limit */
+            retvals[2] = 0; /* Stack base */
+            retvals[3] = 0; /* Stack limit.  */
 #endif
 
             for (i = 0; i < ARRAY_SIZE(retvals); i++) {