diff mbox series

[v2,1/3] kgdb: Add kgdb_mem2ebin function for converting memory to binary format

Message ID 20241211153955.33518-2-tjarlama@gmail.com
State Superseded
Headers show
Series Add a new command in kgdb for vmcoreinfo | expand

Commit Message

Amal Raj T Dec. 11, 2024, 3:39 p.m. UTC
From: Amal Raj T <amalrajt@meta.com>

Add a new function kgdb_mem2ebin that converts memory
to binary format, escaping special characters
('$', '#', and '}'). kgdb_mem2ebin function ensures
that memory data is properly formatted and escaped
before being sent over the wire. Additionally, this
function reduces the amount of data exchanged between
debugger compared to hex.

Signed-off-by: Amal Raj T <amalrajt@meta.com>
---
 include/linux/kgdb.h   |  1 +
 kernel/debug/gdbstub.c | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

Comments

Doug Anderson Dec. 13, 2024, 12:55 a.m. UTC | #1
Hi,

On Wed, Dec 11, 2024 at 7:40 AM Amal Raj T <tjarlama@gmail.com> wrote:
>
> From: Amal Raj T <amalrajt@meta.com>
>
> Add a new function kgdb_mem2ebin that converts memory
> to binary format, escaping special characters
> ('$', '#', and '}'). kgdb_mem2ebin function ensures
> that memory data is properly formatted and escaped
> before being sent over the wire. Additionally, this
> function reduces the amount of data exchanged between
> debugger compared to hex.
>
> Signed-off-by: Amal Raj T <amalrajt@meta.com>
> ---
>  include/linux/kgdb.h   |  1 +
>  kernel/debug/gdbstub.c | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> index 76e891ee9e37..fa3cf38a14de 100644
> --- a/include/linux/kgdb.h
> +++ b/include/linux/kgdb.h
> @@ -322,6 +322,7 @@ extern struct kgdb_io *dbg_io_ops;
>
>  extern int kgdb_hex2long(char **ptr, unsigned long *long_val);
>  extern char *kgdb_mem2hex(char *mem, char *buf, int count);
> +extern char *kgdb_mem2ebin(char *mem, char *buf, int count);
>  extern int kgdb_hex2mem(char *buf, char *mem, int count);
>
>  extern int kgdb_isremovedbreak(unsigned long addr);
> diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
> index f625172d4b67..6198d2eb49c4 100644
> --- a/kernel/debug/gdbstub.c
> +++ b/kernel/debug/gdbstub.c
> @@ -257,6 +257,37 @@ char *kgdb_mem2hex(char *mem, char *buf, int count)
>         return buf;
>  }
>
> +/*
> + * Convert memory to binary format for GDB remote protocol
> + * transmission, escaping special characters ($, #, and }).

Why exactly are those characters special? What considers them special
and so why do you need to escape them? I guess you really just care
about avoiding # and $ and you're using '}' as your escape character
so you need to escape that too?

Your function comment should describe the escaping method and ideally
provide a few examples.


> + */
> +char *kgdb_mem2ebin(char *mem, char *buf, int count)

One of the two buffers seems like it should be "const", right? That
would help document which was input and which was output. I guess
"mem" is the input?

"count" should be "size_t"

Presumably there should be two counts talking about the sizes of each
buffer, or at least some documentation should be in the function
comment talking about the fact that "buf" needs to be twice the size?


> +{
> +       char *tmp;
> +       int err;
> +
> +       tmp = buf + count;

Could use a comment that the buffer needs to be 2x long to handle
escaping and that you'll use the 2nd half as a temp buffer.


> +
> +       err = copy_from_kernel_nofault(tmp, mem, count);
> +       if (err)
> +               return NULL;
> +       while (count > 0) {

If you change `count` to `size_t` the above test won't work because
it'll be unsigned. Still probably better to use `size_t`, but just a
warning that you'll have to change the condition.


> +               unsigned char c = *tmp;
> +
> +               if (c == 0x7d || c == 0x23 || c == 0x24) {
> +                       *buf++ = 0x7d;

Don't hardcode. Much better to use '}', '#', '$'


> +                       *buf++ = c ^ 0x20;
> +               } else {
> +                       *buf++ = c;
> +               }
> +               count -= 1;
> +               tmp += 1;

count--;
tmp++;

> +       }
> +       *buf = 0;

Why is the resulting buffer '\0' terminated when the source buffer
isn't? Adding this termination means that the destination buffer needs
to be 1 byte bigger, right?

...or maybe the source buffer doesn't actually have any embedded '\0'
characters and you're using this for termination to the other side? It
would be good to clarify.

In other words, if the input is 2 bytes big:
'}}'

The output buffer will be 5 bytes big:
'}]}]\0'

> +
> +       return buf;

What exactly is this return value? It points right past the end of the buffer?

You seem to just use it as an error value (NULL means error, non-NULL
means no error). Why not return an error code then?

-Doug
Daniel Thompson Jan. 8, 2025, 11:40 a.m. UTC | #2
On Wed, Dec 11, 2024 at 07:39:53AM -0800, Amal Raj T wrote:
> From: Amal Raj T <amalrajt@meta.com>
>
> Add a new function kgdb_mem2ebin that converts memory
> to binary format, escaping special characters
> ('$', '#', and '}').

What about the '*' character?

The gdb docs say "Responses sent by the stub must also escape 0x2a
(ASCII ‘*’), so that it is not interpreted as the start of a run-length
encoded sequence".


> kgdb_mem2ebin function ensures
> that memory data is properly formatted and escaped
> before being sent over the wire. Additionally, this
> function reduces the amount of data exchanged between
> debugger compared to hex.
>
> Signed-off-by: Amal Raj T <amalrajt@meta.com>
> ---
>  include/linux/kgdb.h   |  1 +
>  kernel/debug/gdbstub.c | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> index 76e891ee9e37..fa3cf38a14de 100644
> --- a/include/linux/kgdb.h
> +++ b/include/linux/kgdb.h
> @@ -322,6 +322,7 @@ extern struct kgdb_io *dbg_io_ops;
>
>  extern int kgdb_hex2long(char **ptr, unsigned long *long_val);
>  extern char *kgdb_mem2hex(char *mem, char *buf, int count);
> +extern char *kgdb_mem2ebin(char *mem, char *buf, int count);
>  extern int kgdb_hex2mem(char *buf, char *mem, int count);
>
>  extern int kgdb_isremovedbreak(unsigned long addr);
> diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
> index f625172d4b67..6198d2eb49c4 100644
> --- a/kernel/debug/gdbstub.c
> +++ b/kernel/debug/gdbstub.c
> @@ -257,6 +257,37 @@ char *kgdb_mem2hex(char *mem, char *buf, int count)
>  	return buf;
>  }
>
> +/*
> + * Convert memory to binary format for GDB remote protocol
> + * transmission, escaping special characters ($, #, and }).
> + */

It would be good to include a link to the following in this comment:
https://sourceware.org/gdb/current/onlinedocs/gdb.html/Overview.html#Binary-Data

Doug already mentioned putting buffer size expectations in this comment.
Maybe also mention that the worst case packing is identical to
kgdb_mem2hex() (e.g. 2*count + 1).


Daniel.
Doug Anderson Jan. 24, 2025, 11:17 p.m. UTC | #3
Hi,

On Mon, Jan 13, 2025 at 8:09 AM Amal <tjarlama@gmail.com> wrote:
>
> On Fri, Dec 13, 2024 at 12:55 AM Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Wed, Dec 11, 2024 at 7:40 AM Amal Raj T <tjarlama@gmail.com> wrote:
> > >
> > > From: Amal Raj T <amalrajt@meta.com>
> > >
> > > Add a new function kgdb_mem2ebin that converts memory
> > > to binary format, escaping special characters
> > > ('$', '#', and '}'). kgdb_mem2ebin function ensures
> > > that memory data is properly formatted and escaped
> > > before being sent over the wire. Additionally, this
> > > function reduces the amount of data exchanged between
> > > debugger compared to hex.
> > >
> > > Signed-off-by: Amal Raj T <amalrajt@meta.com>
> > > ---
> > >  include/linux/kgdb.h   |  1 +
> > >  kernel/debug/gdbstub.c | 31 +++++++++++++++++++++++++++++++
> > >  2 files changed, 32 insertions(+)

FWIW, it looks like Stephen has already responded to your V3 and the
protocol might change a bunch. I'm not going to do a detailed review
on it at this time. I'll comment on a few of the things below, though.

One other note is that you should have done a "reply to all" when
responding to my review feedback, not just a reply to me. The
responses should be archived on the lists unless there was a specific
reason to exclude them. Adding the lists back here since I don't see
anything sensitive in your reply.


> > > diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> > > index 76e891ee9e37..fa3cf38a14de 100644
> > > --- a/include/linux/kgdb.h
> > > +++ b/include/linux/kgdb.h
> > > @@ -322,6 +322,7 @@ extern struct kgdb_io *dbg_io_ops;
> > >
> > >  extern int kgdb_hex2long(char **ptr, unsigned long *long_val);
> > >  extern char *kgdb_mem2hex(char *mem, char *buf, int count);
> > > +extern char *kgdb_mem2ebin(char *mem, char *buf, int count);
> > >  extern int kgdb_hex2mem(char *buf, char *mem, int count);
> > >
> > >  extern int kgdb_isremovedbreak(unsigned long addr);
> > > diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
> > > index f625172d4b67..6198d2eb49c4 100644
> > > --- a/kernel/debug/gdbstub.c
> > > +++ b/kernel/debug/gdbstub.c
> > > @@ -257,6 +257,37 @@ char *kgdb_mem2hex(char *mem, char *buf, int count)
> > >         return buf;
> > >  }
> > >
> > > +/*
> > > + * Convert memory to binary format for GDB remote protocol
> > > + * transmission, escaping special characters ($, #, and }).
> >
> > Why exactly are those characters special? What considers them special
> > and so why do you need to escape them? I guess you really just care
> > about avoiding # and $ and you're using '}' as your escape character
> > so you need to escape that too?
> >
> > Your function comment should describe the escaping method and ideally
> > provide a few examples.
> >
> >
> Added relevant links in the commit message and commented two examples
> in code
> > > + */
> > > +char *kgdb_mem2ebin(char *mem, char *buf, int count)
> >
> > One of the two buffers seems like it should be "const", right? That
> > would help document which was input and which was output. I guess
> > "mem" is the input?
> >
> > "count" should be "size_t"
> >
> This was copied from `kgdb_mem2hex` which uses a similar template.
> Should we continue using `int` or just this function need to be migrated to
> `size_t`

In general the existing kdb/kgdb functions are not always the best
examples. I think size_t is correct here. Indeed, if you look at what
"kgdb_mem2hex" does with "count" it passes it to
copy_from_kernel_nofault() which takes a size_t.

Feel free to post a (separate) patch fixing kgdb_mem2hex().


> > Presumably there should be two counts talking about the sizes of each
> > buffer, or at least some documentation should be in the function
> > comment talking about the fact that "buf" needs to be twice the size?
> >
> >
> > > +{
> > > +       char *tmp;
> > > +       int err;
> > > +
> > > +       tmp = buf + count;
> >
> > Could use a comment that the buffer needs to be 2x long to handle
> > escaping and that you'll use the 2nd half as a temp buffer.
> >
> >
> Done, commented in v3
> > > +
> > > +       err = copy_from_kernel_nofault(tmp, mem, count);
> > > +       if (err)
> > > +               return NULL;
> > > +       while (count > 0) {
> >
> > If you change `count` to `size_t` the above test won't work because
> > it'll be unsigned. Still probably better to use `size_t`, but just a
> > warning that you'll have to change the condition.
> >
> >
> > > +               unsigned char c = *tmp;
> > > +
> > > +               if (c == 0x7d || c == 0x23 || c == 0x24) {
> > > +                       *buf++ = 0x7d;
> >
> > Don't hardcode. Much better to use '}', '#', '$'
> >
> Fixed in v3
> >
> > > +                       *buf++ = c ^ 0x20;
> > > +               } else {
> > > +                       *buf++ = c;
> > > +               }
> > > +               count -= 1;
> > > +               tmp += 1;
> >
> > count--;
> > tmp++;
> >
> Fixed in v3
> > > +       }
> > > +       *buf = 0;
> >
> > Why is the resulting buffer '\0' terminated when the source buffer
> > isn't? Adding this termination means that the destination buffer needs
> > to be 1 byte bigger, right?
> >
> > ...or maybe the source buffer doesn't actually have any embedded '\0'
> > characters and you're using this for termination to the other side? It
> > would be good to clarify.
> >
> > In other words, if the input is 2 bytes big:
> > '}}'
> >
> > The output buffer will be 5 bytes big:
> > '}]}]\0'
> >
> Don't think null termination is required in this case.
> In this case, the `gdb_serial_stub` already initializes the buffer with `\0`,
> and `put_packet` iterates over the buffer until the first occurrence of \0.
> This works for `vmcoreinfo` query. But for input buffers with additional
> `\0` characters need to be escaped, to make sure `put_packet` parsed the
> entire buffer.
> > > +
> > > +       return buf;
> >
> > What exactly is this return value? It points right past the end of the buffer?
> >
> > You seem to just use it as an error value (NULL means error, non-NULL
> > means no error). Why not return an error code then?
> >
> Other encoders also return a `char *`. However there is no error handling
> anywhere, the return value is discarded at the caller side.

As per above, existing kdb/kgdb code isn't always the best example to
follow. Personally I'd rather see an error code returned unless
there's a good reason not to.

If you really want to return a pointer like this, a bare minimum would
be to document it. The kgdb_mem2hex() function you pointed to _does_
at least document what is returned.

-Doug
diff mbox series

Patch

diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index 76e891ee9e37..fa3cf38a14de 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -322,6 +322,7 @@  extern struct kgdb_io *dbg_io_ops;
 
 extern int kgdb_hex2long(char **ptr, unsigned long *long_val);
 extern char *kgdb_mem2hex(char *mem, char *buf, int count);
+extern char *kgdb_mem2ebin(char *mem, char *buf, int count);
 extern int kgdb_hex2mem(char *buf, char *mem, int count);
 
 extern int kgdb_isremovedbreak(unsigned long addr);
diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
index f625172d4b67..6198d2eb49c4 100644
--- a/kernel/debug/gdbstub.c
+++ b/kernel/debug/gdbstub.c
@@ -257,6 +257,37 @@  char *kgdb_mem2hex(char *mem, char *buf, int count)
 	return buf;
 }
 
+/*
+ * Convert memory to binary format for GDB remote protocol
+ * transmission, escaping special characters ($, #, and }).
+ */
+char *kgdb_mem2ebin(char *mem, char *buf, int count)
+{
+	char *tmp;
+	int err;
+
+	tmp = buf + count;
+
+	err = copy_from_kernel_nofault(tmp, mem, count);
+	if (err)
+		return NULL;
+	while (count > 0) {
+		unsigned char c = *tmp;
+
+		if (c == 0x7d || c == 0x23 || c == 0x24) {
+			*buf++ = 0x7d;
+			*buf++ = c ^ 0x20;
+		} else {
+			*buf++ = c;
+		}
+		count -= 1;
+		tmp += 1;
+	}
+	*buf = 0;
+
+	return buf;
+}
+
 /*
  * Convert the hex array pointed to by buf into binary to be placed in
  * mem.  Return a pointer to the character AFTER the last byte