
[0/4] spapr: Error handling fixes and cleanups (round 5)

Message ID 160371602625.305923.7832478283946753271.stgit@bahia.lan

Message

Greg Kurz Oct. 26, 2020, 12:40 p.m. UTC
Hi,

This is the last round I had on my queue for 5.2. Basically ensuring that
meaningful negative errnos get propagated to VMState, with some fairly
simple cleanups on the way.

---

Greg Kurz (4):
      spapr: qemu_memalign() doesn't return NULL
      spapr: Use error_append_hint() in spapr_reallocate_hpt()
      target/ppc: Fix kvmppc_load_htab_chunk() error reporting
      spapr: Improve spapr_reallocate_hpt() error reporting


 hw/ppc/spapr.c         |   36 ++++++++++++++++++------------------
 hw/ppc/spapr_hcall.c   |    8 ++------
 include/hw/ppc/spapr.h |    3 +--
 target/ppc/kvm.c       |   11 +++++------
 target/ppc/kvm_ppc.h   |    5 +++--
 5 files changed, 29 insertions(+), 34 deletions(-)

--
Greg

Comments

Greg Kurz Oct. 26, 2020, 12:53 p.m. UTC | #1
Heh... this is round 4 actually :)

On Mon, 26 Oct 2020 13:40:26 +0100
Greg Kurz <groug@kaod.org> wrote:

> Hi,
> 
> This is the last round I had on my queue for 5.2. Basically ensuring that
> meaningful negative errnos get propagated to VMState, with some fairly
> simple cleanups on the way.
> 
> ---
> 
> Greg Kurz (4):
>       spapr: qemu_memalign() doesn't return NULL
>       spapr: Use error_append_hint() in spapr_reallocate_hpt()
>       target/ppc: Fix kvmppc_load_htab_chunk() error reporting
>       spapr: Improve spapr_reallocate_hpt() error reporting
> 
> 
>  hw/ppc/spapr.c         |   36 ++++++++++++++++++------------------
>  hw/ppc/spapr_hcall.c   |    8 ++------
>  include/hw/ppc/spapr.h |    3 +--
>  target/ppc/kvm.c       |   11 +++++------
>  target/ppc/kvm_ppc.h   |    5 +++--
>  5 files changed, 29 insertions(+), 34 deletions(-)
> 
> --
> Greg
Philippe Mathieu-Daudé Oct. 26, 2020, 1:43 p.m. UTC | #2
On 10/26/20 1:40 PM, Greg Kurz wrote:
> qemu_memalign() aborts if OOM. Drop some dead code.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>   hw/ppc/spapr.c       |    6 ------
>   hw/ppc/spapr_hcall.c |    8 ++------
>   2 files changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0cc19b5863a4..f098d0ee6d98 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1521,12 +1521,6 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
>           int i;
>   
>           spapr->htab = qemu_memalign(size, size);
> -        if (!spapr->htab) {
> -            error_setg_errno(errp, errno,
> -                             "Could not allocate HPT of order %d", shift);
> -            return;

Wasn't the idea to use qemu_try_memalign() here?

> -        }
> -
>           memset(spapr->htab, 0, size);
>           spapr->htab_shift = shift;
>   
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 607740150fa2..34e146f628fb 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -361,12 +361,8 @@ static void *hpt_prepare_thread(void *opaque)
>       size_t size = 1ULL << pending->shift;
>   
>       pending->hpt = qemu_memalign(size, size);
> -    if (pending->hpt) {
> -        memset(pending->hpt, 0, size);
> -        pending->ret = H_SUCCESS;
> -    } else {
> -        pending->ret = H_NO_MEM;

Ditto.

> -    }
> +    memset(pending->hpt, 0, size);
> +    pending->ret = H_SUCCESS;
>   
>       qemu_mutex_lock_iothread();
>
Greg Kurz Oct. 26, 2020, 2:46 p.m. UTC | #3
On Mon, 26 Oct 2020 14:43:08 +0100
Philippe Mathieu-Daudé <philmd@redhat.com> wrote:

> On 10/26/20 1:40 PM, Greg Kurz wrote:
> > qemu_memalign() aborts if OOM. Drop some dead code.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >   hw/ppc/spapr.c       |    6 ------
> >   hw/ppc/spapr_hcall.c |    8 ++------
> >   2 files changed, 2 insertions(+), 12 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 0cc19b5863a4..f098d0ee6d98 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1521,12 +1521,6 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
> >           int i;
> >   
> >           spapr->htab = qemu_memalign(size, size);
> > -        if (!spapr->htab) {
> > -            error_setg_errno(errp, errno,
> > -                             "Could not allocate HPT of order %d", shift);
> > -            return;
> 
> Wasn't the idea to use qemu_try_memalign() here?
> 

Well... I have mixed feelings about this. The HTAB was first
introduced by commit:

commit f43e35255cffb6ac6230dd09d308f7909f823f96
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Fri Apr 1 15:15:22 2011 +1100

    Virtual hash page table handling on pSeries machine

using qemu_mallocz(), which was aborting on OOM. It then got
replaced by g_malloc0() when qemu_mallocz() got deprecated
and eventually by qemu_memalign() when KVM support was added.

Surviving OOM when allocating the HTAB never seemed to be an
option until this commit that introduced the check:

commit c5f54f3e31bf693f70a98d4d73ea5dbe05689857
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Tue Feb 9 10:21:56 2016 +1000

    pseries: Move hash page table allocation to reset time

I don't really see in the patch and in the changelog an obvious
desire to try to handle OOM.

> > -        }
> > -
> >           memset(spapr->htab, 0, size);
> >           spapr->htab_shift = shift;
> >   
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index 607740150fa2..34e146f628fb 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -361,12 +361,8 @@ static void *hpt_prepare_thread(void *opaque)
> >       size_t size = 1ULL << pending->shift;
> >   
> >       pending->hpt = qemu_memalign(size, size);
> > -    if (pending->hpt) {
> > -        memset(pending->hpt, 0, size);
> > -        pending->ret = H_SUCCESS;
> > -    } else {
> > -        pending->ret = H_NO_MEM;
> 
> Ditto.
> 

This one was introduced by commit:

commit 0b0b831016ae93bc14698a5d7202eb77feafea75
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Fri May 12 15:46:49 2017 +1000

    pseries: Implement HPT resizing

I agree that maybe the intent here could have been to use qemu_try_memalign(),
but again I don't quite see any strong justification to handle OOM in the
changelog.

David,

Any insight to share?

> > -    }
> > +    memset(pending->hpt, 0, size);
> > +    pending->ret = H_SUCCESS;
> >   
> >       qemu_mutex_lock_iothread();
> >   
>
David Gibson Oct. 27, 2020, 1:56 a.m. UTC | #4
On Mon, Oct 26, 2020 at 03:46:47PM +0100, Greg Kurz wrote:
> On Mon, 26 Oct 2020 14:43:08 +0100
> Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> 
> > On 10/26/20 1:40 PM, Greg Kurz wrote:
> > > qemu_memalign() aborts if OOM. Drop some dead code.
> > > 
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > ---
> > >   hw/ppc/spapr.c       |    6 ------
> > >   hw/ppc/spapr_hcall.c |    8 ++------
> > >   2 files changed, 2 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 0cc19b5863a4..f098d0ee6d98 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1521,12 +1521,6 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
> > >           int i;
> > >   
> > >           spapr->htab = qemu_memalign(size, size);
> > > -        if (!spapr->htab) {
> > > -            error_setg_errno(errp, errno,
> > > -                             "Could not allocate HPT of order %d", shift);
> > > -            return;
> > 
> > Wasn't the idea to use qemu_try_memalign() here?
> > 
> 
> Well... I have mixed feelings about this. The HTAB was first
> introduced by commit:
> 
> commit f43e35255cffb6ac6230dd09d308f7909f823f96
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Fri Apr 1 15:15:22 2011 +1100
> 
>     Virtual hash page table handling on pSeries machine
> 
> using qemu_mallocz(), which was aborting on OOM. It then got
> replaced by g_malloc0() when qemu_mallocz() got deprecated
> and eventually by qemu_memalign() when KVM support was added.
> 
> Surviving OOM when allocating the HTAB never seemed to be an
> option until this commit that introduced the check:
> 
> commit c5f54f3e31bf693f70a98d4d73ea5dbe05689857
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Tue Feb 9 10:21:56 2016 +1000
> 
>     pseries: Move hash page table allocation to reset time
> 
> I don't really see in the patch and in the changelog an obvious
> desire to try to handle OOM.


This one is probably ok.  AFAICT all failures returned here would be
more or less fatal in the caller, one way or another (&error_fatal in
two cases, and failure to load an incoming migration stream in the
other).

> > > -        }
> > > -
> > >           memset(spapr->htab, 0, size);
> > >           spapr->htab_shift = shift;
> > >   
> > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > > index 607740150fa2..34e146f628fb 100644
> > > --- a/hw/ppc/spapr_hcall.c
> > > +++ b/hw/ppc/spapr_hcall.c
> > > @@ -361,12 +361,8 @@ static void *hpt_prepare_thread(void *opaque)
> > >       size_t size = 1ULL << pending->shift;
> > >   
> > >       pending->hpt = qemu_memalign(size, size);
> > > -    if (pending->hpt) {
> > > -        memset(pending->hpt, 0, size);
> > > -        pending->ret = H_SUCCESS;
> > > -    } else {
> > > -        pending->ret = H_NO_MEM;
> > 
> > Ditto.
> > 
> 
> This one was introduced by commit:
> 
> commit 0b0b831016ae93bc14698a5d7202eb77feafea75
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Fri May 12 15:46:49 2017 +1000
> 
>     pseries: Implement HPT resizing
> 
> I agree that maybe the intent here could have been to use qemu_try_memalign(),
> but again I don't quite see any strong justification to handle OOM in the
> changelog.
> 
> David,
> 
> Any insight to share?

Aborting on an HPT resize failure is definitely not ok, though.  This
one needs to be a qemu_try_memalign().
David Gibson Oct. 27, 2020, 1:57 a.m. UTC | #5
On Mon, Oct 26, 2020 at 01:40:40PM +0100, Greg Kurz wrote:
> Hints should be added with the dedicated error_append_hint() API
> because we don't want to print them when using QMP. This requires
> inserting ERRP_GUARD() as explained in "qapi/error.h".
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Applied to ppc-for-5.2.

> ---
>  hw/ppc/spapr.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f098d0ee6d98..f51b663f7dcb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1486,6 +1486,7 @@ void spapr_free_hpt(SpaprMachineState *spapr)
>  void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
>                            Error **errp)
>  {
> +    ERRP_GUARD();
>      long rc;
>  
>      /* Clean up any HPT info from a previous boot */
> @@ -1500,17 +1501,18 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
>  
>      if (rc < 0) {
>          /* kernel-side HPT needed, but couldn't allocate one */
> -        error_setg_errno(errp, errno,
> -                         "Failed to allocate KVM HPT of order %d (try smaller maxmem?)",
> +        error_setg_errno(errp, errno, "Failed to allocate KVM HPT of order %d",
>                           shift);
> +        error_append_hint(errp, "Try smaller maxmem?\n");
>          /* This is almost certainly fatal, but if the caller really
>           * wants to carry on with shift == 0, it's welcome to try */
>      } else if (rc > 0) {
>          /* kernel-side HPT allocated */
>          if (rc != shift) {
>              error_setg(errp,
> -                       "Requested order %d HPT, but kernel allocated order %ld (try smaller maxmem?)",
> +                       "Requested order %d HPT, but kernel allocated order %ld",
>                         shift, rc);
> +            error_append_hint(errp, "Try smaller maxmem?\n");
>          }
>  
>          spapr->htab_shift = shift;
> 
>
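
For readers less familiar with the message/hint split discussed in this patch, here is a rough standalone sketch. It is not QEMU's Error implementation: `DemoError` and the `demo_*` helpers are made up for illustration, and the ERRP_GUARD() requirement described in "qapi/error.h" is not modelled. It only shows the idea that the main message reaches every consumer while the hint is kept separately for human-facing output.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for an error object (hypothetical layout). */
typedef struct DemoError {
    char msg[128];   /* main message: shown to every consumer */
    char hint[128];  /* hint: meant for human-facing output only */
} DemoError;

/* Set the main message from a printf-style format and clear the hint. */
static void demo_error_set(DemoError *err, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    err->hint[0] = '\0';
}

/* Append hint text without touching the main message. */
static void demo_error_append_hint(DemoError *err, const char *hint)
{
    strncat(err->hint, hint, sizeof(err->hint) - strlen(err->hint) - 1);
}

/* What a machine-readable consumer would receive: message only. */
static const char *demo_error_for_machine(const DemoError *err)
{
    return err->msg;
}

/* What a human-facing consumer would print: message, newline, hint. */
static void demo_error_for_human(const DemoError *err, char *out, size_t outlen)
{
    snprintf(out, outlen, "%s\n%s", err->msg, err->hint);
}
```

With the old code, "(try smaller maxmem?)" was part of the message itself, so there was no way to leave it out for a machine-readable consumer.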
David Gibson Oct. 27, 2020, 2 a.m. UTC | #6
On Mon, Oct 26, 2020 at 01:40:47PM +0100, Greg Kurz wrote:
> If kvmppc_load_htab_chunk() fails, its return value is propagated up
> to vmstate_load(). It should thus be a negative errno, not -1 (which
> maps to EPERM and would lure the user into thinking that the problem
> is necessarily related to a lack of privilege).
> 
> Return the error reported by KVM or ENOSPC in case of short write.
> While here, propagate the error message through an @errp argument
> and have the caller print it with error_report_err() instead
> of relying on fprintf().
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/ppc/spapr.c       |    4 +++-
>  target/ppc/kvm.c     |   11 +++++------
>  target/ppc/kvm_ppc.h |    5 +++--
>  3 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f51b663f7dcb..ff7de7da2875 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2341,8 +2341,10 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
>  
>              assert(fd >= 0);
>  
> -            rc = kvmppc_load_htab_chunk(f, fd, index, n_valid, n_invalid);
> +            rc = kvmppc_load_htab_chunk(f, fd, index, n_valid, n_invalid,
> +                                        &local_err);
>              if (rc < 0) {
> +                error_report_err(local_err);
>                  return rc;
>              }
>          }
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index d85ba8ffe00b..0223b93ea561 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2683,7 +2683,7 @@ int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns)
>  }
>  
>  int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
> -                           uint16_t n_valid, uint16_t n_invalid)
> +                           uint16_t n_valid, uint16_t n_invalid, Error **errp)
>  {
>      struct kvm_get_htab_header *buf;
>      size_t chunksize = sizeof(*buf) + n_valid * HASH_PTE_SIZE_64;
> @@ -2698,14 +2698,13 @@ int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
>  
>      rc = write(fd, buf, chunksize);
>      if (rc < 0) {
> -        fprintf(stderr, "Error writing KVM hash table: %s\n",
> -                strerror(errno));
> -        return rc;
> +        error_setg_errno(errp, errno, "Error writing the KVM hash table");
> +        return -errno;
>      }
>      if (rc != chunksize) {
>          /* We should never get a short write on a single chunk */
> -        fprintf(stderr, "Short write, restoring KVM hash table\n");
> -        return -1;
> +        error_setg(errp, "Short write while restoring the KVM hash table");
> +        return -ENOSPC;

I'm not entirely sure -ENOSPC is the right choice here - this
indicates that the kernel interface is not behaving as we expect.  But
I can't immediately think of what's a better choice, so, applied to
ppc-for-5.2.


>      }
>      return 0;
>  }
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 72e05f1cd2fc..73ce2bc95114 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -56,7 +56,7 @@ int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
>  int kvmppc_get_htab_fd(bool write, uint64_t index, Error **errp);
>  int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns);
>  int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
> -                           uint16_t n_valid, uint16_t n_invalid);
> +                           uint16_t n_valid, uint16_t n_invalid, Error **errp);
>  void kvmppc_read_hptes(ppc_hash_pte64_t *hptes, hwaddr ptex, int n);
>  void kvmppc_write_hpte(hwaddr ptex, uint64_t pte0, uint64_t pte1);
>  bool kvmppc_has_cap_fixup_hcalls(void);
> @@ -316,7 +316,8 @@ static inline int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize,
>  }
>  
>  static inline int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
> -                                         uint16_t n_valid, uint16_t n_invalid)
> +                                         uint16_t n_valid, uint16_t n_invalid,
> +                                         Error **errp)
>  {
>      abort();
>  }
> 
>
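
The return-value convention in this patch can be sketched outside of QEMU as follows. This is only an illustration under assumptions: `write_chunk()` is a hypothetical helper, not the patched kvmppc_load_htab_chunk(), and it leaves out the Error **errp reporting. It shows the two failure cases: a failed write() returns the negated errno, and a short write, which sets no errno, returns an explicitly chosen code (-ENOSPC here, as in the patch).

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: write one chunk and report failure
 * as a negative errno value rather than a bare -1.
 */
static int write_chunk(int fd, const void *buf, size_t len)
{
    ssize_t rc = write(fd, buf, len);

    if (rc < 0) {
        /* Propagate the reason reported by the kernel, e.g. -EBADF. */
        return -errno;
    }
    if ((size_t)rc != len) {
        /* Short write: errno is not set, so a code is picked explicitly. */
        return -ENOSPC;
    }
    return 0;
}
```

A caller that interprets the result as a negative errno then gets a meaningful reason instead of -1, which would read as EPERM.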
Greg Kurz Oct. 27, 2020, 7:32 a.m. UTC | #7
On Tue, 27 Oct 2020 12:56:40 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Oct 26, 2020 at 03:46:47PM +0100, Greg Kurz wrote:
> > On Mon, 26 Oct 2020 14:43:08 +0100
> > Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> > 
> > > On 10/26/20 1:40 PM, Greg Kurz wrote:
> > > > qemu_memalign() aborts if OOM. Drop some dead code.
> > > > 
> > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > ---
> > > >   hw/ppc/spapr.c       |    6 ------
> > > >   hw/ppc/spapr_hcall.c |    8 ++------
> > > >   2 files changed, 2 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index 0cc19b5863a4..f098d0ee6d98 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -1521,12 +1521,6 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
> > > >           int i;
> > > >   
> > > >           spapr->htab = qemu_memalign(size, size);
> > > > -        if (!spapr->htab) {
> > > > -            error_setg_errno(errp, errno,
> > > > -                             "Could not allocate HPT of order %d", shift);
> > > > -            return;
> > > 
> > > Wasn't the idea to use qemu_try_memalign() here?
> > > 
> > 
> > Well... I have mixed feelings about this. The HTAB was first
> > introduced by commit:
> > 
> > commit f43e35255cffb6ac6230dd09d308f7909f823f96
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Fri Apr 1 15:15:22 2011 +1100
> > 
> >     Virtual hash page table handling on pSeries machine
> > 
> > using qemu_mallocz(), which was aborting on OOM. It then got
> > replaced by g_malloc0() when qemu_mallocz() got deprecated
> > and eventually by qemu_memalign() when KVM support was added.
> > 
> > Surviving OOM when allocating the HTAB never seemed to be an
> > option until this commit that introduced the check:
> > 
> > commit c5f54f3e31bf693f70a98d4d73ea5dbe05689857
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Tue Feb 9 10:21:56 2016 +1000
> > 
> >     pseries: Move hash page table allocation to reset time
> > 
> > I don't really see in the patch and in the changelog an obvious
> > desire to try to handle OOM.
> 
> 
> This one is probably ok.  AFAICT all failures returned here would be
> more or less fatal in the caller, one way or another (&error_fatal in
> two cases, and failure to load an incoming migration stream in the
> other).
> 
> > > > -        }
> > > > -
> > > >           memset(spapr->htab, 0, size);
> > > >           spapr->htab_shift = shift;
> > > >   
> > > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > > > index 607740150fa2..34e146f628fb 100644
> > > > --- a/hw/ppc/spapr_hcall.c
> > > > +++ b/hw/ppc/spapr_hcall.c
> > > > @@ -361,12 +361,8 @@ static void *hpt_prepare_thread(void *opaque)
> > > >       size_t size = 1ULL << pending->shift;
> > > >   
> > > >       pending->hpt = qemu_memalign(size, size);
> > > > -    if (pending->hpt) {
> > > > -        memset(pending->hpt, 0, size);
> > > > -        pending->ret = H_SUCCESS;
> > > > -    } else {
> > > > -        pending->ret = H_NO_MEM;
> > > 
> > > Ditto.
> > > 
> > 
> > This one was introduced by commit:
> > 
> > commit 0b0b831016ae93bc14698a5d7202eb77feafea75
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Fri May 12 15:46:49 2017 +1000
> > 
> >     pseries: Implement HPT resizing
> > 
> > I agree that maybe the intent here could have been to use qemu_try_memalign(),
> > but again I don't quite see any strong justification to handle OOM in the
> > changelog.
> > 
> > David,
> > 
> > Any insight to share?
> 
> Aborting on an HPT resize failure is definitely not ok, though.  This
> one needs to be a qemu_try_memalign().
> 

Ok, I'll fix that.
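
For illustration only (this is not part of the posted series), the pattern David asks for in the resize path can be sketched in plain C. Assumptions: posix_memalign() stands in for a non-aborting allocator such as qemu_try_memalign(), and `demo_hpt_prepare()`, `DEMO_SUCCESS` and `DEMO_NO_MEM` are hypothetical names standing in for the hpt_prepare_thread() logic and the H_SUCCESS / H_NO_MEM results.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes standing in for H_SUCCESS / H_NO_MEM. */
enum { DEMO_SUCCESS = 0, DEMO_NO_MEM = 1 };

/*
 * Allocate a zeroed table of 2^shift bytes, aligned to its own size.
 * On allocation failure, report DEMO_NO_MEM to the caller instead of
 * terminating the process.
 */
static int demo_hpt_prepare(int shift, void **hpt)
{
    size_t size;
    void *p = NULL;

    /* Assumption: shift >= 4 so the alignment is a multiple of sizeof(void *). */
    assert(shift >= 4 && (size_t)shift < sizeof(size_t) * 8);
    size = (size_t)1 << shift;

    if (posix_memalign(&p, size, size) != 0) {
        *hpt = NULL;
        return DEMO_NO_MEM;
    }

    memset(p, 0, size);
    *hpt = p;
    return DEMO_SUCCESS;
}
```

The point of the non-aborting variant is that a failed resize leaves the running guest intact and simply reports the failure back.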