Message ID | 160371602625.305923.7832478283946753271.stgit@bahia.lan |
---|---|
Series | spapr: Error handling fixes and cleanups (round 5) |
Heh... this is round 4 actually :)

On Mon, 26 Oct 2020 13:40:26 +0100
Greg Kurz <groug@kaod.org> wrote:

> Hi,
>
> This is the last round I had on my queue for 5.2. Basically ensuring that
> meaningful negative errnos get propagated to VMState, with some fairly
> simple cleanups on the way.
>
> ---
>
> Greg Kurz (4):
>       spapr: qemu_memalign() doesn't return NULL
>       spapr: Use error_append_hint() in spapr_reallocate_hpt()
>       target/ppc: Fix kvmppc_load_htab_chunk() error reporting
>       spapr: Improve spapr_reallocate_hpt() error reporting
>
>
>  hw/ppc/spapr.c         |   36 ++++++++++++++++++------------------
>  hw/ppc/spapr_hcall.c   |    8 ++------
>  include/hw/ppc/spapr.h |    3 +--
>  target/ppc/kvm.c       |   11 +++++------
>  target/ppc/kvm_ppc.h   |    5 +++--
>  5 files changed, 29 insertions(+), 34 deletions(-)
>
> --
> Greg
>

On 10/26/20 1:40 PM, Greg Kurz wrote:
> qemu_memalign() aborts if OOM. Drop some dead code.
>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/ppc/spapr.c       |    6 ------
>  hw/ppc/spapr_hcall.c |    8 ++------
>  2 files changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0cc19b5863a4..f098d0ee6d98 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1521,12 +1521,6 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
>          int i;
>
>          spapr->htab = qemu_memalign(size, size);
> -        if (!spapr->htab) {
> -            error_setg_errno(errp, errno,
> -                             "Could not allocate HPT of order %d", shift);
> -            return;

Wasn't the idea to use qemu_try_memalign() here?

> -        }
> -
>          memset(spapr->htab, 0, size);
>          spapr->htab_shift = shift;
>
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 607740150fa2..34e146f628fb 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -361,12 +361,8 @@ static void *hpt_prepare_thread(void *opaque)
>      size_t size = 1ULL << pending->shift;
>
>      pending->hpt = qemu_memalign(size, size);
> -    if (pending->hpt) {
> -        memset(pending->hpt, 0, size);
> -        pending->ret = H_SUCCESS;
> -    } else {
> -        pending->ret = H_NO_MEM;

Ditto.

> -    }
> +    memset(pending->hpt, 0, size);
> +    pending->ret = H_SUCCESS;
>
>      qemu_mutex_lock_iothread();
>

On Mon, 26 Oct 2020 14:43:08 +0100
Philippe Mathieu-Daudé <philmd@redhat.com> wrote:

> On 10/26/20 1:40 PM, Greg Kurz wrote:
> > qemu_memalign() aborts if OOM. Drop some dead code.
> >
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  hw/ppc/spapr.c       |    6 ------
> >  hw/ppc/spapr_hcall.c |    8 ++------
> >  2 files changed, 2 insertions(+), 12 deletions(-)
> >
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 0cc19b5863a4..f098d0ee6d98 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1521,12 +1521,6 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
> >          int i;
> >
> >          spapr->htab = qemu_memalign(size, size);
> > -        if (!spapr->htab) {
> > -            error_setg_errno(errp, errno,
> > -                             "Could not allocate HPT of order %d", shift);
> > -            return;
>
> Wasn't the idea to use qemu_try_memalign() here?
>

Well... I have mixed feelings about this. The HTAB was first
introduced by commit:

commit f43e35255cffb6ac6230dd09d308f7909f823f96
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Fri Apr 1 15:15:22 2011 +1100

    Virtual hash page table handling on pSeries machine

using qemu_mallocz(), which was aborting on OOM. It then got
replaced by g_malloc0() when qemu_mallocz() got deprecated
and eventually by qemu_memalign() when KVM support was added.

Surviving OOM when allocating the HTAB never seemed to be an
option until this commit that introduced the check:

commit c5f54f3e31bf693f70a98d4d73ea5dbe05689857
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Tue Feb 9 10:21:56 2016 +1000

    pseries: Move hash page table allocation to reset time

I don't really see in the patch and in the changelog an obvious
desire to try to handle OOM.

> > -        }
> > -
> >          memset(spapr->htab, 0, size);
> >          spapr->htab_shift = shift;
> >
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index 607740150fa2..34e146f628fb 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -361,12 +361,8 @@ static void *hpt_prepare_thread(void *opaque)
> >      size_t size = 1ULL << pending->shift;
> >
> >      pending->hpt = qemu_memalign(size, size);
> > -    if (pending->hpt) {
> > -        memset(pending->hpt, 0, size);
> > -        pending->ret = H_SUCCESS;
> > -    } else {
> > -        pending->ret = H_NO_MEM;
>
> Ditto.
>

This one was introduced by commit:

commit 0b0b831016ae93bc14698a5d7202eb77feafea75
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Fri May 12 15:46:49 2017 +1000

    pseries: Implement HPT resizing

I agree that maybe the intent here could have been to use qemu_try_memalign(),
but again I don't quite see any strong justification to handle OOM in the
changelog.

David,

Any insight to share?

> > -    }
> > +    memset(pending->hpt, 0, size);
> > +    pending->ret = H_SUCCESS;
> >
> >      qemu_mutex_lock_iothread();
> >
>

On Mon, Oct 26, 2020 at 03:46:47PM +0100, Greg Kurz wrote:
> On Mon, 26 Oct 2020 14:43:08 +0100
> Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
> > On 10/26/20 1:40 PM, Greg Kurz wrote:
> > > qemu_memalign() aborts if OOM. Drop some dead code.
> > >
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > ---
> > >  hw/ppc/spapr.c       |    6 ------
> > >  hw/ppc/spapr_hcall.c |    8 ++------
> > >  2 files changed, 2 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 0cc19b5863a4..f098d0ee6d98 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1521,12 +1521,6 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
> > >          int i;
> > >
> > >          spapr->htab = qemu_memalign(size, size);
> > > -        if (!spapr->htab) {
> > > -            error_setg_errno(errp, errno,
> > > -                             "Could not allocate HPT of order %d", shift);
> > > -            return;
> >
> > Wasn't the idea to use qemu_try_memalign() here?
> >
>
> Well... I have mixed feelings about this. The HTAB was first
> introduced by commit:
>
> commit f43e35255cffb6ac6230dd09d308f7909f823f96
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Fri Apr 1 15:15:22 2011 +1100
>
>     Virtual hash page table handling on pSeries machine
>
> using qemu_mallocz(), which was aborting on OOM. It then got
> replaced by g_malloc0() when qemu_mallocz() got deprecated
> and eventually by qemu_memalign() when KVM support was added.
>
> Surviving OOM when allocating the HTAB never seemed to be an
> option until this commit that introduced the check:
>
> commit c5f54f3e31bf693f70a98d4d73ea5dbe05689857
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Tue Feb 9 10:21:56 2016 +1000
>
>     pseries: Move hash page table allocation to reset time
>
> I don't really see in the patch and in the changelog an obvious
> desire to try to handle OOM.

This one is probably ok.  AFAICT all failures returned here would be
more or less fatal in the caller, one way or another (&error_fatal in
two cases, and failure to load an incoming migration stream in the
other).

> > > -        }
> > > -
> > >          memset(spapr->htab, 0, size);
> > >          spapr->htab_shift = shift;
> > >
> > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > > index 607740150fa2..34e146f628fb 100644
> > > --- a/hw/ppc/spapr_hcall.c
> > > +++ b/hw/ppc/spapr_hcall.c
> > > @@ -361,12 +361,8 @@ static void *hpt_prepare_thread(void *opaque)
> > >      size_t size = 1ULL << pending->shift;
> > >
> > >      pending->hpt = qemu_memalign(size, size);
> > > -    if (pending->hpt) {
> > > -        memset(pending->hpt, 0, size);
> > > -        pending->ret = H_SUCCESS;
> > > -    } else {
> > > -        pending->ret = H_NO_MEM;
> >
> > Ditto.
> >
>
> This one was introduced by commit:
>
> commit 0b0b831016ae93bc14698a5d7202eb77feafea75
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Fri May 12 15:46:49 2017 +1000
>
>     pseries: Implement HPT resizing
>
> I agree that maybe the intent here could have been to use qemu_try_memalign(),
> but again I don't quite see any strong justification to handle OOM in the
> changelog.
>
> David,
>
> Any insight to share?

Aborting on an HPT resize failure is definitely not ok, though.  This
one needs to be a qemu_try_memalign().

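
For readers following the thread, here is a minimal standalone sketch of the behaviour David asks for in the resize path: a fallible allocation whose failure is reported to the guest as H_NO_MEM rather than aborting. It does not use QEMU's headers. `try_memalign()` is a hypothetical stand-in for `qemu_try_memalign()` built on `posix_memalign()`, `struct pending_hpt` and `hpt_prepare()` are simplified stand-ins for the structures in `hw/ppc/spapr_hcall.c`, and the numeric return codes are defined locally for illustration only.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative values only; the real codes live in QEMU's spapr headers. */
#define H_SUCCESS 0
#define H_NO_MEM  (-9)

/* Hypothetical stand-in for qemu_try_memalign(): NULL on failure, no abort. */
static void *try_memalign(size_t alignment, size_t size)
{
    void *ptr = NULL;

    if (posix_memalign(&ptr, alignment, size) != 0) {
        return NULL;
    }
    return ptr;
}

/* Simplified stand-in for the pending-HPT bookkeeping in the patch. */
struct pending_hpt {
    int shift;   /* log2 of the requested HPT size */
    void *hpt;   /* allocated table, or NULL */
    long ret;    /* hcall status handed back to the guest */
};

static void hpt_prepare(struct pending_hpt *pending)
{
    size_t size;

    /* Precondition: the shift must fit in size_t. */
    assert(pending->shift > 0 &&
           (size_t)pending->shift < 8 * sizeof(size_t));
    size = (size_t)1 << pending->shift;

    pending->hpt = try_memalign(size, size);
    if (pending->hpt) {
        memset(pending->hpt, 0, size);
        pending->ret = H_SUCCESS;
    } else {
        /* Allocation failed: tell the guest instead of aborting. */
        pending->ret = H_NO_MEM;
    }
}
```

With this shape the original `if (pending->hpt)` branch stays meaningful, because the allocator can actually return NULL.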
On Mon, Oct 26, 2020 at 01:40:40PM +0100, Greg Kurz wrote:
> Hints should be added with the dedicated error_append_hint() API
> because we don't want to print them when using QMP. This requires
> inserting ERRP_GUARD as explained in "qapi/error.h".
>
> Signed-off-by: Greg Kurz <groug@kaod.org>

Applied to ppc-for-5.2.

> ---
>  hw/ppc/spapr.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f098d0ee6d98..f51b663f7dcb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1486,6 +1486,7 @@ void spapr_free_hpt(SpaprMachineState *spapr)
>  void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
>                            Error **errp)
>  {
> +    ERRP_GUARD();
>      long rc;
>
>      /* Clean up any HPT info from a previous boot */
> @@ -1500,17 +1501,18 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
>
>      if (rc < 0) {
>          /* kernel-side HPT needed, but couldn't allocate one */
> -        error_setg_errno(errp, errno,
> -                         "Failed to allocate KVM HPT of order %d (try smaller maxmem?)",
> +        error_setg_errno(errp, errno, "Failed to allocate KVM HPT of order %d",
>                           shift);
> +        error_append_hint(errp, "Try smaller maxmem?\n");
>          /* This is almost certainly fatal, but if the caller really
>           * wants to carry on with shift == 0, it's welcome to try */
>      } else if (rc > 0) {
>          /* kernel-side HPT allocated */
>          if (rc != shift) {
>              error_setg(errp,
> -                       "Requested order %d HPT, but kernel allocated order %ld (try smaller maxmem?)",
> +                       "Requested order %d HPT, but kernel allocated order %ld",
>                         shift, rc);
> +            error_append_hint(errp, "Try smaller maxmem?\n");
>          }
>
>          spapr->htab_shift = shift;
>
>

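
The reason for splitting the hint out of the message can be illustrated with a small standalone sketch. This is not QEMU's Error API: `struct mini_error` and the `mini_error_*()` helpers are hypothetical names invented here, and the `for_machine` flag only models the distinction the commit message draws between human-facing output and QMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature error object: message and hint are kept apart. */
struct mini_error {
    char msg[128];
    char hint[128];
};

static void mini_error_set(struct mini_error *err, const char *msg)
{
    assert(err != NULL && msg != NULL);
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    err->hint[0] = '\0';
}

/* Hints accumulate separately and never become part of the message. */
static void mini_error_append_hint(struct mini_error *err, const char *hint)
{
    size_t used = strlen(err->hint);

    snprintf(err->hint + used, sizeof(err->hint) - used, "%s", hint);
}

/*
 * Human-facing output gets the message followed by the hint;
 * machine-facing output gets the message alone.
 */
static void mini_error_format(const struct mini_error *err, bool for_machine,
                              char *out, size_t outlen)
{
    if (for_machine) {
        snprintf(out, outlen, "%s", err->msg);
    } else {
        snprintf(out, outlen, "%s\n%s", err->msg, err->hint);
    }
}
```

Embedding "(try smaller maxmem?)" in the message string, as the old code did, would make it impossible to drop the advice for machine consumers.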
On Mon, Oct 26, 2020 at 01:40:47PM +0100, Greg Kurz wrote:
> If kvmppc_load_htab_chunk() fails, its return value is propagated up
> to vmstate_load(). It should thus be a negative errno, not -1 (which
> maps to EPERM and would lure the user into thinking that the problem
> is necessarily related to a lack of privilege).
>
> Return the error reported by KVM or ENOSPC in case of short write.
> While here, propagate the error message through an @errp argument
> and have the caller print it with error_report_err() instead
> of relying on fprintf().
>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/ppc/spapr.c       |    4 +++-
>  target/ppc/kvm.c     |   11 +++++------
>  target/ppc/kvm_ppc.h |    5 +++--
>  3 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f51b663f7dcb..ff7de7da2875 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2341,8 +2341,10 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
>
>              assert(fd >= 0);
>
> -            rc = kvmppc_load_htab_chunk(f, fd, index, n_valid, n_invalid);
> +            rc = kvmppc_load_htab_chunk(f, fd, index, n_valid, n_invalid,
> +                                        &local_err);
>              if (rc < 0) {
> +                error_report_err(local_err);
>                  return rc;
>              }
>          }
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index d85ba8ffe00b..0223b93ea561 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2683,7 +2683,7 @@ int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns)
>  }
>
>  int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
> -                           uint16_t n_valid, uint16_t n_invalid)
> +                           uint16_t n_valid, uint16_t n_invalid, Error **errp)
>  {
>      struct kvm_get_htab_header *buf;
>      size_t chunksize = sizeof(*buf) + n_valid * HASH_PTE_SIZE_64;
> @@ -2698,14 +2698,13 @@ int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
>
>      rc = write(fd, buf, chunksize);
>      if (rc < 0) {
> -        fprintf(stderr, "Error writing KVM hash table: %s\n",
> -                strerror(errno));
> -        return rc;
> +        error_setg_errno(errp, errno, "Error writing the KVM hash table");
> +        return -errno;
>      }
>      if (rc != chunksize) {
>          /* We should never get a short write on a single chunk */
> -        fprintf(stderr, "Short write, restoring KVM hash table\n");
> -        return -1;
> +        error_setg(errp, "Short write while restoring the KVM hash table");
> +        return -ENOSPC;

I'm not entirely sure -ENOSPC is the right choice here - this
indicates that the kernel interface is not behaving as we expect.  But
I can't immediately think of what's a better choice, so, applied to
ppc-for-5.2.

>      }
>      return 0;
>  }
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 72e05f1cd2fc..73ce2bc95114 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -56,7 +56,7 @@ int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
>  int kvmppc_get_htab_fd(bool write, uint64_t index, Error **errp);
>  int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns);
>  int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
> -                           uint16_t n_valid, uint16_t n_invalid);
> +                           uint16_t n_valid, uint16_t n_invalid, Error **errp);
>  void kvmppc_read_hptes(ppc_hash_pte64_t *hptes, hwaddr ptex, int n);
>  void kvmppc_write_hpte(hwaddr ptex, uint64_t pte0, uint64_t pte1);
>  bool kvmppc_has_cap_fixup_hcalls(void);
> @@ -316,7 +316,8 @@ static inline int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize,
>  }
>
>  static inline int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
> -                                         uint16_t n_valid, uint16_t n_invalid)
> +                                         uint16_t n_valid, uint16_t n_invalid,
> +                                         Error **errp)
>  {
>      abort();
>  }
>
>

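
The return-value convention this patch adopts can be shown in isolation. The sketch below is plain POSIX C, not QEMU code: `write_chunk()` is a hypothetical helper name, and it leaves out the `Error **errp` plumbing to focus on the return value. It returns `-errno` when `write()` fails and `-ENOSPC` on a short write, mirroring the patch's choice (which David notes is debatable).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one chunk and return 0 on success or a
 * negative errno value on failure, never a bare -1.
 */
static int write_chunk(int fd, const void *buf, size_t len)
{
    ssize_t rc;

    assert(buf != NULL);

    rc = write(fd, buf, len);
    if (rc < 0) {
        /* Propagate the real cause rather than write()'s -1. */
        return -errno;
    }
    if ((size_t)rc != len) {
        /* Short write: -ENOSPC follows the patch; other codes are arguable. */
        return -ENOSPC;
    }
    return 0;
}
```

A caller that passes the result to `strerror(-rc)` then sees the actual cause, for example "Bad file descriptor" for an invalid fd, instead of the misleading permission error that a raw -1 would suggest.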
On Tue, 27 Oct 2020 12:56:40 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Oct 26, 2020 at 03:46:47PM +0100, Greg Kurz wrote:
> > On Mon, 26 Oct 2020 14:43:08 +0100
> > Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> >
> > > On 10/26/20 1:40 PM, Greg Kurz wrote:
> > > > qemu_memalign() aborts if OOM. Drop some dead code.
> > > >
> > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > ---
> > > >  hw/ppc/spapr.c       |    6 ------
> > > >  hw/ppc/spapr_hcall.c |    8 ++------
> > > >  2 files changed, 2 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index 0cc19b5863a4..f098d0ee6d98 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -1521,12 +1521,6 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
> > > >          int i;
> > > >
> > > >          spapr->htab = qemu_memalign(size, size);
> > > > -        if (!spapr->htab) {
> > > > -            error_setg_errno(errp, errno,
> > > > -                             "Could not allocate HPT of order %d", shift);
> > > > -            return;
> > >
> > > Wasn't the idea to use qemu_try_memalign() here?
> > >
> >
> > Well... I have mixed feelings about this. The HTAB was first
> > introduced by commit:
> >
> > commit f43e35255cffb6ac6230dd09d308f7909f823f96
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Fri Apr 1 15:15:22 2011 +1100
> >
> >     Virtual hash page table handling on pSeries machine
> >
> > using qemu_mallocz(), which was aborting on OOM. It then got
> > replaced by g_malloc0() when qemu_mallocz() got deprecated
> > and eventually by qemu_memalign() when KVM support was added.
> >
> > Surviving OOM when allocating the HTAB never seemed to be an
> > option until this commit that introduced the check:
> >
> > commit c5f54f3e31bf693f70a98d4d73ea5dbe05689857
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Tue Feb 9 10:21:56 2016 +1000
> >
> >     pseries: Move hash page table allocation to reset time
> >
> > I don't really see in the patch and in the changelog an obvious
> > desire to try to handle OOM.
>
> This one is probably ok.  AFAICT all failures returned here would be
> more or less fatal in the caller, one way or another (&error_fatal in
> two cases, and failure to load an incoming migration stream in the
> other).
>
> > > > -        }
> > > > -
> > > >          memset(spapr->htab, 0, size);
> > > >          spapr->htab_shift = shift;
> > > >
> > > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > > > index 607740150fa2..34e146f628fb 100644
> > > > --- a/hw/ppc/spapr_hcall.c
> > > > +++ b/hw/ppc/spapr_hcall.c
> > > > @@ -361,12 +361,8 @@ static void *hpt_prepare_thread(void *opaque)
> > > >      size_t size = 1ULL << pending->shift;
> > > >
> > > >      pending->hpt = qemu_memalign(size, size);
> > > > -    if (pending->hpt) {
> > > > -        memset(pending->hpt, 0, size);
> > > > -        pending->ret = H_SUCCESS;
> > > > -    } else {
> > > > -        pending->ret = H_NO_MEM;
> > >
> > > Ditto.
> > >
> >
> > This one was introduced by commit:
> >
> > commit 0b0b831016ae93bc14698a5d7202eb77feafea75
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Fri May 12 15:46:49 2017 +1000
> >
> >     pseries: Implement HPT resizing
> >
> > I agree that maybe the intent here could have been to use qemu_try_memalign(),
> > but again I don't quite see any strong justification to handle OOM in the
> > changelog.
> >
> > David,
> >
> > Any insight to share?
>
> Aborting on an HPT resize failure is definitely not ok, though.  This
> one needs to be a qemu_try_memalign().
>

Ok, I'll fix that.