Message ID | 20201102153010.11979-2-peterx@redhat.com |
---|---|
State | New |
Series | migration: Two extra fixes |
* Peter Xu (peterx@redhat.com) wrote:
> When postcopy recover happens, we need to reset last_rb after each return of
> postcopy_pause_fault_thread() because that means we just got the postcopy
> migration continued.
>
> Unify this reset to the place right before we want to kick the fault thread
> again, when we get the command MIG_CMD_POSTCOPY_RESUME from source.
>
> This is actually more than that - because the main thread on destination will
> now be able to call migrate_send_rp_req_pages_pending() too, so the fault
> thread is not the only user of last_rb now. Move the reset earlier will allow
> the first call to migrate_send_rp_req_pages_pending() to use the reset value
> even if called from the main thread.
>
> (NOTE: this is not a real fix to 0c26781c09 mentioned below, however it is just
> a mark that when picking up 0c26781c09 we'd better have this one too; the real
> fix will come later)
>
> Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery")
> Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/postcopy-ram.c | 2 --
>  migration/savevm.c       | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index d3bb3a744b..d99842eb1b 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -903,7 +903,6 @@ static void *postcopy_ram_fault_thread(void *opaque)
>               * the channel is rebuilt.
>               */
>              if (postcopy_pause_fault_thread(mis)) {
> -                mis->last_rb = NULL;
>                  /* Continue to read the userfaultfd */
>              } else {
>                  error_report("%s: paused but don't allow to continue",
> @@ -985,7 +984,6 @@ retry:
>                  /* May be network failure, try to wait for recovery */
>                  if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
>                      /* We got reconnected somehow, try to continue */
> -                    mis->last_rb = NULL;
>                      goto retry;
>                  } else {
>                      /* This is a unavoidable fault */
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 21ccba9fb3..e8834991ec 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2061,6 +2061,12 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>          return 0;
>      }
>
> +    /*
> +     * Reset the last_rb before we resend any page req to source again, since
> +     * the source should have it reset already.
> +     */
> +    mis->last_rb = NULL;
> +
>      /*
>       * This means source VM is ready to resume the postcopy migration.
>       * It's time to switch state and release the fault thread to
> --
> 2.26.2
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d3bb3a744b..d99842eb1b 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -903,7 +903,6 @@ static void *postcopy_ram_fault_thread(void *opaque)
              * the channel is rebuilt.
              */
             if (postcopy_pause_fault_thread(mis)) {
-                mis->last_rb = NULL;
                 /* Continue to read the userfaultfd */
             } else {
                 error_report("%s: paused but don't allow to continue",
@@ -985,7 +984,6 @@ retry:
                 /* May be network failure, try to wait for recovery */
                 if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
                     /* We got reconnected somehow, try to continue */
-                    mis->last_rb = NULL;
                     goto retry;
                 } else {
                     /* This is a unavoidable fault */
diff --git a/migration/savevm.c b/migration/savevm.c
index 21ccba9fb3..e8834991ec 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2061,6 +2061,12 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
         return 0;
     }
 
+    /*
+     * Reset the last_rb before we resend any page req to source again, since
+     * the source should have it reset already.
+     */
+    mis->last_rb = NULL;
+
     /*
      * This means source VM is ready to resume the postcopy migration.
      * It's time to switch state and release the fault thread to