[0/5] migration/postcopy: Sync faulted addresses after network recovered

Message ID 20200903152646.93336-1-peterx@redhat.com

Message

Peter Xu Sept. 3, 2020, 3:26 p.m. UTC
We've seen intermittent guest hangs on the destination VM after a postcopy
recovery; the hang resolves itself after a few minutes.

The problem is that after a postcopy recovery the prioritized postcopy page
request queue on the source VM is lost.  All the threads that faulted before
the recovery therefore stay halted until the pages they need happen to be
copied over by the background precopy migration stream.

The solution is to refresh this information after postcopy recovery too.  To
achieve this, we maintain a list of faulted addresses on the destination
node, so that the list can be resent when necessary.  This work is done in
patches 1-4.
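
A minimal sketch of the tracking idea, using plain glib; the struct and
function names below are illustrative rather than the exact ones used in
the patches:

    #include <glib.h>

    /* Illustrative per-incoming-migration state; the series keeps
     * similar fields in MigrationIncomingState. */
    typedef struct {
        GTree  *page_requested;    /* faulted host address -> RAMBlock */
        GMutex  page_request_mutex;
    } IncomingPageTracker;

    static gint addr_cmp(gconstpointer a, gconstpointer b, gpointer d)
    {
        return (a < b) ? -1 : ((a > b) ? 1 : 0);
    }

    static void tracker_init(IncomingPageTracker *t)
    {
        t->page_requested = g_tree_new_full(addr_cmp, NULL, NULL, NULL);
        g_mutex_init(&t->page_request_mutex);
    }

    /* Remember a faulted address at the moment a page request is
     * sent to the source. */
    static void track_page_request(IncomingPageTracker *t,
                                   void *host_addr, void *rb)
    {
        g_mutex_lock(&t->page_request_mutex);
        g_tree_insert(t->page_requested, host_addr, rb);
        g_mutex_unlock(&t->page_request_mutex);
    }

    /* Forget the address once the page has landed (e.g. right after a
     * successful UFFDIO_COPY), so only still-missing pages remain. */
    static void untrack_page_copied(IncomingPageTracker *t,
                                    void *host_addr)
    {
        g_mutex_lock(&t->page_request_mutex);
        g_tree_remove(t->page_requested, host_addr);
        g_mutex_unlock(&t->page_request_mutex);
    }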

With that in place, the last thing we need to do is send this extra
information to the source VM after recovery.  Luckily, this synchronization
can be "emulated" by sending a batch of page requests to the source VM (even
though those pages were sent before!), exactly as if a page fault had just
occurred.  Even the first version of the postcopy code handles duplicated
pages well, so this fix needs no new capability bit and works smoothly when
migrating from old QEMUs to new ones.
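
Concretely, the resend step reduces to walking the tracked addresses once
the channels reconnect and firing a normal page request for each.  A rough
sketch, continuing the illustrative names from the tracking sketch above
(host_to_ram_offset() is a hypothetical helper, not a real QEMU function):

    /* GTraverseFunc: re-request one recorded page.  Returning FALSE
     * keeps the traversal going.  Duplicated requests are harmless:
     * the source simply sends the page again. */
    static gboolean resend_one(gpointer host_addr, gpointer rb,
                               gpointer opaque)
    {
        MigrationIncomingState *mis = opaque;
        ram_addr_t offset = host_to_ram_offset(rb, host_addr);

        migrate_send_rp_message_req_pages(mis, rb, offset);
        return FALSE;
    }

    /* Called on the destination once the migration channels are
     * re-established after a network failure. */
    static void resend_faulted_pages(MigrationIncomingState *mis,
                                     IncomingPageTracker *t)
    {
        g_mutex_lock(&t->page_request_mutex);
        g_tree_foreach(t->page_requested, resend_one, mis);
        g_mutex_unlock(&t->page_request_mutex);
    }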

Please review, thanks.

Peter Xu (5):
  migration: Rework migrate_send_rp_req_pages() function
  migration: Introduce migrate_send_rp_message_req_pages()
  migration: Pass incoming state into qemu_ufd_copy_ioctl()
  migration: Maintain postcopy faulted addresses
  migration: Sync requested pages after postcopy recovery

 migration/migration.c    | 71 +++++++++++++++++++++++++++++++++++-----
 migration/migration.h    | 23 +++++++++++--
 migration/postcopy-ram.c | 46 +++++++++++---------------
 migration/savevm.c       | 56 +++++++++++++++++++++++++++++++
 migration/trace-events   |  3 ++
 5 files changed, 163 insertions(+), 36 deletions(-)

-- 
2.26.2

Comments

Dr. David Alan Gilbert Sept. 8, 2020, 9:57 a.m. UTC | #1
* Peter Xu (peterx@redhat.com) wrote:
> This is another layer wrapper for sending a page request to the source VM,

Ah, it's not obvious why this is needed until 4/5 :-)


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 10 ++++++++--
>  migration/migration.h |  2 ++
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 6761e3f233..6b43ffddbd 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -316,8 +316,8 @@ error:
>   *   Start: Address offset within the RB
>   *   Len: Length in bytes required - must be a multiple of pagesize
>   */
> -int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
> -                              ram_addr_t start)
> +int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
> +                                      RAMBlock *rb, ram_addr_t start)
>  {
>      uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
>      size_t msglen = 12; /* start + len */
> @@ -353,6 +353,12 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
>      return migrate_send_rp_message(mis, msg_type, msglen, bufc);
>  }
>  
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis,
> +                              RAMBlock *rb, ram_addr_t start)
> +{
> +    return migrate_send_rp_message_req_pages(mis, rb, start);
> +}
> +
>  static bool migration_colo_enabled;
>  bool migration_incoming_colo_enabled(void)
>  {
> diff --git a/migration/migration.h b/migration/migration.h
> index ca8dc4c773..f552725305 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -330,6 +330,8 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
>  int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
>                                ram_addr_t start);
> +int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
> +                                      RAMBlock *rb, ram_addr_t start);
>  void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>                                   char *block_name);
>  void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
> -- 
> 2.26.2
>
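
For context, the message body packed into bufc above is: an 8-byte
big-endian start offset, a 4-byte big-endian length, and, when the
RAMBlock name has to be sent along, a 1-byte name length followed by up
to 255 bytes of name.  A standalone sketch of that layout; illustrative
only, not the exact code from migration.c:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static void put_be64(uint8_t *p, uint64_t v)
    {
        for (int i = 0; i < 8; i++) {
            p[i] = (uint8_t)(v >> (56 - 8 * i));
        }
    }

    static void put_be32(uint8_t *p, uint32_t v)
    {
        for (int i = 0; i < 4; i++) {
            p[i] = (uint8_t)(v >> (24 - 8 * i));
        }
    }

    /* Pack a page-request body: start (8) + len (4), optionally
     * followed by a 1-byte name length and the RAMBlock name.
     * Returns the total message length in bytes. */
    static size_t pack_req_pages(uint8_t *buf, uint64_t start,
                                 uint32_t len, const char *rbname)
    {
        size_t msglen = 12;            /* start + len */

        put_be64(buf, start);
        put_be32(buf + 8, len);
        if (rbname) {                  /* NULL when name is implied */
            size_t l = strlen(rbname); /* must fit in one byte */
            buf[msglen++] = (uint8_t)l;
            memcpy(buf + msglen, rbname, l);
            msglen += l;
        }
        return msglen;
    }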
Peter Xu Sept. 8, 2020, 8:20 p.m. UTC | #2
On Tue, Sep 08, 2020 at 10:57:57AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > This is another layer wrapper for sending a page request to the source VM,
> 
> Ah, it's not obvious why this is needed until 4/5 :-)

Yeah. :) I'll try to move this patch to before that one in the next version.

> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!