Message ID | alpine.DEB.2.02.1410221313370.876@kaball.uk.xensource.com |
---|---|
State | New |
On 22/10/2014 11:13 PM, Stefano Stabellini wrote:
> On Wed, 22 Oct 2014, Ian Campbell wrote:
>> On Wed, 2014-10-22 at 12:57 +0100, Stefano Stabellini wrote:
>>> On Wed, 22 Oct 2014, Ian Campbell wrote:
>>>> On Wed, 2014-10-22 at 11:59 +0200, Samuel Thibault wrote:
>>>>> On Wed 22 Oct 2014 10:00:36 +0100, Ian Campbell wrote:
>>>>>> On Wed, 2014-10-22 at 08:24 +1100, Steven Haigh wrote:
>>>>>>> As a side note to this - if I use pygrub as a bootloader vs using
>>>>>>> pvgrub, then VNC works perfectly.
>>>>>>>
>>>>>>> So, what options exist to make pvgrub behave properly for booting with
>>>>>>> VNC enabled?
>>>>>>
>>>>>> ISTR (vaguely) that way back when the backends needed to be modified to
>>>>>> cope with kexec (which is effectively what pvgrub does) by not exiting
>>>>>> when the frontend disconnects, instead sticking around waiting for a new
>>>>>> frontend, this relates somehow to the "online" key in xenstore.
>>>>>>
>>>>>> Perhaps the pvfb backend never got that treatment, which would explain
>>>>>> #2?
>>>>>
>>>>> Probably, yes.
>>>>
>>>> Adding Stefano and Anthony, since the backend in this case is in qemu.
>>>>
>>>> When the frontend disconnects and the online node == 1 then the backend
>>>> is supposed to go from Closed back to InitWait and wait for a new
>>>> connection, as opposed to shutting down. This is needed for kexec (which
>>>> pvgrub uses).
>>>>
>>>> I can see some handling of the online node in hw/xen/xen_backend.c but
>>>> it doesn't look like it would do what is needed here. I also don't see
>>>> any handling in either hw/block/xen_disk.c or hw/display/xenfb.c. Which
>>>> makes me suspect that as well as pvfb not working with kexec/pvgrub
>>>> neither does the qdisk backend, which would be unfortunate.
>>>
>>> Looking at the code in xen_backend.c, it seems that on XenbusStateClosed
>>> xen_backend is going to try to reset to XenbusStateInitialising, unless
>>> the frontend state is XenbusStateInitialising (no idea why). See:
>>> xen_be_try_reset and xen_be_check_state.
>>>
>>> Maybe it should go to XenbusStateInitWait instead?
>>
>> Possibly?
>>
>> Doesn't xen_be_check_state do that though, i.e. once you hit
>> XenbusStateInitialising you have:
>>     case XenbusStateInitialising:
>>         rc = xen_be_try_init(xendev);
>> which will push on to XenbusStateInitWait?
>>
>> There's quite a few xen_be_printf surrounding these state transitions,
>> which ought to be printed at level >= 2. How can Steven control the
>> loglevel and where would they go (/var/log/xen/qemu-dm-$domname.log?)
>
> I think that this should do:
>
> diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
> index b2cb22b..d1d5d8e 100644
> --- a/hw/xen/xen_backend.c
> +++ b/hw/xen/xen_backend.c
> @@ -50,7 +50,7 @@ const char *xen_protocol;
>
>  /* private */
>  static QTAILQ_HEAD(XenDeviceHead, XenDevice) xendevs = QTAILQ_HEAD_INITIALIZER(xendevs);
> -static int debug = 0;
> +static int debug = 9;
>
>  /* ------------------------------------------------------------- */
>

I applied this patch and posted testing packages....

For completeness, this is the DomU config:

---------- DomU Config -----------
name = "dev.vm"
memory = 8192
vcpus = 6
cpus = "1-7"
disk = [ 'phy:/dev/vg_hosting/dev.vm,xvda,w', 'file:/root/SL-65-x86_64-2013-12-05-boot.iso,xvdd:cdrom,r' ]
vif = [ 'mac=20:34:01:36:00:42, vifname=vif.dev, bridge=br0' ]
kernel = "/usr/lib/xen/boot/pv-grub-x86_64.gz"
extra = "(hd0)/boot/grub/grub.conf"
#bootloader = "pygrub"
vfb = [ 'type=vnc, vnclisten=203.4.136.1, vncdisplay=2' ]
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'
----------------------------------

Output using pv-grub:

Xen Minimal OS!
start_info: 0x19ac000(VA)
nr_pages: 0x200000
shared_inf: 0xa5d0a000(MA)
pt_base: 0x19af000(VA)
nr_pt_frames: 0x11
mfn_list: 0x9ac000(VA)
mod_start: 0x0(VA)
mod_len: 0
flags: 0x0
cmd_line: (hd0)/boot/grub/grub.conf
stack: 0x96b100-0x98b100
MM: Init
_text: 0x0(VA)
_etext: 0x7c814(VA)
_erodata: 0x98000(VA)
_edata: 0x9dd00(VA)
stack start: 0x96b100(VA)
_end: 0x9ab700(VA)
start_pfn: 19c3
max_pfn: 200000
Mapping memory range 0x1c00000 - 0x200000000
setting 0x0-0x98000 readonly
skipped 0x1000
MM: Initialise page allocator for 29bc000(29bc000)-200000000(200000000)
MM: done
Demand map pfns at 200001000-2200001000.
Heap resides at 2200002000-4200002000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0x200001000.
Initialising scheduler
Thread "Idle": pointer: 0x2200002050, stack: 0x3a10000
Thread "xenstore": pointer: 0x2200002800, stack: 0x3a20000
xenbus initialised on irq 1 mfn 0x3f46d1
Thread "shutdown": pointer: 0x2200002fb0, stack: 0x3a30000
Dummy main: start_info=0x98b200
Thread "main": pointer: 0x2200003760, stack: 0x3a40000
"main" "(hd0)/boot/grub/grub.conf"
vbd 51712 is hd0
******************* BLKFRONT for device/vbd/51712 **********
Shutting down ()
Shutdown requested: 3
Thread "shutdown" exited.
backend at /local/domain/0/backend/vbd/20/51712
125829120 sectors of 512 bytes
**************************
vbd 51760 is hd1
******************* BLKFRONT for device/vbd/51760 **********
backend at /local/domain/0/backend/qdisk/20/51760
Failed to read /local/domain/0/backend/qdisk/20/51760/feature-barrier.
436224 sectors of 512 bytes
**************************
Thread "kbdfront": pointer: 0x2200004580, stack: 0x3a30000
******************* FBFRONT for device/vfb/0 **********
******************* KBDFRONT for device/vkbd/0 **********
backend at /local/domain/0/backend/vkbd/20/0
backend at /local/domain/0/backend/vfb/20/0
/local/domain/0/backend/vkbd/20/0 connected
************************** KBDFRONT
Thread "kbdfront" exited.
/local/domain/0/backend/vfb/20/0 connected
************************** FBFRONT
((( Hit enter to boot a grub entry )))
Thread "kbdfront close": pointer: 0x2200004580, stack: 0x3a30000
close fb: backend at /local/domain/0/backend/vfb/21/0
close kbd: backend at /local/domain/0/backend/vkbd/21/0
Booting 'Scientific Linux (3.14.21-1.el6xen.x86_64)'
root (hd0)
Filesystem type is ext2fs, using whole disk
kernel /boot/vmlinuz-3.14.21-1.el6xen.x86_64 ro root=/dev/xvda rd_NO_LUKS rd_NO
_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us cras
hkernel=auto console=hvc0
Thread "kbdfront close" exited.
initrd /boot/initramfs-3.14.21-1.el6xen.x86_64.img
============= Init TPM Front ================
Tpmfront:Error Unable to read device/vtpm/0/backend-id during tpmfront initialization! error = ENOENT
Tpmfront:Info Shutting down tpmfront
close blk: backend=/local/domain/0/backend/vbd/21/51712 node=device/vbd/51712
close blk: backend=/local/domain/0/backend/qdisk/21/51760 node=device/vbd/51760
----------------------------------

This gives a VNC display on port 5902 - and the system waits at the grub
prompt (ie the timeout is never reached). I don't get a console from
this point on in either VNC or via 'xl console dev.vm'

On another note, I noticed this within the Dom0 kernel dmesg:

device vif.dev entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): vif.dev: link is not ready
xen-blkback:ring-ref 2047, event-channel 4, protocol 1 (x86_64-abi)
qemu-system-i38[3956]: segfault at 0 ip (null) sp 00007fffb4573638 error 4
xen-blkback:backend/vbd/21/51712: prepare for reconnect
br0: port 8(vif.dev) entered disabled state

I also noticed that if I pass console=tty0 on the kernel command line in
"(hd0)/boot/grub/grub.conf" - then I get the expected console - however
the grub menu timeout still fails - almost as if a keypress has been
registered and cancelled the timeout...

For example, a 'sort of' working grub.conf for the DomU that hangs at
grub, but when manually selected, works as expected:

default=0
timeout=1
splashimage=(hd0)/boot/grub/splash.xpm.gz
title Scientific Linux (3.14.21-1.el6xen.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-3.14.21-1.el6xen.x86_64 ro root=/dev/xvda rd_NO_LUKS rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto console=tty0
        initrd /boot/initramfs-3.14.21-1.el6xen.x86_64.img
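
As an aside on the backend state handling quoted at the top of this message: the rule Ian describes (frontend closes, the "online" node is 1, backend goes back to InitWait instead of shutting down) can be written down as a toy model. This is only a sketch in Python, not QEMU code. The helper name next_backend_state is made up for illustration, and the numeric values are the XenbusState values as I understand them from Xen's public io/xenbus.h.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    # Values assumed to match Xen's public io/xenbus.h.
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6


def next_backend_state(backend, frontend, online):
    """Hypothetical helper: where the backend should end up.

    Only the case discussed in this thread is modelled: the frontend
    has closed (for example pvgrub kexec'ing into the real kernel).
    """
    if frontend in (XenbusState.CLOSING, XenbusState.CLOSED):
        if online:
            # Device is still wanted: wait for a new frontend to connect.
            return XenbusState.INIT_WAIT
        # Device is being removed: stay closed so it can be torn down.
        return XenbusState.CLOSED
    # Everything else is out of scope for this sketch: keep the state.
    return backend
```

With online set, a frontend going to Closed yields InitWait, which is the behaviour a kexec'ing guest needs; whether qemu's xen_backend.c actually reaches that state is exactly the open question in this thread.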
On Thu, 2014-10-23 at 02:23 +1100, Steven Haigh wrote:
> Output using pv-grub:

Can you also post the qemu logs please (under /var/log/xen somewhere I
think).

> qemu-system-i38[3956]: segfault at 0 ip (null) sp
> 00007fffb4573638 error 4

That might be a smoking gun. Is there a core dump and/or could you try
and run qemu under gdb?

Ian.
On 23/10/2014 2:40 AM, Ian Campbell wrote:
> On Thu, 2014-10-23 at 02:23 +1100, Steven Haigh wrote:
>
>> Output using pv-grub:
>
> Can you also post the qemu logs please (under /var/log/xen somewhere I
> think).

I get very little out of this:

-rw-r--r-- 1 root root 0 Oct 23 02:45 qemu-dm-dev.vm.log
-rw-r--r-- 1 root root 0 Oct 23 02:44 xen-hotplug.log
-rw-r--r-- 1 root root 55 Oct 23 02:45 xl-dev.vm.log

[root@dom0 xen]# cat xl-dev.vm.log
Waiting for domain dev.vm (domid 36) to die [pid 6970]

That's it :\

>> qemu-system-i38[3956]: segfault at 0 ip (null) sp
>> 00007fffb4573638 error 4
>
> That might be a smoking gun. Is there a core dump and/or could you try
> and run qemu under gdb?

Any hints on doing this? I can't say I'm a gdb guru.... I can't find any
core dumps anywhere so that's not really helpful...
On Thu, 2014-10-23 at 02:53 +1100, Steven Haigh wrote:
> On 23/10/2014 2:40 AM, Ian Campbell wrote:
> > On Thu, 2014-10-23 at 02:23 +1100, Steven Haigh wrote:
> >
> >> Output using pv-grub:
> >
> > Can you also post the qemu logs please (under /var/log/xen somewhere I
> > think).
>
> I get very little out of this:
> -rw-r--r-- 1 root root 0 Oct 23 02:45 qemu-dm-dev.vm.log
> -rw-r--r-- 1 root root 0 Oct 23 02:44 xen-hotplug.log
> -rw-r--r-- 1 root root 55 Oct 23 02:45 xl-dev.vm.log
> [root@dom0 xen]# cat xl-dev.vm.log
> Waiting for domain dev.vm (domid 36) to die [pid 6970]
>
> That's it :\

:-/ indeed.

> >> qemu-system-i38[3956]: segfault at 0 ip (null) sp
> >> 00007fffb4573638 error 4
> >
> > That might be a smoking gun. Is there a core dump and/or could you try
> > and run qemu under gdb?
>
> Any hints on doing this? I can't say I'm a gdb guru.... I can't find any
> core dumps anywhere so that's not really helpful...

Fiddling with ulimit might cause core dumps to be created.

If not then
https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg00302.html
https://lists.gnu.org/archive/html/qemu-devel/2011-12/msg02575.html
have some hints on running qemu via gdbserver.

I've also had luck by configuring the guest with a device model which is
a script that dumps its args to a file ("echo $@ > /tmp/qemu.args") and
then sleeps for an hour. In another terminal you can then run (fairly
quickly, before xl times out) something like:

# gdb /path/to/qemu
(gdb) run [the content of that file]

or possibly even

# gdb --args /path/to/qemu `cat /tmp/qemu.args`
(gdb) run

After it crashes, "bt" will get a back trace.

Ian.
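
The "script as device model" trick above can be sketched as follows. This is a Python variant of the one-line shell script Ian mentions, not something taken from the thread: the helper name dump_args is made up, shlex.join needs Python 3.8 or later, and pointing xl at the script (I believe via device_model_override in the guest config) is an assumption you should check against your xl.cfg documentation.

```python
#!/usr/bin/env python3
import shlex
import sys
import time

ARGS_FILE = "/tmp/qemu.args"  # same path as used in the message above


def dump_args(argv, path=ARGS_FILE):
    """Write argv to path as one shell-quoted line and return that line."""
    line = shlex.join(argv)  # quotes arguments containing spaces etc.
    with open(path, "w") as f:
        f.write(line + "\n")
    return line


# xl always passes arguments to the device model; with none given
# (for example when the file is merely imported or checked) do nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    dump_args(sys.argv[1:])
    time.sleep(3600)  # stay alive long enough to start qemu under gdb by hand
```

Because the arguments are written shell-quoted, the saved line can be pasted after "run" in gdb even if an argument contains spaces, which a plain "echo $@" would not preserve.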
diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index b2cb22b..d1d5d8e 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -50,7 +50,7 @@ const char *xen_protocol;
 
 /* private */
 static QTAILQ_HEAD(XenDeviceHead, XenDevice) xendevs = QTAILQ_HEAD_INITIALIZER(xendevs);
-static int debug = 0;
+static int debug = 9;
 
 /* ------------------------------------------------------------- */
 