From: Cole Robinson
To: "Richard W.M. Jones", libvir-list@redhat.com
Cc: ykaul@redhat.com, libguestfs@redhat.com
Date: Wed, 13 Jan 2016 18:45:43 -0500
Subject: Re: [libvirt] Quantifying libvirt errors in launching the libguestfs appliance

On 01/13/2016 05:18 AM, Richard W.M. Jones wrote:
> As people may know, we frequently encounter errors caused by libvirt
> when running the libguestfs appliance.
>
> I wanted to find out exactly how frequently these happen and classify
> the errors, so I ran the 'virt-df' tool overnight 1700 times.  This
> tool runs several parallel qemu:///session libvirt connections both
> creating a short-lived appliance guest.
>
> Note that I have added Cole's patch to fix https://bugzilla.redhat.com/1271183
> "XML-RPC error : Cannot write data: Transport endpoint is not connected"
>
> Results:
>
> The test failed 538 times (32% of the time), which is pretty dismal.
> To be fair, virt-df is aggressive about how it launches parallel
> libvirt connections.  Most other virt-* tools use only a single
> libvirt connection and are consequently more reliable.
>
> Of the failures, 518 (96%) were of the form:
>
>   process exited while connecting to monitor: qemu: could not load kernel '/home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d/kernel': Permission denied
>
> which is https://bugzilla.redhat.com/921135 or maybe
> https://bugzilla.redhat.com/1269975.
> It's not clear to me if these bugs have different causes, but if they
> do then potentially we're seeing a mix of both since my test has no
> way to distinguish them.

I just experimented with this, and I think it's the issue I suggested at:

https://bugzilla.redhat.com/show_bug.cgi?id=1269975#c4

I created two VMs, kernel1 and kernel2, both just booting off the same
kernel in $HOME/session-kernel/vmlinuz. Then I added this patch, which
makes kernel1 sleep for 30 seconds right after the SELinux labels are
set on VM startup:

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f083f3f..5d9f0fa 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -4901,6 +4901,13 @@ qemuProcessLaunch(virConnectPtr conn,
                                       incoming ? incoming->path : NULL) < 0)
         goto cleanup;
 
+    if (STREQ(vm->def->name, "kernel1")) {
+        for (int z = 0; z < 30; z++) {
+            printf("kernel1: sleeping %d of 30\n", z + 1);
+            sleep(1);
+        }
+    }
+
     /* Security manager labeled all devices, therefore
      * if any operation from now on fails, we need to ask the caller to
      * restore labels.

This is then easy to reproduce with:

  virsh start kernel1    (sleeps)
  virsh start kernel2 && virsh destroy kernel2

The shared vmlinuz is reset to user_home_t when kernel2 is shut down, so
kernel1 fails to start once the patch's sleep runs out.

When we detect similar issues with devices, such as when the media
already has the expected label, we encode 'relabel=no' in the disk XML,
which tells libvirt not to run restorecon on the disk's path when the VM
is shut down. However, the kernel/initrd XML doesn't support this
setting, so it won't work there. Adding that support could be one fix.

But I think there are longer-term plans for this type of issue using
ACLs, virtlockd, or something similar. Michal had patches, but I don't
know the specifics.

Unfortunately, even hard links share SELinux labels (the label is stored
on the inode, which all hard links to a file share), so I don't think
there's any workaround on the libguestfs side short of using a separate
copy of the appliance kernel for each VM.

- Cole