From patchwork Thu Aug 28 18:50:46 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Will Deacon
X-Patchwork-Id: 36259
Date: Thu, 28 Aug 2014 19:50:46 +0100
From: Will Deacon
To: axboe@kernel.dk, rusty@rustcorp.com.au, tj@kernel.org, hch@lst.de
Cc: linux-kernel@vger.kernel.org
Subject: 3.17-rc2 percpu-ref oops with virtblk remove
Message-ID: <20140828185046.GR22580@arm.com>
User-Agent: Mutt/1.5.23
 (2014-03-12)

Hi guys,

I've been debugging an issue triggered by virtio-blk and vfio on an arm64
system running 3.17-rc2. The problem occurs when removing a virtio-pci
block device, which triggers the following warning in the percpu-ref
code:

  WARNING: CPU: 0 PID: 1312 at lib/percpu-refcount.c:179 percpu_ref_kill_and_confirm+0x90/0x9c()
  percpu_ref_kill() called more than once!
  Modules linked in:
  CPU: 0 PID: 1312 Comm: vfio-setup.sh Not tainted 3.17.0-rc2+ #5
  Call trace:
  [] percpu_ref_kill_and_confirm+0x8c/0x9c
  [] blk_mq_freeze_queue+0x34/0xc8
  [] blk_mq_free_queue+0x1c/0x10c
  [] blk_release_queue+0x68/0xa8
  [] kobject_release+0x40/0x84
  [] kobject_put+0x38/0x6c
  [] blk_put_queue+0xc/0x18
  [] disk_release+0x7c/0xb8
  [] device_release+0x30/0x98
  [] kobject_release+0x40/0x84
  [] kobject_put+0x38/0x6c
  [] put_disk+0x10/0x1c
  [] virtblk_remove+0x74/0xd4

Some more debugging shows that virtblk_remove earlier called
blk_cleanup_queue:

  [] percpu_ref_kill_and_confirm+0x48/0x9c
  [] blk_mq_freeze_queue+0x34/0xc8
  [] blk_cleanup_queue+0x70/0xf4
  [] virtblk_remove+0x44/0xd4

which ends up marking the percpu ref as dead and queues some RCU work to
switch over to the atomic_t version using call_rcu_sched.

The virtblk_remove function then continues happily and calls put_disk
(first backtrace above), which ends up trying to get exclusive access to
the queue by killing the usage counter and waiting for the percpu
reference to become zero. Unfortunately, since the reference is already
marked as dead, the call to percpu_ref_kill triggers the warning. Worse
still, we then queue the RCU callback a second time, which dies with
something like:

  Unable to handle kernel paging request at virtual address 87f971000
  pgd = ffffffc87aca8000
  [87f971000] *pgd=0000000000000000, *pud=0000000000000000
  Internal error: Oops: 96000005 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 1312 Comm: systemd-udevd Tainted: G W 3.17.0-rc2+ #9
  task: ffffffc87bb89480 ti: ffffffc87ace0000 task.ti: ffffffc87ace0000
  PC is at percpu_ref_kill_rcu+0x50/0x188
  LR is at percpu_ref_kill_rcu+0x6c/0x188
  pc : [] lr : [] pstate: 80000145
  sp : ffffffc87ace3840
  x29: ffffffc87ace3840 x28: ffffffc000470000
  x27: 0000000000000000 x26: ffffffc87ff74a00
  x25: ffffffc00061f238 x24: ffffffc07f858d68
  x23: ffffffc000657000 x22: ffffffc07f858d88
  x21: ffffffc00064bc68 x20: 0000000000000000
  x19: 0000000000000000 x18: 0000007fc81227e0
  x17: 0000007fb74f1194 x16: ffffffc000161a44
  x15: 0000007fb757b5a0 x14: 0000000000000008
  x13: 0000000000000000 x12: ffffffc000470c70
  x11: 0000000000000005 x10: 0101010101010101
  x9 : ffffffc000470c4b x8 : ffffffc000647318
  x7 : fffffffffffffe40 x6 : 0000000000005b00
  x5 : ffffffc00064bc68 x4 : 0000000000000038
  x3 : 00000000000000ff x2 : 0000000000000000
  x1 : ffffffc000657ee8 x0 : 000000087f971000

because the percpu_ref has been freed from blk_mq_free_queue.

I'm not really sure how to fix this -- it seems like we shouldn't try to
kill a reference that is already dead, but using __pcpu_ref_alive isn't
the right answer. Simply removing the warning works for me (patch below),
but that also feels like a hack (we skip the confirm callback, for a
start).

Any ideas?

Will

--->8
---
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index fe5a3342e960..c2faa2a97dff 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -175,8 +175,8 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 				 percpu_ref_func_t *confirm_kill)
 {
-	WARN_ONCE(ref->pcpu_count_ptr & PCPU_REF_DEAD,
-		  "percpu_ref_kill() called more than once!\n");
+	if (ref->pcpu_count_ptr & PCPU_REF_DEAD)
+		return;
 
 	ref->pcpu_count_ptr |= PCPU_REF_DEAD;
 	ref->confirm_kill = confirm_kill;