From patchwork Mon Nov 10 19:37:24 2014
X-Patchwork-Submitter: William Cohen
X-Patchwork-Id: 40533
Message-ID: <546113F4.1050304@redhat.com>
Date: Mon, 10 Nov 2014 14:37:24 -0500
From: William Cohen
To: Will Deacon
Cc: Catalin Marinas, dave.long@linaro.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] Correct the race condition in aarch64_insn_patch_text_sync()
References: <1415637362-30754-1-git-send-email-wcohen@redhat.com> <20141110170846.GH23942@arm.com>
In-Reply-To: <20141110170846.GH23942@arm.com>

On 11/10/2014 12:08 PM, Will Deacon wrote:
> Hi Will,
> 
> Thanks for the tracking this down.
> 
> On Mon, Nov 10, 2014 at 04:36:02PM +0000, William Cohen wrote:
>> When experimenting with patches to provide kprobes support for aarch64
>> smp machines would hang when inserting breakpoints into kernel code.
>> The hangs were caused by a race condition in the code called by
>> aarch64_insn_patch_text_sync().  The first processor in the
>> aarch64_insn_patch_text_cb() function would patch the code while other
>> processors were still entering the function and decrementing the
> 
> s/decrementing/incrementing/
> 
>> cpu_count field.  This resulted in some processors never observing the
>> exit condition and exiting the function.  Thus, processors in the
>> system hung.
>>
>> The patching function now waits for all processors to enter the
>> patching function before changing code to ensure that none of the
>> processors are in code that is going to be patched.  Once all the
>> processors have entered the function, the last processor to enter the
>> patching function performs the pathing and signals that the patching
>> is complete with one last decrement of the cpu_count field to make it
>> -1.
>>
>> Signed-off-by: William Cohen
>> ---
>>  arch/arm64/kernel/insn.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
>> index e007714..e6266db 100644
>> --- a/arch/arm64/kernel/insn.c
>> +++ b/arch/arm64/kernel/insn.c
>> @@ -153,8 +153,10 @@ static int __kprobes aarch64_insn_patch_text_cb(void *arg)
>>          int i, ret = 0;
>>          struct aarch64_insn_patch *pp = arg;
>>  
>> -        /* The first CPU becomes master */
>> -        if (atomic_inc_return(&pp->cpu_count) == 1) {
>> +        /* Make sure all the processors are in this functionaarch64_insn_patch_text_cb(
>> +           before patching the code. The last CPU to this function
>> +           does the update. */
>> +        if (atomic_dec_return(&pp->cpu_count) == 0) {
>>                  for (i = 0; ret == 0 && i < pp->insn_cnt; i++)
>>                          ret = aarch64_insn_patch_text_nosync(pp->text_addrs[i],
>>                                                               pp->new_insns[i]);
>> @@ -163,7 +165,8 @@ static int __kprobes aarch64_insn_patch_text_cb(void *arg)
>>                   * which ends with "dsb; isb" pair guaranteeing global
>>                   * visibility.
>>                   */
>> -                atomic_set(&pp->cpu_count, -1);
>> +                /* Notifiy other processors with an additional decrement. */
>> +                atomic_dec(&pp->cpu_count);
>>          } else {
>>                  while (atomic_read(&pp->cpu_count) != -1)
>>                          cpu_relax();
>> @@ -185,6 +188,7 @@ int __kprobes aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt)
>>          if (cnt <= 0)
>>                  return -EINVAL;
>>  
>> +        atomic_set(&patch.cpu_count, num_online_cpus());
> 
> I think this is still racy with hotplug before stop_machine has done
> get_online_cpus. How about we leave the increment in the callback and change
> the exit condition to compare with num_online_cpus() instead?
> 
> Cheers,
> 
> Will
> 

Hi Will,

Thanks for the feedback.  I am no expert in the corner cases involved with
hotplug.  Dave Long suggested something similar: keep the increments and
compare the counter against num_online_cpus() inside
aarch64_insn_patch_text_cb().  Is moving the num_online_cpus() check inside
aarch64_insn_patch_text_cb() sufficient to avoid the race conditions with
hotplug?  If so, would the attached patch be appropriate?
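To make the rendezvous concrete, below is a minimal user-space model of the
scheme (a sketch only, not kernel code: pthreads stand in for the
stop_machine() callback, a C11 atomic stands in for pp->cpu_count, and the
names NCPUS, patch_count, and do_patch() are made up for illustration).
Each thread increments the counter once; the last arrival performs the
"patching" and adds one more increment, and everyone else spins until the
count exceeds the number of participants, which is the same exit condition
used in the attached patch.

/*
 * Stand-alone model of the proposed rendezvous.  Illustrative only:
 * pthreads model the per-CPU stop_machine() callback, NCPUS models
 * num_online_cpus(), patch_count models pp->cpu_count.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NCPUS 4                         /* models num_online_cpus() */

static atomic_int patch_count;          /* models pp->cpu_count */

static void do_patch(void)
{
        /* models patching the instructions, done by exactly one thread */
        puts("last thread in: patching text");
}

static void *patch_cb(void *arg)
{
        (void)arg;
        /* The last thread to arrive does the update... */
        if (atomic_fetch_add(&patch_count, 1) + 1 == NCPUS) {
                do_patch();
                /* ...and releases the others with one extra increment. */
                atomic_fetch_add(&patch_count, 1);
        } else {
                /* Everyone else waits until the count passes NCPUS. */
                while (atomic_load(&patch_count) <= NCPUS)
                        ;               /* models cpu_relax() */
        }
        return NULL;
}

int main(void)
{
        pthread_t t[NCPUS];

        for (int i = 0; i < NCPUS; i++)
                pthread_create(&t[i], NULL, patch_cb, NULL);
        for (int i = 0; i < NCPUS; i++)
                pthread_join(t[i], NULL);
        return 0;
}

Built with a plain "cc -pthread", this prints the patch message exactly once
no matter how the threads are scheduled, since only one thread can see the
counter reach NCPUS.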
-Will Cohen

>From d02e3244c436234d0d07500be6d4df64feb2052a Mon Sep 17 00:00:00 2001
From: William Cohen
Date: Mon, 10 Nov 2014 14:26:44 -0500
Subject: [PATCH] Correct the race condition in aarch64_insn_patch_text_sync()

When experimenting with patches to provide kprobes support for aarch64,
smp machines would hang when inserting breakpoints into kernel code.
The hangs were caused by a race condition in the code called by
aarch64_insn_patch_text_sync().  The first processor in the
aarch64_insn_patch_text_cb() function would patch the code while other
processors were still entering the function and incrementing the
cpu_count field.  This resulted in some processors never observing the
exit condition and therefore never exiting the function.  Thus,
processors in the system hung.

The patching function now waits for all processors to enter the
patching function before changing code to ensure that none of the
processors are in code that is going to be patched.  Once all the
processors have entered the function, the last processor to enter the
patching function performs the patching and signals that the patching
is complete with one last increment of the cpu_count field, making it
num_online_cpus()+1.

Signed-off-by: William Cohen
---
 arch/arm64/kernel/insn.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index e007714..4fdddf1 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -151,10 +151,13 @@ struct aarch64_insn_patch {
 static int __kprobes aarch64_insn_patch_text_cb(void *arg)
 {
         int i, ret = 0;
+        int count = num_online_cpus();
         struct aarch64_insn_patch *pp = arg;
 
-        /* The first CPU becomes master */
-        if (atomic_inc_return(&pp->cpu_count) == 1) {
+        /* Make sure all the processors are in this function
+           before patching the code.  The last CPU to enter this
+           function does the update. */
+        if (atomic_inc_return(&pp->cpu_count) == count) {
                 for (i = 0; ret == 0 && i < pp->insn_cnt; i++)
                         ret = aarch64_insn_patch_text_nosync(pp->text_addrs[i],
                                                              pp->new_insns[i]);
@@ -163,9 +166,10 @@ static int __kprobes aarch64_insn_patch_text_cb(void *arg)
                  * which ends with "dsb; isb" pair guaranteeing global
                  * visibility.
                  */
-                atomic_set(&pp->cpu_count, -1);
+                /* Notify other processors with an additional increment. */
+                atomic_inc(&pp->cpu_count);
         } else {
-                while (atomic_read(&pp->cpu_count) != -1)
+                while (atomic_read(&pp->cpu_count) <= count)
                         cpu_relax();
                 isb();
         }
-- 
1.8.3.1