From patchwork Tue Jan 26 11:33:17 2016
X-Patchwork-Submitter: Ding Tianhong
X-Patchwork-Id: 60445
Subject: Re: Unhandled level 2 translation fault on A72 board.
To: Catalin Marinas
Cc: Arnd Bergmann, Will Deacon, Linuxarm, "linux-arm-kernel@lists.infradead.org", "Guohanjun (Hanjun Guo)"
From: Ding Tianhong
Date: Tue, 26 Jan 2016 19:33:17 +0800
Message-ID: <56A7597D.6020609@huawei.com>
In-Reply-To: <20160126110358.GA23579@localhost.localdomain>

On 2016/1/26 19:03, Catalin Marinas wrote:
> On Tue, Jan 26, 2016 at 03:37:42PM +0800, Ding Tianhong wrote:
>> I met this problem when running the hackbench test on an A72 chip board:
>>
>> sh[4779]: unhandled level 2 translation fault (11) at 0x7f96be0c80, esr 0x83000006
>> pgd = ffffffc01a1f0000
>> [7f96be0c80] *pgd=0000000084a20003, *pud=0000000084a20003, *pmd=0000000000000000
>>
>> CPU: 1 PID: 4779 Comm: sh Tainted: G O 4.1.15+ #21
>> Hardware name: Hisilicon PhosphorHi1382 EVB (DT)
>> task: ffffffc0163cc500 ti: ffffffc083abc000 task.ti: ffffffc083abc000
>> PC is at 0x7f96be0c80
>> LR is at 0x7fb2684eb4
>> pc : [<0000007f96be0c80>] lr : [<0000007fb2684eb4>] pstate: 60000000
>
> So here it's user space trying to execute from 0x7f96be0c80 (instruction
> abort).
>
>> sh[4963]: unhandled level 2 translation fault (11) at 0x00000000, esr 0x92000006
>> pgd = ffffffc0180c6000
>> [00000000] *pgd=0000000015157003, *pud=0000000015157003, *pmd=0000000000000000
>>
>> CPU: 0 PID: 4963 Comm: sh Tainted: G O 4.1.15+ #21
>> Hardware name: Hisilicon PhosphorHi1382 EVB (DT)
>> task: ffffffc0163cb980 ti: ffffffc0840c8000 task.ti: ffffffc0840c8000
>> PC is at 0x42c0c8
>> LR is at 0x42c03c
>> pc : [<000000000042c0c8>] lr : [<000000000042c03c>] pstate: 80000000
>
> And here you have a null pointer dereference.
>
>> If I run the benchmark only on cores in the same cluster, it looks fine
>> and no error happens, but if I enable cores in a different cluster, the
>> error appears.
>>
>> I remember that I met the same problem on the A57 and fixed it by
>> enabling bit 6 of CPUECTLR_EL1 and enabling MN. But this time I enabled
>> the same settings and they seem to have no effect. I have no idea what
>> causes this problem; is there really such a big difference between the
>> A57 and A72 TLBs?
>
> I can't tell for sure it's a TLB issue. The kernel page table dump shows
> *pmd being 0, so the fault is correctly called "level 2 translation
> fault". It also seems that there is no vma at this address, hence the
> kernel reports it as unhandled.
> It looks like data corruption which could be caused by cache or TLB
> incoherence. Just make sure the interconnect linking the two clusters
> is configured correctly by _firmware_ before Linux starts.
>

Hi Catalin:

Thanks for the reply. I tried applying this patch as a test:

---
 arch/arm64/kernel/process.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6391485..d7d8439 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -283,6 +283,13 @@ static void tls_thread_switch(struct task_struct *next)
 	: : "r" (tpidr), "r" (tpidrro));
 }
 
+static void tlb_flush_thread(struct task_struct *prev)
+{
+	/* Flush the prev task's TLB entries */
+	if (prev->mm)
+		flush_tlb_mm(prev->mm);
+}
+
 /*
  * Thread switching.
  */
@@ -296,6 +303,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	hw_breakpoint_thread_switch(next);
 	contextidr_thread_switch(next);
+	tlb_flush_thread(prev);
+
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
 	 * the thread migrates to a different CPU.

With this patch applied, hackbench runs fine, so I guess the previous
thread's TLB entries may not be invalidated soon enough. But I don't know
why, since everything is fine on the A57. Am I missing something?

Ding

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel