From patchwork Tue Dec 6 02:43:50 2016
X-Patchwork-Submitter: Andy Smith
X-Patchwork-Id: 86670
Date: Tue, 6 Dec 2016 02:43:50 +0000
From: Andy Smith
To: Andrew Cooper
Message-ID:
 <20161206024350.GV1804@bitfolk.com>
References: <20161204083205.GT21587@bitfolk.com>
 <2226c155-43ab-d170-c9e4-d112b8fa2de2@citrix.com>
In-Reply-To: <2226c155-43ab-d170-c9e4-d112b8fa2de2@citrix.com>
Cc: xen-devel
Subject: Re: [Xen-devel] mpt3sas bug with Debian jessie kernel only under Xen - "swiotlb buffer is full"

Hi Andrew,

On Sun, Dec 04, 2016 at 03:59:20PM +0000, Andrew Cooper wrote:
> On 04/12/16 08:32, Andy Smith wrote:
> > Under the Debian jessie amd64 kernel (linux-image-3.16.0-4-amd64
> > 3.16.36-1+deb8u2) running under Xen, I cannot put the system's
> > storage under heavy load without receiving a bunch of "swiotlb
> > buffer is full" kernel error messages and severely degraded
> > performance. Sometimes the system panics and reboots itself.

[…]

> Can you try these two patches from the XenServer Patch queue?
> https://github.com/xenserver/linux-3.x.pg/blob/master/master/series#L613-L614

Looking good. Using those patches I'm ~20 minutes into this now:

Every 2.0s: cat /proc/mdstat                          Tue Dec 6 02:16:40 2016

Personalities : [raid1] [raid10]
md5 : active raid10 sdb[0] sda[1]
      1875243008 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]
      [==>..................]  check = 11.5% (217058176/1875243008) finish=133.9min speed=206252K/sec
      bitmap: 0/14 pages [0KB], 65536KB chunk

md4 : active raid10 sdc[0] sdd[1]
      3906886656 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]
      [>....................]  check = 2.6% (102650880/3906886656) finish=674.4min speed=94007K/sec
      bitmap: 0/30 pages [0KB], 65536KB chunk

…where previously it would have given kernel errors within 5 seconds,
so I think that fixes it. I will have to perform some more strenuous
testing.

Those two patches did not apply cleanly to the source of
linux-image-3.16.0-4-amd64 3.16.36-1+deb8u2. The last part of each
patch was rejected, so I removed those parts and put equivalent
changes into a separate patch file (0003-fixup.patch, attached).

I have not done this in a long time, so, just for the archives, my
process was based on:

https://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official

# mkdir -p /data/debian
# chown andy: /data/debian
# apt-get install build-essential fakeroot
# apt-get build-dep linux
$ cd /data/debian
$ apt-get source linux
$ wget https://raw.githubusercontent.com/xenserver/linux-3.x.pg/master/master/0001-dma-add-dma_get_required_mask_from_max_pfn.patch
$ wget https://raw.githubusercontent.com/xenserver/linux-3.x.pg/master/master/0002-x86-xen-correct-dma_get_required_mask-for-Xen-PV-gue.patch
$ # remove last parts of each patch file, create 0003-fixup.patch that performs equivalent changes
$ cd linux-3.16.36
$ # applying these patches is going to change symbols so changing the abiname
$ # is necessary.
$ # See https://kernel-handbook.alioth.debian.org/ch-versions.html#s-abi-name
$ sed -i -e 's/^abiname: 4/abiname: 4bf/' debian/config/defines
$ fakeroot debian/rules debian/control-real
$ bash debian/bin/test-patches -f amd64 ../0001-dma-add-dma_get_required_mask_from_max_pfn.patch ../0002-x86-xen-correct-dma_get_required_mask-for-Xen-PV-gue.patch ../0003-fixup.patch
# dpkg -i ../linux-headers-3.16.0-4bf-amd64_3.16.36-1+deb8u2a~test_amd64.deb ../linux-image-3.16.0-4bf-amd64_3.16.36-1+deb8u2a~test_amd64.deb

Boot into the new kernel under Xen:

$ uname -a
Linux elephant 3.16.0-4bf-amd64 #1 SMP Debian 3.16.36-1+deb8u2a~test (2016-12-05) x86_64 GNU/Linux

I think my next steps should be:

1. Do some more strenuous testing.

2. Report a bug against source package "linux" in Debian jessie with a
   pointer to those two patches.

3. Check whether those fixes are already applied in the Debian
   backports and/or Debian testing linux package.

> > Dec 4 07:06:00 elephant kernel: [22019.373653] mpt3sas 0000:01:00.0: swiotlb buffer is full (sz: 57344 bytes)
> > Dec 4 07:06:00 elephant kernel: [22019.374707] mpt3sas 0000:01:00.0: swiotlb buffer is full
> > Dec 4 07:06:00 elephant kernel: [22019.375754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > Dec 4 07:06:00 elephant kernel: [22019.376430] IP: [] _base_build_sg_scmd_ieee+0x1f9/0x2d0 [mpt3sas]
> > Dec 4 07:06:00 elephant kernel: [22019.377122] PGD 0
>
> This alone is a clear error handling bug in the mpt3sas driver. It
> hasn't checked the DMA mapping call for a successful mapping before
> following the NULL pointer it got given back. It is collateral damage
> from the swiotlb buffer being full, but a bug none the less.

Does that need reporting upstream as a Linux bug in mpt3sas, then?

Thanks for your help.
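The error-handling point Andrew makes can be illustrated with a minimal userspace sketch. Everything in it is hypothetical and is not mpt3sas code: fake_dma_map() stands in for a scatter-gather mapping call that returns the number of mapped entries, or a negative errno when a bounce pool (standing in for the swiotlb buffer) is exhausted, and build_request() stands in for a driver's SG-list builder. It only shows the pattern of checking the mapping result before touching the mapped entries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SGES 16

/* Hypothetical mapped scatter-gather entry. */
struct fake_sge {
	uint64_t dma_addr;
	uint32_t len;
};

/* Hypothetical bounce pool standing in for the swiotlb buffer. */
struct fake_bounce_pool {
	size_t free_bytes;
};

/*
 * Hypothetical mapping call: returns the number of entries mapped, or
 * -ENOMEM when the pool cannot hold the whole request. On failure the
 * output array is not filled in.
 */
static int fake_dma_map(struct fake_bounce_pool *pool, const uint32_t *lens,
			int nents, struct fake_sge *out)
{
	size_t need = 0;

	for (int i = 0; i < nents; i++)
		need += lens[i];
	if (need > pool->free_bytes)
		return -ENOMEM;

	pool->free_bytes -= need;
	for (int i = 0; i < nents; i++) {
		out[i].dma_addr = 0x1000ULL * (uint64_t)(i + 1); /* made-up address */
		out[i].len = lens[i];
	}
	return nents;
}

/*
 * Hypothetical SG-list builder. The check on the mapping result is the
 * point: a failed mapping is passed back to the caller instead of
 * walking entries that were never filled in.
 */
static int build_request(struct fake_bounce_pool *pool, const uint32_t *lens,
			 int nents, uint64_t *total_len)
{
	struct fake_sge sges[MAX_SGES];
	uint64_t total = 0;
	int mapped;

	assert(pool != NULL && lens != NULL && total_len != NULL);
	if (nents <= 0 || nents > MAX_SGES)
		return -EINVAL;

	mapped = fake_dma_map(pool, lens, nents, sges);
	if (mapped <= 0)
		return mapped < 0 ? mapped : -ENOMEM;

	for (int i = 0; i < mapped; i++)
		total += sges[i].len;
	*total_len = total;
	return 0;
}
```

With a 65536-byte pool, a request of 4096 + 57344 bytes (the second size echoing the log above) maps once; repeating it returns -ENOMEM and leaves *total_len untouched, rather than dereferencing unmapped entries.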
Cheers,
Andy

diff -u a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
--- a/drivers/xen/swiotlb-xen.c	2016-06-15 20:29:36.000000000 +0000
+++ b/drivers/xen/swiotlb-xen.c	2016-12-05 07:05:13.009992832 +0000
@@ -673,6 +673,13 @@
 }
 EXPORT_SYMBOL_GPL(xen_swiotlb_dma_supported);
 
+u64
+xen_swiotlb_get_required_mask(struct device *dev)
+{
+	return DMA_BIT_MASK(64);
+}
+EXPORT_SYMBOL_GPL(xen_swiotlb_get_required_mask);
+
 int
 xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
 {
diff -u a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h	2016-06-15 20:29:36.000000000 +0000
+++ b/include/linux/dma-mapping.h	2016-12-05 07:03:13.992601404 +0000
@@ -127,6 +127,7 @@
 	return dma_set_mask_and_coherent(dev, mask);
 }
 
+extern u64 dma_get_required_mask_from_max_pfn(struct device *dev);
 extern u64 dma_get_required_mask(struct device *dev);
 
 #ifndef set_arch_dma_coherent_ops
diff -u a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h
--- a/include/xen/swiotlb-xen.h	2016-06-15 20:29:36.000000000 +0000
+++ b/include/xen/swiotlb-xen.h	2016-12-05 07:06:01.084938801 +0000
@@ -56,6 +56,10 @@
 extern int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask);
 
+extern u64
+xen_swiotlb_get_required_mask(struct device *dev);
+
+
 extern int
 xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask);
 #endif /* __LINUX_SWIOTLB_XEN_H */
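For context on why a required-mask change can stop the "swiotlb buffer is full" errors, here is a minimal userspace sketch. It assumes the function the first patch declares as dma_get_required_mask_from_max_pfn() follows the generic dma_get_required_mask() logic: round the highest RAM address implied by max_pfn up to an all-ones mask. In a Xen PV guest max_pfn describes pseudo-physical memory rather than machine addresses, so a guest with 4 GiB or less gets a 32-bit mask even when its pages sit higher up in machine memory. Assuming the driver only enables 64-bit DMA when the required mask exceeds 32 bits, every transfer would then be bounced through swiotlb. The Xen variant in the fixup patch simply reports DMA_BIT_MASK(64). The names required_mask_from_max_pfn(), xen_pv_required_mask() and wants_64bit_dma() are hypothetical; this is a simplified port, not kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumes 4 KiB pages, as on x86-64 */
/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* 1-based index of the most significant set bit; 0 when x == 0. */
static int fls32(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/*
 * Hypothetical userspace port of the generic calculation: round the
 * highest RAM address (derived from max_pfn) up to an all-ones mask.
 */
static uint64_t required_mask_from_max_pfn(uint64_t max_pfn)
{
	uint32_t low_totalram, high_totalram;

	assert(max_pfn > 1); /* 0 or 1 pages is not a meaningful RAM size here */
	low_totalram = (uint32_t)((max_pfn - 1) << PAGE_SHIFT);
	high_totalram = (uint32_t)((max_pfn - 1) >> (32 - PAGE_SHIFT));

	if (!high_totalram) {
		/* All RAM below 4 GiB: mask covers just the low range. */
		low_totalram = 1u << (fls32(low_totalram) - 1);
		low_totalram += low_totalram - 1;
		return low_totalram;
	}
	/* RAM above 4 GiB: round the high word up, keep all low bits set. */
	high_totalram = 1u << (fls32(high_totalram) - 1);
	high_totalram += high_totalram - 1;
	return ((uint64_t)high_totalram << 32) + 0xffffffffULL;
}

/* Hypothetical Xen PV variant, mirroring xen_swiotlb_get_required_mask(). */
static uint64_t xen_pv_required_mask(void)
{
	return DMA_BIT_MASK(64);
}

/* Hypothetical simplified driver policy: use 64-bit DMA only if needed. */
static bool wants_64bit_dma(uint64_t required_mask)
{
	return required_mask > DMA_BIT_MASK(32);
}
```

With 4 GiB of pseudo-physical RAM (max_pfn = 0x100000) the generic path yields exactly DMA_BIT_MASK(32), so wants_64bit_dma() is false; with 16 GiB (max_pfn = 0x400000) it yields 0x3ffffffff and is true. The Xen variant makes it true regardless of guest size.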