diff mbox

[V2,3/3] mmc: mmci: Reverse IRQ handling for the arm_variant

Message ID 1402906147-26596-1-git-send-email-ulf.hansson@linaro.org
State Accepted
Commit 7878289b269d41c8e611aa6d4519feae706e49f3
Headers show

Commit Message

Ulf Hansson June 16, 2014, 8:09 a.m. UTC
Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
when using the ARM model of the PL181 and running QEMU.

The bug was reported for the following QEMU version:
$ qemu-system-arm -version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
(c) 2003-2008 Fabrice Bellard

To resolve the problem, let's restore the old behavior were the DATA
irq is handled prior the CMD irq, but only for the arm_variant, which
the problem was reported for.

Reported-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/host/mmci.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

John Stultz June 16, 2014, 6:58 p.m. UTC | #1
On Mon, Jun 16, 2014 at 1:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
> when using the ARM model of the PL181 and running QEMU.
>
> The bug was reported for the following QEMU version:
> $ qemu-system-arm -version
> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
> (c) 2003-2008 Fabrice Bellard
>
> To resolve the problem, let's restore the old behavior were the DATA
> irq is handled prior the CMD irq, but only for the arm_variant, which
> the problem was reported for.
>
> Reported-by: John Stultz <john.stultz@linaro.org>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

Hey Ulf,
  Thanks for sending these out (though I wasn't cc'ed on this 3'rd
patch). The first two apply for me, but this one doesn't. What should
I be basing this off of?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson June 16, 2014, 9:20 p.m. UTC | #2
On 16 June 2014 20:58, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 1:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> Commit "mmc: mmci: Handle CMD irq before DATA irq", caused an issue
>> when using the ARM model of the PL181 and running QEMU.
>>
>> The bug was reported for the following QEMU version:
>> $ qemu-system-arm -version
>> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright
>> (c) 2003-2008 Fabrice Bellard
>>
>> To resolve the problem, let's restore the old behavior were the DATA
>> irq is handled prior the CMD irq, but only for the arm_variant, which
>> the problem was reported for.
>>
>> Reported-by: John Stultz <john.stultz@linaro.org>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Russell King <linux@arm.linux.org.uk>
>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>
> Hey Ulf,
>   Thanks for sending these out (though I wasn't cc'ed on this 3'rd
> patch). The first two apply for me, but this one doesn't. What should
> I be basing this off of?

Hi John,

Thanks for helping out testing!

This patch based upon my latest mmc tree and the next branch. I tried
to apply it for 3.15, and I think you will be able resolve the
conflict - I should be quite trivial.

Patch 1 and 2 applies on 3.15 as well. So I guess testing this on 3.15
will provide us with answers if it solves our problem.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Stultz June 16, 2014, 10:41 p.m. UTC | #3
On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> This patch based upon my latest mmc tree and the next branch. I tried
> to apply it for 3.15, and I think you will be able resolve the
> conflict - I should be quite trivial.

No worries. I just didn't want to waste time resolving it if it was
logically dependent on some other change.

I'll give it a shot and get back to you.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Stultz June 16, 2014, 11:29 p.m. UTC | #4
On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> This patch based upon my latest mmc tree and the next branch. I tried
>> to apply it for 3.15, and I think you will be able resolve the
>> conflict - I should be quite trivial.
>
> No worries. I just didn't want to waste time resolving it if it was
> logically dependent on some other change.
>
> I'll give it a shot and get back to you.

So unfortunately I'm still seeing trouble..

[   94.202843] EXT4-fs error (device mmcblk0p5):
ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
gd; block bitmap corrupt.
[   94.203873] Aborting journal on device mmcblk0p5-8.
[   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
panic forced after error
[   94.206553]
[   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
3.15.0-00002-g044f37a-dirty #589
[   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
(show_stack+0x11/0x14)
[   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
(dump_stack+0x59/0x7c)
[   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
[   94.209724] [<c042a57f>] (panic) from [<c0135055>]
(ext4_handle_error+0x69/0x74)
[   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
(__ext4_grp_locked_error+0x6b/0x160)
[   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
[<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
[   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
(ext4_mb_init_cache+0x219/0x4e0)
[   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
(ext4_mb_init_group+0xbb/0x13c)
[   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
(ext4_mb_good_group+0xf3/0xfc)
[   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
(ext4_mb_regular_allocator+0x153/0x2c4)
[   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
[<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
[   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
(ext4_ext_map_blocks+0x965/0x10f0)
[   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
(ext4_map_blocks+0xff/0x374)
[   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
[<c0127049>] (ext4_writepages+0x2b9/0x4e8)
[   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
(do_writepages+0x19/0x28)
[   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
(__filemap_fdatawrite_range+0x3d/0x44)
[   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
[<c008cc3f>] (filemap_flush+0x23/0x28)
[   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
(ext4_rename+0x2f9/0x3e4)
[   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
(vfs_rename+0x183/0x45c)
[   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
(SyS_renameat2+0x22b/0x26c)
[   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
(SyS_rename+0x1f/0x24)
[   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
(ret_fast_syscall+0x1/0x5c)


That said, this mirrors the behavior when I was reverting your change
by hand on-top of 3.15. While git bisect pointed to your patch and
reverting it from the commit seems to resolve the issue at that point,
there seems to be some other commit in the 3.14->3.15-rc1 interval
that is causing problems as well.

Are there any sort of debugging options for mmc that I can use to try
to better narrow down whats going wrong?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson June 17, 2014, 7:33 a.m. UTC | #5
On 17 June 2014 01:29, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 3:41 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On Mon, Jun 16, 2014 at 2:20 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>> This patch based upon my latest mmc tree and the next branch. I tried
>>> to apply it for 3.15, and I think you will be able resolve the
>>> conflict - I should be quite trivial.
>>
>> No worries. I just didn't want to waste time resolving it if it was
>> logically dependent on some other change.
>>
>> I'll give it a shot and get back to you.
>
> So unfortunately I'm still seeing trouble..
>
> [   94.202843] EXT4-fs error (device mmcblk0p5):
> ext4_mb_generate_buddy:756: group 1, 2303 clusters in bitmap, 2272 in
> gd; block bitmap corrupt.
> [   94.203873] Aborting journal on device mmcblk0p5-8.
> [   94.206553] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5):
> panic forced after error
> [   94.206553]
> [   94.207420] CPU: 0 PID: 1 Comm: init Not tainted
> 3.15.0-00002-g044f37a-dirty #589
> [   94.208330] [<c0011725>] (unwind_backtrace) from [<c000f3f1>]
> (show_stack+0x11/0x14)
> [   94.208835] [<c000f3f1>] (show_stack) from [<c042d599>]
> (dump_stack+0x59/0x7c)
> [   94.209288] [<c042d599>] (dump_stack) from [<c042a57f>] (panic+0x67/0x178)
> [   94.209724] [<c042a57f>] (panic) from [<c0135055>]
> (ext4_handle_error+0x69/0x74)
> [   94.210184] [<c0135055>] (ext4_handle_error) from [<c01358db>]
> (__ext4_grp_locked_error+0x6b/0x160)
> [   94.210747] [<c01358db>] (__ext4_grp_locked_error) from
> [<c0143691>] (ext4_mb_generate_buddy+0x1b1/0x29c)
> [   94.211392] [<c0143691>] (ext4_mb_generate_buddy) from [<c0144dfd>]
> (ext4_mb_init_cache+0x219/0x4e0)
> [   94.211959] [<c0144dfd>] (ext4_mb_init_cache) from [<c014517f>]
> (ext4_mb_init_group+0xbb/0x13c)
> [   94.213973] [<c014517f>] (ext4_mb_init_group) from [<c01452f3>]
> (ext4_mb_good_group+0xf3/0xfc)
> [   94.214873] [<c01452f3>] (ext4_mb_good_group) from [<c01462ab>]
> (ext4_mb_regular_allocator+0x153/0x2c4)
> [   94.215953] [<c01462ab>] (ext4_mb_regular_allocator) from
> [<c01486b1>] (ext4_mb_new_blocks+0x2fd/0x4e4)
> [   94.216939] [<c01486b1>] (ext4_mb_new_blocks) from [<c013fe41>]
> (ext4_ext_map_blocks+0x965/0x10f0)
> [   94.217694] [<c013fe41>] (ext4_ext_map_blocks) from [<c01230ff>]
> (ext4_map_blocks+0xff/0x374)
> [   94.219200] [<c0126839>] (mpage_map_and_submit_extent) from
> [<c0127049>] (ext4_writepages+0x2b9/0x4e8)
> [   94.219972] [<c0127049>] (ext4_writepages) from [<c0094e69>]
> (do_writepages+0x19/0x28)
> [   94.220648] [<c0094e69>] (do_writepages) from [<c008cbcd>]
> (__filemap_fdatawrite_range+0x3d/0x44)
> [   94.221391] [<c008cbcd>] (__filemap_fdatawrite_range) from
> [<c008cc3f>] (filemap_flush+0x23/0x28)
> [   94.222135] [<c008cc3f>] (filemap_flush) from [<c012c419>]
> (ext4_rename+0x2f9/0x3e4)
> [   94.222806] [<c012c419>] (ext4_rename) from [<c00c3707>]
> (vfs_rename+0x183/0x45c)
> [   94.223496] [<c00c3707>] (vfs_rename) from [<c00c3c0b>]
> (SyS_renameat2+0x22b/0x26c)
> [   94.224154] [<c00c3c0b>] (SyS_renameat2) from [<c00c3c83>]
> (SyS_rename+0x1f/0x24)
> [   94.224801] [<c00c3c83>] (SyS_rename) from [<c000cd41>]
> (ret_fast_syscall+0x1/0x5c)
>
>
> That said, this mirrors the behavior when I was reverting your change
> by hand on-top of 3.15. While git bisect pointed to your patch and
> reverting it from the commit seems to resolve the issue at that point,
> there seems to be some other commit in the 3.14->3.15-rc1 interval
> that is causing problems as well.
>
> Are there any sort of debugging options for mmc that I can use to try
> to better narrow down whats going wrong?

It seems like you want to debug the mmci host driver and unfortunate
the debug utilities available are only dev_dbg prints. I wouldn't be
surprised if the problem goes away when you enable them. :-)

I have some other locally stored debug patches for mmci, but those are
not re-based and I am not sure you want to deal with them as is.

I guess I need to set up the QEMU environment and run the tests
myself, unless we go for the revert path.
How do you perform the tests, is just a simple mounting/un-mounting
that triggers the problem?
Any specific things that I need to think of when running QEMU?

Kind regards
Uffe

>
> thanks
> -john
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Stultz June 17, 2014, 1:57 p.m. UTC | #6
On Tue, Jun 17, 2014 at 12:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> It seems like you want to debug the mmci host driver and unfortunate
> the debug utilities available are only dev_dbg prints. I wouldn't be
> surprised if the problem goes away when you enable them. :-)
>
> I have some other locally stored debug patches for mmci, but those are
> not re-based and I am not sure you want to deal with them as is.
>
> I guess I need to set up the QEMU environment and run the tests
> myself, unless we go for the revert path.

Again, a simple revert *doesn't* seem to work. There's something else
in the 3.14->3.15-rc1 series that is also causing problems. Reverting
the patch only seems to work very close to the point that the commit
was made. I'm trying to redo the bisection reverting the first
problematic change each step to see if I can narrow the second problem
(but I've had some interruptions so I've not made much progress there
yet).

> How do you perform the tests, is just a simple mounting/un-mounting
> that triggers the problem?
> Any specific things that I need to think of when running QEMU?

Unfortunately its not quite so simple as mounting/unmounting.

My environment is the Linaro Android image:
http://releases.linaro.org/14.04/android/images/armv7-lsk/vexpress.img.bz2
(using the initrd in
http://releases.linaro.org/14.04/android/images/armv7-lsk/boot.tar.bz2).

My qemu line is:
QEMU_AUDIO_DRV=none qemu-system-arm -kernel zImage -initrd initrd -M
vexpress-a9 -cpu cortex-a9 -nographic -vnc :0 -m 1G -append
'root=/dev/mmcblk0p1 rw mem=1024M raid=noautodetect
console=ttyAMA0,38400n8 rootwait vmalloc=256MB devtmpfs.mount=0' -sd
vexpress-android.img -redir tcp:5555::5555

The first bootup is usually *very* slow (since it has to dex
everything), and I usually use the zImage kernel in the boot.tar.bz2
to let that finish so I can get a sane backup image. Connecting via
vnc to localhost:0 will let you know when its booted all the way and
finished dexing. I'd expect to wait ~5-10 minutes.

I then kill qemu, and make a backup copy of the disk image so future
boots are a bit faster.

And then booting again with my kernel zImage (config attached).

It most frequently triggers within ~90 seconds of boot, but not always
(I'll go ahead and kill qemu again if I don't see it). Usually I can
reproduce it within 4 boots. One potential clue here is that it
doesn't seem to trigger on the first boot w/ the affected kernel.

Every time I see corruption, I revert back to the backup dex'ed disk image.

thanks
-john
diff mbox

Patch

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 5d20bfba..e4d4707 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -74,6 +74,7 @@  static unsigned int fmax = 515633;
  * @pwrreg_nopower: bits in MMCIPOWER don't controls ext. power supply
  * @explicit_mclk_control: enable explicit mclk control in driver.
  * @qcom_fifo: enables qcom specific fifo pio read logic.
+ * @reversed_irq_handling: handle data irq before cmd irq.
  */
 struct variant_data {
 	unsigned int		clkreg;
@@ -97,6 +98,7 @@  struct variant_data {
 	bool			pwrreg_nopower;
 	bool			explicit_mclk_control;
 	bool			qcom_fifo;
+	bool			reversed_irq_handling;
 };
 
 static struct variant_data variant_arm = {
@@ -105,6 +107,7 @@  static struct variant_data variant_arm = {
 	.datalength_bits	= 16,
 	.pwrreg_powerup		= MCI_PWR_UP,
 	.f_max			= 100000000,
+	.reversed_irq_handling	= true,
 };
 
 static struct variant_data variant_arm_extended_fifo = {
@@ -1236,8 +1239,13 @@  static irqreturn_t mmci_irq(int irq, void *dev_id)
 
 		dev_dbg(mmc_dev(host->mmc), "irq0 (data+cmd) %08x\n", status);
 
-		mmci_cmd_irq(host, host->cmd, status);
-		mmci_data_irq(host, host->data, status);
+		if (host->variant->reversed_irq_handling) {
+			mmci_data_irq(host, host->data, status);
+			mmci_cmd_irq(host, host->cmd, status);
+		} else {
+			mmci_cmd_irq(host, host->cmd, status);
+			mmci_data_irq(host, host->data, status);
+		}
 
 		/* Don't poll for busy completion in irq context. */
 		if (host->busy_status)