mmiotracer hangs the system

Message ID CAEXux-boRqEqPzfyBmz3i+wHhuFADZm5vMkj6co7L8p1c_N8nw@mail.gmail.com
State New

Commit Message

Karol Herbst Nov. 24, 2016, 8:50 p.m. UTC
Sorry about that, but I forgot the patch.

2016-11-19 11:56 GMT+01:00 Karol Herbst <karolherbst@gmail.com>:
> This is odd. I found a bug related to nouveau (modprobe/bind doesn't
> return), but that isn't related to your issue at all, or maybe it is
> exactly this, because the binding of the device doesn't return and,
> depending on the kind of driver, it would hang the system... yeah,
> maybe it is the same issue.
>
> Anyway, could you try to trace with the attached patch? The
> additional output may help me verify it. I am currently working on
> the bugfix mentioned above, and it may also fix your issue. I was
> still able to get a working mmiotrace file even though the device
> binding didn't finish. Is this the same for you? (Try
> "cat /sys/kernel/debug/tracing/trace_pipe > some_file" and see if
> the file contains anything useful.)
>
> This really looks like an odd issue, because the mmiotracer still
> behaves as expected.
>
> 2016-10-22 18:02 GMT+02:00 Andy Shevchenko <andy.shevchenko@gmail.com>:
>> On Fri, Oct 14, 2016 at 12:12 AM, Karol Herbst <karolherbst@gmail.com> wrote:
>>> Sorry for the delay in fixing that bug. I got occupied with other
>>> things and didn't really get to the issue again. It is the next item
>>> on my todo list, though, and I hope to have a fix ready this
>>> weekend. I think I might know where the issue is, but I haven't
>>> confirmed it yet.
>>
>> Thanks. I'm still using the revert. Feel free to Cc me when you have
>> some material to test.
>>
>>>
>>> Again, sorry for the delay.
>>>
>>> Karol
>>>
>>> 2016-08-19 22:46 GMT+02:00 Karol Herbst <karolherbst@gmail.com>:
>>>> Hi again,
>>>>
>>>> I was able to get a crash/freeze/something while unbinding/binding my
>>>> NVIDIA GPU from nouveau.
>>>>
>>>> I guess that means something is off. I will investigate this further
>>>> over the weekend.
>>>>
>>>> 2016-08-19 17:35 GMT+02:00 Andy Shevchenko <andy.shevchenko@gmail.com>:
>>>>> On Fri, Aug 19, 2016 at 6:08 PM, karol herbst <karolherbst@gmail.com> wrote:
>>>>>> 2016-08-19 15:02 GMT+02:00 Andy Shevchenko <andy.shevchenko@gmail.com>:
>>>>>>> On Fri, Aug 19, 2016 at 1:35 PM, karol herbst <karolherbst@gmail.com> wrote:
>>>>>>>> Is there any update on that issue that I missed somehow? I really
>>>>>>>> don't want to leave the mmiotracer in a state where it breaks
>>>>>>>> something while fixing other issues.
>>>>>>>
>>>>>>> No updates. I'm busy right now with higher-priority tasks, and the
>>>>>>> revert works for me. The issue is 100% reproducible in my case.
>>>>>>>
>>>>>>
>>>>>> Is there something I could do with a "normal" Haswell desktop system
>>>>>> to reproduce this issue?
>>>>>
>>>>> Try LPSS UART device(s).
>>>>>
>>>>>>
>>>>>> I'll play around a bit over the next few days, and maybe I'll find
>>>>>> something that reproduces it here as well. It seems to be related to
>>>>>> unmapping/mapping cycles.
>>>>>
>>>>> That is the only thing I can think of.
>>>>>
>>>>>>
>>>>>> Because if this only happens with the pwm-lpss driver,
>>>>>
>>>>> It has nothing to do with pwm-lpss, since it's an HS UART served by
>>>>> the intel-lpss driver.
>>>>>
>>>>>> it may be really troublesome to debug, because I don't know the code
>>>>>> well enough to be sure where the issue might be.
>>>>>>
>>>>>>> So, I would be able to attach dmesg in case it would be helpful.
>>>>>>> Otherwise, tell me exact instructions on how to debug the issue.
>>>>>>>
>>>>>>> Here you are:
>>>>>>> http://pastebin.com/raw/VfTZENt7
>>>>>>>
>>>>>>>> But for now, without being able to even reproduce the issue, I can't
>>>>>>>> really do much, because the code in its current state looks sane to
>>>>>>>> me. Maybe this case involves the mmiotracer cleaning things up and
>>>>>>>> arming a new region for mmiotracing, and that's why it fails? Besides
>>>>>>>> that, I have no idea and no way to reproduce this, so I can't help
>>>>>>>> this way.
>>>>>>>
>>>>>>> Maybe. The first thing that happened is iounmap().
>>>>>
>>>>>
>>>>> --
>>>>> With Best Regards,
>>>>> Andy Shevchenko
>>
>>
>>
>> --
>> With Best Regards,
>> Andy Shevchenko
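Karol's suggestion of saving trace_pipe to a file while the bind hangs could be scripted roughly as below. This is a minimal sketch, not part of the patch: it assumes debugfs is mounted at /sys/kernel/debug and that the mmiotrace tracer has already been selected (the kernel's mmiotrace documentation does this with `echo mmiotrace > /sys/kernel/debug/tracing/current_tracer`). The helper name `drain_pipe` is hypothetical. It uses read(2) rather than stdio so each chunk is written out as soon as the kernel hands it over, and it calls fsync(2) after every chunk so that as much of the trace as possible is on disk if the machine freezes.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: copy everything readable from src_path into
 * dst_path (created or truncated). Returns the number of bytes copied,
 * or -1 if either file cannot be opened. On a live system src_path
 * would be /sys/kernel/debug/tracing/trace_pipe, whose reads block
 * until new trace data arrives. */
long drain_pipe(const char *src_path, const char *dst_path)
{
    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    char buf[4096];
    long total = 0;
    ssize_t n;
    /* read() returns as soon as some data is available, so each chunk
     * is written without waiting for a full buffer. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                goto done;
            off += w;
        }
        total += n;
        fsync(out);  /* keep captured data on disk even if the box hangs later */
    }
done:
    close(in);
    close(out);
    return total;
}
```

On the test machine this would be called as `drain_pipe("/sys/kernel/debug/tracing/trace_pipe", "some_file")`. Because trace_pipe blocks until new events arrive, the loop normally runs until the process is killed, which is acceptable here since every chunk has already been flushed to the output file. The per-chunk fsync trades throughput for durability, which seems the right trade when the failure mode is a system hang.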
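The reproduction discussed in the thread, unbinding and rebinding a device from its driver while mmiotrace is active, goes through the driver core's `unbind` and `bind` sysfs attributes. The loop below is a hedged sketch of that: the function names `sysfs_write` and `rebind_cycles` are hypothetical, and the driver directory and device name are placeholders the caller must supply (for a PCI GPU bound to nouveau, something like /sys/bus/pci/drivers/nouveau plus the device's PCI address; for an LPSS UART, the directory of whichever driver it is bound to). It needs root, and, as Karol observed with nouveau, the write to `bind` may simply never return.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `value` into the attribute file dir/attr.
 * Returns 0 on success, -1 on any failure. */
static int sysfs_write(const char *dir, const char *attr, const char *value)
{
    char path[512];
    int len = snprintf(path, sizeof path, "%s/%s", dir, attr);
    if (len < 0 || (size_t)len >= sizeof path)
        return -1;              /* path would not fit */

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    size_t n = strlen(value);
    ssize_t w = write(fd, value, n);
    close(fd);
    return (w == (ssize_t)n) ? 0 : -1;
}

/* Hypothetical helper: unbind and rebind `dev` from the driver whose
 * sysfs directory is `driver_dir`, `cycles` times. Returns the number
 * of unbind/bind cycles that completed before a write failed. */
int rebind_cycles(const char *driver_dir, const char *dev, int cycles)
{
    int done = 0;
    for (; done < cycles; done++) {
        if (sysfs_write(driver_dir, "unbind", dev) != 0)
            break;
        if (sysfs_write(driver_dir, "bind", dev) != 0)
            break;
    }
    return done;
}
```

For example, `rebind_cycles("/sys/bus/pci/drivers/nouveau", "0000:01:00.0", 10)`, where `0000:01:00.0` is a made-up address. Each cycle performs one iounmap()/ioremap() round trip in a typical driver, which is the pattern both sides suspect.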

Patch

From 92aea447a776f10aad0a2e971b5f2b208a1161d2 Mon Sep 17 00:00:00 2001
From: Karol Herbst <nouveau@karolherbst.de>
Date: Thu, 24 Nov 2016 21:46:27 +0100
Subject: [PATCH] temp hack

---
 arch/x86/mm/kmmio.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c
index afc47f5c9531..a002ee314a0c 100644
--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -97,11 +97,16 @@  static DEFINE_PER_CPU(struct kmmio_context, kmmio_ctx);
 static struct kmmio_probe *get_kmmio_probe(unsigned long addr)
 {
 	struct kmmio_probe *p;
+	struct kmmio_probe *result = NULL;
 	list_for_each_entry_rcu(p, &kmmio_probes, list) {
-		if (addr >= p->addr && addr < (p->addr + p->len))
-			return p;
+		if (addr >= p->addr && addr < (p->addr + p->len)) {
+			if (!result)
+				result = p;
+			else
+				printk(KERN_ERR " %s collision detected %lx\n", __func__, addr);
+		}
 	}
-	return NULL;
+	return result;
 }
 
 /* You must be holding RCU read lock. */
@@ -109,6 +114,7 @@  static struct kmmio_fault_page *get_kmmio_fault_page(unsigned long addr)
 {
 	struct list_head *head;
 	struct kmmio_fault_page *f;
+	struct kmmio_fault_page *result = NULL;
 	unsigned int l;
 	pte_t *pte = lookup_address(addr, &l);
 
@@ -116,11 +122,16 @@  static struct kmmio_fault_page *get_kmmio_fault_page(unsigned long addr)
 		return NULL;
 	addr &= page_level_mask(l);
 	head = kmmio_page_list(addr);
+
 	list_for_each_entry_rcu(f, head, list) {
-		if (f->addr == addr)
-			return f;
+		if (f->addr == addr) {
+			if (!result)
+				result = f;
+			else
+				printk(KERN_ERR " %s collision detected %lx\n", __func__, addr);
+		}
 	}
-	return NULL;
+	return result;
 }
 
 static void clear_pmd_presence(pmd_t *pmd, bool clear, pmdval_t *old)
@@ -375,6 +386,7 @@  static int add_kmmio_fault_page(unsigned long addr)
 {
 	struct kmmio_fault_page *f;
 
+	printk(KERN_WARNING " %s %lx\n", __func__, addr);
 	f = get_kmmio_fault_page(addr);
 	if (f) {
 		if (!f->count)
@@ -406,6 +418,7 @@  static void release_kmmio_fault_page(unsigned long addr,
 {
 	struct kmmio_fault_page *f;
 
+	printk(KERN_WARNING " %s %lx\n", __func__, addr);
 	f = get_kmmio_fault_page(addr);
 	if (!f)
 		return;
@@ -445,6 +458,8 @@  int register_kmmio_probe(struct kmmio_probe *p)
 	}
 
 	pte = lookup_address(p->addr, &l);
+	printk(KERN_WARNING " %s %lx %u\n", __func__, p->addr, l);
+
 	if (!pte) {
 		ret = -EINVAL;
 		goto out;
@@ -537,6 +552,8 @@  void unregister_kmmio_probe(struct kmmio_probe *p)
 	if (!pte)
 		return;
 
+	printk(KERN_WARNING " %s %lx %u\n", __func__, p->addr, l);
+
 	spin_lock_irqsave(&kmmio_lock, flags);
 	while (size < size_lim) {
 		release_kmmio_fault_page(p->addr + size, &release_list);
-- 
2.11.0.rc2
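The patch changes both lookups from "return the first match" to "remember the first match, keep scanning, and log any further match". A userspace model of that idea for the range check in get_kmmio_probe() follows. It is a sketch that uses a plain array instead of the kernel's RCU list, and the names `probe`, `find_probe` and `page_base` are hypothetical. Overlapping probes, which a stale registration surviving an iounmap()/ioremap() cycle could produce (one hypothesis from the thread, not a confirmed cause), make the lookup ambiguous; the model counts how many ranges matched. `page_base` mirrors the `addr &= page_level_mask(l)` step in get_kmmio_fault_page(), which rounds an address down to the start of its page.

```c
#include <stddef.h>

/* Simplified stand-in for struct kmmio_probe: traced range [addr, addr + len). */
struct probe {
    unsigned long addr;
    unsigned long len;
};

/* Return the first probe whose range contains `addr`, or NULL.
 * Like the patched get_kmmio_probe(), it keeps scanning after the first
 * hit; *matches receives the number of containing ranges, so a value
 * above 1 means a collision. */
const struct probe *find_probe(const struct probe *probes, size_t n,
                               unsigned long addr, size_t *matches)
{
    const struct probe *result = NULL;
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        const struct probe *p = &probes[i];
        if (addr >= p->addr && addr < p->addr + p->len) {
            if (!result)
                result = p;     /* first match wins, as in the patch */
            hits++;             /* every additional hit is a collision */
        }
    }
    *matches = hits;
    return result;
}

/* Round addr down to the start of its page; page_size must be a power
 * of two, as with the 4K/2M/1G x86 page levels. */
unsigned long page_base(unsigned long addr, unsigned long page_size)
{
    return addr & ~(page_size - 1);
}
```

Keeping the first match as the return value means the instrumented kernel behaves exactly as before, so the debug output only adds information instead of changing which probe handles a fault.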