Message ID | 7fd16549-1349-a9e5-ceff-9aa6f748caae@intel.com |
---|---|
State | New |
Headers | show |
On 01/09/2017 02:15 AM, Daniel Vetter wrote: ... > Can you pls do some printk tracing to make sure that without your patch > we're indeed releasing the same connector twice from this loop? I suspect > you're just ever-so-slightly shifting the timing and things blow up > somewhre else. But no idea where :( Your suspicious appear correct. I still get an oops even with my patch applied. I did add some printk's but they're weird. I'm seeing a 'kdev' that looks to be -2, although it should never get set to that unless device_create_with_groups() and th error checking failed somehow. I'll take out the locking portion of the patch and add some more printk's and see if anything else shows up. Here's a copy of the printk's and associated oops, with my locking modification patch applied, if this helps at all: > [ 7927.049763] drm_dp_destroy_port() kfree(ffff88040c759000) > [ 7927.050039] drm_dp_destroy_connector_work() port: ffff88040c759800 connector: ffff88040c75a000 > [ 7927.050061] drm_sysfs_connector_remove() connector: ffff88040c75a000 kdev: fffffffffffffffe > [ 7927.050106] BUG: unable to handle kernel NULL pointer dereference at 000000000000009e > [ 7927.050242] IP: device_del+0x19/0x330 > [ 7927.050300] PGD 0 > [ 7927.050302] > [ 7927.050364] Oops: 0000 [#1] SMP > [ 7927.050414] Modules linked in: netconsole ctr ccm ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc iptable_filter ip_tables ebtable_nat ebtables x_tables cmac rfcomm bnep dm_crypt arc4 iwlmvm mac80211 snd_hda_codec_hdmi iwlwifi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_seq_midi snd_seq_midi_event snd_hda_codec cfg80211 snd_hwdep snd_rawmidi intel_rapl snd_hda_core iosf_mbi snd_pcm snd_seq btusb x86_pkg_temp_thermal hid_logitech_hidpp btrtl btbcm btintel coretemp shpchp snd_timer ghash_clmulni_intel bluetooth joydev thinkpad_acpi snd_seq_device nvram wmi snd soundcore mac_hid aesni_intel aes_x86_64 crypto_simd cryptd glue_helper > [ 7927.051492] kvm_intel kvm irqbypass hid_generic hid_logitech_dj usbhid hid [last unloaded: netconsole] > [ 7927.051642] CPU: 2 PID: 123 Comm: kworker/2:2 Tainted: G W 4.10.0-rc2-dirty #53 > [ 7927.051763] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016 > [ 7927.051887] Workqueue: events drm_dp_destroy_connector_work > [ 7927.051971] task: ffff88040c21bc00 task.stack: ffffc90002484000 > [ 7927.052065] RIP: 0010:device_del+0x19/0x330 > [ 7927.052129] RSP: 0018:ffffc90002487d58 EFLAGS: 00010282 > [ 7927.052209] RAX: 0000000000000000 RBX: fffffffffffffffe RCX: ffff88040c72d1f8 > [ 7927.052310] RDX: ffffffff81cb69b2 RSI: 0000000000000001 RDI: fffffffffffffffe > [ 7927.052412] RBP: ffffc90002487d90 R08: 0000000000000000 R09: 0000000000000392 > [ 7927.052517] R10: 00000003e05de802 R11: 0000000000000000 R12: fffffffffffffffe > [ 7927.052677] R13: ffff88040c516c18 R14: 0000000000000000 R15: ffff88040c516bd8 > [ 7927.052781] FS: 0000000000000000(0000) GS:ffff880421500000(0000) knlGS:0000000000000000 > [ 7927.052897] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 7927.052981] CR2: 000000000000009e CR3: 0000000001e0b000 CR4: 00000000003406e0 > [ 7927.053082] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 7927.053186] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 7927.053288] Call Trace: > [ 7927.053338] ? printk+0x4d/0x4f > [ 7927.053396] device_unregister+0x12/0x30 > [ 7927.053464] drm_sysfs_connector_remove+0x57/0x70 > [ 7927.053538] drm_connector_unregister.part.8+0x27/0x40 > [ 7927.053616] drm_connector_unregister+0x14/0x20 > [ 7927.053690] intel_dp_destroy_mst_connector+0x1a/0x80 > [ 7927.053771] drm_dp_destroy_connector_work+0xce/0x150 > [ 7927.053850] process_one_work+0x14b/0x430 > [ 7927.053914] worker_thread+0x12b/0x4a0 > [ 7927.053982] kthread+0x10c/0x140 > [ 7927.054037] ? process_one_work+0x430/0x430 > [ 7927.054104] ? kthread_create_on_node+0x40/0x40 > [ 7927.054179] ret_from_fork+0x27/0x40 > [ 7927.054237] Code: 00 00 00 00 00 00 00 5b 41 5c 41 5d 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 <48> 8b 87 a0 00 00 00 4c 8b 2f 48 85 c0 74 1b 48 8b b8 90 00 00 > [ 7927.054462] RIP: device_del+0x19/0x330 RSP: ffffc90002487d58 > [ 7927.054486] CR2: 000000000000009e > [ 7927.065278] ---[ end trace 1a2a5ca2cf6c3a4f ]---
Well, now I found where the -2 comes from. intel_dp_register_mst_connector() calls drm_connector_register(), which fails to add the kobject (warning below). But, it does zero error checking on the drm_connector_register() call and leaves the partially-constructed connector in place. The next time some poor, hapless code goes and tries to do anything with that kdev, they oops. I'm perplexed by this, though. The drm_dp_mst_topology_cbs->register_connector just returns void. It seems a bit goofy that it can't even _return_ failure. Is there some stable code to go back to here? Or, is there something about my configuration that's unique? I really wonder why nobody else is running into this. There's probably some other race going on here. This warning doesn't happen on every boot. > [ 1.396475] ------------[ cut here ]------------ > [ 1.396486] WARNING: CPU: 0 PID: 91 at lib/kobject.c:244 kobject_add_internal+0x108/0x2d0 > [ 1.396492] kobject_add_internal failed for card0-DP-3 (error: -2 parent: card0) > [ 1.396496] Modules linked in: > [ 1.396500] CPU: 0 PID: 91 Comm: kworker/0:2 Not tainted 4.10.0-rc2-dirty #58 > [ 1.396503] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016 > [ 1.396509] Workqueue: events_long drm_dp_mst_link_probe_work > [ 1.396511] Call Trace: > [ 1.396517] dump_stack+0x67/0x99 > [ 1.396521] __warn+0xd1/0xf0 > [ 1.396524] warn_slowpath_fmt+0x4f/0x60 > [ 1.396527] kobject_add_internal+0x108/0x2d0 > [ 1.396531] ? vsnprintf+0x112/0x4b0 > [ 1.396533] kobject_add+0x65/0xb0 > [ 1.396537] ? klist_init+0x31/0x40 > [ 1.396542] device_add+0x102/0x5d0 > [ 1.396546] ? kfree_const+0x22/0x30 > [ 1.396549] device_create_groups_vargs+0xd8/0x100 > [ 1.396553] device_create_with_groups+0x36/0x40 > [ 1.396557] ? drm_fb_helper_add_one_connector+0x57/0xd0 > [ 1.396561] ? kmem_cache_alloc_trace+0x1d2/0x1f0 > [ 1.396566] drm_sysfs_connector_add+0x60/0xe0 > [ 1.396570] drm_connector_register+0x21/0xc0 > [ 1.396575] intel_dp_register_mst_connector+0x41/0x50 > [ 1.396579] drm_dp_add_port+0x350/0x450 > [ 1.396583] ? rcu_early_boot_tests+0x1/0x10 > [ 1.396587] ? schedule_timeout+0x1cd/0x390 > [ 1.396591] ? __might_sleep+0x4a/0x90 > [ 1.396594] ? mutex_lock+0x25/0x50 > [ 1.396597] ? drm_dp_mst_wait_tx_reply+0x118/0x1e0 > [ 1.396600] ? prepare_to_wait_event+0x120/0x120 > [ 1.396603] ? drm_dp_check_mstb_guid+0x3d/0x120 > [ 1.396606] drm_sysfs_connector_add() connector: ffff88040ed3f000 kdev: ffff88040c55dc00 > [ 1.396610] drm_dp_send_link_address+0x185/0x1f0 > [ 1.396614] drm_dp_check_and_send_link_address+0xad/0xc0 > [ 1.396617] drm_dp_mst_link_probe_work+0x57/0xa0 > [ 1.396620] process_one_work+0x14b/0x430 > [ 1.396623] worker_thread+0x12b/0x4a0 > [ 1.396626] kthread+0x10c/0x140 > [ 1.396629] ? process_one_work+0x430/0x430 > [ 1.396632] ? kthread_create_on_node+0x40/0x40 > [ 1.396635] ret_from_fork+0x27/0x40 > [ 1.396639] ---[ end trace dd05bb2cd9ebd900 ]--- > [ 1.396643] drm_sysfs_connector_add() connector: ffff88040c75a000 kdev: fffffffffffffffe > [ 1.396648] [drm:drm_sysfs_connector_add] *ERROR* failed to register connector device: -2 > [ 1.396651] drm_connector_register(ffff88040ed3f000) connector->registered: 1 > [ 1.396666] drm_sysfs_connector_add() connector: ffff88040ed3f800 kdev: ffff88040c55e400 > [ 1.396700] drm_connector_register(ffff88040ed3f800) connector->registered: 1 > [ 1.396717] drm_sysfs_connector_add() connector: ffff88040ed39800 kdev: ffff88040c55e800 > [ 1.396726] drm_connector_register(ffff88040ed39800) connector->registered: 1 > [ 1.396743] drm_sysfs_connector_add() connector: ffff88040c758000 kdev: ffff88040c55ec00 > [ 1.396769] drm_connector_register(ffff88040c758000) connector->registered: 1 > [ 1.396786] drm_sysfs_connector_add() connector: ffff88040c758800 kdev: ffff88040c55f000 > [ 1.396799] drm_connector_register(ffff88040c758800) connector->registered: 1 > [ 1.396804] drm_connector_register(ffff88040c75a000) connector->registered: 1 > [ 1.398665] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no) > [ 1.398893] drm_sysfs_connector_add() connector: ffff88040c75b000 kdev: ffff88040c790400 > [ 1.398907] drm_connector_register(ffff88040c75b000) connector->registered: 1 > [ 1.398930] drm_sysfs_connector_add() connector: ffff88040c75c000 kdev: ffff88040c790800 > [ 1.398939] drm_connector_register(ffff88040c75c000) connector->registered: 1
On 01/09/2017 05:40 AM, Dave Hansen wrote: > Is there some stable code to go back to here? Or, is there something > about my configuration that's unique? I really wonder why nobody else > is running into this. Here are a couple of similar-looking reports, if that helps: https://bugzilla.redhat.com/attachment.cgi?id=1209947 https://bugzilla.redhat.com/show_bug.cgi?id=1382342 http://marc.info/?l=openbsd-bugs&m=147487082821348&w=2
On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen <dave.hansen@intel.com> wrote: > Well, now I found where the -2 comes from. > intel_dp_register_mst_connector() calls drm_connector_register(), which > fails to add the kobject (warning below). But, it does zero error > checking on the drm_connector_register() call and leaves the > partially-constructed connector in place. > > The next time some poor, hapless code goes and tries to do anything with > that kdev, they oops. I'm perplexed by this, though. The > drm_dp_mst_topology_cbs->register_connector just returns void. It seems > a bit goofy that it can't even _return_ failure. > > Is there some stable code to go back to here? Or, is there something > about my configuration that's unique? I really wonder why nobody else > is running into this. > > There's probably some other race going on here. This warning doesn't > happen on every boot. This smells more like the root-cause: Something goes wrong on boot that prevents connectors from properly registering, then we fall over later on. And the register callback is intentionally void, assuming that any prep work has been done earlier and that therefore the register step can't fail. Can you pls check whether the oops later on only happens together with this warning at boot, or whether they're not correlated? Still no idea though why this blows up for you. -Daniel > >> [ 1.396475] ------------[ cut here ]------------ >> [ 1.396486] WARNING: CPU: 0 PID: 91 at lib/kobject.c:244 kobject_add_internal+0x108/0x2d0 >> [ 1.396492] kobject_add_internal failed for card0-DP-3 (error: -2 parent: card0) >> [ 1.396496] Modules linked in: >> [ 1.396500] CPU: 0 PID: 91 Comm: kworker/0:2 Not tainted 4.10.0-rc2-dirty #58 >> [ 1.396503] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016 >> [ 1.396509] Workqueue: events_long drm_dp_mst_link_probe_work >> [ 1.396511] Call Trace: >> [ 1.396517] dump_stack+0x67/0x99 >> [ 1.396521] __warn+0xd1/0xf0 >> [ 1.396524] warn_slowpath_fmt+0x4f/0x60 >> [ 1.396527] kobject_add_internal+0x108/0x2d0 >> [ 1.396531] ? vsnprintf+0x112/0x4b0 >> [ 1.396533] kobject_add+0x65/0xb0 >> [ 1.396537] ? klist_init+0x31/0x40 >> [ 1.396542] device_add+0x102/0x5d0 >> [ 1.396546] ? kfree_const+0x22/0x30 >> [ 1.396549] device_create_groups_vargs+0xd8/0x100 >> [ 1.396553] device_create_with_groups+0x36/0x40 >> [ 1.396557] ? drm_fb_helper_add_one_connector+0x57/0xd0 >> [ 1.396561] ? kmem_cache_alloc_trace+0x1d2/0x1f0 >> [ 1.396566] drm_sysfs_connector_add+0x60/0xe0 >> [ 1.396570] drm_connector_register+0x21/0xc0 >> [ 1.396575] intel_dp_register_mst_connector+0x41/0x50 >> [ 1.396579] drm_dp_add_port+0x350/0x450 >> [ 1.396583] ? rcu_early_boot_tests+0x1/0x10 >> [ 1.396587] ? schedule_timeout+0x1cd/0x390 >> [ 1.396591] ? __might_sleep+0x4a/0x90 >> [ 1.396594] ? mutex_lock+0x25/0x50 >> [ 1.396597] ? drm_dp_mst_wait_tx_reply+0x118/0x1e0 >> [ 1.396600] ? prepare_to_wait_event+0x120/0x120 >> [ 1.396603] ? drm_dp_check_mstb_guid+0x3d/0x120 >> [ 1.396606] drm_sysfs_connector_add() connector: ffff88040ed3f000 kdev: ffff88040c55dc00 >> [ 1.396610] drm_dp_send_link_address+0x185/0x1f0 >> [ 1.396614] drm_dp_check_and_send_link_address+0xad/0xc0 >> [ 1.396617] drm_dp_mst_link_probe_work+0x57/0xa0 >> [ 1.396620] process_one_work+0x14b/0x430 >> [ 1.396623] worker_thread+0x12b/0x4a0 >> [ 1.396626] kthread+0x10c/0x140 >> [ 1.396629] ? process_one_work+0x430/0x430 >> [ 1.396632] ? kthread_create_on_node+0x40/0x40 >> [ 1.396635] ret_from_fork+0x27/0x40 >> [ 1.396639] ---[ end trace dd05bb2cd9ebd900 ]--- >> [ 1.396643] drm_sysfs_connector_add() connector: ffff88040c75a000 kdev: fffffffffffffffe >> [ 1.396648] [drm:drm_sysfs_connector_add] *ERROR* failed to register connector device: -2 >> [ 1.396651] drm_connector_register(ffff88040ed3f000) connector->registered: 1 >> [ 1.396666] drm_sysfs_connector_add() connector: ffff88040ed3f800 kdev: ffff88040c55e400 >> [ 1.396700] drm_connector_register(ffff88040ed3f800) connector->registered: 1 >> [ 1.396717] drm_sysfs_connector_add() connector: ffff88040ed39800 kdev: ffff88040c55e800 >> [ 1.396726] drm_connector_register(ffff88040ed39800) connector->registered: 1 >> [ 1.396743] drm_sysfs_connector_add() connector: ffff88040c758000 kdev: ffff88040c55ec00 >> [ 1.396769] drm_connector_register(ffff88040c758000) connector->registered: 1 >> [ 1.396786] drm_sysfs_connector_add() connector: ffff88040c758800 kdev: ffff88040c55f000 >> [ 1.396799] drm_connector_register(ffff88040c758800) connector->registered: 1 >> [ 1.396804] drm_connector_register(ffff88040c75a000) connector->registered: 1 >> [ 1.398665] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no) >> [ 1.398893] drm_sysfs_connector_add() connector: ffff88040c75b000 kdev: ffff88040c790400 >> [ 1.398907] drm_connector_register(ffff88040c75b000) connector->registered: 1 >> [ 1.398930] drm_sysfs_connector_add() connector: ffff88040c75c000 kdev: ffff88040c790800 >> [ 1.398939] drm_connector_register(ffff88040c75c000) connector->registered: 1 > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On 01/09/2017 08:41 AM, Daniel Vetter wrote: > On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen <dave.hansen@intel.com> wrote: >> Well, now I found where the -2 comes from. >> intel_dp_register_mst_connector() calls drm_connector_register(), which >> fails to add the kobject (warning below). But, it does zero error >> checking on the drm_connector_register() call and leaves the >> partially-constructed connector in place. >> >> The next time some poor, hapless code goes and tries to do anything with >> that kdev, they oops. I'm perplexed by this, though. The >> drm_dp_mst_topology_cbs->register_connector just returns void. It seems >> a bit goofy that it can't even _return_ failure. >> >> Is there some stable code to go back to here? Or, is there something >> about my configuration that's unique? I really wonder why nobody else >> is running into this. >> >> There's probably some other race going on here. This warning doesn't >> happen on every boot. > This smells more like the root-cause: Something goes wrong on boot > that prevents connectors from properly registering, then we fall over > later on. And the register callback is intentionally void, assuming > that any prep work has been done earlier and that therefore the > register step can't fail. Can you pls check whether the oops later on > only happens together with this warning at boot, or whether they're > not correlated? Looking through my logs, I can't find any instance of the oops without the warning at boot. So I do think the later oops is entirely caused by the issue warned about in early boot. My distro kernel (4.4.0-57-generic) is also unstable, but I haven't managed to capture a good oops there. It's hitting this, which I assume is unrelated: WARNING: CPU: 0 PID: 41 at /build/linux-lts-xenial-FdAdUy/linux- lts-xenial-4.4.0/ubuntu/i915/intel_pm.c:3675 skl_update_other_pipe_wm+0x191/0x1a0 [i915_bpo]()
On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen <dave.hansen@intel.com> wrote: > On 01/09/2017 08:41 AM, Daniel Vetter wrote: >> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen <dave.hansen@intel.com> wrote: >>> Well, now I found where the -2 comes from. >>> intel_dp_register_mst_connector() calls drm_connector_register(), which >>> fails to add the kobject (warning below). But, it does zero error >>> checking on the drm_connector_register() call and leaves the >>> partially-constructed connector in place. >>> >>> The next time some poor, hapless code goes and tries to do anything with >>> that kdev, they oops. I'm perplexed by this, though. The >>> drm_dp_mst_topology_cbs->register_connector just returns void. It seems >>> a bit goofy that it can't even _return_ failure. >>> >>> Is there some stable code to go back to here? Or, is there something >>> about my configuration that's unique? I really wonder why nobody else >>> is running into this. >>> >>> There's probably some other race going on here. This warning doesn't >>> happen on every boot. >> This smells more like the root-cause: Something goes wrong on boot >> that prevents connectors from properly registering, then we fall over >> later on. And the register callback is intentionally void, assuming >> that any prep work has been done earlier and that therefore the >> register step can't fail. Can you pls check whether the oops later on >> only happens together with this warning at boot, or whether they're >> not correlated? > > Looking through my logs, I can't find any instance of the oops without > the warning at boot. So I do think the later oops is entirely caused by > the issue warned about in early boot. Hm, I guess then we'd need to fix that boot-up warning. Can you try to figure out why it's unhappy? On a hunch it could be that we call drm_connector_register from the mst probe worker before the main driver load thread has reached the drm_dev_register call. A few printk to decide whether that's the case (plus a few boot-up tests to gather the statistics, sorry about that) would be real great. If that's inconclusive I'm again a bit low on ideas ... > My distro kernel (4.4.0-57-generic) is also unstable, but I haven't > managed to capture a good oops there. It's hitting this, which I assume > is unrelated: > > WARNING: CPU: 0 PID: 41 at /build/linux-lts-xenial-FdAdUy/linux- > lts-xenial-4.4.0/ubuntu/i915/intel_pm.c:3675 > skl_update_other_pipe_wm+0x191/0x1a0 [i915_bpo]() wm programming issues, which will kill your box. Needs a newer kernel to fix (both the wm programming issues, and that wm programming issues lead to system death). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On 01/09/2017 08:59 AM, Daniel Vetter wrote: > On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen <dave.hansen@intel.com> wrote: >> On 01/09/2017 08:41 AM, Daniel Vetter wrote: >>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen <dave.hansen@intel.com> wrote: >>>> Well, now I found where the -2 comes from. >>>> intel_dp_register_mst_connector() calls drm_connector_register(), which >>>> fails to add the kobject (warning below). But, it does zero error >>>> checking on the drm_connector_register() call and leaves the >>>> partially-constructed connector in place. >>>> >>>> The next time some poor, hapless code goes and tries to do anything with >>>> that kdev, they oops. I'm perplexed by this, though. The >>>> drm_dp_mst_topology_cbs->register_connector just returns void. It seems >>>> a bit goofy that it can't even _return_ failure. >>>> >>>> Is there some stable code to go back to here? Or, is there something >>>> about my configuration that's unique? I really wonder why nobody else >>>> is running into this. >>>> >>>> There's probably some other race going on here. This warning doesn't >>>> happen on every boot. >>> This smells more like the root-cause: Something goes wrong on boot >>> that prevents connectors from properly registering, then we fall over >>> later on. And the register callback is intentionally void, assuming >>> that any prep work has been done earlier and that therefore the >>> register step can't fail. Can you pls check whether the oops later on >>> only happens together with this warning at boot, or whether they're >>> not correlated? >> >> Looking through my logs, I can't find any instance of the oops without >> the warning at boot. So I do think the later oops is entirely caused by >> the issue warned about in early boot. > > Hm, I guess then we'd need to fix that boot-up warning. Can you try to > figure out why it's unhappy? On a hunch it could be that we call > drm_connector_register from the mst probe worker before the main > driver load thread has reached the drm_dev_register call. A few printk > to decide whether that's the case (plus a few boot-up tests to gather > the statistics, sorry about that) would be real great. > > If that's inconclusive I'm again a bit low on ideas ... I'll do that shortly. But, for now I can confirm that the failure is precipitated by the !parent check in sysfs_create_dir_ns(). I also can't reproduce this if I build i915 as a module. It only happens when built in. > Jan 9 09:07:34 ray kernel: [ 1.400547] sysfs_create_dir_ns()::53 error: -2 > Jan 9 09:07:34 ray kernel: [ 1.400554] create_dir()::75 error: -2 > Jan 9 09:07:34 ray kernel: [ 1.400558] ------------[ cut here ]------------ > Jan 9 09:07:34 ray kernel: [ 1.400565] WARNING: CPU: 1 PID: 90 at lib/kobject.c:249 kobject_add_internal+0x273/0x320 > Jan 9 09:07:34 ray kernel: [ 1.400569] kobject_add_internal failed for card0-DP-3 (error: -2 parent: card0) > Jan 9 09:07:34 ray kernel: [ 1.400572] Modules linked in: > Jan 9 09:07:34 ray kernel: [ 1.400577] CPU: 1 PID: 90 Comm: kworker/1:2 Not tainted 4.10.0-rc3-dirty #61 > Jan 9 09:07:34 ray kernel: [ 1.400579] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016 > Jan 9 09:07:34 ray kernel: [ 1.400585] Workqueue: events_long drm_dp_mst_link_probe_work > Jan 9 09:07:34 ray kernel: [ 1.400588] Call Trace: > Jan 9 09:07:34 ray kernel: [ 1.400593] dump_stack+0x67/0x99 > Jan 9 09:07:34 ray kernel: [ 1.400598] __warn+0xd1/0xf0 > Jan 9 09:07:34 ray kernel: [ 1.400601] warn_slowpath_fmt+0x4f/0x60 > Jan 9 09:07:34 ray kernel: [ 1.400604] kobject_add_internal+0x273/0x320 > Jan 9 09:07:34 ray kernel: [ 1.400607] kobject_add+0x65/0xb0 > Jan 9 09:07:34 ray kernel: [ 1.400611] ? klist_init+0x31/0x40 > Jan 9 09:07:34 ray kernel: [ 1.400615] device_add+0x102/0x5d0 > Jan 9 09:07:34 ray kernel: [ 1.400619] ? kfree_const+0x22/0x30 > Jan 9 09:07:34 ray kernel: [ 1.400623] device_create_groups_vargs+0xd8/0x100 > Jan 9 09:07:34 ray kernel: [ 1.400626] device_create_with_groups+0x36/0x40 > Jan 9 09:07:34 ray kernel: [ 1.400631] ? drm_fb_helper_add_one_connector+0x57/0xd0 > Jan 9 09:07:34 ray kernel: [ 1.400636] ? kmem_cache_alloc_trace+0x1d2/0x1f0 > Jan 9 09:07:34 ray kernel: [ 1.400641] drm_sysfs_connector_add+0x60/0xe0 > Jan 9 09:07:34 ray kernel: [ 1.400645] drm_connector_register+0x21/0xc0 > Jan 9 09:07:34 ray kernel: [ 1.400649] intel_dp_register_mst_connector+0x41/0x50 > Jan 9 09:07:34 ray kernel: [ 1.400653] drm_dp_add_port+0x350/0x450 > Jan 9 09:07:34 ray kernel: [ 1.400657] ? rcu_early_boot_tests+0x1/0x10 > Jan 9 09:07:34 ray kernel: [ 1.400660] ? schedule_timeout+0x1cd/0x390 > Jan 9 09:07:34 ray kernel: [ 1.400664] ? __might_sleep+0x4a/0x90 > Jan 9 09:07:34 ray kernel: [ 1.400667] ? mutex_lock+0x25/0x50 > Jan 9 09:07:34 ray kernel: [ 1.400670] ? drm_dp_mst_wait_tx_reply+0x118/0x1e0 > Jan 9 09:07:34 ray kernel: [ 1.400673] ? prepare_to_wait_event+0x120/0x120 > Jan 9 09:07:34 ray kernel: [ 1.400675] drm_sysfs_connector_add() connector: ffff88040c778000 kdev: ffff88040ef15000 > Jan 9 09:07:34 ray kernel: [ 1.400681] ? drm_dp_check_mstb_guid+0x3d/0x120 > Jan 9 09:07:34 ray kernel: [ 1.400684] drm_dp_send_link_address+0x185/0x1f0 > Jan 9 09:07:34 ray kernel: [ 1.400688] drm_dp_check_and_send_link_address+0xad/0xc0 > Jan 9 09:07:34 ray kernel: [ 1.400691] drm_dp_mst_link_probe_work+0x57/0xa0 > Jan 9 09:07:34 ray kernel: [ 1.400694] process_one_work+0x14b/0x430 > Jan 9 09:07:34 ray kernel: [ 1.400697] worker_thread+0x12b/0x4a0 > Jan 9 09:07:34 ray kernel: [ 1.400700] kthread+0x10c/0x140 > Jan 9 09:07:34 ray kernel: [ 1.400703] ? process_one_work+0x430/0x430 > Jan 9 09:07:34 ray kernel: [ 1.400706] ? kthread_create_on_node+0x40/0x40 > Jan 9 09:07:34 ray kernel: [ 1.400709] ret_from_fork+0x27/0x40 > Jan 9 09:07:34 ray kernel: [ 1.400714] ---[ end trace 0009c9dc7b253d9c ]---
On 01/09/2017 08:59 AM, Daniel Vetter wrote: > On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen <dave.hansen@intel.com> wrote: >> On 01/09/2017 08:41 AM, Daniel Vetter wrote: >>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen <dave.hansen@intel.com> wrote: >>>> Well, now I found where the -2 comes from. >>>> intel_dp_register_mst_connector() calls drm_connector_register(), which >>>> fails to add the kobject (warning below). But, it does zero error >>>> checking on the drm_connector_register() call and leaves the >>>> partially-constructed connector in place. >>>> >>>> The next time some poor, hapless code goes and tries to do anything with >>>> that kdev, they oops. I'm perplexed by this, though. The >>>> drm_dp_mst_topology_cbs->register_connector just returns void. It seems >>>> a bit goofy that it can't even _return_ failure. >>>> >>>> Is there some stable code to go back to here? Or, is there something >>>> about my configuration that's unique? I really wonder why nobody else >>>> is running into this. >>>> >>>> There's probably some other race going on here. This warning doesn't >>>> happen on every boot. >>> This smells more like the root-cause: Something goes wrong on boot >>> that prevents connectors from properly registering, then we fall over >>> later on. And the register callback is intentionally void, assuming >>> that any prep work has been done earlier and that therefore the >>> register step can't fail. Can you pls check whether the oops later on >>> only happens together with this warning at boot, or whether they're >>> not correlated? >> >> Looking through my logs, I can't find any instance of the oops without >> the warning at boot. So I do think the later oops is entirely caused by >> the issue warned about in early boot. > > Hm, I guess then we'd need to fix that boot-up warning. Can you try to > figure out why it's unhappy? On a hunch it could be that we call > drm_connector_register from the mst probe worker before the main > driver load thread has reached the drm_dev_register call. A few printk > to decide whether that's the case (plus a few boot-up tests to gather > the statistics, sorry about that) would be real great. I think your suspicion is correct. The WARNING is getting printed from CPU0 (the drm_dp_mst_link_probe_work() thread) and we can see the drm_connector_register()s happening in parallel on CPU1 while the WARNING is getting printed out. This seems to happen very consistently when i915 is built into the kernel. > [ 1.404237] drm_dev_register(00000000000002da)::209092608 cpu: 1 > [ 1.404284] drm_connector_register(ffff88040ed3e800)::379 connector->registered: 0 cpu: 0 > [ 1.404293] sysfs_create_dir_ns()::53 error: -2 > [ 1.404295] create_dir()::75 error: -2 cpu: 0 > [ 1.404298] ------------[ cut here ]------------ > [ 1.404304] WARNING: CPU: 0 PID: 3 at lib/kobject.c:249 kobject_add_internal+0x273/0x330 > [ 1.404308] kobject_add_internal failed for card0-DP-3 (error: -2 parent: card0) > [ 1.404311] Modules linked in: > [ 1.404316] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc3-i915borked-dirty #66 > [ 1.404319] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016 > [ 1.404325] Workqueue: events_long drm_dp_mst_link_probe_work > [ 1.404328] Call Trace: > [ 1.404334] dump_stack+0x67/0x99 > [ 1.404338] __warn+0xd1/0xf0 > [ 1.404342] warn_slowpath_fmt+0x4f/0x60 > [ 1.404345] kobject_add_internal+0x273/0x330 > [ 1.404347] kobject_add+0x65/0xb0 > [ 1.404352] ? klist_init+0x31/0x40 > [ 1.404356] device_add+0x102/0x5d0 > [ 1.404360] ? kfree_const+0x22/0x30 > [ 1.404364] device_create_groups_vargs+0xd8/0x100 > [ 1.404367] device_create_with_groups+0x36/0x40 > [ 1.404370] drm_connector_register(ffff88040c779000)::379 connector->registered: 0 cpu: 1 > [ 1.404374] ? vprintk_default+0x29/0x50 > [ 1.404375] ? printk+0x4d/0x4f > [ 1.404378] drm_sysfs_connector_add+0x60/0xe0 > [ 1.404381] drm_connector_register+0x4e/0x100 > [ 1.404383] intel_dp_register_mst_connector+0x41/0x50 > [ 1.404385] drm_dp_add_port+0x350/0x450 > [ 1.404388] ? rcu_early_boot_tests+0x1/0x10 > [ 1.404390] ? schedule_timeout+0x1cd/0x390 > [ 1.404392] ? __might_sleep+0x4a/0x90 > [ 1.404394] ? mutex_lock+0x25/0x50 > [ 1.404395] ? drm_dp_mst_wait_tx_reply+0x118/0x1e0 > [ 1.404397] ? prepare_to_wait_event+0x120/0x120 > [ 1.404399] ? drm_dp_check_mstb_guid+0x3d/0x120 > [ 1.404401] drm_dp_send_link_address+0x185/0x1f0 > [ 1.404403] drm_dp_check_and_send_link_address+0xad/0xc0 > [ 1.404405] drm_dp_mst_link_probe_work+0x57/0xa0 > [ 1.404407] process_one_work+0x14b/0x430 > [ 1.404408] worker_thread+0x12b/0x4a0 > [ 1.404410] kthread+0x10c/0x140 > [ 1.404411] ? process_one_work+0x430/0x430 > [ 1.404413] ? kthread_create_on_node+0x40/0x40 > [ 1.404415] ret_from_fork+0x27/0x40 > [ 1.404418] ---[ end trace 952a97b7cba4984f ]--- > [ 1.404420] drm_sysfs_connector_add() connector: ffff88040ed3e800 kdev: fffffffffffffffe > [ 1.404422] [drm:drm_sysfs_connector_add] *ERROR* failed to register connector device: -2 > [ 1.404453] drm_sysfs_connector_add() connector: ffff88040c779000 kdev: ffff88040c771c00 > [ 1.404508] drm_connector_register(ffff88040c779000)::402 connector->registered: 1 cpu: 1 > [ 1.404512] drm_connector_register(ffff88040ed3b800)::379 connector->registered: 0 cpu: 1 > [ 1.404531] drm_sysfs_connector_add() connector: ffff88040ed3b800 kdev: ffff88040c772000 > [ 1.404559] drm_connector_register(ffff88040ed3b800)::402 connector->registered: 1 cpu: 1 > [ 1.404564] drm_connector_register(ffff88040ed3c000)::379 connector->registered: 0 cpu: 1 > [ 1.404583] drm_sysfs_connector_add() connector: ffff88040ed3c000 kdev: ffff88040c772400
On Mon, Jan 9, 2017 at 12:22 PM, Dave Hansen <dave.hansen@intel.com> wrote: > On 01/09/2017 08:59 AM, Daniel Vetter wrote: >> On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen <dave.hansen@intel.com> wrote: >>> On 01/09/2017 08:41 AM, Daniel Vetter wrote: >>>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen <dave.hansen@intel.com> wrote: >>>>> Well, now I found where the -2 comes from. >>>>> intel_dp_register_mst_connector() calls drm_connector_register(), which >>>>> fails to add the kobject (warning below). But, it does zero error >>>>> checking on the drm_connector_register() call and leaves the >>>>> partially-constructed connector in place. >>>>> >>>>> The next time some poor, hapless code goes and tries to do anything with >>>>> that kdev, they oops. I'm perplexed by this, though. The >>>>> drm_dp_mst_topology_cbs->register_connector just returns void. It seems >>>>> a bit goofy that it can't even _return_ failure. >>>>> >>>>> Is there some stable code to go back to here? Or, is there something >>>>> about my configuration that's unique? I really wonder why nobody else >>>>> is running into this. >>>>> >>>>> There's probably some other race going on here. This warning doesn't >>>>> happen on every boot. >>>> This smells more like the root-cause: Something goes wrong on boot >>>> that prevents connectors from properly registering, then we fall over >>>> later on. And the register callback is intentionally void, assuming >>>> that any prep work has been done earlier and that therefore the >>>> register step can't fail. Can you pls check whether the oops later on >>>> only happens together with this warning at boot, or whether they're >>>> not correlated? >>> >>> Looking through my logs, I can't find any instance of the oops without >>> the warning at boot. So I do think the later oops is entirely caused by >>> the issue warned about in early boot. >> >> Hm, I guess then we'd need to fix that boot-up warning. Can you try to >> figure out why it's unhappy? On a hunch it could be that we call >> drm_connector_register from the mst probe worker before the main >> driver load thread has reached the drm_dev_register call. A few printk >> to decide whether that's the case (plus a few boot-up tests to gather >> the statistics, sorry about that) would be real great. >> >> If that's inconclusive I'm again a bit low on ideas ... > > I'll do that shortly. But, for now I can confirm that the failure is > precipitated by the !parent check in sysfs_create_dir_ns(). > > I also can't reproduce this if I build i915 as a module. It only > happens when built in. FWIW, we ran into a race with fbdev and mst topology discovery. Maybe you are seeing something similar? Alex > >> Jan 9 09:07:34 ray kernel: [ 1.400547] sysfs_create_dir_ns()::53 error: -2 >> Jan 9 09:07:34 ray kernel: [ 1.400554] create_dir()::75 error: -2 >> Jan 9 09:07:34 ray kernel: [ 1.400558] ------------[ cut here ]------------ >> Jan 9 09:07:34 ray kernel: [ 1.400565] WARNING: CPU: 1 PID: 90 at lib/kobject.c:249 kobject_add_internal+0x273/0x320 >> Jan 9 09:07:34 ray kernel: [ 1.400569] kobject_add_internal failed for card0-DP-3 (error: -2 parent: card0) >> Jan 9 09:07:34 ray kernel: [ 1.400572] Modules linked in: >> Jan 9 09:07:34 ray kernel: [ 1.400577] CPU: 1 PID: 90 Comm: kworker/1:2 Not tainted 4.10.0-rc3-dirty #61 >> Jan 9 09:07:34 ray kernel: [ 1.400579] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016 >> Jan 9 09:07:34 ray kernel: [ 1.400585] Workqueue: events_long drm_dp_mst_link_probe_work >> Jan 9 09:07:34 ray kernel: [ 1.400588] Call Trace: >> Jan 9 09:07:34 ray kernel: [ 1.400593] dump_stack+0x67/0x99 >> Jan 9 09:07:34 ray kernel: [ 1.400598] __warn+0xd1/0xf0 >> Jan 9 09:07:34 ray kernel: [ 1.400601] warn_slowpath_fmt+0x4f/0x60 >> Jan 9 09:07:34 ray kernel: [ 1.400604] kobject_add_internal+0x273/0x320 >> Jan 9 09:07:34 ray kernel: [ 1.400607] kobject_add+0x65/0xb0 >> Jan 9 09:07:34 ray kernel: [ 1.400611] ? klist_init+0x31/0x40 >> Jan 9 09:07:34 ray kernel: [ 1.400615] device_add+0x102/0x5d0 >> Jan 9 09:07:34 ray kernel: [ 1.400619] ? kfree_const+0x22/0x30 >> Jan 9 09:07:34 ray kernel: [ 1.400623] device_create_groups_vargs+0xd8/0x100 >> Jan 9 09:07:34 ray kernel: [ 1.400626] device_create_with_groups+0x36/0x40 >> Jan 9 09:07:34 ray kernel: [ 1.400631] ? drm_fb_helper_add_one_connector+0x57/0xd0 >> Jan 9 09:07:34 ray kernel: [ 1.400636] ? kmem_cache_alloc_trace+0x1d2/0x1f0 >> Jan 9 09:07:34 ray kernel: [ 1.400641] drm_sysfs_connector_add+0x60/0xe0 >> Jan 9 09:07:34 ray kernel: [ 1.400645] drm_connector_register+0x21/0xc0 >> Jan 9 09:07:34 ray kernel: [ 1.400649] intel_dp_register_mst_connector+0x41/0x50 >> Jan 9 09:07:34 ray kernel: [ 1.400653] drm_dp_add_port+0x350/0x450 >> Jan 9 09:07:34 ray kernel: [ 1.400657] ? rcu_early_boot_tests+0x1/0x10 >> Jan 9 09:07:34 ray kernel: [ 1.400660] ? schedule_timeout+0x1cd/0x390 >> Jan 9 09:07:34 ray kernel: [ 1.400664] ? __might_sleep+0x4a/0x90 >> Jan 9 09:07:34 ray kernel: [ 1.400667] ? mutex_lock+0x25/0x50 >> Jan 9 09:07:34 ray kernel: [ 1.400670] ? drm_dp_mst_wait_tx_reply+0x118/0x1e0 >> Jan 9 09:07:34 ray kernel: [ 1.400673] ? prepare_to_wait_event+0x120/0x120 >> Jan 9 09:07:34 ray kernel: [ 1.400675] drm_sysfs_connector_add() connector: ffff88040c778000 kdev: ffff88040ef15000 >> Jan 9 09:07:34 ray kernel: [ 1.400681] ? drm_dp_check_mstb_guid+0x3d/0x120 >> Jan 9 09:07:34 ray kernel: [ 1.400684] drm_dp_send_link_address+0x185/0x1f0 >> Jan 9 09:07:34 ray kernel: [ 1.400688] drm_dp_check_and_send_link_address+0xad/0xc0 >> Jan 9 09:07:34 ray kernel: [ 1.400691] drm_dp_mst_link_probe_work+0x57/0xa0 >> Jan 9 09:07:34 ray kernel: [ 1.400694] process_one_work+0x14b/0x430 >> Jan 9 09:07:34 ray kernel: [ 1.400697] worker_thread+0x12b/0x4a0 >> Jan 9 09:07:34 ray kernel: [ 1.400700] kthread+0x10c/0x140 >> Jan 9 09:07:34 ray kernel: [ 1.400703] ? process_one_work+0x430/0x430 >> Jan 9 09:07:34 ray kernel: [ 1.400706] ? kthread_create_on_node+0x40/0x40 >> Jan 9 09:07:34 ray kernel: [ 1.400709] ret_from_fork+0x27/0x40 >> Jan 9 09:07:34 ray kernel: [ 1.400714] ---[ end trace 0009c9dc7b253d9c ]--- > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, Jan 09, 2017 at 09:42:22AM -0800, Dave Hansen wrote: > On 01/09/2017 08:59 AM, Daniel Vetter wrote: > > On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen <dave.hansen@intel.com> wrote: > >> On 01/09/2017 08:41 AM, Daniel Vetter wrote: > >>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen <dave.hansen@intel.com> wrote: > >>>> Well, now I found where the -2 comes from. > >>>> intel_dp_register_mst_connector() calls drm_connector_register(), which > >>>> fails to add the kobject (warning below). But, it does zero error > >>>> checking on the drm_connector_register() call and leaves the > >>>> partially-constructed connector in place. > >>>> > >>>> The next time some poor, hapless code goes and tries to do anything with > >>>> that kdev, they oops. I'm perplexed by this, though. The > >>>> drm_dp_mst_topology_cbs->register_connector just returns void. It seems > >>>> a bit goofy that it can't even _return_ failure. > >>>> > >>>> Is there some stable code to go back to here? Or, is there something > >>>> about my configuration that's unique? I really wonder why nobody else > >>>> is running into this. > >>>> > >>>> There's probably some other race going on here. This warning doesn't > >>>> happen on every boot. > >>> This smells more like the root-cause: Something goes wrong on boot > >>> that prevents connectors from properly registering, then we fall over > >>> later on. And the register callback is intentionally void, assuming > >>> that any prep work has been done earlier and that therefore the > >>> register step can't fail. Can you pls check whether the oops later on > >>> only happens together with this warning at boot, or whether they're > >>> not correlated? > >> > >> Looking through my logs, I can't find any instance of the oops without > >> the warning at boot. So I do think the later oops is entirely caused by > >> the issue warned about in early boot. > > > > Hm, I guess then we'd need to fix that boot-up warning. Can you try to > > figure out why it's unhappy? On a hunch it could be that we call > > drm_connector_register from the mst probe worker before the main > > driver load thread has reached the drm_dev_register call. A few printk > > to decide whether that's the case (plus a few boot-up tests to gather > > the statistics, sorry about that) would be real great. > > I think your suspicion is correct. > > The WARNING is getting printed from CPU0 (the > drm_dp_mst_link_probe_work() thread) and we can see the > drm_connector_register()s happening in parallel on CPU1 while the > WARNING is getting printed out. > > This seems to happen very consistently when i915 is built into the kernel. Do you have commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Sun Dec 18 14:35:45 2016 +0100 drm: prevent double-(un)registration for connectors Lack of that would perfectly explain that oops ... Otherwise still no idea what's going wrong. -Daniel > > > [ 1.404237] drm_dev_register(00000000000002da)::209092608 cpu: 1 > > [ 1.404284] drm_connector_register(ffff88040ed3e800)::379 connector->registered: 0 cpu: 0 > > [ 1.404293] sysfs_create_dir_ns()::53 error: -2 > > [ 1.404295] create_dir()::75 error: -2 cpu: 0 > > [ 1.404298] ------------[ cut here ]------------ > > [ 1.404304] WARNING: CPU: 0 PID: 3 at lib/kobject.c:249 kobject_add_internal+0x273/0x330 > > [ 1.404308] kobject_add_internal failed for card0-DP-3 (error: -2 parent: card0) > > [ 1.404311] Modules linked in: > > [ 1.404316] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc3-i915borked-dirty #66 > > [ 1.404319] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016 > > [ 1.404325] Workqueue: events_long drm_dp_mst_link_probe_work > > [ 1.404328] Call Trace: > > [ 1.404334] dump_stack+0x67/0x99 > > [ 1.404338] __warn+0xd1/0xf0 > > [ 1.404342] warn_slowpath_fmt+0x4f/0x60 > > [ 1.404345] kobject_add_internal+0x273/0x330 > > [ 1.404347] kobject_add+0x65/0xb0 > > [ 1.404352] ? klist_init+0x31/0x40 > > [ 1.404356] device_add+0x102/0x5d0 > > [ 1.404360] ? kfree_const+0x22/0x30 > > [ 1.404364] device_create_groups_vargs+0xd8/0x100 > > [ 1.404367] device_create_with_groups+0x36/0x40 > > [ 1.404370] drm_connector_register(ffff88040c779000)::379 connector->registered: 0 cpu: 1 > > [ 1.404374] ? vprintk_default+0x29/0x50 > > [ 1.404375] ? printk+0x4d/0x4f > > [ 1.404378] drm_sysfs_connector_add+0x60/0xe0 > > [ 1.404381] drm_connector_register+0x4e/0x100 > > [ 1.404383] intel_dp_register_mst_connector+0x41/0x50 > > [ 1.404385] drm_dp_add_port+0x350/0x450 > > [ 1.404388] ? rcu_early_boot_tests+0x1/0x10 > > [ 1.404390] ? schedule_timeout+0x1cd/0x390 > > [ 1.404392] ? __might_sleep+0x4a/0x90 > > [ 1.404394] ? mutex_lock+0x25/0x50 > > [ 1.404395] ? drm_dp_mst_wait_tx_reply+0x118/0x1e0 > > [ 1.404397] ? prepare_to_wait_event+0x120/0x120 > > [ 1.404399] ? drm_dp_check_mstb_guid+0x3d/0x120 > > [ 1.404401] drm_dp_send_link_address+0x185/0x1f0 > > [ 1.404403] drm_dp_check_and_send_link_address+0xad/0xc0 > > [ 1.404405] drm_dp_mst_link_probe_work+0x57/0xa0 > > [ 1.404407] process_one_work+0x14b/0x430 > > [ 1.404408] worker_thread+0x12b/0x4a0 > > [ 1.404410] kthread+0x10c/0x140 > > [ 1.404411] ? process_one_work+0x430/0x430 > > [ 1.404413] ? kthread_create_on_node+0x40/0x40 > > [ 1.404415] ret_from_fork+0x27/0x40 > > [ 1.404418] ---[ end trace 952a97b7cba4984f ]--- > > [ 1.404420] drm_sysfs_connector_add() connector: ffff88040ed3e800 kdev: fffffffffffffffe > > [ 1.404422] [drm:drm_sysfs_connector_add] *ERROR* failed to register connector device: -2 > > [ 1.404453] drm_sysfs_connector_add() connector: ffff88040c779000 kdev: ffff88040c771c00 > > [ 1.404508] drm_connector_register(ffff88040c779000)::402 connector->registered: 1 cpu: 1 > > [ 1.404512] drm_connector_register(ffff88040ed3b800)::379 connector->registered: 0 cpu: 1 > > [ 1.404531] drm_sysfs_connector_add() connector: ffff88040ed3b800 kdev: ffff88040c772000 > > [ 1.404559] drm_connector_register(ffff88040ed3b800)::402 connector->registered: 1 cpu: 1 > > [ 1.404564] drm_connector_register(ffff88040ed3c000)::379 connector->registered: 0 cpu: 1 > > [ 1.404583] drm_sysfs_connector_add() connector: ffff88040ed3c000 kdev: ffff88040c772400 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On 01/10/2017 02:31 AM, Daniel Vetter wrote: > commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e > Author: Daniel Vetter <daniel.vetter@ffwll.ch> > Date: Sun Dec 18 14:35:45 2016 +0100 > > drm: prevent double-(un)registration for connectors > > Lack of that would perfectly explain that oops ... Otherwise still no idea > what's going wrong. No... That's not in mainline as far as I can see. Should I test with it applied?
On 01/10/2017 11:43 PM, Daniel Vetter wrote: > On Tue, Jan 10, 2017 at 08:52:47AM -0800, Dave Hansen wrote: >> On 01/10/2017 02:31 AM, Daniel Vetter wrote: >>> commit e73ab00e9a0f1731f34d0620a9c55f5c30c4ad4e >>> Author: Daniel Vetter <daniel.vetter@ffwll.ch> >>> Date: Sun Dec 18 14:35:45 2016 +0100 >>> >>> drm: prevent double-(un)registration for connectors >>> >>> Lack of that would perfectly explain that oops ... Otherwise still no idea >>> what's going wrong. >> No... That's not in mainline as far as I can see. Should I test with >> it applied? > Hm, I guess failed to cc: stable that one properly, iirc we decided the > race fix is too academic and can't be hit in reality ;-) > > Testing would be great. Probably conflicts because we extracted > drm_connector.c only recently, but running s/drm_connector\.c/drm_crtc.c/ > over the diff and then applying with some fudge should take care of that. It doesn't apply to mainline, with or without the substitution you suggest. Do you want that commit itself tested from -next?
On 01/11/2017 07:39 AM, Daniel Vetter wrote: > Hm, just cherry-picked it on top of Linus' latest 4.10 git, applies > cleanly there. The substituation was for 4.9. I can send you the patch > here, but seems all fine from what I can tell ... All of the printk's that I added were making it fail to apply. So, I took a 4.10-rc3 kernel with i915 compiled in (not as a module) and applied e73ab00e9a0f17, which I grabbed from linux-next. I'm seeing basically the same behavior that I did before applying e73ab00e9a0f17. sysfs_create_dir_ns() fails because of a NULL kobj->parent. Have you guys tried testing this yourselves? It seems really easy to reproduce if you just compile the driver in. > [ 1.400797] drm_dev_register(ffff88040c730000)::730 cpu: 2 > [ 1.400860] drm_connector_register(ffff88040c76b000)::382 connector->registered: 0 cpu: 1 > [ 1.400870] sysfs_create_dir_ns()::53 error: -2 > [ 1.400874] create_dir()::75 error: -2 cpu: 1 > [ 1.400878] ------------[ cut here ]------------ > [ 1.400884] WARNING: CPU: 1 PID: 91 at lib/kobject.c:249 kobject_add_internal+0x273/0x330 > [ 1.400888] kobject_add_internal failed for card0-DP-3 (error: -2 parent: card0) > [ 1.400892] Modules linked in: > [ 1.400896] CPU: 1 PID: 91 Comm: kworker/1:2 Not tainted 4.10.0-rc3-i915borked-dirty #67 > [ 1.400900] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016 > [ 1.400906] Workqueue: events_long drm_dp_mst_link_probe_work > [ 1.400909] Call Trace: > [ 1.400914] dump_stack+0x67/0x99 > [ 1.400918] __warn+0xd1/0xf0 > [ 1.400922] warn_slowpath_fmt+0x4f/0x60 > [ 1.400925] kobject_add_internal+0x273/0x330 > [ 1.400927] kobject_add+0x65/0xb0 > [ 1.400931] ? klist_init+0x31/0x40 > [ 1.400936] device_add+0x102/0x5d0 > [ 1.400940] ? kfree_const+0x22/0x30 > [ 1.400944] device_create_groups_vargs+0xd8/0x100 > [ 1.400947] device_create_with_groups+0x36/0x40 > [ 1.400952] ? vprintk_default+0x29/0x50 > [ 1.400957] ? __might_sleep+0x4a/0x90 > [ 1.400962] drm_sysfs_connector_add+0x60/0xe0 > [ 1.400967] drm_connector_register+0x74/0xd0 > [ 1.400971] intel_dp_register_mst_connector+0x41/0x50 > [ 1.400975] drm_dp_add_port+0x350/0x450 > [ 1.400977] drm_connector_register(ffff88040ee6f800)::382 connector->registered: 0 cpu: 2 > [ 1.400982] ? rcu_early_boot_tests+0x1/0x10 > [ 1.400986] ? schedule_timeout+0x1cd/0x390 > [ 1.400989] ? __might_sleep+0x4a/0x90 > [ 1.400992] ? mutex_lock+0x25/0x50 > [ 1.400995] ? drm_dp_mst_wait_tx_reply+0x118/0x1e0 > [ 1.400996] drm_sysfs_connector_add() connector: ffff88040ee6f800 kdev: ffff88040eef9c00 > [ 1.401002] ? prepare_to_wait_event+0x120/0x120 > [ 1.401005] ? drm_dp_check_mstb_guid+0x3d/0x120 > [ 1.401008] drm_dp_send_link_address+0x185/0x1f0 > [ 1.401012] drm_dp_check_and_send_link_address+0xad/0xc0 > [ 1.401015] drm_dp_mst_link_probe_work+0x57/0xa0 > [ 1.401018] process_one_work+0x14b/0x430 > [ 1.401021] worker_thread+0x12b/0x4a0 > [ 1.401025] kthread+0x10c/0x140 > [ 1.401027] ? process_one_work+0x430/0x430 > [ 1.401030] ? kthread_create_on_node+0x40/0x40 > [ 1.401034] ret_from_fork+0x27/0x40 > [ 1.401038] ---[ end trace ba43fc250fbf282d ]--- > [ 1.401041] drm_sysfs_connector_add() connector: ffff88040c76b000 kdev: fffffffffffffffe > [ 1.401043] drm_connector_register(ffff88040c768000)::382 connector->registered: 0 cpu: 2 > [ 1.401050] [drm:drm_sysfs_connector_add] *ERROR* failed to register connector device: -2 > [ 1.401057] drm_sysfs_connector_add() connector: ffff88040c768000 kdev: ffff88040eefa000 > [ 1.401093] drm_connector_register(ffff88040c768800)::382 connector->registered: 0 cpu: 2 > [ 1.401113] drm_sysfs_connector_add() connector: ffff88040c768800 kdev: ffff88040eefa400 > [ 1.401122] drm_connector_register(ffff88040c769000)::382 connector->registered: 0 cpu: 2 > [ 1.401140] drm_sysfs_connector_add() connector: ffff88040c769000 kdev: ffff88040eefa800 > [ 1.401167] drm_connector_register(ffff88040c769800)::382 connector->registered: 0 cpu: 2 > [ 1.401186] drm_sysfs_connector_add() connector: ffff88040c769800 kdev: ffff88040eefac00 > [ 1.401195] drm_connector_register(ffff88040c76b000)::382 connector->registered: 0 cpu: 2
On Wed, Jan 11, 2017 at 08:16:56AM -0800, Dave Hansen wrote: > On 01/11/2017 07:39 AM, Daniel Vetter wrote: > > Hm, just cherry-picked it on top of Linus' latest 4.10 git, applies > > cleanly there. The substituation was for 4.9. I can send you the patch > > here, but seems all fine from what I can tell ... > > All of the printk's that I added were making it fail to apply. > > So, I took a 4.10-rc3 kernel with i915 compiled in (not as a module) and > applied e73ab00e9a0f17, which I grabbed from linux-next. > > I'm seeing basically the same behavior that I did before applying > e73ab00e9a0f17. sysfs_create_dir_ns() fails because of a NULL kobj->parent. > > Have you guys tried testing this yourselves? It seems really easy to > reproduce if you just compile the driver in. With some screaming I can reproduce it. I'll try to take a look, but since I'm travelling to lca might need someone else to look into this too (I don't have an mst machine to carry around). And I still don't get why this blows up even. -Daniel > > > [ 1.400797] drm_dev_register(ffff88040c730000)::730 cpu: 2 > > [ 1.400860] drm_connector_register(ffff88040c76b000)::382 connector->registered: 0 cpu: 1 > > [ 1.400870] sysfs_create_dir_ns()::53 error: -2 > > [ 1.400874] create_dir()::75 error: -2 cpu: 1 > > [ 1.400878] ------------[ cut here ]------------ > > [ 1.400884] WARNING: CPU: 1 PID: 91 at lib/kobject.c:249 kobject_add_internal+0x273/0x330 > > [ 1.400888] kobject_add_internal failed for card0-DP-3 (error: -2 parent: card0) > > [ 1.400892] Modules linked in: > > [ 1.400896] CPU: 1 PID: 91 Comm: kworker/1:2 Not tainted 4.10.0-rc3-i915borked-dirty #67 > > [ 1.400900] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016 > > [ 1.400906] Workqueue: events_long drm_dp_mst_link_probe_work > > [ 1.400909] Call Trace: > > [ 1.400914] dump_stack+0x67/0x99 > > [ 1.400918] __warn+0xd1/0xf0 > > [ 1.400922] warn_slowpath_fmt+0x4f/0x60 > > [ 1.400925] kobject_add_internal+0x273/0x330 > > [ 1.400927] kobject_add+0x65/0xb0 > > [ 1.400931] ? klist_init+0x31/0x40 > > [ 1.400936] device_add+0x102/0x5d0 > > [ 1.400940] ? kfree_const+0x22/0x30 > > [ 1.400944] device_create_groups_vargs+0xd8/0x100 > > [ 1.400947] device_create_with_groups+0x36/0x40 > > [ 1.400952] ? vprintk_default+0x29/0x50 > > [ 1.400957] ? __might_sleep+0x4a/0x90 > > [ 1.400962] drm_sysfs_connector_add+0x60/0xe0 > > [ 1.400967] drm_connector_register+0x74/0xd0 > > [ 1.400971] intel_dp_register_mst_connector+0x41/0x50 > > [ 1.400975] drm_dp_add_port+0x350/0x450 > > [ 1.400977] drm_connector_register(ffff88040ee6f800)::382 connector->registered: 0 cpu: 2 > > [ 1.400982] ? rcu_early_boot_tests+0x1/0x10 > > [ 1.400986] ? schedule_timeout+0x1cd/0x390 > > [ 1.400989] ? __might_sleep+0x4a/0x90 > > [ 1.400992] ? mutex_lock+0x25/0x50 > > [ 1.400995] ? drm_dp_mst_wait_tx_reply+0x118/0x1e0 > > [ 1.400996] drm_sysfs_connector_add() connector: ffff88040ee6f800 kdev: ffff88040eef9c00 > > [ 1.401002] ? prepare_to_wait_event+0x120/0x120 > > [ 1.401005] ? drm_dp_check_mstb_guid+0x3d/0x120 > > [ 1.401008] drm_dp_send_link_address+0x185/0x1f0 > > [ 1.401012] drm_dp_check_and_send_link_address+0xad/0xc0 > > [ 1.401015] drm_dp_mst_link_probe_work+0x57/0xa0 > > [ 1.401018] process_one_work+0x14b/0x430 > > [ 1.401021] worker_thread+0x12b/0x4a0 > > [ 1.401025] kthread+0x10c/0x140 > > [ 1.401027] ? process_one_work+0x430/0x430 > > [ 1.401030] ? kthread_create_on_node+0x40/0x40 > > [ 1.401034] ret_from_fork+0x27/0x40 > > [ 1.401038] ---[ end trace ba43fc250fbf282d ]--- > > [ 1.401041] drm_sysfs_connector_add() connector: ffff88040c76b000 kdev: fffffffffffffffe > > [ 1.401043] drm_connector_register(ffff88040c768000)::382 connector->registered: 0 cpu: 2 > > [ 1.401050] [drm:drm_sysfs_connector_add] *ERROR* failed to register connector device: -2 > > [ 1.401057] drm_sysfs_connector_add() connector: ffff88040c768000 kdev: ffff88040eefa000 > > [ 1.401093] drm_connector_register(ffff88040c768800)::382 connector->registered: 0 cpu: 2 > > [ 1.401113] drm_sysfs_connector_add() connector: ffff88040c768800 kdev: ffff88040eefa400 > > [ 1.401122] drm_connector_register(ffff88040c769000)::382 connector->registered: 0 cpu: 2 > > [ 1.401140] drm_sysfs_connector_add() connector: ffff88040c769000 kdev: ffff88040eefa800 > > [ 1.401167] drm_connector_register(ffff88040c769800)::382 connector->registered: 0 cpu: 2 > > [ 1.401186] drm_sysfs_connector_add() connector: ffff88040c769800 kdev: ffff88040eefac00 > > [ 1.401195] drm_connector_register(ffff88040c76b000)::382 connector->registered: 0 cpu: 2 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index aa64448..85beebc 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -2914,13 +2914,14 @@ static void drm_dp_destroy_connector_work(struct work_struct *work) break; } list_del(&port->next); - mutex_unlock(&mgr->destroy_connector_lock); kref_init(&port->kref); INIT_LIST_HEAD(&port->next); mgr->cbs->destroy_connector(mgr, port->connector); + mutex_unlock(&mgr->destroy_connector_lock); + drm_dp_port_teardown_pdt(port, port->pdt); port->pdt = DP_PEER_DEVICE_NONE;