Message ID | 1571221192-216909-1-git-send-email-john.garry@huawei.com |
---|---|
State | Accepted |
Commit | 130f4caf145c3562108b245a576db30b916199d2 |
Headers | show |
Series | libata: Ensure ata_port probe has completed before detach | expand |
On 10/16/19 4:19 AM, John Garry wrote: > With CONFIG_DEBUG_TEST_DRIVER_REMOVE set, we may find the following WARN: > > [ 23.452574] ------------[ cut here ]------------ > [ 23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 ata_host_detach+0x15c/0x168 > [ 23.466047] Modules linked in: > [ 23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00010-g5b83fd27752b-dirty #296 > [ 23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019 > [ 23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO) > [ 23.491065] pc : ata_host_detach+0x15c/0x168 > [ 23.495322] lr : ata_host_detach+0x88/0x168 > [ 23.499491] sp : ffff800011cabb50 > [ 23.502792] x29: ffff800011cabb50 x28: 0000000000000007 > [ 23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28 > [ 23.513390] x25: 0000000000003848 x24: ffff0023ea185300 > [ 23.518689] x23: 0000000000000001 x22: 00000000000014c0 > [ 23.523987] x21: 0000000000013740 x20: ffff0023bdc20000 > [ 23.529286] x19: 0000000000000000 x18: 0000000000000004 > [ 23.534584] x17: 0000000000000001 x16: 00000000000000f0 > [ 23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408 > [ 23.545181] x13: 0000000000000000 x12: ffff0023eac13790 > [ 23.550480] x11: ffff0023eb76c228 x10: 0000000000000000 > [ 23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000 > [ 23.561077] x7 : 0000000000000002 x6 : 0000000000000001 > [ 23.566376] x5 : 0000000000000002 x4 : 0000000000000000 > [ 23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000 > [ 23.576972] x1 : 3099674201f72700 x0 : 0000000000400284 > [ 23.582272] Call trace: > [ 23.584706] ata_host_detach+0x15c/0x168 > [ 23.588616] ata_pci_remove_one+0x10/0x18 > [ 23.592615] ahci_remove_one+0x20/0x40 > [ 23.596356] pci_device_remove+0x3c/0xe0 > [ 23.600267] really_probe+0xdc/0x3e0 > [ 23.603830] driver_probe_device+0x58/0x100 > [ 23.608000] device_driver_attach+0x6c/0x90 > [ 23.612169] __driver_attach+0x84/0xc8 > [ 23.615908] bus_for_each_dev+0x74/0xc8 > [ 23.619730] driver_attach+0x20/0x28 > [ 23.623292] bus_add_driver+0x148/0x1f0 > [ 23.627115] driver_register+0x60/0x110 > [ 23.630938] __pci_register_driver+0x40/0x48 > [ 23.635199] ahci_pci_driver_init+0x20/0x28 > [ 23.639372] do_one_initcall+0x5c/0x1b0 > [ 23.643199] kernel_init_freeable+0x1a4/0x24c > [ 23.647546] kernel_init+0x10/0x108 > [ 23.651023] ret_from_fork+0x10/0x18 > [ 23.654590] ---[ end trace 634a14b675b71c13 ]--- > > With KASAN also enabled, we may also get many use-after-free reports. > > The issue is that when CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, we may > attempt to detach the ata_port before it has been probed. > > This is because the ata_ports are async probed, meaning that there is no > guarantee that the ata_port has probed prior to detach. When the ata_port > does probe in this scenario, we get all sorts of issues as the detach may > have already happened. > > Fix by ensuring synchronisation with async_synchronize_full(). We could > alternatively use the cookie returned from the ata_port probe > async_schedule() call, but that means managing the cookie, so more > complicated. > > Signed-off-by: John Garry <john.garry@huawei.com> > --- > Note: This has only been boot tested and manual driver remove/add. > My system has no disk attached to the ahci host. > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index 28c492be0a57..74c9b3032d46 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -6708,6 +6708,9 @@ void ata_host_detach(struct ata_host *host) > { > int i; > > + /* Ensure ata_port probe has completed */ > + async_synchronize_full(); > + > for (i = 0; i < host->n_ports; i++) > ata_port_detach(host->ports[i]); > > Nice debugging, and the fix looks appropriate to me. I don't think there's any point in trying to individually synchronize cookies. I'll let this simmer on the list for a day or two to let other folks take a look at it, before queuing it up. -- Jens Axboe
On 16/10/2019 20:09, Jens Axboe wrote: > On 10/16/19 4:19 AM, John Garry wrote: >> With CONFIG_DEBUG_TEST_DRIVER_REMOVE set, we may find the following WARN: >> >> [ 23.452574] ------------[ cut here ]------------ >> [ 23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 ata_host_detach+0x15c/0x168 >> [ 23.466047] Modules linked in: >> [ 23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00010-g5b83fd27752b-dirty #296 >> [ 23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019 >> [ 23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO) >> [ 23.491065] pc : ata_host_detach+0x15c/0x168 >> [ 23.495322] lr : ata_host_detach+0x88/0x168 >> [ 23.499491] sp : ffff800011cabb50 >> [ 23.502792] x29: ffff800011cabb50 x28: 0000000000000007 >> [ 23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28 >> [ 23.513390] x25: 0000000000003848 x24: ffff0023ea185300 >> [ 23.518689] x23: 0000000000000001 x22: 00000000000014c0 >> [ 23.523987] x21: 0000000000013740 x20: ffff0023bdc20000 >> [ 23.529286] x19: 0000000000000000 x18: 0000000000000004 >> [ 23.534584] x17: 0000000000000001 x16: 00000000000000f0 >> [ 23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408 >> [ 23.545181] x13: 0000000000000000 x12: ffff0023eac13790 >> [ 23.550480] x11: ffff0023eb76c228 x10: 0000000000000000 >> [ 23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000 >> [ 23.561077] x7 : 0000000000000002 x6 : 0000000000000001 >> [ 23.566376] x5 : 0000000000000002 x4 : 0000000000000000 >> [ 23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000 >> [ 23.576972] x1 : 3099674201f72700 x0 : 0000000000400284 >> [ 23.582272] Call trace: >> [ 23.584706] ata_host_detach+0x15c/0x168 >> [ 23.588616] ata_pci_remove_one+0x10/0x18 >> [ 23.592615] ahci_remove_one+0x20/0x40 >> [ 23.596356] pci_device_remove+0x3c/0xe0 >> [ 23.600267] really_probe+0xdc/0x3e0 >> [ 23.603830] driver_probe_device+0x58/0x100 >> [ 23.608000] device_driver_attach+0x6c/0x90 >> [ 23.612169] __driver_attach+0x84/0xc8 >> [ 23.615908] bus_for_each_dev+0x74/0xc8 >> [ 23.619730] driver_attach+0x20/0x28 >> [ 23.623292] bus_add_driver+0x148/0x1f0 >> [ 23.627115] driver_register+0x60/0x110 >> [ 23.630938] __pci_register_driver+0x40/0x48 >> [ 23.635199] ahci_pci_driver_init+0x20/0x28 >> [ 23.639372] do_one_initcall+0x5c/0x1b0 >> [ 23.643199] kernel_init_freeable+0x1a4/0x24c >> [ 23.647546] kernel_init+0x10/0x108 >> [ 23.651023] ret_from_fork+0x10/0x18 >> [ 23.654590] ---[ end trace 634a14b675b71c13 ]--- >> >> With KASAN also enabled, we may also get many use-after-free reports. >> >> The issue is that when CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, we may >> attempt to detach the ata_port before it has been probed. >> >> This is because the ata_ports are async probed, meaning that there is no >> guarantee that the ata_port has probed prior to detach. When the ata_port >> does probe in this scenario, we get all sorts of issues as the detach may >> have already happened. >> >> Fix by ensuring synchronisation with async_synchronize_full(). We could >> alternatively use the cookie returned from the ata_port probe >> async_schedule() call, but that means managing the cookie, so more >> complicated. >> >> Signed-off-by: John Garry <john.garry@huawei.com> >> --- >> Note: This has only been boot tested and manual driver remove/add. >> My system has no disk attached to the ahci host. >> >> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c >> index 28c492be0a57..74c9b3032d46 100644 >> --- a/drivers/ata/libata-core.c >> +++ b/drivers/ata/libata-core.c >> @@ -6708,6 +6708,9 @@ void ata_host_detach(struct ata_host *host) >> { >> int i; >> >> + /* Ensure ata_port probe has completed */ >> + async_synchronize_full(); >> + >> for (i = 0; i < host->n_ports; i++) >> ata_port_detach(host->ports[i]); >> >> > > Nice debugging, and the fix looks appropriate to me. I don't think > there's any point in trying to individually synchronize cookies. > I'll let this simmer on the list for a day or two to let other folks > take a look at it, before queuing it up. > Hi Jens, FWIW, I did also now test this on qemu with an emulated disk and it was ok. Anyway, I don't mind if prefer to queue this early for 5.6 so it can sit on next for longer. Cheers, John
On 10/31/19 12:35 PM, John Garry wrote: > On 16/10/2019 20:09, Jens Axboe wrote: >> On 10/16/19 4:19 AM, John Garry wrote: >>> With CONFIG_DEBUG_TEST_DRIVER_REMOVE set, we may find the following WARN: >>> >>> [ 23.452574] ------------[ cut here ]------------ >>> [ 23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 ata_host_detach+0x15c/0x168 >>> [ 23.466047] Modules linked in: >>> [ 23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00010-g5b83fd27752b-dirty #296 >>> [ 23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019 >>> [ 23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO) >>> [ 23.491065] pc : ata_host_detach+0x15c/0x168 >>> [ 23.495322] lr : ata_host_detach+0x88/0x168 >>> [ 23.499491] sp : ffff800011cabb50 >>> [ 23.502792] x29: ffff800011cabb50 x28: 0000000000000007 >>> [ 23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28 >>> [ 23.513390] x25: 0000000000003848 x24: ffff0023ea185300 >>> [ 23.518689] x23: 0000000000000001 x22: 00000000000014c0 >>> [ 23.523987] x21: 0000000000013740 x20: ffff0023bdc20000 >>> [ 23.529286] x19: 0000000000000000 x18: 0000000000000004 >>> [ 23.534584] x17: 0000000000000001 x16: 00000000000000f0 >>> [ 23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408 >>> [ 23.545181] x13: 0000000000000000 x12: ffff0023eac13790 >>> [ 23.550480] x11: ffff0023eb76c228 x10: 0000000000000000 >>> [ 23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000 >>> [ 23.561077] x7 : 0000000000000002 x6 : 0000000000000001 >>> [ 23.566376] x5 : 0000000000000002 x4 : 0000000000000000 >>> [ 23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000 >>> [ 23.576972] x1 : 3099674201f72700 x0 : 0000000000400284 >>> [ 23.582272] Call trace: >>> [ 23.584706] ata_host_detach+0x15c/0x168 >>> [ 23.588616] ata_pci_remove_one+0x10/0x18 >>> [ 23.592615] ahci_remove_one+0x20/0x40 >>> [ 23.596356] pci_device_remove+0x3c/0xe0 >>> [ 23.600267] really_probe+0xdc/0x3e0 >>> [ 23.603830] driver_probe_device+0x58/0x100 >>> [ 23.608000] device_driver_attach+0x6c/0x90 >>> [ 23.612169] __driver_attach+0x84/0xc8 >>> [ 23.615908] bus_for_each_dev+0x74/0xc8 >>> [ 23.619730] driver_attach+0x20/0x28 >>> [ 23.623292] bus_add_driver+0x148/0x1f0 >>> [ 23.627115] driver_register+0x60/0x110 >>> [ 23.630938] __pci_register_driver+0x40/0x48 >>> [ 23.635199] ahci_pci_driver_init+0x20/0x28 >>> [ 23.639372] do_one_initcall+0x5c/0x1b0 >>> [ 23.643199] kernel_init_freeable+0x1a4/0x24c >>> [ 23.647546] kernel_init+0x10/0x108 >>> [ 23.651023] ret_from_fork+0x10/0x18 >>> [ 23.654590] ---[ end trace 634a14b675b71c13 ]--- >>> >>> With KASAN also enabled, we may also get many use-after-free reports. >>> >>> The issue is that when CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, we may >>> attempt to detach the ata_port before it has been probed. >>> >>> This is because the ata_ports are async probed, meaning that there is no >>> guarantee that the ata_port has probed prior to detach. When the ata_port >>> does probe in this scenario, we get all sorts of issues as the detach may >>> have already happened. >>> >>> Fix by ensuring synchronisation with async_synchronize_full(). We could >>> alternatively use the cookie returned from the ata_port probe >>> async_schedule() call, but that means managing the cookie, so more >>> complicated. >>> >>> Signed-off-by: John Garry <john.garry@huawei.com> >>> --- >>> Note: This has only been boot tested and manual driver remove/add. >>> My system has no disk attached to the ahci host. >>> >>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c >>> index 28c492be0a57..74c9b3032d46 100644 >>> --- a/drivers/ata/libata-core.c >>> +++ b/drivers/ata/libata-core.c >>> @@ -6708,6 +6708,9 @@ void ata_host_detach(struct ata_host *host) >>> { >>> int i; >>> >>> + /* Ensure ata_port probe has completed */ >>> + async_synchronize_full(); >>> + >>> for (i = 0; i < host->n_ports; i++) >>> ata_port_detach(host->ports[i]); >>> >>> >> >> Nice debugging, and the fix looks appropriate to me. I don't think >> there's any point in trying to individually synchronize cookies. >> I'll let this simmer on the list for a day or two to let other folks >> take a look at it, before queuing it up. >> > > Hi Jens, > > FWIW, I did also now test this on qemu with an emulated disk and it was ok. > > Anyway, I don't mind if prefer to queue this early for 5.6 so it can sit > on next for longer. I've queued it up for 5.5, no point waiting one extra release :-) -- Jens Axboe
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 28c492be0a57..74c9b3032d46 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -6708,6 +6708,9 @@ void ata_host_detach(struct ata_host *host) { int i; + /* Ensure ata_port probe has completed */ + async_synchronize_full(); + for (i = 0; i < host->n_ports; i++) ata_port_detach(host->ports[i]);
With CONFIG_DEBUG_TEST_DRIVER_REMOVE set, we may find the following WARN: [ 23.452574] ------------[ cut here ]------------ [ 23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 ata_host_detach+0x15c/0x168 [ 23.466047] Modules linked in: [ 23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00010-g5b83fd27752b-dirty #296 [ 23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019 [ 23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO) [ 23.491065] pc : ata_host_detach+0x15c/0x168 [ 23.495322] lr : ata_host_detach+0x88/0x168 [ 23.499491] sp : ffff800011cabb50 [ 23.502792] x29: ffff800011cabb50 x28: 0000000000000007 [ 23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28 [ 23.513390] x25: 0000000000003848 x24: ffff0023ea185300 [ 23.518689] x23: 0000000000000001 x22: 00000000000014c0 [ 23.523987] x21: 0000000000013740 x20: ffff0023bdc20000 [ 23.529286] x19: 0000000000000000 x18: 0000000000000004 [ 23.534584] x17: 0000000000000001 x16: 00000000000000f0 [ 23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408 [ 23.545181] x13: 0000000000000000 x12: ffff0023eac13790 [ 23.550480] x11: ffff0023eb76c228 x10: 0000000000000000 [ 23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000 [ 23.561077] x7 : 0000000000000002 x6 : 0000000000000001 [ 23.566376] x5 : 0000000000000002 x4 : 0000000000000000 [ 23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000 [ 23.576972] x1 : 3099674201f72700 x0 : 0000000000400284 [ 23.582272] Call trace: [ 23.584706] ata_host_detach+0x15c/0x168 [ 23.588616] ata_pci_remove_one+0x10/0x18 [ 23.592615] ahci_remove_one+0x20/0x40 [ 23.596356] pci_device_remove+0x3c/0xe0 [ 23.600267] really_probe+0xdc/0x3e0 [ 23.603830] driver_probe_device+0x58/0x100 [ 23.608000] device_driver_attach+0x6c/0x90 [ 23.612169] __driver_attach+0x84/0xc8 [ 23.615908] bus_for_each_dev+0x74/0xc8 [ 23.619730] driver_attach+0x20/0x28 [ 23.623292] bus_add_driver+0x148/0x1f0 [ 23.627115] driver_register+0x60/0x110 [ 23.630938] __pci_register_driver+0x40/0x48 [ 23.635199] ahci_pci_driver_init+0x20/0x28 [ 23.639372] do_one_initcall+0x5c/0x1b0 [ 23.643199] kernel_init_freeable+0x1a4/0x24c [ 23.647546] kernel_init+0x10/0x108 [ 23.651023] ret_from_fork+0x10/0x18 [ 23.654590] ---[ end trace 634a14b675b71c13 ]--- With KASAN also enabled, we may also get many use-after-free reports. The issue is that when CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, we may attempt to detach the ata_port before it has been probed. This is because the ata_ports are async probed, meaning that there is no guarantee that the ata_port has probed prior to detach. When the ata_port does probe in this scenario, we get all sorts of issues as the detach may have already happened. Fix by ensuring synchronisation with async_synchronize_full(). We could alternatively use the cookie returned from the ata_port probe async_schedule() call, but that means managing the cookie, so more complicated. Signed-off-by: John Garry <john.garry@huawei.com> --- Note: This has only been boot tested and manual driver remove/add. My system has no disk attached to the ahci host. -- 2.17.1