Message ID: 20210427085246.13728-1-mwilck@suse.com
State: New
Series: [v3] nvme: rdma/tcp: fix list corruption with anatt timer
On 4/27/21 9:54 PM, Martin Wilck wrote:
> On Tue, 2021-04-27 at 20:05 +0200, Hannes Reinecke wrote:
>> On 4/27/21 6:25 PM, Christoph Hellwig wrote:
>>> On Tue, Apr 27, 2021 at 11:33:04AM +0200, Hannes Reinecke wrote:
>>>> As indicated in my previous mail, please change the description.
>>>> We have since established an actual reason (duplicate calls to
>>>> add_timer()), so please list it here.
>>>
>>> So what happens if the offending add_timer is changed to mod_timer?
>>>
>> I guess that should be fine, as the boilerplate said it can act
>> as a safe version of add_timer.
>>
>> But that would just solve the crash upon add_timer().
>
> The code doesn't use add_timer(), only mod_timer() and
> del_timer_sync(). And we didn't observe a crash upon add_timer(). What
> we observed was that a timer had been enqueued multiple times, and the
> kernel crashes in expire_timers()->detach_timer(), when it encounters
> an already detached entry in the timer list.
>
nvme_mpath_init() doesn't use add_timer(), but it uses timer_setup(), and
calling that on an already pending timer is even worse :-)

And my point is that the anatt timer is not stopped at the end of
nvme_init_identify() if any of the calls to

  nvme_configure_apst()
  nvme_configure_timestamp()
  nvme_configure_directives()
  nvme_configure_acre()

returns with an error. If one of them does, the controller is reset, causing
e.g. nvme_tcp_configure_admin_queue() to be called, which will call
timer_setup() with the original timer still running.
If the (original) timer triggers _after_ that point, we have the crash.

Cheers,

Hannes
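For illustration, the ordering described above boils down to the following
minimal sketch. This is not the actual nvme code; all demo_* names are
hypothetical, and only the generic kernel timer API calls are real:

#include <linux/timer.h>
#include <linux/jiffies.h>

struct demo_ctrl {
	struct timer_list anatt_timer;	/* stands in for ctrl->anatt_timer */
};

static void demo_anatt_timeout(struct timer_list *t)
{
	struct demo_ctrl *ctrl = from_timer(ctrl, t, anatt_timer);

	/* handle the ANA transition timeout ... */
	(void)ctrl;
}

/* First controller initialization: set up and arm the timer. */
static void demo_first_init(struct demo_ctrl *ctrl)
{
	timer_setup(&ctrl->anatt_timer, demo_anatt_timeout, 0);
	mod_timer(&ctrl->anatt_timer, jiffies + 10 * HZ);	/* timer now pending */
}

/* The reset path re-runs initialization while the timer may still be pending. */
static void demo_reinit_after_reset(struct demo_ctrl *ctrl)
{
	/*
	 * BUG: timer_setup() rewrites the timer fields without detaching it
	 * from the base it is queued on; the following mod_timer() can then
	 * leave it linked twice, corrupting the timer list when it expires.
	 */
	timer_setup(&ctrl->anatt_timer, demo_anatt_timeout, 0);
	mod_timer(&ctrl->anatt_timer, jiffies + 10 * HZ);
}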
On Wed, 2021-04-28 at 08:35 +0200, Hannes Reinecke wrote:
> On 4/27/21 9:54 PM, Martin Wilck wrote:
> > On Tue, 2021-04-27 at 20:05 +0200, Hannes Reinecke wrote:
> > > On 4/27/21 6:25 PM, Christoph Hellwig wrote:
> > > > On Tue, Apr 27, 2021 at 11:33:04AM +0200, Hannes Reinecke wrote:
> > > > > As indicated in my previous mail, please change the description.
> > > > > We have since established an actual reason (duplicate calls to
> > > > > add_timer()), so please list it here.
> > > >
> > > > So what happens if the offending add_timer is changed to mod_timer?
> > > >
> > > I guess that should be fine, as the boilerplate said it can act
> > > as a safe version of add_timer.
> > >
> > > But that would just solve the crash upon add_timer().
> >
> > The code doesn't use add_timer(), only mod_timer() and
> > del_timer_sync(). And we didn't observe a crash upon add_timer(). What
> > we observed was that a timer had been enqueued multiple times, and the
> > kernel crashes in expire_timers()->detach_timer(), when it encounters
> > an already detached entry in the timer list.
> >
> nvme_mpath_init() doesn't use add_timer(), but it uses timer_setup().

Yes. The notion that this is wrong was the idea behind my patch.

> And calling that on an already pending timer is even worse :-)
>
> And my point is that the anatt timer is not stopped at the end of
> nvme_init_identify() if any of the calls to
>
>   nvme_configure_apst()
>   nvme_configure_timestamp()
>   nvme_configure_directives()
>   nvme_configure_acre()
>
> returns with an error. If one of them does, the controller is reset,
> causing e.g. nvme_tcp_configure_admin_queue() to be called, which will
> call timer_setup() with the original timer still running.
> If the (original) timer triggers _after_ that point, we have the crash.

You are right. But as far as I can see, the problem doesn't have to
originate in these four function calls. The same applies even after
nvme_init_identify() has returned: any error that triggers the error
recovery work after the anatt timer has started would have the same effect.

Regards,
Martin
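Extending the hypothetical demo_* sketch above, the wider window amounts to
this (the real transports queue a dedicated error-recovery work item; all
details elided):

/* Queued for *any* transport error once the controller is live. */
static void demo_error_recovery_work(struct demo_ctrl *ctrl)
{
	/* tear down I/O and admin queues, fail in-flight requests ... */

	/*
	 * The reconnect path eventually re-runs controller initialization,
	 * which calls timer_setup() on ctrl->anatt_timer.  If the timer was
	 * armed before the error hit, it may still be pending at this point,
	 * so the window is open whenever error recovery runs, not only when
	 * one of the nvme_configure_*() calls in nvme_init_identify() fails.
	 */
	timer_setup(&ctrl->anatt_timer, demo_anatt_timeout, 0);
}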
On Wed, 28 Apr 2021 at 9:06, Martin Wilck <mwilck@suse.com> wrote:
>
> Yes, that's what I think has happened. timer_setup() doesn't clear any
> pointers in the list of pending timers pointing to this entry. If the
> newly-initialized timer is then added with mod_timer(), it becomes
> linked into a second timer list. When the first one expires, the timer
> will be detached, but only from one of the lists it's pending in. In a
> scenario like the one we faced, this could actually happen multiple
> times. If the detached timer remains linked into a timer list, once
> that list is traversed, the kernel dereferences a pointer with value
> LIST_POISON2, and crashes.

Yes, I think it makes sense. timer_setup() modifies the timer's base in
the "flags" field, so a subsequent mod_timer() could add the timer to the
wrong base structure.

Maurizio
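The defensive pattern that avoids the wrong-base/double-link problem
described above is to stop the timer before it can ever be re-initialized,
which is essentially what the patch below does by calling nvme_mpath_stop()
from the transport error-recovery paths. A generic sketch, with a
hypothetical 10 s timeout purely for illustration:

#include <linux/timer.h>
#include <linux/jiffies.h>

static void demo_safe_rearm(struct timer_list *timer,
			    void (*fn)(struct timer_list *))
{
	/*
	 * del_timer_sync() detaches the timer from whatever base it is
	 * currently queued on and waits for a running callback to finish.
	 * Only after that is it safe to re-initialize and re-arm it.
	 */
	del_timer_sync(timer);
	timer_setup(timer, fn, 0);
	mod_timer(timer, jiffies + 10 * HZ);
}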
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index a1d476e1ac02..c63dd5dfa7ff 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -586,6 +586,7 @@ void nvme_mpath_stop(struct nvme_ctrl *ctrl)
 	del_timer_sync(&ctrl->anatt_timer);
 	cancel_work_sync(&ctrl->ana_work);
 }
+EXPORT_SYMBOL_GPL(nvme_mpath_stop);
 
 #define SUBSYS_ATTR_RW(_name, _mode, _show, _store)	\
 	struct device_attribute subsys_attr_##_name =	\
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index be905d4fdb47..fc07a7b0dc1d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1202,6 +1202,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 		return;
 	}
 
+	nvme_mpath_stop(&ctrl->ctrl);
 	nvme_rdma_reconnect_or_remove(ctrl);
 }
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a0f00cb8f9f3..46287b4f4d10 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2068,6 +2068,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
 		return;
 	}
 
+	nvme_mpath_stop(ctrl);
 	nvme_tcp_reconnect_or_remove(ctrl);
 }
 