Message ID | d4d59d801f4521e562c9ecf2d8767077aaefb456.1628933864.git.leonro@nvidia.com
---|---
State | New
Series | Devlink cleanup for delay event series
On Sat, 14 Aug 2021 12:57:28 +0300 Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> The struct devlink itself is protected by an internal lock and doesn't
> need the global lock during operation. That global lock is used to
> protect addition/removal of new devlink instances to/from the global
> list in use by all devlink consumers in the system.
>
> The future conversion of the linked list to an xarray will allow us to
> actually delete that lock, but first we need to count all struct
> devlink users.

Not a problem with this set but to state the obvious the global devlink
lock also protects from concurrent execution of all the ops which don't
take the instance lock (DEVLINK_NL_FLAG_NO_LOCK). You most likely know
this but I thought I'd comment on an off chance it helps.

> The reference counting provides us a way to ensure that no new user
> space command succeeds in grabbing a devlink instance which is about
> to be destroyed, making it safe to access the instance without a lock.
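For context, a condensed sketch of the locking scheme Jakub refers to, abbreviated from the pre-patch devlink_nl_pre_doit() flow in net/core/devlink.c (error paths trimmed; not the full function):

/* Condensed sketch: the global devlink_mutex is held across every op,
 * and the per-instance devlink->lock is taken only for ops that do NOT
 * set DEVLINK_NL_FLAG_NO_LOCK -- flagged ops rely on devlink_mutex
 * alone for mutual exclusion.
 */
static int devlink_nl_pre_doit(const struct genl_ops *ops,
			       struct sk_buff *skb, struct genl_info *info)
{
	struct devlink *devlink;

	mutex_lock(&devlink_mutex);			/* global lock */
	devlink = devlink_get_from_info(info);
	if (IS_ERR(devlink)) {
		mutex_unlock(&devlink_mutex);
		return PTR_ERR(devlink);
	}
	if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
		mutex_lock(&devlink->lock);		/* instance lock, unless opted out */
	info->user_ptr[0] = devlink;
	return 0;
}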
On Mon, Aug 16, 2021 at 08:47:41AM -0700, Jakub Kicinski wrote:
> On Sat, 14 Aug 2021 12:57:28 +0300 Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > [...]
>
> Not a problem with this set but to state the obvious the global devlink
> lock also protects from concurrent execution of all the ops which don't
> take the instance lock (DEVLINK_NL_FLAG_NO_LOCK). You most likely know
> this but I thought I'd comment on an off chance it helps.

The end goal will be something like this:
1. Delete the global devlink lock.
2. Rely on xa_lock() while grabbing a devlink instance (past devlink_try_get).
3. Convert devlink->lock to a read/write lock to make sure that we can
   run get queries in parallel.
4. Open devlink netlink to parallel ops, ".parallel_ops = true".

Thanks

> > The reference counting provides us a way to ensure that no new user
> > space command succeeds in grabbing a devlink instance which is about
> > to be destroyed, making it safe to access the instance without a lock.
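A rough illustration of steps 1-2 above -- hypothetical, since the xarray conversion had not been posted at this point. The xarray and the lookup-by-index scheme are invented for the sketch; devlink_try_get() is the helper added by this very patch:

#include <linux/xarray.h>

/* Hypothetical sketch only: instances live in an xarray, and a lookup
 * pins the instance under xa_lock() via the refcount, instead of
 * relying on a global mutex for lifetime.
 */
static struct devlink *devlink_get_by_index(struct xarray *devlink_xa,
					    unsigned long index)
{
	struct devlink *devlink;

	xa_lock(devlink_xa);
	devlink = xa_load(devlink_xa, index);
	/* A racing unregister may already have dropped the last
	 * reference; never hand out an instance we failed to pin.
	 */
	if (devlink && !devlink_try_get(devlink))
		devlink = NULL;
	xa_unlock(devlink_xa);

	return devlink;
}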
On Mon, 16 Aug 2021 18:53:45 +0300 Leon Romanovsky wrote:
> [...]
> The end goal will be something like this:
> 1. Delete the global devlink lock.
> 2. Rely on xa_lock() while grabbing a devlink instance (past devlink_try_get).
> 3. Convert devlink->lock to a read/write lock to make sure that we can
>    run get queries in parallel.
> 4. Open devlink netlink to parallel ops, ".parallel_ops = true".

IIUC that'd mean setting eswitch mode would hold the write lock on
the dl instance. What locks does e.g. registering a dl port take
then?
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Monday, August 16, 2021 9:07 AM
> Subject: Re: [PATCH net-next 3/6] devlink: Count struct devlink consumers
>
> [...]
> IIUC that'd mean setting eswitch mode would hold the write lock on
> the dl instance. What locks does e.g. registering a dl port take
> then?

Also, I think we have some cases where we want to allow the driver to
allocate new devlink objects in response to adding a port, but still
want to block other global operations from running?
On Mon, Aug 16, 2021 at 09:07:00AM -0700, Jakub Kicinski wrote:
> [...]
> IIUC that'd mean setting eswitch mode would hold the write lock on
> the dl instance. What locks does e.g. registering a dl port take
> then?

The write lock, because we are adding a port to devlink->port_list:

int devlink_port_register(struct devlink *devlink,
			  struct devlink_port *devlink_port,
			  unsigned int port_index)
{
	...
	list_add_tail(&devlink_port->list, &devlink->port_list);
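Under the proposed step 3 that presumably becomes a writer-side critical section. A sketch under that assumption -- the rwsem field and its name are invented here, this is not the posted code:

/* Sketch only: assumes devlink->lock has been converted to an rwsem
 * (step 3 above); the "rwsem" field name is invented. Mutating
 * devlink->port_list needs the writer side, while read-only "get"
 * queries could take down_read() and run in parallel.
 */
int devlink_port_register(struct devlink *devlink,
			  struct devlink_port *devlink_port,
			  unsigned int port_index)
{
	down_write(&devlink->rwsem);
	if (devlink_port_index_exists(devlink, port_index)) {
		up_write(&devlink->rwsem);
		return -EEXIST;
	}
	devlink_port->devlink = devlink;
	devlink_port->index = port_index;
	list_add_tail(&devlink_port->list, &devlink->port_list);
	up_write(&devlink->rwsem);

	return 0;
}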
On Mon, Aug 16, 2021 at 09:32:17PM +0000, Keller, Jacob E wrote:
> [...]
> Also, I think we have some cases where we want to allow the driver to
> allocate new devlink objects in response to adding a port, but still
> want to block other global operations from running?

I don't see the flow where operations on devlink_A should block
devlink_B. Only in such flows would we need a global lock like we have
now - devlink_lock. In all other flows, the write lock of the devlink
instance will protect from parallel execution.

Thanks
> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Wednesday, August 18, 2021 1:12 AM
> Subject: Re: [PATCH net-next 3/6] devlink: Count struct devlink consumers
>
> [...]
> I don't see the flow where operations on devlink_A should block
> devlink_B. Only in such flows would we need a global lock like we have
> now - devlink_lock. In all other flows, the write lock of the devlink
> instance will protect from parallel execution.

But how do we handle what is essentially recursion?

If we add a port on devlink A:

  userspace sends PORT_ADD for devlink A
  the driver responds by creating a port
  adding a port causes the driver to add a region, or other devlink objects

In the current design, if I understand correctly, we hold the global
lock but *not* the instance lock. We can't hold the instance lock while
adding a port without breaking a bunch of drivers that add many devlink
objects in response to port creation... because they'll deadlock when
going to add the sub-objects.

But if we don't hold the global lock, then in theory another userspace
program could attempt to do something in between PORT_ADD starting and
finishing, which might not be desirable. (Remember, we had to drop the
instance lock, otherwise drivers get stuck when trying to add many
sub-objects.)

Thanks,
Jake
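To make the deadlock Jake describes concrete -- a self-contained sketch with invented function names; the crux is that devlink object-registration helpers (such as devlink_region_create in that era) take devlink->lock internally, and the mutex is not recursive:

#include <net/devlink.h>
#include <linux/mutex.h>

/* Illustration only (all "example_" names invented): an object-
 * registration helper grabs the instance lock...
 */
static void example_region_create(struct devlink *devlink)
{
	mutex_lock(&devlink->lock);	/* DEADLOCK: handler already holds it */
	/* ... add the region to devlink->region_list ... */
	mutex_unlock(&devlink->lock);
}

static void example_driver_port_create(struct devlink *devlink)
{
	/* ...but the driver calls it while registering sub-objects for
	 * the new port:
	 */
	example_region_create(devlink);
}

static int example_port_add_doit(struct devlink *devlink)
{
	mutex_lock(&devlink->lock);	/* hypothetical: handler takes instance lock */
	example_driver_port_create(devlink);	/* never returns past the inner lock */
	mutex_unlock(&devlink->lock);
	return 0;
}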
On Wed, Aug 18, 2021 at 05:50:11PM +0000, Keller, Jacob E wrote:
> [...]
> But how do we handle what is essentially recursion?

Let's wait till implementation, I promise it will be covered :).

> If we add a port on devlink A:
>
>   userspace sends PORT_ADD for devlink A
>   the driver responds by creating a port
>   adding a port causes the driver to add a region, or other devlink objects
>
> In the current design, if I understand correctly, we hold the global
> lock but *not* the instance lock. We can't hold the instance lock while
> adding a port without breaking a bunch of drivers that add many devlink
> objects in response to port creation... because they'll deadlock when
> going to add the sub-objects.
>
> But if we don't hold the global lock, then in theory another userspace
> program could attempt to do something in between PORT_ADD starting and
> finishing, which might not be desirable. (Remember, we had to drop the
> instance lock, otherwise drivers get stuck when trying to add many
> sub-objects.)

You just surfaced my main issue with the current devlink
implementation - the purpose of devlink_lock. Over the years devlink
code lost clear separation between user space flows and kernel flows.

Thanks
> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Friday, August 20, 2021 6:07 AM
> Subject: Re: [PATCH net-next 3/6] devlink: Count struct devlink consumers
>
> [...]
> > But how do we handle what is essentially recursion?
>
> Let's wait till implementation, I promise it will be covered :).

Sure. It's certainly easier to talk about a proposed implementation once
we have it.

> [...]
> You just surfaced my main issue with the current devlink
> implementation - the purpose of devlink_lock. Over the years devlink
> code lost clear separation between user space flows and kernel flows.

Yep. It's definitely complex.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 1151497c0ec5..4c60d61d92da 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -56,6 +56,8 @@ struct devlink {
 	 */
 	u8 reload_failed:1,
 	   reload_enabled:1;
+	refcount_t refcount;
+	struct completion comp;
 	char priv[0] __aligned(NETDEV_ALIGN);
 };
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index c8a8eecad1c5..76f459da6e05 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -108,10 +108,22 @@ struct net *devlink_net(const struct devlink *devlink)
 }
 EXPORT_SYMBOL_GPL(devlink_net);
 
+static void devlink_put(struct devlink *devlink)
+{
+	if (refcount_dec_and_test(&devlink->refcount))
+		complete(&devlink->comp);
+}
+
+static bool __must_check devlink_try_get(struct devlink *devlink)
+{
+	return refcount_inc_not_zero(&devlink->refcount);
+}
+
 static struct devlink *devlink_get_from_attrs(struct net *net,
 					      struct nlattr **attrs)
 {
 	struct devlink *devlink;
+	bool found = false;
 	char *busname;
 	char *devname;
 
@@ -126,16 +138,16 @@ static struct devlink *devlink_get_from_attrs(struct net *net,
 	list_for_each_entry(devlink, &devlink_list, list) {
 		if (strcmp(devlink->dev->bus->name, busname) == 0 &&
 		    strcmp(dev_name(devlink->dev), devname) == 0 &&
-		    net_eq(devlink_net(devlink), net))
-			return devlink;
+		    net_eq(devlink_net(devlink), net)) {
+			found = true;
+			break;
+		}
 	}
 
-	return ERR_PTR(-ENODEV);
-}
+	if (!found || !devlink_try_get(devlink))
+		devlink = ERR_PTR(-ENODEV);
 
-static struct devlink *devlink_get_from_info(struct genl_info *info)
-{
-	return devlink_get_from_attrs(genl_info_net(info), info->attrs);
+	return devlink;
 }
 
 static struct devlink_port *devlink_port_get_by_index(struct devlink *devlink,
@@ -486,7 +498,7 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
 	int err;
 
 	mutex_lock(&devlink_mutex);
-	devlink = devlink_get_from_info(info);
+	devlink = devlink_get_from_attrs(genl_info_net(info), info->attrs);
 	if (IS_ERR(devlink)) {
 		mutex_unlock(&devlink_mutex);
 		return PTR_ERR(devlink);
@@ -529,6 +541,7 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
 unlock:
 	if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
 		mutex_unlock(&devlink->lock);
+	devlink_put(devlink);
 	mutex_unlock(&devlink_mutex);
 	return err;
 }
@@ -541,6 +554,7 @@ static void devlink_nl_post_doit(const struct genl_ops *ops,
 	devlink = info->user_ptr[0];
 	if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
 		mutex_unlock(&devlink->lock);
+	devlink_put(devlink);
 	mutex_unlock(&devlink_mutex);
 }
 
@@ -1078,8 +1092,12 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(devlink_rate, &devlink->rate_list, list) {
 			enum devlink_command cmd = DEVLINK_CMD_RATE_NEW;
@@ -1094,11 +1112,14 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg,
 						   NLM_F_MULTI, NULL);
 			if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 			idx++;
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -1173,15 +1194,24 @@ static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk))) {
+			devlink_put(devlink);
+			continue;
+		}
+
 		if (idx < start) {
 			idx++;
+			devlink_put(devlink);
 			continue;
 		}
+
 		err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
 				      NETLINK_CB(cb->skb).portid,
 				      cb->nlh->nlmsg_seq, NLM_F_MULTI);
+		devlink_put(devlink);
 		if (err)
 			goto out;
 		idx++;
@@ -1226,8 +1256,12 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(devlink_port, &devlink->port_list, list) {
 			if (idx < start) {
@@ -1241,11 +1275,14 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
 						   NLM_F_MULTI, cb->extack);
 			if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 			idx++;
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -1884,8 +1921,12 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(devlink_sb, &devlink->sb_list, list) {
 			if (idx < start) {
@@ -1899,11 +1940,14 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg,
 						 NLM_F_MULTI);
 			if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 			idx++;
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -2028,9 +2072,13 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!devlink_try_get(devlink))
+			continue;
+
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
 		    !devlink->ops->sb_pool_get)
-			continue;
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(devlink_sb, &devlink->sb_list, list) {
 			err = __sb_pool_get_dumpit(msg, start, &idx, devlink,
@@ -2041,10 +2089,13 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg,
 				err = 0;
 			} else if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -2241,9 +2292,13 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!devlink_try_get(devlink))
+			continue;
+
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
 		    !devlink->ops->sb_port_pool_get)
-			continue;
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(devlink_sb, &devlink->sb_list, list) {
 			err = __sb_port_pool_get_dumpit(msg, start, &idx,
@@ -2254,10 +2309,13 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg,
 				err = 0;
 			} else if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -2482,9 +2540,12 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!devlink_try_get(devlink))
+			continue;
+
 		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
 		    !devlink->ops->sb_tc_pool_bind_get)
-			continue;
+			goto retry;
 
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(devlink_sb, &devlink->sb_list, list) {
@@ -2497,10 +2558,13 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg,
 				err = 0;
 			} else if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -4552,8 +4616,12 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(param_item, &devlink->param_list, list) {
 			if (idx < start) {
@@ -4569,11 +4637,14 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg,
 				err = 0;
 			} else if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 			idx++;
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -4820,8 +4891,12 @@ static int devlink_nl_cmd_port_param_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(devlink_port, &devlink->port_list, list) {
 			list_for_each_entry(param_item,
@@ -4841,12 +4916,15 @@ static int devlink_nl_cmd_port_param_get_dumpit(struct sk_buff *msg,
 					err = 0;
 				} else if (err) {
 					mutex_unlock(&devlink->lock);
+					devlink_put(devlink);
 					goto out;
 				}
 				idx++;
 			}
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -5385,14 +5463,20 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
 	struct devlink *devlink;
 	int start = cb->args[0];
 	int idx = 0;
-	int err;
+	int err = 0;
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
 		err = devlink_nl_cmd_region_get_devlink_dumpit(msg, cb, devlink,
 							       &idx, start);
+retry:
+		devlink_put(devlink);
 		if (err)
 			goto out;
 	}
@@ -5755,6 +5839,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	nla_nest_end(skb, chunks_attr);
 	genlmsg_end(skb, hdr);
 	mutex_unlock(&devlink->lock);
+	devlink_put(devlink);
 	mutex_unlock(&devlink_mutex);
 
 	return skb->len;
@@ -5763,6 +5848,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	genlmsg_cancel(skb, hdr);
 out_unlock:
 	mutex_unlock(&devlink->lock);
+	devlink_put(devlink);
 out_dev:
 	mutex_unlock(&devlink_mutex);
 	return err;
@@ -5914,17 +6000,14 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
-		if (idx < start) {
-			idx++;
-			continue;
-		}
-		if (!devlink->ops->info_get) {
-			idx++;
-			continue;
-		}
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
+		if (idx < start || !devlink->ops->info_get)
+			goto inc;
 
 		mutex_lock(&devlink->lock);
 		err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET,
@@ -5934,9 +6017,14 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg,
 		mutex_unlock(&devlink->lock);
 		if (err == -EOPNOTSUPP)
 			err = 0;
-		else if (err)
+		else if (err) {
+			devlink_put(devlink);
 			break;
+		}
+inc:
 		idx++;
+retry:
+		devlink_put(devlink);
 	}
 	mutex_unlock(&devlink_mutex);
@@ -7021,6 +7109,7 @@ devlink_health_reporter_get_from_cb(struct netlink_callback *cb)
 		goto unlock;
 
 	reporter = devlink_health_reporter_get_from_attrs(devlink, attrs);
+	devlink_put(devlink);
 	mutex_unlock(&devlink_mutex);
 	return reporter;
 unlock:
@@ -7092,8 +7181,12 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry_rep;
+
 		mutex_lock(&devlink->reporters_lock);
 		list_for_each_entry(reporter, &devlink->reporter_list,
 				    list) {
@@ -7107,16 +7200,23 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 						      NLM_F_MULTI);
 			if (err) {
 				mutex_unlock(&devlink->reporters_lock);
+				devlink_put(devlink);
 				goto out;
 			}
 			idx++;
 		}
 		mutex_unlock(&devlink->reporters_lock);
+retry_rep:
+		devlink_put(devlink);
 	}
 
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry_port;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(port, &devlink->port_list, list) {
 			mutex_lock(&port->reporters_lock);
@@ -7133,6 +7233,7 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 				if (err) {
 					mutex_unlock(&port->reporters_lock);
 					mutex_unlock(&devlink->lock);
+					devlink_put(devlink);
 					goto out;
 				}
 				idx++;
@@ -7140,6 +7241,8 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 			mutex_unlock(&port->reporters_lock);
 		}
 		mutex_unlock(&devlink->lock);
+retry_port:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -7673,8 +7776,12 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(trap_item, &devlink->trap_list, list) {
 			if (idx < start) {
@@ -7688,11 +7795,14 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg,
 						   NLM_F_MULTI);
 			if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 			idx++;
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -7892,8 +8002,12 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(group_item, &devlink->trap_group_list,
 				    list) {
@@ -7908,11 +8022,14 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg,
 							   NLM_F_MULTI);
 			if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 			idx++;
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -8198,8 +8315,12 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg,
 
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+		if (!devlink_try_get(devlink))
 			continue;
+
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			goto retry;
+
 		mutex_lock(&devlink->lock);
 		list_for_each_entry(policer_item, &devlink->trap_policer_list,
 				    list) {
@@ -8214,11 +8335,14 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg,
 							     NLM_F_MULTI);
 			if (err) {
 				mutex_unlock(&devlink->lock);
+				devlink_put(devlink);
 				goto out;
 			}
 			idx++;
 		}
 		mutex_unlock(&devlink->lock);
+retry:
+		devlink_put(devlink);
 	}
 out:
 	mutex_unlock(&devlink_mutex);
@@ -8801,6 +8925,9 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	INIT_LIST_HEAD(&devlink->trap_policer_list);
 	mutex_init(&devlink->lock);
 	mutex_init(&devlink->reporters_lock);
+	refcount_set(&devlink->refcount, 1);
+	init_completion(&devlink->comp);
+
 	return devlink;
 }
 EXPORT_SYMBOL_GPL(devlink_alloc_ns);
@@ -8827,6 +8954,9 @@ EXPORT_SYMBOL_GPL(devlink_register);
  */
 void devlink_unregister(struct devlink *devlink)
 {
+	devlink_put(devlink);
+	wait_for_completion(&devlink->comp);
+
 	mutex_lock(&devlink_mutex);
 	WARN_ON(devlink_reload_supported(devlink->ops) &&
 		devlink->reload_enabled);
@@ -11374,9 +11504,12 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net)
 	 */
 	mutex_lock(&devlink_mutex);
 	list_for_each_entry(devlink, &devlink_list, list) {
-		if (!net_eq(devlink_net(devlink), net))
+		if (!devlink_try_get(devlink))
 			continue;
 
+		if (!net_eq(devlink_net(devlink), net))
+			goto retry;
+
 		WARN_ON(!devlink_reload_supported(devlink->ops));
 		err = devlink_reload(devlink, &init_net,
 				     DEVLINK_RELOAD_ACTION_DRIVER_REINIT,
@@ -11384,6 +11517,8 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net)
 				     &actions_performed, NULL);
 		if (err && err != -EOPNOTSUPP)
 			pr_warn("Failed to reload devlink instance into init_net\n");
+retry:
+		devlink_put(devlink);
 	}
 	mutex_unlock(&devlink_mutex);
 }
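Distilled, the lifetime scheme the patch introduces works like this: the instance is born with refcount 1, every transient user pairs devlink_try_get() with devlink_put(), and unregister drops the initial reference and then blocks until in-flight users drain. A restatement using only the helpers the patch itself adds; the three wrapper function names are invented for illustration:

/* Allocation (devlink_alloc_ns in the patch) takes the initial
 * reference and prepares the completion:
 */
static void example_lifetime_init(struct devlink *devlink)
{
	refcount_set(&devlink->refcount, 1);
	init_completion(&devlink->comp);
}

/* Every transient user (pre_doit lookup, dumpit iteration) brackets
 * its access; try_get fails once teardown has dropped the last ref:
 */
static int example_use(struct devlink *devlink)
{
	if (!devlink_try_get(devlink))
		return -ENODEV;
	/* ... safely dereference the instance here ... */
	devlink_put(devlink);	/* the final put fires devlink->comp */
	return 0;
}

/* Teardown (devlink_unregister in the patch) drops the initial
 * reference so new try_get() calls fail, then waits for in-flight
 * users to drain:
 */
static void example_lifetime_teardown(struct devlink *devlink)
{
	devlink_put(devlink);
	wait_for_completion(&devlink->comp);
}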