Message ID | 20210210175423.1873-1-mike.ximing.chen@intel.com |
---|---|
Series | dlb: introduce DLB device driver |
> -----Original Message-----
> From: Mike Ximing Chen <mike.ximing.chen@intel.com>
> Sent: Wednesday, February 10, 2021 12:54 PM
> To: netdev@vger.kernel.org
> Cc: davem@davemloft.net; kuba@kernel.org; arnd@arndb.de;
> gregkh@linuxfoundation.org; Williams, Dan J <dan.j.williams@intel.com>;
> pierre-louis.bossart@linux.intel.com; Gage Eads <gage.eads@intel.com>
> Subject: [PATCH v10 01/20] dlb: add skeleton for DLB driver
>
> diff --git a/Documentation/misc-devices/dlb.rst b/Documentation/misc-devices/dlb.rst
> new file mode 100644
> index 000000000000..aa79be07ee49
> --- /dev/null
> +++ b/Documentation/misc-devices/dlb.rst
> @@ -0,0 +1,259 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +===========================================
> +Intel(R) Dynamic Load Balancer Overview
> +===========================================
> +
> +:Authors: Gage Eads and Mike Ximing Chen
> +
> +Contents
> +========
> +
> +- Introduction
> +- Scheduling
> +- Queue Entry
> +- Port
> +- Queue
> +- Credits
> +- Scheduling Domain
> +- Interrupts
> +- Power Management
> +- User Interface
> +- Reset
> +
> +Introduction
> +============
> +
> +The Intel(r) Dynamic Load Balancer (Intel(r) DLB) is a PCIe device that
> +provides load-balanced, prioritized scheduling of core-to-core communication.
> +
> +Intel DLB is an accelerator for the event-driven programming model of
> +DPDK's Event Device Library[2]. The library is used in packet processing
> +pipelines that arrange for multi-core scalability, dynamic load-balancing, and
> +variety of packet distribution and synchronization schemes.
> +
> +Intel DLB device consists of queues and arbiters that connect producer
> +cores and consumer cores. The device implements load-balanced queueing features
> +including:
> +- Lock-free multi-producer/multi-consumer operation.
> +- Multiple priority levels for varying traffic types.
> +- 'Direct' traffic (i.e. multi-producer/single-consumer)
> +- Simple unordered load-balanced distribution.
> +- Atomic lock free load balancing across multiple consumers.
> +- Queue element reordering feature allowing ordered load-balanced distribution.
> +

Hi Jakub/Dave,

This is a device driver for a HW core-to-core communication accelerator. It is submitted
to "linux-kernel" for a module under device/misc. Greg suggested (see below) that we
also send it to you for any potential feedback in case there is any interaction with
networking initiatives. The device is used to handle the load balancing among CPU cores
after the packets are received and forwarded to the CPU. We don't think it interferes
with networking operations, but would appreciate very much your review/comment on this.

Thanks for your help.
Mike

> As this is a networking related thing, I would like you to get the
> proper reviews/acks from the networking maintainers before I can take
> this.
>
> Or, if they think it has nothing to do with networking, that's fine too,
> but please do not try to route around them.
>
> thanks,
>
> greg k-h
On Thu, Feb 18, 2021 at 07:34:31AM +0000, Chen, Mike Ximing wrote: > > > > -----Original Message----- > > From: Mike Ximing Chen <mike.ximing.chen@intel.com> > > Sent: Wednesday, February 10, 2021 12:54 PM > > To: netdev@vger.kernel.org > > Cc: davem@davemloft.net; kuba@kernel.org; arnd@arndb.de; > > gregkh@linuxfoundation.org; Williams, Dan J <dan.j.williams@intel.com>; pierre- > > louis.bossart@linux.intel.com; Gage Eads <gage.eads@intel.com> > > Subject: [PATCH v10 01/20] dlb: add skeleton for DLB driver > > > > diff --git a/Documentation/misc-devices/dlb.rst b/Documentation/misc- > > devices/dlb.rst > > new file mode 100644 > > index 000000000000..aa79be07ee49 > > --- /dev/null > > +++ b/Documentation/misc-devices/dlb.rst > > @@ -0,0 +1,259 @@ > > +.. SPDX-License-Identifier: GPL-2.0-only > > + > > +=========================================== > > +Intel(R) Dynamic Load Balancer Overview > > +=========================================== > > + > > +:Authors: Gage Eads and Mike Ximing Chen > > + > > +Contents > > +======== > > + > > +- Introduction > > +- Scheduling > > +- Queue Entry > > +- Port > > +- Queue > > +- Credits > > +- Scheduling Domain > > +- Interrupts > > +- Power Management > > +- User Interface > > +- Reset > > + > > +Introduction > > +============ > > + > > +The Intel(r) Dynamic Load Balancer (Intel(r) DLB) is a PCIe device that > > +provides load-balanced, prioritized scheduling of core-to-core communication. > > + > > +Intel DLB is an accelerator for the event-driven programming model of > > +DPDK's Event Device Library[2]. The library is used in packet processing > > +pipelines that arrange for multi-core scalability, dynamic load-balancing, and > > +variety of packet distribution and synchronization schemes. > > + > > +Intel DLB device consists of queues and arbiters that connect producer > > +cores and consumer cores. The device implements load-balanced queueing > > features > > +including: > > +- Lock-free multi-producer/multi-consumer operation. > > +- Multiple priority levels for varying traffic types. > > +- 'Direct' traffic (i.e. multi-producer/single-consumer) > > +- Simple unordered load-balanced distribution. > > +- Atomic lock free load balancing across multiple consumers. > > +- Queue element reordering feature allowing ordered load-balanced distribution. > > + > > Hi Jakub/Dave, > This is a device driver for a HW core-to-core communication accelerator. It is submitted > to "linux-kernel" for a module under device/misc. Greg suggested (see below) that we > also sent it to you for any potential feedback in case there is any interaction with > networking initiatives. The device is used to handle the load balancing among CPU cores > after the packets are received and forwarded to CPU. We don't think it interferes > with networking operations, but would appreciate very much your review/comment on this. It's the middle of the merge window, getting maintainers to review new stuff until after 5.12-rc1 is out is going to be a very difficult thing to do. In the meantime, why don't you all help out and review submitted patches to the mailing lists for the subsystems you all are trying to get this patch into. I know maintainers would appreciate the help, right? thanks, greg k-h
> -----Original Message----- > From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org> > Sent: Thursday, February 18, 2021 2:53 AM > To: Chen, Mike Ximing <mike.ximing.chen@intel.com> > Cc: netdev@vger.kernel.org; Linux Kernel Mailing List <linux- > kernel@vger.kernel.org>; davem@davemloft.net; kuba@kernel.org; arnd@arndb.de; > Williams, Dan J <dan.j.williams@intel.com>; pierre-louis.bossart@linux.intel.com > Subject: Re: [PATCH v10 01/20] dlb: add skeleton for DLB driver > > On Thu, Feb 18, 2021 at 07:34:31AM +0000, Chen, Mike Ximing wrote: > > > > > > > -----Original Message----- > > > From: Mike Ximing Chen <mike.ximing.chen@intel.com> > > > Sent: Wednesday, February 10, 2021 12:54 PM > > > To: netdev@vger.kernel.org > > > Cc: davem@davemloft.net; kuba@kernel.org; arnd@arndb.de; > > > gregkh@linuxfoundation.org; Williams, Dan J <dan.j.williams@intel.com>; > pierre- > > > louis.bossart@linux.intel.com; Gage Eads <gage.eads@intel.com> > > > Subject: [PATCH v10 01/20] dlb: add skeleton for DLB driver > > > > > > diff --git a/Documentation/misc-devices/dlb.rst b/Documentation/misc- > > > devices/dlb.rst > > > new file mode 100644 > > > index 000000000000..aa79be07ee49 > > > --- /dev/null > > > +++ b/Documentation/misc-devices/dlb.rst > > > @@ -0,0 +1,259 @@ > > > +.. SPDX-License-Identifier: GPL-2.0-only > > > + > > > +=========================================== > > > +Intel(R) Dynamic Load Balancer Overview > > > +=========================================== > > > + > > > +:Authors: Gage Eads and Mike Ximing Chen > > > + > > > +Contents > > > +======== > > > + > > > +- Introduction > > > +- Scheduling > > > +- Queue Entry > > > +- Port > > > +- Queue > > > +- Credits > > > +- Scheduling Domain > > > +- Interrupts > > > +- Power Management > > > +- User Interface > > > +- Reset > > > + > > > +Introduction > > > +============ > > > + > > > +The Intel(r) Dynamic Load Balancer (Intel(r) DLB) is a PCIe device that > > > +provides load-balanced, prioritized scheduling of core-to-core communication. > > > + > > > +Intel DLB is an accelerator for the event-driven programming model of > > > +DPDK's Event Device Library[2]. The library is used in packet processing > > > +pipelines that arrange for multi-core scalability, dynamic load-balancing, and > > > +variety of packet distribution and synchronization schemes. > > > + > > > +Intel DLB device consists of queues and arbiters that connect producer > > > +cores and consumer cores. The device implements load-balanced queueing > > > features > > > +including: > > > +- Lock-free multi-producer/multi-consumer operation. > > > +- Multiple priority levels for varying traffic types. > > > +- 'Direct' traffic (i.e. multi-producer/single-consumer) > > > +- Simple unordered load-balanced distribution. > > > +- Atomic lock free load balancing across multiple consumers. > > > +- Queue element reordering feature allowing ordered load-balanced distribution. > > > + > > > > Hi Jakub/Dave, > > This is a device driver for a HW core-to-core communication accelerator. It is > submitted > > to "linux-kernel" for a module under device/misc. Greg suggested (see below) that > we > > also sent it to you for any potential feedback in case there is any interaction with > > networking initiatives. The device is used to handle the load balancing among CPU > cores > > after the packets are received and forwarded to CPU. We don't think it interferes > > with networking operations, but would appreciate very much your > review/comment on this. 
> > It's the middle of the merge window, getting maintainers to review new > stuff until after 5.12-rc1 is out is going to be a very difficult thing > to do. > > In the meantime, why don't you all help out and review submitted patches > to the mailing lists for the subsystems you all are trying to get this > patch into. I know maintainers would appreciate the help, right? > > thanks, > > greg k-h Sure. I am a little new to the community and process, but will try to help. Thanks Mike
> -----Original Message----- > From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org> > Sent: Thursday, February 18, 2021 2:53 AM > To: Chen, Mike Ximing <mike.ximing.chen@intel.com> > Cc: netdev@vger.kernel.org; Linux Kernel Mailing List <linux- > kernel@vger.kernel.org>; davem@davemloft.net; kuba@kernel.org; arnd@arndb.de; > Williams, Dan J <dan.j.williams@intel.com>; pierre-louis.bossart@linux.intel.com > Subject: Re: [PATCH v10 01/20] dlb: add skeleton for DLB driver > > On Thu, Feb 18, 2021 at 07:34:31AM +0000, Chen, Mike Ximing wrote: > > > > > > > -----Original Message----- > > > From: Mike Ximing Chen <mike.ximing.chen@intel.com> > > > Sent: Wednesday, February 10, 2021 12:54 PM > > > To: netdev@vger.kernel.org > > > Cc: davem@davemloft.net; kuba@kernel.org; arnd@arndb.de; > > > gregkh@linuxfoundation.org; Williams, Dan J <dan.j.williams@intel.com>; > pierre- > > > louis.bossart@linux.intel.com; Gage Eads <gage.eads@intel.com> > > > Subject: [PATCH v10 01/20] dlb: add skeleton for DLB driver > > > > > > diff --git a/Documentation/misc-devices/dlb.rst b/Documentation/misc- > > > devices/dlb.rst > > > new file mode 100644 > > > index 000000000000..aa79be07ee49 > > > --- /dev/null > > > +++ b/Documentation/misc-devices/dlb.rst > > > @@ -0,0 +1,259 @@ > > > +.. SPDX-License-Identifier: GPL-2.0-only > > > + > > > +=========================================== > > > +Intel(R) Dynamic Load Balancer Overview > > > +=========================================== > > > + > > > +:Authors: Gage Eads and Mike Ximing Chen > > > + > > > +Contents > > > +======== > > > + > > > +- Introduction > > > +- Scheduling > > > +- Queue Entry > > > +- Port > > > +- Queue > > > +- Credits > > > +- Scheduling Domain > > > +- Interrupts > > > +- Power Management > > > +- User Interface > > > +- Reset > > > + > > > +Introduction > > > +============ > > > + > > > +The Intel(r) Dynamic Load Balancer (Intel(r) DLB) is a PCIe device that > > > +provides load-balanced, prioritized scheduling of core-to-core communication. > > > + > > > +Intel DLB is an accelerator for the event-driven programming model of > > > +DPDK's Event Device Library[2]. The library is used in packet processing > > > +pipelines that arrange for multi-core scalability, dynamic load-balancing, and > > > +variety of packet distribution and synchronization schemes. > > > + > > > +Intel DLB device consists of queues and arbiters that connect producer > > > +cores and consumer cores. The device implements load-balanced queueing > > > features > > > +including: > > > +- Lock-free multi-producer/multi-consumer operation. > > > +- Multiple priority levels for varying traffic types. > > > +- 'Direct' traffic (i.e. multi-producer/single-consumer) > > > +- Simple unordered load-balanced distribution. > > > +- Atomic lock free load balancing across multiple consumers. > > > +- Queue element reordering feature allowing ordered load-balanced distribution. > > > + > > > > Hi Jakub/Dave, > > This is a device driver for a HW core-to-core communication accelerator. It is > submitted > > to "linux-kernel" for a module under device/misc. Greg suggested (see below) that > we > > also sent it to you for any potential feedback in case there is any interaction with > > networking initiatives. The device is used to handle the load balancing among CPU > cores > > after the packets are received and forwarded to CPU. We don't think it interferes > > with networking operations, but would appreciate very much your > review/comment on this. 
> > It's the middle of the merge window, getting maintainers to review new > stuff until after 5.12-rc1 is out is going to be a very difficult thing > to do. > Hi Jakub/Dave, Just wonder if you had a chance to take a look at our patch. With the close of 5.12 merge window, we would like to get the process moving again. > In the meantime, why don't you all help out and review submitted patches > to the mailing lists for the subsystems you all are trying to get this > patch into. I know maintainers would appreciate the help, right? > > thanks, > > greg k-h Did a few reviews last weekend, and will continue to help. Thanks Mike
On Wed, Feb 10, 2021 at 11:54:06AM -0600, Mike Ximing Chen wrote: > --- a/drivers/misc/dlb/dlb_hw_types.h > +++ b/drivers/misc/dlb/dlb_hw_types.h > @@ -5,6 +5,13 @@ > #define __DLB_HW_TYPES_H > > #include <linux/io.h> > +#include <linux/types.h> > + > +#include "dlb_bitmap.h" > + > +#define BITS_SET(x, val, mask) (x = ((x) & ~(mask)) \ > + | (((val) << (mask##_LOC)) & (mask))) > +#define BITS_GET(x, mask) (((x) & (mask)) >> (mask##_LOC)) Why not use the built-in kernel functions for this? Why are you creating your own? > -static void > -dlb_pf_unmap_pci_bar_space(struct dlb *dlb, struct pci_dev *pdev) > +static void dlb_pf_unmap_pci_bar_space(struct dlb *dlb, struct pci_dev *pdev) Why reformat code here, and not do it right the first time around? > { > pcim_iounmap(pdev, dlb->hw.csr_kva); > pcim_iounmap(pdev, dlb->hw.func_kva); > } > > -static int > -dlb_pf_map_pci_bar_space(struct dlb *dlb, struct pci_dev *pdev) > +static int dlb_pf_map_pci_bar_space(struct dlb *dlb, struct pci_dev *pdev) Same here. > { > dlb->hw.func_kva = pcim_iomap_table(pdev)[DLB_FUNC_BAR]; > dlb->hw.func_phys_addr = pci_resource_start(pdev, DLB_FUNC_BAR); > @@ -40,6 +42,59 @@ dlb_pf_map_pci_bar_space(struct dlb *dlb, struct pci_dev *pdev) > return 0; > } > > +/*******************************/ > +/****** Driver management ******/ > +/*******************************/ > + > +static int dlb_pf_init_driver_state(struct dlb *dlb) > +{ > + mutex_init(&dlb->resource_mutex); > + > + return 0; If this can not fail, why is this not just void? > diff --git a/drivers/misc/dlb/dlb_resource.h b/drivers/misc/dlb/dlb_resource.h > new file mode 100644 > index 000000000000..2229813d9c45 > --- /dev/null > +++ b/drivers/misc/dlb/dlb_resource.h Why do you have lots of little .h files and not just one simple .h file for the driver? That makes it much easier to maintain over time, right? thanks, greg k-h
On Wed, Feb 10, 2021 at 11:54:08AM -0600, Mike Ximing Chen wrote: > +/** > + * dlb_bitmap_clear_range() - clear a range of bitmap entries > + * @bitmap: pointer to dlb_bitmap structure. > + * @bit: starting bit index. > + * @len: length of the range. > + * > + * Return: > + * Returns 0 upon success, < 0 otherwise. > + * > + * Errors: > + * EINVAL - bitmap is NULL or is uninitialized, or the range exceeds the bitmap > + * length. > + */ > +static inline int dlb_bitmap_clear_range(struct dlb_bitmap *bitmap, > + unsigned int bit, > + unsigned int len) > +{ > + if (!bitmap || !bitmap->map) > + return -EINVAL; > + > + if (bitmap->len <= bit) > + return -EINVAL; > + > + bitmap_clear(bitmap->map, bit, len); > + > + return 0; > +} Why isn't logic like this just added to the lib/bitmap.c file? > +/** > + * dlb_bitmap_find_set_bit_range() - find an range of set bits > + * @bitmap: pointer to dlb_bitmap structure. > + * @len: length of the range. > + * > + * This function looks for a range of set bits of length @len. > + * > + * Return: > + * Returns the base bit index upon success, < 0 otherwise. > + * > + * Errors: > + * ENOENT - unable to find a length *len* range of set bits. > + * EINVAL - bitmap is NULL or is uninitialized, or len is invalid. > + */ > +static inline int dlb_bitmap_find_set_bit_range(struct dlb_bitmap *bitmap, > + unsigned int len) > +{ > + struct dlb_bitmap *complement_mask = NULL; > + int ret; > + > + if (!bitmap || !bitmap->map || len == 0) > + return -EINVAL; > + > + if (bitmap->len < len) > + return -ENOENT; > + > + ret = dlb_bitmap_alloc(&complement_mask, bitmap->len); > + if (ret) > + return ret; > + > + bitmap_zero(complement_mask->map, complement_mask->len); > + > + bitmap_complement(complement_mask->map, bitmap->map, bitmap->len); > + > + ret = bitmap_find_next_zero_area(complement_mask->map, > + complement_mask->len, > + 0, > + len, > + 0); > + > + dlb_bitmap_free(complement_mask); > + > + /* No set bit range of length len? */ > + return (ret >= (int)bitmap->len) ? -ENOENT : ret; > +} Same here, why not put this in the core kernel instead of a tiny random driver like this? thanks, greg k-h
On Wed, Feb 10, 2021 at 11:54:17AM -0600, Mike Ximing Chen wrote: > Add ioctl to start a domain. Once a scheduling domain and its resources > have been configured, this ioctl is called to allow the domain's ports to > begin enqueueing to the device. Once started, the domain's resources cannot > be configured again until after the domain is reset. > > This ioctl instructs the DLB device to start load-balancing operations. > It corresponds to rte_event_dev_start() function in DPDK' eventdev library. > > Signed-off-by: Gage Eads <gage.eads@intel.com> > Signed-off-by: Mike Ximing Chen <mike.ximing.chen@intel.com> > Reviewed-by: Björn Töpel <bjorn.topel@intel.com> > Reviewed-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/misc/dlb/dlb_ioctl.c | 3 + > drivers/misc/dlb/dlb_main.h | 4 ++ > drivers/misc/dlb/dlb_pf_ops.c | 9 +++ > drivers/misc/dlb/dlb_resource.c | 116 ++++++++++++++++++++++++++++++++ > drivers/misc/dlb/dlb_resource.h | 4 ++ > include/uapi/linux/dlb.h | 22 ++++++ > 6 files changed, 158 insertions(+) > > diff --git a/drivers/misc/dlb/dlb_ioctl.c b/drivers/misc/dlb/dlb_ioctl.c > index 6a311b969643..9b05344f03c8 100644 > --- a/drivers/misc/dlb/dlb_ioctl.c > +++ b/drivers/misc/dlb/dlb_ioctl.c > @@ -51,6 +51,7 @@ DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(create_ldb_queue) > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(create_dir_queue) > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(get_ldb_queue_depth) > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(get_dir_queue_depth) > +DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(start_domain) > > /* > * Port creation ioctls don't use the callback template macro. > @@ -322,6 +323,8 @@ long dlb_domain_ioctl(struct file *f, unsigned int cmd, unsigned long arg) > return dlb_domain_ioctl_get_dir_port_pp_fd(dlb, dom, arg); > case DLB_IOC_GET_DIR_PORT_CQ_FD: > return dlb_domain_ioctl_get_dir_port_cq_fd(dlb, dom, arg); > + case DLB_IOC_START_DOMAIN: > + return dlb_domain_ioctl_start_domain(dlb, dom, arg); > default: > return -ENOTTY; > } > diff --git a/drivers/misc/dlb/dlb_main.h b/drivers/misc/dlb/dlb_main.h > index 477974e1a178..2f3096a45b1e 100644 > --- a/drivers/misc/dlb/dlb_main.h > +++ b/drivers/misc/dlb/dlb_main.h > @@ -63,6 +63,10 @@ struct dlb_device_ops { > struct dlb_create_dir_port_args *args, > uintptr_t cq_dma_base, > struct dlb_cmd_response *resp); > + int (*start_domain)(struct dlb_hw *hw, > + u32 domain_id, > + struct dlb_start_domain_args *args, > + struct dlb_cmd_response *resp); > int (*get_num_resources)(struct dlb_hw *hw, > struct dlb_get_num_resources_args *args); > int (*reset_domain)(struct dlb_hw *hw, u32 domain_id); > diff --git a/drivers/misc/dlb/dlb_pf_ops.c b/drivers/misc/dlb/dlb_pf_ops.c > index 02a188aa5a60..ce9d29b94a55 100644 > --- a/drivers/misc/dlb/dlb_pf_ops.c > +++ b/drivers/misc/dlb/dlb_pf_ops.c > @@ -160,6 +160,14 @@ dlb_pf_create_dir_port(struct dlb_hw *hw, u32 id, > resp, false, 0); > } > > +static int > +dlb_pf_start_domain(struct dlb_hw *hw, u32 id, > + struct dlb_start_domain_args *args, > + struct dlb_cmd_response *resp) > +{ > + return dlb_hw_start_domain(hw, id, args, resp, false, 0); > +} > + > static int dlb_pf_get_num_resources(struct dlb_hw *hw, > struct dlb_get_num_resources_args *args) > { > @@ -232,6 +240,7 @@ struct dlb_device_ops dlb_pf_ops = { > .create_dir_queue = dlb_pf_create_dir_queue, > .create_ldb_port = dlb_pf_create_ldb_port, > .create_dir_port = dlb_pf_create_dir_port, > + .start_domain = dlb_pf_start_domain, Why do you have a "callback" when you only ever call one function? Why is that needed at all? thanks, greg k-h
> -----Original Message-----
> From: Greg KH <gregkh@linuxfoundation.org>
>
> On Wed, Feb 10, 2021 at 11:54:06AM -0600, Mike Ximing Chen wrote:
> > +
> > +#include "dlb_bitmap.h"
> > +
> > +#define BITS_SET(x, val, mask)	(x = ((x) & ~(mask)) \
> > +				 | (((val) << (mask##_LOC)) & (mask)))
> > +#define BITS_GET(x, mask)	(((x) & (mask)) >> (mask##_LOC))
>
> Why not use the built-in kernel functions for this? Why are you
> creating your own?
>
FIELD_GET(_mask, _val) and FIELD_PREP(_mask, _val) in include/linux/bitfield.h
are similar to our BITS_GET() and BITS_SET(). However, in our case, mask##_LOC
is a known constant defined in dlb_regs.h, so we don't need to use
__builtin_ffs(mask) to calculate the location of the mask as FIELD_GET() and
FIELD_PREP() do. We can still use FIELD_GET() and FIELD_PREP(), but our macros
are a little more efficient. Would it be OK to keep them?

>
> > -static void
> > -dlb_pf_unmap_pci_bar_space(struct dlb *dlb, struct pci_dev *pdev)
> > +static void dlb_pf_unmap_pci_bar_space(struct dlb *dlb, struct pci_dev *pdev)
>
> Why reformat code here, and not do it right the first time around?
>
Sorry, this should not happen. Will fix it.

> > +/*******************************/
> > +/****** Driver management ******/
> > +/*******************************/
> > +
> > +static int dlb_pf_init_driver_state(struct dlb *dlb)
> > +{
> > +	mutex_init(&dlb->resource_mutex);
> > +
> > +	return 0;
>
> If this can not fail, why is this not just void?

Sure, will change it to void.

>
> > diff --git a/drivers/misc/dlb/dlb_resource.h b/drivers/misc/dlb/dlb_resource.h
> > new file mode 100644
> > index 000000000000..2229813d9c45
> > --- /dev/null
> > +++ b/drivers/misc/dlb/dlb_resource.h
>
> Why do you have lots of little .h files and not just one simple .h file
> for the driver? That makes it much easier to maintain over time, right?
>
I combined a couple of header files in this version. dlb_regs.h is pretty big
(3640 lines), and is generated by SW. I will merge the other .h files into one.

Thanks
Mike
> -----Original Message----- > From: Greg KH <gregkh@linuxfoundation.org> > > On Wed, Feb 10, 2021 at 11:54:08AM -0600, Mike Ximing Chen wrote: > > +static inline int dlb_bitmap_clear_range(struct dlb_bitmap *bitmap, > > + unsigned int bit, > > + unsigned int len) > > +{ > > + if (!bitmap || !bitmap->map) > > + return -EINVAL; > > + > > + if (bitmap->len <= bit) > > + return -EINVAL; > > + > > + bitmap_clear(bitmap->map, bit, len); > > + > > + return 0; > > +} > > Why isn't logic like this just added to the lib/bitmap.c file? > > > +static inline int dlb_bitmap_find_set_bit_range(struct dlb_bitmap *bitmap, > > + unsigned int len) > > +{ > > + struct dlb_bitmap *complement_mask = NULL; > > + int ret; > > + > > + if (!bitmap || !bitmap->map || len == 0) > > + return -EINVAL; > > + > > + if (bitmap->len < len) > > + return -ENOENT; > > + > > + ret = dlb_bitmap_alloc(&complement_mask, bitmap->len); > > + if (ret) > > + return ret; > > + > > + bitmap_zero(complement_mask->map, complement_mask->len); > > + > > + bitmap_complement(complement_mask->map, bitmap->map, bitmap->len); > > + > > + ret = bitmap_find_next_zero_area(complement_mask->map, > > + complement_mask->len, > > + 0, > > + len, > > + 0); > > + > > + dlb_bitmap_free(complement_mask); > > + > > + /* No set bit range of length len? */ > > + return (ret >= (int)bitmap->len) ? -ENOENT : ret; > > +} > > Same here, why not put this in the core kernel instead of a tiny random > driver like this? > OK, we will put them in include/Linux/bitmap.h. Thanks Mike
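For reference, here is a minimal sketch of what such a helper could look like if promoted into the generic bitmap code, finding a run of set bits without allocating a temporary complement bitmap. The function name and placement are illustrative assumptions only, not part of the posted series:

```c
#include <linux/bitmap.h>
#include <linux/bitops.h>

/*
 * Hypothetical generic helper, sketched as it might look in lib/bitmap.c:
 * return the index of the first run of at least @len set bits, or @size
 * if no such run exists. Avoids allocating a complement bitmap.
 */
static unsigned long bitmap_find_set_region(const unsigned long *map,
					    unsigned long size,
					    unsigned long len)
{
	unsigned long start = 0, end;

	while (start + len <= size) {
		/* Locate the start of the next run of set bits. */
		start = find_next_bit(map, size, start);
		if (start >= size)
			break;
		/* The run ends at the next zero bit (or at the bitmap end). */
		end = find_next_zero_bit(map, size, start);
		if (end - start >= len)
			return start;	/* run is long enough */
		start = end;
	}
	return size;	/* no run of @len set bits found */
}
```

With a generic helper along these lines, dlb_bitmap_find_set_bit_range() would presumably collapse to a thin wrapper, which is roughly what moving the logic into lib/bitmap.c implies.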
> -----Original Message-----
> From: Greg KH <gregkh@linuxfoundation.org>
> On Wed, Feb 10, 2021 at 11:54:17AM -0600, Mike Ximing Chen wrote:
> >
> > diff --git a/drivers/misc/dlb/dlb_ioctl.c b/drivers/misc/dlb/dlb_ioctl.c
> > index 6a311b969643..9b05344f03c8 100644
> > --- a/drivers/misc/dlb/dlb_ioctl.c
> > +++ b/drivers/misc/dlb/dlb_ioctl.c
> > @@ -51,6 +51,7 @@ DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(create_ldb_queue)
> > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(create_dir_queue)
> > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(get_ldb_queue_depth)
> > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(get_dir_queue_depth)
> > +DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(start_domain)
> >
> > --- a/drivers/misc/dlb/dlb_pf_ops.c
> > +++ b/drivers/misc/dlb/dlb_pf_ops.c
> > @@ -160,6 +160,14 @@ dlb_pf_create_dir_port(struct dlb_hw *hw, u32 id,
> > 			    resp, false, 0);
> > }
> >
> > +static int
> > +dlb_pf_start_domain(struct dlb_hw *hw, u32 id,
> > +		    struct dlb_start_domain_args *args,
> > +		    struct dlb_cmd_response *resp)
> > +{
> > +	return dlb_hw_start_domain(hw, id, args, resp, false, 0);
> > +}
> > +
> > static int dlb_pf_get_num_resources(struct dlb_hw *hw,
> > 				    struct dlb_get_num_resources_args *args)
> > {
> > @@ -232,6 +240,7 @@ struct dlb_device_ops dlb_pf_ops = {
> > 	.create_dir_queue = dlb_pf_create_dir_queue,
> > 	.create_ldb_port = dlb_pf_create_ldb_port,
> > 	.create_dir_port = dlb_pf_create_dir_port,
> > +	.start_domain = dlb_pf_start_domain,
>
> Why do you have a "callback" when you only ever call one function? Why
> is that needed at all?
>
In our next submission, we are going to add virtual function (VF) support. The
callbacks for VFs are different from those for the PF, which is what we support in
this submission. We can defer the introduction of the callback structure to when we
add the VF support. But since we have many callback functions, that approach
will generate many changes in the "existing" code. We thought that putting
the callback structure in place now would make the job of adding VF support easier.
Is it OK?

Thanks
Mike
On Wed, Mar 10, 2021 at 01:33:24AM +0000, Chen, Mike Ximing wrote: > > > -----Original Message----- > > From: Greg KH <gregkh@linuxfoundation.org> > > > > On Wed, Feb 10, 2021 at 11:54:06AM -0600, Mike Ximing Chen wrote: > > > + > > > +#include "dlb_bitmap.h" > > > + > > > +#define BITS_SET(x, val, mask) (x = ((x) & ~(mask)) \ > > > + | (((val) << (mask##_LOC)) & (mask))) > > > +#define BITS_GET(x, mask) (((x) & (mask)) >> (mask##_LOC)) > > > > Why not use the built-in kernel functions for this? Why are you > > creating your own? > > > FIELD_GET(_mask, _val) and FIELD_PREP(_mask, _val) in include/linux/bitfield.h > are similar to our BITS_GET() and BITS_SET(). However in our case, mask##_LOC > is a known constant defined in dlb_regs.h, so we don't need to use > _buildin_ffs(mask) to calculate the location of mask as FIELD_GET() and FIELD_PREP() > do. We can still use FIELD_GET and FIELD_PREP, but our macros are a little more > efficient. Would it be OK to keep them? No, please use the kernel-wide proper functions, there's no need for single tiny driver to be "special" in this regard. If somehow the in-kernel functions are not sufficient, it's always better to fix them up than to go your own way here. thanks, greg k-h
On Wed, Mar 10, 2021 at 02:45:10AM +0000, Chen, Mike Ximing wrote: > > > > -----Original Message----- > > From: Greg KH <gregkh@linuxfoundation.org> > > On Wed, Feb 10, 2021 at 11:54:17AM -0600, Mike Ximing Chen wrote: > > > > > > diff --git a/drivers/misc/dlb/dlb_ioctl.c b/drivers/misc/dlb/dlb_ioctl.c > > > index 6a311b969643..9b05344f03c8 100644 > > > --- a/drivers/misc/dlb/dlb_ioctl.c > > > +++ b/drivers/misc/dlb/dlb_ioctl.c > > > @@ -51,6 +51,7 @@ > > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(create_ldb_queue) > > > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(create_dir_queue) > > > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(get_ldb_queue_depth) > > > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(get_dir_queue_depth) > > > +DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(start_domain) > > > > > > --- a/drivers/misc/dlb/dlb_pf_ops.c > > > +++ b/drivers/misc/dlb/dlb_pf_ops.c > > > @@ -160,6 +160,14 @@ dlb_pf_create_dir_port(struct dlb_hw *hw, u32 id, > > > resp, false, 0); > > > } > > > > > > +static int > > > +dlb_pf_start_domain(struct dlb_hw *hw, u32 id, > > > + struct dlb_start_domain_args *args, > > > + struct dlb_cmd_response *resp) > > > +{ > > > + return dlb_hw_start_domain(hw, id, args, resp, false, 0); > > > +} > > > + > > > static int dlb_pf_get_num_resources(struct dlb_hw *hw, > > > struct dlb_get_num_resources_args *args) > > > { > > > @@ -232,6 +240,7 @@ struct dlb_device_ops dlb_pf_ops = { > > > .create_dir_queue = dlb_pf_create_dir_queue, > > > .create_ldb_port = dlb_pf_create_ldb_port, > > > .create_dir_port = dlb_pf_create_dir_port, > > > + .start_domain = dlb_pf_start_domain, > > > > Why do you have a "callback" when you only ever call one function? Why > > is that needed at all? > > > In our next submission, we are going to add virtual function (VF) support. The > callbacks for VFs are different from those for PF which is what we support in this > submission. We can defer the introduction of the callback structure to when we > add the VF support. But since we have many callback functions, that approach > will generate many changes in then "existing" code. We thought that putting > the callback structure in place now would make the job of adding VF support easier. > Is it OK? No, do not add additional complexity when it is not needed. It causes much more review work and I and no one else have any idea that "something might be coming in the future", so please do not make our lives harder. Make it simple, and work, now. You can always add additional changes later, if it is ever needed. thanks, greg k-h
On Wed, Feb 10, 2021 at 11:54:03AM -0600, Mike Ximing Chen wrote: > Intel DLB is an accelerator for the event-driven programming model of > DPDK's Event Device Library[2]. The library is used in packet processing > pipelines that arrange for multi-core scalability, dynamic load-balancing, > and variety of packet distribution and synchronization schemes The more that I look at this driver, the more I think this is a "run around" the networking stack. Why are you all adding kernel code to support DPDK which is an out-of-kernel networking stack? We can't support that at all. Why not just use the normal networking functionality instead of this custom char-device-node-monstrosity? What is missing from todays kernel networking code that requires this run-around? thanks, greg k-h
On Wed, Mar 10, 2021 at 12:14 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Wed, Mar 10, 2021 at 02:45:10AM +0000, Chen, Mike Ximing wrote: > > > > > > > -----Original Message----- > > > From: Greg KH <gregkh@linuxfoundation.org> > > > On Wed, Feb 10, 2021 at 11:54:17AM -0600, Mike Ximing Chen wrote: > > > > > > > > diff --git a/drivers/misc/dlb/dlb_ioctl.c b/drivers/misc/dlb/dlb_ioctl.c > > > > index 6a311b969643..9b05344f03c8 100644 > > > > --- a/drivers/misc/dlb/dlb_ioctl.c > > > > +++ b/drivers/misc/dlb/dlb_ioctl.c > > > > @@ -51,6 +51,7 @@ > > > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(create_ldb_queue) > > > > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(create_dir_queue) > > > > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(get_ldb_queue_depth) > > > > DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(get_dir_queue_depth) > > > > +DLB_DOMAIN_IOCTL_CALLBACK_TEMPLATE(start_domain) > > > > > > > > --- a/drivers/misc/dlb/dlb_pf_ops.c > > > > +++ b/drivers/misc/dlb/dlb_pf_ops.c > > > > @@ -160,6 +160,14 @@ dlb_pf_create_dir_port(struct dlb_hw *hw, u32 id, > > > > resp, false, 0); > > > > } > > > > > > > > +static int > > > > +dlb_pf_start_domain(struct dlb_hw *hw, u32 id, > > > > + struct dlb_start_domain_args *args, > > > > + struct dlb_cmd_response *resp) > > > > +{ > > > > + return dlb_hw_start_domain(hw, id, args, resp, false, 0); > > > > +} > > > > + > > > > static int dlb_pf_get_num_resources(struct dlb_hw *hw, > > > > struct dlb_get_num_resources_args *args) > > > > { > > > > @@ -232,6 +240,7 @@ struct dlb_device_ops dlb_pf_ops = { > > > > .create_dir_queue = dlb_pf_create_dir_queue, > > > > .create_ldb_port = dlb_pf_create_ldb_port, > > > > .create_dir_port = dlb_pf_create_dir_port, > > > > + .start_domain = dlb_pf_start_domain, > > > > > > Why do you have a "callback" when you only ever call one function? Why > > > is that needed at all? > > > > > In our next submission, we are going to add virtual function (VF) support. The > > callbacks for VFs are different from those for PF which is what we support in this > > submission. We can defer the introduction of the callback structure to when we > > add the VF support. But since we have many callback functions, that approach > > will generate many changes in then "existing" code. We thought that putting > > the callback structure in place now would make the job of adding VF support easier. > > Is it OK? > > No, do not add additional complexity when it is not needed. It causes > much more review work and I and no one else have any idea that > "something might be coming in the future", so please do not make our > lives harder. > > Make it simple, and work, now. You can always add additional changes > later, if it is ever needed. > Good points Greg, the internal reviews missed this, let me take another once over before v11.
> -----Original Message----- > From: Greg KH <gregkh@linuxfoundation.org> > Sent: Wednesday, March 10, 2021 3:13 AM > To: Chen, Mike Ximing <mike.ximing.chen@intel.com> > Cc: netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org; > arnd@arndb.de; Williams, Dan J <dan.j.williams@intel.com>; pierre- > louis.bossart@linux.intel.com; Gage Eads <gage.eads@intel.com> > Subject: Re: [PATCH v10 03/20] dlb: add resource and device initialization > > On Wed, Mar 10, 2021 at 01:33:24AM +0000, Chen, Mike Ximing wrote: > > > > > -----Original Message----- > > > From: Greg KH <gregkh@linuxfoundation.org> > > > > > > On Wed, Feb 10, 2021 at 11:54:06AM -0600, Mike Ximing Chen wrote: > > > > + > > > > +#include "dlb_bitmap.h" > > > > + > > > > +#define BITS_SET(x, val, mask) (x = ((x) & ~(mask)) \ > > > > + | (((val) << (mask##_LOC)) & (mask))) > > > > +#define BITS_GET(x, mask) (((x) & (mask)) >> (mask##_LOC)) > > > > > > Why not use the built-in kernel functions for this? Why are you > > > creating your own? > > > > > FIELD_GET(_mask, _val) and FIELD_PREP(_mask, _val) in include/linux/bitfield.h > > are similar to our BITS_GET() and BITS_SET(). However in our case, mask##_LOC > > is a known constant defined in dlb_regs.h, so we don't need to use > > _buildin_ffs(mask) to calculate the location of mask as FIELD_GET() and > FIELD_PREP() > > do. We can still use FIELD_GET and FIELD_PREP, but our macros are a little more > > efficient. Would it be OK to keep them? > > No, please use the kernel-wide proper functions, there's no need for > single tiny driver to be "special" in this regard. If somehow the > in-kernel functions are not sufficient, it's always better to fix them > up than to go your own way here. > OK. I will use FIELD_GET() and FIELD_PREP() macros in the next revision. Thanks Mike
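For reference, here is a minimal sketch of the substitution being agreed to here, expressing the open-coded BITS_SET()/BITS_GET() pattern with the generic <linux/bitfield.h> helpers. The register field name below is hypothetical rather than taken from dlb_regs.h; the real driver's masks are likewise compile-time constants, which is all FIELD_PREP()/FIELD_GET() require:

```c
#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/types.h>

/* Illustrative field definition only; not a real DLB register layout. */
#define DLB_EXAMPLE_TOKEN_COUNT	GENMASK(12, 0)

static u32 example_set_token_count(u32 reg, u32 tokens)
{
	/* Equivalent of BITS_SET(reg, tokens, DLB_EXAMPLE_TOKEN_COUNT) */
	reg &= ~DLB_EXAMPLE_TOKEN_COUNT;
	reg |= FIELD_PREP(DLB_EXAMPLE_TOKEN_COUNT, tokens);
	return reg;
}

static u32 example_get_token_count(u32 reg)
{
	/* Equivalent of BITS_GET(reg, DLB_EXAMPLE_TOKEN_COUNT) */
	return FIELD_GET(DLB_EXAMPLE_TOKEN_COUNT, reg);
}
```

Because the mask is a compile-time constant, the shift amount is folded at build time, so there should be no efficiency loss relative to the hand-rolled macros.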
> -----Original Message----- > From: Greg KH <gregkh@linuxfoundation.org> > > On Wed, Mar 10, 2021 at 02:45:10AM +0000, Chen, Mike Ximing wrote: > > > > > > > -----Original Message----- > > > From: Greg KH <gregkh@linuxfoundation.org> > > > On Wed, Feb 10, 2021 at 11:54:17AM -0600, Mike Ximing Chen wrote: > > > > > > > > { > > > > @@ -232,6 +240,7 @@ struct dlb_device_ops dlb_pf_ops = { > > > > .create_dir_queue = dlb_pf_create_dir_queue, > > > > .create_ldb_port = dlb_pf_create_ldb_port, > > > > .create_dir_port = dlb_pf_create_dir_port, > > > > + .start_domain = dlb_pf_start_domain, > > > > > > Why do you have a "callback" when you only ever call one function? Why > > > is that needed at all? > > > > > In our next submission, we are going to add virtual function (VF) support. The > > callbacks for VFs are different from those for PF which is what we support in this > > submission. We can defer the introduction of the callback structure to when we > > add the VF support. But since we have many callback functions, that approach > > will generate many changes in then "existing" code. We thought that putting > > the callback structure in place now would make the job of adding VF support easier. > > Is it OK? > > No, do not add additional complexity when it is not needed. It causes > much more review work and I and no one else have any idea that > "something might be coming in the future", so please do not make our > lives harder. > > Make it simple, and work, now. You can always add additional changes > later, if it is ever needed. > Sure. We will remove the callback structure from this patch set. Thanks for reviewing Mike
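For illustration, a hedged sketch of the simplification being agreed to here: with only the PF flavor in the tree, the ioctl path can call the hardware routine directly rather than indirecting through the dlb_device_ops table. Function and type names follow the quoted hunks; the locking and struct members shown are assumptions, and the ioctl argument copy-in/copy-out plumbing is elided:

```c
/* Sketch only: direct call into the hardware layer, no ops table. */
static int dlb_domain_ioctl_start_domain(struct dlb *dlb,
					 struct dlb_domain *domain,
					 struct dlb_start_domain_args *args,
					 struct dlb_cmd_response *resp)
{
	int ret;

	mutex_lock(&dlb->resource_mutex);
	/* was: ret = dlb->ops->start_domain(&dlb->hw, domain->id, args, resp); */
	ret = dlb_hw_start_domain(&dlb->hw, domain->id, args, resp, false, 0);
	mutex_unlock(&dlb->resource_mutex);

	return ret;
}
```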
On Wed, Mar 10, 2021 at 1:02 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Wed, Feb 10, 2021 at 11:54:03AM -0600, Mike Ximing Chen wrote: > > Intel DLB is an accelerator for the event-driven programming model of > > DPDK's Event Device Library[2]. The library is used in packet processing > > pipelines that arrange for multi-core scalability, dynamic load-balancing, > > and variety of packet distribution and synchronization schemes > > The more that I look at this driver, the more I think this is a "run > around" the networking stack. Why are you all adding kernel code to > support DPDK which is an out-of-kernel networking stack? We can't > support that at all. > > Why not just use the normal networking functionality instead of this > custom char-device-node-monstrosity? Hey Greg, I've come to find out that this driver does not bypass kernel networking, and the kernel functionality I thought it bypassed, IPC / Scheduling, is not even in the picture in the non-accelerated case. So given you and I are both confused by this submission that tells me that the problem space needs to be clarified and assumptions need to be enumerated. > What is missing from todays kernel networking code that requires this > run-around? Yes, first and foremost Mike, what are the kernel infrastructure gaps and pain points that led up to this proposal?
> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Sent: Friday, March 12, 2021 2:18 AM
> To: Greg KH <gregkh@linuxfoundation.org>
> Cc: Chen, Mike Ximing <mike.ximing.chen@intel.com>; Netdev <netdev@vger.kernel.org>; David Miller
> <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Arnd Bergmann <arnd@arndb.de>;
> Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Subject: Re: [PATCH v10 00/20] dlb: introduce DLB device driver
>
> On Wed, Mar 10, 2021 at 1:02 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Feb 10, 2021 at 11:54:03AM -0600, Mike Ximing Chen wrote:
> > > Intel DLB is an accelerator for the event-driven programming model of
> > > DPDK's Event Device Library[2]. The library is used in packet processing
> > > pipelines that arrange for multi-core scalability, dynamic load-balancing,
> > > and variety of packet distribution and synchronization schemes
> >
> > The more that I look at this driver, the more I think this is a "run
> > around" the networking stack. Why are you all adding kernel code to
> > support DPDK which is an out-of-kernel networking stack? We can't
> > support that at all.
> >
> > Why not just use the normal networking functionality instead of this
> > custom char-device-node-monstrosity?
>
> Hey Greg,
>
> I've come to find out that this driver does not bypass kernel
> networking, and the kernel functionality I thought it bypassed, IPC /
> Scheduling, is not even in the picture in the non-accelerated case. So
> given you and I are both confused by this submission that tells me
> that the problem space needs to be clarified and assumptions need to
> be enumerated.
>
> > What is missing from todays kernel networking code that requires this
> > run-around?
>
> Yes, first and foremost Mike, what are the kernel infrastructure gaps
> and pain points that led up to this proposal?

Hi Greg/Dan,

Sorry for the confusion. The cover letter and document did not articulate
clearly the problem being solved by DLB. We will update the document in
the next revision.

In a brief description, Intel DLB is an accelerator that replaces shared memory
queuing systems. Large modern server-class CPUs, with local caches
for each core, tend to incur costly cache misses, cross core snoops
and contentions. The impact becomes noticeable at high (messages/sec)
rates, such as are seen in high throughput packet processing and HPC
applications. DLB is used in high rate pipelines that require a variety of packet
distribution & synchronization schemes. It can be leveraged to accelerate
user space libraries, such as DPDK eventdev. It could show similar benefits in
frameworks such as PADATA in the Kernel - if the messaging rate is sufficiently
high.

As can be seen in the following diagram, DLB operations come into the
picture only after packets are received by the Rx core from the networking
devices. WCs are the worker cores which process packets distributed by DLB.

                           WC1             WC4
 +-----+  +----+  +---+   /   \   +---+   /   \   +---+  +----+  +-----+
 |NIC  |  |Rx  |  |DLB|  /     \  |DLB|  /     \  |DLB|  |Tx  |  |NIC  |
 |Ports|--|Core|--|   |--- WC2 ---|   |--- WC5 ---|   |--|Core|--|Ports|
 +-----+  +----+  +---+  \     /  +---+  \     /  +---+  +----+  +-----+
                          \   /           \   /
                           WC3             WC6

At its heart DLB consists of resources that can be assigned to
VDEVs/applications in a flexible manner, such as ports, queues, credits to use
queues, sequence numbers, etc. We support up to 16/32 VF/VDEVs (depending
on version) with SRIOV and SIOV. The role of the kernel driver includes VDEV
composition (vdcm module), functional level reset, live migration, error
handling, power management, etc.

Thanks
Mike
On Fri, Mar 12, 2021 at 1:55 PM Chen, Mike Ximing
<mike.ximing.chen@intel.com> wrote:
>
> > -----Original Message-----
> > From: Dan Williams <dan.j.williams@intel.com>
> > Sent: Friday, March 12, 2021 2:18 AM
> > To: Greg KH <gregkh@linuxfoundation.org>
> > Cc: Chen, Mike Ximing <mike.ximing.chen@intel.com>; Netdev <netdev@vger.kernel.org>; David Miller
> > <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Arnd Bergmann <arnd@arndb.de>;
> > Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > Subject: Re: [PATCH v10 00/20] dlb: introduce DLB device driver
> >
> > On Wed, Mar 10, 2021 at 1:02 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, Feb 10, 2021 at 11:54:03AM -0600, Mike Ximing Chen wrote:
> > > > Intel DLB is an accelerator for the event-driven programming model of
> > > > DPDK's Event Device Library[2]. The library is used in packet processing
> > > > pipelines that arrange for multi-core scalability, dynamic load-balancing,
> > > > and variety of packet distribution and synchronization schemes
> > >
> > > The more that I look at this driver, the more I think this is a "run
> > > around" the networking stack. Why are you all adding kernel code to
> > > support DPDK which is an out-of-kernel networking stack? We can't
> > > support that at all.
> > >
> > > Why not just use the normal networking functionality instead of this
> > > custom char-device-node-monstrosity?
> >
> > Hey Greg,
> >
> > I've come to find out that this driver does not bypass kernel
> > networking, and the kernel functionality I thought it bypassed, IPC /
> > Scheduling, is not even in the picture in the non-accelerated case. So
> > given you and I are both confused by this submission that tells me
> > that the problem space needs to be clarified and assumptions need to
> > be enumerated.
> >
> > > What is missing from todays kernel networking code that requires this
> > > run-around?
> >
> > Yes, first and foremost Mike, what are the kernel infrastructure gaps
> > and pain points that led up to this proposal?
>
> Hi Greg/Dan,
>
> Sorry for the confusion. The cover letter and document did not articulate
> clearly the problem being solved by DLB. We will update the document in
> the next revision.

I'm not sure this answers Greg's question about what is missing from
today's kernel implementation?

> In a brief description, Intel DLB is an accelerator that replaces shared memory
> queuing systems. Large modern server-class CPUs, with local caches
> for each core, tend to incur costly cache misses, cross core snoops
> and contentions. The impact becomes noticeable at high (messages/sec)
> rates, such as are seen in high throughput packet processing and HPC
> applications. DLB is used in high rate pipelines that require a variety of packet
> distribution & synchronization schemes. It can be leveraged to accelerate
> user space libraries, such as DPDK eventdev. It could show similar benefits in
> frameworks such as PADATA in the Kernel - if the messaging rate is sufficiently
> high.

Where is PADATA limited by distribution and synchronization overhead?
It's meant for parallelizable work that has minimal communication
between the work units; ordering is about its only synchronization
overhead, not messaging. It's used for ipsec crypto and page init.
Even potential future bulk work usages that might benefit from PADATA,
like md-raid, ksm, or kcopyd, do not have any messaging overhead.
> As can be seen in the following diagram, DLB operations come into the > picture only after packets are received by Rx core from the networking > devices. WCs are the worker cores which process packets distributed by DLB. > (In case the diagram gets mis-formatted, please see attached file). > > > WC1 WC4 > +-----+ +----+ +---+ / \ +---+ / \ +---+ +----+ +-----+ > |NIC | |Rx | |DLB| / \ |DLB| / \ |DLB| |Tx | |NIC | > |Ports|---|Core|---| |-----WC2----| |-----WC5----| |---|Core|---|Ports| > +-----+ -----+ +---+ \ / +---+ \ / +---+ +----+ ------+ > \ / \ / > WC3 WC6 > > At its heart DLB consists of resources than can be assigned to > VDEVs/applications in a flexible manner, such as ports, queues, credits to use > queues, sequence numbers, etc. All of those objects are managed in userspace today in the unaccelerated case? > We support up to 16/32 VF/VDEVs (depending > on version) with SRIOV and SIOV. Role of the kernel driver includes VDEV > Composition (vdcm module), functional level reset, live migration, error > handling, power management, and etc.. Need some more specificity here. What about those features requires the kernel to get involved with a DLB2 specific ABI to manage ports, queues, credits, sequence numbers, etc...?
> From: Dan Williams <dan.j.williams@intel.com>
> On Fri, Mar 12, 2021 at 1:55 PM Chen, Mike Ximing <mike.ximing.chen@intel.com> wrote:
> >
> > In a brief description, Intel DLB is an accelerator that replaces
> > shared memory queuing systems. Large modern server-class CPUs, with
> > local caches for each core, tend to incur costly cache misses, cross
> > core snoops and contentions. The impact becomes noticeable at high
> > (messages/sec) rates, such as are seen in high throughput packet
> > processing and HPC applications. DLB is used in high rate pipelines
> > that require a variety of packet distribution & synchronization
> > schemes. It can be leveraged to accelerate user space libraries, such
> > as DPDK eventdev. It could show similar benefits in frameworks such as
> > PADATA in the Kernel - if the messaging rate is sufficiently high.
>
> Where is PADATA limited by distribution and synchronization overhead?
> It's meant for parallelizable work that has minimal communication between the work units; ordering is
> about its only synchronization overhead, not messaging. It's used for ipsec crypto and page init.
> Even potential future bulk work usages that might benefit from PADATA, like md-raid, ksm, or kcopyd,
> do not have any messaging overhead.
>
In our PADATA investigation, the improvements come primarily from offloading the
ordering overhead. Parallel scheduling is offloaded to a DLB ordered parallel queue,
and serialization (re-ordering) is offloaded to a DLB directed queue. We see
significant throughput increases in crypto tests using tcrypt. In our test
configuration, preliminary results show that the DLB-accelerated case encrypts at
2.4x (packets/s), and decrypts at 2.6x, of the unaccelerated case.
> From: Dan Williams <dan.j.williams@intel.com> > On Fri, Mar 12, 2021 at 1:55 PM Chen, Mike Ximing <mike.ximing.chen@intel.com> wrote: > > > > At its heart DLB consists of resources than can be assigned to > > VDEVs/applications in a flexible manner, such as ports, queues, > > credits to use queues, sequence numbers, etc. > > All of those objects are managed in userspace today in the unaccelerated case? > Yes, in the unaccelerated case, the software queue manager is generally implemented in the user space (except for cases like padata), so the resources are managed in the user space as well. With a hardware DLB module, these resources will be managed by the kernel driver for VF and VDEV supports. Thanks Mike
> From: Dan Williams <dan.j.williams@intel.com>
> On Fri, Mar 12, 2021 at 1:55 PM Chen, Mike Ximing <mike.ximing.chen@intel.com> wrote:
> >
> > We support up to 16/32 VF/VDEVs (depending on version) with SRIOV and
> > SIOV. The role of the kernel driver includes VDEV composition (vdcm
> > module), functional level reset, live migration, error handling, power
> > management, etc.
>
> Need some more specificity here. What about those features requires the kernel to get involved with a
> DLB2 specific ABI to manage ports, queues, credits, sequence numbers, etc...?

Role of the dlb kernel driver:

VDEV Composition
For example, writing 1024 to the VDEV_CREDITS[0] register will allocate 1024 credits
to VDEV 0. In this way, VFs or VDEVs can be composed as mini-versions of the full
device. VDEV composition will leverage vfio-mdev to create the VDEV devices while
the KMD will implement the VDCM.

Dynamic Composition
Such composition can be dynamic: the PF/VF interface supports scenarios whereby, for
example, an application may wish to boost its credit allocation ("can I have 100 more
credits?").

Functional Level Reset
Much of the internal storage is RAM based and not resettable by hardware schemes.
There are also internal SRAM based control structures (BCAM) that have to be flushed.
The planned way to do this is, roughly:
-- The kernel driver disables access from the associated ports (to prevent any SW
   access; the application should be dead, so this is a precaution).
-- The kernel masquerades as the application to drain all data from internal queues.
   It can poll some internal counters to verify everything is fully drained.
-- Only at this point can the resources associated with the VDEV be returned to the
   pool of available resources for handing to another application/VDEV.

Migration
The requirement is fairly similar to FLR. A VDEV has to be manually drained and
reconstituted on another server; the kernel driver is responsible on both sides.

Error Handling
Errors include "Credit Excursions" where a VDEV attempts to use more of the internal
capacity (credits) than has been allocated. In such a case, the data is dropped and an
interrupt is generated. All such interrupts are directed to the PF driver, which may
simply forward them to a VF (via the PF/VF comms mechanism).

Power Management
The kernel driver keeps the device in D3Hot when not in use. The driver transitions
the device to D0 when the first device file is opened or a VF or VDEV is created, and
keeps it in that state until there are no open device files, memory mappings, or
VFs/VDEVs.

Ioctl Interface
The kernel driver provides an ioctl interface for user applications to set up and
configure dlb domains, ports, queues, scheduling types, credits, sequence numbers,
and links between ports and queues. Applications also use the interface to start,
stop, and inquire about dlb operations.

Thanks
Mike
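To make the ioctl flow above concrete, here is a hedged userspace sketch of the final step, starting a configured scheduling domain. DLB_IOC_START_DOMAIN and struct dlb_start_domain_args come from the series' include/uapi/linux/dlb.h; the device-node path, the zeroed arguments, and the omission of the preceding domain/port/queue configuration ioctls are illustrative assumptions:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dlb.h>

/* Sketch: start a scheduling domain once its resources are configured. */
static int dlb_start_domain_example(const char *domain_path)
{
	struct dlb_start_domain_args args;
	int fd;

	memset(&args, 0, sizeof(args));

	fd = open(domain_path, O_RDWR);	/* e.g. a per-domain device node */
	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* Domain/port/queue configuration ioctls would precede this call. */
	if (ioctl(fd, DLB_IOC_START_DOMAIN, &args) < 0) {
		perror("DLB_IOC_START_DOMAIN");
		close(fd);
		return -1;
	}

	/* The caller keeps the domain fd open while the domain is in use. */
	return fd;
}
```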
On Mon, Mar 15, 2021 at 08:18:10PM +0000, Chen, Mike Ximing wrote:
> > From: Dan Williams <dan.j.williams@intel.com>
> > On Fri, Mar 12, 2021 at 1:55 PM Chen, Mike Ximing <mike.ximing.chen@intel.com> wrote:
> > >
> > > We support up to 16/32 VF/VDEVs (depending on version) with SRIOV and
> > > SIOV. Role of the kernel driver includes VDEV Composition (vdcm
> > > module), functional level reset, live migration, error handling, power
> > > management, and etc..
> >
> > Need some more specificity here. What about those features requires the kernel to get involved with a
> > DLB2 specific ABI to manage ports, queues, credits, sequence numbers, etc...?
>
> Role of the dlb kernel driver:
>
> VDEV Composition
> For example writing 1024 to the VDEV_CREDITS[0] register will allocate 1024 credits to VDEV 0. In this way, VFs or VDEVs can be composed as mini-versions of the full device.
> VDEV composition will leverage vfio-mdev to create the VDEV devices while the KMD will implement the VDCM.

What is a vdev? What is KMD? What is VDCM? What is VF? And how does this all work?

> Dynamic Composition
> Such composition can be dynamic – the PF/VF interface supports scenarios whereby, for example, an application may wish to boost its credit allocation – can I have 100 more credits?

What applications? What "credits"? For what resources?

> Functional Level Reset
> Much of the internal storage is RAM based and not resettable by hardware schemes. There are also internal SRAM based control structures (BCAM) that have to be flushed.
> The planned way to do this is, roughly:
> -- Kernel driver disables access from the associated ports (to prevent any SW access; the application should be dead, so this is a precaution).

What is a "port" here?

> -- Kernel masquerades as the application to drain all data from internal queues. It can poll some internal counters to verify everything is fully drained.

What queues? Why would the kernel mess with userspace data?

> -- Only at this point can the resources associated with the VDEV be returned to the pool of available resources for handing to another application/VDEV.

What is a VDEV and how does an application become "associated with it"?

> Migration
> Requirement is fairly similar to FLR. A VDEV has to be manually drained and reconstituted on another server; the kernel driver is responsible on both sides.

What is FLR?

> Error Handling
> Errors include “Credit Excursions” where a VDEV attempts to use more of the internal capacity (credits) than has been allocated. In such a case,
> the data is dropped and an interrupt generated. All such interrupts are directed to the PF driver, which may simply forward them to a VF (via the PF/VF comms mechanism).

What data is going where?

> Power Management
> The kernel driver keeps the device in D3Hot when not in use. The driver transitions the device to D0 when the first device file is opened or a VF or VDEV is created,
> and keeps it in that state until there are no open device files, memory mappings, or VFs/VDEVs.

That's just normal power management for any device, why is this anything special?

> Ioctl interface
> Kernel driver provides ioctl interface for user applications to setup and configure dlb domains, ports, queues, scheduling types, credits,
> sequence numbers, and links between ports and queues. Applications also use the interface to start, stop and inquire the dlb operations.

What applications use any of this? What userspace implementation today interacts with this? Where is that code located?

Too many TLAs here, I have even less of an understanding of what this driver is supposed to be doing, and what this hardware is now than before.

And here I thought I understood hardware devices, and if I am confused, I pity anyone else looking at this code...

You all need to get some real documentation together to explain everything here in terms that anyone can understand. Without that, this code is going nowhere.

good luck!

greg k-h
[ add kvm@vger.kernel.org for VFIO discussion ]

On Tue, Mar 16, 2021 at 2:01 AM Greg KH <gregkh@linuxfoundation.org> wrote:
[..]
> > Ioctl interface
> > Kernel driver provides ioctl interface for user applications to setup and configure dlb domains, ports, queues, scheduling types, credits,
> > sequence numbers, and links between ports and queues. Applications also use the interface to start, stop and inquire the dlb operations.
>
> What applications use any of this? What userspace implementation today
> interacts with this? Where is that code located?
>
> Too many TLAs here, I have even less of an understanding of what this
> driver is supposed to be doing, and what this hardware is now than
> before.
>
> And here I thought I understood hardware devices, and if I am confused,
> I pity anyone else looking at this code...
>
> You all need to get some real documentation together to explain
> everything here in terms that anyone can understand. Without that, this
> code is going nowhere.

Hi Greg,

So, for the last few weeks Mike and company have patiently waded through my questions and now I think we are at a point to work through the upstream driver architecture options and tradeoffs. You were not alone in struggling to understand what this device does because it is unlike any other accelerator Linux has ever considered. It shards / load balances a data stream for processing by CPU threads. This is typically a network appliance function / protocol, but could also be any other generic thread pool like the kernel's padata. It saves the CPU cycles spent load balancing work items and marshaling them through a thread pool pipeline. For example, in DPDK applications, DLB2 frees up entire cores that would otherwise be consumed with scheduling and work distribution. A separate proof-of-concept, using DLB2 to accelerate the kernel's "padata" thread pool for a crypto workload, demonstrated ~150% higher throughput with hardware employed to manage work distribution and result ordering. Yes, you need a sufficiently high touch / high throughput protocol before the software load balancing overhead coordinating CPU threads starts to dominate the performance, but there are some specific workloads willing to switch to this regime.

The primary consumer to date has been as a backend for the event handling in the userspace networking stack, DPDK. DLB2 has an existing polled-mode-userspace driver for that use case. So I said, "great, just add more features to that userspace driver and you're done". In fact there was DLB1 hardware that also had a polled-mode-userspace driver. So, the next question is "what's changed in DLB2 where a userspace driver is no longer suitable?". The new use case for DLB2 is new hardware support for a host driver to carve up device resources into smaller sets (vfio-mdevs) that can be assigned to guests (Intel calls this new hardware capability SIOV: Scalable IO Virtualization).

Hardware resource management is difficult to handle in userspace especially when bare-metal hardware events need to coordinate with guest-VM device instances. This includes a mailbox interface for the guest VM to negotiate resources with the host driver. Another more practical roadblock for a "DLB2 in userspace" proposal is the fact that it implements what are in-effect software-defined-interrupts to go beyond the scalability limits of PCI MSI-x (Intel calls this Interrupt Message Store: IMS). So even if hardware resource management was awkwardly plumbed into a userspace daemon there would still need to be kernel enabling for device-specific extensions to drivers/vfio/pci/vfio_pci_intrs.c for it to understand the IMS interrupts of DLB2 in addition to PCI MSI-x.

While that still might be solvable in userspace if you squint at it, I don't think Linux end users are served by pushing all of hardware resource management to userspace. VFIO is mostly built to pass entire PCI devices to guests, or in coordination with a kernel driver to describe a subset of the hardware to a virtual-device (vfio-mdev) interface. The rub here is that to date kernel drivers using VFIO to provision mdevs have some existing responsibilities to the core kernel like a network driver or DMA offload driver. The DLB2 driver offers no such service to the kernel for its primary role of accelerating a userspace data-plane. I am assuming here that the padata proof-of-concept is interesting, but not a compelling reason to ship a driver compared to giving end users competent kernel-driven hardware-resource assignment for deploying DLB2 virtual instances into guest VMs.

My "just continue in userspace" suggestion has no answer for the IMS interrupt and reliable hardware resource management support requirements. If you're with me so far we can go deeper into the details, but in answer to your previous questions most of the TLAs were from the land of "SIOV" where the VFIO community should be brought in to review. The driver is mostly a configuration plane where the fast path data-plane is entirely in userspace. That configuration plane needs to manage hardware events and resourcing on behalf of guest VMs running on a partitioned subset of the device. There are worthwhile questions about whether some of the uapi can be refactored to common modules like uacce, but I think we need to get to a first order understanding on what DLB2 is and why the kernel has a role before diving into the uapi discussion.

Any clearer?

So, in summary drivers/misc/ appears to be the first stop in the review since a host driver needs to be established to start the VFIO enabling campaign. With my community hat on, I think requiring standalone host drivers is healthier for Linux than broaching the subject of VFIO-only drivers. Even if, as in this case, the initial host driver is mostly implementing a capability that could be achieved with a userspace driver.
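To make the point about offloaded scheduling concrete for readers who have not used DPDK's eventdev: below is a deliberately generic, non-DLB sketch of the software pattern being replaced, i.e. worker threads pulling work items from a single shared queue. Every enqueue and dequeue pays for lock and cache-line traffic on the shared structure, and the distribution policy itself burns CPU cycles; that arbitration is what the device performs in hardware. Nothing here is taken from the DLB driver or DPDK.

/*
 * Generic software work distribution: one producer, several consumers,
 * all arbitrating through a single locked queue.  Purely illustrative of
 * the pattern a load-balancing accelerator offloads; nothing DLB-specific.
 */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define QDEPTH   1024
#define NWORKERS 4
#define NITEMS   100000

static struct {
	pthread_mutex_t lock;
	pthread_cond_t nonempty;
	int items[QDEPTH];
	int head, tail, count, done;
} wq = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.nonempty = PTHREAD_COND_INITIALIZER,
};

static void enqueue(int item)
{
	pthread_mutex_lock(&wq.lock);            /* contended on every item */
	while (wq.count == QDEPTH) {             /* crude backpressure */
		pthread_mutex_unlock(&wq.lock);
		sched_yield();
		pthread_mutex_lock(&wq.lock);
	}
	wq.items[wq.tail] = item;
	wq.tail = (wq.tail + 1) % QDEPTH;
	wq.count++;
	pthread_cond_signal(&wq.nonempty);
	pthread_mutex_unlock(&wq.lock);
}

static void *worker(void *arg)
{
	long id = (long)arg, handled = 0;

	for (;;) {
		pthread_mutex_lock(&wq.lock);
		while (wq.count == 0 && !wq.done)
			pthread_cond_wait(&wq.nonempty, &wq.lock);
		if (wq.count == 0 && wq.done) {
			pthread_mutex_unlock(&wq.lock);
			break;
		}
		int item = wq.items[wq.head];    /* "schedule" one work item */
		wq.head = (wq.head + 1) % QDEPTH;
		wq.count--;
		pthread_mutex_unlock(&wq.lock);
		(void)item;                      /* real packet work goes here */
		handled++;
	}
	printf("worker %ld handled %ld items\n", id, handled);
	return NULL;
}

int main(void)
{
	pthread_t threads[NWORKERS];
	long i;

	for (i = 0; i < NWORKERS; i++)
		pthread_create(&threads[i], NULL, worker, (void *)i);

	for (i = 0; i < NITEMS; i++)
		enqueue((int)i);

	pthread_mutex_lock(&wq.lock);
	wq.done = 1;
	pthread_cond_broadcast(&wq.nonempty);
	pthread_mutex_unlock(&wq.lock);

	for (i = 0; i < NWORKERS; i++)
		pthread_join(threads[i], NULL);
	return 0;
}

Compile with "cc -pthread". The point is not the throughput of this toy, but that both the queue arbitration and the distribution policy live on the CPUs; DLB-class hardware takes that over, which is where the quoted core savings come from.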
On Wed, May 12, 2021 at 12:07:31PM -0700, Dan Williams wrote:
> [ add kvm@vger.kernel.org for VFIO discussion ]
>
> On Tue, Mar 16, 2021 at 2:01 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> [..]
> > > Ioctl interface
> > > Kernel driver provides ioctl interface for user applications to setup and configure dlb domains, ports, queues, scheduling types, credits,
> > > sequence numbers, and links between ports and queues. Applications also use the interface to start, stop and inquire the dlb operations.
> >
> > What applications use any of this? What userspace implementation today
> > interacts with this? Where is that code located?
> >
> > Too many TLAs here, I have even less of an understanding of what this
> > driver is supposed to be doing, and what this hardware is now than
> > before.
> >
> > And here I thought I understood hardware devices, and if I am confused,
> > I pity anyone else looking at this code...
> >
> > You all need to get some real documentation together to explain
> > everything here in terms that anyone can understand. Without that, this
> > code is going nowhere.
>
> Hi Greg,
>
> So, for the last few weeks Mike and company have patiently waded through my questions and now I think we are at a point to work through the upstream driver architecture options and tradeoffs. You were not alone in struggling to understand what this device does because it is unlike any other accelerator Linux has ever considered. It shards / load balances a data stream for processing by CPU threads. This is typically a network appliance function / protocol, but could also be any other generic thread pool like the kernel's padata. It saves the CPU cycles spent load balancing work items and marshaling them through a thread pool pipeline. For example, in DPDK applications, DLB2 frees up entire cores that would otherwise be consumed with scheduling and work distribution. A separate proof-of-concept, using DLB2 to accelerate the kernel's "padata" thread pool for a crypto workload, demonstrated ~150% higher throughput with hardware employed to manage work distribution and result ordering. Yes, you need a sufficiently high touch / high throughput protocol before the software load balancing overhead coordinating CPU threads starts to dominate the performance, but there are some specific workloads willing to switch to this regime.
>
> The primary consumer to date has been as a backend for the event handling in the userspace networking stack, DPDK. DLB2 has an existing polled-mode-userspace driver for that use case. So I said, "great, just add more features to that userspace driver and you're done". In fact there was DLB1 hardware that also had a polled-mode-userspace driver. So, the next question is "what's changed in DLB2 where a userspace driver is no longer suitable?". The new use case for DLB2 is new hardware support for a host driver to carve up device resources into smaller sets (vfio-mdevs) that can be assigned to guests (Intel calls this new hardware capability SIOV: Scalable IO Virtualization).
>
> Hardware resource management is difficult to handle in userspace especially when bare-metal hardware events need to coordinate with guest-VM device instances. This includes a mailbox interface for the guest VM to negotiate resources with the host driver. Another more practical roadblock for a "DLB2 in userspace" proposal is the fact that it implements what are in-effect software-defined-interrupts to go beyond the scalability limits of PCI MSI-x (Intel calls this Interrupt Message Store: IMS). So even if hardware resource management was awkwardly plumbed into a userspace daemon there would still need to be kernel enabling for device-specific extensions to drivers/vfio/pci/vfio_pci_intrs.c for it to understand the IMS interrupts of DLB2 in addition to PCI MSI-x.
>
> While that still might be solvable in userspace if you squint at it, I don't think Linux end users are served by pushing all of hardware resource management to userspace. VFIO is mostly built to pass entire PCI devices to guests, or in coordination with a kernel driver to describe a subset of the hardware to a virtual-device (vfio-mdev) interface. The rub here is that to date kernel drivers using VFIO to provision mdevs have some existing responsibilities to the core kernel like a network driver or DMA offload driver. The DLB2 driver offers no such service to the kernel for its primary role of accelerating a userspace data-plane. I am assuming here that the padata proof-of-concept is interesting, but not a compelling reason to ship a driver compared to giving end users competent kernel-driven hardware-resource assignment for deploying DLB2 virtual instances into guest VMs.
>
> My "just continue in userspace" suggestion has no answer for the IMS interrupt and reliable hardware resource management support requirements. If you're with me so far we can go deeper into the details, but in answer to your previous questions most of the TLAs were from the land of "SIOV" where the VFIO community should be brought in to review. The driver is mostly a configuration plane where the fast path data-plane is entirely in userspace. That configuration plane needs to manage hardware events and resourcing on behalf of guest VMs running on a partitioned subset of the device. There are worthwhile questions about whether some of the uapi can be refactored to common modules like uacce, but I think we need to get to a first order understanding on what DLB2 is and why the kernel has a role before diving into the uapi discussion.
>
> Any clearer?

A bit, yes, thanks.

> So, in summary drivers/misc/ appears to be the first stop in the review since a host driver needs to be established to start the VFIO enabling campaign. With my community hat on, I think requiring standalone host drivers is healthier for Linux than broaching the subject of VFIO-only drivers. Even if, as in this case, the initial host driver is mostly implementing a capability that could be achieved with a userspace driver.

Ok, then how about a much "smaller" kernel driver for all of this, and a whole lot of documentation to describe what is going on and what all of the TLAs are.

thanks,

greg k-h
> -----Original Message-----
> From: Greg KH <gregkh@linuxfoundation.org>
> Sent: Friday, May 14, 2021 10:33 AM
> To: Williams, Dan J <dan.j.williams@intel.com>
>
> > Hi Greg,
> >
> > So, for the last few weeks Mike and company have patiently waded through my questions and now I think we are at a point to work through the upstream driver architecture options and tradeoffs. You were not alone in struggling to understand what this device does because it is unlike any other accelerator Linux has ever considered. It shards / load balances a data stream for processing by CPU threads. This is typically a network appliance function / protocol, but could also be any other generic thread pool like the kernel's padata. It saves the CPU cycles spent load balancing work items and marshaling them through a thread pool pipeline. For example, in DPDK applications, DLB2 frees up entire cores that would otherwise be consumed with scheduling and work distribution. A separate proof-of-concept, using DLB2 to accelerate the kernel's "padata" thread pool for a crypto workload, demonstrated ~150% higher throughput with hardware employed to manage work distribution and result ordering. Yes, you need a sufficiently high touch / high throughput protocol before the software load balancing overhead coordinating CPU threads starts to dominate the performance, but there are some specific workloads willing to switch to this regime.
> >
> > The primary consumer to date has been as a backend for the event handling in the userspace networking stack, DPDK. DLB2 has an existing polled-mode-userspace driver for that use case. So I said, "great, just add more features to that userspace driver and you're done". In fact there was DLB1 hardware that also had a polled-mode-userspace driver. So, the next question is "what's changed in DLB2 where a userspace driver is no longer suitable?". The new use case for DLB2 is new hardware support for a host driver to carve up device resources into smaller sets (vfio-mdevs) that can be assigned to guests (Intel calls this new hardware capability SIOV: Scalable IO Virtualization).
> >
> > Hardware resource management is difficult to handle in userspace especially when bare-metal hardware events need to coordinate with guest-VM device instances. This includes a mailbox interface for the guest VM to negotiate resources with the host driver. Another more practical roadblock for a "DLB2 in userspace" proposal is the fact that it implements what are in-effect software-defined-interrupts to go beyond the scalability limits of PCI MSI-x (Intel calls this Interrupt Message Store: IMS). So even if hardware resource management was awkwardly plumbed into a userspace daemon there would still need to be kernel enabling for device-specific extensions to drivers/vfio/pci/vfio_pci_intrs.c for it to understand the IMS interrupts of DLB2 in addition to PCI MSI-x.
> >
> > While that still might be solvable in userspace if you squint at it, I don't think Linux end users are served by pushing all of hardware resource management to userspace. VFIO is mostly built to pass entire PCI devices to guests, or in coordination with a kernel driver to describe a subset of the hardware to a virtual-device (vfio-mdev) interface. The rub here is that to date kernel drivers using VFIO to provision mdevs have some existing responsibilities to the core kernel like a network driver or DMA offload driver. The DLB2 driver offers no such service to the kernel for its primary role of accelerating a userspace data-plane. I am assuming here that the padata proof-of-concept is interesting, but not a compelling reason to ship a driver compared to giving end users competent kernel-driven hardware-resource assignment for deploying DLB2 virtual instances into guest VMs.
> >
> > My "just continue in userspace" suggestion has no answer for the IMS interrupt and reliable hardware resource management support requirements. If you're with me so far we can go deeper into the details, but in answer to your previous questions most of the TLAs were from the land of "SIOV" where the VFIO community should be brought in to review. The driver is mostly a configuration plane where the fast path data-plane is entirely in userspace. That configuration plane needs to manage hardware events and resourcing on behalf of guest VMs running on a partitioned subset of the device. There are worthwhile questions about whether some of the uapi can be refactored to common modules like uacce, but I think we need to get to a first order understanding on what DLB2 is and why the kernel has a role before diving into the uapi discussion.
> >
> > Any clearer?
>
> A bit, yes, thanks.
>
> > So, in summary drivers/misc/ appears to be the first stop in the review since a host driver needs to be established to start the VFIO enabling campaign. With my community hat on, I think requiring standalone host drivers is healthier for Linux than broaching the subject of VFIO-only drivers. Even if, as in this case, the initial host driver is mostly implementing a capability that could be achieved with a userspace driver.
>
> Ok, then how about a much "smaller" kernel driver for all of this, and a whole lot of documentation to describe what is going on and what all of the TLAs are.
>
> thanks,
>
> greg k-h

Hi Greg,

tl;dr: We have been looking into various options to reduce the kernel driver size and ABI surface, such as moving more responsibility to user space, reusing existing kernel modules (uacce, for example), and converting functionality from ioctl to sysfs. End result: 10 ioctls will be replaced by sysfs, and the rest of them (20 ioctls) will be replaced by configfs. Some concepts are moved to device-special files rather than ioctls that produce file descriptors.

Details:

We investigated the possibility of using uacce (https://www.kernel.org/doc/html/latest/misc-devices/uacce.html) in our kernel driver. The uacce interface fits well with accelerators that process user data with known source and destination addresses. For a DLB (Dynamic Load Balancer), however, the destination port depends on the system load and is unknown to the application. While uacce exposes "queues" to the user, the dlb driver has to handle much more complicated resource management, such as credits, ports, queues and domains. We would have to add a lot more concepts and code, which are not useful for other accelerators, to uacce to make it work for DLB. This may also lead to a bigger code size overall.

We also took another look at moving resource management functionality from kernel space to user space. Much of the kernel driver supports both the PF (Physical Function) on the host and the VFs (Virtual Functions) on VMs. Since only the PF on the host has permission to set up resources and configure the DLB HW, all the requests on VFs are forwarded to the PF via the VF-PF mailboxes, which are handled by the kernel driver. The driver also maintains various virtual-id to physical-id translations (for VFs, ports, queues, etc.), and provides the virtual-to-physical id mapping info to the DLB HW so that an application in a VM can access the resources with virtual IDs only. Because of the VF/VDEV support, we have to keep the resource management, which is more than one half of the code size, in the driver.

To simplify the user interface, we explored ways to reduce/eliminate the ioctl interface, and found that we can utilize configfs for many of the DLB functionalities. Our current plan is to replace all the ioctls in the driver with sysfs and configfs. We will use configfs for most of the setup and configuration for both the physical function and the virtual functions. This may not reduce the overall driver size greatly, but it will lessen much of the ABI maintenance burden (with the elimination of ioctls).

I hope this is in line with what you would like to see for the driver.

Thanks
Mike
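For readers unfamiliar with configfs, here is a minimal, self-contained sketch of what a configfs-based control interface can look like. The "dlb" subsystem name and the "num_credits" attribute are hypothetical illustrations, not the interface actually proposed for the driver; error handling and item release are omitted for brevity.

/*
 * Illustrative configfs sketch only: a hypothetical "dlb" subsystem with a
 * single made-up per-domain attribute.  After mounting configfs,
 * "mkdir /sys/kernel/config/dlb/domain0" calls dlb_make_domain(), and
 * writing to .../domain0/num_credits lands in the store handler.
 */
#include <linux/configfs.h>
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>

struct dlb_domain_cfg {
	struct config_group group;
	u32 num_credits;
};

static inline struct dlb_domain_cfg *to_dlb_domain(struct config_item *item)
{
	return container_of(to_config_group(item), struct dlb_domain_cfg, group);
}

static ssize_t dlb_domain_num_credits_show(struct config_item *item, char *page)
{
	return sprintf(page, "%u\n", to_dlb_domain(item)->num_credits);
}

static ssize_t dlb_domain_num_credits_store(struct config_item *item,
					    const char *page, size_t count)
{
	struct dlb_domain_cfg *dom = to_dlb_domain(item);
	int ret = kstrtou32(page, 0, &dom->num_credits);

	/* A real driver would validate against the device's credit pool here. */
	return ret ? ret : count;
}

CONFIGFS_ATTR(dlb_domain_, num_credits);

static struct configfs_attribute *dlb_domain_attrs[] = {
	&dlb_domain_attr_num_credits,
	NULL,
};

static const struct config_item_type dlb_domain_type = {
	.ct_owner = THIS_MODULE,
	.ct_attrs = dlb_domain_attrs,
};

static struct config_group *dlb_make_domain(struct config_group *group,
					    const char *name)
{
	struct dlb_domain_cfg *dom = kzalloc(sizeof(*dom), GFP_KERNEL);

	if (!dom)
		return ERR_PTR(-ENOMEM);
	config_group_init_type_name(&dom->group, name, &dlb_domain_type);
	return &dom->group;
}

static struct configfs_group_operations dlb_group_ops = {
	.make_group = dlb_make_domain,
};

static const struct config_item_type dlb_subsys_type = {
	.ct_owner     = THIS_MODULE,
	.ct_group_ops = &dlb_group_ops,
};

static struct configfs_subsystem dlb_subsys = {
	.su_group = {
		.cg_item = {
			.ci_namebuf = "dlb",
			.ci_type    = &dlb_subsys_type,
		},
	},
};

static int __init dlb_cfs_init(void)
{
	config_group_init(&dlb_subsys.su_group);
	mutex_init(&dlb_subsys.su_mutex);
	return configfs_register_subsystem(&dlb_subsys);
}
module_init(dlb_cfs_init);

static void __exit dlb_cfs_exit(void)
{
	configfs_unregister_subsystem(&dlb_subsys);
}
module_exit(dlb_cfs_exit);

MODULE_LICENSE("GPL");

With something along these lines, "mkdir domain0" plus attribute writes would stand in for a create-domain ioctl, and rmdir would tear the domain down; whether that model maps cleanly onto all 30 existing DLB ioctls is exactly the question the proposed configfs conversion has to answer.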