Message ID | 20201215090358.240365-4-saeed@kernel.org |
---|---|
State | New |
Headers | show |
Series | Add mlx5 subfunction support | expand |
On Tue, 15 Dec 2020 01:03:46 -0800 Saeed Mahameed wrote: > + * devlink_port_attrs_pci_sf_set - Set PCI SF port attributes > + * > + * @devlink_port: devlink port > + * @controller: associated controller number for the devlink port instance > + * @pf: associated PF for the devlink port instance > + * @sf: associated SF of a PF for the devlink port instance > + * @external: indicates if the port is for an external controller > + */ > +void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller, > + u16 pf, u32 sf, bool external) > +{ > + struct devlink_port_attrs *attrs = &devlink_port->attrs; > + int ret; > + > + if (WARN_ON(devlink_port->registered)) > + return; > + ret = __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_SF); > + if (ret) > + return; > + attrs->pci_sf.controller = controller; > + attrs->pci_sf.pf = pf; > + attrs->pci_sf.sf = sf; > + attrs->pci_sf.external = external; > +} > +EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set); So subfunctions don't have a VF id but they may have a controller? Can you tell us more about the use cases and deployment models you're intending to support? Let's not add attributes and info which will go unused. How are SFs supposed to be used with SmartNICs? Are you assuming single domain of control? It seems that the way the industry is moving the major use case for SmartNICs is bare metal. I always assumed nested eswitches when thinking about SmartNICs, what are you intending to do? What are your plans for enabling this feature in user space project?
> From: Jakub Kicinski <kuba@kernel.org> > Sent: Wednesday, December 16, 2020 4:58 AM > > On Tue, 15 Dec 2020 01:03:46 -0800 Saeed Mahameed wrote: > > + * devlink_port_attrs_pci_sf_set - Set PCI SF port attributes > > + * > > + * @devlink_port: devlink port > > + * @controller: associated controller number for the devlink port > instance > > + * @pf: associated PF for the devlink port instance > > + * @sf: associated SF of a PF for the devlink port instance > > + * @external: indicates if the port is for an external controller > > + */ > > +void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 > controller, > > + u16 pf, u32 sf, bool external) { > > + struct devlink_port_attrs *attrs = &devlink_port->attrs; > > + int ret; > > + > > + if (WARN_ON(devlink_port->registered)) > > + return; > > + ret = __devlink_port_attrs_set(devlink_port, > DEVLINK_PORT_FLAVOUR_PCI_SF); > > + if (ret) > > + return; > > + attrs->pci_sf.controller = controller; > > + attrs->pci_sf.pf = pf; > > + attrs->pci_sf.sf = sf; > > + attrs->pci_sf.external = external; > > +} > > +EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set); > > So subfunctions don't have a VF id but they may have a controller? > Right. SF can be on external controller. > Can you tell us more about the use cases and deployment models you're > intending to support? Let's not add attributes and info which will go unused. > External will be used the same way how it is used for PF and VF. > How are SFs supposed to be used with SmartNICs? Are you assuming single > domain of control? No. it is not assumed. SF can be deployed from smartnic to external host. A user has to pass appropriate controller number, pf number attributes during creation time. > It seems that the way the industry is moving the major > use case for SmartNICs is bare metal. > > I always assumed nested eswitches when thinking about SmartNICs, what > are you intending to do? > Mlx5 doesn't support nested eswitch. SF can be deployed on the external controller PCI function. But this interface neither limited nor enforcing nested or flat eswitch. > What are your plans for enabling this feature in user space project? Do you mean K8s plugin or iproute2? Can you please tell us what user space project? If iproute2, will send the iproute2 patchset like other patchset pointing to kernel uapi headers..
On Wed, 16 Dec 2020 03:42:51 +0000 Parav Pandit wrote: > > From: Jakub Kicinski <kuba@kernel.org> > > So subfunctions don't have a VF id but they may have a controller? > > > Right. SF can be on external controller. > > > Can you tell us more about the use cases and deployment models you're > > intending to support? Let's not add attributes and info which will go unused. > > > External will be used the same way how it is used for PF and VF. > > > How are SFs supposed to be used with SmartNICs? Are you assuming single > > domain of control? > No. it is not assumed. SF can be deployed from smartnic to external host. > A user has to pass appropriate controller number, pf number attributes during creation time. My problem with this series is that I've gotten some real life application exposure over the last year, and still I have no idea who is going to find this feature useful and why. That's the point of my questions in the previous email - what are the use cases, how are they going to operate. It's hard to review an API without knowing the use of it. iproute2 is low level plumbing. Here the patch is adding the ability to apparently create a SF on a remote controller. If you haven't thought that use case through just don't allow it until you know how it will work. > > It seems that the way the industry is moving the major > > use case for SmartNICs is bare metal. > > > > I always assumed nested eswitches when thinking about SmartNICs, what > > are you intending to do? > > > Mlx5 doesn't support nested eswitch. SF can be deployed on the external controller PCI function. > But this interface neither limited nor enforcing nested or flat eswitch. > > > What are your plans for enabling this feature in user space project? > Do you mean K8s plugin or iproute2? Can you please tell us what user space project? That's my question. For SR-IOV it'd be all the virt stacks out there. But this can't do virt. So what can it do? > If iproute2, will send the iproute2 patchset like other patchset pointing to kernel uapi headers..
On Wed, 2020-12-16 at 15:59 -0800, Jakub Kicinski wrote: > On Wed, 16 Dec 2020 03:42:51 +0000 Parav Pandit wrote: > > > From: Jakub Kicinski <kuba@kernel.org> > > > So subfunctions don't have a VF id but they may have a > > > controller? > > > > > Right. SF can be on external controller. > > > > > Can you tell us more about the use cases and deployment models > > > you're > > > intending to support? Let's not add attributes and info which > > > will go unused. > > > > > External will be used the same way how it is used for PF and VF. > > > > > How are SFs supposed to be used with SmartNICs? Are you assuming > > > single > > > domain of control? > > No. it is not assumed. SF can be deployed from smartnic to external > > host. > > A user has to pass appropriate controller number, pf number > > attributes during creation time. > > My problem with this series is that I've gotten some real life > application exposure over the last year, and still I have no idea > who is going to find this feature useful and why. > > That's the point of my questions in the previous email - what > are the use cases, how are they going to operate. > The main focus of this feature is scale-ability we want to run thousands of Containers/VMs, this is useful for both smartnic and baremetal hypervisor worlds, where security and control is exclusive to the eswitch manager may it be the smarnic embedded CPU or the x86 Hypervisor. deployment models is identical to SRIOV, the only difference is the instantiation model of SF, which is the main discussion point of this series (i hope), which to my taste is very modest and minimal. after SF is instantiated from that point nothing is new, the SF is exposing standard linux interfaces netdev/rdma identical to what VF does, most likely you will assign them a namespace and pass them through to a container or assign them (not direct assignment) to a VM via the virt stack, or create a vdpa instance and pass it to a virtio interface. There are endless usecases for the netdev stack, for customers who want high scale virtualized/containerized environments, with thousands of network functions that can deliver high speed and full offload accelerators, Native XDP, Crypto, encap/decap, and HW filtering and processing pipeline capabilities. I have a long list of customers with various and different applications and i am not even talking about the rdma and vdpa customers ! those customers just can't wait to leave sriov behind and scale up ! this feature has a lot of value to the netdev users only because of the minimal foot print to the netdev stack (to be honest there is no change in netdev, only a thin API layer in devlink) and the immediate and effortless benefits to deploy multiple (accelerated) netdevs at scale. > It's hard to review an API without knowing the use of it. iproute2 > is low level plumbing. > I don't know how to put this, let me try: A) SRIOV model echo 128 > /sys/class/net/eth0/device/sriov_numvfs ubind vf ip set vf attribute x configure representor .. deploy vf/netdev/rdma interface into the container B) SF model you do (every thing under the devlink umbrella/switchdev): for i in {1..1024} ; do devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum $i devlink port sf $i set attribute x # from here on, it is identical to a VF configure representor deply sf/netdev/rdma interfaces into a container B is more scale-able and has more visibility and controllability to the user, after you create the SFs deployment and usecases are identical to SRIOV VF usecases. See the improvement ? :) > Here the patch is adding the ability to apparently create a SF on > a remote controller. If you haven't thought that use case through > just don't allow it until you know how it will work. > We have thought the use case through it is not any different from the local controller use case. the code is uniform, we need to work hard to block a remote controller :) .. > > > It seems that the way the industry is moving the major > > > use case for SmartNICs is bare metal. > > > > > > I always assumed nested eswitches when thinking about SmartNICs, > > > what > > > are you intending to do? > > > > > Mlx5 doesn't support nested eswitch. SF can be deployed on the > > external controller PCI function. > > But this interface neither limited nor enforcing nested or flat > > eswitch. > > > > > What are your plans for enabling this feature in user space > > > project? > > Do you mean K8s plugin or iproute2? Can you please tell us what > > user space project? > > That's my question. For SR-IOV it'd be all the virt stacks out there. > But this can't do virt. So what can it do? > you are thinking VF direct assignment. but don't forget virt handles netdev assignment to a vm perfectly fine and SF has a netdev. And don't get me started on the weird virt handling of SRIOV VF, the whole thing is a big mess :) it shouldn't be a de facto standard that we need to follow..
On Wed, 16 Dec 2020 20:44:21 -0800 Saeed Mahameed wrote: > On Wed, 2020-12-16 at 15:59 -0800, Jakub Kicinski wrote: > > On Wed, 16 Dec 2020 03:42:51 +0000 Parav Pandit wrote: > > > > From: Jakub Kicinski <kuba@kernel.org> > > > > So subfunctions don't have a VF id but they may have a > > > > controller? > > > > > > > Right. SF can be on external controller. > > > > > > > Can you tell us more about the use cases and deployment models > > > > you're > > > > intending to support? Let's not add attributes and info which > > > > will go unused. > > > > > > > External will be used the same way how it is used for PF and VF. > > > > > > > How are SFs supposed to be used with SmartNICs? Are you assuming > > > > single > > > > domain of control? > > > No. it is not assumed. SF can be deployed from smartnic to external > > > host. > > > A user has to pass appropriate controller number, pf number > > > attributes during creation time. > > > > My problem with this series is that I've gotten some real life > > application exposure over the last year, and still I have no idea > > who is going to find this feature useful and why. > > > > That's the point of my questions in the previous email - what > > are the use cases, how are they going to operate. > > > > The main focus of this feature is scale-ability we want to run > thousands of Containers/VMs, this is useful for both smartnic and > baremetal hypervisor worlds, where security and control is exclusive to > the eswitch manager may it be the smarnic embedded CPU or the x86 > Hypervisor. > > deployment models is identical to SRIOV, the only difference is the > instantiation model of SF, which is the main discussion point of this > series (i hope), which to my taste is very modest and minimal. > after SF is instantiated from that point nothing is new, the SF is > exposing standard linux interfaces netdev/rdma identical to what VF > does, most likely you will assign them a namespace and pass them > through to a container or assign them (not direct assignment) to a VM > via the virt stack, or create a vdpa instance and pass it to a virtio > interface. > > There are endless usecases for the netdev stack, for customers who want "endless" :) > high scale virtualized/containerized environments, with thousands of > network functions that can deliver high speed and full offload > accelerators, Native XDP, Crypto, encap/decap, and HW filtering and > processing pipeline capabilities. > > I have a long list of customers with various and different applications > and i am not even talking about the rdma and vdpa customers ! those > customers just can't wait to leave sriov behind and scale up ! > > this feature has a lot of value to the netdev users only because of the > minimal foot print to the netdev stack (to be honest there is no change > in netdev, only a thin API layer in devlink) and the immediate and > effortless benefits to deploy multiple (accelerated) netdevs at scale. The acceleration can hopefully be plumbed through the software devices. I think your HW is capable of doing large queue sets so I'm curious how this actually performs. We're probably talking 1000+ queues here - the CPU will have hard time serving so many queues. In my experiments basically the more queues the more cache trashing, the more interrupts, etc. and the lower the performance. > > It's hard to review an API without knowing the use of it. iproute2 > > is low level plumbing. > > I don't know how to put this, let me try: > A) SRIOV model > echo 128 > /sys/class/net/eth0/device/sriov_numvfs > ubind vf > > ip set vf attribute x > configure representor .. > deploy vf/netdev/rdma interface into the container No, no, my point is that for SR-IOV it's OpenStack, libvirt etc. which do this. I understand the manual steps. Often problems pop up when real systems try to string the HW objects together, allocated them, learn their capabilities, etc. > B) SF model > you do (every thing under the devlink umbrella/switchdev): > for i in {1..1024} ; do > devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum $i > devlink port sf $i set attribute x > > # from here on, it is identical to a VF > configure representor > deply sf/netdev/rdma interfaces into a container > > B is more scale-able and has more visibility and controllability to > the user, after you create the SFs deployment and usecases are > identical to SRIOV VF usecases. > > See the improvement ? :) > > > Here the patch is adding the ability to apparently create a SF on > > a remote controller. If you haven't thought that use case through > > just don't allow it until you know how it will work. > > We have thought the use case through it is not any different from the > local controller use case. the code is uniform, we need to work hard to > block a remote controller :) .. So the SF is always created from the eswitch controller side? How does the host side look? I really think that for ease of merging this we should leave the remote controller out at the beginning - only allow local creation. > > > > It seems that the way the industry is moving the major > > > > use case for SmartNICs is bare metal. > > > > > > > > I always assumed nested eswitches when thinking about SmartNICs, > > > > what > > > > are you intending to do? > > > > > > > Mlx5 doesn't support nested eswitch. SF can be deployed on the > > > external controller PCI function. > > > But this interface neither limited nor enforcing nested or flat > > > eswitch. > > > > > > > What are your plans for enabling this feature in user space > > > > project? > > > Do you mean K8s plugin or iproute2? Can you please tell us what > > > user space project? > > > > That's my question. For SR-IOV it'd be all the virt stacks out there. > > But this can't do virt. So what can it do? > > you are thinking VF direct assignment. but don't forget > virt handles netdev assignment to a vm perfectly fine and SF has a > netdev. > > And don't get me started on the weird virt handling of SRIOV VF, the > whole thing is a big mess :) it shouldn't be a de facto standard that > we need to follow..
> From: Jakub Kicinski <kuba@kernel.org> > Sent: Saturday, December 19, 2020 1:18 AM > So the SF is always created from the eswitch controller side? > How does the host side look? > Host side creates the auxiliary device for the SF. $ ls -l /sys/bus/auxiliary/devices/ mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4 > I really think that for ease of merging this we should leave the remote > controller out at the beginning - only allow local creation. > Alright. I will drop remote controller attribute of SF in this patchset.
diff --git a/include/net/devlink.h b/include/net/devlink.h index f466819cc477..5bd43f0a79a8 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -93,6 +93,20 @@ struct devlink_port_pci_vf_attrs { u8 external:1; }; +/** + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF attributes + * @controller: Associated controller number + * @pf: Associated PCI PF number for this port. + * @sf: Associated PCI SF for of the PCI PF for this port. + * @external: when set, indicates if a port is for an external controller + */ +struct devlink_port_pci_sf_attrs { + u32 controller; + u16 pf; + u32 sf; + u8 external:1; +}; + /** * struct devlink_port_attrs - devlink port object * @flavour: flavour of the port @@ -114,6 +128,7 @@ struct devlink_port_attrs { struct devlink_port_phys_attrs phys; struct devlink_port_pci_pf_attrs pci_pf; struct devlink_port_pci_vf_attrs pci_vf; + struct devlink_port_pci_sf_attrs pci_sf; }; }; @@ -1404,6 +1419,8 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro u16 pf, bool external); void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller, u16 pf, u16 vf, bool external); +void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller, + u16 pf, u32 sf, bool external); int devlink_sb_register(struct devlink *devlink, unsigned int sb_index, u32 size, u16 ingress_pools_count, u16 egress_pools_count, u16 ingress_tc_count, diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 5203f54a2be1..6fe00f10eb3f 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -200,6 +200,10 @@ enum devlink_port_flavour { DEVLINK_PORT_FLAVOUR_UNUSED, /* Port which exists in the switch, but * is not used in any way. */ + DEVLINK_PORT_FLAVOUR_PCI_SF, /* Represents eswitch port + * for the PCI SF. It is an internal + * port that faces the PCI SF. + */ }; enum devlink_param_cmode { @@ -529,6 +533,7 @@ enum devlink_attr { DEVLINK_ATTR_RELOAD_ACTION_INFO, /* nested */ DEVLINK_ATTR_RELOAD_ACTION_STATS, /* nested */ + DEVLINK_ATTR_PORT_PCI_SF_NUMBER, /* u32 */ /* add new attributes above here, update the policy in devlink.c */ __DEVLINK_ATTR_MAX, diff --git a/net/core/devlink.c b/net/core/devlink.c index 13e0de80c4f9..08eac247f200 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -690,6 +690,15 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg, if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external)) return -EMSGSIZE; break; + case DEVLINK_PORT_FLAVOUR_PCI_SF: + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, + attrs->pci_sf.controller) || + nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_sf.pf) || + nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_SF_NUMBER, attrs->pci_sf.sf)) + return -EMSGSIZE; + if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_sf.external)) + return -EMSGSIZE; + break; case DEVLINK_PORT_FLAVOUR_PHYSICAL: case DEVLINK_PORT_FLAVOUR_CPU: case DEVLINK_PORT_FLAVOUR_DSA: @@ -8373,6 +8382,33 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro } EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set); +/** + * devlink_port_attrs_pci_sf_set - Set PCI SF port attributes + * + * @devlink_port: devlink port + * @controller: associated controller number for the devlink port instance + * @pf: associated PF for the devlink port instance + * @sf: associated SF of a PF for the devlink port instance + * @external: indicates if the port is for an external controller + */ +void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller, + u16 pf, u32 sf, bool external) +{ + struct devlink_port_attrs *attrs = &devlink_port->attrs; + int ret; + + if (WARN_ON(devlink_port->registered)) + return; + ret = __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_SF); + if (ret) + return; + attrs->pci_sf.controller = controller; + attrs->pci_sf.pf = pf; + attrs->pci_sf.sf = sf; + attrs->pci_sf.external = external; +} +EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set); + static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port, char *name, size_t len) { @@ -8421,6 +8457,16 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port, n = snprintf(name, len, "pf%uvf%u", attrs->pci_vf.pf, attrs->pci_vf.vf); break; + case DEVLINK_PORT_FLAVOUR_PCI_SF: + if (attrs->pci_sf.external) { + n = snprintf(name, len, "c%u", attrs->pci_sf.controller); + if (n >= len) + return -EINVAL; + len -= n; + name += n; + } + n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs->pci_sf.sf); + break; } if (n >= len)