Message ID | 20201014161719.30289-1-ceggers@arri.de |
---|---|
State | New |
Headers | show |
Series | [net] net: dsa: ksz: fix padding size of skb | expand |
On Wed, Oct 14, 2020 at 06:17:19PM +0200, Christian Eggers wrote: > __skb_put_padto() is called in order to ensure a minimal size of the > sk_buff. The required minimal size is ETH_ZLEN + the size required for > the tail tag. > > The current argument misses the size for the tail tag. The expression > "skb->len + padlen" can be simplified to ETH_ZLEN. > > Too small sk_buffs typically result from cloning in > dsa_skb_tx_timestamp(). The cloned sk_buff may not meet the minimum size > requirements. > > Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing") > Signed-off-by: Christian Eggers <ceggers@arri.de> > --- Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
On Wed, Oct 14, 2020 at 07:47:50PM +0300, Vladimir Oltean wrote: > On Wed, Oct 14, 2020 at 06:17:19PM +0200, Christian Eggers wrote: > > __skb_put_padto() is called in order to ensure a minimal size of the > > sk_buff. The required minimal size is ETH_ZLEN + the size required for > > the tail tag. > > > > The current argument misses the size for the tail tag. The expression > > "skb->len + padlen" can be simplified to ETH_ZLEN. > > > > Too small sk_buffs typically result from cloning in > > dsa_skb_tx_timestamp(). The cloned sk_buff may not meet the minimum size > > requirements. > > > > Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing") > > Signed-off-by: Christian Eggers <ceggers@arri.de> > > --- > > Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Actually no, I take that back. This statement: > The expression "skb->len + padlen" can be simplified to ETH_ZLEN. is false. skb->len + padlen == ETH_ZLEN only if skb->len is less than ETH_ZLEN. Otherwise, skb->len + padlen == skb->len. Otherwise said, the frame must be padded to max(skb->len, ETH_ZLEN) + tail tag length. So please keep the "skb->len + padlen + len". Thanks, -Vladimir
Hi Vladimir, On Wednesday, 14 October 2020, 18:54:10 CEST, Vladimir Oltean wrote: > On Wed, Oct 14, 2020 at 07:47:50PM +0300, Vladimir Oltean wrote: > > On Wed, Oct 14, 2020 at 06:17:19PM +0200, Christian Eggers wrote: > > > __skb_put_padto() is called in order to ensure a minimal size of the > > > sk_buff. The required minimal size is ETH_ZLEN + the size required for > > > the tail tag. > > > > > > The current argument misses the size for the tail tag. The expression > > > "skb->len + padlen" can be simplified to ETH_ZLEN. > > > > > > Too small sk_buffs typically result from cloning in > > > dsa_skb_tx_timestamp(). The cloned sk_buff may not meet the minimum size > > > requirements. > > > > > > Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing") > > > Signed-off-by: Christian Eggers <ceggers@arri.de> > > > --- > > > > Reviewed-by: Vladimir Oltean <olteanv@gmail.com> > > Actually no, I take that back. > > This statement: > > The expression "skb->len + padlen" can be simplified to ETH_ZLEN. > > is false. > skb->len + padlen == ETH_ZLEN only if skb->len is less than ETH_ZLEN. ok, my comment is false. > Otherwise, skb->len + padlen == skb->len. > > Otherwise said, the frame must be padded to > max(skb->len, ETH_ZLEN) + tail tag length. At first I thought the same when working on this. But IMHO the padding must only ensure the minimum required size, there is no need to pad to the "real" size of the skb. The check for the tailroom above ensures that enough memory for the "real" size is available. > So please keep the "skb->len + padlen + len". > > Thanks, > -Vladimir Best regards Christian
On Wed, Oct 14, 2020 at 07:02:13PM +0200, Christian Eggers wrote: > > Otherwise said, the frame must be padded to > > max(skb->len, ETH_ZLEN) + tail tag length. > At first I thought the same when working on this. But IMHO the padding must > only ensure the minimum required size, there is no need to pad to the "real" > size of the skb. The check for the tailroom above ensures that enough memory > for the "real" size is available. Yes, that's right, that's the current logic, but what's the point of your patch, then, if the call to __skb_put_padto is only supposed to ensure ETH_ZLEN length? In fact, __skb_put_padto fundamentally does: - an extension of skb->len to the requested argument, via __skb_put - a zero-filling of the extra area So if you include the length of the tag in the call to __skb_put_padto, then what's the other skb_put() from ksz8795_xmit, ksz9477_xmit, ksz9893_xmit going to do? Aren't you increasing the frame length twice by the length of one tag when you are doing this? What problem are you actually trying to solve? Can you show a skb_dump(KERN_ERR, skb, true) before and after your change?
Hi Vladimir, sorry for the delay, getting answers to all you questions seems to be challenging for me. Unfortunately it's about 1 year ago when I was originally working on this particular problem and obviously I didn't understand the full problem... On Wednesday, 14 October 2020, 19:31:03 CEST, Vladimir Oltean wrote: > On Wed, Oct 14, 2020 at 07:02:13PM +0200, Christian Eggers wrote: > > > Otherwise said, the frame must be padded to > > > max(skb->len, ETH_ZLEN) + tail tag length. > > > > At first I thought the same when working on this. But IMHO the padding > > must > > only ensure the minimum required size, there is no need to pad to the > > "real" size of the skb. The check for the tailroom above ensures that > > enough memory for the "real" size is available. > > Yes, that's right, that's the current logic, but what's the point of > your patch, then, if the call to __skb_put_padto is only supposed to > ensure ETH_ZLEN length? After checking __skb_put_padto, I realized that I didn't knew what it actually does. The problem in my case is, that the condition skb_tailroom(skb) >= padlen + len may not be met anymore after __skb_put_padto(..., skb->len + padlen, ...) returns. If skb has previously been cloned, __skb_put_padto() will allocate a copy which may have a size of equal / only slightly more than skb->len + padlen, which misses the full space for the tail tag. Further calls to skb_put() may not find enough tailroom for placing the tail tag. > In fact, __skb_put_padto fundamentally does: > - an extension of skb->len to the requested argument, via __skb_put > - a zero-filling of the extra area > So if you include the length of the tag in the call to __skb_put_padto, > then what's the other skb_put() from ksz8795_xmit, ksz9477_xmit, > ksz9893_xmit going to do? Aren't you increasing the frame length twice > by the length of one tag when you are doing this? I initially thought that __skb_put_padto() would perform some sort of allocation which can later be used by skb_put(). You are right that my change would increase the frame twice. The only reason why this worked for me was because the newly allocated skb had enough tailroom due to alignment. > What problem are you actually trying to solve? After (hopefully) understanding the important bits, I would like to solve the problem that after calling __skb_put_padto() there may be no tailroom for the tail tag. The conditions where this can happen are quite special. You need a skb->len < ETH_ZLEN and the skb must be marked as cloned. One condition where this happens in practice is when the skb has been selected for TX time stamping in dsa_skb_tx_timestamp() [cloned] and L2 is used as transport for PTP [size < ETH_ZLEN]. But maybe cloned sk_buffs can also happen for other reasons. I now suggest the following: - if (skb_tailroom(skb) >= padlen + len) { + if (skb_tailroom(skb) >= padlen + len && !skb_cloned(skb)) { This would avoid allocation of a new skb in __skb_put_padto() which may be finally too small. > Can you show a skb_dump(KERN_ERR, skb, true) before and after your change? Best regards Christian
On Thu Oct 15 2020, Christian Eggers wrote: > On Wednesday, 14 October 2020, 19:31:03 CEST, Vladimir Oltean wrote: >> What problem are you actually trying to solve? > After (hopefully) understanding the important bits, I would like to solve the > problem that after calling __skb_put_padto() there may be no tailroom for the > tail tag. > > The conditions where this can happen are quite special. You need a skb->len < > ETH_ZLEN and the skb must be marked as cloned. One condition where this > happens in practice is when the skb has been selected for TX time stamping in > dsa_skb_tx_timestamp() [cloned] and L2 is used as transport for PTP [size < > ETH_ZLEN]. But maybe cloned sk_buffs can also happen for other reasons. Hmm. I've never observed any problems using DSA with L2 PTP time stamping with this tail tag code. What's the impact exactly? Memory corruption? Thanks, Kurt
On Friday, 16 October 2020, 09:45:42 CEST, Kurt Kanzenbach wrote: > On Thu Oct 15 2020, Christian Eggers wrote: > > On Wednesday, 14 October 2020, 19:31:03 CEST, Vladimir Oltean wrote: > >> What problem are you actually trying to solve? > > > > After (hopefully) understanding the important bits, I would like to solve > > the problem that after calling __skb_put_padto() there may be no tailroom > > for the tail tag. > > > > The conditions where this can happen are quite special. You need a > > skb->len < ETH_ZLEN and the skb must be marked as cloned. One condition > > where this happens in practice is when the skb has been selected for TX > > time stamping in dsa_skb_tx_timestamp() [cloned] and L2 is used as > > transport for PTP [size < ETH_ZLEN]. But maybe cloned sk_buffs can also > > happen for other reasons. > Hmm. I've never observed any problems using DSA with L2 PTP time > stamping with this tail tag code. What's the impact exactly? Memory > corruption? It looks like skb_put_padto() is only used by the tag_ksz driver. So it's unlikely that other drivers are affected by the same problem. If I remember correctly, I got a skb_panic in skb_put() when adding the tail tag. But with the current kernel I didn't manage to create packets where the skb allocated by __skb_put_padto has not enough spare room for the tag tag. Either I am trying with wrong packets, or something else has been changed in between. I just sent a new patch which should solve the problem correctly here: https://patchwork.ozlabs.org/project/netdev/list/?series=208269 Best regards Christian
On Fri, Oct 16, 2020 at 11:00:20AM +0200, Christian Eggers wrote: > On Friday, 16 October 2020, 09:45:42 CEST, Kurt Kanzenbach wrote: > > On Thu Oct 15 2020, Christian Eggers wrote: > > > On Wednesday, 14 October 2020, 19:31:03 CEST, Vladimir Oltean wrote: > > >> What problem are you actually trying to solve? > > > > > > After (hopefully) understanding the important bits, I would like to solve > > > the problem that after calling __skb_put_padto() there may be no tailroom > > > for the tail tag. > > > > > > The conditions where this can happen are quite special. You need a > > > skb->len < ETH_ZLEN and the skb must be marked as cloned. One condition > > > where this happens in practice is when the skb has been selected for TX > > > time stamping in dsa_skb_tx_timestamp() [cloned] and L2 is used as > > > transport for PTP [size < ETH_ZLEN]. But maybe cloned sk_buffs can also > > > happen for other reasons. > > Hmm. I've never observed any problems using DSA with L2 PTP time > > stamping with this tail tag code. What's the impact exactly? Memory > > corruption? > It looks like skb_put_padto() is only used by the tag_ksz driver. So it's > unlikely that other drivers are affected by the same problem. > > If I remember correctly, I got a skb_panic in skb_put() when adding the tail > tag. But with the current kernel I didn't manage to create packets where the > skb allocated by __skb_put_padto has not enough spare room for the tag tag. > Either I am trying with wrong packets, or something else has been changed in > between. > > I just sent a new patch which should solve the problem correctly here: > https://patchwork.ozlabs.org/project/netdev/list/?series=208269 Kurt is asking, and rightfully so, because his tag_hellcreek.c driver (for a 1588 switch with tail tags) is copied from tag_ksz.c. I have also attempted to replicate your issue at my end and failed to do so. In principle, it is indeed true that a cloned skb should not be modified without calling skb_unshare() first. The DSA core (dsa_slave_xmit) should do that. But that doesn't explain the symptoms you're seeing, which is why I asked for skb_dump.
On Friday, 16 October 2020, 11:05:27 CEST, Vladimir Oltean wrote: > On Fri, Oct 16, 2020 at 11:00:20AM +0200, Christian Eggers wrote: > > On Friday, 16 October 2020, 09:45:42 CEST, Kurt Kanzenbach wrote: > > > Hmm. I've never observed any problems using DSA with L2 PTP time > > > stamping with this tail tag code. What's the impact exactly? Memory > > > corruption? > > > > Kurt is asking, and rightfully so, because his tag_hellcreek.c driver > (for a 1588 switch with tail tags) is copied from tag_ksz.c. > I have also attempted to replicate your issue at my end and failed to do > so. After spending some additional hours, I managed to reproduce the original problem on my current environment (BUG output is below). Some important properties of my system: Machine: - ARMv7 (i.MX6ULL), SMP_CACHE_BYTES is 64 - DSA device: Microchip KSZ9563 (I am currently working on time stamping support) In order to trigger the problem, we need a skb with a length between 56 and 59. It must be a PTP message with is marked for time stamping in dsa_switch_ops::port_txtstamp, so it will be cloned in dsa_skb_tx_timestamp(). In my setup, only PTP Delay_Req fulfills both requirements (if transferred via L2, not UDPv4/v6). Last, CONFIG_SLOB must be selected. In ksz_common_xmit() we have the following conditions: - skb->len is 58 - skb->cloned is 1 - skb_tailroom() is 68 --> condition "skb_tailroom(skb) >= padlen + len" is easily fulfilled, 2 bytes padding is required __skb_put_padto() calls then __skb_pad() which in turn calls pskb_expand_head(.., nhead=0, ntail=-66, ...) - osize is 128 - size = 128 + 0 + (-66) = 62 (two bytes are reserved before the mac header) - size = SKB_DATA_ALIGN(size) --> 64 on ARMv7 - data = kmalloc_reserve(64 + 256) - ksize(data) returns 64 + 256 (only with CONFIG_SLOB) - size = SKB_WITH_OVERHEAD(...) is 64 - skb_tailroom() is 4 (before padding 2 bytes) --> the copied skb has enough size for padding, but not for the tail tag (1+4 bytes if PTP is enabled). - back in __skb_pad(), 2 bytes will be initialized with memset() - back in __skb_put_padto(), __skb_put increments the tail and len by 2 ---> skb_tailroom() is 2 now - back in ksz9893_xmit(), skb_put(..., 4) (in my code) calls skb_over_panic() because the tail has grown beyond then end. QED. > In principle, it is indeed true that a cloned skb should not be > modified without calling skb_unshare() first. The DSA core > (dsa_slave_xmit) should do that. But that doesn't explain the symptoms > you're seeing, which is why I asked for skb_dump. As dsa_slave_xmit() is (in my case) the cause for the cloned skb (in dsa_skb_tx_timestamp()), solving the problem there makes sense. The drawback compared with my solution (checking for skb->cloned myself) may be, that the skb may have to be copied another time in ksz_common_xmit() because there is still not enough tailroom for the tag. Probably this could be solved with using dsa_device_ops::overhead and "manually" unsharing in dsa_slave_xmit(). @Vladimir: Please let me know, which solution you prefer. 1. Considering skb->cloned in ksz_common_xmit() 2. Using skb_unshare() before calling dsa_device_ops::xmit() 3. "Manually" unsharing in dsa_slave_xmit(), reserving enough tailroom for the tail tag (and ETH_ZLEN?). Would moving the "else" clause from ksz_common_xmit() to dsa_slave_xmit() do the job correctly? Best regards Christian [ 81.416971] skbuff: skb_over_panic: text:bf9fd3e3 len:64 put:4 head:c739dd40 data:c739dd42 tail:0xc739dd82 end:0xc739dd80 dev:lan0 [ 81.423081] Internal error: Oops - undefined instruction: 0 [#1] THUMB2 [ 81.426428] Modules linked in: bridge stp llc sd_mod ebt_ip ebtable_broute ebtables x_tables usb_storage scsi_mod ksz9477_i2c ksz9477 st_magn_spi tag_ksz st_sensors_spi regmap_spi ksz_common dsa_core st_magn_i2c st_sensors_i2c st_magn phylink st_sensors at24 as73211 industrialio_triggered_buffer rtc_rv3028 kfifo_buf i2c_dev regmap_i2c usb49xx i2c_imx i2c_core imx_thermal anatop_regulator imx2_wdt imx_fan spidev leds_gpio leds_pwm led_class iio_trig_sysfs imx6sx_adc industrialio micrel fec ptp pps_core at 25 spi_imx spi_bitbang imx_napi dev nfsv3 nfs lockd grace sunrpc usb_f_ecm u_ether libcomposite configfs ci_hdrc_imx ci_hdrc ulpi ehci_hcd usbcore nls_base tcpm roles typec udc_core usb_common usbmisc_imx phy_mxs_usb fixed imx_sdma virt_dma [ 81.460419] CPU: 0 PID: 286 Comm: ptp4l Not tainted 5.9.0-rc8+ #31 [ 81.463561] Hardware name: Freescale i.MX6 Ultralite (Device Tree) [ 81.466725] PC is at skb_panic+0x30/0x3c [ 81.468708] LR is at skb_panic+0x31/0x3c [ 81.470703] pc : [<c028f054>] lr : [<c028f055>] psr: 00000033 [ 81.474018] sp : c56ddd08 ip : 00000000 fp : c074b288 [ 81.476683] r10: c074b29c r9 : bfa09caf r8 : c4889e40 [ 81.479353] r7 : c739dd80 r6 : c739dd82 r5 : c04eaeae r4 : 00000004 [ 81.482694] r3 : 00000000 r2 : c071bb1c r1 : c070df78 r0 : 00000076 [ 81.486043] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user [ 81.489800] Control: 50c53c7d Table: 85c08059 DAC: 00000055 [ 81.492747] Process ptp4l (pid: 286, stack limit = 0xf8a7d922) [ 81.495945] Stack: (0xc56ddd08 to 0xc56de000) [ 81.498180] dd00: 00000004 c739dd40 c739dd42 c739dd82 c739dd80 c3c44040 [ 81.502357] dd20: c739dd82 c028a18b 00000001 c739dd50 00000000 00000042 c725f580 bf9fd3e3 [ 81.506533] dd40: c7430c7c bf9fd0cf 00000044 c725f580 00000002 bf9fd187 bf9fd459 c725f580 [ 81.510709] dd60: c725f580 c432e6c0 00000042 bf9fd46d c3c44040 c725f580 c725f8c0 bf9e6dbd [ 81.514903] dd80: bf9e6d41 c46c39c0 c0749140 00000000 00000000 00000000 c46c39c0 c0292b9f [ 81.519318] dda0: 00000000 c725f580 c0749140 c029585d c0749140 c3c44040 c3c44040 c3c44040 [ 81.523493] ddc0: 0000003a c56dddf8 c725f580 c725f580 c0749140 c3c44040 c46c39c0 00000000 [ 81.527662] dde0: c7430cc0 c74302c0 c56ddedc c0295e2d 000005ea c74302c0 fffffff4 c02fb9ef [ 81.531831] de00: 00000000 0000f788 00000000 00000000 00000000 c74304c0 c725f580 c3c44040 [ 81.535998] de20: 0000003a 0000003a 000005ea c02fe013 c56dde70 00000000 00000000 00000000 [ 81.540413] de40: 00000300 00000010 00000000 00000000 00000122 c5444be8 c5447940 0000001b [ 81.544587] de60: 00000000 c56ddcfc c017ff09 00000100 00000000 00000000 00000000 00000000 [ 81.548758] de80: 00000000 00000000 00000000 00000045 00000001 00000000 c74302c0 00000000 [ 81.553018] dea0: 00000000 00000000 c56dc000 00000122 018e1a7a c0285559 00200000 c0286525 [ 81.557177] dec0: c56ddee4 532380f0 00000013 fffffff7 00000000 018e1a7a 0000003a 00000000 [ 81.561358] dee0: 00000000 00000005 00000000 00000000 c56ddedc 00000000 00000000 00000000 [ 81.565562] df00: 00000000 00000000 00000000 c56ddf38 25c17d03 bec4fb28 00000000 00000161 [ 81.569746] df20: c0100224 bec4fb30 00000008 00000000 c56ddf78 c012f9fd 00000000 00000000 [ 81.574156] df40: 00000000 c01300b9 00000000 00000000 bec4fb28 c019ad8f 00000000 00000000 [ 81.578311] df60: 00000000 018df2f0 00000001 00000000 26a83bd2 000000a8 00000000 00000000 [ 81.582477] df80: 00000000 c56ddf68 00000000 00000000 00000000 00000000 0000003a 00000122 [ 81.586661] dfa0: c0100224 c0100041 00000000 00000000 00000010 018e1a7a 0000003a 00000000 [ 81.590846] dfc0: 00000000 00000000 0000003a 00000122 b6f92080 018e2098 018e1a88 018e1a7a [ 81.595144] dfe0: bec4f4b0 bec4f4a0 b6f43973 b6f4fc5c 60000030 00000010 00000000 00000000 [ 81.599338] [<c028f054>] (skb_panic) from [<c028a18b>] (skb_put+0x2b/0x34) [ 81.602861] [<c028a18b>] (skb_put) from [<bf9fd3e3>] (ksz9477_xmit_timestamp+0x19/0x8e [tag_ksz]) [ 81.607404] [<bf9fd3e3>] (ksz9477_xmit_timestamp [tag_ksz]) from [<bf9fd46d>] (ksz9893_xmit+0x15/0x42 [tag_ksz]) [ 81.612624] [<bf9fd46d>] (ksz9893_xmit [tag_ksz]) from [<bf9e6dbd>] (dsa_slave_xmit+0x7d/0x96 [dsa_core]) [ 81.617662] [<bf9e6dbd>] (dsa_slave_xmit [dsa_core]) from [<c0292b9f>] (netdev_start_xmit+0x13/0x2c) [ 81.622319] [<c0292b9f>] (netdev_start_xmit) from [<c029585d>] (dev_hard_start_xmit+0x6d/0xc8) [ 81.626703] [<c029585d>] (dev_hard_start_xmit) from [<c0295e2d>] (__dev_queue_xmit+0x25d/0x2e0) [ 81.631126] [<c0295e2d>] (__dev_queue_xmit) from [<c02fe013>] (packet_sendmsg+0x883/0x92c) [ 81.635337] [<c02fe013>] (packet_sendmsg) from [<c0285559>] (sock_sendmsg_nosec+0xb/0x16) [ 81.639767] [<c0285559>] (sock_sendmsg_nosec) from [<c0286525>] (__sys_sendto+0x69/0x7c) [ 81.643916] [<c0286525>] (__sys_sendto) from [<c0100041>] (ret_fast_syscall+0x1/0x5a) [ 81.647900] Exception stack(0xc56ddfa8 to 0xc56ddff0) [ 81.650702] dfa0: 00000000 00000000 00000010 018e1a7a 0000003a 00000000 [ 81.654865] dfc0: 00000000 00000000 0000003a 00000122 b6f92080 018e2098 018e1a88 018e1a7a [ 81.659033] dfe0: bec4f4b0 bec4f4a0 b6f43973 b6f4fc5c [ 81.661623] Code: 6d03 4803 f699 ffbe (de02) bf00 [ 81.664071] ---[ end trace 8f263011ec91dee5 ]--- [ 81.666434] Kernel panic - not syncing: Fatal exception in interrupt [ 81.669693] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
On Fri, Oct 16, 2020 at 02:44:46PM +0200, Christian Eggers wrote: > Machine: > - ARMv7 (i.MX6ULL), SMP_CACHE_BYTES is 64 > - DSA device: Microchip KSZ9563 (I am currently working on time stamping support) I have a board very similar to this on which I am going to test. > Last, CONFIG_SLOB must be selected. Interesting, do you know why? > 3. "Manually" unsharing in dsa_slave_xmit(), reserving enough tailroom > for the tail tag (and ETH_ZLEN?). Would moving the "else" clause from > ksz_common_xmit() to dsa_slave_xmit() do the job correctly? I was thinking about something like that, indeed. DSA knows everything about the tagger: its overhead, whether it's a tail tag or not. The xmit callback of the tagger should only be there to populate the tag where it needs to be. But reallocation, padding, etc etc, should all be dealt with by the common DSA xmit procedure. We want the taggers to be simple and reuse as much logic as possible, not to be bloated.
On Fri, 16 Oct 2020 18:56:45 +0300 Vladimir Oltean wrote: > > 3. "Manually" unsharing in dsa_slave_xmit(), reserving enough tailroom > > for the tail tag (and ETH_ZLEN?). Would moving the "else" clause from > > ksz_common_xmit() to dsa_slave_xmit() do the job correctly? > > I was thinking about something like that, indeed. DSA knows everything > about the tagger: its overhead, whether it's a tail tag or not. The xmit > callback of the tagger should only be there to populate the tag where it > needs to be. But reallocation, padding, etc etc, should all be dealt > with by the common DSA xmit procedure. We want the taggers to be simple > and reuse as much logic as possible, not to be bloated. FWIW if you want to avoid the reallocs you may want to set needed_tailroom on the netdev.
On Fri, Oct 16, 2020 at 11:03:11AM -0700, Jakub Kicinski wrote: > On Fri, 16 Oct 2020 18:56:45 +0300 Vladimir Oltean wrote: > > > 3. "Manually" unsharing in dsa_slave_xmit(), reserving enough tailroom > > > for the tail tag (and ETH_ZLEN?). Would moving the "else" clause from > > > ksz_common_xmit() to dsa_slave_xmit() do the job correctly? > > > > I was thinking about something like that, indeed. DSA knows everything > > about the tagger: its overhead, whether it's a tail tag or not. The xmit > > callback of the tagger should only be there to populate the tag where it > > needs to be. But reallocation, padding, etc etc, should all be dealt > > with by the common DSA xmit procedure. We want the taggers to be simple > > and reuse as much logic as possible, not to be bloated. > > FWIW if you want to avoid the reallocs you may want to set > needed_tailroom on the netdev. Tell me more about that, I've been meaning since forever to try it out. I read about needed_headroom and needed_tailroom, and it's one of the reasons why I added the .tail_tag option in the DSA tagger (to distinguish whether a switch needs headroom or tailroom), but I can't figure out, just from static analysis of the code, where exactly is the needed tailroom being accounted for. For example, if I'm in Christian's situation, e.g. I have a packet smaller than ETH_ZLEN, would the tailroom be enough to hold just the dev->needed_tailroom, or would there be enough space in the skb for the entire ETH_ZLEN + dev->needed_tailroom?
On Fri, 16 Oct 2020 21:13:19 +0300 Vladimir Oltean wrote: > On Fri, Oct 16, 2020 at 11:03:11AM -0700, Jakub Kicinski wrote: > > On Fri, 16 Oct 2020 18:56:45 +0300 Vladimir Oltean wrote: > > > > 3. "Manually" unsharing in dsa_slave_xmit(), reserving enough tailroom > > > > for the tail tag (and ETH_ZLEN?). Would moving the "else" clause from > > > > ksz_common_xmit() to dsa_slave_xmit() do the job correctly? > > > > > > I was thinking about something like that, indeed. DSA knows everything > > > about the tagger: its overhead, whether it's a tail tag or not. The xmit > > > callback of the tagger should only be there to populate the tag where it > > > needs to be. But reallocation, padding, etc etc, should all be dealt > > > with by the common DSA xmit procedure. We want the taggers to be simple > > > and reuse as much logic as possible, not to be bloated. > > > > FWIW if you want to avoid the reallocs you may want to set > > needed_tailroom on the netdev. > > Tell me more about that, I've been meaning since forever to try it out. > I read about needed_headroom and needed_tailroom, and it's one of the > reasons why I added the .tail_tag option in the DSA tagger (to > distinguish whether a switch needs headroom or tailroom), but I can't > figure out, just from static analysis of the code, where exactly is the > needed tailroom being accounted for. My understanding that the tail tag use case matches pretty well, needed_tailroom is supposed to be a hit to the higher layers of the stack to leave some extra space at the end. Now, it's probably of limited use in practice since I'd imagine most data comes in fragments. > For example, if I'm in Christian's > situation, e.g. I have a packet smaller than ETH_ZLEN, would the > tailroom be enough to hold just the dev->needed_tailroom, or would there > be enough space in the skb for the entire ETH_ZLEN + dev->needed_tailroom? I don't think we account for padding in general. Also looking at __pskb_pull_tail() we're already seem to be provisioning some extra 128B. So I guess it won't matter and the DSA tags are too small to need the head/tailroom hints anyway..
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 945a9bd5ba35..8ef2085349e7 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -24,7 +24,7 @@ static struct sk_buff *ksz_common_xmit(struct sk_buff *skb, if (skb_tailroom(skb) >= padlen + len) { /* Let dsa_slave_xmit() free skb */ - if (__skb_put_padto(skb, skb->len + padlen, false)) + if (__skb_put_padto(skb, ETH_ZLEN + len, false)) return NULL; nskb = skb; @@ -45,7 +45,7 @@ static struct sk_buff *ksz_common_xmit(struct sk_buff *skb, /* Let skb_put_padto() free nskb, and let dsa_slave_xmit() free * skb */ - if (skb_put_padto(nskb, nskb->len + padlen)) + if (skb_put_padto(nskb, ETH_ZLEN + len)) return NULL; consume_skb(skb);
__skb_put_padto() is called in order to ensure a minimal size of the sk_buff. The required minimal size is ETH_ZLEN + the size required for the tail tag. The current argument misses the size for the tail tag. The expression "skb->len + padlen" can be simplified to ETH_ZLEN. Too small sk_buffs typically result from cloning in dsa_skb_tx_timestamp(). The cloned sk_buff may not meet the minimum size requirements. Fixes: e71cb9e00922 ("net: dsa: ksz: fix skb freeing") Signed-off-by: Christian Eggers <ceggers@arri.de> --- net/dsa/tag_ksz.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)