
[0/3] net: dsa: move skb reallocation to dsa_slave_xmit

Message ID 20201016200226.23994-1-ceggers@arri.de

Message

Christian Eggers Oct. 16, 2020, 8:02 p.m. UTC
This series moves the reallocation of a skb which may be required due to
tail tagging or padding, from the tag_trailer and tag_ksz drivers to
dsa_slave_xmit. Additionally it prevents a skb_panic in a very special
corner case described here:
https://patchwork.ozlabs.org/project/netdev/patch/20201014161719.30289-1-ceggers@arri.de/#2554896

This series has been tested with KSZ9563 and my preliminary PTP patches.

On Friday, 16 October 2020, 17:56:45 CEST, Vladimir Oltean wrote:
> On Fri, Oct 16, 2020 at 02:44:46PM +0200, Christian Eggers wrote:
> > Machine:
> > - ARMv7 (i.MX6ULL), SMP_CACHE_BYTES is 64
> > - DSA device: Microchip KSZ9563 (I am currently working on time stamping
> > support)
> I have a board very similar to this on which I am going to test.
Hopefully you are not developing PTP support for the KSZ9563 at the moment, too ;-)
Which hardware exactly do you own? The problem I described (link
above) can only be reproduced with my (not yet published) PTP patches.

> > Last, CONFIG_SLOB must be selected.
> 
> Interesting, do you know why?
Yes. The other allocators will actually allocate 512 bytes instead of 320
if 64+256 bytes are requested. This is then reported by ksize() and
leads to more skb tailroom. The SLOB allocator really allocates only
320 bytes in this case, so the skb runs out of tail room when
tail tagging...
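
The allocator difference can be illustrated with a small userspace model. It is only a sketch: rounding up to the next power of two is an assumption that reproduces the 320 -> 512 figure above (the real size classes are a simplification here), and all function names are hypothetical, not kernel API.

```c
#include <stddef.h>

/* Hypothetical model: round a request up to the next power of two,
 * as a stand-in for allocators that serve from fixed size classes. */
static size_t model_size_class_alloc(size_t requested)
{
	size_t size = 1;

	while (size < requested)
		size <<= 1;
	return size;
}

/* Hypothetical model: an allocator that hands out exactly what was asked. */
static size_t model_exact_alloc(size_t requested)
{
	return requested;
}

/* Bytes beyond the request; these would show up as additional tail room. */
static size_t model_extra_tailroom(size_t usable, size_t requested)
{
	return usable - requested;
}
```

With a 64+256 byte request, the size-class model leaves 192 spare bytes, while the exact model leaves none, which is why the problem only shows up with SLOB.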

> > 3. "Manually" unsharing in dsa_slave_xmit(), reserving enough tailroom
> > for the tail tag (and ETH_ZLEN?). Would moving the "else" clause from
> > ksz_common_xmit()  to dsa_slave_xmit() do the job correctly?
> 
> I was thinking about something like that, indeed. DSA knows everything
> about the tagger: its overhead, whether it's a tail tag or not. The xmit
> callback of the tagger should only be there to populate the tag where it
> needs to be. But reallocation, padding, etc etc, should all be dealt
> with by the common DSA xmit procedure. We want the taggers to be simple
> and reuse as much logic as possible, not to be bloated.
This series is the first draft for it. Some additional changes may be
done later:
1. All xmit() functions now return either the supplied skb or NULL. No
reallocation is done anymore. Maybe the type of the return value could
be changed to reflect this (e.g. to bool).
2. There is no path left which calls __skb_put_padto()/skb_pad() with
free_on_error set to false. So the following commit may be reverted in
order to simplify the code:

cd0a137acbb6 ("net: core: Specify skb_pad()/skb_put_padto() SKB freeing")

On Friday, 16 October 2020, 11:05:27 CEST, Vladimir Oltean wrote:
> Kurt is asking, and rightfully so, because his tag_hellcreek.c driver
> (for a 1588 switch with tail tags) is copied from tag_ksz.c.
@Kurt: If this series (or a later version) is accepted, please update
your tagging driver. Ensure that your dsa_device_ops::overhead contains
the "maximum" possible tail tag len for xmit and that
dsa_device_ops::tail_tag is set to true.
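
How the common xmit path could derive the required room from these two fields can be sketched in userspace. This is a model under stated assumptions, not the kernel's struct dsa_device_ops: the struct and both helpers are hypothetical and only mirror the "overhead" and "tail_tag" properties named above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the two tagger properties
 * named above; it is not the kernel's struct dsa_device_ops. */
struct model_tag_ops {
	const char *name;
	size_t overhead;   /* maximum tag length on xmit, in bytes */
	bool tail_tag;     /* true if the tag is appended after the payload */
};

/* Room the common xmit path would have to guarantee behind the data. */
static size_t model_tailroom_for_tag(const struct model_tag_ops *ops)
{
	return ops->tail_tag ? ops->overhead : 0;
}

/* Room the common xmit path would have to guarantee in front of the data. */
static size_t model_headroom_for_tag(const struct model_tag_ops *ops)
{
	return ops->tail_tag ? 0 : ops->overhead;
}
```

If "overhead" is smaller than the longest tag the tagger may append, the common code would reserve too little room, hence the request to use the maximum length.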

On Friday, 16 October 2020, 20:03:11 CEST, Jakub Kicinski wrote:
> FWIW if you want to avoid the reallocs you may want to set
> needed_tailroom on the netdev.
I haven't looked for this yet. If this can really solve the tagging AND
padding problem, I would like to do this in a follow up patch.

Wishing a nice weekend for netdev.
Christian

Comments

Vladimir Oltean Oct. 17, 2020, 12:48 a.m. UTC | #1
On Fri, Oct 16, 2020 at 10:02:24PM +0200, Christian Eggers wrote:
> Ensure that the skb is not cloned and has enough tail room for the tail
> tag. This code will be removed from the drivers in the next commits.
> 
> Signed-off-by: Christian Eggers <ceggers@arri.de>
> ---

Does 1588 work for you using this change, or you haven't finished
implementing it yet? If you haven't, I would suggest finishing that
part first.

The post-reallocation skb looks nothing like the one before.

Before:
skb len=68 headroom=2 headlen=68 tailroom=186
mac=(2,14) net=(16,-1) trans=-1
shinfo(txflags=1 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x9d6927ec sw=1 l4=0) proto=0x88f7 pkttype=0 iif=0
dev name=swp2 feat=0x0x0002000000005020
sk family=17 type=3 proto=0

After:
skb len=68 headroom=2 headlen=68 tailroom=186
mac=(2,16) net=(18,-17) trans=1
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0

Notice how you've changed shinfo(txflags), among other things.

Which proves that you can't just copy&paste whatever you found in
tag_trailer.c.

I am not yet sure whether there is any helper that can be used instead
of this crazy open-coding. Right now, not having tested anything yet, my
candidates of choice would be pskb_expand_head or __pskb_pull_tail. You
should probably also try to cater here for the potential reallocation
done in the skb_cow_head() of non-tail taggers. Which would lean the
balance towards pskb_expand_head(), I believe.

Also, if the result is going to be longer than ~20 lines of code, I
strongly suggest moving the reallocation to a separate function so you
don't clutter dsa_slave_xmit.

Also, please don't redeclare struct sk_buff *nskb, you don't need to.
Florian Fainelli Oct. 17, 2020, 2:44 a.m. UTC | #2
On 10/16/2020 1:02 PM, Christian Eggers wrote:

[snip]

> On Friday, 16 October 2020, 20:03:11 CEST, Jakub Kicinski wrote:
>> FWIW if you want to avoid the reallocs you may want to set
>> needed_tailroom on the netdev.
> I haven't looked for this yet. If this can really solve the tagging AND
> padding problem, I would like to do this in a follow up patch.

The comment in netdevice.h says:

    *      @needed_headroom: Extra headroom the hardware may need, but not in all
    *                        cases can this be guaranteed
    *      @needed_tailroom: Extra tailroom the hardware may need, but not in all
    *                        cases can this be guaranteed. Some cases also use
    *                        LL_MAX_HEADER instead to allocate the skb

and while I have never seen a reallocation occur while pushing a 
descriptor status block in front of a frame on transmit after setting 
the correct needed_headroom, it was not exercised in a very complicated 
way either, just TCP or UDP over IPv4 or IPv6. This makes me think that 
the comment is cautionary about more complicated transmit scenarios with 
stacked devices, tunneling etc.

> 
> Wishing a nice weekend for netdev.

Likewise!
Christian Eggers Oct. 17, 2020, 6:53 p.m. UTC | #3
Hi Vladimir,

On Saturday, 17 October 2020, 02:48:16 CEST, Vladimir Oltean wrote:
> On Fri, Oct 16, 2020 at 10:02:24PM +0200, Christian Eggers wrote:
> > Ensure that the skb is not cloned and has enough tail room for the tail
> > tag. This code will be removed from the drivers in the next commits.
> > 
> > Signed-off-by: Christian Eggers <ceggers@arri.de>
> > ---
> 
> Does 1588 work for you using this change, or you haven't finished
> implementing it yet? If you haven't, I would suggest finishing that
> part first.

Yes, it does. Right after finishing this topic, I would like to send the patches
for PTP. Maybe I'll do it in parallel; everything except the combination of
L2/E2E/SLOB seems to work.

> The post-reallocation skb looks nothing like the one before.
> 
> Before:
> skb len=68 headroom=2 headlen=68 tailroom=186
> mac=(2,14) net=(16,-1) trans=-1
> shinfo(txflags=1 nr_frags=0 gso(size=0 type=0 segs=0))
> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> hash(0x9d6927ec sw=1 l4=0) proto=0x88f7 pkttype=0 iif=0
> dev name=swp2 feat=0x0x0002000000005020
> sk family=17 type=3 proto=0
> 
> After:
> skb len=68 headroom=2 headlen=68 tailroom=186
> mac=(2,16) net=(18,-17) trans=1
> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0
> 
> Notice how you've changed shinfo(txflags), among other things.


I get a similar output when placing the two skb_dump() calls in the current
ksz_common_xmit() code:

[ 5052.662168] old:skb len=58 headroom=2 headlen=58 tailroom=68
[ 5052.662168] mac=(2,14) net=(16,-1) trans=-1
[ 5052.662168] shinfo(txflags=1 nr_frags=0 gso(size=0 type=0 segs=0))
[ 5052.662168] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
[ 5052.662168] hash(0x0 sw=0 l4=0) proto=0x88f7 pkttype=0 iif=0
[ 5052.676360] old:dev name=lan0 feat=0x0x0002000000005220
[ 5052.679001] old:sk family=17 type=3 proto=0
[ 5052.681140] old:skb linear:   00000000: 01 1b 19 00 00 00 52 d9 a9 5d a1 40 88 f7 01 02
[ 5052.685236] old:skb linear:   00000010: 00 2c 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5052.689342] old:skb linear:   00000020: 00 00 52 d9 a9 ff fe 5d a1 40 00 01 00 00 01 7f
[ 5052.693418] old:skb linear:   00000030: 00 00 00 00 00 00 00 00 00 00
[ 5052.696843] new:skb len=65 headroom=2 headlen=65 tailroom=61
[ 5052.696843] mac=(2,16) net=(18,-17) trans=1
[ 5052.696843] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
[ 5052.696843] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
[ 5052.696843] hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0
[ 5052.711215] new:skb linear:   00000000: 01 1b 19 00 00 00 52 d9 a9 5d a1 40 88 f7 01 02
[ 5052.715305] new:skb linear:   00000010: 00 2c 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5052.719407] new:skb linear:   00000020: 00 00 52 d9 a9 ff fe 5d a1 40 00 01 00 00 01 7f
[ 5052.723484] new:skb linear:   00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5052.727587] new:skb linear:   00000040: 00


Note that whilst some skb members differ, the two hexdumps look correct.

> Which proves that you can't just copy&paste whatever you found in
> tag_trailer.c.

I did. tag_trailer and tag_ksz are quite similar here, so I took a combination of them.

> I am not yet sure whether there is any helper that can be used instead
> of this crazy open-coding. Right now, not having tested anything yet, my
> candidates of choice would be pskb_expand_head or __pskb_pull_tail. You
> should probably also try to cater here for the potential reallocation
> done in the skb_cow_head() of non-tail taggers. Which would lean the
> balance towards pskb_expand_head(), I believe.

The "open coding" is from the existing code (which doesn't mean that it is
correct). I will investigate why the copied skb is different and whether
pskb_expand_head can do better.

I don't like to touch the non-tail taggers, this is too much out of the scope
of my current work.

> Also, if the result is going to be longer than ~20 lines of code, I
> strongly suggest moving the reallocation to a separate function so you
> don't clutter dsa_slave_xmit.

As Florian requested I'll likely put the code into a separate function in
slave.c and call it from the individual tail-taggers in order not to put 
extra conditionals in dsa_slave_xmit.

regards
Christian
Vladimir Oltean Oct. 17, 2020, 7:12 p.m. UTC | #4
On Sat, Oct 17, 2020 at 08:53:19PM +0200, Christian Eggers wrote:
> > Does 1588 work for you using this change, or you haven't finished
> > implementing it yet? If you haven't, I would suggest finishing that
> > part first.
> Yes, it does. Right after finishing this topic, I would like to send the patches
> for PTP. Maybe I'll do it in parallel; everything except the combination of
> L2/E2E/SLOB seems to work.


2 aspects:
- net-next is closed for this week and the next one, due to the merge
  window. You'll have to wait until it reopens.
- Actually I was asking you this because sja1105 PTP no longer works
  after this change, due to the change of txflags.
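
Why a changed txflags value can break timestamping is easier to see in a small userspace model. This is only a sketch: the struct, the helper names and the meaning attached to bit 0 (a hardware timestamp request, matching txflags=1 in the dump) are assumptions for illustration, not kernel definitions.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_TX_HW_TSTAMP 0x1 /* assumed meaning of txflags=1 in the dump */

/* Hypothetical userspace stand-in for an skb with a little metadata. */
struct model_meta_skb {
	unsigned char data[128];
	size_t len;             /* must not exceed sizeof(data) */
	unsigned int tx_flags;  /* per-packet transmit flags */
	unsigned int protocol;
};

/* Open-coded reallocation: start from a fresh buffer and copy only
 * the payload, so all metadata silently falls back to zero. */
static void model_copy_payload_only(struct model_meta_skb *dst,
				    const struct model_meta_skb *src)
{
	memset(dst, 0, sizeof(*dst));
	memcpy(dst->data, src->data, src->len);
	dst->len = src->len;
}

/* Metadata-preserving alternative: the whole descriptor is carried over. */
static void model_copy_with_metadata(struct model_meta_skb *dst,
				     const struct model_meta_skb *src)
{
	*dst = *src;
}

/* A driver-side check of the kind that breaks when the flag is lost. */
static int model_wants_hw_timestamp(const struct model_meta_skb *skb)
{
	return (skb->tx_flags & MODEL_TX_HW_TSTAMP) != 0;
}
```

In this model the payload-only copy yields the same bytes on the wire but no longer asks for a timestamp, which matches the observation that the hexdumps look right while the flags differ.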

> I don't like to touch the non-tail taggers, this is too much out of the scope
> of my current work.


Do you want me to try and send a version using pskb_expand_head and you
can test if it works for your tail-tagging switch?

> > Also, if the result is going to be longer than ~20 lines of code, I
> > strongly suggest moving the reallocation to a separate function so you
> > don't clutter dsa_slave_xmit.
> As Florian requested I'll likely put the code into a separate function in
> slave.c and call it from the individual tail-taggers in order not to put
> extra conditionals in dsa_slave_xmit.


I think it would be best to use the unlikely(tail_tag) approach though.
The reallocation function should still be in the common code path. Even
for a non-1588 switch, there are other code paths that clone packets on
TX. For example, the bridge does that, when flooding packets. Currently,
DSA ensures that the header area is writable by calling skb_cow_head, as
far as I can see. But the point is, maybe we can do TX reallocation
centrally.
Christian Eggers Oct. 17, 2020, 8:56 p.m. UTC | #5
On Saturday, 17 October 2020, 21:12:47 CEST, Vladimir Oltean wrote:
> On Sat, Oct 17, 2020 at 08:53:19PM +0200, Christian Eggers wrote:
> > > Does 1588 work for you using this change, or you haven't finished
> > > implementing it yet? If you haven't, I would suggest finishing that
> > > part first.
> > 
> > Yes, it does. Right after finishing this topic, I would like to send the
> > patches for PTP. Maybe I'll do it in parallel; everything except the
> > combination of L2/E2E/SLOB seems to work.
> 
> 2 aspects:
> - net-next is closed for this week and the next one, due to the merge
>   window. You'll have to wait until it reopens.

The status page seems to be out of date:
http://vger.kernel.org/~davem/net-next.html

The FAQ says: "Do not send new net-next content to netdev...". So there is no
possibility for code review, is there?

> - Actually I was asking you this because sja1105 PTP no longer works
>   after this change, due to the change of txflags.

The tail taggers seem to be immune against this change.

> > I don't like to touch the non-tail taggers, this is too much out of the
> > scope of my current work.
> 
> Do you want me to try and send a version using pskb_expand_head and you
> can test if it works for your tail-tagging switch?

I already wanted to ask... My 2nd try (checking for !skb_cloned()) was already
sufficient (for me). Hacking linux-net is very interesting, but I have many 
other items open... Testing would be no problem.

> > > Also, if the result is going to be longer than ~20 lines of code, I
> > > strongly suggest moving the reallocation to a separate function so you
> > > don't clutter dsa_slave_xmit.
> > 
> > As Florian requested I'll likely put the code into a separate function in
> > slave.c and call it from the individual tail-taggers in order not to put
> > extra conditionals in dsa_slave_xmit.
> 
> I think it would be best to use the unlikely(tail_tag) approach though.
> The reallocation function should still be in the common code path. Even
> for a non-1588 switch, there are other code paths that clone packets on
> TX. For example, the bridge does that, when flooding packets.

You already mentioned that you don't want to pass cloned packets to the tag 
drivers xmit() functions. I've no experience with the problems caused by 
cloned packets, but would cloned packets work anyway? Or must cloned packets 
not be changed (e.g. by tail-tagging)? Is there any value in first cloning in 
dsa_skb_tx_timestamp() and then unsharing in dsa_slave_xmit a few lines later? 
The issue I currently have only affects a very minor number of packets (cloned 
AND < ETH_ZLEN AND CONFIG_SLOB), so only these packets would need a copying.

> Currently, DSA ensures that the header area is writable by calling
> skb_cow_head, as far as I can see. But the point is, maybe we can do TX
> reallocation centrally.


regards
Christian
Vladimir Oltean Oct. 17, 2020, 9:35 p.m. UTC | #6
On Sat, Oct 17, 2020 at 10:56:24PM +0200, Christian Eggers wrote:
> The status page seems to be out of date:
> http://vger.kernel.org/~davem/net-next.html

Yeah, it can do that sometimes. Extremely rarely, but it happens. But
net-next is still closed, nonetheless.

> The FAQ says: "Do not send new net-next content to netdev...". So there is no
> possibility for code review, is there?

You can always send patches as RFC (Request For Comments). In fact
that's what I'm going to do right now.

> > - Actually I was asking you this because sja1105 PTP no longer works
> >   after this change, due to the change of txflags.
> The tail taggers seem to be immune against this change.

How?

> > Do you want me to try and send a version using pskb_expand_head and you
> > can test if it works for your tail-tagging switch?
> I already wanted to ask... My 2nd try (checking for !skb_cloned()) was already
> sufficient (for me). Hacking linux-net is very interesting, but I have many
> other items open... Testing would be no problem.

Ok, incoming.....

> > I think it would be best to use the unlikely(tail_tag) approach though.
> > The reallocation function should still be in the common code path. Even
> > for a non-1588 switch, there are other code paths that clone packets on
> > TX. For example, the bridge does that, when flooding packets.
> You already mentioned that you don't want to pass cloned packets to the tag
> drivers xmit() functions. I've no experience with the problems caused by
> cloned packets, but would cloned packets work anyway? Or must cloned packets
> not be changed (e.g. by tail-tagging)? Is there any value in first cloning in
> dsa_skb_tx_timestamp() and then unsharing in dsa_slave_xmit a few lines later?
> The issue I currently have only affects a very minor number of packets (cloned
> AND < ETH_ZLEN AND CONFIG_SLOB), so only these packets would need a copying.

Yes, we need to clone and then unshare immediately afterwards because
sja1105_xmit calls sja1105_defer_xmit, which schedules a workqueue. The
sja1105 driver assumes that the skb has already been cloned by then. So
basically, the sja1105 driver introduces a strict ordering requirement
that dsa_skb_tx_timestamp needs to be first, then p->xmit second. So we
necessarily must reallocate freshly cloned skbs, as things stand now.
I'll think about avoiding that, but not now. We were always reallocating
those frames before, using skb_cow_head. The only difference now is that
the skb, as it is passed to the tagger's xmit() function, is directly
writable. You'll see...
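
The decision a central TX reallocation has to make, as discussed in this thread, can be sketched as a userspace model. It is not the code of the RFC: the struct and helpers are hypothetical, and the only hard fact used is the 60-byte minimum Ethernet frame length (ETH_ZLEN). A cloned skb or insufficient tail room for padding plus tail tag both lead to a reallocation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ETH_ZLEN 60 /* minimum Ethernet frame length without FCS */

/* Hypothetical userspace stand-in for the skb properties that matter here. */
struct model_skb {
	size_t len;       /* bytes of frame data */
	size_t tailroom;  /* free bytes behind the data */
	bool cloned;      /* data buffer shared with another skb */
};

/* Bytes needed behind the data: padding up to the minimum frame
 * length first, then the tail tag. */
static size_t model_tail_bytes_needed(const struct model_skb *skb,
				      size_t tail_tag_len)
{
	size_t pad = skb->len < MODEL_ETH_ZLEN ? MODEL_ETH_ZLEN - skb->len : 0;

	return pad + tail_tag_len;
}

/* True if the common xmit path would have to reallocate before
 * handing the skb to the tagger. */
static bool model_needs_realloc(const struct model_skb *skb,
				size_t tail_tag_len)
{
	if (skb->cloned)
		return true;
	return skb->tailroom < model_tail_bytes_needed(skb, tail_tag_len);
}
```

Taking the numbers from the old:/new: dump earlier in the thread as an example (len=58, tailroom=68) and assuming a 5-byte tail tag, which is inferred from the 58 -> 65 byte growth and not stated anywhere, the model asks for 7 bytes and would not reallocate unless the skb is cloned.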