Message ID | 20220117115440.60296-1-miquel.raynal@bootlin.com |
---|---|
Headers | show |
Series | IEEE 802.15.4 scan support | expand |
Hi, On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > ieee802154_xmit_complete() is the right helper to call when a > transmission is over. The fact that it completed or not is not really a > question, but drivers must tell the core that the completion is over, > even if it was canceled. Do not call ieee802154_wake_queue() manually, > in order to let full control of this task to the core. > This is not a cancellation of a transmission, it is something weird going on. Introduce a xmit_error() for this, you call consume_skb() which is wrong for a non error case. > By using the complete helper we also avoid leacking the skb structure. > Yes, we are leaking here. - Alex
Hi, On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > We should never start a transmission after the queue has been stopped. > > But because it might work we don't kill the function here but rather > warn loudly the user that something is wrong. > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > --- > net/mac802154/tx.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c > index 18ee6fcfcd7f..de5ecda80472 100644 > --- a/net/mac802154/tx.c > +++ b/net/mac802154/tx.c > @@ -112,6 +112,8 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) > static netdev_tx_t > ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb) > { > + WARN_ON(mac802154_queue_is_stopped(local)); > + we should do a WARN_ON_ONCE() in this hot function. - Alex
Hi, On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > ieee802154_xmit_complete() is the right helper to call when a > transmission is over. The fact that it completed or not is not really a > question, but drivers must tell the core that the completion is over, > even if it was canceled. Do not call ieee802154_wake_queue() manually, > in order to let full control of this task to the core. > > By using the complete helper we also avoid leacking the skb structure. > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > --- > drivers/net/ieee802154/at86rf230.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > index 583f835c317a..1941e1f3d2ef 100644 > --- a/drivers/net/ieee802154/at86rf230.c > +++ b/drivers/net/ieee802154/at86rf230.c > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context) > if (ctx->free) > kfree(ctx); > > - ieee802154_wake_queue(lp->hw); > + ieee802154_xmit_complete(lp->hw, lp->tx_skb, false); also this lp->tx_skb can be a dangled pointer, after xmit_complete() we need to set it to NULL in a xmit_error() we can check on NULL before calling kfree_skb(). - Alex
Hi, On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote: > > Hi, > > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > ieee802154_xmit_complete() is the right helper to call when a > > transmission is over. The fact that it completed or not is not really a > > question, but drivers must tell the core that the completion is over, > > even if it was canceled. Do not call ieee802154_wake_queue() manually, > > in order to let full control of this task to the core. > > > > By using the complete helper we also avoid leacking the skb structure. > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > --- > > drivers/net/ieee802154/at86rf230.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > index 583f835c317a..1941e1f3d2ef 100644 > > --- a/drivers/net/ieee802154/at86rf230.c > > +++ b/drivers/net/ieee802154/at86rf230.c > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context) > > if (ctx->free) > > kfree(ctx); > > > > - ieee802154_wake_queue(lp->hw); > > + ieee802154_xmit_complete(lp->hw, lp->tx_skb, false); > > also this lp->tx_skb can be a dangled pointer, after xmit_complete() > we need to set it to NULL in a xmit_error() we can check on NULL > before calling kfree_skb(). > forget the NULL checking, it's already done by core. However in some cases this is called with a dangled pointer on lp->tx_skb. - Alex
Hi Alexander, > > So far the only technical point that is missing in this series is the > > possibility to grab a reference over the module driving the net device > > in order to prevent module unloading during a scan or when the beacons > > work is ongoing. Do you have any advises regarding this issue? That is the only technical point that is left unaddressed IMHO. > > Finally, this series is a deep reshuffle of David Girault's original > > work, hence the fact that he is almost systematically credited, either > > by being the only author when I created the patches based on his changes > > with almost no modification, or with a Co-developped-by tag whenever the > > final code base is significantly different than his first proposal while > > still being greatly inspired from it. > > > > can you please split this patch series, what I see is now: > > 1. cleanup patches > 2. sync tx handling for mlme commands > 3. scan support Works for me. I just wanted to give the big picture but I'll split the series. Also sorry for forgetting the 'wpan-next' subject prefix. > we try to bring the patches upstream in this order. > > Thanks. > > - Alex Thanks, Miquèl
Hi Alexander, alex.aring@gmail.com wrote on Mon, 17 Jan 2022 18:14:17 -0500: > Hi, > > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > We should never start a transmission after the queue has been stopped. > > > > But because it might work we don't kill the function here but rather > > warn loudly the user that something is wrong. > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > --- > > net/mac802154/tx.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c > > index 18ee6fcfcd7f..de5ecda80472 100644 > > --- a/net/mac802154/tx.c > > +++ b/net/mac802154/tx.c > > @@ -112,6 +112,8 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) > > static netdev_tx_t > > ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb) > > { > > + WARN_ON(mac802154_queue_is_stopped(local)); > > + > > we should do a WARN_ON_ONCE() in this hot function. Sure! Thanks, Miquèl
Hi Alexander, alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500: > Hi, > > On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote: > > > > Hi, > > > > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > ieee802154_xmit_complete() is the right helper to call when a > > > transmission is over. The fact that it completed or not is not really a > > > question, but drivers must tell the core that the completion is over, > > > even if it was canceled. Do not call ieee802154_wake_queue() manually, > > > in order to let full control of this task to the core. > > > > > > By using the complete helper we also avoid leacking the skb structure. > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > > --- > > > drivers/net/ieee802154/at86rf230.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > > index 583f835c317a..1941e1f3d2ef 100644 > > > --- a/drivers/net/ieee802154/at86rf230.c > > > +++ b/drivers/net/ieee802154/at86rf230.c > > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context) > > > if (ctx->free) > > > kfree(ctx); > > > > > > - ieee802154_wake_queue(lp->hw); > > > + ieee802154_xmit_complete(lp->hw, lp->tx_skb, false); > > > > also this lp->tx_skb can be a dangled pointer, after xmit_complete() > > we need to set it to NULL in a xmit_error() we can check on NULL > > before calling kfree_skb(). > > > > forget the NULL checking, it's already done by core. However in some > cases this is called with a dangled pointer on lp->tx_skb. I'll try to fix these dangling situation first if I find them. I'll also introduce a xmit_error() helper as you suggest. Thanks, Miquèl
Hi, On Tue, 18 Jan 2022 at 13:20, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Hi Alexander, > > alex.aring@gmail.com wrote on Mon, 17 Jan 2022 17:52:10 -0500: > > > Hi, > > > > On Mon, 17 Jan 2022 at 06:54, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > These periods are expressed in time units (microseconds) while 40 and 12 > > > are the number of symbol durations these periods will last. We need to > > > multiply them both with phy->symbol_duration in order to get these > > > values in microseconds. > > > > > > Fixes: 8c6ad9cc5157 ("ieee802154: Add NXP MCR20A IEEE 802.15.4 transceiver driver") > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > > --- > > > drivers/net/ieee802154/mcr20a.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/net/ieee802154/mcr20a.c b/drivers/net/ieee802154/mcr20a.c > > > index f0eb2d3b1c4e..e2c249aef430 100644 > > > --- a/drivers/net/ieee802154/mcr20a.c > > > +++ b/drivers/net/ieee802154/mcr20a.c > > > @@ -976,8 +976,8 @@ static void mcr20a_hw_setup(struct mcr20a_local *lp) > > > dev_dbg(printdev(lp), "%s\n", __func__); > > > > > > phy->symbol_duration = 16; > > > - phy->lifs_period = 40; > > > - phy->sifs_period = 12; > > > + phy->lifs_period = 40 * phy->symbol_duration; > > > + phy->sifs_period = 12 * phy->symbol_duration; > > > > I thought we do that now in register_hw(). Why does this patch exist? > > The lifs and sifs period are wrong. > > Fixing this silently by generalizing the calculation is simply wrong. I > feel we need to do this in order: > 1- Fix the period because it is wrong. > 2- Now that the period is set to a valid value and the core is able to > do the same operation and set the variables to an identical content, > we can drop these lines from the driver. > > #2 being a mechanical change, doing it without #1 means that something > that appears harmless actually changes the behavior of the driver. We > generally try to avoid that, no? yes, maybe Stefan can get this patch then somehow to wpan and queue it for stable. Thanks for clarification. - Alex
Hi, On Tue, 18 Jan 2022 at 05:40, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Hi Alexander, > > > > So far the only technical point that is missing in this series is the > > > possibility to grab a reference over the module driving the net device > > > in order to prevent module unloading during a scan or when the beacons > > > work is ongoing. > > Do you have any advises regarding this issue? That is the only > technical point that is left unaddressed IMHO. > module_get()/module_put() or I don't see where the problem here is. You can avoid module unloading with it. Which module is the problem here? > > > Finally, this series is a deep reshuffle of David Girault's original > > > work, hence the fact that he is almost systematically credited, either > > > by being the only author when I created the patches based on his changes > > > with almost no modification, or with a Co-developped-by tag whenever the > > > final code base is significantly different than his first proposal while > > > still being greatly inspired from it. > > > > > > > can you please split this patch series, what I see is now: > > > > 1. cleanup patches > > 2. sync tx handling for mlme commands > > 3. scan support > > Works for me. I just wanted to give the big picture but I'll split the > series. > maybe also put some "symbol duration" series into it if it's getting too large? It is difficult to review 40 patches... in one step. > Also sorry for forgetting the 'wpan-next' subject prefix. > no problem. I really appreciate your work and your willingness to work on all outstanding issues. I am really happy to see something that we can use for mlme-commands and to separate it from the hotpath transmission... It is good to see architecture for that which I think goes in the right direction. - Alex
Hi Alexander, alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500: > Hi, > > On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote: > > > > Hi, > > > > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > ieee802154_xmit_complete() is the right helper to call when a > > > transmission is over. The fact that it completed or not is not really a > > > question, but drivers must tell the core that the completion is over, > > > even if it was canceled. Do not call ieee802154_wake_queue() manually, > > > in order to let full control of this task to the core. > > > > > > By using the complete helper we also avoid leacking the skb structure. > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > > --- > > > drivers/net/ieee802154/at86rf230.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > > index 583f835c317a..1941e1f3d2ef 100644 > > > --- a/drivers/net/ieee802154/at86rf230.c > > > +++ b/drivers/net/ieee802154/at86rf230.c > > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context) > > > if (ctx->free) > > > kfree(ctx); > > > > > > - ieee802154_wake_queue(lp->hw); > > > + ieee802154_xmit_complete(lp->hw, lp->tx_skb, false); > > > > also this lp->tx_skb can be a dangled pointer, after xmit_complete() > > we need to set it to NULL in a xmit_error() we can check on NULL > > before calling kfree_skb(). I've created a xmit_error() helper as suggested, which call dev_kfree_skb_any() instead of *consume_skb*(). > > forget the NULL checking, it's already done by core. Indeed, it is. > However in some > cases this is called with a dangled pointer on lp->tx_skb. I've fixed that by setting it to NULL after the call to the xmit_error helper. > > - Alex Thanks, Miquèl
Hi Alexander, alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500: > Hi, > > On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote: > > > > Hi, > > > > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > ieee802154_xmit_complete() is the right helper to call when a > > > transmission is over. The fact that it completed or not is not really a > > > question, but drivers must tell the core that the completion is over, > > > even if it was canceled. Do not call ieee802154_wake_queue() manually, > > > in order to let full control of this task to the core. > > > > > > By using the complete helper we also avoid leacking the skb structure. > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > > --- > > > drivers/net/ieee802154/at86rf230.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > > index 583f835c317a..1941e1f3d2ef 100644 > > > --- a/drivers/net/ieee802154/at86rf230.c > > > +++ b/drivers/net/ieee802154/at86rf230.c > > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context) > > > if (ctx->free) > > > kfree(ctx); > > > > > > - ieee802154_wake_queue(lp->hw); > > > + ieee802154_xmit_complete(lp->hw, lp->tx_skb, false); > > > > also this lp->tx_skb can be a dangled pointer, after xmit_complete() > > we need to set it to NULL in a xmit_error() we can check on NULL > > before calling kfree_skb(). > > > > forget the NULL checking, it's already done by core. However in some > cases this is called with a dangled pointer on lp->tx_skb. Actually I don't see why tx_skb is dangling? There is no function that could accesses lp->tx_skb between the free operation and the next call to ->xmit() which does a lp->tx_skb = skb. Am I missing something? Thanks, Miquèl
Hi, On Wed, 19 Jan 2022 at 17:56, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Hi Alexander, > > alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500: > > > Hi, > > > > On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote: > > > > > > Hi, > > > > > > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > > > ieee802154_xmit_complete() is the right helper to call when a > > > > transmission is over. The fact that it completed or not is not really a > > > > question, but drivers must tell the core that the completion is over, > > > > even if it was canceled. Do not call ieee802154_wake_queue() manually, > > > > in order to let full control of this task to the core. > > > > > > > > By using the complete helper we also avoid leacking the skb structure. > > > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > > > --- > > > > drivers/net/ieee802154/at86rf230.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > > > index 583f835c317a..1941e1f3d2ef 100644 > > > > --- a/drivers/net/ieee802154/at86rf230.c > > > > +++ b/drivers/net/ieee802154/at86rf230.c > > > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context) > > > > if (ctx->free) > > > > kfree(ctx); > > > > > > > > - ieee802154_wake_queue(lp->hw); > > > > + ieee802154_xmit_complete(lp->hw, lp->tx_skb, false); > > > > > > also this lp->tx_skb can be a dangled pointer, after xmit_complete() > > > we need to set it to NULL in a xmit_error() we can check on NULL > > > before calling kfree_skb(). > > > > > > > forget the NULL checking, it's already done by core. However in some > > cases this is called with a dangled pointer on lp->tx_skb. > > Actually I don't see why tx_skb is dangling? > > There is no function that could accesses lp->tx_skb between the free > operation and the next call to ->xmit() which does a lp->tx_skb = skb. > Am I missing something? > look into at86rf230_sync_state_change() it is a sync over async and the function "at86rf230_async_error_recover_complete()" is a generic error handling to recover from a state change. It's e.g. being used in e.g. at86rf230_start() which can occur in cases which are not xmit related. Indeed there is no dangled pointer in the irq handling, sorry. I thought maybe the receive handling but the transceiver is doing a lot of its own state change handling because of some framebuffer protection which is not the case. - Alex
Hi, On Wed, 19 Jan 2022 at 18:34, Alexander Aring <alex.aring@gmail.com> wrote: > > Hi, > > On Wed, 19 Jan 2022 at 17:56, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > Hi Alexander, > > > > alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500: > > > > > Hi, > > > > > > On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > > > > > ieee802154_xmit_complete() is the right helper to call when a > > > > > transmission is over. The fact that it completed or not is not really a > > > > > question, but drivers must tell the core that the completion is over, > > > > > even if it was canceled. Do not call ieee802154_wake_queue() manually, > > > > > in order to let full control of this task to the core. > > > > > > > > > > By using the complete helper we also avoid leacking the skb structure. > > > > > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > > > > --- > > > > > drivers/net/ieee802154/at86rf230.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c > > > > > index 583f835c317a..1941e1f3d2ef 100644 > > > > > --- a/drivers/net/ieee802154/at86rf230.c > > > > > +++ b/drivers/net/ieee802154/at86rf230.c > > > > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context) > > > > > if (ctx->free) > > > > > kfree(ctx); > > > > > > > > > > - ieee802154_wake_queue(lp->hw); > > > > > + ieee802154_xmit_complete(lp->hw, lp->tx_skb, false); > > > > > > > > also this lp->tx_skb can be a dangled pointer, after xmit_complete() > > > > we need to set it to NULL in a xmit_error() we can check on NULL > > > > before calling kfree_skb(). > > > > > > > > > > forget the NULL checking, it's already done by core. However in some > > > cases this is called with a dangled pointer on lp->tx_skb. > > > > Actually I don't see why tx_skb is dangling? > > > > There is no function that could accesses lp->tx_skb between the free > > operation and the next call to ->xmit() which does a lp->tx_skb = skb. > > Am I missing something? > > > > look into at86rf230_sync_state_change() it is a sync over async and > the function "at86rf230_async_error_recover_complete()" is a generic > error handling to recover from a state change. It's e.g. being used in > e.g. at86rf230_start() which can occur in cases which are not xmit > related. > which means there is more being broken that we should not simply call to wake the queue in non-transmit cases... - Alex
Hi Alexander, alex.aring@gmail.com wrote on Tue, 18 Jan 2022 18:12:49 -0500: > Hi, > > On Tue, 18 Jan 2022 at 05:40, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > Hi Alexander, > > > > > > So far the only technical point that is missing in this series is the > > > > possibility to grab a reference over the module driving the net device > > > > in order to prevent module unloading during a scan or when the beacons > > > > work is ongoing. > > > > Do you have any advises regarding this issue? That is the only > > technical point that is left unaddressed IMHO. > > > > module_get()/module_put() or I don't see where the problem here is. > You can avoid module unloading with it. Which module is the problem > here? I'll give it another try, maybe when I first tried that I was missing a few mental peaces and did not understood the puzzle correctly. > > > > Finally, this series is a deep reshuffle of David Girault's original > > > > work, hence the fact that he is almost systematically credited, either > > > > by being the only author when I created the patches based on his changes > > > > with almost no modification, or with a Co-developped-by tag whenever the > > > > final code base is significantly different than his first proposal while > > > > still being greatly inspired from it. > > > > > > > > > > can you please split this patch series, what I see is now: > > > > > > 1. cleanup patches > > > 2. sync tx handling for mlme commands > > > 3. scan support > > > > Works for me. I just wanted to give the big picture but I'll split the > > series. > > > > maybe also put some "symbol duration" series into it if it's getting > too large? It is difficult to review 40 patches... in one step. Yep, I truly understand (and now 50+). > > > Also sorry for forgetting the 'wpan-next' subject prefix. > > > > no problem. > > I really appreciate your work and your willingness to work on all > outstanding issues. I am really happy to see something that we can use > for mlme-commands and to separate it from the hotpath transmission... > It is good to see architecture for that which I think goes in the > right direction. That is very stirring to read :) Thanks, Miquèl