Message ID | 20240102185933.64179-4-verdre@v0yd.nl |
---|---|
State | New |
Headers | show |
Series | Bluetooth: Improve retrying of connection attempts | expand |
Hi Jonas, On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote: > > Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting > later when we get a "Command Disallowed" error returned by "Create > Connection". > > In this commit the intention was to retry only once, and give up if we see > "Command Disallowed" again on the second try. > > This made sense back then when the retry was initiated *only* from the > "Connect Complete" event. If we received that event, we knew that now the > card now must have a "free slot" for a new connection request again. These > days we call hci_conn_check_pending() from a few more places though, and > in these places we can't really be sure that there's a "free slot" on the > card, so the second try to "Create Connection" might fail again. > > Deal with this by being less strict about these retries and try again > every time we get "Command Disallowed" errors, removing the limitation to > only two attempts. > > Since this can potentially cause us to enter an endless cycle of > reconnection attempts, we'll add some guarding against that with the next > commit. Don't see where you are doing such guarding, besides you seem to assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller is busy, or something like that, but it could perform the connection later, but that may not always be the case, thus why I think reconnecting just a few number of times is better, if you really need to keep retrying then this needs to be controlled by a policy in userspace not hardcoded in the kernel, well I can even argument that perhaps the initial number of reconnection shall be configurable so one don't have to recompile the kernel if that needs changing. > Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> > --- > net/bluetooth/hci_event.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c > index e8b4a0126..e1f5b6f90 100644 > --- a/net/bluetooth/hci_event.c > +++ b/net/bluetooth/hci_event.c > @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status) > > if (status) { > if (conn && conn->state == BT_CONNECT) { > - if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) { > + if (status == HCI_ERROR_COMMAND_DISALLOWED) { > + conn->state = BT_CONNECT2; > + } else { > conn->state = BT_CLOSED; > hci_connect_cfm(conn, status); > hci_conn_del(conn); > - } else > - conn->state = BT_CONNECT2; > + } > } > } else { > if (!conn) { > -- > 2.43.0 >
Hi Luiz, On 1/3/24 17:05, Luiz Augusto von Dentz wrote: > Hi Jonas, > > On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote: >> >> Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting >> later when we get a "Command Disallowed" error returned by "Create >> Connection". >> >> In this commit the intention was to retry only once, and give up if we see >> "Command Disallowed" again on the second try. >> >> This made sense back then when the retry was initiated *only* from the >> "Connect Complete" event. If we received that event, we knew that now the >> card now must have a "free slot" for a new connection request again. These >> days we call hci_conn_check_pending() from a few more places though, and >> in these places we can't really be sure that there's a "free slot" on the >> card, so the second try to "Create Connection" might fail again. >> >> Deal with this by being less strict about these retries and try again >> every time we get "Command Disallowed" errors, removing the limitation to >> only two attempts. >> >> Since this can potentially cause us to enter an endless cycle of >> reconnection attempts, we'll add some guarding against that with the next >> commit. > > Don't see where you are doing such guarding, besides you seem to > assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller > is busy, or something like that, but it could perform the connection > later, but that may not always be the case, thus why I think > reconnecting just a few number of times is better, if you really need > to keep retrying then this needs to be controlled by a policy in > userspace not hardcoded in the kernel, well I can even argument that > perhaps the initial number of reconnection shall be configurable so > one don't have to recompile the kernel if that needs changing. Yes, fair enough, the next commit assumes that COMMAND_DISALLOWED always means busy. The guarding is that we stop retrying as soon as there's no (competing) ongoing connection attempt nor an active inquiry, which should eventually be the case no matter what, no? I agree it's probably still better to not rely on this fairly complex sanity check and keep the checking of attempts nonetheless. I think we could keep doing that if we check for !hci_conn_hash_lookup_state(hdev, ACL_LINK, BT_CONNECT) && !test_bit(HCI_INQUIRY, &hdev->flags) in hci_conn_check_pending() before we actually retry, to make sure the retry counter doesn't get incremented wrongly. I'll give that a try. > >> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> >> --- >> net/bluetooth/hci_event.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c >> index e8b4a0126..e1f5b6f90 100644 >> --- a/net/bluetooth/hci_event.c >> +++ b/net/bluetooth/hci_event.c >> @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status) >> >> if (status) { >> if (conn && conn->state == BT_CONNECT) { >> - if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) { >> + if (status == HCI_ERROR_COMMAND_DISALLOWED) { >> + conn->state = BT_CONNECT2; >> + } else { >> conn->state = BT_CLOSED; >> hci_connect_cfm(conn, status); >> hci_conn_del(conn); >> - } else >> - conn->state = BT_CONNECT2; >> + } >> } >> } else { >> if (!conn) { >> -- >> 2.43.0 >> > > Cheers, Jonas
Hi Jonas, On Fri, Jan 5, 2024 at 10:54 AM Jonas Dreßler <verdre@v0yd.nl> wrote: > > Hi Luiz, > > On 1/3/24 17:05, Luiz Augusto von Dentz wrote: > > Hi Jonas, > > > > On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote: > >> > >> Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting > >> later when we get a "Command Disallowed" error returned by "Create > >> Connection". > >> > >> In this commit the intention was to retry only once, and give up if we see > >> "Command Disallowed" again on the second try. > >> > >> This made sense back then when the retry was initiated *only* from the > >> "Connect Complete" event. If we received that event, we knew that now the > >> card now must have a "free slot" for a new connection request again. These > >> days we call hci_conn_check_pending() from a few more places though, and > >> in these places we can't really be sure that there's a "free slot" on the > >> card, so the second try to "Create Connection" might fail again. > >> > >> Deal with this by being less strict about these retries and try again > >> every time we get "Command Disallowed" errors, removing the limitation to > >> only two attempts. > >> > >> Since this can potentially cause us to enter an endless cycle of > >> reconnection attempts, we'll add some guarding against that with the next > >> commit. > > > > Don't see where you are doing such guarding, besides you seem to > > assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller > > is busy, or something like that, but it could perform the connection > > later, but that may not always be the case, thus why I think > > reconnecting just a few number of times is better, if you really need > > to keep retrying then this needs to be controlled by a policy in > > userspace not hardcoded in the kernel, well I can even argument that > > perhaps the initial number of reconnection shall be configurable so > > one don't have to recompile the kernel if that needs changing. > > Yes, fair enough, the next commit assumes that COMMAND_DISALLOWED always > means busy. The guarding is that we stop retrying as soon as there's no > (competing) ongoing connection attempt nor an active inquiry, which > should eventually be the case no matter what, no? > > I agree it's probably still better to not rely on this fairly complex > sanity check and keep the checking of attempts nonetheless. > > I think we could keep doing that if we check for > !hci_conn_hash_lookup_state(hdev, ACL_LINK, BT_CONNECT) && > !test_bit(HCI_INQUIRY, &hdev->flags) in hci_conn_check_pending() before > we actually retry, to make sure the retry counter doesn't get > incremented wrongly. I'll give that a try. Perhaps I'm missing something, but it should be possible to block concurrent access to HCI while a command is pending with use of hci_cmd_sync, at least on LE we do that by waiting the connection complete event so connection attempts are serialized but we don't seem to be doing the same for BR/EDR. > > > >> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> > >> --- > >> net/bluetooth/hci_event.c | 7 ++++--- > >> 1 file changed, 4 insertions(+), 3 deletions(-) > >> > >> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c > >> index e8b4a0126..e1f5b6f90 100644 > >> --- a/net/bluetooth/hci_event.c > >> +++ b/net/bluetooth/hci_event.c > >> @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status) > >> > >> if (status) { > >> if (conn && conn->state == BT_CONNECT) { > >> - if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) { > >> + if (status == HCI_ERROR_COMMAND_DISALLOWED) { > >> + conn->state = BT_CONNECT2; > >> + } else { > >> conn->state = BT_CLOSED; > >> hci_connect_cfm(conn, status); > >> hci_conn_del(conn); > >> - } else > >> - conn->state = BT_CONNECT2; > >> + } > >> } > >> } else { > >> if (!conn) { > >> -- > >> 2.43.0 > >> > > > > > > Cheers, > Jonas -- Luiz Augusto von Dentz
Hi Luiz, On 1/5/24 17:05, Luiz Augusto von Dentz wrote: > Hi Jonas, > > On Fri, Jan 5, 2024 at 10:54 AM Jonas Dreßler <verdre@v0yd.nl> wrote: >> >> Hi Luiz, >> >> On 1/3/24 17:05, Luiz Augusto von Dentz wrote: >>> Hi Jonas, >>> >>> On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote: >>>> >>>> Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting >>>> later when we get a "Command Disallowed" error returned by "Create >>>> Connection". >>>> >>>> In this commit the intention was to retry only once, and give up if we see >>>> "Command Disallowed" again on the second try. >>>> >>>> This made sense back then when the retry was initiated *only* from the >>>> "Connect Complete" event. If we received that event, we knew that now the >>>> card now must have a "free slot" for a new connection request again. These >>>> days we call hci_conn_check_pending() from a few more places though, and >>>> in these places we can't really be sure that there's a "free slot" on the >>>> card, so the second try to "Create Connection" might fail again. >>>> >>>> Deal with this by being less strict about these retries and try again >>>> every time we get "Command Disallowed" errors, removing the limitation to >>>> only two attempts. >>>> >>>> Since this can potentially cause us to enter an endless cycle of >>>> reconnection attempts, we'll add some guarding against that with the next >>>> commit. >>> >>> Don't see where you are doing such guarding, besides you seem to >>> assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller >>> is busy, or something like that, but it could perform the connection >>> later, but that may not always be the case, thus why I think >>> reconnecting just a few number of times is better, if you really need >>> to keep retrying then this needs to be controlled by a policy in >>> userspace not hardcoded in the kernel, well I can even argument that >>> perhaps the initial number of reconnection shall be configurable so >>> one don't have to recompile the kernel if that needs changing. >> >> Yes, fair enough, the next commit assumes that COMMAND_DISALLOWED always >> means busy. The guarding is that we stop retrying as soon as there's no >> (competing) ongoing connection attempt nor an active inquiry, which >> should eventually be the case no matter what, no? >> >> I agree it's probably still better to not rely on this fairly complex >> sanity check and keep the checking of attempts nonetheless. >> >> I think we could keep doing that if we check for >> !hci_conn_hash_lookup_state(hdev, ACL_LINK, BT_CONNECT) && >> !test_bit(HCI_INQUIRY, &hdev->flags) in hci_conn_check_pending() before >> we actually retry, to make sure the retry counter doesn't get >> incremented wrongly. I'll give that a try. > > Perhaps I'm missing something, but it should be possible to block > concurrent access to HCI while a command is pending with use of > hci_cmd_sync, at least on LE we do that by waiting the connection > complete event so connection attempts are serialized but we don't seem > to be doing the same for BR/EDR. > That's a good point, it might even allow for a nice little cleanup because we can then cancel inquiries in hci_acl_create_connection() synchronously, too. Question is just whether there's any devices out there that actually do support paging with more than a single device at a time and if we want to support that? > >>> >>>> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> >>>> --- >>>> net/bluetooth/hci_event.c | 7 ++++--- >>>> 1 file changed, 4 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c >>>> index e8b4a0126..e1f5b6f90 100644 >>>> --- a/net/bluetooth/hci_event.c >>>> +++ b/net/bluetooth/hci_event.c >>>> @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status) >>>> >>>> if (status) { >>>> if (conn && conn->state == BT_CONNECT) { >>>> - if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) { >>>> + if (status == HCI_ERROR_COMMAND_DISALLOWED) { >>>> + conn->state = BT_CONNECT2; >>>> + } else { >>>> conn->state = BT_CLOSED; >>>> hci_connect_cfm(conn, status); >>>> hci_conn_del(conn); >>>> - } else >>>> - conn->state = BT_CONNECT2; >>>> + } >>>> } >>>> } else { >>>> if (!conn) { >>>> -- >>>> 2.43.0 >>>> >>> >>> >> >> Cheers, >> Jonas > > > > -- > Luiz Augusto von Dentz
Hi Jonas, On Sun, Jan 7, 2024 at 5:20 PM Jonas Dreßler <verdre@v0yd.nl> wrote: > > Hi Luiz, > > On 1/5/24 17:05, Luiz Augusto von Dentz wrote: > > Hi Jonas, > > > > On Fri, Jan 5, 2024 at 10:54 AM Jonas Dreßler <verdre@v0yd.nl> wrote: > >> > >> Hi Luiz, > >> > >> On 1/3/24 17:05, Luiz Augusto von Dentz wrote: > >>> Hi Jonas, > >>> > >>> On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote: > >>>> > >>>> Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting > >>>> later when we get a "Command Disallowed" error returned by "Create > >>>> Connection". > >>>> > >>>> In this commit the intention was to retry only once, and give up if we see > >>>> "Command Disallowed" again on the second try. > >>>> > >>>> This made sense back then when the retry was initiated *only* from the > >>>> "Connect Complete" event. If we received that event, we knew that now the > >>>> card now must have a "free slot" for a new connection request again. These > >>>> days we call hci_conn_check_pending() from a few more places though, and > >>>> in these places we can't really be sure that there's a "free slot" on the > >>>> card, so the second try to "Create Connection" might fail again. > >>>> > >>>> Deal with this by being less strict about these retries and try again > >>>> every time we get "Command Disallowed" errors, removing the limitation to > >>>> only two attempts. > >>>> > >>>> Since this can potentially cause us to enter an endless cycle of > >>>> reconnection attempts, we'll add some guarding against that with the next > >>>> commit. > >>> > >>> Don't see where you are doing such guarding, besides you seem to > >>> assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller > >>> is busy, or something like that, but it could perform the connection > >>> later, but that may not always be the case, thus why I think > >>> reconnecting just a few number of times is better, if you really need > >>> to keep retrying then this needs to be controlled by a policy in > >>> userspace not hardcoded in the kernel, well I can even argument that > >>> perhaps the initial number of reconnection shall be configurable so > >>> one don't have to recompile the kernel if that needs changing. > >> > >> Yes, fair enough, the next commit assumes that COMMAND_DISALLOWED always > >> means busy. The guarding is that we stop retrying as soon as there's no > >> (competing) ongoing connection attempt nor an active inquiry, which > >> should eventually be the case no matter what, no? > >> > >> I agree it's probably still better to not rely on this fairly complex > >> sanity check and keep the checking of attempts nonetheless. > >> > >> I think we could keep doing that if we check for > >> !hci_conn_hash_lookup_state(hdev, ACL_LINK, BT_CONNECT) && > >> !test_bit(HCI_INQUIRY, &hdev->flags) in hci_conn_check_pending() before > >> we actually retry, to make sure the retry counter doesn't get > >> incremented wrongly. I'll give that a try. > > > > Perhaps I'm missing something, but it should be possible to block > > concurrent access to HCI while a command is pending with use of > > hci_cmd_sync, at least on LE we do that by waiting the connection > > complete event so connection attempts are serialized but we don't seem > > to be doing the same for BR/EDR. > > > > That's a good point, it might even allow for a nice little cleanup > because we can then cancel inquiries in hci_acl_create_connection() > synchronously, too. > > Question is just whether there's any devices out there that actually do > support paging with more than a single device at a time and if we want > to support that? I think the controller would serialize the paging anyway, so the fact that we would serialize it on the host side might actually means that we have a common behavior across controllers rather than having to handle errors when the controller is not capable of serializing connections by themselves. > > > >>> > >>>> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> > >>>> --- > >>>> net/bluetooth/hci_event.c | 7 ++++--- > >>>> 1 file changed, 4 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c > >>>> index e8b4a0126..e1f5b6f90 100644 > >>>> --- a/net/bluetooth/hci_event.c > >>>> +++ b/net/bluetooth/hci_event.c > >>>> @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status) > >>>> > >>>> if (status) { > >>>> if (conn && conn->state == BT_CONNECT) { > >>>> - if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) { > >>>> + if (status == HCI_ERROR_COMMAND_DISALLOWED) { > >>>> + conn->state = BT_CONNECT2; > >>>> + } else { > >>>> conn->state = BT_CLOSED; > >>>> hci_connect_cfm(conn, status); > >>>> hci_conn_del(conn); > >>>> - } else > >>>> - conn->state = BT_CONNECT2; > >>>> + } > >>>> } > >>>> } else { > >>>> if (!conn) { > >>>> -- > >>>> 2.43.0 > >>>> > >>> > >>> > >> > >> Cheers, > >> Jonas > > > > > > > > -- > > Luiz Augusto von Dentz
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index e8b4a0126..e1f5b6f90 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status) if (status) { if (conn && conn->state == BT_CONNECT) { - if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) { + if (status == HCI_ERROR_COMMAND_DISALLOWED) { + conn->state = BT_CONNECT2; + } else { conn->state = BT_CLOSED; hci_connect_cfm(conn, status); hci_conn_del(conn); - } else - conn->state = BT_CONNECT2; + } } } else { if (!conn) {
Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting later when we get a "Command Disallowed" error returned by "Create Connection". In this commit the intention was to retry only once, and give up if we see "Command Disallowed" again on the second try. This made sense back then when the retry was initiated *only* from the "Connect Complete" event. If we received that event, we knew that now the card now must have a "free slot" for a new connection request again. These days we call hci_conn_check_pending() from a few more places though, and in these places we can't really be sure that there's a "free slot" on the card, so the second try to "Create Connection" might fail again. Deal with this by being less strict about these retries and try again every time we get "Command Disallowed" errors, removing the limitation to only two attempts. Since this can potentially cause us to enter an endless cycle of reconnection attempts, we'll add some guarding against that with the next commit. Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> --- net/bluetooth/hci_event.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)