Message ID | 20221029202454.25651-3-swyterzone@gmail.com |
---|---|
State | New |
Headers | show |
Series | [1/3] Bluetooth: btusb: Fix Chinese CSR dongles again by re-adding ERR_DATA_REPORTING quirk | expand |
Hi Ismael, On Sat, Oct 29, 2022 at 1:25 PM Ismael Ferreras Morezuelas <swyterzone@gmail.com> wrote: > > A few users have reported that their cloned Chinese dongle doesn't > work well with the hack Hans de Goede added, that tries this > off-on mechanism as a way to unfreeze them. > > It's still more than worthwhile to have it, as in the vast majority > of cases it either completely brings dongles to life or just resets > them harmlessly as it already happens during normal USB operation. > > This is nothing new and the controllers are expected to behave > correctly. But yeah, go figure. :) > > For that unhappy minority we can easily handle this edge case by letting > users disable it via our «btusb.disable_fake_csr_forcesuspend_hack=1» kernel option. Don't really like the idea of adding module parameter for device specific problem. > I believe this is the most generic way of doing it, given the constraints > and by still having a good out-of-the-box experience. > > No clone left behind. > > Cc: stable@vger.kernel.org > Cc: Hans de Goede <hdegoede@redhat.com> > Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com> > --- > drivers/bluetooth/btusb.c | 31 +++++++++++++++++++------------ > 1 file changed, 19 insertions(+), 12 deletions(-) > > diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c > index 8f34bf195bae..d31d4f925463 100644 > --- a/drivers/bluetooth/btusb.c > +++ b/drivers/bluetooth/btusb.c > @@ -34,6 +34,7 @@ static bool force_scofix; > static bool enable_autosuspend = IS_ENABLED(CONFIG_BT_HCIBTUSB_AUTOSUSPEND); > static bool enable_poll_sync = IS_ENABLED(CONFIG_BT_HCIBTUSB_POLL_SYNC); > static bool reset = true; > +static bool disable_fake_csr_forcesuspend_hack; > > static struct usb_driver btusb_driver; > > @@ -2171,7 +2172,7 @@ static int btusb_setup_csr(struct hci_dev *hdev) > is_fake = true; > > if (is_fake) { > - bt_dev_warn(hdev, "CSR: Unbranded CSR clone detected; adding workarounds and force-suspending once..."); > + bt_dev_warn(hdev, "CSR: Unbranded CSR clone detected; adding workarounds..."); > > /* Generally these clones have big discrepancies between > * advertised features and what's actually supported. > @@ -2215,21 +2216,24 @@ static int btusb_setup_csr(struct hci_dev *hdev) > * apply this initialization quirk to every controller that gets here, > * it should be harmless. The alternative is to not work at all. > */ > - pm_runtime_allow(&data->udev->dev); > + if (!disable_fake_csr_forcesuspend_hack) { > + bt_dev_warn(hdev, "CSR: Unbranded CSR clone detected; force-suspending once..."); > + pm_runtime_allow(&data->udev->dev); > > - ret = pm_runtime_suspend(&data->udev->dev); > - if (ret >= 0) > - msleep(200); > - else > - bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); > + ret = pm_runtime_suspend(&data->udev->dev); > + if (ret >= 0) > + msleep(200); > + else > + bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); Is this specific to Barrot 8041a02? Why don't we add a quirk then? > - pm_runtime_forbid(&data->udev->dev); > + pm_runtime_forbid(&data->udev->dev); > > - device_set_wakeup_capable(&data->udev->dev, false); > + device_set_wakeup_capable(&data->udev->dev, false); > > - /* Re-enable autosuspend if this was requested */ > - if (enable_autosuspend) > - usb_enable_autosuspend(data->udev); > + /* Re-enable autosuspend if this was requested */ > + if (enable_autosuspend) > + usb_enable_autosuspend(data->udev); > + } > } > > kfree_skb(skb); > @@ -4312,6 +4316,9 @@ MODULE_PARM_DESC(enable_autosuspend, "Enable USB autosuspend by default"); > module_param(reset, bool, 0644); > MODULE_PARM_DESC(reset, "Send HCI reset command on initialization"); > > +module_param(disable_fake_csr_forcesuspend_hack, bool, 0644); > +MODULE_PARM_DESC(disable_fake_csr_forcesuspend_hack, "Don't indiscriminately force-suspend Chinese-cloned CSR dongles trying to unfreeze them"); > + > MODULE_AUTHOR("Marcel Holtmann <marcel@holtmann.org>"); > MODULE_DESCRIPTION("Generic Bluetooth USB driver ver " VERSION); > MODULE_VERSION(VERSION); > -- > 2.38.1 >
On 09/11/2022 21:49, Luiz Augusto von Dentz wrote: > Hi Ismael, > > On Sat, Oct 29, 2022 at 1:25 PM Ismael Ferreras Morezuelas > <swyterzone@gmail.com> wrote: >> >> A few users have reported that their cloned Chinese dongle doesn't >> work well with the hack Hans de Goede added, that tries this >> off-on mechanism as a way to unfreeze them. >> >> It's still more than worthwhile to have it, as in the vast majority >> of cases it either completely brings dongles to life or just resets >> them harmlessly as it already happens during normal USB operation. >> >> This is nothing new and the controllers are expected to behave >> correctly. But yeah, go figure. :) >> >> For that unhappy minority we can easily handle this edge case by letting >> users disable it via our «btusb.disable_fake_csr_forcesuspend_hack=1» kernel option. > > Don't really like the idea of adding module parameter for device > specific problem. It's not for a single device, it's for a whole class of unnamed devices that are currently screwed even after all this. >> - ret = pm_runtime_suspend(&data->udev->dev); >> - if (ret >= 0) >> - msleep(200); >> - else >> - bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); >> + ret = pm_runtime_suspend(&data->udev->dev); >> + if (ret >= 0) >> + msleep(200); >> + else >> + bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); > > Is this specific to Barrot 8041a02? Why don't we add a quirk then? > We don't know how specific it is, we suspect the getting stuck thing happens with Barrot controllers, but in this world of lasered-out counterfeit chip IDs you can never be sure. Unless someone decaps them. Hans added that name because it's the closest thing we have, but this applies to a lot of chips. So much that now we do the hack by default, for very good reasons. So please reconsider, this closes the gap. With this last patch we go from ~+90% to almost ~100%, as the rest of generic quirks we added don't really hurt; even if a particular dongle only needs a few of the zoo of quirks we set, it's alright if we vaccinate them against all of these, except some are "allergic" against this particular "vaccine". Let people skip this one. :-) You know how normal BT controllers are utterly and inconsistently broken, now imagine you have a whole host of vendors reusing a VID/PID/version/subversion, masking as a CSR for bizarre reasons to avoid paying any USB-IF fees, or whatever. That's what we are fighting against here.
Hi Swyter, On Wed, Nov 9, 2022 at 1:30 PM Swyter <swyterzone@gmail.com> wrote: > > On 09/11/2022 21:49, Luiz Augusto von Dentz wrote: > > Hi Ismael, > > > > On Sat, Oct 29, 2022 at 1:25 PM Ismael Ferreras Morezuelas > > <swyterzone@gmail.com> wrote: > >> > >> A few users have reported that their cloned Chinese dongle doesn't > >> work well with the hack Hans de Goede added, that tries this > >> off-on mechanism as a way to unfreeze them. > >> > >> It's still more than worthwhile to have it, as in the vast majority > >> of cases it either completely brings dongles to life or just resets > >> them harmlessly as it already happens during normal USB operation. > >> > >> This is nothing new and the controllers are expected to behave > >> correctly. But yeah, go figure. :) > >> > >> For that unhappy minority we can easily handle this edge case by letting > >> users disable it via our «btusb.disable_fake_csr_forcesuspend_hack=1» kernel option. > > > > Don't really like the idea of adding module parameter for device > > specific problem. > > It's not for a single device, it's for a whole class of unnamed devices > that are currently screwed even after all this. > > > >> - ret = pm_runtime_suspend(&data->udev->dev); > >> - if (ret >= 0) > >> - msleep(200); > >> - else > >> - bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); > >> + ret = pm_runtime_suspend(&data->udev->dev); > >> + if (ret >= 0) > >> + msleep(200); > >> + else > >> + bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); > > > > Is this specific to Barrot 8041a02? Why don't we add a quirk then? > > > > We don't know how specific it is, we suspect the getting stuck thing happens with Barrot controllers, > but in this world of lasered-out counterfeit chip IDs you can never be sure. Unless someone decaps them. > > Hans added that name because it's the closest thing we have, but this applies to a lot of chips. > So much that now we do the hack by default, for very good reasons. > > So please reconsider, this closes the gap. > > With this last patch we go from ~+90% to almost ~100%, as the rest of generic quirks we added > don't really hurt; even if a particular dongle only needs a few of the zoo of quirks we set, > it's alright if we vaccinate them against all of these, except some are "allergic" > against this particular "vaccine". Let people skip this one. :-) > > You know how normal BT controllers are utterly and inconsistently broken, now imagine you have a whole host > of vendors reusing a VID/PID/version/subversion, masking as a CSR for bizarre reasons to avoid paying > any USB-IF fees, or whatever. That's what we are fighting against here. I see, but for suspend in particular, can't we actually handle it somehow? I mean if we can detect the controller is getting stuck and print some information and flip the quirk? Otherwise Im afraid this parameter will end up always being set by distros to avoid suspend problems.
On 09/11/2022 23:39, Luiz Augusto von Dentz wrote: >>> Is this specific to Barrot 8041a02? Why don't we add a quirk then? >>> >> >> We don't know how specific it is, we suspect the getting stuck thing happens with Barrot controllers, >> but in this world of lasered-out counterfeit chip IDs you can never be sure. Unless someone decaps them. >> >> Hans added that name because it's the closest thing we have, but this applies to a lot of chips. >> So much that now we do the hack by default, for very good reasons. >> >> So please reconsider, this closes the gap. >> >> With this last patch we go from ~+90% to almost ~100%, as the rest of generic quirks we added >> don't really hurt; even if a particular dongle only needs a few of the zoo of quirks we set, >> it's alright if we vaccinate them against all of these, except some are "allergic" >> against this particular "vaccine". Let people skip this one. :-) >> >> You know how normal BT controllers are utterly and inconsistently broken, now imagine you have a whole host >> of vendors reusing a VID/PID/version/subversion, masking as a CSR for bizarre reasons to avoid paying >> any USB-IF fees, or whatever. That's what we are fighting against here. > > I see, but for suspend in particular, can't we actually handle it > somehow? I mean if we can detect the controller is getting stuck and > print some information and flip the quirk? Otherwise Im afraid this > parameter will end up always being set by distros to avoid suspend > problems. Maybe, auto-detection is certainly a better way and a potential improvement, assuming we cover all the edge cases. Which I'm not too sure about. The controllers don't get totally stuck, they just act weird and give funky HCI responses at certain points if we don't do this, if I remember correctly. Unfortunately I can't really justify spending that much time *right now* on this hobby project. Distros should *definitely* keep doing the hack by default if they want the widest compatibility. This is comparatively a niche issue. But yeah, to sum things up; I'm not sure going back to a whitelist of a whitelist is a good idea without a foolproof method. We want the widest possible reach by doing it in the most generic way with the smallest possible side effects. If we can find a way to blacklist this quirk when we are super sure it's going to cause issues I'm all for it. Right now I think this is an acceptable solution, as long as people in charge of distros don't flip these toggles. Nobody has cared until now, barely any of these devices worked. Now that most of them work it would be very funny to see them break the majority to fix a few. With the amount of effort it takes an outsider like me to get stuff into the kernel and fight to avoid random reverts I don't know if I'd be able to get something as involved as that to work in a satisfying, automatic and simple way. Perfect is the enemy of good, diminishing returns, and all that. ¯\_(ツ)_/¯
Hi Luiz, On 11/9/22 23:39, Luiz Augusto von Dentz wrote: > Hi Swyter, > > On Wed, Nov 9, 2022 at 1:30 PM Swyter <swyterzone@gmail.com> wrote: >> >> On 09/11/2022 21:49, Luiz Augusto von Dentz wrote: >>> Hi Ismael, >>> >>> On Sat, Oct 29, 2022 at 1:25 PM Ismael Ferreras Morezuelas >>> <swyterzone@gmail.com> wrote: >>>> >>>> A few users have reported that their cloned Chinese dongle doesn't >>>> work well with the hack Hans de Goede added, that tries this >>>> off-on mechanism as a way to unfreeze them. >>>> >>>> It's still more than worthwhile to have it, as in the vast majority >>>> of cases it either completely brings dongles to life or just resets >>>> them harmlessly as it already happens during normal USB operation. >>>> >>>> This is nothing new and the controllers are expected to behave >>>> correctly. But yeah, go figure. :) >>>> >>>> For that unhappy minority we can easily handle this edge case by letting >>>> users disable it via our «btusb.disable_fake_csr_forcesuspend_hack=1» kernel option. >>> >>> Don't really like the idea of adding module parameter for device >>> specific problem. >> >> It's not for a single device, it's for a whole class of unnamed devices >> that are currently screwed even after all this. >> >> >>>> - ret = pm_runtime_suspend(&data->udev->dev); >>>> - if (ret >= 0) >>>> - msleep(200); >>>> - else >>>> - bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); >>>> + ret = pm_runtime_suspend(&data->udev->dev); >>>> + if (ret >= 0) >>>> + msleep(200); >>>> + else >>>> + bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); >>> >>> Is this specific to Barrot 8041a02? Why don't we add a quirk then? >>> >> >> We don't know how specific it is, we suspect the getting stuck thing happens with Barrot controllers, >> but in this world of lasered-out counterfeit chip IDs you can never be sure. Unless someone decaps them. >> >> Hans added that name because it's the closest thing we have, but this applies to a lot of chips. >> So much that now we do the hack by default, for very good reasons. >> >> So please reconsider, this closes the gap. >> >> With this last patch we go from ~+90% to almost ~100%, as the rest of generic quirks we added >> don't really hurt; even if a particular dongle only needs a few of the zoo of quirks we set, >> it's alright if we vaccinate them against all of these, except some are "allergic" >> against this particular "vaccine". Let people skip this one. :-) >> >> You know how normal BT controllers are utterly and inconsistently broken, now imagine you have a whole host >> of vendors reusing a VID/PID/version/subversion, masking as a CSR for bizarre reasons to avoid paying >> any USB-IF fees, or whatever. That's what we are fighting against here. > > I see, but for suspend in particular, can't we actually handle it > somehow? Note this is not about suspend. This is about a workaround where we runtime-suspend the HCI once, before initializing it and then actually disable runtime suspend (unless BT is turned off) because the USB remote wakeup functionality is also broken on these. IIRC the problem was that without the runtime suspend the HCI seems to work, but any events from the HCI to the host which are not direct responses to a request from the host, like a known previously paired BT device trying to connect would not get registered by the HCI. At least not until the host actually requested something from the HCI, e.g. starting a scan for new devices would all of a sudden cause the known paired device to show up. My theory here is that during init the Windows drivers at least runtime suspend the HCI once (or leave it idle long enough for Windows to do this) and somehow these clones rely on that for setting up some of their host notification functionality. > I mean if we can detect the controller is getting stuck and > print some information and flip the quirk? Detection of devices which need the "runtime-suspend once" workaround is almost impossible since the problem is the lack of unsolicited messages from the HCI, which can be normal if no BT "clients" are in use during probe. > Otherwise Im afraid this > parameter will end up always being set by distros to avoid suspend > problems. The flag is Swyter is suggesting is actually to disable the workaround, which is currently enabled for all USB BT dongles identified as being a CSR clone. Most clones work fine with it, with many needing it, but some clones seem to not like the workaround and behave undesirable if we runtime suspend them once. So the flag is to disable the workaround to make these last 5% of these (really cheap, bottom of the barrel quality) clones work. Since 95% of the clones do work with the workaround distro's enabling the flag by default would be a bad idea and I don't expect them to do this. Regards, Hans
On 09/11/2022 23:39, Luiz Augusto von Dentz wrote: > I see, but for suspend in particular, can't we actually handle it > somehow? I mean if we can detect the controller is getting stuck and > print some information and flip the quirk? Otherwise Im afraid this > parameter will end up always being set by distros to avoid suspend > problems. Hi, Luiz. I haven't seen any movement about the [3/3] patch since a few weeks ago. Given what Hans clarified in his reply, I wondered if you or any of the other Bluetooth maintainers have changed opinions about including this in some form. I'm a kernel development newbie, so I'm not good at this. I don't know if I should do anything else, wait a bit more, or just drop this. Thanks in advance. :)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 8f34bf195bae..d31d4f925463 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -34,6 +34,7 @@ static bool force_scofix; static bool enable_autosuspend = IS_ENABLED(CONFIG_BT_HCIBTUSB_AUTOSUSPEND); static bool enable_poll_sync = IS_ENABLED(CONFIG_BT_HCIBTUSB_POLL_SYNC); static bool reset = true; +static bool disable_fake_csr_forcesuspend_hack; static struct usb_driver btusb_driver; @@ -2171,7 +2172,7 @@ static int btusb_setup_csr(struct hci_dev *hdev) is_fake = true; if (is_fake) { - bt_dev_warn(hdev, "CSR: Unbranded CSR clone detected; adding workarounds and force-suspending once..."); + bt_dev_warn(hdev, "CSR: Unbranded CSR clone detected; adding workarounds..."); /* Generally these clones have big discrepancies between * advertised features and what's actually supported. @@ -2215,21 +2216,24 @@ static int btusb_setup_csr(struct hci_dev *hdev) * apply this initialization quirk to every controller that gets here, * it should be harmless. The alternative is to not work at all. */ - pm_runtime_allow(&data->udev->dev); + if (!disable_fake_csr_forcesuspend_hack) { + bt_dev_warn(hdev, "CSR: Unbranded CSR clone detected; force-suspending once..."); + pm_runtime_allow(&data->udev->dev); - ret = pm_runtime_suspend(&data->udev->dev); - if (ret >= 0) - msleep(200); - else - bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); + ret = pm_runtime_suspend(&data->udev->dev); + if (ret >= 0) + msleep(200); + else + bt_dev_warn(hdev, "CSR: Couldn't suspend the device for our Barrot 8041a02 receive-issue workaround"); - pm_runtime_forbid(&data->udev->dev); + pm_runtime_forbid(&data->udev->dev); - device_set_wakeup_capable(&data->udev->dev, false); + device_set_wakeup_capable(&data->udev->dev, false); - /* Re-enable autosuspend if this was requested */ - if (enable_autosuspend) - usb_enable_autosuspend(data->udev); + /* Re-enable autosuspend if this was requested */ + if (enable_autosuspend) + usb_enable_autosuspend(data->udev); + } } kfree_skb(skb); @@ -4312,6 +4316,9 @@ MODULE_PARM_DESC(enable_autosuspend, "Enable USB autosuspend by default"); module_param(reset, bool, 0644); MODULE_PARM_DESC(reset, "Send HCI reset command on initialization"); +module_param(disable_fake_csr_forcesuspend_hack, bool, 0644); +MODULE_PARM_DESC(disable_fake_csr_forcesuspend_hack, "Don't indiscriminately force-suspend Chinese-cloned CSR dongles trying to unfreeze them"); + MODULE_AUTHOR("Marcel Holtmann <marcel@holtmann.org>"); MODULE_DESCRIPTION("Generic Bluetooth USB driver ver " VERSION); MODULE_VERSION(VERSION);
A few users have reported that their cloned Chinese dongle doesn't work well with the hack Hans de Goede added, that tries this off-on mechanism as a way to unfreeze them. It's still more than worthwhile to have it, as in the vast majority of cases it either completely brings dongles to life or just resets them harmlessly as it already happens during normal USB operation. This is nothing new and the controllers are expected to behave correctly. But yeah, go figure. :) For that unhappy minority we can easily handle this edge case by letting users disable it via our «btusb.disable_fake_csr_forcesuspend_hack=1» kernel option. I believe this is the most generic way of doing it, given the constraints and by still having a good out-of-the-box experience. No clone left behind. Cc: stable@vger.kernel.org Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com> --- drivers/bluetooth/btusb.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-)