Message ID | 20210825170902.11234-1-novikov@ispras.ru |
---|---|
State | New |
Headers | show |
Series | usb: ehci-orion: Handle errors of clk_prepare_enable() in probe | expand |
On Wed, Aug 25, 2021 at 08:09:02PM +0300, Evgeny Novikov wrote: > ehci_orion_drv_probe() did not account for possible errors of > clk_prepare_enable() Hi Evgeny Your patch looks good. Reviewed-by: Andrew Lunn <andrew@lunn.ch> At the time this was added, clk_prepare_enable() could not actually fail, the clocks are built in, there was no error path that could trigger. I've no idea if this is still true, so please do have this patch merged. Andrew
Hi Alan, On 25.08.2021 20:29, Alan Stern wrote: > On Wed, Aug 25, 2021 at 08:09:02PM +0300, Evgeny Novikov wrote: >> ehci_orion_drv_probe() did not account for possible errors of >> clk_prepare_enable() that in particular could cause invocation of >> clk_disable_unprepare() on clocks that were not prepared/enabled yet, >> e.g. in remove or on handling errors of usb_add_hcd() in probe. Though, >> there were several patches fixing different issues with clocks in this >> driver, they did not solve this problem. >> >> Add handling of errors of clk_prepare_enable() in ehci_orion_drv_probe() >> to avoid calls of clk_disable_unprepare() without previous successful >> invocation of clk_prepare_enable(). >> >> Found by Linux Driver Verification project (linuxtesting.org). >> >> Fixes: 8c869edaee07 ("ARM: Orion: EHCI: Add support for enabling clocks") >> Signed-off-by: Evgeny Novikov <novikov@ispras.ru> >> Co-developed-by: Kirill Shilimanov <kirill.shilimanov@huawei.com> >> Signed-off-by: Kirill Shilimanov <kirill.shilimanov@huawei.com> >> --- > Acked-by: Alan Stern <stern@rowland.harvard.edu> > > Do you intend to submit patches for the other EHCI drivers that don't > check for errors in clk_prepare_enable()? It looks like > ehci-atmel.c, ehci-mv.c, and ehci-spear.c all need some attention. > > The same is true for a bunch of the OHCI drivers: ohci-at91.c, > ohci-exynos.c, ohci-s3c2410.c, and ohci-spear.c. > > Didn't the Linux Driver Verification project identify this problem in > all of these drivers? Our verification framework report numerous issues like the one for which I sent the given patch. There are many warnings for different USB drivers and other types of device drivers as well. We sent several patches that were acknowledged by the developers already, though, after the Andrew's reply [1] I have doubts that we need to treat these warnings as potential bugs and fix them. The verification framework performs static analysis in a way that I described before [2]. Regarding the clock API it uses such models of clk_prepare() and clk_enable() that can fail independently on underlying hardware since is not easy to either model all such hardware or try to relate and consider corresponding drivers in addition to drivers using clocks at verification. Thus, potentially the verification framework can produce warnings for all drivers that invoke clk_prepare(), clk_enable() or clk_prepare_enable(), but do not check for their return values. I look forward whether you will confirm that it makes sense to handle errors from mentioned functions anyway or it would be better not to sent such bug reports unless we will be strictly sure that they can happen. In the former case it would be better if somebody will teach built-in Linux kernel static analyzers to detect these issues on a regular basis. [1] https://lkml.org/lkml/2021/8/25/794 [2] https://lkml.org/lkml/2021/8/17/239 Best regards, Evgeny Novikov > Alan Stern > >> drivers/usb/host/ehci-orion.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/usb/host/ehci-orion.c b/drivers/usb/host/ehci-orion.c >> index a319b1df3011..3626758b3e2a 100644 >> --- a/drivers/usb/host/ehci-orion.c >> +++ b/drivers/usb/host/ehci-orion.c >> @@ -264,8 +264,11 @@ static int ehci_orion_drv_probe(struct platform_device *pdev) >> * the clock does not exists. >> */ >> priv->clk = devm_clk_get(&pdev->dev, NULL); >> - if (!IS_ERR(priv->clk)) >> - clk_prepare_enable(priv->clk); >> + if (!IS_ERR(priv->clk)) { >> + err = clk_prepare_enable(priv->clk); >> + if (err) >> + goto err_put_hcd; >> + } >> >> priv->phy = devm_phy_optional_get(&pdev->dev, "usb"); >> if (IS_ERR(priv->phy)) { >> @@ -311,6 +314,7 @@ static int ehci_orion_drv_probe(struct platform_device *pdev) >> err_dis_clk: >> if (!IS_ERR(priv->clk)) >> clk_disable_unprepare(priv->clk); >> +err_put_hcd: >> usb_put_hcd(hcd); >> err: >> dev_err(&pdev->dev, "init %s fail, %d\n", >> -- >> 2.26.2 >>
On Thu, Aug 26, 2021 at 04:30:22PM +0300, Evgeny Novikov wrote: > Hi Alan, > > On 25.08.2021 20:29, Alan Stern wrote: > > On Wed, Aug 25, 2021 at 08:09:02PM +0300, Evgeny Novikov wrote: > > > ehci_orion_drv_probe() did not account for possible errors of > > > clk_prepare_enable() that in particular could cause invocation of > > > clk_disable_unprepare() on clocks that were not prepared/enabled yet, > > > e.g. in remove or on handling errors of usb_add_hcd() in probe. Though, > > > there were several patches fixing different issues with clocks in this > > > driver, they did not solve this problem. > > > > > > Add handling of errors of clk_prepare_enable() in ehci_orion_drv_probe() > > > to avoid calls of clk_disable_unprepare() without previous successful > > > invocation of clk_prepare_enable(). > > > > > > Found by Linux Driver Verification project (linuxtesting.org). > > > > > > Fixes: 8c869edaee07 ("ARM: Orion: EHCI: Add support for enabling clocks") > > > Signed-off-by: Evgeny Novikov <novikov@ispras.ru> > > > Co-developed-by: Kirill Shilimanov <kirill.shilimanov@huawei.com> > > > Signed-off-by: Kirill Shilimanov <kirill.shilimanov@huawei.com> > > > --- > > Acked-by: Alan Stern <stern@rowland.harvard.edu> > > > > Do you intend to submit patches for the other EHCI drivers that don't > > check for errors in clk_prepare_enable()? It looks like > > ehci-atmel.c, ehci-mv.c, and ehci-spear.c all need some attention. > > > > The same is true for a bunch of the OHCI drivers: ohci-at91.c, > > ohci-exynos.c, ohci-s3c2410.c, and ohci-spear.c. > > > > Didn't the Linux Driver Verification project identify this problem in > > all of these drivers? > Our verification framework report numerous issues like the one for which I > sent the given patch. There are many warnings for different USB drivers and > other types of device drivers as well. We sent several patches that were > acknowledged by the developers already, though, after the Andrew's reply [1] > I have doubts that we need to treat these warnings as potential bugs and fix > them. The verification framework performs static analysis in a way that I > described before [2]. Regarding the clock API it uses such models of > clk_prepare() and clk_enable() that can fail independently on underlying > hardware since is not easy to either model all such hardware or try to > relate and consider corresponding drivers in addition to drivers using > clocks at verification. Thus, potentially the verification framework can > produce warnings for all drivers that invoke clk_prepare(), clk_enable() or > clk_prepare_enable(), but do not check for their return values. > > I look forward whether you will confirm that it makes sense to handle errors > from mentioned functions anyway or it would be better not to sent such bug > reports unless we will be strictly sure that they can happen. In the former > case it would be better if somebody will teach built-in Linux kernel static > analyzers to detect these issues on a regular basis. I don't know whether these errors can occur or not. To find out, you need to ask someone who knows more about the clock framework. On the other hand, the fact that the functions do return an error code means that they expect callers to check its value. In fact, whoever changed the API should have gone through all the callers to make sure they did so. (Let's put it this way: If those functions can return an error, we should check the return code. If they can't return an error then they shouldn't be defined to return an int, so the API should be changed.) So on the whole, I think making these changes would be a good thing. At the very least, it will help make all the different EHCI and OHCI drivers consistent with each other. Alan Stern > [1] https://lkml.org/lkml/2021/8/25/794 > [2] https://lkml.org/lkml/2021/8/17/239 > > Best regards, > Evgeny Novikov
On 26.08.2021 18:24, Alan Stern wrote: > On Thu, Aug 26, 2021 at 04:30:22PM +0300, Evgeny Novikov wrote: >> Hi Alan, >> >> On 25.08.2021 20:29, Alan Stern wrote: >>> On Wed, Aug 25, 2021 at 08:09:02PM +0300, Evgeny Novikov wrote: >>>> ehci_orion_drv_probe() did not account for possible errors of >>>> clk_prepare_enable() that in particular could cause invocation of >>>> clk_disable_unprepare() on clocks that were not prepared/enabled yet, >>>> e.g. in remove or on handling errors of usb_add_hcd() in probe. Though, >>>> there were several patches fixing different issues with clocks in this >>>> driver, they did not solve this problem. >>>> >>>> Add handling of errors of clk_prepare_enable() in ehci_orion_drv_probe() >>>> to avoid calls of clk_disable_unprepare() without previous successful >>>> invocation of clk_prepare_enable(). >>>> >>>> Found by Linux Driver Verification project (linuxtesting.org). >>>> >>>> Fixes: 8c869edaee07 ("ARM: Orion: EHCI: Add support for enabling clocks") >>>> Signed-off-by: Evgeny Novikov <novikov@ispras.ru> >>>> Co-developed-by: Kirill Shilimanov <kirill.shilimanov@huawei.com> >>>> Signed-off-by: Kirill Shilimanov <kirill.shilimanov@huawei.com> >>>> --- >>> Acked-by: Alan Stern <stern@rowland.harvard.edu> >>> >>> Do you intend to submit patches for the other EHCI drivers that don't >>> check for errors in clk_prepare_enable()? It looks like >>> ehci-atmel.c, ehci-mv.c, and ehci-spear.c all need some attention. >>> >>> The same is true for a bunch of the OHCI drivers: ohci-at91.c, >>> ohci-exynos.c, ohci-s3c2410.c, and ohci-spear.c. >>> >>> Didn't the Linux Driver Verification project identify this problem in >>> all of these drivers? >> Our verification framework report numerous issues like the one for which I >> sent the given patch. There are many warnings for different USB drivers and >> other types of device drivers as well. We sent several patches that were >> acknowledged by the developers already, though, after the Andrew's reply [1] >> I have doubts that we need to treat these warnings as potential bugs and fix >> them. The verification framework performs static analysis in a way that I >> described before [2]. Regarding the clock API it uses such models of >> clk_prepare() and clk_enable() that can fail independently on underlying >> hardware since is not easy to either model all such hardware or try to >> relate and consider corresponding drivers in addition to drivers using >> clocks at verification. Thus, potentially the verification framework can >> produce warnings for all drivers that invoke clk_prepare(), clk_enable() or >> clk_prepare_enable(), but do not check for their return values. >> >> I look forward whether you will confirm that it makes sense to handle errors >> from mentioned functions anyway or it would be better not to sent such bug >> reports unless we will be strictly sure that they can happen. In the former >> case it would be better if somebody will teach built-in Linux kernel static >> analyzers to detect these issues on a regular basis. > I don't know whether these errors can occur or not. To find out, you need to > ask someone who knows more about the clock framework. > > On the other hand, the fact that the functions do return an error code means > that they expect callers to check its value. In fact, whoever changed the API > should have gone through all the callers to make sure they did so. > > (Let's put it this way: If those functions can return an error, we should > check the return code. If they can't return an error then they shouldn't be > defined to return an int, so the API should be changed.) > > So on the whole, I think making these changes would be a good thing. At the > very least, it will help make all the different EHCI and OHCI drivers > consistent with each other. I agree with your reasoning. Even though today these bugs can not happen since appropriate hardware and its drivers do not fail in a "necessary" way ever, this can suddenly change in the future. Maybe it will not be so easy to track the root causes especially taking into account that failures happen very seldom. I am unready to make any considerable changes in the API. Also, as I mentioned before, it would be better if a bunch of appropriate fixes will be prepared by somebody who involved in development of the Linux kernel much more than me and who can automatize the process of finding such issues using static analysis tools. Most likely those tools are able to find all such places very quickly even for drivers on very specific architectures. I added Dan to the discussion since he is a developer of one of such tools. Best regards, Evgeny Novikov > Alan Stern > >> [1] https://lkml.org/lkml/2021/8/25/794 >> [2] https://lkml.org/lkml/2021/8/17/239 >> >> Best regards, >> Evgeny Novikov
On Thu, Aug 26, 2021 at 07:26:22PM +0300, Evgeny Novikov wrote: > I added Dan to the discussion since he is a developer of one of such > tools. Thanks for that... :P I never warn about "forgot to check the return" bugs except in the case of error pointers or allocation failures. There are too many false positives. If people want to do that they should add a __must_check attribute to the function. You linked to another thread: https://lkml.org/lkml/2021/8/17/239 That patch isn't correct. Miquel was on the right track but not 100%. The nand_scan() calls mxic_nfc_clk_enable() so we should disable it until it has been successfully enabled. The current code can trigger a WARN() in __clk_disable(). In other words it should have been: diff --git a/drivers/mtd/nand/raw/mxic_nand.c b/drivers/mtd/nand/raw/mxic_nand.c index da1070993994..87aef98f5b9d 100644 --- a/drivers/mtd/nand/raw/mxic_nand.c +++ b/drivers/mtd/nand/raw/mxic_nand.c @@ -535,11 +535,11 @@ static int mxic_nfc_probe(struct platform_device *pdev) err = devm_request_irq(&pdev->dev, irq, mxic_nfc_isr, 0, "mxic-nfc", nfc); if (err) - goto fail; + return err; err = nand_scan(nand_chip, 1); if (err) - goto fail; + return err; err = mtd_device_register(mtd, NULL, 0); if (err) nand_scan() error handling is still leaky, but it's hard to fix without re-working the API. Anyway, thanks for the fixes. I've been inspired by the Linux Driver Verification project work. regards, dan carpenter
Hi Dan, On 27.08.2021 14:51, Dan Carpenter wrote: > On Thu, Aug 26, 2021 at 07:26:22PM +0300, Evgeny Novikov wrote: >> I added Dan to the discussion since he is a developer of one of such >> tools. > Thanks for that... :P > > I never warn about "forgot to check the return" bugs except in the case > of error pointers or allocation failures. There are too many false > positives. If people want to do that they should add a __must_check > attribute to the function. Maybe you will be able to convince the developers of the clock framework to add this attribute to some of their functions. For instance, this is already the case, say, for clk_bulk_prepare() and clk_bulk_enable() that seem to represent a number of clk_prepare() and clk_enable(). For those functions the __must_check attribute was added in commit 6e0d4ff4580c without providing any specific reasons why it is necessary for them and is not necessary for usual clk_prepare() and clk_enable(). > You linked to another thread: https://lkml.org/lkml/2021/8/17/239 > > That patch isn't correct. Miquel was on the right track but not 100%. > The nand_scan() calls mxic_nfc_clk_enable() so we should disable it > until it has been successfully enabled. The current code can trigger a > WARN() in __clk_disable(). In other words it should have been: > > diff --git a/drivers/mtd/nand/raw/mxic_nand.c b/drivers/mtd/nand/raw/mxic_nand.c > index da1070993994..87aef98f5b9d 100644 > --- a/drivers/mtd/nand/raw/mxic_nand.c > +++ b/drivers/mtd/nand/raw/mxic_nand.c > @@ -535,11 +535,11 @@ static int mxic_nfc_probe(struct platform_device *pdev) > err = devm_request_irq(&pdev->dev, irq, mxic_nfc_isr, > 0, "mxic-nfc", nfc); > if (err) > - goto fail; > + return err; > > err = nand_scan(nand_chip, 1); > if (err) > - goto fail; > + return err; > > err = mtd_device_register(mtd, NULL, 0); > if (err) Thank you for this explanation. Now I understand better the Miquel's comment. Nevertheless, I still have doubts that your fix is completely correct sinceĀ mxic_nfc_set_freq() invokes mxic_nfc_clk_disable() first that still should raise a warning. It seems that the driver developers are looking on this issue, so, let's wait a bug fix from them. At least they will be able to test that everything is okay after all. > nand_scan() error handling is still leaky, but it's hard to fix without > re-working the API. > > Anyway, thanks for the fixes. I've been inspired by the Linux Driver > Verification project work. It would be great to collaborate with each other. For instance, for the aforementioned clock API your tool can perform better checking and find more potential bugs in some (maybe even all) cases due to a number of reasons. Unless it will be possible to detect all target issues automatically with static analysis tools, we can try to reveal some of the remaining ones with our heavyweight approach. Best regards, Evgeny Novikov > regards, > dan carpenter
Hi Alan, On 26.08.2021 18:24, Alan Stern wrote: > On Thu, Aug 26, 2021 at 04:30:22PM +0300, Evgeny Novikov wrote: >> Hi Alan, >> >> On 25.08.2021 20:29, Alan Stern wrote: >>> On Wed, Aug 25, 2021 at 08:09:02PM +0300, Evgeny Novikov wrote: >>>> ehci_orion_drv_probe() did not account for possible errors of >>>> clk_prepare_enable() that in particular could cause invocation of >>>> clk_disable_unprepare() on clocks that were not prepared/enabled yet, >>>> e.g. in remove or on handling errors of usb_add_hcd() in probe. Though, >>>> there were several patches fixing different issues with clocks in this >>>> driver, they did not solve this problem. >>>> >>>> Add handling of errors of clk_prepare_enable() in ehci_orion_drv_probe() >>>> to avoid calls of clk_disable_unprepare() without previous successful >>>> invocation of clk_prepare_enable(). >>>> >>>> Found by Linux Driver Verification project (linuxtesting.org). >>>> >>>> Fixes: 8c869edaee07 ("ARM: Orion: EHCI: Add support for enabling clocks") >>>> Signed-off-by: Evgeny Novikov <novikov@ispras.ru> >>>> Co-developed-by: Kirill Shilimanov <kirill.shilimanov@huawei.com> >>>> Signed-off-by: Kirill Shilimanov <kirill.shilimanov@huawei.com> >>>> --- >>> Acked-by: Alan Stern <stern@rowland.harvard.edu> >>> >>> Do you intend to submit patches for the other EHCI drivers that don't >>> check for errors in clk_prepare_enable()? It looks like >>> ehci-atmel.c, ehci-mv.c, and ehci-spear.c all need some attention. >>> >>> The same is true for a bunch of the OHCI drivers: ohci-at91.c, >>> ohci-exynos.c, ohci-s3c2410.c, and ohci-spear.c. >>> >>> Didn't the Linux Driver Verification project identify this problem in >>> all of these drivers? >> Our verification framework report numerous issues like the one for which I >> sent the given patch. There are many warnings for different USB drivers and >> other types of device drivers as well. We sent several patches that were >> acknowledged by the developers already, though, after the Andrew's reply [1] >> I have doubts that we need to treat these warnings as potential bugs and fix >> them. The verification framework performs static analysis in a way that I >> described before [2]. Regarding the clock API it uses such models of >> clk_prepare() and clk_enable() that can fail independently on underlying >> hardware since is not easy to either model all such hardware or try to >> relate and consider corresponding drivers in addition to drivers using >> clocks at verification. Thus, potentially the verification framework can >> produce warnings for all drivers that invoke clk_prepare(), clk_enable() or >> clk_prepare_enable(), but do not check for their return values. >> >> I look forward whether you will confirm that it makes sense to handle errors >> from mentioned functions anyway or it would be better not to sent such bug >> reports unless we will be strictly sure that they can happen. In the former >> case it would be better if somebody will teach built-in Linux kernel static >> analyzers to detect these issues on a regular basis. > I don't know whether these errors can occur or not. To find out, you need to > ask someone who knows more about the clock framework. > > On the other hand, the fact that the functions do return an error code means > that they expect callers to check its value. In fact, whoever changed the API > should have gone through all the callers to make sure they did so. > > (Let's put it this way: If those functions can return an error, we should > check the return code. If they can't return an error then they shouldn't be > defined to return an int, so the API should be changed.) > > So on the whole, I think making these changes would be a good thing. At the > very least, it will help make all the different EHCI and OHCI drivers > consistent with each other. Though I may be wrong, but after the discussion with Dan, it does not seem that we can expect any considerable changes in the clock API and support from the static analysis tools soon. So, if you still would like to see corresponding fixes in EHCI and OHCI drivers, I can prepare them. Best regards, Evgeny Novikov > Alan Stern > >> [1] https://lkml.org/lkml/2021/8/25/794 >> [2] https://lkml.org/lkml/2021/8/17/239 >> >> Best regards, >> Evgeny Novikov
On Sat, Aug 28, 2021 at 01:47:12PM +0300, Evgeny Novikov wrote: > Hi Alan, > > On 26.08.2021 18:24, Alan Stern wrote: > > I don't know whether these errors can occur or not. To find out, you need to > > ask someone who knows more about the clock framework. > > > > On the other hand, the fact that the functions do return an error code means > > that they expect callers to check its value. In fact, whoever changed the API > > should have gone through all the callers to make sure they did so. > > > > (Let's put it this way: If those functions can return an error, we should > > check the return code. If they can't return an error then they shouldn't be > > defined to return an int, so the API should be changed.) > > > > So on the whole, I think making these changes would be a good thing. At the > > very least, it will help make all the different EHCI and OHCI drivers > > consistent with each other. > Though I may be wrong, but after the discussion with Dan, it does not seem > that we can expect any considerable changes in the clock API and support > from the static analysis tools soon. So, if you still would like to see > corresponding fixes in EHCI and OHCI drivers, I can prepare them. Yes, please do so. Alan Stern
diff --git a/drivers/usb/host/ehci-orion.c b/drivers/usb/host/ehci-orion.c index a319b1df3011..3626758b3e2a 100644 --- a/drivers/usb/host/ehci-orion.c +++ b/drivers/usb/host/ehci-orion.c @@ -264,8 +264,11 @@ static int ehci_orion_drv_probe(struct platform_device *pdev) * the clock does not exists. */ priv->clk = devm_clk_get(&pdev->dev, NULL); - if (!IS_ERR(priv->clk)) - clk_prepare_enable(priv->clk); + if (!IS_ERR(priv->clk)) { + err = clk_prepare_enable(priv->clk); + if (err) + goto err_put_hcd; + } priv->phy = devm_phy_optional_get(&pdev->dev, "usb"); if (IS_ERR(priv->phy)) { @@ -311,6 +314,7 @@ static int ehci_orion_drv_probe(struct platform_device *pdev) err_dis_clk: if (!IS_ERR(priv->clk)) clk_disable_unprepare(priv->clk); +err_put_hcd: usb_put_hcd(hcd); err: dev_err(&pdev->dev, "init %s fail, %d\n",