Message ID: 20220818203308.439043-1-nicolas.dufresne@collabora.com
Series: cedrus: Various bug fixes
On Thursday, 18 August 2022 at 22:33:08 CEST, Nicolas Dufresne wrote:
> From: Dmitry Osipenko <dmitry.osipenko@collabora.com>
>
> The busy status bit may never de-assert if number of programmed skip
> bits is incorrect, resulting in a kernel hang because the bit is polled
> endlessly in the code. Fix it by adding timeout for the bit-polling.
> This problem is reproducible by setting the data_bit_offset field of
> the HEVC slice params to a wrong value by userspace.
>
> Cc: stable@vger.kernel.org
> Reported-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>

A Fixes tag would be nice.

> ---
>  drivers/staging/media/sunxi/cedrus/cedrus_h265.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h265.c b/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
> index f703c585d91c5..f0bc118021b0a 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
> @@ -227,6 +227,7 @@ static void cedrus_h265_pred_weight_write(struct cedrus_dev *dev,
>  static void cedrus_h265_skip_bits(struct cedrus_dev *dev, int num)
>  {
>  	int count = 0;
> +	u32 reg;
>
>  	while (count < num) {
>  		int tmp = min(num - count, 32);
> @@ -234,8 +235,9 @@ static void cedrus_h265_skip_bits(struct cedrus_dev *dev, int num)
>  		cedrus_write(dev, VE_DEC_H265_TRIGGER,
>  			     VE_DEC_H265_TRIGGER_FLUSH_BITS |
>  			     VE_DEC_H265_TRIGGER_TYPE_N_BITS(tmp));
> -		while (cedrus_read(dev, VE_DEC_H265_STATUS) & VE_DEC_H265_STATUS_VLD_BUSY)
> -			udelay(1);
> +
> +		if (cedrus_wait_for(dev, VE_DEC_H265_STATUS, VE_DEC_H265_STATUS_VLD_BUSY))
> +			dev_err_ratelimited(dev->dev, "timed out waiting to skip bits\n");

Reporting the issue is nice, but it would be better to propagate the error,
since there is no way to decode this slice properly if the above code block
fails.

Best regards,
Jernej

>
>  		count += tmp;
>  	}
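For readers unfamiliar with the pattern: the patch replaces an unbounded `while (status & VLD_BUSY) udelay(1);` loop with a bounded wait, so a bit that never de-asserts yields an error instead of a hang. The userspace C sketch below models only that idea. `struct fake_status`, `fake_read()`, `poll_clear()`, the bit position and the `POLL_MAX_TRIES` bound are all hypothetical; the driver itself uses its own `cedrus_wait_for()` helper, whose implementation is not shown in this thread.

```c
#include <errno.h>
#include <stdint.h>

#define STATUS_VLD_BUSY (1u << 0) /* hypothetical bit position */
#define POLL_MAX_TRIES  1000      /* hypothetical bound on status reads */

/* Hypothetical stand-in for the VE_DEC_H265_STATUS register. */
struct fake_status {
	int stuck;      /* nonzero: BUSY never de-asserts (e.g. bad skip count) */
	int busy_reads; /* reads that still report BUSY before it clears */
};

/* Models one MMIO read of the status register. */
static uint32_t fake_read(struct fake_status *st)
{
	if (st->stuck)
		return STATUS_VLD_BUSY;
	if (st->busy_reads > 0) {
		st->busy_reads--;
		return STATUS_VLD_BUSY;
	}
	return 0;
}

/*
 * Bounded replacement for "while (read() & flag) udelay(1);".
 * Returns 0 once flag clears, or -ETIMEDOUT after POLL_MAX_TRIES reads.
 */
static int poll_clear(struct fake_status *st, uint32_t flag)
{
	for (int i = 0; i < POLL_MAX_TRIES; i++) {
		if (!(fake_read(st) & flag))
			return 0;
		/* a kernel driver would delay here, e.g. udelay(1) */
	}
	return -ETIMEDOUT;
}
```

With a well-behaved device the loop exits as soon as the bit clears; with a stuck bit it gives up after a fixed number of reads, which is the property the patch is after.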
On Friday, 19 August 2022 at 06:16 +0200, Jernej Škrabec wrote:
> On Thursday, 18 August 2022 at 22:33:08 CEST, Nicolas Dufresne wrote:
> > From: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> > [...]
> > +		if (cedrus_wait_for(dev, VE_DEC_H265_STATUS, VE_DEC_H265_STATUS_VLD_BUSY))
> > +			dev_err_ratelimited(dev->dev, "timed out waiting to skip bits\n");
>
> Reporting the issue is nice, but it would be better to propagate the error,
> since there is no way to decode this slice properly if the above code block
> fails.

This mimics what was already there; mind if we do that later? The
propagation is going to be a lot more intrusive.
On Friday, 19 August 2022 at 17:37:20 CEST, Nicolas Dufresne wrote:
> On Friday, 19 August 2022 at 06:17 +0200, Jernej Škrabec wrote:
> > On Thursday, 18 August 2022 at 22:33:07 CEST, Nicolas Dufresne wrote:
> > > From: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> > >
> > > The cedrus_hw_resume() crashes with NULL dereference on driver probe if
> > > runtime PM is disabled because it uses platform data that hasn't been
> > > set up yet. Fix this by setting the platform data earlier during probe.
> >
> > Does it even work without PM? Maybe it would be better if Cedrus
> > selected PM in Kconfig.
>
> I cannot comment on this myself, but it does not seem to invalidate
> Dmitry's fix.

If the NULL pointer dereference happens only when PM is disabled, then it
does. I always have PM enabled and have never experienced the above issue.

Best regards,
Jernej

> > Best regards,
> > Jernej
> >
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> > > Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> > > ---
> > >  drivers/staging/media/sunxi/cedrus/cedrus.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.c b/drivers/staging/media/sunxi/cedrus/cedrus.c
> > > index 960a0130cd620..55c54dfdc585c 100644
> > > --- a/drivers/staging/media/sunxi/cedrus/cedrus.c
> > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus.c
> > > @@ -448,6 +448,8 @@ static int cedrus_probe(struct platform_device *pdev)
> > >  	if (!dev)
> > >  		return -ENOMEM;
> > >
> > > +	platform_set_drvdata(pdev, dev);
> > > +
> > >  	dev->vfd = cedrus_video_device;
> > >  	dev->dev = &pdev->dev;
> > >  	dev->pdev = pdev;
> > > @@ -521,8 +523,6 @@ static int cedrus_probe(struct platform_device *pdev)
> > >  		goto err_m2m_mc;
> > >  	}
> > >
> > > -	platform_set_drvdata(pdev, dev);
> > > -
> > >  	return 0;
> > >
> > >  err_m2m_mc:
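The ordering problem described in this patch can be illustrated with a small userspace model. It assumes, as the commit message states, that the resume path looks up the driver state through platform data, and further assumes that without runtime PM the probe path calls resume directly. All names (`fake_pdev`, `fake_cedrus`, `fake_hw_resume()`, `fake_probe()`) are hypothetical, and the model returns `-EFAULT` where the real driver would dereference NULL and crash.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a platform device; not the kernel API. */
struct fake_pdev {
	void *drvdata;  /* what platform_set_drvdata() would store */
	int runtime_pm; /* nonzero: runtime PM available, resume deferred */
};

/* Hypothetical per-device driver state. */
struct fake_cedrus {
	int powered;
};

/* Models the resume callback: it finds its state through drvdata. */
static int fake_hw_resume(struct fake_pdev *pdev)
{
	struct fake_cedrus *dev = pdev->drvdata;

	if (!dev)
		return -EFAULT; /* the real driver would oops on NULL here */
	dev->powered = 1;
	return 0;
}

/* Models probe; set_drvdata_early selects the patched ordering. */
static int fake_probe(struct fake_pdev *pdev, struct fake_cedrus *dev,
		      int set_drvdata_early)
{
	if (set_drvdata_early)
		pdev->drvdata = dev;

	/* Assumed: without runtime PM, probe powers the hardware up directly. */
	if (!pdev->runtime_pm) {
		int ret = fake_hw_resume(pdev);

		if (ret)
			return ret;
	}

	if (!set_drvdata_early)
		pdev->drvdata = dev; /* original placement, at the end of probe */
	return 0;
}
```

With runtime PM enabled both orderings succeed, which matches the observation later in the thread that the issue only shows up with PM disabled.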
On 8/20/22 11:25, Jernej Škrabec wrote:
> On Friday, 19 August 2022 at 17:37:20 CEST, Nicolas Dufresne wrote:
>> On Friday, 19 August 2022 at 06:17 +0200, Jernej Škrabec wrote:
>>> On Thursday, 18 August 2022 at 22:33:07 CEST, Nicolas Dufresne wrote:
>>>> From: Dmitry Osipenko <dmitry.osipenko@collabora.com>
>>>>
>>>> The cedrus_hw_resume() crashes with NULL dereference on driver probe if
>>>> runtime PM is disabled because it uses platform data that hasn't been
>>>> set up yet. Fix this by setting the platform data earlier during probe.
>>>
>>> Does it even work without PM? Maybe it would be better if Cedrus
>>> selected PM in Kconfig.
>>
>> I cannot comment on this myself, but it does not seem to invalidate
>> Dmitry's fix.
>
> If the NULL pointer dereference happens only when PM is disabled, then it
> does. I always have PM enabled and have never experienced the above issue.

Originally this fix was needed when I changed cedrus_hw_probe() to do the
runtime-PM resume while I was debugging the hang issue, and I then also
noticed that the !PM case would be broken. It's good common practice for all
drivers to set the driver data early to avoid such problems, hence it won't
hurt to make this change anyway.

In practice nobody disables PM other than for debugging purposes, and !PM
handling is often broken in drivers. I suppose it might be even better to
enable PM for all Allwinner SoCs and remove the !PM handling from all the
affected drivers, like it was done for Tegra some time ago.
On Sunday, 21 August 2022 at 22:40:21 CEST, Dmitry Osipenko wrote:
> On 8/20/22 11:25, Jernej Škrabec wrote:
> [...]
> > If the NULL pointer dereference happens only when PM is disabled, then it
> > does. I always have PM enabled and have never experienced the above issue.
>
> Originally this fix was needed when I changed cedrus_hw_probe() to do the
> runtime-PM resume while I was debugging the hang issue, and I then also
> noticed that the !PM case would be broken. It's good common practice for all
> drivers to set the driver data early to avoid such problems, hence it won't
> hurt to make this change anyway.

OK, as others pointed out, this is still a good thing, so:

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>

> In practice nobody disables PM other than for debugging purposes, and !PM
> handling is often broken in drivers. I suppose it might be even better to
> enable PM for all Allwinner SoCs and remove the !PM handling from all the
> affected drivers, like it was done for Tegra some time ago.

Maybe in the future :)

Best regards,
Jernej
On Friday, 19 August 2022 at 17:39:25 CEST, Nicolas Dufresne wrote:
> On Friday, 19 August 2022 at 06:16 +0200, Jernej Škrabec wrote:
> > On Thursday, 18 August 2022 at 22:33:08 CEST, Nicolas Dufresne wrote:
> > > From: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> > > [...]
> > > +		if (cedrus_wait_for(dev, VE_DEC_H265_STATUS, VE_DEC_H265_STATUS_VLD_BUSY))
> > > +			dev_err_ratelimited(dev->dev, "timed out waiting to skip bits\n");
> >
> > Reporting the issue is nice, but it would be better to propagate the error,
> > since there is no way to decode this slice properly if the above code block
> > fails.
>
> This mimics what was already there; mind if we do that later? The
> propagation is going to be a lot more intrusive.

Since backporting fixes to kernels older than 6.0 isn't a priority, ease of
backporting isn't that important. You would only need to return 0 or
-ETIMEDOUT and check for the error in only one location. That doesn't sound
very intrusive to me.

Best regards,
Jernej
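A minimal sketch of the error propagation suggested above ("return 0 or -ETIMEDOUT and check for the error in only one location"), written as a userspace model. `fake_dev`, `fake_flush_bits()`, `skip_bits()`, `setup_slice()`, the bit position and the poll bound are hypothetical stand-ins rather than the driver's functions; only the chunking into at most 32 bits per trigger mirrors the quoted `cedrus_h265_skip_bits()`.

```c
#include <errno.h>
#include <stdint.h>

#define STATUS_VLD_BUSY (1u << 0) /* hypothetical bit position */
#define POLL_MAX_TRIES  1000      /* hypothetical bound on status reads */

/* Hypothetical model of the bit-skipping hardware. */
struct fake_dev {
	int stuck;    /* nonzero: VLD_BUSY never clears after a trigger */
	int triggers; /* flush-bits commands issued so far */
	int skipped;  /* bits skipped successfully */
};

/* Models "write TRIGGER, then poll STATUS until VLD_BUSY clears". */
static int fake_flush_bits(struct fake_dev *dev, int nbits)
{
	dev->triggers++;
	for (int i = 0; i < POLL_MAX_TRIES; i++) {
		uint32_t status = dev->stuck ? STATUS_VLD_BUSY : 0;

		if (!(status & STATUS_VLD_BUSY)) {
			dev->skipped += nbits;
			return 0;
		}
	}
	return -ETIMEDOUT; /* bounded: give up instead of hanging */
}

/* Returns 0 or -ETIMEDOUT instead of only logging the timeout. */
static int skip_bits(struct fake_dev *dev, int num)
{
	int count = 0;

	while (count < num) {
		int tmp = num - count < 32 ? num - count : 32;
		int ret = fake_flush_bits(dev, tmp);

		if (ret)
			return ret; /* stop at the first chunk that times out */
		count += tmp;
	}
	return 0;
}

/* The single caller-side check: a failed skip aborts the slice setup. */
static int setup_slice(struct fake_dev *dev, int data_bit_offset)
{
	int ret = skip_bits(dev, data_bit_offset);

	if (ret)
		return ret; /* hypothetically, the decode job is failed here */
	/* ... remaining register programming would follow ... */
	return 0;
}
```

Only the skip helper's return type and one check in its caller change, which is roughly the scope the suggestion describes.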