Message ID | 20230707092414.866760-1-zyytlz.wz@163.com |
---|---|
State | Accepted |
Commit | c677d7ae83141d390d1253abebafa49c962afb52 |
Headers | show |
Series | [RESEND,v2] media: mtk-jpeg: Fix use after free bug due to uncanceled work | expand |
Hi, This issue has not been resolved for a long time. Is there anyone who can help? Best regards, Zheng Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月7日周五 22:11写道: > > > > On 07/07/2023 11:24, Zheng Wang wrote: > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > and mtk_jpeg_enc_device_run may be called to start the > > work. > > If we remove the module which will call mtk_jpeg_remove > > to make cleanup, there may be a unfinished work. The > > possible sequence is as follows, which will cause a > > typical UAF bug. > > > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > > > CPU0 CPU1 > > > > |mtk_jpeg_job_timeout_work > > mtk_jpeg_remove | > > v4l2_m2m_release | > > kfree(m2m_dev); | > > | > > | v4l2_m2m_get_curr_priv > > | m2m_dev->curr_ctx //use > > Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> > > -- > Regards, > Alexandre
Friendly ping Zheng Hacker <hackerzheng666@gmail.com> 于2023年7月16日周日 00:08写道: > > Hi, > > This issue has not been resolved for a long time. Is there anyone who can help? > > Best regards, > Zheng > > Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月7日周五 22:11写道: > > > > > > > > On 07/07/2023 11:24, Zheng Wang wrote: > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > > and mtk_jpeg_enc_device_run may be called to start the > > > work. > > > If we remove the module which will call mtk_jpeg_remove > > > to make cleanup, there may be a unfinished work. The > > > possible sequence is as follows, which will cause a > > > typical UAF bug. > > > > > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > > > > > CPU0 CPU1 > > > > > > |mtk_jpeg_job_timeout_work > > > mtk_jpeg_remove | > > > v4l2_m2m_release | > > > kfree(m2m_dev); | > > > | > > > | v4l2_m2m_get_curr_priv > > > | m2m_dev->curr_ctx //use > > > > Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> > > > > -- > > Regards, > > Alexandre
On 18/07/2023 05:07, Zheng Hacker wrote: > Friendly ping > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年7月16日周日 00:08写道: >> >> Hi, >> >> This issue has not been resolved for a long time. Is there anyone who can help? >> >> Best regards, >> Zheng >> >> Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月7日周五 22:11写道: >>> >>> >>> >>> On 07/07/2023 11:24, Zheng Wang wrote: >>>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with >>>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run >>>> and mtk_jpeg_enc_device_run may be called to start the >>>> work. >>>> If we remove the module which will call mtk_jpeg_remove >>>> to make cleanup, there may be a unfinished work. The >>>> possible sequence is as follows, which will cause a >>>> typical UAF bug. >>>> >>>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove >>>> >>>> CPU0 CPU1 >>>> >>>> |mtk_jpeg_job_timeout_work >>>> mtk_jpeg_remove | >>>> v4l2_m2m_release | >>>> kfree(m2m_dev); | >>>> | >>>> | v4l2_m2m_get_curr_priv >>>> | m2m_dev->curr_ctx //use >>> >>> Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> >>> >>> -- >>> Regards, >>> Alexandre Hi Zheng, If you asking me to merge patch, sorry but I can't, I'm just a reviewer. I invite you to ping the maintainers directly: Bin Liu <bin.liu@mediatek.com> (supporter:MEDIATEK JPEG DRIVER) Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB)) Matthias Brugger <matthias.bgg@gmail.com> (maintainer:ARM/Mediatek SoC support) Otherwise, I misunderstood what you asking me. If so, can you rephrase your question please?
Thanks dor your kind reply. I'll try to connect them later. Best regards, Zheng Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月19日周三 18:17写道: > > > > On 18/07/2023 05:07, Zheng Hacker wrote: > > Friendly ping > > > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年7月16日周日 00:08写道: > >> > >> Hi, > >> > >> This issue has not been resolved for a long time. Is there anyone who can help? > >> > >> Best regards, > >> Zheng > >> > >> Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月7日周五 22:11写道: > >>> > >>> > >>> > >>> On 07/07/2023 11:24, Zheng Wang wrote: > >>>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > >>>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > >>>> and mtk_jpeg_enc_device_run may be called to start the > >>>> work. > >>>> If we remove the module which will call mtk_jpeg_remove > >>>> to make cleanup, there may be a unfinished work. The > >>>> possible sequence is as follows, which will cause a > >>>> typical UAF bug. > >>>> > >>>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove > >>>> > >>>> CPU0 CPU1 > >>>> > >>>> |mtk_jpeg_job_timeout_work > >>>> mtk_jpeg_remove | > >>>> v4l2_m2m_release | > >>>> kfree(m2m_dev); | > >>>> | > >>>> | v4l2_m2m_get_curr_priv > >>>> | m2m_dev->curr_ctx //use > >>> > >>> Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> > >>> > >>> -- > >>> Regards, > >>> Alexandre > > Hi Zheng, > > If you asking me to merge patch, sorry but I can't, I'm just a reviewer. > I invite you to ping the maintainers directly: > > Bin Liu <bin.liu@mediatek.com> (supporter:MEDIATEK JPEG DRIVER) > Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT > INFRASTRUCTURE (V4L/DVB)) > Matthias Brugger <matthias.bgg@gmail.com> (maintainer:ARM/Mediatek SoC > support) > > Otherwise, I misunderstood what you asking me. If so, can you rephrase > your question please? > > -- > Regards, > Alexandre
On Fri, Jul 7, 2023 at 5:25 PM Zheng Wang <zyytlz.wz@163.com> wrote: > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > and mtk_jpeg_enc_device_run may be called to start the > work. > If we remove the module which will call mtk_jpeg_remove > to make cleanup, there may be a unfinished work. The > possible sequence is as follows, which will cause a > typical UAF bug. > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > CPU0 CPU1 > > |mtk_jpeg_job_timeout_work > mtk_jpeg_remove | > v4l2_m2m_release | > kfree(m2m_dev); | > | > | v4l2_m2m_get_curr_priv > | m2m_dev->curr_ctx //use > Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver") > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Hello, Is there any follow-up about this issue? Thanks, Zheng AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> 于2023年7月20日周四 15:45写道: > > Il 07/07/23 11:24, Zheng Wang ha scritto: > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > and mtk_jpeg_enc_device_run may be called to start the > > work. > > If we remove the module which will call mtk_jpeg_remove > > to make cleanup, there may be a unfinished work. The > > possible sequence is as follows, which will cause a > > typical UAF bug. > > > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > > > CPU0 CPU1 > > > > |mtk_jpeg_job_timeout_work > > mtk_jpeg_remove | > > v4l2_m2m_release | > > kfree(m2m_dev); | > > | > > | v4l2_m2m_get_curr_priv > > | m2m_dev->curr_ctx //use > > Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver") > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> > >
Hello Zheng, On 7/7/23 12:24, Zheng Wang wrote: > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > and mtk_jpeg_enc_device_run may be called to start the > work. > If we remove the module which will call mtk_jpeg_remove > to make cleanup, there may be a unfinished work. The > possible sequence is as follows, which will cause a > typical UAF bug. > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > CPU0 CPU1 > > |mtk_jpeg_job_timeout_work > mtk_jpeg_remove | > v4l2_m2m_release | > kfree(m2m_dev); | > | > | v4l2_m2m_get_curr_priv > | m2m_dev->curr_ctx //use > Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver") > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > --- > - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie. > --- > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > index 0051f372a66c..6069ecf420b0 100644 > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev) > { > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); > > + cancel_delayed_work_sync(&jpeg->job_timeout_work); > pm_runtime_disable(&pdev->dev); > video_unregister_device(jpeg->vdev); > v4l2_m2m_release(jpeg->m2m_dev); AFAICS, there is a fundamental problem here. The job_timeout_work uses v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded, all the v4l contexts must be closed and released. Hence the v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when work is executed before cancel_delayed_work_sync(). At the time when mtk_jpeg_remove() is invoked, there shall be no job_timeout_work running in background because all jobs should be completed before context is released. If you'll look at v4l2_m2m_cancel_job(), you can see that it waits for the task completion before closing context. You shouldn't be able to remove driver module while it has active/opened v4l contexts. If you can do that, then this is yours bug that needs to be fixed. In addition to this all, the job_timeout_work is initialized only for the single-core JPEG device. I'd expect this patch should crash multi-core JPEG devices.
Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月23日周三 02:51写道: > > Hello Zheng, > > On 7/7/23 12:24, Zheng Wang wrote: > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > and mtk_jpeg_enc_device_run may be called to start the > > work. > > If we remove the module which will call mtk_jpeg_remove > > to make cleanup, there may be a unfinished work. The > > possible sequence is as follows, which will cause a > > typical UAF bug. > > > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > > > CPU0 CPU1 > > > > |mtk_jpeg_job_timeout_work > > mtk_jpeg_remove | > > v4l2_m2m_release | > > kfree(m2m_dev); | > > | > > | v4l2_m2m_get_curr_priv > > | m2m_dev->curr_ctx //use > > Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver") > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > --- > > - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie. > > --- > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > index 0051f372a66c..6069ecf420b0 100644 > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev) > > { > > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); > > > > + cancel_delayed_work_sync(&jpeg->job_timeout_work); > > pm_runtime_disable(&pdev->dev); > > video_unregister_device(jpeg->vdev); > > v4l2_m2m_release(jpeg->m2m_dev); > > AFAICS, there is a fundamental problem here. The job_timeout_work uses > v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded, > all the v4l contexts must be closed and released. Hence the > v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when > work is executed before cancel_delayed_work_sync(). > Hi Dmitry, Thanks for your reply. I think you're right. As m2m_dev is freed in v4l2_m2m_release, the invoking in v4l2_m2m_get_curr_priv might cause either UAF or null pointer dereference bug. I am sure that context is closed when we invoke mtk_jpeg_remove. But I'm not sure if context is released when mtk_jpegdec_timeout_work running. > At the time when mtk_jpeg_remove() is invoked, there shall be no > job_timeout_work running in background because all jobs should be > completed before context is released. If you'll look at > v4l2_m2m_cancel_job(), you can see that it waits for the task completion > before closing context. Yes, so I think the better way is to put the cancel_delayed_work_sync invoking into v4l2_m2m_ctx_release function? > > You shouldn't be able to remove driver module while it has active/opened > v4l contexts. If you can do that, then this is yours bug that needs to > be fixed. > > In addition to this all, the job_timeout_work is initialized only for > the single-core JPEG device. I'd expect this patch should crash > multi-core JPEG devices. > I think that's true. As I'm not familiar with the code here. Could you please give me some advice about the patch? Best regards, Zheng > -- > Best regards, > Dmitry >
On 8/24/23 11:20, Zheng Hacker wrote: > Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月23日周三 02:51写道: > >> >> Hello Zheng, >> >> On 7/7/23 12:24, Zheng Wang wrote: >>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with >>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run >>> and mtk_jpeg_enc_device_run may be called to start the >>> work. >>> If we remove the module which will call mtk_jpeg_remove >>> to make cleanup, there may be a unfinished work. The >>> possible sequence is as follows, which will cause a >>> typical UAF bug. >>> >>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove >>> >>> CPU0 CPU1 >>> >>> |mtk_jpeg_job_timeout_work >>> mtk_jpeg_remove | >>> v4l2_m2m_release | >>> kfree(m2m_dev); | >>> | >>> | v4l2_m2m_get_curr_priv >>> | m2m_dev->curr_ctx //use >>> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver") >>> Signed-off-by: Zheng Wang <zyytlz.wz@163.com> >>> --- >>> - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie. >>> --- >>> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c >>> index 0051f372a66c..6069ecf420b0 100644 >>> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c >>> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c >>> @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev) >>> { >>> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); >>> >>> + cancel_delayed_work_sync(&jpeg->job_timeout_work); >>> pm_runtime_disable(&pdev->dev); >>> video_unregister_device(jpeg->vdev); >>> v4l2_m2m_release(jpeg->m2m_dev); >> >> AFAICS, there is a fundamental problem here. The job_timeout_work uses >> v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded, >> all the v4l contexts must be closed and released. Hence the >> v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when >> work is executed before cancel_delayed_work_sync(). >> > > Hi Dmitry, > > Thanks for your reply. I think you're right. As m2m_dev is freed in > v4l2_m2m_release, > the invoking in v4l2_m2m_get_curr_priv might cause either UAF or null > pointer dereference > bug. I am sure that context is closed when we invoke mtk_jpeg_remove. > But I'm not sure if > context is released when mtk_jpegdec_timeout_work running. > >> At the time when mtk_jpeg_remove() is invoked, there shall be no >> job_timeout_work running in background because all jobs should be >> completed before context is released. If you'll look at >> v4l2_m2m_cancel_job(), you can see that it waits for the task completion >> before closing context. > > Yes, so I think the better way is to put the cancel_delayed_work_sync > invoking into > v4l2_m2m_ctx_release function? The v4l2_m2m_ctx_release() already should wait for the job_timeout_work completion or for the interrupt fire. Apparently it doesn't work in yours case. You'll need to debug why v4l job or job_timeout_work is running after v4l2_m2m_ctx_release(), it shouldn't happen. The interrupt handler cancels job_timeout_work, you shouldn't need to flush the work. Technically, interrupt handler may race with job_timeout_work, but the timeout is set to 1 second and in practice should be difficult to trigger the race. The interrupt handler needs to be threaded, it should use cancel_delayed_work_sync() and check the return value of this function. >> >> You shouldn't be able to remove driver module while it has active/opened >> v4l contexts. If you can do that, then this is yours bug that needs to >> be fixed. >> >> In addition to this all, the job_timeout_work is initialized only for >> the single-core JPEG device. I'd expect this patch should crash >> multi-core JPEG devices. >> > > I think that's true. As I'm not familiar with the code here. Could you > please give me some advice about the patch? We'll need to understand why v4l2_m2m_ctx_release() doesn't work as expected before thinking about the patch.
On 8/31/23 11:18, Zheng Hacker wrote: >> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work >> completion or for the interrupt fire. Apparently it doesn't work in >> yours case. You'll need to debug why v4l job or job_timeout_work is >> running after v4l2_m2m_ctx_release(), it shouldn't happen. >> > Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING, > the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish > to trigger that. > > However, this is not the only path to call v4l2_m2m_job_finish. Here > is a invoking chain: > v4l_streamon > ->v4l2_m2m_ioctl_streamon > ->v4l2_m2m_streamon > ->v4l2_m2m_try_schedule > ->v4l2_m2m_try_run > ->mtk_jpeg_dec_device_run > ->schedule_delayed_work(&jpeg->job_timeout_work... > ->error path goto dec_end > ->v4l2_m2m_job_finish > > In some specific situation, it starts the worker and also calls > v4l2_m2m_job_finish, which might > make v4l2_m2m_cancel_job continues. Then the error path should cancel the job_timeout_work, or better job needs to be run after the dec/enc has been started and not before. Looking further at the code, I'm confused by this hunk: mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base); v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); The job should be marked as finished when h/w has finished processing the job and not right after the job has been started. So the job is always completed and mtk_jpeg_job_timeout_work() doesn't work as expected, am I missing something?
On 9/19/23 21:24, Dmitry Osipenko wrote: > On 8/31/23 11:18, Zheng Hacker wrote: >>> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work >>> completion or for the interrupt fire. Apparently it doesn't work in >>> yours case. You'll need to debug why v4l job or job_timeout_work is >>> running after v4l2_m2m_ctx_release(), it shouldn't happen. >>> >> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING, >> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish >> to trigger that. >> >> However, this is not the only path to call v4l2_m2m_job_finish. Here >> is a invoking chain: >> v4l_streamon >> ->v4l2_m2m_ioctl_streamon >> ->v4l2_m2m_streamon >> ->v4l2_m2m_try_schedule >> ->v4l2_m2m_try_run >> ->mtk_jpeg_dec_device_run >> ->schedule_delayed_work(&jpeg->job_timeout_work... >> ->error path goto dec_end >> ->v4l2_m2m_job_finish >> >> In some specific situation, it starts the worker and also calls >> v4l2_m2m_job_finish, which might >> make v4l2_m2m_cancel_job continues. > > Then the error path should cancel the job_timeout_work, or better job s/job/timeout work/ > needs to be run after the dec/enc has been started and not before. > > Looking further at the code, I'm confused by this hunk: > > mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base); > v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); > > The job should be marked as finished when h/w has finished processing > the job and not right after the job has been started. So the job is > always completed and mtk_jpeg_job_timeout_work() doesn't work as > expected, am I missing something? >
Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年9月20日周三 02:24写道: > > On 8/31/23 11:18, Zheng Hacker wrote: > >> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work > >> completion or for the interrupt fire. Apparently it doesn't work in > >> yours case. You'll need to debug why v4l job or job_timeout_work is > >> running after v4l2_m2m_ctx_release(), it shouldn't happen. > >> > > Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING, > > the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish > > to trigger that. > > > > However, this is not the only path to call v4l2_m2m_job_finish. Here > > is a invoking chain: > > v4l_streamon > > ->v4l2_m2m_ioctl_streamon > > ->v4l2_m2m_streamon > > ->v4l2_m2m_try_schedule > > ->v4l2_m2m_try_run > > ->mtk_jpeg_dec_device_run > > ->schedule_delayed_work(&jpeg->job_timeout_work... > > ->error path goto dec_end > > ->v4l2_m2m_job_finish > > > > In some specific situation, it starts the worker and also calls > > v4l2_m2m_job_finish, which might > > make v4l2_m2m_cancel_job continues. > > Then the error path should cancel the job_timeout_work, or better job > needs to be run after the dec/enc has been started and not before. > Hi, Sorry for my late reply for I just went on a long vacation. Get it. I'll write another patch and change the summary to the lack of canceling job in error path. > Looking further at the code, I'm confused by this hunk: > > mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base); > v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); > > The job should be marked as finished when h/w has finished processing > the job and not right after the job has been started. So the job is > always completed and mtk_jpeg_job_timeout_work() doesn't work as > expected, am I missing something? After reading the code I still don't know. I didn't see any function like mtk_jpeg_dec_end. The same thing happens on mtk_jpeg_enc_start. I think I'd better fix the first problem and wait for someone familiar with the second part. Best regards, Zheng > > -- > Best regards, > Dmitry >
On 10/8/23 12:13, Zheng Hacker wrote: > Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年9月20日周三 02:24写道: >> >> On 8/31/23 11:18, Zheng Hacker wrote: >>>> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work >>>> completion or for the interrupt fire. Apparently it doesn't work in >>>> yours case. You'll need to debug why v4l job or job_timeout_work is >>>> running after v4l2_m2m_ctx_release(), it shouldn't happen. >>>> >>> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING, >>> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish >>> to trigger that. >>> >>> However, this is not the only path to call v4l2_m2m_job_finish. Here >>> is a invoking chain: >>> v4l_streamon >>> ->v4l2_m2m_ioctl_streamon >>> ->v4l2_m2m_streamon >>> ->v4l2_m2m_try_schedule >>> ->v4l2_m2m_try_run >>> ->mtk_jpeg_dec_device_run >>> ->schedule_delayed_work(&jpeg->job_timeout_work... >>> ->error path goto dec_end >>> ->v4l2_m2m_job_finish >>> >>> In some specific situation, it starts the worker and also calls >>> v4l2_m2m_job_finish, which might >>> make v4l2_m2m_cancel_job continues. >> >> Then the error path should cancel the job_timeout_work, or better job >> needs to be run after the dec/enc has been started and not before. >> > > Hi, > > Sorry for my late reply for I just went on a long vacation. > > Get it. I'll write another patch and change the summary to the lack of > canceling job in error path. > >> Looking further at the code, I'm confused by this hunk: >> >> mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base); >> v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); >> >> The job should be marked as finished when h/w has finished processing >> the job and not right after the job has been started. So the job is >> always completed and mtk_jpeg_job_timeout_work() doesn't work as >> expected, am I missing something? > > After reading the code I still don't know. I didn't see any function > like mtk_jpeg_dec_end. The same thing > happens on mtk_jpeg_enc_start. I think I'd better fix the first > problem and wait for someone familiar with > the second part. I missed that the code mentioned above is related to the multi-core hw version, while you care about single-core. Nevertheless, the multi-core device_run() looks incorrect, So, the error code paths need to be corrected. Please try to revert yours fix and test this change: diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c index 0051f372a66c..fd3b0587fcad 100644 --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c @@ -1254,9 +1254,6 @@ static void mtk_jpegdec_worker(struct work_struct *work) v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx); v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx); - schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work, - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); - mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs); if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, @@ -1266,6 +1263,9 @@ static void mtk_jpegdec_worker(struct work_struct *work) goto setdst_end; } + schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work, + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); + spin_lock_irqsave(&comp_jpeg[hw_id]->hw_lock, flags); ctx->total_frame_num++; mtk_jpeg_dec_reset(comp_jpeg[hw_id]->reg_base); @@ -1330,13 +1330,13 @@ static void mtk_jpeg_dec_device_run(void *priv) if (ret < 0) goto dec_end; - schedule_delayed_work(&jpeg->job_timeout_work, - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); - mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs); if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, &dst_buf->vb2_buf, &fb)) goto dec_end; + schedule_delayed_work(&jpeg->job_timeout_work, + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); + spin_lock_irqsave(&jpeg->hw_lock, flags); mtk_jpeg_dec_reset(jpeg->reg_base); mtk_jpeg_dec_set_config(jpeg->reg_base,
diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c index 0051f372a66c..6069ecf420b0 100644 --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev) { struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); + cancel_delayed_work_sync(&jpeg->job_timeout_work); pm_runtime_disable(&pdev->dev); video_unregister_device(jpeg->vdev); v4l2_m2m_release(jpeg->m2m_dev);
In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run and mtk_jpeg_enc_device_run may be called to start the work. If we remove the module which will call mtk_jpeg_remove to make cleanup, there may be a unfinished work. The possible sequence is as follows, which will cause a typical UAF bug. Fix it by canceling the work before cleanup in the mtk_jpeg_remove CPU0 CPU1 |mtk_jpeg_job_timeout_work mtk_jpeg_remove | v4l2_m2m_release | kfree(m2m_dev); | | | v4l2_m2m_get_curr_priv | m2m_dev->curr_ctx //use Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver") Signed-off-by: Zheng Wang <zyytlz.wz@163.com> --- - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie. --- drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 + 1 file changed, 1 insertion(+)