Message ID | 20220107093455.73766-2-wenst@chromium.org |
---|---|
State | New |
Headers | show |
Series | media: hantro: jpeg: Various improvements | expand |
Hi Chen-Yu, The series looks good, thanks for picking up this task. Just a one comment. On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote: > In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a > wmb() was inserted before the final register write that starts the > encoder. In v11, it was removed and the second-to-last register write > was changed to a non-relaxed write, which has an implicit wmb() [1]. > The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there > is another writel_relaxed() following the non-relaxed one. > > Turns out only the last writel() needs to be non-relaxed. Device I/O > mappings already guarantee strict ordering to the same endpoint, and > the writel() triggering the hardware would force all writes to memory > to be observed before the writel() to the hardware is observed. > > [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/ > > Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> > --- > drivers/staging/media/hantro/hantro_h1_jpeg_enc.c | 3 +-- > drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +-- > 2 files changed, 2 insertions(+), 4 deletions(-) > > diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c > index 1450013d3685..03db1c3444f8 100644 > --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c > +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c > @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx) > | H1_REG_AXI_CTRL_INPUT_SWAP32 > | H1_REG_AXI_CTRL_OUTPUT_SWAP8 > | H1_REG_AXI_CTRL_INPUT_SWAP8; > - /* Make sure that all registers are written at this point. */ > - vepu_write(vpu, reg, H1_REG_AXI_CTRL); > + vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL); > As far as I can remember, this logic comes from really old Chromium Kernels. You might be right, and this barrier isn't needed... but then OTOH the comment is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision). I don't have RK3288 boards near me, but in any case, I'm not sure we'd be able to test this easily (maybe there are issues that only trigger under a certain load). I'd personally avoid this one change, but if you are confident enough with it that's fine too. Thanks! Ezequiel > reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width)) > | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height)) > diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c > index 4df16f59fb97..b931fc5fa1a9 100644 > --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c > +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c > @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx) > | VEPU_REG_INPUT_SWAP8 > | VEPU_REG_INPUT_SWAP16 > | VEPU_REG_INPUT_SWAP32; > - /* Make sure that all registers are written at this point. */ > - vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN); > + vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN); > > reg = VEPU_REG_AXI_CTRL_BURST_LEN(16); > vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL); > -- > 2.34.1.575.g55b058a8bb-goog >
Hi Chen-Yu, I'll take patches 2-8. So should I mark patch 1/8 as 'Rejected' or 'Changes Requested' in patchwork? Regards, Hans On 1/19/22 11:08, Chen-Yu Tsai wrote: > Hi, > > On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia > <ezequiel@vanguardiasur.com.ar> wrote: >> >> Hi Chen-Yu, >> >> The series looks good, thanks for picking up this task. >> >> Just a one comment. >> >> On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote: >>> In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a >>> wmb() was inserted before the final register write that starts the >>> encoder. In v11, it was removed and the second-to-last register write >>> was changed to a non-relaxed write, which has an implicit wmb() [1]. >>> The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there >>> is another writel_relaxed() following the non-relaxed one. >>> >>> Turns out only the last writel() needs to be non-relaxed. Device I/O >>> mappings already guarantee strict ordering to the same endpoint, and >>> the writel() triggering the hardware would force all writes to memory >>> to be observed before the writel() to the hardware is observed. >>> >>> [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/ >>> >>> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> >>> --- >>> drivers/staging/media/hantro/hantro_h1_jpeg_enc.c | 3 +-- >>> drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +-- >>> 2 files changed, 2 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c >>> index 1450013d3685..03db1c3444f8 100644 >>> --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c >>> +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c >>> @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx) >>> | H1_REG_AXI_CTRL_INPUT_SWAP32 >>> | H1_REG_AXI_CTRL_OUTPUT_SWAP8 >>> | H1_REG_AXI_CTRL_INPUT_SWAP8; >>> - /* Make sure that all registers are written at this point. */ >>> - vepu_write(vpu, reg, H1_REG_AXI_CTRL); >>> + vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL); >>> >> >> As far as I can remember, this logic comes from really old Chromium Kernels. >> You might be right, and this barrier isn't needed... but then OTOH the comment >> is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision). > > I just realized that my commit log is wrong. > > " ... a wmb() was inserted before the final register write that starts the > encoder. ... " . It is actually before the second-to-last register write. > >> I don't have RK3288 boards near me, but in any case, I'm not sure >> we'd be able to test this easily (maybe there are issues that only >> trigger under a certain load). > > I see. I do have a Veyron around that I haven't used in awhile. But as you > said, it might not be an obvious hardware limitation. > >> I'd personally avoid this one change, but if you are confident enough with it >> that's fine too. > > Unfortunately they didn't leave a whole lot of clues around. For most hardware, > as I mentioned in the commit log, I think the final non-relaxed write should > suffice. I'd point to the decoder drivers not having any barriers or > non-relaxed writes except the final one, but IIUC they are actually two > distinct pieces of hardware. > > I suspect we will never know. This JPEG encoder doesn't seem to get used > a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the > extra barrier and get tested a lot, but that's only testing the RK3399. > > Hans, would it be possible for you to skip this patch and pick the rest? > Or would you like me to resent without this one? > > > Thanks > ChenYu > >> Thanks! >> Ezequiel >> >>> reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width)) >>> | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height)) >>> diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c >>> index 4df16f59fb97..b931fc5fa1a9 100644 >>> --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c >>> +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c >>> @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx) >>> | VEPU_REG_INPUT_SWAP8 >>> | VEPU_REG_INPUT_SWAP16 >>> | VEPU_REG_INPUT_SWAP32; >>> - /* Make sure that all registers are written at this point. */ >>> - vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN); >>> + vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN); >>> >>> reg = VEPU_REG_AXI_CTRL_BURST_LEN(16); >>> vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL); >>> -- >>> 2.34.1.575.g55b058a8bb-goog >>>
On Thu, Jan 20, 2022 at 10:20 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote: > > Hi Chen-Yu, > > I'll take patches 2-8. Got it. > So should I mark patch 1/8 as 'Rejected' or 'Changes Requested' in patchwork? I'd say "Changes Requested", but it won't really progress unless someone from Rockchip fills in the blanks about the wmb() though. ChenYu > Regards, > > Hans > > On 1/19/22 11:08, Chen-Yu Tsai wrote: > > Hi, > > > > On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia > > <ezequiel@vanguardiasur.com.ar> wrote: > >> > >> Hi Chen-Yu, > >> > >> The series looks good, thanks for picking up this task. > >> > >> Just a one comment. > >> > >> On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote: > >>> In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a > >>> wmb() was inserted before the final register write that starts the > >>> encoder. In v11, it was removed and the second-to-last register write > >>> was changed to a non-relaxed write, which has an implicit wmb() [1]. > >>> The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there > >>> is another writel_relaxed() following the non-relaxed one. > >>> > >>> Turns out only the last writel() needs to be non-relaxed. Device I/O > >>> mappings already guarantee strict ordering to the same endpoint, and > >>> the writel() triggering the hardware would force all writes to memory > >>> to be observed before the writel() to the hardware is observed. > >>> > >>> [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/ > >>> > >>> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> > >>> --- > >>> drivers/staging/media/hantro/hantro_h1_jpeg_enc.c | 3 +-- > >>> drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +-- > >>> 2 files changed, 2 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c > >>> index 1450013d3685..03db1c3444f8 100644 > >>> --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c > >>> +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c > >>> @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx) > >>> | H1_REG_AXI_CTRL_INPUT_SWAP32 > >>> | H1_REG_AXI_CTRL_OUTPUT_SWAP8 > >>> | H1_REG_AXI_CTRL_INPUT_SWAP8; > >>> - /* Make sure that all registers are written at this point. */ > >>> - vepu_write(vpu, reg, H1_REG_AXI_CTRL); > >>> + vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL); > >>> > >> > >> As far as I can remember, this logic comes from really old Chromium Kernels. > >> You might be right, and this barrier isn't needed... but then OTOH the comment > >> is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision). > > > > I just realized that my commit log is wrong. > > > > " ... a wmb() was inserted before the final register write that starts the > > encoder. ... " . It is actually before the second-to-last register write. > > > >> I don't have RK3288 boards near me, but in any case, I'm not sure > >> we'd be able to test this easily (maybe there are issues that only > >> trigger under a certain load). > > > > I see. I do have a Veyron around that I haven't used in awhile. But as you > > said, it might not be an obvious hardware limitation. > > > >> I'd personally avoid this one change, but if you are confident enough with it > >> that's fine too. > > > > Unfortunately they didn't leave a whole lot of clues around. For most hardware, > > as I mentioned in the commit log, I think the final non-relaxed write should > > suffice. I'd point to the decoder drivers not having any barriers or > > non-relaxed writes except the final one, but IIUC they are actually two > > distinct pieces of hardware. > > > > I suspect we will never know. This JPEG encoder doesn't seem to get used > > a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the > > extra barrier and get tested a lot, but that's only testing the RK3399. > > > > Hans, would it be possible for you to skip this patch and pick the rest? > > Or would you like me to resent without this one? > > > > > > Thanks > > ChenYu > > > >> Thanks! > >> Ezequiel > >> > >>> reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width)) > >>> | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height)) > >>> diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c > >>> index 4df16f59fb97..b931fc5fa1a9 100644 > >>> --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c > >>> +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c > >>> @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx) > >>> | VEPU_REG_INPUT_SWAP8 > >>> | VEPU_REG_INPUT_SWAP16 > >>> | VEPU_REG_INPUT_SWAP32; > >>> - /* Make sure that all registers are written at this point. */ > >>> - vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN); > >>> + vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN); > >>> > >>> reg = VEPU_REG_AXI_CTRL_BURST_LEN(16); > >>> vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL); > >>> -- > >>> 2.34.1.575.g55b058a8bb-goog > >>>
diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c index 1450013d3685..03db1c3444f8 100644 --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx) | H1_REG_AXI_CTRL_INPUT_SWAP32 | H1_REG_AXI_CTRL_OUTPUT_SWAP8 | H1_REG_AXI_CTRL_INPUT_SWAP8; - /* Make sure that all registers are written at this point. */ - vepu_write(vpu, reg, H1_REG_AXI_CTRL); + vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL); reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width)) | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height)) diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c index 4df16f59fb97..b931fc5fa1a9 100644 --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx) | VEPU_REG_INPUT_SWAP8 | VEPU_REG_INPUT_SWAP16 | VEPU_REG_INPUT_SWAP32; - /* Make sure that all registers are written at this point. */ - vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN); + vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN); reg = VEPU_REG_AXI_CTRL_BURST_LEN(16); vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a wmb() was inserted before the final register write that starts the encoder. In v11, it was removed and the second-to-last register write was changed to a non-relaxed write, which has an implicit wmb() [1]. The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there is another writel_relaxed() following the non-relaxed one. Turns out only the last writel() needs to be non-relaxed. Device I/O mappings already guarantee strict ordering to the same endpoint, and the writel() triggering the hardware would force all writes to memory to be observed before the writel() to the hardware is observed. [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/ Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> --- drivers/staging/media/hantro/hantro_h1_jpeg_enc.c | 3 +-- drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-)