[v2,0/2] media: qcom: camss: Fix two CAMSS bugs found by dogfooding with SoftISP

Message ID 20240716-linux-next-24-07-13-camss-fixes-v2-0-e60c9f6742f2@linaro.org

Message

Bryan O'Donoghue July 16, 2024, 10:13 p.m. UTC
v2:
- Updates commits with Johan's Review/Reported tags
- Adds Closes: https://lore.kernel.org/lkml/ZoVNHOTI0PKMNt4_@hovoldconsulting.com
- Cc's stable
- Adds the suggested kernel log so that others can more easily match their
  kernel log to the fixes
- Link to v1: https://lore.kernel.org/r/20240714-linux-next-24-07-13-camss-fixes-v1-0-8f8954bc8c85@linaro.org

v1:
Dogfooding with SoftISP has uncovered two bugs in CAMSS, which this series
fixes.

- The first error:
  A simple race condition which, to be honest, I'm surprised neither I nor
  anybody else has found earlier. Simply stated, the order in which we
  typically end up loading CAMSS on boot has masked the pm_runtime_enable()
  race condition that has been present in CAMSS for a long time.

  If you blacklist qcom-camss in modules.d and then modprobe after boot,
  the race condition shows up easily.

  Moving pm_runtime_enable() to before subdevice registration fixes the
  problem.
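
  The ordering problem can be illustrated with a small userspace model. This
  is only a sketch, not the CAMSS code: `fake_dev`, `fake_runtime_get()`,
  `fake_register_subdevs()` and both probe functions are hypothetical
  stand-ins. It assumes that a consumer may try to power the device up as
  soon as the subdevices are registered, and it models a resume attempt made
  while runtime PM is still disabled as failing with -EACCES.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device with runtime PM state. */
struct fake_dev {
	bool pm_enabled;   /* set by fake_pm_runtime_enable() */
	int first_get_ret; /* result of the first resume attempt by a consumer */
};

static void fake_pm_runtime_enable(struct fake_dev *dev)
{
	dev->pm_enabled = true;
}

/* Models a resume request: refused while runtime PM is still disabled. */
static int fake_runtime_get(struct fake_dev *dev)
{
	return dev->pm_enabled ? 0 : -EACCES;
}

/*
 * Models subdevice registration. Assumption: once the subdevices are
 * visible, a consumer may immediately try to power the device up.
 */
static void fake_register_subdevs(struct fake_dev *dev)
{
	dev->first_get_ret = fake_runtime_get(dev);
}

/* Racy order: subdevices become visible before runtime PM is enabled. */
static int probe_racy(struct fake_dev *dev)
{
	fake_register_subdevs(dev);
	fake_pm_runtime_enable(dev);
	return dev->first_get_ret;
}

/* Fixed order: runtime PM is enabled before anything can use the device. */
static int probe_fixed(struct fake_dev *dev)
{
	fake_pm_runtime_enable(dev);
	fake_register_subdevs(dev);
	return dev->first_get_ret;
}
```

  In this model probe_racy() reports the failed first resume attempt, while
  probe_fixed() does not, which is the effect of the reordering.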

- The second error:
  Nomenclature:
    - CSIPHY: CSI Physical layer analogue to digital domain serialiser
    - CSID: CSI Decoder
    - VFE: Video Front End
    - RDI: Raw Data Interface
    - VC: Virtual Channel

  In order to support streaming multiple virtual channels on the same RDI, a
  V4L2-provided use_count variable is used to decide whether or not to
  actually terminate streaming and release buffers for 'msm_vfe_rdiX'.

  Unfortunately, use_count indicates the number of times msm_vfe_rdiX has
  been opened by user-space, not the number of concurrent streams on
  msm_vfe_rdiX.

  Simply stated, use_count and stream_count are two different things.
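
  A minimal sketch of how the two counters diverge. The struct and helper
  names below are hypothetical and exist only for illustration; they are not
  the V4L2 or CAMSS data structures.

```c
/* Hypothetical model of a video node; field names are illustrative only. */
struct fake_video_node {
	int use_count;    /* how many times user space has opened the node */
	int stream_count; /* how many streams are currently running on it */
};

static void node_open(struct fake_video_node *n)
{
	n->use_count++;
}

static void node_stream_on(struct fake_video_node *n)
{
	n->stream_count++;
}

static void node_stream_off(struct fake_video_node *n)
{
	n->stream_count--;
}

/* Two opens of the same node, but only a single stream is started. */
static void two_opens_one_stream(struct fake_video_node *n)
{
	node_open(n);
	node_open(n);
	node_stream_on(n);
}
```

  After two_opens_one_stream() followed by node_stream_off(), no stream is
  running but the open count is still non-zero, so the open count cannot
  answer the question "is anything still streaming?".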

  The silicon enabling code to select between VCs is valid, but a different
  solution needs to be found to support _concurrent_ VC streams.

  Right now the upstream use_count check, as-is, breaks the non-concurrent
  VC case, and I don't believe there are upstream users of concurrent VCs on
  CAMSS.

  This series implements a revert for the invalid use_count check,
  retaining the ability to select which VC is active on the RDI.
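
  The effect of such a guard on stream stop can be sketched as follows. This
  is a hypothetical userspace model, not the code removed from
  camss-video.c: the `use_count > 0` condition, the struct and the function
  names are all assumptions made for illustration.

```c
#include <stdbool.h>

#define NUM_BUFS 3

/* Hypothetical model; not the CAMSS or videobuf2 data structures. */
struct fake_queue {
	int use_count;             /* opens of the video node by user space */
	bool buf_active[NUM_BUFS]; /* buffers currently owned by the driver */
};

/* Hand every buffer back, as a stop_streaming operation is expected to. */
static void return_all_buffers(struct fake_queue *q)
{
	for (int i = 0; i < NUM_BUFS; i++)
		q->buf_active[i] = false;
}

/* Count the buffers the driver still holds. */
static int count_active(const struct fake_queue *q)
{
	int n = 0;

	for (int i = 0; i < NUM_BUFS; i++)
		n += q->buf_active[i] ? 1 : 0;
	return n;
}

/* Guarded variant: treats "still open" as "still streaming" and bails out. */
static int stop_streaming_guarded(struct fake_queue *q)
{
	if (q->use_count > 0)
		return count_active(q); /* buffers are left active */
	return_all_buffers(q);
	return count_active(q);
}

/* Variant without the guard: always releases the buffers. */
static int stop_streaming_unguarded(struct fake_queue *q)
{
	return_all_buffers(q);
	return count_active(q);
}
```

  Because the node is still open when the stream is stopped, the guarded
  variant returns with every buffer still active, while the unguarded one
  leaves none.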

  Dogfooding with libcamera's SoftISP in Hangouts, Zoom and multiple runs
  of libcamera's "qcam" application is a very different test-case to the
  simple capture of frames we previously did when validating the
  'use_count' change.

  A partial revert is included, in expectation of a renewed push to fix up
  that concurrent VC issue.

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
Bryan O'Donoghue (2):
      media: qcom: camss: Remove use_count guard in stop_streaming
      media: qcom: camss: Fix ordering of pm_runtime_enable

 drivers/media/platform/qcom/camss/camss-video.c | 6 ------
 drivers/media/platform/qcom/camss/camss.c       | 5 +++--
 2 files changed, 3 insertions(+), 8 deletions(-)
---
base-commit: c6ce8f9ab92edc9726996a0130bfc1c408132d47
change-id: 20240713-linux-next-24-07-13-camss-fixes-fa98c0965a5d

Best regards,

Comments

Johan Hovold July 17, 2024, 9:06 a.m. UTC | #1
On Tue, Jul 16, 2024 at 11:13:24PM +0100, Bryan O'Donoghue wrote:
> The use_count check was introduced so that multiple concurrent Raw Data
> Interfaces (RDIs) could be driven by different virtual channels (VCs) on the
> CSIPHY input driving the video pipeline.
> 
> This is an invalid use of use_count though as use_count pertains to the
> number of times a video entity has been opened by user-space not the number
> of active streams.
> 
> If use_count and stream-on count don't agree then stop_streaming() will
> break as is currently the case and has become apparent when using CAMSS
> with libcamera's released softisp 0.3.
> 
> The use of use_count like this is a bit hacky and right now breaks regular
> usage of CAMSS for a single stream case. As an example the "qcam"
> application in libcamera will fail with an -EBUSY result on stream stop and
> cannot then subsequently be restarted.

No, stopping qcam results in the splat below, and then it cannot be
started again and any attempts to do so fails with -EBUSY.

> The kernel log for this fault looks like this:
> 
> [ 1265.509831] WARNING: CPU: 5 PID: 919 at drivers/media/common/videobuf2/videobuf2-core.c:2183 __vb2_queue_cancel+0x230/0x2c8 [videobuf2_common]
> ...
> [ 1265.510630] Call trace:
> [ 1265.510636]  __vb2_queue_cancel+0x230/0x2c8 [videobuf2_common]
> [ 1265.510648]  vb2_core_streamoff+0x24/0xcc [videobuf2_common]
> [ 1265.510660]  vb2_ioctl_streamoff+0x5c/0xa8 [videobuf2_v4l2]
> [ 1265.510673]  v4l_streamoff+0x24/0x30 [videodev]
> [ 1265.510707]  __video_do_ioctl+0x190/0x3f4 [videodev]
> [ 1265.510732]  video_usercopy+0x304/0x8c4 [videodev]
> [ 1265.510757]  video_ioctl2+0x18/0x34 [videodev]
> [ 1265.510782]  v4l2_ioctl+0x40/0x60 [videodev]
> ...
> [ 1265.510944] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 0 in active state
> [ 1265.511175] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 1 in active state
> [ 1265.511398] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 2 in active st

Johan

Bryan O'Donoghue July 17, 2024, 10:43 a.m. UTC | #2
On 17/07/2024 10:06, Johan Hovold wrote:
>> The use of use_count like this is a bit hacky and right now breaks regular
>> usage of CAMSS for a single stream case. As an example the "qcam"
>> application in libcamera will fail with an -EBUSY result on stream stop and
>> cannot then subsequently be restarted.
> No, stopping qcam results in the splat below, and then it cannot be
> started again and any attempts to do so fails with -EBUSY.

I thought that's what I said.

Let me reword the commit log with your sentence included directly :)

---
bod

Johan Hovold July 17, 2024, 11:04 a.m. UTC | #3
On Wed, Jul 17, 2024 at 11:43:02AM +0100, Bryan O'Donoghue wrote:
> On 17/07/2024 10:06, Johan Hovold wrote:
> >> The use of use_count like this is a bit hacky and right now breaks regular
> >> usage of CAMSS for a single stream case. As an example the "qcam"
> >> application in libcamera will fail with an -EBUSY result on stream stop and
> >> cannot then subsequently be restarted.
> > No, stopping qcam results in the splat below, and then it cannot be
> > started again and any attempts to do so fails with -EBUSY.
> 
> I thought that's what I said.

I read the above as if stopping the stream fails with -EBUSY, when it's
really restarting the stream that fails that way after the first stop.

> Let me reword the commit log with your sentence included directly :)

Sounds good.

Johan