mbox series

[v4,00/11] Preemption support for A7XX

Message ID 20240917-preemption-a750-t-v4-0-95d48012e0ac@gmail.com
Headers show
Series Preemption support for A7XX | expand

Message

Antonino Maniscalco Sept. 17, 2024, 11:14 a.m. UTC
This series implements preemption for A7XX targets, which allows the GPU to
switch to an higher priority ring when work is pushed to it, reducing latency
for high priority submissions.

This series enables L1 preemption with skip_save_restore which requires
the following userspace patches to function:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544

A flag is added to `msm_submitqueue_create` to only allow submissions
from compatible userspace to be preempted, therefore maintaining
compatibility.

Preemption is currently only enabled by default on A750, it can be
enabled on other targets through the `enable_preemption` module
parameter. This is because more testing is required on other targets.

For testing on other HW it is sufficient to set that parameter to a
value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
allows to run any application as high priority therefore preempting
submissions from other applications.

The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
added in this series can be used to observe preemption's behavior as
well as measuring preemption latency.

Some commits from this series are based on a previous series to enable
preemption on A6XX targets:

https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org

Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
---
Changes in v4:
- Added missing register in pwrup list 
- Removed and rearrange barriers
- Renamed `skip_inline_wptr` to `restore_wptr`
- Track ctx seqno per ring
- Removed secure preempt context
- NOP out postamble to disable it instantly
- Only emit pwrup reglist once
- Document bv_rptr_addr
- Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
- Set name on preempt record buffer
- Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com

Changes in v3:
- Added documentation about preemption
- Use quirks to determine which target supports preemption
- Add a module parameter to force disabling or enabling preemption
- Clear postamble when profiling
- Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
- Make preemption records MAP_PRIV
- Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
  anymore
- Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com

Changes in v2:
- Added preept_record_size for X185 in PATCH 3/7
- Added patches to reset perf counters
- Dropped unused defines
- Dropped unused variable (fixes warning)
- Only enable preemption on a750
- Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
- Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
- Added Neil's Tested-By tags
- Added explanation for UAPI changes in commit message
- Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com

---
Antonino Maniscalco (11):
      drm/msm: Fix bv_fence being used as bv_rptr
      drm/msm/A6XX: Track current_ctx_seqno per ring
      drm/msm: Add a `preempt_record_size` field
      drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
      drm/msm/A6xx: Implement preemption for A7XX targets
      drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
      drm/msm/A6xx: Use posamble to reset counters on preemption
      drm/msm/A6xx: Add traces for preemption
      drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
      drm/msm/A6xx: Enable preemption for A750
      Documentation: document adreno preemption

 Documentation/gpu/msm-preemption.rst               |  98 +++++
 drivers/gpu/drm/msm/Makefile                       |   1 +
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
 drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
 drivers/gpu/drm/msm/msm_drv.c                      |   4 +
 drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
 drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
 drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
 drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
 drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
 drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
 .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
 include/uapi/drm/msm_drm.h                         |   5 +-
 20 files changed, 1117 insertions(+), 66 deletions(-)
---
base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
change-id: 20240815-preemption-a750-t-fcee9a844b39

Best regards,

Comments

Akhil P Oommen Sept. 20, 2024, 4:14 p.m. UTC | #1
On Wed, Sep 18, 2024 at 08:39:30AM -0700, Rob Clark wrote:
> On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
> <neil.armstrong@linaro.org> wrote:
> >
> > Hi,
> >
> > On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > > This series implements preemption for A7XX targets, which allows the GPU to
> > > switch to an higher priority ring when work is pushed to it, reducing latency
> > > for high priority submissions.
> > >
> > > This series enables L1 preemption with skip_save_restore which requires
> > > the following userspace patches to function:
> > >
> > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > >
> > > A flag is added to `msm_submitqueue_create` to only allow submissions
> > > from compatible userspace to be preempted, therefore maintaining
> > > compatibility.
> > >
> > > Preemption is currently only enabled by default on A750, it can be
> > > enabled on other targets through the `enable_preemption` module
> > > parameter. This is because more testing is required on other targets.
> > >
> > > For testing on other HW it is sufficient to set that parameter to a
> > > value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > > allows to run any application as high priority therefore preempting
> > > submissions from other applications.
> > >
> > > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > > added in this series can be used to observe preemption's behavior as
> > > well as measuring preemption latency.
> > >
> > > Some commits from this series are based on a previous series to enable
> > > preemption on A6XX targets:
> > >
> > > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> > >
> > > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > > ---
> > > Changes in v4:
> > > - Added missing register in pwrup list
> > > - Removed and rearrange barriers
> > > - Renamed `skip_inline_wptr` to `restore_wptr`
> > > - Track ctx seqno per ring
> > > - Removed secure preempt context
> > > - NOP out postamble to disable it instantly
> > > - Only emit pwrup reglist once
> > > - Document bv_rptr_addr
> > > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > > - Set name on preempt record buffer
> > > - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> > >
> > > Changes in v3:
> > > - Added documentation about preemption
> > > - Use quirks to determine which target supports preemption
> > > - Add a module parameter to force disabling or enabling preemption
> > > - Clear postamble when profiling
> > > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > > - Make preemption records MAP_PRIV
> > > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> > >    anymore
> > > - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> > >
> > > Changes in v2:
> > > - Added preept_record_size for X185 in PATCH 3/7
> > > - Added patches to reset perf counters
> > > - Dropped unused defines
> > > - Dropped unused variable (fixes warning)
> > > - Only enable preemption on a750
> > > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > > - Added Neil's Tested-By tags
> > > - Added explanation for UAPI changes in commit message
> > > - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> > >
> > > ---
> > > Antonino Maniscalco (11):
> > >        drm/msm: Fix bv_fence being used as bv_rptr
> > >        drm/msm/A6XX: Track current_ctx_seqno per ring
> > >        drm/msm: Add a `preempt_record_size` field
> > >        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> > >        drm/msm/A6xx: Implement preemption for A7XX targets
> > >        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> > >        drm/msm/A6xx: Use posamble to reset counters on preemption
> > >        drm/msm/A6xx: Add traces for preemption
> > >        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> > >        drm/msm/A6xx: Enable preemption for A750
> > >        Documentation: document adreno preemption
> > >
> > >   Documentation/gpu/msm-preemption.rst               |  98 +++++
> > >   drivers/gpu/drm/msm/Makefile                       |   1 +
> > >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> > >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> > >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> > >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> > >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> > >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> > >   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> > >   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> > >   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> > >   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> > >   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> > >   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> > >   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> > >   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> > >   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> > >   include/uapi/drm/msm_drm.h                         |   5 +-
> > >   20 files changed, 1117 insertions(+), 66 deletions(-)
> > > ---
> > > base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > > change-id: 20240815-preemption-a750-t-fcee9a844b39
> > >
> > > Best regards,
> >
> > I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> > on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> > and forced to 1, and I've seen no regression so far
> >
> > On SM8550, I've seen a few:
> > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> > but it's unrelated to preempt
> >
> > and on SM8450:
> > platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> > adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> 
> I suspect on newer devices we have trouble resetting the GMU, leading
> to (what I assume is happening here) the CPU thinking the GMU is in a
> different state than it is.
> 
> Which has led to some stability issues on a660 in mesa CI, if anything
> crashes the gpu in the CI run it tends to kill the rest of the run
> until the board is power cycled.
> 
> https://gitlab.freedesktop.org/drm/msm/-/issues/37
> 
> I think we have some work to do on making recovery more robust on
> things newer than early a6xx things.

Is this seen only with a particular scenario or is recovery always
broken? I fixed recovery on 7c3 (a660 based) a couple of year ago,
not sure what exactly regressed. At least I didn't see any issue on
x185.

-Akhil.

> 
> BR,
> -R
> 
> > So you can also add:
> > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> >
> > Thanks,
> > Neil
Akhil P Oommen Sept. 20, 2024, 5:09 p.m. UTC | #2
On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
> Hi,
> 
> On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > This series implements preemption for A7XX targets, which allows the GPU to
> > switch to an higher priority ring when work is pushed to it, reducing latency
> > for high priority submissions.
> > 
> > This series enables L1 preemption with skip_save_restore which requires
> > the following userspace patches to function:
> > 
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > 
> > A flag is added to `msm_submitqueue_create` to only allow submissions
> > from compatible userspace to be preempted, therefore maintaining
> > compatibility.
> > 
> > Preemption is currently only enabled by default on A750, it can be
> > enabled on other targets through the `enable_preemption` module
> > parameter. This is because more testing is required on other targets.
> > 
> > For testing on other HW it is sufficient to set that parameter to a
> > value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > allows to run any application as high priority therefore preempting
> > submissions from other applications.
> > 
> > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > added in this series can be used to observe preemption's behavior as
> > well as measuring preemption latency.
> > 
> > Some commits from this series are based on a previous series to enable
> > preemption on A6XX targets:
> > 
> > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> > 
> > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > ---
> > Changes in v4:
> > - Added missing register in pwrup list
> > - Removed and rearrange barriers
> > - Renamed `skip_inline_wptr` to `restore_wptr`
> > - Track ctx seqno per ring
> > - Removed secure preempt context
> > - NOP out postamble to disable it instantly
> > - Only emit pwrup reglist once
> > - Document bv_rptr_addr
> > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > - Set name on preempt record buffer
> > - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> > 
> > Changes in v3:
> > - Added documentation about preemption
> > - Use quirks to determine which target supports preemption
> > - Add a module parameter to force disabling or enabling preemption
> > - Clear postamble when profiling
> > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > - Make preemption records MAP_PRIV
> > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> >    anymore
> > - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> > 
> > Changes in v2:
> > - Added preept_record_size for X185 in PATCH 3/7
> > - Added patches to reset perf counters
> > - Dropped unused defines
> > - Dropped unused variable (fixes warning)
> > - Only enable preemption on a750
> > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > - Added Neil's Tested-By tags
> > - Added explanation for UAPI changes in commit message
> > - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> > 
> > ---
> > Antonino Maniscalco (11):
> >        drm/msm: Fix bv_fence being used as bv_rptr
> >        drm/msm/A6XX: Track current_ctx_seqno per ring
> >        drm/msm: Add a `preempt_record_size` field
> >        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> >        drm/msm/A6xx: Implement preemption for A7XX targets
> >        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> >        drm/msm/A6xx: Use posamble to reset counters on preemption
> >        drm/msm/A6xx: Add traces for preemption
> >        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> >        drm/msm/A6xx: Enable preemption for A750
> >        Documentation: document adreno preemption
> > 
> >   Documentation/gpu/msm-preemption.rst               |  98 +++++
> >   drivers/gpu/drm/msm/Makefile                       |   1 +
> >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> >   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> >   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> >   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> >   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> >   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> >   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> >   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> >   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> >   include/uapi/drm/msm_drm.h                         |   5 +-
> >   20 files changed, 1117 insertions(+), 66 deletions(-)
> > ---
> > base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > change-id: 20240815-preemption-a750-t-fcee9a844b39
> > 
> > Best regards,
> 
> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> and forced to 1, and I've seen no regression so far
> 
> On SM8550, I've seen a few:
> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> but it's unrelated to preempt
> 
> and on SM8450:
> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> 
> So you can also add:
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> 

Niel,

On my x1e device, all submissions were somehow going into only a single
ring, even the compositor's. Not sure why. So effectively preemption was
not really exercised. I had to force one of the two benchmark I ran
using the "highprio" mesa debug flag force submittions to ring 0.

If possible it is a good idea to check the new preemption traces to
ensure preemption kicks in.

-Akhil

> Thanks,
> Neil
Rob Clark Sept. 20, 2024, 5:17 p.m. UTC | #3
On Fri, Sep 20, 2024 at 9:15 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On Wed, Sep 18, 2024 at 08:39:30AM -0700, Rob Clark wrote:
> > On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
> > <neil.armstrong@linaro.org> wrote:
> > >
> > > Hi,
> > >
> > > On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > > > This series implements preemption for A7XX targets, which allows the GPU to
> > > > switch to an higher priority ring when work is pushed to it, reducing latency
> > > > for high priority submissions.
> > > >
> > > > This series enables L1 preemption with skip_save_restore which requires
> > > > the following userspace patches to function:
> > > >
> > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > > >
> > > > A flag is added to `msm_submitqueue_create` to only allow submissions
> > > > from compatible userspace to be preempted, therefore maintaining
> > > > compatibility.
> > > >
> > > > Preemption is currently only enabled by default on A750, it can be
> > > > enabled on other targets through the `enable_preemption` module
> > > > parameter. This is because more testing is required on other targets.
> > > >
> > > > For testing on other HW it is sufficient to set that parameter to a
> > > > value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > > > allows to run any application as high priority therefore preempting
> > > > submissions from other applications.
> > > >
> > > > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > > > added in this series can be used to observe preemption's behavior as
> > > > well as measuring preemption latency.
> > > >
> > > > Some commits from this series are based on a previous series to enable
> > > > preemption on A6XX targets:
> > > >
> > > > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> > > >
> > > > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > > > ---
> > > > Changes in v4:
> > > > - Added missing register in pwrup list
> > > > - Removed and rearrange barriers
> > > > - Renamed `skip_inline_wptr` to `restore_wptr`
> > > > - Track ctx seqno per ring
> > > > - Removed secure preempt context
> > > > - NOP out postamble to disable it instantly
> > > > - Only emit pwrup reglist once
> > > > - Document bv_rptr_addr
> > > > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > > > - Set name on preempt record buffer
> > > > - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> > > >
> > > > Changes in v3:
> > > > - Added documentation about preemption
> > > > - Use quirks to determine which target supports preemption
> > > > - Add a module parameter to force disabling or enabling preemption
> > > > - Clear postamble when profiling
> > > > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > > > - Make preemption records MAP_PRIV
> > > > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> > > >    anymore
> > > > - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> > > >
> > > > Changes in v2:
> > > > - Added preept_record_size for X185 in PATCH 3/7
> > > > - Added patches to reset perf counters
> > > > - Dropped unused defines
> > > > - Dropped unused variable (fixes warning)
> > > > - Only enable preemption on a750
> > > > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > > > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > > > - Added Neil's Tested-By tags
> > > > - Added explanation for UAPI changes in commit message
> > > > - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> > > >
> > > > ---
> > > > Antonino Maniscalco (11):
> > > >        drm/msm: Fix bv_fence being used as bv_rptr
> > > >        drm/msm/A6XX: Track current_ctx_seqno per ring
> > > >        drm/msm: Add a `preempt_record_size` field
> > > >        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> > > >        drm/msm/A6xx: Implement preemption for A7XX targets
> > > >        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> > > >        drm/msm/A6xx: Use posamble to reset counters on preemption
> > > >        drm/msm/A6xx: Add traces for preemption
> > > >        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> > > >        drm/msm/A6xx: Enable preemption for A750
> > > >        Documentation: document adreno preemption
> > > >
> > > >   Documentation/gpu/msm-preemption.rst               |  98 +++++
> > > >   drivers/gpu/drm/msm/Makefile                       |   1 +
> > > >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> > > >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> > > >   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> > > >   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> > > >   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> > > >   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> > > >   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> > > >   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> > > >   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> > > >   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> > > >   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> > > >   include/uapi/drm/msm_drm.h                         |   5 +-
> > > >   20 files changed, 1117 insertions(+), 66 deletions(-)
> > > > ---
> > > > base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > > > change-id: 20240815-preemption-a750-t-fcee9a844b39
> > > >
> > > > Best regards,
> > >
> > > I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> > > on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> > > and forced to 1, and I've seen no regression so far
> > >
> > > On SM8550, I've seen a few:
> > > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> > > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> > > but it's unrelated to preempt
> > >
> > > and on SM8450:
> > > platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> > > adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> >
> > I suspect on newer devices we have trouble resetting the GMU, leading
> > to (what I assume is happening here) the CPU thinking the GMU is in a
> > different state than it is.
> >
> > Which has led to some stability issues on a660 in mesa CI, if anything
> > crashes the gpu in the CI run it tends to kill the rest of the run
> > until the board is power cycled.
> >
> > https://gitlab.freedesktop.org/drm/msm/-/issues/37
> >
> > I think we have some work to do on making recovery more robust on
> > things newer than early a6xx things.
>
> Is this seen only with a particular scenario or is recovery always
> broken? I fixed recovery on 7c3 (a660 based) a couple of year ago,
> not sure what exactly regressed. At least I didn't see any issue on
> x185.

It might only be certain scenarios.. There is an igt recovery test, I
guess I can try that on a690 next week.  Maybe we should see if we
could get the a660 boards into drm-ci, which would include the
msm_recovery test

BR,
-R

BR,
-R

> -Akhil.
>
> >
> > BR,
> > -R
> >
> > > So you can also add:
> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> > >
> > > Thanks,
> > > Neil
Neil Armstrong Sept. 21, 2024, 4:45 p.m. UTC | #4
Le 20/09/2024 à 19:09, Akhil P Oommen a écrit :
> On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
>> Hi,
>>
>> On 17/09/2024 13:14, Antonino Maniscalco wrote:
>>> This series implements preemption for A7XX targets, which allows the GPU to
>>> switch to an higher priority ring when work is pushed to it, reducing latency
>>> for high priority submissions.
>>>
>>> This series enables L1 preemption with skip_save_restore which requires
>>> the following userspace patches to function:
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>>>
>>> A flag is added to `msm_submitqueue_create` to only allow submissions
>>> from compatible userspace to be preempted, therefore maintaining
>>> compatibility.
>>>
>>> Preemption is currently only enabled by default on A750, it can be
>>> enabled on other targets through the `enable_preemption` module
>>> parameter. This is because more testing is required on other targets.
>>>
>>> For testing on other HW it is sufficient to set that parameter to a
>>> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
>>> allows to run any application as high priority therefore preempting
>>> submissions from other applications.
>>>
>>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
>>> added in this series can be used to observe preemption's behavior as
>>> well as measuring preemption latency.
>>>
>>> Some commits from this series are based on a previous series to enable
>>> preemption on A6XX targets:
>>>
>>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
>>>
>>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
>>> ---
>>> Changes in v4:
>>> - Added missing register in pwrup list
>>> - Removed and rearrange barriers
>>> - Renamed `skip_inline_wptr` to `restore_wptr`
>>> - Track ctx seqno per ring
>>> - Removed secure preempt context
>>> - NOP out postamble to disable it instantly
>>> - Only emit pwrup reglist once
>>> - Document bv_rptr_addr
>>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
>>> - Set name on preempt record buffer
>>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
>>>
>>> Changes in v3:
>>> - Added documentation about preemption
>>> - Use quirks to determine which target supports preemption
>>> - Add a module parameter to force disabling or enabling preemption
>>> - Clear postamble when profiling
>>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
>>> - Make preemption records MAP_PRIV
>>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>>>     anymore
>>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
>>>
>>> Changes in v2:
>>> - Added preept_record_size for X185 in PATCH 3/7
>>> - Added patches to reset perf counters
>>> - Dropped unused defines
>>> - Dropped unused variable (fixes warning)
>>> - Only enable preemption on a750
>>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
>>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
>>> - Added Neil's Tested-By tags
>>> - Added explanation for UAPI changes in commit message
>>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
>>>
>>> ---
>>> Antonino Maniscalco (11):
>>>         drm/msm: Fix bv_fence being used as bv_rptr
>>>         drm/msm/A6XX: Track current_ctx_seqno per ring
>>>         drm/msm: Add a `preempt_record_size` field
>>>         drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>>>         drm/msm/A6xx: Implement preemption for A7XX targets
>>>         drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>>>         drm/msm/A6xx: Use posamble to reset counters on preemption
>>>         drm/msm/A6xx: Add traces for preemption
>>>         drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>>>         drm/msm/A6xx: Enable preemption for A750
>>>         Documentation: document adreno preemption
>>>
>>>    Documentation/gpu/msm-preemption.rst               |  98 +++++
>>>    drivers/gpu/drm/msm/Makefile                       |   1 +
>>>    drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>>>    drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>>>    drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>>>    drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>>>    drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>>>    drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>>>    drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>>>    drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>>>    drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>>>    drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>>>    drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>>>    .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>>>    include/uapi/drm/msm_drm.h                         |   5 +-
>>>    20 files changed, 1117 insertions(+), 66 deletions(-)
>>> ---
>>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
>>> change-id: 20240815-preemption-a750-t-fcee9a844b39
>>>
>>> Best regards,
>>
>> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
>> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
>> and forced to 1, and I've seen no regression so far
>>
>> On SM8550, I've seen a few:
>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
>> but it's unrelated to preempt
>>
>> and on SM8450:
>> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
>> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
>>
>> So you can also add:
>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>>
> 
> Niel,
> 
> On my x1e device, all submissions were somehow going into only a single
> ring, even the compositor's. Not sure why. So effectively preemption was
> not really exercised. I had to force one of the two benchmark I ran
> using the "highprio" mesa debug flag force submittions to ring 0.
> 
> If possible it is a good idea to check the new preemption traces to
> ensure preemption kicks in.

Sure I'll run the test again on a750 and check if preemption kicks in.

Neil

> 
> -Akhil
> 
>> Thanks,
>> Neil
Antonino Maniscalco Sept. 24, 2024, 11:53 a.m. UTC | #5
On 9/20/24 7:09 PM, Akhil P Oommen wrote:
> On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
>> Hi,
>>
>> On 17/09/2024 13:14, Antonino Maniscalco wrote:
>>> This series implements preemption for A7XX targets, which allows the GPU to
>>> switch to an higher priority ring when work is pushed to it, reducing latency
>>> for high priority submissions.
>>>
>>> This series enables L1 preemption with skip_save_restore which requires
>>> the following userspace patches to function:
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>>>
>>> A flag is added to `msm_submitqueue_create` to only allow submissions
>>> from compatible userspace to be preempted, therefore maintaining
>>> compatibility.
>>>
>>> Preemption is currently only enabled by default on A750, it can be
>>> enabled on other targets through the `enable_preemption` module
>>> parameter. This is because more testing is required on other targets.
>>>
>>> For testing on other HW it is sufficient to set that parameter to a
>>> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
>>> allows to run any application as high priority therefore preempting
>>> submissions from other applications.
>>>
>>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
>>> added in this series can be used to observe preemption's behavior as
>>> well as measuring preemption latency.
>>>
>>> Some commits from this series are based on a previous series to enable
>>> preemption on A6XX targets:
>>>
>>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
>>>
>>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
>>> ---
>>> Changes in v4:
>>> - Added missing register in pwrup list
>>> - Removed and rearrange barriers
>>> - Renamed `skip_inline_wptr` to `restore_wptr`
>>> - Track ctx seqno per ring
>>> - Removed secure preempt context
>>> - NOP out postamble to disable it instantly
>>> - Only emit pwrup reglist once
>>> - Document bv_rptr_addr
>>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
>>> - Set name on preempt record buffer
>>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
>>>
>>> Changes in v3:
>>> - Added documentation about preemption
>>> - Use quirks to determine which target supports preemption
>>> - Add a module parameter to force disabling or enabling preemption
>>> - Clear postamble when profiling
>>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
>>> - Make preemption records MAP_PRIV
>>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>>>     anymore
>>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
>>>
>>> Changes in v2:
>>> - Added preept_record_size for X185 in PATCH 3/7
>>> - Added patches to reset perf counters
>>> - Dropped unused defines
>>> - Dropped unused variable (fixes warning)
>>> - Only enable preemption on a750
>>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
>>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
>>> - Added Neil's Tested-By tags
>>> - Added explanation for UAPI changes in commit message
>>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
>>>
>>> ---
>>> Antonino Maniscalco (11):
>>>         drm/msm: Fix bv_fence being used as bv_rptr
>>>         drm/msm/A6XX: Track current_ctx_seqno per ring
>>>         drm/msm: Add a `preempt_record_size` field
>>>         drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>>>         drm/msm/A6xx: Implement preemption for A7XX targets
>>>         drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>>>         drm/msm/A6xx: Use posamble to reset counters on preemption
>>>         drm/msm/A6xx: Add traces for preemption
>>>         drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>>>         drm/msm/A6xx: Enable preemption for A750
>>>         Documentation: document adreno preemption
>>>
>>>    Documentation/gpu/msm-preemption.rst               |  98 +++++
>>>    drivers/gpu/drm/msm/Makefile                       |   1 +
>>>    drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>>>    drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>>>    drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>>>    drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>>>    drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>>>    drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>>>    drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>>>    drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>>>    drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>>>    drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>>>    drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>>>    drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>>>    .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>>>    include/uapi/drm/msm_drm.h                         |   5 +-
>>>    20 files changed, 1117 insertions(+), 66 deletions(-)
>>> ---
>>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
>>> change-id: 20240815-preemption-a750-t-fcee9a844b39
>>>
>>> Best regards,
>>
>> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
>> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
>> and forced to 1, and I've seen no regression so far
>>
>> On SM8550, I've seen a few:
>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
>> but it's unrelated to preempt
>>
>> and on SM8450:
>> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
>> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
>>
>> So you can also add:
>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>>
> 
> Niel,
> 
> On my x1e device, all submissions were somehow going into only a single
> ring, even the compositor's. Not sure why. So effectively preemption was
> not really exercised. I had to force one of the two benchmark I ran
> using the "highprio" mesa debug flag force submittions to ring 0.

I think that is because GL applications (so most compositors) run 
through zink which does not forward GL preemption to vulkan so yeah, for 
GL applications the only way of getting preemption is the debug flag.

Unfortunately this is not easy to fix in Zink because it creates one 
VkDevice at screen creation and uses it for all GL contexts. Since GL 
priority is provided per context and at context creation time Zink has 
no way of handling this.

Once TU will support more than one queue it will be possible for Zink to 
create one queue per priority then pick one at context creation time. 
Doing so would require a new vulkan extension for per queue global 
priority. I had started working on this some time ago 
https://gitlab.freedesktop.org/antonino/mesa/-/tree/priority_ext?ref_type=heads
but this solution will only be viable once TU can expose more than one 
queue.

> 
> If possible it is a good idea to check the new preemption traces to
> ensure preemption kicks in.
> 
> -Akhil
> 
>> Thanks,
>> Neil


Best regards,
Antonino Maniscalco Sept. 24, 2024, 5:45 p.m. UTC | #6
On 9/24/24 4:47 PM, Rob Clark wrote:
> On Tue, Sep 24, 2024 at 4:54 AM Antonino Maniscalco
> <antomani103@gmail.com> wrote:
>>
>> On 9/20/24 7:09 PM, Akhil P Oommen wrote:
>>> On Wed, Sep 18, 2024 at 09:46:33AM +0200, Neil Armstrong wrote:
>>>> Hi,
>>>>
>>>> On 17/09/2024 13:14, Antonino Maniscalco wrote:
>>>>> This series implements preemption for A7XX targets, which allows the GPU to
>>>>> switch to an higher priority ring when work is pushed to it, reducing latency
>>>>> for high priority submissions.
>>>>>
>>>>> This series enables L1 preemption with skip_save_restore which requires
>>>>> the following userspace patches to function:
>>>>>
>>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>>>>>
>>>>> A flag is added to `msm_submitqueue_create` to only allow submissions
>>>>> from compatible userspace to be preempted, therefore maintaining
>>>>> compatibility.
>>>>>
>>>>> Preemption is currently only enabled by default on A750, it can be
>>>>> enabled on other targets through the `enable_preemption` module
>>>>> parameter. This is because more testing is required on other targets.
>>>>>
>>>>> For testing on other HW it is sufficient to set that parameter to a
>>>>> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
>>>>> allows to run any application as high priority therefore preempting
>>>>> submissions from other applications.
>>>>>
>>>>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
>>>>> added in this series can be used to observe preemption's behavior as
>>>>> well as measuring preemption latency.
>>>>>
>>>>> Some commits from this series are based on a previous series to enable
>>>>> preemption on A6XX targets:
>>>>>
>>>>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
>>>>>
>>>>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
>>>>> ---
>>>>> Changes in v4:
>>>>> - Added missing register in pwrup list
>>>>> - Removed and rearrange barriers
>>>>> - Renamed `skip_inline_wptr` to `restore_wptr`
>>>>> - Track ctx seqno per ring
>>>>> - Removed secure preempt context
>>>>> - NOP out postamble to disable it instantly
>>>>> - Only emit pwrup reglist once
>>>>> - Document bv_rptr_addr
>>>>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
>>>>> - Set name on preempt record buffer
>>>>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
>>>>>
>>>>> Changes in v3:
>>>>> - Added documentation about preemption
>>>>> - Use quirks to determine which target supports preemption
>>>>> - Add a module parameter to force disabling or enabling preemption
>>>>> - Clear postamble when profiling
>>>>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
>>>>> - Make preemption records MAP_PRIV
>>>>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>>>>>      anymore
>>>>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
>>>>>
>>>>> Changes in v2:
>>>>> - Added preept_record_size for X185 in PATCH 3/7
>>>>> - Added patches to reset perf counters
>>>>> - Dropped unused defines
>>>>> - Dropped unused variable (fixes warning)
>>>>> - Only enable preemption on a750
>>>>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
>>>>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
>>>>> - Added Neil's Tested-By tags
>>>>> - Added explanation for UAPI changes in commit message
>>>>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
>>>>>
>>>>> ---
>>>>> Antonino Maniscalco (11):
>>>>>          drm/msm: Fix bv_fence being used as bv_rptr
>>>>>          drm/msm/A6XX: Track current_ctx_seqno per ring
>>>>>          drm/msm: Add a `preempt_record_size` field
>>>>>          drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>>>>>          drm/msm/A6xx: Implement preemption for A7XX targets
>>>>>          drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>>>>>          drm/msm/A6xx: Use posamble to reset counters on preemption
>>>>>          drm/msm/A6xx: Add traces for preemption
>>>>>          drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>>>>>          drm/msm/A6xx: Enable preemption for A750
>>>>>          Documentation: document adreno preemption
>>>>>
>>>>>     Documentation/gpu/msm-preemption.rst               |  98 +++++
>>>>>     drivers/gpu/drm/msm/Makefile                       |   1 +
>>>>>     drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>>>>>     drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>>>>>     drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>>>>>     drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>>>>>     drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>>>>>     drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>>>>>     drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>>>>>     drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>>>>>     drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>>>>>     drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>>>>>     drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>>>>>     drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>>>>>     drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>>>>>     drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>>>>>     drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>>>>>     drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>>>>>     .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>>>>>     include/uapi/drm/msm_drm.h                         |   5 +-
>>>>>     20 files changed, 1117 insertions(+), 66 deletions(-)
>>>>> ---
>>>>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
>>>>> change-id: 20240815-preemption-a750-t-fcee9a844b39
>>>>>
>>>>> Best regards,
>>>>
>>>> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
>>>> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
>>>> and forced to 1, and I've seen no regression so far
>>>>
>>>> On SM8550, I've seen a few:
>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
>>>> but it's unrelated to preempt
>>>>
>>>> and on SM8450:
>>>> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
>>>> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
>>>>
>>>> So you can also add:
>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>>>>
>>>
>>> Niel,
>>>
>>> On my x1e device, all submissions were somehow going into only a single
>>> ring, even the compositor's. Not sure why. So effectively preemption was
>>> not really exercised. I had to force one of the two benchmark I ran
>>> using the "highprio" mesa debug flag force submittions to ring 0.
>>
>> I think that is because GL applications (so most compositors) run
>> through zink which does not forward GL preemption to vulkan so yeah, for
>> GL applications the only way of getting preemption is the debug flag.
> 
> I guess if it is mesa 24.2.x or newer it would be using the gallium
> driver.  Which I guess would need xAMBLE stuff wired up.  Outside of
> fd6_emit_restore() and fd6_gmem.cc there isn't really any state emit
> in IB1, so I guess it probably wouldn't be too hard to get preemption
> support wired up.

I hadn't realized a7xx supportd had landed for the gallium driver. 
That's good news! This is definitely a shorter path towards getting 
compositors to use preemption.

> 
> BR,
> -R
> 
>> Unfortunately this is not easy to fix in Zink because it creates one
>> VkDevice at screen creation and uses it for all GL contexts. Since GL
>> priority is provided per context and at context creation time Zink has
>> no way of handling this.
>>
>> Once TU will support more than one queue it will be possible for Zink to
>> create one queue per priority then pick one at context creation time.
>> Doing so would require a new vulkan extension for per queue global
>> priority. I had started working on this some time ago
>> https://gitlab.freedesktop.org/antonino/mesa/-/tree/priority_ext?ref_type=heads
>> but this solution will only be viable once TU can expose more than one
>> queue.
>>
>>>
>>> If possible it is a good idea to check the new preemption traces to
>>> ensure preemption kicks in.
>>>
>>> -Akhil
>>>
>>>> Thanks,
>>>> Neil
>>
>>
>> Best regards,
>> --
>> Antonino Maniscalco <antomani103@gmail.com>


Best regards,
Rob Clark Oct. 22, 2024, 3:05 p.m. UTC | #7
On Fri, Sep 20, 2024 at 9:15 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On Wed, Sep 18, 2024 at 08:39:30AM -0700, Rob Clark wrote:
> > On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
> > <neil.armstrong@linaro.org> wrote:
> > >
> > > Hi,
> > >
> > > On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > > > This series implements preemption for A7XX targets, which allows the GPU to
> > > > switch to an higher priority ring when work is pushed to it, reducing latency
> > > > for high priority submissions.
> > > >
> > > > This series enables L1 preemption with skip_save_restore which requires
> > > > the following userspace patches to function:
> > > >
> > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> > > >
> > > > A flag is added to `msm_submitqueue_create` to only allow submissions
> > > > from compatible userspace to be preempted, therefore maintaining
> > > > compatibility.
> > > >
> > > > Preemption is currently only enabled by default on A750, it can be
> > > > enabled on other targets through the `enable_preemption` module
> > > > parameter. This is because more testing is required on other targets.
> > > >
> > > > For testing on other HW it is sufficient to set that parameter to a
> > > > value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > > > allows to run any application as high priority therefore preempting
> > > > submissions from other applications.
> > > >
> > > > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > > > added in this series can be used to observe preemption's behavior as
> > > > well as measuring preemption latency.
> > > >
> > > > Some commits from this series are based on a previous series to enable
> > > > preemption on A6XX targets:
> > > >
> > > > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
> > > >
> > > > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
> > > > ---
> > > > Changes in v4:
> > > > - Added missing register in pwrup list
> > > > - Removed and rearrange barriers
> > > > - Renamed `skip_inline_wptr` to `restore_wptr`
> > > > - Track ctx seqno per ring
> > > > - Removed secure preempt context
> > > > - NOP out postamble to disable it instantly
> > > > - Only emit pwrup reglist once
> > > > - Document bv_rptr_addr
> > > > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > > > - Set name on preempt record buffer
> > > > - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
> > > >
> > > > Changes in v3:
> > > > - Added documentation about preemption
> > > > - Use quirks to determine which target supports preemption
> > > > - Add a module parameter to force disabling or enabling preemption
> > > > - Clear postamble when profiling
> > > > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > > > - Make preemption records MAP_PRIV
> > > > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> > > >    anymore
> > > > - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
> > > >
> > > > Changes in v2:
> > > > - Added preept_record_size for X185 in PATCH 3/7
> > > > - Added patches to reset perf counters
> > > > - Dropped unused defines
> > > > - Dropped unused variable (fixes warning)
> > > > - Only enable preemption on a750
> > > > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > > > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > > > - Added Neil's Tested-By tags
> > > > - Added explanation for UAPI changes in commit message
> > > > - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
> > > >
> > > > ---
> > > > Antonino Maniscalco (11):
> > > >        drm/msm: Fix bv_fence being used as bv_rptr
> > > >        drm/msm/A6XX: Track current_ctx_seqno per ring
> > > >        drm/msm: Add a `preempt_record_size` field
> > > >        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> > > >        drm/msm/A6xx: Implement preemption for A7XX targets
> > > >        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> > > >        drm/msm/A6xx: Use posamble to reset counters on preemption
> > > >        drm/msm/A6xx: Add traces for preemption
> > > >        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> > > >        drm/msm/A6xx: Enable preemption for A750
> > > >        Documentation: document adreno preemption
> > > >
> > > >   Documentation/gpu/msm-preemption.rst               |  98 +++++
> > > >   drivers/gpu/drm/msm/Makefile                       |   1 +
> > > >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
> > > >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
> > > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
> > > >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
> > > >   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
> > > >   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
> > > >   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
> > > >   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
> > > >   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
> > > >   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
> > > >   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
> > > >   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
> > > >   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
> > > >   include/uapi/drm/msm_drm.h                         |   5 +-
> > > >   20 files changed, 1117 insertions(+), 66 deletions(-)
> > > > ---
> > > > base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > > > change-id: 20240815-preemption-a750-t-fcee9a844b39
> > > >
> > > > Best regards,
> > >
> > > I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
> > > on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
> > > and forced to 1, and I've seen no regression so far
> > >
> > > On SM8550, I've seen a few:
> > > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
> > > platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
> > > but it's unrelated to preempt
> > >
> > > and on SM8450:
> > > platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
> > > msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
> > > adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
> > > msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
> > > leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
> >
> > I suspect on newer devices we have trouble resetting the GMU, leading
> > to (what I assume is happening here) the CPU thinking the GMU is in a
> > different state than it is.
> >
> > Which has led to some stability issues on a660 in mesa CI, if anything
> > crashes the gpu in the CI run it tends to kill the rest of the run
> > until the board is power cycled.
> >
> > https://gitlab.freedesktop.org/drm/msm/-/issues/37
> >
> > I think we have some work to do on making recovery more robust on
> > things newer than early a6xx things.
>
> Is this seen only with a particular scenario or is recovery always
> broken? I fixed recovery on 7c3 (a660 based) a couple of year ago,
> not sure what exactly regressed. At least I didn't see any issue on
> x185.

More recently my x1e (x1-85) and sc8280xp (a690) have been pretty
reliable about recovery.  And mesa CI seems to have gotten more
reliable at recovery when they uprev'd from v6.6x to v6.11.x, so I
guess something in that range improved things?  But maybe not 100%,
kernel-ci (msm/msm_recovery@gpu-fault) can sometimes reproduce this,
apparently:

https://gitlab.freedesktop.org/drm/msm/-/issues/65

This test does 16 submits, with the 10th one having an invalid opc,
and then checks that all the ones before and after successfully
execute a CP_MEM_WRITE:

https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/msm/msm_recovery.c?ref_type=heads#L145

BR,
-R

> -Akhil.
>
> >
> > BR,
> > -R
> >
> > > So you can also add:
> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
> > >
> > > Thanks,
> > > Neil
Akhil P Oommen Oct. 23, 2024, 4:21 p.m. UTC | #8
On 10/22/2024 8:35 PM, Rob Clark wrote:
> On Fri, Sep 20, 2024 at 9:15 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>>
>> On Wed, Sep 18, 2024 at 08:39:30AM -0700, Rob Clark wrote:
>>> On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
>>> <neil.armstrong@linaro.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 17/09/2024 13:14, Antonino Maniscalco wrote:
>>>>> This series implements preemption for A7XX targets, which allows the GPU to
>>>>> switch to an higher priority ring when work is pushed to it, reducing latency
>>>>> for high priority submissions.
>>>>>
>>>>> This series enables L1 preemption with skip_save_restore which requires
>>>>> the following userspace patches to function:
>>>>>
>>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>>>>>
>>>>> A flag is added to `msm_submitqueue_create` to only allow submissions
>>>>> from compatible userspace to be preempted, therefore maintaining
>>>>> compatibility.
>>>>>
>>>>> Preemption is currently only enabled by default on A750, it can be
>>>>> enabled on other targets through the `enable_preemption` module
>>>>> parameter. This is because more testing is required on other targets.
>>>>>
>>>>> For testing on other HW it is sufficient to set that parameter to a
>>>>> value of 1, then using the branch of mesa linked above, `TU_DEBUG=hiprio`
>>>>> allows to run any application as high priority therefore preempting
>>>>> submissions from other applications.
>>>>>
>>>>> The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
>>>>> added in this series can be used to observe preemption's behavior as
>>>>> well as measuring preemption latency.
>>>>>
>>>>> Some commits from this series are based on a previous series to enable
>>>>> preemption on A6XX targets:
>>>>>
>>>>> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org
>>>>>
>>>>> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
>>>>> ---
>>>>> Changes in v4:
>>>>> - Added missing register in pwrup list
>>>>> - Removed and rearrange barriers
>>>>> - Renamed `skip_inline_wptr` to `restore_wptr`
>>>>> - Track ctx seqno per ring
>>>>> - Removed secure preempt context
>>>>> - NOP out postamble to disable it instantly
>>>>> - Only emit pwrup reglist once
>>>>> - Document bv_rptr_addr
>>>>> - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
>>>>> - Set name on preempt record buffer
>>>>> - Link to v3: https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f7bc@gmail.com
>>>>>
>>>>> Changes in v3:
>>>>> - Added documentation about preemption
>>>>> - Use quirks to determine which target supports preemption
>>>>> - Add a module parameter to force disabling or enabling preemption
>>>>> - Clear postamble when profiling
>>>>> - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
>>>>> - Make preemption records MAP_PRIV
>>>>> - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
>>>>>    anymore
>>>>> - Link to v2: https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com
>>>>>
>>>>> Changes in v2:
>>>>> - Added preept_record_size for X185 in PATCH 3/7
>>>>> - Added patches to reset perf counters
>>>>> - Dropped unused defines
>>>>> - Dropped unused variable (fixes warning)
>>>>> - Only enable preemption on a750
>>>>> - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
>>>>> - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
>>>>> - Added Neil's Tested-By tags
>>>>> - Added explanation for UAPI changes in commit message
>>>>> - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com
>>>>>
>>>>> ---
>>>>> Antonino Maniscalco (11):
>>>>>        drm/msm: Fix bv_fence being used as bv_rptr
>>>>>        drm/msm/A6XX: Track current_ctx_seqno per ring
>>>>>        drm/msm: Add a `preempt_record_size` field
>>>>>        drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
>>>>>        drm/msm/A6xx: Implement preemption for A7XX targets
>>>>>        drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
>>>>>        drm/msm/A6xx: Use posamble to reset counters on preemption
>>>>>        drm/msm/A6xx: Add traces for preemption
>>>>>        drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>>>>>        drm/msm/A6xx: Enable preemption for A750
>>>>>        Documentation: document adreno preemption
>>>>>
>>>>>   Documentation/gpu/msm-preemption.rst               |  98 +++++
>>>>>   drivers/gpu/drm/msm/Makefile                       |   1 +
>>>>>   drivers/gpu/drm/msm/adreno/a2xx_gpu.c              |   2 +-
>>>>>   drivers/gpu/drm/msm/adreno/a3xx_gpu.c              |   2 +-
>>>>>   drivers/gpu/drm/msm/adreno/a4xx_gpu.c              |   2 +-
>>>>>   drivers/gpu/drm/msm/adreno/a5xx_gpu.c              |   6 +-
>>>>>   drivers/gpu/drm/msm/adreno/a6xx_catalog.c          |   7 +-
>>>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 325 ++++++++++++++-
>>>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h              | 174 ++++++++
>>>>>   drivers/gpu/drm/msm/adreno/a6xx_preempt.c          | 440 +++++++++++++++++++++
>>>>>   drivers/gpu/drm/msm/adreno/adreno_gpu.h            |   9 +-
>>>>>   drivers/gpu/drm/msm/msm_drv.c                      |   4 +
>>>>>   drivers/gpu/drm/msm/msm_gpu.c                      |   2 +-
>>>>>   drivers/gpu/drm/msm/msm_gpu.h                      |  11 -
>>>>>   drivers/gpu/drm/msm/msm_gpu_trace.h                |  28 ++
>>>>>   drivers/gpu/drm/msm/msm_ringbuffer.h               |  18 +
>>>>>   drivers/gpu/drm/msm/msm_submitqueue.c              |   3 +
>>>>>   drivers/gpu/drm/msm/registers/adreno/a6xx.xml      |   7 +-
>>>>>   .../gpu/drm/msm/registers/adreno/adreno_pm4.xml    |  39 +-
>>>>>   include/uapi/drm/msm_drm.h                         |   5 +-
>>>>>   20 files changed, 1117 insertions(+), 66 deletions(-)
>>>>> ---
>>>>> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
>>>>> change-id: 20240815-preemption-a750-t-fcee9a844b39
>>>>>
>>>>> Best regards,
>>>>
>>>> I've been running vulkan-cts (1.3.7.3-0-gd71a36db16d98313c431829432a136dbda692a08 from Yocto)
>>>> on SM8650-QRD, SM8550-QRD & SM8450-HDK boards with enable_preemption in default value
>>>> and forced to 1, and I've seen no regression so far
>>>>
>>>> On SM8550, I've seen a few:
>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Message HFI_H2F_MSG_GX_BW_PERF_VOTE id 2743 timed out waiting for response
>>>> platform 3d6a000.gmu: [drm:a6xx_hfi_send_msg.constprop.0 [msm]] *ERROR* Unexpected message id 2743 on the response queue
>>>> but it's unrelated to preempt
>>>>
>>>> and on SM8450:
>>>> platform 3d6a000.gmu: [drm:a6xx_gmu_set_oob [msm]] *ERROR* Timeout waiting for GMU OOB set GPU_SET: 0x0
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1: hangcheck detected gpu lockup rb 0!
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     completed fence: 331235
>>>> msm_dpu ae01000.display-controller: [drm:hangcheck_handler [msm]] *ERROR* 7.3.0.1:     submitted fence: 331236
>>>> adreno 3d00000.gpu: [drm:a6xx_irq [msm]] *ERROR* gpu fault ring 0 fence 50de4 status 00800005 rb 0000/0699 ib1 0000000000000000/0000 ib2 0000000000000000/0000
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: offending task: deqp-vk (/usr/lib/vulkan-cts/deqp-vk)
>>>> msm_dpu ae01000.display-controller: [drm:recover_worker [msm]] *ERROR* 7.3.0.1: hangcheck recover!
>>>> leading to a VK_ERROR_DEVICE_LOST, but again unrelated to preempt support.
>>>
>>> I suspect on newer devices we have trouble resetting the GMU, leading
>>> to (what I assume is happening here) the CPU thinking the GMU is in a
>>> different state than it is.
>>>
>>> Which has led to some stability issues on a660 in mesa CI, if anything
>>> crashes the gpu in the CI run it tends to kill the rest of the run
>>> until the board is power cycled.
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/issues/37
>>>
>>> I think we have some work to do on making recovery more robust on
>>> things newer than early a6xx things.
>>
>> Is this seen only with a particular scenario or is recovery always
>> broken? I fixed recovery on 7c3 (a660 based) a couple of year ago,
>> not sure what exactly regressed. At least I didn't see any issue on
>> x185.
> 
> More recently my x1e (x1-85) and sc8280xp (a690) have been pretty
> reliable about recovery.  And mesa CI seems to have gotten more
> reliable at recovery when they uprev'd from v6.6x to v6.11.x, so I
> guess something in that range improved things?  But maybe not 100%,
> kernel-ci (msm/msm_recovery@gpu-fault) can sometimes reproduce this,
> apparently:
> 
> https://gitlab.freedesktop.org/drm/msm/-/issues/65
> 
> This test does 16 submits, with the 10th one having an invalid opc,
> and then checks that all the ones before and after successfully
> execute a CP_MEM_WRITE:
> 
> https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/msm/msm_recovery.c?ref_type=heads#L145
>

I suppose we don't have a gpu coredump available. A663 is pretty similar to
A660, so I can try to reproduce this issue there. Will check this out.

-Akhil

> BR,
> -R
> 
>> -Akhil.
>>
>>>
>>> BR,
>>> -R
>>>
>>>> So you can also add:
>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
>>>> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
>>>>
>>>> Thanks,
>>>> Neil