Message ID | 20240830-preemption-a750-t-v2-0-86aeead2cd80@gmail.com |
---|---|
Headers | show |
Series | Preemption support for A7XX | expand |
On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco <antomani103@gmail.com> wrote: > > Initialize with 4 rings to enable preemption. > > For now only on A750 as other targets require testing. > > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com> > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD > --- > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > index a2853309288b..ea518209c03d 100644 > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > @@ -2609,7 +2609,9 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev) > return ERR_PTR(ret); > } > > - if (is_a7xx) > + if (adreno_is_a750(adreno_gpu)) > + ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 4); perhaps it would be useful (to enable others to more easily test), to allow this to be overridden with a modparam.. something like if ((enable_preemption == 1) || (enable_preemption == -1 && (config->info->quirks & ADRENO_QUIRK_PREEMPTION)) That would allow overriding enable_preemption to 1 or 0 on cmdline to force enable/disable preemption. That plus some instructions about how to test preemption (ie. what deqp tests to run, or similar) would make it easier to "crowd source" the testing (assuming you don't have every a7xx device there is) BR, -R > + else if (is_a7xx) > ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 1); > else if (adreno_has_gmu_wrapper(adreno_gpu)) > ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1); > > -- > 2.46.0 >
On Fri, Aug 30, 2024 at 7:08 PM Rob Clark <robdclark@gmail.com> wrote: > > On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco > <antomani103@gmail.com> wrote: > > > > This patch implements preemption feature for A6xx targets, this allows > > the GPU to switch to a higher priority ringbuffer if one is ready. A6XX > > hardware as such supports multiple levels of preemption granularities, > > ranging from coarse grained(ringbuffer level) to a more fine grained > > such as draw-call level or a bin boundary level preemption. This patch > > enables the basic preemption level, with more fine grained preemption > > support to follow. > > > > Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> > > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com> > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD > > --- > > drivers/gpu/drm/msm/Makefile | 1 + > > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +++++++++++++++++++++- > > drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 ++++++++++++ > > drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++++++++++++++++++++++++++++++ > > drivers/gpu/drm/msm/msm_ringbuffer.h | 7 + > > 5 files changed, 921 insertions(+), 9 deletions(-) > > > > [snip] > > > + > > +int a6xx_preempt_submitqueue_setup(struct msm_gpu *gpu, > > + struct msm_gpu_submitqueue *queue) > > +{ > > + void *ptr; > > + > > + /* > > + * Create a per submitqueue buffer for the CP to save and restore user > > + * specific information such as the VPC streamout data. > > + */ > > + ptr = msm_gem_kernel_new(gpu->dev, A6XX_PREEMPT_USER_RECORD_SIZE, > > + MSM_BO_WC, gpu->aspace, &queue->bo, &queue->bo_iova); > > Can this be MSM_BO_MAP_PRIV? Otherwise it is visible (and writeable) > by other proceess's userspace generated cmdstream. > > And a similar question for the scratch_bo.. I'd have to give some > thought to what sort of mischief could be had, but generall kernel > mappings that are not MAP_PRIV are a thing to be careful about. > It seems like the idea behind this is that it's supposed to be per-context. kgsl allocates it as part of the context, as part of the userspace address space, and then in order to know which user record to use when preempting, before each submit (although really it only needs to be done when setting the pagetable) it does a CP_MEM_WRITE of the user record address to a scratch buffer holding an array of the current user record for each ring. Then when preempting it reads the address for the next ring from the scratch buffer and sets it. I think we need to do that dance too. Connor > BR, > -R
On Fri, Aug 30, 2024 at 8:00 PM Rob Clark <robdclark@gmail.com> wrote: > > On Fri, Aug 30, 2024 at 11:54 AM Connor Abbott <cwabbott0@gmail.com> wrote: > > > > On Fri, Aug 30, 2024 at 7:08 PM Rob Clark <robdclark@gmail.com> wrote: > > > > > > On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco > > > <antomani103@gmail.com> wrote: > > > > > > > > This patch implements preemption feature for A6xx targets, this allows > > > > the GPU to switch to a higher priority ringbuffer if one is ready. A6XX > > > > hardware as such supports multiple levels of preemption granularities, > > > > ranging from coarse grained(ringbuffer level) to a more fine grained > > > > such as draw-call level or a bin boundary level preemption. This patch > > > > enables the basic preemption level, with more fine grained preemption > > > > support to follow. > > > > > > > > Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> > > > > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com> > > > > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD > > > > --- > > > > drivers/gpu/drm/msm/Makefile | 1 + > > > > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +++++++++++++++++++++- > > > > drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 ++++++++++++ > > > > drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++++++++++++++++++++++++++++++ > > > > drivers/gpu/drm/msm/msm_ringbuffer.h | 7 + > > > > 5 files changed, 921 insertions(+), 9 deletions(-) > > > > > > > > > > [snip] > > > > > > > + > > > > +int a6xx_preempt_submitqueue_setup(struct msm_gpu *gpu, > > > > + struct msm_gpu_submitqueue *queue) > > > > +{ > > > > + void *ptr; > > > > + > > > > + /* > > > > + * Create a per submitqueue buffer for the CP to save and restore user > > > > + * specific information such as the VPC streamout data. > > > > + */ > > > > + ptr = msm_gem_kernel_new(gpu->dev, A6XX_PREEMPT_USER_RECORD_SIZE, > > > > + MSM_BO_WC, gpu->aspace, &queue->bo, &queue->bo_iova); > > > > > > Can this be MSM_BO_MAP_PRIV? Otherwise it is visible (and writeable) > > > by other proceess's userspace generated cmdstream. > > > > > > And a similar question for the scratch_bo.. I'd have to give some > > > thought to what sort of mischief could be had, but generall kernel > > > mappings that are not MAP_PRIV are a thing to be careful about. > > > > > > > It seems like the idea behind this is that it's supposed to be > > per-context. kgsl allocates it as part of the context, as part of the > > userspace address space, and then in order to know which user record > > to use when preempting, before each submit (although really it only > > needs to be done when setting the pagetable) it does a CP_MEM_WRITE of > > the user record address to a scratch buffer holding an array of the > > current user record for each ring. Then when preempting it reads the > > address for the next ring from the scratch buffer and sets it. I think > > we need to do that dance too. > > Moving it into userspace's address space (vm) would be better. > > I assume the preempt record is where state is saved/restored? So > would need to be in kernel aspace/vm? Or is the fw changing ttbr0 > after saving state but before restoring? > > BR, > -R The preempt record is split into a number of pieces, each with their own address. One of those pieces is the SMMU record with ttbr0 and other SMMU things. Another piece is the "private" context record with sensitive things like RB address/rptr/wptr, although actually the bulk of the registers are saved here. Then the user or "non-private" record is its own piece, which is presumably saved before switching ttbr0 and restored after the SMMU record is restored and ttbr0 is switched. Connor > > > Connor > > > > > BR, > > > -R
On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco <antomani103@gmail.com> wrote: > > This patch implements preemption feature for A6xx targets, this allows > the GPU to switch to a higher priority ringbuffer if one is ready. A6XX > hardware as such supports multiple levels of preemption granularities, > ranging from coarse grained(ringbuffer level) to a more fine grained > such as draw-call level or a bin boundary level preemption. This patch > enables the basic preemption level, with more fine grained preemption > support to follow. > > Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> > Signed-off-by: Antonino Maniscalco <antomani103@gmail.com> > Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD > --- > drivers/gpu/drm/msm/Makefile | 1 + > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +++++++++++++++++++++- > drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 ++++++++++++ > drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++++++++++++++++++++++++++++++ > drivers/gpu/drm/msm/msm_ringbuffer.h | 7 + > 5 files changed, 921 insertions(+), 9 deletions(-) > [snip] > @@ -784,6 +1062,16 @@ static int a6xx_ucode_load(struct msm_gpu *gpu) > msm_gem_object_set_name(a6xx_gpu->shadow_bo, "shadow"); > } > > + a6xx_gpu->pwrup_reglist_ptr = msm_gem_kernel_new(gpu->dev, PAGE_SIZE, > + MSM_BO_WC | MSM_BO_MAP_PRIV, > + gpu->aspace, &a6xx_gpu->pwrup_reglist_bo, > + &a6xx_gpu->pwrup_reglist_iova); I guess this could also be MSM_BO_GPU_READONLY? BR, -R
On 8/30/24 10:25 PM, Rob Clark wrote: > On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco > <antomani103@gmail.com> wrote: >> >> This patch implements preemption feature for A6xx targets, this allows >> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX >> hardware as such supports multiple levels of preemption granularities, >> ranging from coarse grained(ringbuffer level) to a more fine grained >> such as draw-call level or a bin boundary level preemption. This patch >> enables the basic preemption level, with more fine grained preemption >> support to follow. >> >> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> >> Signed-off-by: Antonino Maniscalco <antomani103@gmail.com> >> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD >> --- >> drivers/gpu/drm/msm/Makefile | 1 + >> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +++++++++++++++++++++- >> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 ++++++++++++ >> drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++++++++++++++++++++++++++++++ >> drivers/gpu/drm/msm/msm_ringbuffer.h | 7 + >> 5 files changed, 921 insertions(+), 9 deletions(-) >> > > [snip] > >> @@ -784,6 +1062,16 @@ static int a6xx_ucode_load(struct msm_gpu *gpu) >> msm_gem_object_set_name(a6xx_gpu->shadow_bo, "shadow"); >> } >> >> + a6xx_gpu->pwrup_reglist_ptr = msm_gem_kernel_new(gpu->dev, PAGE_SIZE, >> + MSM_BO_WC | MSM_BO_MAP_PRIV, >> + gpu->aspace, &a6xx_gpu->pwrup_reglist_bo, >> + &a6xx_gpu->pwrup_reglist_iova); > > I guess this could also be MSM_BO_GPU_READONLY? > > BR, > -R Besides containing the the actual reglist this buffer also contains the `cpu_gpu_lock` structure which is written by the SQE so adding the `MSM_BO_GPU_READONLY` flag would cause it to fault. Best regards,
This series implements preemption for A7XX targets, which allows the GPU to switch to an higher priority ring when work is pushed to it, reducing latency for high priority submissions. This series enables L1 preemption with skip_save_restore which requires the following userspace patches to function: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544 A flag is added to `msm_submitqueue_create` to only allow submissions from compatible userspace to be preempted, therefore maintaining compatibility. Some commits from this series are based on a previous series to enable preemption on A6XX targets: https://lkml.kernel.org/1520489185-21828-1-git-send-email-smasetty@codeaurora.org Signed-off-by: Antonino Maniscalco <antomani103@gmail.com> --- Changes in v2: - Added preept_record_size for X185 in PATCH 3/7 - Added patches to reset perf counters - Dropped unused defines - Dropped unused variable (fixes warning) - Only enable preemption on a750 - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9 - Added Neil's Tested-By tags - Added explanation for UAPI changes in commit message - Link to v1: https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34037@gmail.com --- Antonino Maniscalco (9): drm/msm: Fix bv_fence being used as bv_rptr drm/msm: Add submitqueue setup and close drm/msm: Add a `preempt_record_size` field drm/msm/A6xx: Implement preemption for A7XX targets drm/msm/A6xx: Sync relevant adreno_pm4.xml changes drm/msm/A6xx: Use posamble to reset counters on preemption drm/msm/A6xx: Add traces for preemption drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create drm/msm/A6xx: Enable preemption for A750 drivers/gpu/drm/msm/Makefile | 1 + drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 4 + drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 353 +++++++++++++++- drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 174 ++++++++ drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 462 +++++++++++++++++++++ drivers/gpu/drm/msm/adreno/adreno_gpu.h | 8 +- drivers/gpu/drm/msm/msm_gpu.h | 7 + drivers/gpu/drm/msm/msm_gpu_trace.h | 28 ++ drivers/gpu/drm/msm/msm_ringbuffer.h | 8 + drivers/gpu/drm/msm/msm_submitqueue.c | 13 + .../gpu/drm/msm/registers/adreno/adreno_pm4.xml | 39 +- include/uapi/drm/msm_drm.h | 5 +- 12 files changed, 1062 insertions(+), 40 deletions(-) --- base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba change-id: 20240815-preemption-a750-t-fcee9a844b39 Best regards,