Message ID: 20230302235356.3148279-1-robdclark@gmail.com
Series: dma-fence: Deadline awareness
On Fri, Mar 3, 2023 at 2:10 AM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> On 03/03/2023 01:53, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Track the nearest deadline on a fence timeline and set a timer to expire
> > shortly before to trigger boost if the fence has not yet been signaled.
> >
> > v2: rebase
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   drivers/gpu/drm/msm/msm_fence.c | 74 +++++++++++++++++++++++++++++++++
> >   drivers/gpu/drm/msm/msm_fence.h | 20 +++++++++
> >   2 files changed, 94 insertions(+)
>
> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>
> A small question: do we boost to fit into the deadline or to miss the
> deadline for as little as possible? If the former is the case, we might
> need to adjust 3ms depending on the workload.

The goal is as much to run with higher clock on the next frame as it
is to not miss a deadline.  Ie. we don't want devfreq to come to the
conclusion that running at <50% clks is best due to the amount of
utilization caused by missing every other vblank.

But 3ms is mostly just "seems like a good compromise" value.  It might change.

BR,
-R

> >
> > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > index 56641408ea74..51b461f32103 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.c
> > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > @@ -8,6 +8,35 @@
> >
> >   #include "msm_drv.h"
> >   #include "msm_fence.h"
> > +#include "msm_gpu.h"
> > +
> > +static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
> > +{
> > +        struct msm_drm_private *priv = fctx->dev->dev_private;
> > +        return priv->gpu;
> > +}
> > +
> > +static enum hrtimer_restart deadline_timer(struct hrtimer *t)
> > +{
> > +        struct msm_fence_context *fctx = container_of(t,
> > +                        struct msm_fence_context, deadline_timer);
> > +
> > +        kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
> > +
> > +        return HRTIMER_NORESTART;
> > +}
> > +
> > +static void deadline_work(struct kthread_work *work)
> > +{
> > +        struct msm_fence_context *fctx = container_of(work,
> > +                        struct msm_fence_context, deadline_work);
> > +
> > +        /* If deadline fence has already passed, nothing to do: */
> > +        if (msm_fence_completed(fctx, fctx->next_deadline_fence))
> > +                return;
> > +
> > +        msm_devfreq_boost(fctx2gpu(fctx), 2);
> > +}
> >
> >   struct msm_fence_context *
> > @@ -36,6 +65,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
> >           fctx->completed_fence = fctx->last_fence;
> >           *fctx->fenceptr = fctx->last_fence;
> >
> > +        hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> > +        fctx->deadline_timer.function = deadline_timer;
> > +
> > +        kthread_init_work(&fctx->deadline_work, deadline_work);
> > +
> > +        fctx->next_deadline = ktime_get();
> > +
> >           return fctx;
> >   }
> >
> > @@ -62,6 +98,8 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> >           spin_lock_irqsave(&fctx->spinlock, flags);
> >           if (fence_after(fence, fctx->completed_fence))
> >                   fctx->completed_fence = fence;
> > +        if (msm_fence_completed(fctx, fctx->next_deadline_fence))
> > +                hrtimer_cancel(&fctx->deadline_timer);
> >           spin_unlock_irqrestore(&fctx->spinlock, flags);
> >   }
> >
> > @@ -92,10 +130,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
> >           return msm_fence_completed(f->fctx, f->base.seqno);
> >   }
> >
> > +static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> > +{
> > +        struct msm_fence *f = to_msm_fence(fence);
> > +        struct msm_fence_context *fctx = f->fctx;
> > +        unsigned long flags;
> > +        ktime_t now;
> > +
> > +        spin_lock_irqsave(&fctx->spinlock, flags);
> > +        now = ktime_get();
> > +
> > +        if (ktime_after(now, fctx->next_deadline) ||
> > +                        ktime_before(deadline, fctx->next_deadline)) {
> > +                fctx->next_deadline = deadline;
> > +                fctx->next_deadline_fence =
> > +                        max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
> > +
> > +                /*
> > +                 * Set timer to trigger boost 3ms before deadline, or
> > +                 * if we are already less than 3ms before the deadline
> > +                 * schedule boost work immediately.
> > +                 */
> > +                deadline = ktime_sub(deadline, ms_to_ktime(3));
> > +
> > +                if (ktime_after(now, deadline)) {
> > +                        kthread_queue_work(fctx2gpu(fctx)->worker,
> > +                                        &fctx->deadline_work);
> > +                } else {
> > +                        hrtimer_start(&fctx->deadline_timer, deadline,
> > +                                        HRTIMER_MODE_ABS);
> > +                }
> > +        }
> > +
> > +        spin_unlock_irqrestore(&fctx->spinlock, flags);
> > +}
> > +
> >   static const struct dma_fence_ops msm_fence_ops = {
> >           .get_driver_name = msm_fence_get_driver_name,
> >           .get_timeline_name = msm_fence_get_timeline_name,
> >           .signaled = msm_fence_signaled,
> > +        .set_deadline = msm_fence_set_deadline,
> >   };
> >
> >   struct dma_fence *
> > diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
> > index 7f1798c54cd1..cdaebfb94f5c 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.h
> > +++ b/drivers/gpu/drm/msm/msm_fence.h
> > @@ -52,6 +52,26 @@ struct msm_fence_context {
> >           volatile uint32_t *fenceptr;
> >
> >           spinlock_t spinlock;
> > +
> > +        /*
> > +         * TODO this doesn't really deal with multiple deadlines, like
> > +         * if userspace got multiple frames ahead.. OTOH atomic updates
> > +         * don't queue, so maybe that is ok
> > +         */
> > +
> > +        /** next_deadline: Time of next deadline */
> > +        ktime_t next_deadline;
> > +
> > +        /**
> > +         * next_deadline_fence:
> > +         *
> > +         * Fence value for next pending deadline.  The deadline timer is
> > +         * canceled when this fence is signaled.
> > +         */
> > +        uint32_t next_deadline_fence;
> > +
> > +        struct hrtimer deadline_timer;
> > +        struct kthread_work deadline_work;
> >   };
> >
> >   struct msm_fence_context * msm_fence_context_alloc(struct drm_device *dev,
>
> --
> With best wishes
> Dmitry
>
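For context, msm_fence_set_deadline() above is the msm backend for the fence
deadline hook this series adds in "dma-buf/dma-fence: Add deadline awareness".
A minimal sketch of that core entry point, assuming it is named
dma_fence_set_deadline() and simply forwards the hint to the fence ops (the
exact body in the series may differ):

#include <linux/dma-fence.h>
#include <linux/ktime.h>

/*
 * Sketch of the core hint entry point that dispatches to a fence
 * implementation's ->set_deadline() op (such as msm_fence_set_deadline()
 * above).  The deadline is purely a hint: fences that are already signaled,
 * or whose ops do not implement ->set_deadline(), simply ignore it.
 */
static void example_dma_fence_set_deadline(struct dma_fence *fence,
                                           ktime_t deadline)
{
        if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
                fence->ops->set_deadline(fence, deadline);
}

The driver-side op then decides how to react to the hint; here msm arms an
hrtimer 3 ms before the deadline and queues devfreq boost work if the fence
still has not signaled by then.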
On 03/03/2023 19:03, Rob Clark wrote:
> On Fri, Mar 3, 2023 at 2:10 AM Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
>>
>> On 03/03/2023 01:53, Rob Clark wrote:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Track the nearest deadline on a fence timeline and set a timer to expire
>>> shortly before to trigger boost if the fence has not yet been signaled.
>>>
>>> v2: rebase
>>>
>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>> ---
>>>   drivers/gpu/drm/msm/msm_fence.c | 74 +++++++++++++++++++++++++++++++++
>>>   drivers/gpu/drm/msm/msm_fence.h | 20 +++++++++
>>>   2 files changed, 94 insertions(+)
>>
>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>>
>> A small question: do we boost to fit into the deadline or to miss the
>> deadline for as little as possible? If the former is the case, we might
>> need to adjust 3ms depending on the workload.
>
> The goal is as much to run with higher clock on the next frame as it
> is to not miss a deadline.  Ie. we don't want devfreq to come to the
> conclusion that running at <50% clks is best due to the amount of
> utilization caused by missing every other vblank.

Ack, thanks for the explanation.

>
> But 3ms is mostly just "seems like a good compromise" value.  It might change.
>
> BR,
> -R
>
>>> [snip: patch quoted in full in the previous message]
>>
>> --
>> With best wishes
>> Dmitry
>>
From: Rob Clark <robdclark@chromium.org>

This series adds a deadline hint to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for
power/frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline.

IGT tests utilizing these can be found at:
https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline

v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
    to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
    support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
    support, and sync_file EPOLLPRI based on experience with
    perf/freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline
    guarantee, switch to using u64 ns values in UABI (still absolute
    CLOCK_MONOTONIC values), drop syncobj related cap and driver
    feature flag in favor of allowing count_handles==0 for probing
    kernel support.
v6: Re-work vblank helper to calculate time of _start_ of vblank,
    and work correctly if the last vblank event was more than a
    frame ago.  Add (mostly unrelated) drm/msm patch which also
    uses the vblank helper.  Use dma_fence_chain_contained().  More
    verbose syncobj UABI comments.  Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v7: Fix kbuild complaints about vblank helper.  Add more docs.
v8: Add patch to surface sync_file UAPI, and more docs updates.
v9: Drop (E)POLLPRI support.. I still like it, but not essential and
    it can always be revived later.  Fix doc build warning.
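A hedged userspace sketch of the sync_file deadline ioctl described in the
cover letter, from a compositor's point of view.  The names
SYNC_IOC_SET_DEADLINE and struct sync_set_deadline (with a deadline_ns field)
are assumed from the sync_file uapi patches in this series and may differ;
per the v5 note, the deadline is an absolute CLOCK_MONOTONIC value in
nanoseconds:

/*
 * Hypothetical compositor-side usage: hint that the fence behind a client's
 * sync_file should signal before the next vblank.  SYNC_IOC_SET_DEADLINE and
 * struct sync_set_deadline are assumed from this series' sync_file uapi and
 * may differ from the final header; the deadline is an absolute
 * CLOCK_MONOTONIC timestamp in nanoseconds.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>    /* assumed to carry the new SET_DEADLINE uapi */

static int hint_fence_deadline(int sync_file_fd, uint64_t deadline_ns)
{
        struct sync_set_deadline args = {
                .deadline_ns = deadline_ns,  /* e.g. predicted start of next vblank */
        };

        return ioctl(sync_file_fd, SYNC_IOC_SET_DEADLINE, &args);
}

A compositor that has just picked the buffer it intends to scan out would
call something like this with the predicted start time of the upcoming
vblank, so the GPU producing that buffer can clock up if it is at risk of
missing the flip.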
Rob Clark (15):
  dma-buf/dma-fence: Add deadline awareness
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/dma-resv: Add a way to set fence deadline
  dma-buf/sync_file: Surface sync-file uABI
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sw_sync: Add fence deadline support
  drm/scheduler: Add fence deadline support
  drm/syncobj: Add deadline support for syncobj waits
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/msm: Add deadline based boost support
  drm/msm: Add wait-boost support
  drm/msm/atomic: Switch to vblank_start helper
  drm/i915: Add deadline based boost support

 Documentation/driver-api/dma-buf.rst    | 16 ++++-
 drivers/dma-buf/dma-fence-array.c       | 11 ++++
 drivers/dma-buf/dma-fence-chain.c       | 12 ++++
 drivers/dma-buf/dma-fence.c             | 60 ++++++++++++++++++
 drivers/dma-buf/dma-resv.c              | 22 +++++++
 drivers/dma-buf/sw_sync.c               | 81 +++++++++++++++++++++++++
 drivers/dma-buf/sync_debug.h            |  2 +
 drivers/dma-buf/sync_file.c             | 19 ++++++
 drivers/gpu/drm/drm_atomic_helper.c     | 36 +++++++++++
 drivers/gpu/drm/drm_syncobj.c           | 64 +++++++++++++++----
 drivers/gpu/drm/drm_vblank.c            | 53 +++++++++++++---
 drivers/gpu/drm/i915/i915_request.c     | 20 ++++++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -----
 drivers/gpu/drm/msm/msm_atomic.c        |  8 ++-
 drivers/gpu/drm/msm/msm_drv.c           | 12 ++--
 drivers/gpu/drm/msm/msm_fence.c         | 74 ++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_fence.h         | 20 ++++++
 drivers/gpu/drm/msm/msm_gem.c           |  5 ++
 drivers/gpu/drm/msm/msm_kms.h           |  8 ---
 drivers/gpu/drm/scheduler/sched_fence.c | 46 ++++++++++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_vblank.h                |  1 +
 include/drm/gpu_scheduler.h             | 17 ++++++
 include/linux/dma-fence.h               | 22 +++++++
 include/linux/dma-resv.h                |  2 +
 include/uapi/drm/drm.h                  | 17 ++++++
 include/uapi/drm/msm_drm.h              | 14 ++++-
 include/uapi/linux/sync_file.h          | 59 +++++++++++-------
 28 files changed, 639 insertions(+), 79 deletions(-)
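On the display side, a sketch of how an atomic commit path can feed the
scanout deadline to the fences it waits on, combining "drm/vblank: Add helper
to get next vblank time" and "drm/atomic-helper: Set fence deadline for
vblank" from the list above.  The helper name drm_crtc_next_vblank_start() is
assumed from this series and could differ in a given revision:

/*
 * Sketch (assumed helper names, not the literal patch): compute the start of
 * the CRTC's next vblank and pass it as a deadline hint to a plane's pending
 * fence, roughly what the atomic-helper patch does for each plane in a
 * commit.
 */
#include <drm/drm_atomic.h>
#include <drm/drm_plane.h>
#include <drm/drm_vblank.h>
#include <linux/dma-fence.h>

static void example_set_plane_fence_deadline(struct drm_plane_state *new_state,
                                             struct drm_crtc *crtc)
{
        ktime_t vbltime;

        /* Start time of the next vblank, i.e. when the new frame must be ready */
        if (drm_crtc_next_vblank_start(crtc, &vbltime))
                return;  /* no vblank available (e.g. CRTC off): skip the hint */

        if (new_state->fence)
                dma_fence_set_deadline(new_state->fence, vbltime);
}

The hint then propagates to whichever driver signals the fence, e.g. the msm
and i915 boost patches at the end of the series.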