diff mbox series

[v4,16/16] drm/msm/dpu: Enable quad-pipe for DSC and dual-DSI case

Message ID 20250116-sm8650-v6-13-hmd-deckard-mdss-quad-upstream-33-v4-16-74749c6eba33@linaro.org
State New
Headers show
Series drm/msm/dpu: Support quad pipe with dual-DSI | expand

Commit Message

Jun Nie Jan. 16, 2025, 7:26 a.m. UTC
Request 4 mixers and 4 DSC for the case that both dual-DSI and DSC are
enabled. 4 pipes are preferred for dual DSI case for it is power optimal
for DSC.

Signed-off-by: Jun Nie <jun.nie@linaro.org>
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c         |  2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.h         |  6 ++---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c      | 29 ++++++++++++++++++------
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h |  2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h   |  2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h      |  2 +-
 6 files changed, 29 insertions(+), 14 deletions(-)

Comments

Marijn Suijten Jan. 20, 2025, 12:58 p.m. UTC | #1
On 2025-01-17 15:32:44, Jun Nie wrote:
> Dmitry Baryshkov <dmitry.baryshkov@linaro.org> 于2025年1月16日周四 16:32写道:
> >
> > On Thu, Jan 16, 2025 at 03:26:05PM +0800, Jun Nie wrote:
> > > Request 4 mixers and 4 DSC for the case that both dual-DSI and DSC are
> > > enabled.
> >
> > Why? What is the issue that you are solving?
> 
>     To support high-resolution cases that exceed the width limitation of

How high is high?  Some Sony phones use a bonded 2:2:2 setup.

>     a pair of SSPPs, or scenarios that surpass the maximum MDP clock rate,
>     additional pipes are necessary to enable parallel data processing
>     within the SSPP width constraints and MDP clock rate.
> 
>     Request 4 mixers and 4 DSCs for high-resolution cases where both DSC
>     and dual interfaces are enabled. More use cases can be incorporated
>     later if quad-pipe capabilities are required.
> 
> >
> > > 4 pipes are preferred for dual DSI case for it is power optimal
> > > for DSC.
> > >
> > > Signed-off-by: Jun Nie <jun.nie@linaro.org>
> > > ---
> > >  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c         |  2 +-
> > >  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.h         |  6 ++---
> > >  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c      | 29 ++++++++++++++++++------
> > >  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h |  2 +-
> > >  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h   |  2 +-
> > >  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h      |  2 +-
> > >  6 files changed, 29 insertions(+), 14 deletions(-)
> > >
> >
> > > @@ -664,15 +664,20 @@ static struct msm_display_topology dpu_encoder_get_topology(
> > >
> > >       /* Datapath topology selection
> > >        *
> > > -      * Dual display
> > > +      * Dual display without DSC
> > >        * 2 LM, 2 INTF ( Split display using 2 interfaces)
> > >        *
> > > +      * Dual display with DSC
> > > +      * 2 LM, 2 INTF ( Split display using 2 interfaces)
> > > +      * 4 LM, 2 INTF ( Split display using 2 interfaces)

Are these always bonded (i.e. single display with dual-DSI link), or can it be
two independent panels?

> > > +      *
> > >        * Single display
> > >        * 1 LM, 1 INTF
> > >        * 2 LM, 1 INTF (stream merge to support high resolution interfaces)
> > >        *
> > >        * Add dspps to the reservation requirements if ctm is requested
> > >        */
> > > +
> >
> > irrlevant extra line, please drop.
> >
> > >       if (intf_count == 2)
> > >               topology.num_lm = 2;
> > >       else if (!dpu_kms->catalog->caps->has_3d_merge)
> > > @@ -691,10 +696,20 @@ static struct msm_display_topology dpu_encoder_get_topology(
> > >                * 2 DSC encoders, 2 layer mixers and 1 interface
> > >                * this is power optimal and can drive up to (including) 4k
> > >                * screens
> > > +              * But for dual display case, we prefer 4 layer mixers. Because
> > > +              * the resolution is always high in the case and 4 DSCs are more
> > > +              * power optimal.
> >
> > I think this part is thought about in a wrong way. If it was just about
> > power efficiency, we wouldn't have to add quad pipe support.
> > Please correct me if I'm wrong, but I think it is about the maximum
> > width supported by a particular topology being too low for a requested
> > resolution. So, if there is a DSC and mode width is higher than 5120
> > (8.x+) / 4096 ( <= 7.x), then we have to use quad pipe. Likewise if
> > there is no DSC in play, the limitation should be 2 * max_mixer_width.
> 
> Both width limitation and clock rate contribute to pipe number decision.
> To support high resolution, the MDP clock may be required to overclock
> to a higher rate than the safe rate. Current implementation does not
> support checking clock rate when deciding topology. We can add the
> clock rate check later with a new patch.
> >
> > >                */
> > > -             topology.num_dsc = 2;
> > > -             topology.num_lm = 2;
> > > -             topology.num_intf = 1;
> > > +
> > > +             if (intf_count == 2) {
> > > +                     topology.num_dsc = dpu_kms->catalog->dsc_count >= 4 ? 4 : 2;

What if there are other encoders that have already reserved DSC blocks, reducing
the current available number of DSC blocks?  This patch seems to cover multiple
panels/interfaces on a single virtual encoder, how does one get to such a
scenario?  Bonded display?  If one or more DP panel is connected, do they use
their own virtual encoder, and hence miss out on logic that properly divides
available DSC blocks?

That idea is what's been holding back a quick hack to support 1:1:1
topology for sc7280 / FairPhone 5 to perform a similar workaround:

	if (catalog->dsc_count < 2)
		num_dsc = num_lm = 1;.

> >
> > This assumes that the driver can support 2:2:2. Is it the case?
> 
> The code falls back to 2:2:2 case here. But I guess 2:2:2 does not work yet.
> How about below code for DSC case?

I've been working on 2:2:2 support [1] but have stalled it to see how your
series here pans out.  Doesn't look like it's heading in a compatible direction
though, going for quick wins rather than redesigning how DSC blocks are
allocated (to name one of the many topics).

[1]: https://lore.kernel.org/linux-arm-msm/20240417-drm-msm-initial-dualpipe-dsc-fixes-v1-0-78ae3ee9a697@somainline.org/

- Marijn

> 
>                  if (intf_count == 2 && dpu_kms->catalog->dsc_count >= 4) {
>                          topology.num_dsc = 4;
>                          topology.num_lm = 4;
>                          topology.num_intf = 2;
>                  } else {
>                          topology.num_dsc = 2;
>                          topology.num_lm = 2;
>                          topology.num_intf = 1;
>                  }
> 
> >
> > > +                     topology.num_lm = topology.num_dsc;
> > > +                     topology.num_intf = 2;
> > > +             } else {
> > > +                     topology.num_dsc = 2;
> > > +                     topology.num_lm = 2;
> > > +                     topology.num_intf = 1;
> > > +             }
> > >       }
> > >
> > >       return topology;
> >
> > --
> > With best wishes
> > Dmitry
Jun Nie Jan. 21, 2025, 4:28 a.m. UTC | #2
Marijn Suijten <marijn.suijten@somainline.org> 于2025年1月20日周一 04:58写道:
>
> On 2025-01-17 15:32:44, Jun Nie wrote:
> > Dmitry Baryshkov <dmitry.baryshkov@linaro.org> 于2025年1月16日周四 16:32写道:
> > >
> > > On Thu, Jan 16, 2025 at 03:26:05PM +0800, Jun Nie wrote:
> > > > Request 4 mixers and 4 DSC for the case that both dual-DSI and DSC are
> > > > enabled.
> > >
> > > Why? What is the issue that you are solving?
> >
> >     To support high-resolution cases that exceed the width limitation of
>
> How high is high?  Some Sony phones use a bonded 2:2:2 setup.

The high resolution here means the capability of 2 SSPP without multi-rect mode,
or the clock rate requirement exceed the SoC capability. It depends on specific
SoC spec.

>
> >     a pair of SSPPs, or scenarios that surpass the maximum MDP clock rate,
> >     additional pipes are necessary to enable parallel data processing
> >     within the SSPP width constraints and MDP clock rate.
> >
> >     Request 4 mixers and 4 DSCs for high-resolution cases where both DSC
> >     and dual interfaces are enabled. More use cases can be incorporated
> >     later if quad-pipe capabilities are required.
> >
> > >
> > > > 4 pipes are preferred for dual DSI case for it is power optimal
> > > > for DSC.
> > > >
> > > > Signed-off-by: Jun Nie <jun.nie@linaro.org>
> > > > ---
> > > >  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c         |  2 +-
> > > >  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.h         |  6 ++---
> > > >  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c      | 29 ++++++++++++++++++------
> > > >  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h |  2 +-
> > > >  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h   |  2 +-
> > > >  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h      |  2 +-
> > > >  6 files changed, 29 insertions(+), 14 deletions(-)
> > > >
> > >
> > > > @@ -664,15 +664,20 @@ static struct msm_display_topology dpu_encoder_get_topology(
> > > >
> > > >       /* Datapath topology selection
> > > >        *
> > > > -      * Dual display
> > > > +      * Dual display without DSC
> > > >        * 2 LM, 2 INTF ( Split display using 2 interfaces)
> > > >        *
> > > > +      * Dual display with DSC
> > > > +      * 2 LM, 2 INTF ( Split display using 2 interfaces)
> > > > +      * 4 LM, 2 INTF ( Split display using 2 interfaces)
>
> Are these always bonded (i.e. single display with dual-DSI link), or can it be
> two independent panels?

It can be two independent panels.
>
> > > > +      *
> > > >        * Single display
> > > >        * 1 LM, 1 INTF
> > > >        * 2 LM, 1 INTF (stream merge to support high resolution interfaces)
> > > >        *
> > > >        * Add dspps to the reservation requirements if ctm is requested
> > > >        */
> > > > +
> > >
> > > irrlevant extra line, please drop.
> > >
> > > >       if (intf_count == 2)
> > > >               topology.num_lm = 2;
> > > >       else if (!dpu_kms->catalog->caps->has_3d_merge)
> > > > @@ -691,10 +696,20 @@ static struct msm_display_topology dpu_encoder_get_topology(
> > > >                * 2 DSC encoders, 2 layer mixers and 1 interface
> > > >                * this is power optimal and can drive up to (including) 4k
> > > >                * screens
> > > > +              * But for dual display case, we prefer 4 layer mixers. Because
> > > > +              * the resolution is always high in the case and 4 DSCs are more
> > > > +              * power optimal.
> > >
> > > I think this part is thought about in a wrong way. If it was just about
> > > power efficiency, we wouldn't have to add quad pipe support.
> > > Please correct me if I'm wrong, but I think it is about the maximum
> > > width supported by a particular topology being too low for a requested
> > > resolution. So, if there is a DSC and mode width is higher than 5120
> > > (8.x+) / 4096 ( <= 7.x), then we have to use quad pipe. Likewise if
> > > there is no DSC in play, the limitation should be 2 * max_mixer_width.
> >
> > Both width limitation and clock rate contribute to pipe number decision.
> > To support high resolution, the MDP clock may be required to overclock
> > to a higher rate than the safe rate. Current implementation does not
> > support checking clock rate when deciding topology. We can add the
> > clock rate check later with a new patch.
> > >
> > > >                */
> > > > -             topology.num_dsc = 2;
> > > > -             topology.num_lm = 2;
> > > > -             topology.num_intf = 1;
> > > > +
> > > > +             if (intf_count == 2) {
> > > > +                     topology.num_dsc = dpu_kms->catalog->dsc_count >= 4 ? 4 : 2;
>
> What if there are other encoders that have already reserved DSC blocks, reducing
> the current available number of DSC blocks?  This patch seems to cover multiple
> panels/interfaces on a single virtual encoder, how does one get to such a
> scenario?  Bonded display?  If one or more DP panel is connected, do they use
> their own virtual encoder, and hence miss out on logic that properly divides
> available DSC blocks?

Yes, it's bonded display. Your point is valid. The scenario will fail
at the resource
allocation stage and CRTC pipe fails to be setup if there is not
enough DSC block
available. But it should fail gracefully, so not a big issue. It is
always possible
that resource cannot support a usage scenario.

>
> That idea is what's been holding back a quick hack to support 1:1:1
> topology for sc7280 / FairPhone 5 to perform a similar workaround:
>
>         if (catalog->dsc_count < 2)
>                 num_dsc = num_lm = 1;.

Actually, I do not see issue here. Could you help point me the link
for this change?
Checking the discussion history helps on understanding more aspects of
this change.
>
> > >
> > > This assumes that the driver can support 2:2:2. Is it the case?
> >
> > The code falls back to 2:2:2 case here. But I guess 2:2:2 does not work yet.
> > How about below code for DSC case?
>
> I've been working on 2:2:2 support [1] but have stalled it to see how your
> series here pans out.  Doesn't look like it's heading in a compatible direction
> though, going for quick wins rather than redesigning how DSC blocks are
> allocated (to name one of the many topics).
>
> [1]: https://lore.kernel.org/linux-arm-msm/20240417-drm-msm-initial-dualpipe-dsc-fixes-v1-0-78ae3ee9a697@somainline.org/
>
> - Marijn

Glad to know that a 2:2:2 case can be enabled. That can be supported with
code structure of quad-pipe in theory and we can co-work on that. But I would
not call the quad-pipe patch set a quick wins, as I cannot cover a usage case
that's not supported by mainline yet, and I do not have hardware to support
2:2:2 neither. What I can do is to add new usage case and make sure
existing usage cases are not broken. More and more usage cases can be
added later this way.
Could you help elaborate what do you mean by how DSC blocks are allocated?
I see DSC is allocated just as other resources. Maybe you mean how the
topology is decided. While that's about adding new usage case without
breaking existing usage cases.

Regards,
Jun
>
> >
> >                  if (intf_count == 2 && dpu_kms->catalog->dsc_count >= 4) {
> >                          topology.num_dsc = 4;
> >                          topology.num_lm = 4;
> >                          topology.num_intf = 2;
> >                  } else {
> >                          topology.num_dsc = 2;
> >                          topology.num_lm = 2;
> >                          topology.num_intf = 1;
> >                  }
> >
> > >
> > > > +                     topology.num_lm = topology.num_dsc;
> > > > +                     topology.num_intf = 2;
> > > > +             } else {
> > > > +                     topology.num_dsc = 2;
> > > > +                     topology.num_lm = 2;
> > > > +                     topology.num_intf = 1;
> > > > +             }
> > > >       }
> > > >
> > > >       return topology;
> > >
> > > --
> > > With best wishes
> > > Dmitry
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
index a900220deeb35..5e96c309fabb8 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
@@ -200,7 +200,7 @@  static int dpu_crtc_get_lm_crc(struct drm_crtc *crtc,
 		struct dpu_crtc_state *crtc_state)
 {
 	struct dpu_crtc_mixer *m;
-	u32 crcs[CRTC_DUAL_MIXERS];
+	u32 crcs[CRTC_QUAD_MIXERS];
 
 	int rc = 0;
 	int i;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.h
index b14bab2754635..38820d05edb8b 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.h
@@ -210,7 +210,7 @@  struct dpu_crtc_state {
 
 	bool bw_control;
 	bool bw_split_vote;
-	struct drm_rect lm_bounds[CRTC_DUAL_MIXERS];
+	struct drm_rect lm_bounds[CRTC_QUAD_MIXERS];
 
 	uint64_t input_fence_timeout_ns;
 
@@ -218,10 +218,10 @@  struct dpu_crtc_state {
 
 	/* HW Resources reserved for the crtc */
 	u32 num_mixers;
-	struct dpu_crtc_mixer mixers[CRTC_DUAL_MIXERS];
+	struct dpu_crtc_mixer mixers[CRTC_QUAD_MIXERS];
 
 	u32 num_ctls;
-	struct dpu_hw_ctl *hw_ctls[CRTC_DUAL_MIXERS];
+	struct dpu_hw_ctl *hw_ctls[CRTC_QUAD_MIXERS];
 
 	enum dpu_crtc_crc_source crc_source;
 	int crc_frame_skip_count;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 1f3054792a228..fdb7bfcb4119c 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -54,7 +54,7 @@ 
 #define MAX_PHYS_ENCODERS_PER_VIRTUAL \
 	(MAX_H_TILES_PER_DISPLAY * NUM_PHYS_ENCODER_TYPES)
 
-#define MAX_CHANNELS_PER_ENC 2
+#define MAX_CHANNELS_PER_ENC 4
 
 #define IDLE_SHORT_TIMEOUT	1
 
@@ -664,15 +664,20 @@  static struct msm_display_topology dpu_encoder_get_topology(
 
 	/* Datapath topology selection
 	 *
-	 * Dual display
+	 * Dual display without DSC
 	 * 2 LM, 2 INTF ( Split display using 2 interfaces)
 	 *
+	 * Dual display with DSC
+	 * 2 LM, 2 INTF ( Split display using 2 interfaces)
+	 * 4 LM, 2 INTF ( Split display using 2 interfaces)
+	 *
 	 * Single display
 	 * 1 LM, 1 INTF
 	 * 2 LM, 1 INTF (stream merge to support high resolution interfaces)
 	 *
 	 * Add dspps to the reservation requirements if ctm is requested
 	 */
+
 	if (intf_count == 2)
 		topology.num_lm = 2;
 	else if (!dpu_kms->catalog->caps->has_3d_merge)
@@ -691,10 +696,20 @@  static struct msm_display_topology dpu_encoder_get_topology(
 		 * 2 DSC encoders, 2 layer mixers and 1 interface
 		 * this is power optimal and can drive up to (including) 4k
 		 * screens
+		 * But for dual display case, we prefer 4 layer mixers. Because
+		 * the resolution is always high in the case and 4 DSCs are more
+		 * power optimal.
 		 */
-		topology.num_dsc = 2;
-		topology.num_lm = 2;
-		topology.num_intf = 1;
+
+		if (intf_count == 2) {
+			topology.num_dsc = dpu_kms->catalog->dsc_count >= 4 ? 4 : 2;
+			topology.num_lm = topology.num_dsc;
+			topology.num_intf = 2;
+		} else {
+			topology.num_dsc = 2;
+			topology.num_lm = 2;
+			topology.num_intf = 1;
+		}
 	}
 
 	return topology;
@@ -2194,8 +2209,8 @@  static void dpu_encoder_helper_reset_mixers(struct dpu_encoder_phys *phys_enc)
 	struct dpu_hw_mixer_cfg mixer;
 	int i, num_lm;
 	struct dpu_global_state *global_state;
-	struct dpu_hw_blk *hw_lm[2];
-	struct dpu_hw_mixer *hw_mixer[2];
+	struct dpu_hw_blk *hw_lm[MAX_CHANNELS_PER_ENC];
+	struct dpu_hw_mixer *hw_mixer[MAX_CHANNELS_PER_ENC];
 	struct dpu_hw_ctl *ctl = phys_enc->hw_ctl;
 
 	memset(&mixer, 0, sizeof(mixer));
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
index 63f09857025c2..a9e122243dce9 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
@@ -302,7 +302,7 @@  static inline enum dpu_3d_blend_mode dpu_encoder_helper_get_3d_blend_mode(
 
 	/* Use merge_3d unless DSC MERGE topology is used */
 	if (phys_enc->split_role == ENC_ROLE_SOLO &&
-	    dpu_cstate->num_mixers == CRTC_DUAL_MIXERS &&
+	    (dpu_cstate->num_mixers != 1) &&
 	    !dpu_encoder_use_dsc_merge(phys_enc->parent))
 		return BLEND_3D_H_ROW_INT;
 
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
index 4cea19e1a2038..77a7a5375d545 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
@@ -24,7 +24,7 @@ 
 #define DPU_MAX_IMG_WIDTH 0x3fff
 #define DPU_MAX_IMG_HEIGHT 0x3fff
 
-#define CRTC_DUAL_MIXERS	2
+#define CRTC_QUAD_MIXERS	4
 
 #define MAX_XIN_COUNT 16
 
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h
index 64e220987be56..804858e69e7da 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h
@@ -35,8 +35,8 @@ 
 #endif
 
 #define STAGES_PER_PLANE		2
-#define PIPES_PER_PLANE			2
 #define PIPES_PER_STAGE			2
+#define PIPES_PER_PLANE			(PIPES_PER_STAGE * STAGES_PER_PLANE)
 #ifndef DPU_MAX_DE_CURVES
 #define DPU_MAX_DE_CURVES		3
 #endif