Message ID | 20241211-topic-sm8x50-gpu-bw-vote-v5-2-6112f9f785ec@linaro.org |
---|---|
State | Superseded |
Series | drm/msm: adreno: add support for DDR bandwidth scaling via GMU |
On 11.12.2024 9:29 AM, Neil Armstrong wrote:
> The Adreno GPU Management Unit (GMU) can also scale DDR Bandwidth along
> the Frequency and Power Domain level, but by default we let the
> OPP core scale the interconnect ddr path.
>
> While scaling via the interconnect path was sufficient, newer GPUs
> like the A750 require specific vote parameters and bandwidth to
> achieve full functionality.
>
> In order to calculate vote values used by the GPU Management
> Unit (GMU), we need to parse all the possible OPP Bandwidths and
> create a vote value to be sent to the appropriate Bus Control
> Modules (BCMs) declared in the GPU info struct.
>
> This vote value is called IB, while on the other side the GMU also
> takes another vote called AB which is a 16-bit quantized value
> of the floor bandwidth against the maximum supported bandwidth.
> The AB vote will be calculated later when setting the frequency.
>
> The vote array will then be used to dynamically generate the GMU
> bw_table sent during the GMU power-up.
>
> Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 144 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  13 +++
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>  3 files changed, 158 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 14db7376c712d19446b38152e480bd5a1e0a5198..36696d372a42a27b26a018b19e73bc6d8a4a5235 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -9,6 +9,7 @@
>  #include <linux/pm_domain.h>
>  #include <linux/pm_opp.h>
>  #include <soc/qcom/cmd-db.h>
> +#include <soc/qcom/tcs.h>
>  #include <drm/drm_gem.h>
>
>  #include "a6xx_gpu.h"
> @@ -1287,6 +1288,101 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
>  	return 0;
>  }
>
> +/**
> + * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
> + * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
> + * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
> + * @vcd: virtual clock domain that this bcm belongs to
> + * @reserved: reserved field
> + */
> +struct bcm_db {
> +	__le32 unit;
> +	__le16 width;
> +	u8 vcd;
> +	u8 reserved;
> +};

No. This is a direct copypasta of drivers/interconnect/qcom/icc-rpmh.h
You cannot just randomly duplicate things..

Move it out to a shared header in include/ (and remove the duplicate from
clk-rpmh.c while at it)

I'd also really prefer if you took

drivers/interconnect/qcom/bcm-voter.c : tcs_list_gen()

and abstracted it to operate on struct bcm_db with any additional
required parameters passed as arguments.. Still left some comments
on this version if you decide to go with it

> +
> +static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
> +				       struct a6xx_gmu *gmu)
> +{
> +	const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
> +	unsigned int bcm_index, bw_index, bcm_count = 0;
> +
> +	if (!info->bcms)
> +		return 0;

You already checked that from the caller

> +
> +	/* Retrieve BCM data from cmd-db */
> +	for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
> +		size_t count;
> +
> +		/* Stop at first unconfigured bcm */
> +		if (!info->bcms[bcm_index].name)
> +			break;

Unconfigured doesn't really fit here.. Maybe just mention the list is
NULL-terminated

> +
> +		bcm_data[bcm_index] = cmd_db_read_aux_data(
> +					info->bcms[bcm_index].name,
> +					&count);
> +		if (IS_ERR(bcm_data[bcm_index]))
> +			return PTR_ERR(bcm_data[bcm_index]);
> +
> +		if (!count)
> +			return -EINVAL;

If this condition ever happens, it'll be impossible to track down,
please add an err message

> +
> +		++bcm_count;

I've heard somewhere that prefixed increments are discouraged for
"reasons" and my OCD would like to support that

> +	}
> +
> +	/* Generate BCM votes values for each bandwidth & BCM */
> +	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
> +		u32 *data = gmu->gpu_ib_votes[bw_index];
> +		u32 bw = gmu->gpu_bw_table[bw_index];
> +
> +		/* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */
> +		for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
> +			bool commit = false;
> +			u64 peak;
> +			u32 vote;
> +
> +			/* Skip unconfigured BCM */
> +			if (!bcm_data[bcm_index])
> +				continue;

I don't see how this is useful here

> +
> +			if (bcm_index == bcm_count - 1 ||
> +			    (bcm_data[bcm_index + 1] &&
> +			     bcm_data[bcm_index]->vcd != bcm_data[bcm_index + 1]->vcd))
> +				commit = true;
> +
> +			if (!bw) {
> +				data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
> +				continue;
> +			}
> +
> +			if (info->bcms[bcm_index].fixed) {

You may want to take a pointer to info->bcms[bcm_index]

> +				u32 perfmode = 0;
> +
> +				if (bw >= info->bcms[bcm_index].perfmode_bw)
> +					perfmode = info->bcms[bcm_index].perfmode;
> +
> +				data[bcm_index] = BCM_TCS_CMD(commit, true, 0, perfmode);
> +				continue;
> +			}
> +
> +			/* Multiply the bandwidth by the width of the connection */
> +			peak = (u64)bw * le16_to_cpu(bcm_data[bcm_index]->width);
> +			do_div(peak, info->bcms[bcm_index].buswidth);
> +
> +			/* Input bandwidth value is in KBps, scale the value to BCM unit */
> +			peak *= 1000ULL;

I don't think this needs to be ULL since the other argument is an u64

> +			do_div(peak, le32_to_cpu(bcm_data[bcm_index]->unit));
> +
> +			vote = clamp(peak, 1, BCM_TCS_CMD_VOTE_MASK);
> +
> +			data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);

x is the avg vote, y is the peak vote

Just noting down for my future self I guess, a6xx sets ab=0,
a7xx sets ab=ib like you did here

Konrad
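To make the vote derivation under discussion concrete, here is a minimal standalone C sketch of how one IB vote word is built from a bandwidth value, mirroring the calculation quoted above. The BCM_TCS_CMD field layout follows include/soc/qcom/tcs.h (commit bit 30, valid bit 29, two 14-bit votes); the width, buswidth and unit numbers in the example are made up rather than read from cmd-db.

```c
#include <stdint.h>
#include <stdio.h>

/* Field layout as in include/soc/qcom/tcs.h: bit 30 = commit, bit 29 =
 * valid, bits 27:14 = vote x (avg), bits 13:0 = vote y (peak). */
#define BCM_TCS_CMD_VOTE_MASK 0x3fff
#define BCM_TCS_CMD(commit, valid, x, y)                          \
	((uint32_t)(commit) << 30 | (uint32_t)(valid) << 29 |     \
	 ((x) & BCM_TCS_CMD_VOTE_MASK) << 14 |                    \
	 ((y) & BCM_TCS_CMD_VOTE_MASK) << 0)

/* Mirrors the per-BCM derivation quoted above: bw is in KBps, width and
 * unit come from the BCM's cmd-db aux data, buswidth from the GPU info. */
static uint32_t ib_vote(uint32_t bw_kbps, uint16_t width, uint32_t buswidth,
			uint32_t unit, int commit)
{
	uint64_t peak;
	uint32_t vote;

	if (!bw_kbps)
		return BCM_TCS_CMD(commit, 0, 0, 0);

	/* Scale by the connection width, then from KBps to the BCM's unit */
	peak = (uint64_t)bw_kbps * width / buswidth;
	peak = peak * 1000 / unit;

	/* clamp(peak, 1, BCM_TCS_CMD_VOTE_MASK) */
	vote = peak < 1 ? 1 :
	       peak > BCM_TCS_CMD_VOTE_MASK ? BCM_TCS_CMD_VOTE_MASK :
	       (uint32_t)peak;

	/* Same value voted as avg (x) and peak (y), as in the patch */
	return BCM_TCS_CMD(commit, 1, vote, vote);
}

int main(void)
{
	/* Sample numbers only: ~8.5 GB/s on a hypothetical BCM */
	printf("ib vote: 0x%08x\n", ib_vote(8532000, 4, 4, 800, 1));
	return 0;
}
```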
On 12/12/2024 20:55, Konrad Dybcio wrote:
> On 11.12.2024 9:29 AM, Neil Armstrong wrote:
>> The Adreno GPU Management Unit (GMU) can also scale DDR Bandwidth along
>> the Frequency and Power Domain level, but by default we let the
>> OPP core scale the interconnect ddr path.
>>
>> While scaling via the interconnect path was sufficient, newer GPUs
>> like the A750 require specific vote parameters and bandwidth to
>> achieve full functionality.
>>
>> In order to calculate vote values used by the GPU Management
>> Unit (GMU), we need to parse all the possible OPP Bandwidths and
>> create a vote value to be sent to the appropriate Bus Control
>> Modules (BCMs) declared in the GPU info struct.
>>
>> This vote value is called IB, while on the other side the GMU also
>> takes another vote called AB which is a 16-bit quantized value
>> of the floor bandwidth against the maximum supported bandwidth.
>> The AB vote will be calculated later when setting the frequency.
>>
>> The vote array will then be used to dynamically generate the GMU
>> bw_table sent during the GMU power-up.
>>
>> Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 144 ++++++++++++++++++++++++++++++++++
>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  13 +++
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
>>  3 files changed, 158 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index 14db7376c712d19446b38152e480bd5a1e0a5198..36696d372a42a27b26a018b19e73bc6d8a4a5235 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -9,6 +9,7 @@
>>  #include <linux/pm_domain.h>
>>  #include <linux/pm_opp.h>
>>  #include <soc/qcom/cmd-db.h>
>> +#include <soc/qcom/tcs.h>
>>  #include <drm/drm_gem.h>
>>
>>  #include "a6xx_gpu.h"
>> @@ -1287,6 +1288,101 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
>>  	return 0;
>>  }
>>
>> +/**
>> + * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
>> + * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
>> + * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
>> + * @vcd: virtual clock domain that this bcm belongs to
>> + * @reserved: reserved field
>> + */
>> +struct bcm_db {
>> +	__le32 unit;
>> +	__le16 width;
>> +	u8 vcd;
>> +	u8 reserved;
>> +};
>
> No. This is a direct copypasta of drivers/interconnect/qcom/icc-rpmh.h
> You cannot just randomly duplicate things..
>
> Move it out to a shared header in include/ (and remove the duplicate from
> clk-rpmh.c while at it)

Not sure if this is a good idea

>
> I'd also really prefer if you took
>
> drivers/interconnect/qcom/bcm-voter.c : tcs_list_gen()
>
> and abstracted it to operate on struct bcm_db with any additional
> required parameters passed as arguments.. Still left some comments
> on this version if you decide to go with it

They are still very different, look closely: tcs_list_gen() is designed
to operate on BW aggregations + scaling, so it would make no sense to
unify them.

The calculation is simple enough; I made it explicitly easy to read and
maintain, but honestly there's nothing special.

>
>> +
>> +static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
>> +				       struct a6xx_gmu *gmu)
>> +{
>> +	const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
>> +	unsigned int bcm_index, bw_index, bcm_count = 0;
>> +
>> +	if (!info->bcms)
>> +		return 0;
>
> You already checked that from the caller

Good catch

>
>> +
>> +	/* Retrieve BCM data from cmd-db */
>> +	for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
>> +		size_t count;
>> +
>> +		/* Stop at first unconfigured bcm */
>> +		if (!info->bcms[bcm_index].name)
>> +			break;
>
> Unconfigured doesn't really fit here.. Maybe just mention the list is
> NULL-terminated

Ack

>
>> +
>> +		bcm_data[bcm_index] = cmd_db_read_aux_data(
>> +					info->bcms[bcm_index].name,
>> +					&count);
>> +		if (IS_ERR(bcm_data[bcm_index]))
>> +			return PTR_ERR(bcm_data[bcm_index]);
>> +
>> +		if (!count)
>> +			return -EINVAL;
>
> If this condition ever happens, it'll be impossible to track down,
> please add an err message

Hmm sure

>
>> +
>> +		++bcm_count;
>
> I've heard somewhere that prefixed increments are discouraged for
> "reasons" and my OCD would like to support that

Never got this memo...

>
>> +	}
>> +
>> +	/* Generate BCM votes values for each bandwidth & BCM */
>> +	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
>> +		u32 *data = gmu->gpu_ib_votes[bw_index];
>> +		u32 bw = gmu->gpu_bw_table[bw_index];
>> +
>> +		/* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */
>> +		for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
>> +			bool commit = false;
>> +			u64 peak;
>> +			u32 vote;
>> +
>> +			/* Skip unconfigured BCM */
>> +			if (!bcm_data[bcm_index])
>> +				continue;
>
> I don't see how this is useful here

It's a leftover, will drop

>
>> +
>> +			if (bcm_index == bcm_count - 1 ||
>> +			    (bcm_data[bcm_index + 1] &&
>> +			     bcm_data[bcm_index]->vcd != bcm_data[bcm_index + 1]->vcd))
>> +				commit = true;
>> +
>> +			if (!bw) {
>> +				data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
>> +				continue;
>> +			}
>> +
>> +			if (info->bcms[bcm_index].fixed) {
>
> You may want to take a pointer to info->bcms[bcm_index]

Sure, will help

>
>> +				u32 perfmode = 0;
>> +
>> +				if (bw >= info->bcms[bcm_index].perfmode_bw)
>> +					perfmode = info->bcms[bcm_index].perfmode;
>> +
>> +				data[bcm_index] = BCM_TCS_CMD(commit, true, 0, perfmode);
>> +				continue;
>> +			}
>> +
>> +			/* Multiply the bandwidth by the width of the connection */
>> +			peak = (u64)bw * le16_to_cpu(bcm_data[bcm_index]->width);
>> +			do_div(peak, info->bcms[bcm_index].buswidth);
>> +
>> +			/* Input bandwidth value is in KBps, scale the value to BCM unit */
>> +			peak *= 1000ULL;
>
> I don't think this needs to be ULL since the other argument is an u64
>
>> +			do_div(peak, le32_to_cpu(bcm_data[bcm_index]->unit));
>> +
>> +			vote = clamp(peak, 1, BCM_TCS_CMD_VOTE_MASK);
>> +
>> +			data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);
>
> x is the avg vote, y is the peak vote

downstream sets both, calculated from the exact same value and in the
same way...

>
> Just noting down for my future self I guess, a6xx sets ab=0,
> a7xx sets ab=ib like you did here

Probably, I'll need to check on that, but it can be done in a second
step when enabling it on a6xx

>
> Konrad
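As an illustration of the NULL-terminated convention agreed above, a per-GPU BCM list handed to a6xx_gmu_rpmh_bw_votes_init() through struct a6xx_info could look like the sketch below; the BCM names, perfmode bit and bandwidth threshold are hypothetical placeholders, not values for any real SoC.

```c
/* Hypothetical example only; real tables belong in the per-GPU catalog. */
static const struct a6xx_bcm example_gpu_bcms[] = {
	{ .name = "SH0", .buswidth = 16 },
	{ .name = "MC0", .buswidth = 4 },
	{
		.name = "ACV",
		.fixed = true,		 /* vote a fixed mode bit, not a bandwidth */
		.perfmode = 1 << 2,	 /* hypothetical perf-mode vote value */
		.perfmode_bw = 16500000, /* hypothetical KBps threshold */
	},
	{ /* sentinel: the cmd-db probe loop stops at .name == NULL */ },
};
```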
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 14db7376c712d19446b38152e480bd5a1e0a5198..36696d372a42a27b26a018b19e73bc6d8a4a5235 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -9,6 +9,7 @@
 #include <linux/pm_domain.h>
 #include <linux/pm_opp.h>
 #include <soc/qcom/cmd-db.h>
+#include <soc/qcom/tcs.h>
 #include <drm/drm_gem.h>
 
 #include "a6xx_gpu.h"
@@ -1287,6 +1288,101 @@ static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
 	return 0;
 }
 
+/**
+ * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager (BCM)
+ * @unit: divisor used to convert bytes/sec bw value to an RPMh msg
+ * @width: multiplier used to convert bytes/sec bw value to an RPMh msg
+ * @vcd: virtual clock domain that this bcm belongs to
+ * @reserved: reserved field
+ */
+struct bcm_db {
+	__le32 unit;
+	__le16 width;
+	u8 vcd;
+	u8 reserved;
+};
+
+static int a6xx_gmu_rpmh_bw_votes_init(const struct a6xx_info *info,
+				       struct a6xx_gmu *gmu)
+{
+	const struct bcm_db *bcm_data[GMU_MAX_BCMS] = { 0 };
+	unsigned int bcm_index, bw_index, bcm_count = 0;
+
+	if (!info->bcms)
+		return 0;
+
+	/* Retrieve BCM data from cmd-db */
+	for (bcm_index = 0; bcm_index < GMU_MAX_BCMS; bcm_index++) {
+		size_t count;
+
+		/* Stop at first unconfigured bcm */
+		if (!info->bcms[bcm_index].name)
+			break;
+
+		bcm_data[bcm_index] = cmd_db_read_aux_data(
+					info->bcms[bcm_index].name,
+					&count);
+		if (IS_ERR(bcm_data[bcm_index]))
+			return PTR_ERR(bcm_data[bcm_index]);
+
+		if (!count)
+			return -EINVAL;
+
+		++bcm_count;
+	}
+
+	/* Generate BCM votes values for each bandwidth & BCM */
+	for (bw_index = 0; bw_index < gmu->nr_gpu_bws; bw_index++) {
+		u32 *data = gmu->gpu_ib_votes[bw_index];
+		u32 bw = gmu->gpu_bw_table[bw_index];
+
+		/* Calculations loosely copied from bcm_aggregate() & tcs_cmd_gen() */
+		for (bcm_index = 0; bcm_index < bcm_count; bcm_index++) {
+			bool commit = false;
+			u64 peak;
+			u32 vote;
+
+			/* Skip unconfigured BCM */
+			if (!bcm_data[bcm_index])
+				continue;
+
+			if (bcm_index == bcm_count - 1 ||
+			    (bcm_data[bcm_index + 1] &&
+			     bcm_data[bcm_index]->vcd != bcm_data[bcm_index + 1]->vcd))
+				commit = true;
+
+			if (!bw) {
+				data[bcm_index] = BCM_TCS_CMD(commit, false, 0, 0);
+				continue;
+			}
+
+			if (info->bcms[bcm_index].fixed) {
+				u32 perfmode = 0;
+
+				if (bw >= info->bcms[bcm_index].perfmode_bw)
+					perfmode = info->bcms[bcm_index].perfmode;
+
+				data[bcm_index] = BCM_TCS_CMD(commit, true, 0, perfmode);
+				continue;
+			}
+
+			/* Multiply the bandwidth by the width of the connection */
+			peak = (u64)bw * le16_to_cpu(bcm_data[bcm_index]->width);
+			do_div(peak, info->bcms[bcm_index].buswidth);
+
+			/* Input bandwidth value is in KBps, scale the value to BCM unit */
+			peak *= 1000ULL;
+			do_div(peak, le32_to_cpu(bcm_data[bcm_index]->unit));
+
+			vote = clamp(peak, 1, BCM_TCS_CMD_VOTE_MASK);
+
+			data[bcm_index] = BCM_TCS_CMD(commit, true, vote, vote);
+		}
+	}
+
+	return 0;
+}
+
 /* Return the 'arc-level' for the given frequency */
 static unsigned int a6xx_gmu_get_arc_level(struct device *dev,
 					   unsigned long freq)
@@ -1390,12 +1486,15 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct device *dev, u32 *votes,
  * The GMU votes with the RPMh for itself and on behalf of the GPU but we need
  * to construct the list of votes on the CPU and send it over. Query the RPMh
  * voltage levels and build the votes
+ * The GMU can also vote for DDR interconnects, use the OPP bandwidth entries
+ * and BCM parameters to build the votes.
  */
 static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
 {
 	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
 	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+	const struct a6xx_info *info = adreno_gpu->info->a6xx;
 	struct msm_gpu *gpu = &adreno_gpu->base;
 	int ret;
 
@@ -1407,6 +1506,10 @@ static int a6xx_gmu_rpmh_votes_init(struct a6xx_gmu *gmu)
 	ret |= a6xx_gmu_rpmh_arc_votes_init(gmu->dev, gmu->cx_arc_votes,
 		gmu->gmu_freqs, gmu->nr_gmu_freqs, "cx.lvl");
 
+	/* Build the interconnect votes */
+	if (info->bcms && gmu->nr_gpu_bws > 1)
+		ret |= a6xx_gmu_rpmh_bw_votes_init(info, gmu);
+
 	return ret;
 }
 
@@ -1442,10 +1545,43 @@ static int a6xx_gmu_build_freq_table(struct device *dev, unsigned long *freqs,
 	return index;
 }
 
+static int a6xx_gmu_build_bw_table(struct device *dev, unsigned long *bandwidths,
+		u32 size)
+{
+	int count = dev_pm_opp_get_opp_count(dev);
+	struct dev_pm_opp *opp;
+	int i, index = 0;
+	unsigned int bandwidth = 1;
+
+	/*
+	 * The OPP table doesn't contain the "off" bandwidth level so we need to
+	 * add 1 to the table size to account for it
+	 */
+
+	if (WARN(count + 1 > size,
+		 "The GMU bandwidth table is being truncated\n"))
+		count = size - 1;
+
+	/* Set the "off" bandwidth */
+	bandwidths[index++] = 0;
+
+	for (i = 0; i < count; i++) {
+		opp = dev_pm_opp_find_bw_ceil(dev, &bandwidth, 0);
+		if (IS_ERR(opp))
+			break;
+
+		dev_pm_opp_put(opp);
+		bandwidths[index++] = bandwidth++;
+	}
+
+	return index;
+}
+
 static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
 {
 	struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
 	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+	const struct a6xx_info *info = adreno_gpu->info->a6xx;
 	struct msm_gpu *gpu = &adreno_gpu->base;
 	int ret = 0;
 
@@ -1472,6 +1608,14 @@ static int a6xx_gmu_pwrlevels_probe(struct a6xx_gmu *gmu)
 
 	gmu->current_perf_index = gmu->nr_gpu_freqs - 1;
 
+	/*
+	 * The GMU also handles GPU Interconnect Votes so build a list
+	 * of DDR bandwidths from the GPU OPP table
+	 */
+	if (info->bcms)
+		gmu->nr_gpu_bws = a6xx_gmu_build_bw_table(&gpu->pdev->dev,
+			gmu->gpu_bw_table, ARRAY_SIZE(gmu->gpu_bw_table));
+
 	/* Build the list of RPMh votes that we'll send to the GMU */
 	return a6xx_gmu_rpmh_votes_init(gmu);
 }
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
index 88f18ea6a38a08b5b171709e5020010947a5d347..2062a2be224768c1937d7768f7b8439920e9e127 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
@@ -21,6 +21,15 @@ struct a6xx_gmu_bo {
 
 #define GMU_MAX_GX_FREQS 16
 #define GMU_MAX_CX_FREQS 4
+#define GMU_MAX_BCMS 3
+
+struct a6xx_bcm {
+	char *name;
+	unsigned int buswidth;
+	bool fixed;
+	unsigned int perfmode;
+	unsigned int perfmode_bw;
+};
 
 /*
  * These define the different GMU wake up options - these define how both the
@@ -85,6 +94,10 @@ struct a6xx_gmu {
 	unsigned long gpu_freqs[GMU_MAX_GX_FREQS];
 	u32 gx_arc_votes[GMU_MAX_GX_FREQS];
 
+	int nr_gpu_bws;
+	unsigned long gpu_bw_table[GMU_MAX_GX_FREQS];
+	u32 gpu_ib_votes[GMU_MAX_GX_FREQS][GMU_MAX_BCMS];
+
 	int nr_gmu_freqs;
 	unsigned long gmu_freqs[GMU_MAX_CX_FREQS];
 	u32 cx_arc_votes[GMU_MAX_CX_FREQS];
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 4aceffb6aae89c781facc2a6e4a82b20b341b6cb..9201a53dd341bf432923ffb44947e015208a3d02 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -44,6 +44,7 @@ struct a6xx_info {
 	u32 gmu_chipid;
 	u32 gmu_cgc_mode;
 	u32 prim_fifo_threshold;
+	const struct a6xx_bcm *bcms;
 };
 
 struct a6xx_gpu {
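On the AB vote that the commit message defers to a later change, here is a rough sketch of what a 16-bit quantization of a floor bandwidth against the table maximum could look like; the 0xffff scale and truncating division are assumptions for illustration, not the encoding the follow-up patch actually uses.

```c
#include <stdint.h>

/* Sketch only: express bw as a 16-bit fraction of the maximum bandwidth,
 * per the commit message's description of the AB vote. Guarding against
 * a zero maximum and saturating at the top value are editorial choices. */
static uint16_t ab_quantize(uint32_t bw_kbps, uint32_t max_bw_kbps)
{
	if (!max_bw_kbps || bw_kbps >= max_bw_kbps)
		return 0xffff;

	return (uint16_t)((uint64_t)bw_kbps * 0xffff / max_bw_kbps);
}
```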