diff mbox series

tty: serial: qcom-geni-serial: minor fixes to get_clk_div_rate()

Message ID 1654021066-13341-1-git-send-email-quic_vnivarth@quicinc.com
State New
Headers show
Series tty: serial: qcom-geni-serial: minor fixes to get_clk_div_rate() | expand

Commit Message

Vijaya Krishna Nivarthi May 31, 2022, 6:17 p.m. UTC
Add missing initialisation and correct type casting

Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
---
 drivers/tty/serial/qcom_geni_serial.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Greg KH May 31, 2022, 6:28 p.m. UTC | #1
On Tue, May 31, 2022 at 11:47:46PM +0530, Vijaya Krishna Nivarthi wrote:
> Add missing initialisation and correct type casting
> 
> Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
> ---
>  drivers/tty/serial/qcom_geni_serial.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You did not specify a description of why the patch is needed, or
  possibly, any description at all, in the email body.  Please read the
  section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what is needed in order to
  properly describe the change.

- You did not write a descriptive Subject: for the patch, allowing Greg,
  and everyone else, to know what this patch is all about.  Please read
  the section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what a proper Subject: line should
  look like.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
Vijaya Krishna Nivarthi June 1, 2022, 10:45 a.m. UTC | #2
Hi,

On 6/1/2022 12:58 AM, Doug Anderson wrote:
> Hi,
>
> On Tue, May 31, 2022 at 11:18 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
>> Add missing initialisation and correct type casting
>>
>> Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
>> ---
>>   drivers/tty/serial/qcom_geni_serial.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
>> index 4733a23..08f3ad4 100644
>> --- a/drivers/tty/serial/qcom_geni_serial.c
>> +++ b/drivers/tty/serial/qcom_geni_serial.c
>> @@ -943,11 +943,11 @@ static int qcom_geni_serial_startup(struct uart_port *uport)
>>   static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
>>                          unsigned int sampling_rate, unsigned int *clk_div)
>>   {
>> -       unsigned long ser_clk;
>> +       unsigned long ser_clk = 0;
> In this patch it's not at all obvious why you'd need to init to 0. I
> think the "for loop" is guaranteed to run at least once because
> "max_div" is known at compile time. ...and currently each time through
> the "for" loop you'll always set "ser_clk".

Ok, I realised we will never break out of for loop exceeding ULONG_MAX 
in 1st pass, so yes ser_clk will always be set.

> I think in a future patch you'll want to _remove_ this from the for loop:
>
> if (!prev)
>    ser_clk = freq;

Intent is to save (and use) 1st freq if we cannot find an exact divider.

Isn't it ok?

For example please find debug output for a required frequency of 51.2MHz.

We try dividers 1, 2, 3 and end up with 52.1MHz the first result.

[   18.815432] 20220509 get_clk_div_rate desired_clk:51200000
[   18.821081] 20220509 get_clk_div_rate maxdiv:4095
[   18.825924] 20220509 get_clk_div_rate div:1
[   18.830239] 20220509 get_clk_div_rate freq:52174000
[   18.835288] 20220509 get_clk_div_rate div:2
[   18.839628] 20220509 get_clk_div_rate freq:100000000
[   18.844794] 20220509 get_clk_div_rate div:3
[   18.849119] 20220509 get_clk_div_rate freq:100000000
[   18.854254] 20220509 get_clk_div_rate reached max frequency breaking...
[   18.861072] 20220509 get_clk_div_rate clk_div=1, ser_clk=52174000

The behaviour was same earlier too when root_freq table was present.

The table did contain 51.2MHz and we would exit with same but on call to 
clk_set_rate(51.2MHz) we were ending up with 52.1MHz

>
> ...and _that's_ when you should init "ser_clk" to 0. Until then I'd
> leave it as uninitialized...
>
> Honestly, I'd throw all the fixes into one series, too.

My concern was if there would be a requirement to split the changes.

Will put in all in 1 series with Fixes tag.

>
>
>>          unsigned long desired_clk;
>>          unsigned long freq, prev;
>>          unsigned long div, maxdiv;
>> -       int64_t mult;
>> +       unsigned long long mult;
>>
>>          desired_clk = baud * sampling_rate;
>>          if (!desired_clk) {
>> @@ -959,8 +959,8 @@ static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
>>          prev = 0;
>>
>>          for (div = 1; div <= maxdiv; div++) {
>> -               mult = div * desired_clk;
>> -               if (mult > ULONG_MAX)
>> +               mult = (unsigned long long)div * (unsigned long long)desired_clk;
> I think you only need to cast one of the two. The other will be
> up-cast automatically.
Will change.
>
>
>> +               if (mult > (unsigned long long)ULONG_MAX)
> I don't think you need this cast. As far as I know the C language will
> "upcast" to the larger of the two types.
Will change.
>
>
> -Doug

Thank you.

-Vijay/
Doug Anderson June 1, 2022, 3:33 p.m. UTC | #3
Hi,

On Wed, Jun 1, 2022 at 3:46 AM Vijaya Krishna Nivarthi
<quic_vnivarth@quicinc.com> wrote:
>
> Hi,
>
> On 6/1/2022 12:58 AM, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, May 31, 2022 at 11:18 AM Vijaya Krishna Nivarthi
> > <quic_vnivarth@quicinc.com> wrote:
> >> Add missing initialisation and correct type casting
> >>
> >> Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
> >> ---
> >>   drivers/tty/serial/qcom_geni_serial.c | 8 ++++----
> >>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> >> index 4733a23..08f3ad4 100644
> >> --- a/drivers/tty/serial/qcom_geni_serial.c
> >> +++ b/drivers/tty/serial/qcom_geni_serial.c
> >> @@ -943,11 +943,11 @@ static int qcom_geni_serial_startup(struct uart_port *uport)
> >>   static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
> >>                          unsigned int sampling_rate, unsigned int *clk_div)
> >>   {
> >> -       unsigned long ser_clk;
> >> +       unsigned long ser_clk = 0;
> > In this patch it's not at all obvious why you'd need to init to 0. I
> > think the "for loop" is guaranteed to run at least once because
> > "max_div" is known at compile time. ...and currently each time through
> > the "for" loop you'll always set "ser_clk".
>
> Ok, I realised we will never break out of for loop exceeding ULONG_MAX
> in 1st pass, so yes ser_clk will always be set.
>
> > I think in a future patch you'll want to _remove_ this from the for loop:
> >
> > if (!prev)
> >    ser_clk = freq;
>
> Intent is to save (and use) 1st freq if we cannot find an exact divider.
>
> Isn't it ok?
>
> For example please find debug output for a required frequency of 51.2MHz.
>
> We try dividers 1, 2, 3 and end up with 52.1MHz the first result.
>
> [   18.815432] 20220509 get_clk_div_rate desired_clk:51200000
> [   18.821081] 20220509 get_clk_div_rate maxdiv:4095
> [   18.825924] 20220509 get_clk_div_rate div:1
> [   18.830239] 20220509 get_clk_div_rate freq:52174000
> [   18.835288] 20220509 get_clk_div_rate div:2
> [   18.839628] 20220509 get_clk_div_rate freq:100000000
> [   18.844794] 20220509 get_clk_div_rate div:3
> [   18.849119] 20220509 get_clk_div_rate freq:100000000
> [   18.854254] 20220509 get_clk_div_rate reached max frequency breaking...
> [   18.861072] 20220509 get_clk_div_rate clk_div=1, ser_clk=52174000
>
> The behaviour was same earlier too when root_freq table was present.

Are you certain about the behavior being the same earlier? Before
commit c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart
frequency table..."), the behavior was that get_clk_cfg() would return
0 if there was no exact match. Then get_clk_div_rate() would see this
0 and print an error and return. Then the rest of
qcom_geni_serial_set_termios() would do nothing at all.

Ah, or I guess what you're saying is that the table historically
contained "rounded" rates but that clk_round_rate() isn't returning
nice round rates. OK, but if we truly want to support an inexact
match, you'd want to pick the rate that reduces the error, not just
pick the first one. In other words, something like this (untested):

freq = clk_round_rate(clk, mult);
diff = abs(((long)mult - freq) / div);
if (diff < best_diff) {
  best_diff = diff;
  ser_clk = freq;
  best_div = div;
}

Why do you need this? Imagine that the desired rate was 50000001 or
49999999. The closest match would be to use the rate 100000000 and
divide it by 2. ...but your existing algorithm would just arbitrarily
pick the first rate returned.

NOTE also that you could end up with a slightly higher or slightly
lower clock than requested, right? So it's important to:
* Do signed math when comparing.
* Save the "div" instead of trying to recompute it at the end.


> The table did contain 51.2MHz and we would exit with same but on call to
> clk_set_rate(51.2MHz) we were ending up with 52.1MHz
>
> >
> > ...and _that's_ when you should init "ser_clk" to 0. Until then I'd
> > leave it as uninitialized...
> >
> > Honestly, I'd throw all the fixes into one series, too.
>
> My concern was if there would be a requirement to split the changes.
>
> Will put in all in 1 series with Fixes tag.
>
> >
> >
> >>          unsigned long desired_clk;
> >>          unsigned long freq, prev;
> >>          unsigned long div, maxdiv;
> >> -       int64_t mult;
> >> +       unsigned long long mult;
> >>
> >>          desired_clk = baud * sampling_rate;
> >>          if (!desired_clk) {
> >> @@ -959,8 +959,8 @@ static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
> >>          prev = 0;
> >>
> >>          for (div = 1; div <= maxdiv; div++) {
> >> -               mult = div * desired_clk;
> >> -               if (mult > ULONG_MAX)
> >> +               mult = (unsigned long long)div * (unsigned long long)desired_clk;
> > I think you only need to cast one of the two. The other will be
> > up-cast automatically.
> Will change.
> >
> >
> >> +               if (mult > (unsigned long long)ULONG_MAX)
> > I don't think you need this cast. As far as I know the C language will
> > "upcast" to the larger of the two types.
> Will change.
> >
> >
> > -Doug
>
> Thank you.
>
> -Vijay/
>
Vijaya Krishna Nivarthi June 3, 2022, 5:43 p.m. UTC | #4
Hi,


On 6/1/2022 9:03 PM, Doug Anderson wrote:
> Hi,
>
> On Wed, Jun 1, 2022 at 3:46 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
>> Hi,
>>
>> On 6/1/2022 12:58 AM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Tue, May 31, 2022 at 11:18 AM Vijaya Krishna Nivarthi
>>> <quic_vnivarth@quicinc.com> wrote:
>>>> Add missing initialisation and correct type casting
>>>>
>>>> Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
>>>> ---
>>>>    drivers/tty/serial/qcom_geni_serial.c | 8 ++++----
>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
>>>> index 4733a23..08f3ad4 100644
>>>> --- a/drivers/tty/serial/qcom_geni_serial.c
>>>> +++ b/drivers/tty/serial/qcom_geni_serial.c
>>>> @@ -943,11 +943,11 @@ static int qcom_geni_serial_startup(struct uart_port *uport)
>>>>    static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
>>>>                           unsigned int sampling_rate, unsigned int *clk_div)
>>>>    {
>>>> -       unsigned long ser_clk;
>>>> +       unsigned long ser_clk = 0;
>>> In this patch it's not at all obvious why you'd need to init to 0. I
>>> think the "for loop" is guaranteed to run at least once because
>>> "max_div" is known at compile time. ...and currently each time through
>>> the "for" loop you'll always set "ser_clk".
>> Ok, I realised we will never break out of for loop exceeding ULONG_MAX
>> in 1st pass, so yes ser_clk will always be set.
>>
>>> I think in a future patch you'll want to _remove_ this from the for loop:
>>>
>>> if (!prev)
>>>     ser_clk = freq;
>> Intent is to save (and use) 1st freq if we cannot find an exact divider.
>>
>> Isn't it ok?
>>
>> For example please find debug output for a required frequency of 51.2MHz.
>>
>> We try dividers 1, 2, 3 and end up with 52.1MHz the first result.
>>
>> [   18.815432] 20220509 get_clk_div_rate desired_clk:51200000
>> [   18.821081] 20220509 get_clk_div_rate maxdiv:4095
>> [   18.825924] 20220509 get_clk_div_rate div:1
>> [   18.830239] 20220509 get_clk_div_rate freq:52174000
>> [   18.835288] 20220509 get_clk_div_rate div:2
>> [   18.839628] 20220509 get_clk_div_rate freq:100000000
>> [   18.844794] 20220509 get_clk_div_rate div:3
>> [   18.849119] 20220509 get_clk_div_rate freq:100000000
>> [   18.854254] 20220509 get_clk_div_rate reached max frequency breaking...
>> [   18.861072] 20220509 get_clk_div_rate clk_div=1, ser_clk=52174000
>>
>> The behaviour was same earlier too when root_freq table was present.
> Are you certain about the behavior being the same earlier? Before
> commit c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart
> frequency table..."), the behavior was that get_clk_cfg() would return
> 0 if there was no exact match. Then get_clk_div_rate() would see this
> 0 and print an error and return. Then the rest of
> qcom_geni_serial_set_termios() would do nothing at all.
>
> Ah, or I guess what you're saying is that the table historically
> contained "rounded" rates but that clk_round_rate() isn't returning
> nice round rates. OK, but if we truly want to support an inexact
> match, you'd want to pick the rate that reduces the error, not just
> pick the first one. In other words, something like this (untested):
>
> freq = clk_round_rate(clk, mult);
> diff = abs(((long)mult - freq) / div);
> if (diff < best_diff) {
>    best_diff = diff;
>    ser_clk = freq;
>    best_div = div;
> }
I am not sure if its required that freq is a multiple of best_div now 
that we don't have a multiple of desired_clk anyway.

If it is indeed required, with above patch its not guaranteed and 
finding best_div gets little more complicated?

We may have to loop through all available frequencies and dividers?

PFB, a proposed implementation with a 2nd loop. Its tested but I haven't 
been able to optimise it further because it misses corner theoretical 
cases when I try


     maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
     prev = 0;

     /* run through quicker loop anticipating to find an exact match */
     for (div = 1; div <= maxdiv; div++) {
         mult = (unsigned long long)div * desired_clk;
         if (mult > ULONG_MAX)
             break;

         freq = clk_round_rate(clk, max((unsigned long)mult, prev+1));
         if (!(freq % desired_clk)) {
             *clk_div = freq / desired_clk;
             return freq;
         }

         if (prev && prev == freq)
             break;

         prev = freq;
     }

     pr_warn("Can't find exact match frequency and divider\n");

     /*
      * this scenario ideally should be a rare occurrence
      * run through all frequencies and find closest match
      * note that it cannot get better than a difference of 1
      */
     freq = 0;
     best_diff = ULONG_MAX;
     while (true) {
         prev = freq;
         freq = clk_round_rate(clk, freq+1);

         if (freq == prev)
             break;

         for (div = 1; div <= maxdiv; div++) {
             if (!(freq % div)) {
                 diff = abs((long)(freq/div) - desired_clk);
                 if (diff < best_diff) {
                     best_diff = diff;
                     ser_clk = freq;
                     *clk_div = div;
                     if (diff == 1)
                         break;
                 }
             }
         }
     }

     return ser_clk;
}

>
> Why do you need this? Imagine that the desired rate was 50000001 or
> 49999999. The closest match would be to use the rate 100000000 and
> divide it by 2. ...but your existing algorithm would just arbitrarily
> pick the first rate returned.
>
> NOTE also that you could end up with a slightly higher or slightly
> lower clock than requested, right? So it's important to:
> * Do signed math when comparing.
> * Save the "div" instead of trying to recompute it at the end.
>
>
>> The table did contain 51.2MHz and we would exit with same but on call to
>> clk_set_rate(51.2MHz) we were ending up with 52.1MHz
>>
>>> ...and _that's_ when you should init "ser_clk" to 0. Until then I'd
>>> leave it as uninitialized...
>>>
>>> Honestly, I'd throw all the fixes into one series, too.
>> My concern was if there would be a requirement to split the changes.
>>
>> Will put in all in 1 series with Fixes tag.
>>
>>>
>>>>           unsigned long desired_clk;
>>>>           unsigned long freq, prev;
>>>>           unsigned long div, maxdiv;
>>>> -       int64_t mult;
>>>> +       unsigned long long mult;
>>>>
>>>>           desired_clk = baud * sampling_rate;
>>>>           if (!desired_clk) {
>>>> @@ -959,8 +959,8 @@ static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
>>>>           prev = 0;
>>>>
>>>>           for (div = 1; div <= maxdiv; div++) {
>>>> -               mult = div * desired_clk;
>>>> -               if (mult > ULONG_MAX)
>>>> +               mult = (unsigned long long)div * (unsigned long long)desired_clk;
>>> I think you only need to cast one of the two. The other will be
>>> up-cast automatically.
>> Will change.
>>>
>>>> +               if (mult > (unsigned long long)ULONG_MAX)
>>> I don't think you need this cast. As far as I know the C language will
>>> "upcast" to the larger of the two types.
>> Will change.
>>>
>>> -Doug
>> Thank you.
>>
>> -Vijay/
>>
Doug Anderson June 3, 2022, 6:40 p.m. UTC | #5
Hi,

On Fri, Jun 3, 2022 at 10:43 AM Vijaya Krishna Nivarthi
<quic_vnivarth@quicinc.com> wrote:
>
> Hi,
>
>
> On 6/1/2022 9:03 PM, Doug Anderson wrote:
> > Hi,
> >
> > On Wed, Jun 1, 2022 at 3:46 AM Vijaya Krishna Nivarthi
> > <quic_vnivarth@quicinc.com> wrote:
> >> Hi,
> >>
> >> On 6/1/2022 12:58 AM, Doug Anderson wrote:
> >>> Hi,
> >>>
> >>> On Tue, May 31, 2022 at 11:18 AM Vijaya Krishna Nivarthi
> >>> <quic_vnivarth@quicinc.com> wrote:
> >>>> Add missing initialisation and correct type casting
> >>>>
> >>>> Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
> >>>> ---
> >>>>    drivers/tty/serial/qcom_geni_serial.c | 8 ++++----
> >>>>    1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> >>>> index 4733a23..08f3ad4 100644
> >>>> --- a/drivers/tty/serial/qcom_geni_serial.c
> >>>> +++ b/drivers/tty/serial/qcom_geni_serial.c
> >>>> @@ -943,11 +943,11 @@ static int qcom_geni_serial_startup(struct uart_port *uport)
> >>>>    static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
> >>>>                           unsigned int sampling_rate, unsigned int *clk_div)
> >>>>    {
> >>>> -       unsigned long ser_clk;
> >>>> +       unsigned long ser_clk = 0;
> >>> In this patch it's not at all obvious why you'd need to init to 0. I
> >>> think the "for loop" is guaranteed to run at least once because
> >>> "max_div" is known at compile time. ...and currently each time through
> >>> the "for" loop you'll always set "ser_clk".
> >> Ok, I realised we will never break out of for loop exceeding ULONG_MAX
> >> in 1st pass, so yes ser_clk will always be set.
> >>
> >>> I think in a future patch you'll want to _remove_ this from the for loop:
> >>>
> >>> if (!prev)
> >>>     ser_clk = freq;
> >> Intent is to save (and use) 1st freq if we cannot find an exact divider.
> >>
> >> Isn't it ok?
> >>
> >> For example please find debug output for a required frequency of 51.2MHz.
> >>
> >> We try dividers 1, 2, 3 and end up with 52.1MHz the first result.
> >>
> >> [   18.815432] 20220509 get_clk_div_rate desired_clk:51200000
> >> [   18.821081] 20220509 get_clk_div_rate maxdiv:4095
> >> [   18.825924] 20220509 get_clk_div_rate div:1
> >> [   18.830239] 20220509 get_clk_div_rate freq:52174000
> >> [   18.835288] 20220509 get_clk_div_rate div:2
> >> [   18.839628] 20220509 get_clk_div_rate freq:100000000
> >> [   18.844794] 20220509 get_clk_div_rate div:3
> >> [   18.849119] 20220509 get_clk_div_rate freq:100000000
> >> [   18.854254] 20220509 get_clk_div_rate reached max frequency breaking...
> >> [   18.861072] 20220509 get_clk_div_rate clk_div=1, ser_clk=52174000
> >>
> >> The behaviour was same earlier too when root_freq table was present.
> > Are you certain about the behavior being the same earlier? Before
> > commit c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart
> > frequency table..."), the behavior was that get_clk_cfg() would return
> > 0 if there was no exact match. Then get_clk_div_rate() would see this
> > 0 and print an error and return. Then the rest of
> > qcom_geni_serial_set_termios() would do nothing at all.
> >
> > Ah, or I guess what you're saying is that the table historically
> > contained "rounded" rates but that clk_round_rate() isn't returning
> > nice round rates. OK, but if we truly want to support an inexact
> > match, you'd want to pick the rate that reduces the error, not just
> > pick the first one. In other words, something like this (untested):
> >
> > freq = clk_round_rate(clk, mult);
> > diff = abs(((long)mult - freq) / div);
> > if (diff < best_diff) {
> >    best_diff = diff;
> >    ser_clk = freq;
> >    best_div = div;
> > }
> I am not sure if its required that freq is a multiple of best_div now
> that we don't have a multiple of desired_clk anyway.

How about just this (untested):

freq = clk_round_rate(clk, mult);
candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
candidate_freq = freq / candidate_div;
diff = abs((long)desired_clk - candidate_freq);
if (diff < best_diff) {
  best_diff = diff;
  ser_clk = freq;
  best_div = candidate_div;
}

Here:

freq: a freq we can definitely make

candidate_div: the best number to divide freq by to get the desired clock.

candidate_freq: the frequency we'll end up if we divide freq by
candidate_div. We want this to be close to desired_clk.

diff: how far away the candidate_freq is away from what we want.

best_diff: how far away the best candidate was from what we wanted.

ser_clk: What we should pass to clk_set_rate() to get the best candidate.

best_div: What we should use as a divider to get the best candidate.


-Doug
Doug Anderson June 6, 2022, 7:59 p.m. UTC | #6
Hi,

On Mon, Jun 6, 2022 at 11:19 AM Vijaya Krishna Nivarthi
<quic_vnivarth@quicinc.com> wrote:
>
> Hi,
>
>
> On 6/4/2022 12:10 AM, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Jun 3, 2022 at 10:43 AM Vijaya Krishna Nivarthi
> > <quic_vnivarth@quicinc.com> wrote:
> >>
> >> Ah, or I guess what you're saying is that the table historically
> >> contained "rounded" rates but that clk_round_rate() isn't returning
> >> nice round rates. OK, but if we truly want to support an inexact
> >> match, you'd want to pick the rate that reduces the error, not just
> >> pick the first one. In other words, something like this (untested):
> >>
> >> freq = clk_round_rate(clk, mult);
> >> diff = abs(((long)mult - freq) / div);
> >> if (diff < best_diff) {
> >>     best_diff = diff;
> >>     ser_clk = freq;
> >>     best_div = div;
> >> }
> >> I am not sure if its required that freq is a multiple of best_div now
> >> that we don't have a multiple of desired_clk anyway.
> > How about just this (untested):
> >
> > freq = clk_round_rate(clk, mult);
> > candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
> > candidate_freq = freq / candidate_div;
> > diff = abs((long)desired_clk - candidate_freq);
> > if (diff < best_diff) {
> >    best_diff = diff;
> >    ser_clk = freq;
> >    best_div = candidate_div;
> > }
>
> I am afraid this still doesn't guarantee that ser_clk is a multiple of
> best_div

OK. ...I guess my question would be: does it matter for some reason?
"ser_clk" is just a local variable in this function. Who cares if it's
not a multiple of best_div? This is why we're keeping track of
"best_div" in the first place, so that later in the function instead
of:

*clk_div = ser_clk / desired_clk;
if (!(*clk_div))
  *clk_div = 1;

You just do:

*clk_div = best_div;


> I tested it with a function simulates clk_round_rate.
>
> static unsigned long clk_round_rate_test(struct clk *clk, unsigned long
> in_freq)
> {
>      unsigned long root_freq[6] = {105, 204, 303, 402, 501, 602};
>      int i;
>
>      for (i = 0; i < 6; i++) {
>          if (root_freq[i] >= in_freq)
>              return root_freq[i];
>      }
>      return root_freq[6];
> }
>
>      {
>          unsigned long ser_clk;
>          unsigned long desired_clk;
>          unsigned long freq;
>          int div_round_closest;
>          unsigned long div;
>          unsigned long mult;
>          unsigned long candidate_div, candidate_freq;
>
>          unsigned long diff, best_diff, best_div;
>          unsigned long one;
>
>          desired_clk = 100;
>          one = 1;
>          best_diff = ULONG_MAX;
>          pr_err("\ndesired_clk-%d\n", desired_clk);
>          for (div = 1; div <= 10; div++) {
>              mult = div * desired_clk;
>
>              freq = clk_round_rate_test(clk, mult);
>              div_round_closest = DIV_ROUND_CLOSEST(freq, desired_clk);
>              candidate_div = max(one, (unsigned long)div_round_closest);
>              candidate_freq = freq / candidate_div;
>              diff = abs((long)desired_clk - candidate_freq);
>              pr_err("div-%d, mult-%d, freq-%d, div_round_closest-%d,
> candidate_div-%d, candidate_freq-%d, diff-%d\n",
>                  div, mult, freq, div_round_closest, candidate_div,
> candidate_freq, diff);
>              if (diff < best_diff) {
>                  pr_err("This is best so far\n");
>                  best_diff = diff;
>                  ser_clk = freq;
>                  best_div = candidate_div;
>              }
>          }
>          pr_err("\nbest_diff-%d, ser_clk-%d, best_div-%d\n",
>              best_diff, ser_clk, best_div);
>      }
>
> And here is the output
>
> [   17.835167] desired_clk-100
> [   17.839567] div-1, mult-100, freq-105, div_round_closest-1,
> candidate_div-1, candidate_freq-105, diff-5
> [   17.849220] This is best so far
> [   17.852458] div-2, mult-200, freq-204, div_round_closest-2,
> candidate_div-2, candidate_freq-102, diff-2
> [   17.862104] This is best so far
> [   17.865345] div-3, mult-300, freq-303, div_round_closest-3,
> candidate_div-3, candidate_freq-101, diff-1
> [   17.874995] This is best so far
> [   17.878237] div-4, mult-400, freq-402, div_round_closest-4,
> candidate_div-4, candidate_freq-100, diff-0
> [   17.887882] This is best so far
> [   17.891118] div-5, mult-500, freq-501, div_round_closest-5,
> candidate_div-5, candidate_freq-100, diff-0
> [   17.900770] div-6, mult-600, freq-602, div_round_closest-6,
> candidate_div-6, candidate_freq-100, diff-0
> [   17.910415] div-7, mult-700, freq-602, div_round_closest-6,
> candidate_div-6, candidate_freq-100, diff-0
> [   17.920057] div-8, mult-800, freq-602, div_round_closest-6,
> candidate_div-6, candidate_freq-100, diff-0
> [   17.929703] div-9, mult-900, freq-602, div_round_closest-6,
> candidate_div-6, candidate_freq-100, diff-0
> [   17.939353] div-10, mult-1000, freq-602, div_round_closest-6,
> candidate_div-6, candidate_freq-100, diff-0
> [   17.949181]
> [   17.949181] best_diff-0, ser_clk-402, best_div-4

That doesn't look like a terrible result. I guess nominally 602 is a
better approximation, but if we're accepting that we're not going to
have an exact rate anyway then maybe being off by that tiny amount
doesn't matter and we'd do better with the slow clock (maybe saves
power?)


> Please note that we go past cases when we have an divider that can
> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one
> that doesn't.

Ah, good point. Luckily that's a 1-line fix, right?


-Doug
Vijaya Krishna Nivarthi June 7, 2022, 5:40 p.m. UTC | #7
Hi,

On 6/7/2022 1:29 AM, Doug Anderson wrote:
> Hi,
>
> On Mon, Jun 6, 2022 at 11:19 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
>> Hi,
>>
>>
>> On 6/4/2022 12:10 AM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Fri, Jun 3, 2022 at 10:43 AM Vijaya Krishna Nivarthi
>>> <quic_vnivarth@quicinc.com> wrote:
>>>> Ah, or I guess what you're saying is that the table historically
>>>> contained "rounded" rates but that clk_round_rate() isn't returning
>>>> nice round rates. OK, but if we truly want to support an inexact
>>>> match, you'd want to pick the rate that reduces the error, not just
>>>> pick the first one. In other words, something like this (untested):
>>>>
>>>> freq = clk_round_rate(clk, mult);
>>>> diff = abs(((long)mult - freq) / div);
>>>> if (diff < best_diff) {
>>>>      best_diff = diff;
>>>>      ser_clk = freq;
>>>>      best_div = div;
>>>> }
>>>> I am not sure if its required that freq is a multiple of best_div now
>>>> that we don't have a multiple of desired_clk anyway.
>>> How about just this (untested):
>>>
>>> freq = clk_round_rate(clk, mult);
>>> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
>>> candidate_freq = freq / candidate_div;
>>> diff = abs((long)desired_clk - candidate_freq);
>>> if (diff < best_diff) {
>>>     best_diff = diff;
>>>     ser_clk = freq;
>>>     best_div = candidate_div;
>>> }
>> I am afraid this still doesn't guarantee that ser_clk is a multiple of
>> best_div
> OK. ...I guess my question would be: does it matter for some reason?
> "ser_clk" is just a local variable in this function. Who cares if it's
> not a multiple of best_div? This is why we're keeping track of
> "best_div" in the first place, so that later in the function instead
> of:
>
> *clk_div = ser_clk / desired_clk;
> if (!(*clk_div))
>    *clk_div = 1;
>
> You just do:
>
> *clk_div = best_div;

My only concern continues to be...

Given ser_clk is the final frequency that this function is going to 
return and best_div is going to be the clk_divider, is it ok if the 
divider cant divide the frequency exactly?

In other words, Can this function output combinations like (402,4)  
(501,5) ?

If ok, then we can go ahead with this patch or even previous perhaps.

>
>> I tested it with a function simulates clk_round_rate.
>>
>> static unsigned long clk_round_rate_test(struct clk *clk, unsigned long
>> in_freq)
>> {
>>       unsigned long root_freq[6] = {105, 204, 303, 402, 501, 602};
>>       int i;
>>
>>       for (i = 0; i < 6; i++) {
>>           if (root_freq[i] >= in_freq)
>>               return root_freq[i];
>>       }
>>       return root_freq[6];
>> }
>>
>>       {
>>           unsigned long ser_clk;
>>           unsigned long desired_clk;
>>           unsigned long freq;
>>           int div_round_closest;
>>           unsigned long div;
>>           unsigned long mult;
>>           unsigned long candidate_div, candidate_freq;
>>
>>           unsigned long diff, best_diff, best_div;
>>           unsigned long one;
>>
>>           desired_clk = 100;
>>           one = 1;
>>           best_diff = ULONG_MAX;
>>           pr_err("\ndesired_clk-%d\n", desired_clk);
>>           for (div = 1; div <= 10; div++) {
>>               mult = div * desired_clk;
>>
>>               freq = clk_round_rate_test(clk, mult);
>>               div_round_closest = DIV_ROUND_CLOSEST(freq, desired_clk);
>>               candidate_div = max(one, (unsigned long)div_round_closest);
>>               candidate_freq = freq / candidate_div;
>>               diff = abs((long)desired_clk - candidate_freq);
>>               pr_err("div-%d, mult-%d, freq-%d, div_round_closest-%d,
>> candidate_div-%d, candidate_freq-%d, diff-%d\n",
>>                   div, mult, freq, div_round_closest, candidate_div,
>> candidate_freq, diff);
>>               if (diff < best_diff) {
>>                   pr_err("This is best so far\n");
>>                   best_diff = diff;
>>                   ser_clk = freq;
>>                   best_div = candidate_div;
>>               }
>>           }
>>           pr_err("\nbest_diff-%d, ser_clk-%d, best_div-%d\n",
>>               best_diff, ser_clk, best_div);
>>       }
>>
>> And here is the output
>>
>> [   17.835167] desired_clk-100
>> [   17.839567] div-1, mult-100, freq-105, div_round_closest-1,
>> candidate_div-1, candidate_freq-105, diff-5
>> [   17.849220] This is best so far
>> [   17.852458] div-2, mult-200, freq-204, div_round_closest-2,
>> candidate_div-2, candidate_freq-102, diff-2
>> [   17.862104] This is best so far
>> [   17.865345] div-3, mult-300, freq-303, div_round_closest-3,
>> candidate_div-3, candidate_freq-101, diff-1
>> [   17.874995] This is best so far
>> [   17.878237] div-4, mult-400, freq-402, div_round_closest-4,
>> candidate_div-4, candidate_freq-100, diff-0
>> [   17.887882] This is best so far
>> [   17.891118] div-5, mult-500, freq-501, div_round_closest-5,
>> candidate_div-5, candidate_freq-100, diff-0
>> [   17.900770] div-6, mult-600, freq-602, div_round_closest-6,
>> candidate_div-6, candidate_freq-100, diff-0
>> [   17.910415] div-7, mult-700, freq-602, div_round_closest-6,
>> candidate_div-6, candidate_freq-100, diff-0
>> [   17.920057] div-8, mult-800, freq-602, div_round_closest-6,
>> candidate_div-6, candidate_freq-100, diff-0
>> [   17.929703] div-9, mult-900, freq-602, div_round_closest-6,
>> candidate_div-6, candidate_freq-100, diff-0
>> [   17.939353] div-10, mult-1000, freq-602, div_round_closest-6,
>> candidate_div-6, candidate_freq-100, diff-0
>> [   17.949181]
>> [   17.949181] best_diff-0, ser_clk-402, best_div-4
> That doesn't look like a terrible result. I guess nominally 602 is a
> better approximation, but if we're accepting that we're not going to
> have an exact rate anyway then maybe being off by that tiny amount
> doesn't matter and we'd do better with the slow clock (maybe saves
> power?)
Actually power saving was the anticipation behind returning first 
frequency in original patch, when we cant find exact frequency.
>
>> Please note that we go past cases when we have an divider that can
>> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one
>> that doesn't.
> Ah, good point. Luckily that's a 1-line fix, right?

Apologies, I could not figure out how.

Thank you.

>
>
> -Doug
Doug Anderson June 7, 2022, 7:25 p.m. UTC | #8
Hi,

On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi
<quic_vnivarth@quicinc.com> wrote:
>
> Hi,
>
> On 6/7/2022 1:29 AM, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Jun 6, 2022 at 11:19 AM Vijaya Krishna Nivarthi
> > <quic_vnivarth@quicinc.com> wrote:
> >> Hi,
> >>
> >>
> >> On 6/4/2022 12:10 AM, Doug Anderson wrote:
> >>> Hi,
> >>>
> >>> On Fri, Jun 3, 2022 at 10:43 AM Vijaya Krishna Nivarthi
> >>> <quic_vnivarth@quicinc.com> wrote:
> >>>> Ah, or I guess what you're saying is that the table historically
> >>>> contained "rounded" rates but that clk_round_rate() isn't returning
> >>>> nice round rates. OK, but if we truly want to support an inexact
> >>>> match, you'd want to pick the rate that reduces the error, not just
> >>>> pick the first one. In other words, something like this (untested):
> >>>>
> >>>> freq = clk_round_rate(clk, mult);
> >>>> diff = abs(((long)mult - freq) / div);
> >>>> if (diff < best_diff) {
> >>>>      best_diff = diff;
> >>>>      ser_clk = freq;
> >>>>      best_div = div;
> >>>> }
> >>>> I am not sure if its required that freq is a multiple of best_div now
> >>>> that we don't have a multiple of desired_clk anyway.
> >>> How about just this (untested):
> >>>
> >>> freq = clk_round_rate(clk, mult);
> >>> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
> >>> candidate_freq = freq / candidate_div;
> >>> diff = abs((long)desired_clk - candidate_freq);
> >>> if (diff < best_diff) {
> >>>     best_diff = diff;
> >>>     ser_clk = freq;
> >>>     best_div = candidate_div;
> >>> }
> >> I am afraid this still doesn't guarantee that ser_clk is a multiple of
> >> best_div
> > OK. ...I guess my question would be: does it matter for some reason?
> > "ser_clk" is just a local variable in this function. Who cares if it's
> > not a multiple of best_div? This is why we're keeping track of
> > "best_div" in the first place, so that later in the function instead
> > of:
> >
> > *clk_div = ser_clk / desired_clk;
> > if (!(*clk_div))
> >    *clk_div = 1;
> >
> > You just do:
> >
> > *clk_div = best_div;
>
> My only concern continues to be...
>
> Given ser_clk is the final frequency that this function is going to
> return and best_div is going to be the clk_divider, is it ok if the
> divider cant divide the frequency exactly?
>
> In other words, Can this function output combinations like (402,4)
> (501,5) ?
>
> If ok, then we can go ahead with this patch or even previous perhaps.

I don't see why not. You're basically just getting a resulting clock
that's not an integral "Hz", right?

So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600
* 16) = 153600

Let's imagine that we do all the math and we finally decide that our
best bet is with the rate 922000 and a divider of 6. That means that
the actual clock we'll make is 153666.67 when we _wanted_ 153600.
There's no reason it needs to be integral, though, and 153666.67 would
still be better than making 160000.


> >> I tested it with a function simulates clk_round_rate.
> >>
> >> static unsigned long clk_round_rate_test(struct clk *clk, unsigned long
> >> in_freq)
> >> {
> >>       unsigned long root_freq[6] = {105, 204, 303, 402, 501, 602};
> >>       int i;
> >>
> >>       for (i = 0; i < 6; i++) {
> >>           if (root_freq[i] >= in_freq)
> >>               return root_freq[i];
> >>       }
> >>       return root_freq[6];
> >> }
> >>
> >>       {
> >>           unsigned long ser_clk;
> >>           unsigned long desired_clk;
> >>           unsigned long freq;
> >>           int div_round_closest;
> >>           unsigned long div;
> >>           unsigned long mult;
> >>           unsigned long candidate_div, candidate_freq;
> >>
> >>           unsigned long diff, best_diff, best_div;
> >>           unsigned long one;
> >>
> >>           desired_clk = 100;
> >>           one = 1;
> >>           best_diff = ULONG_MAX;
> >>           pr_err("\ndesired_clk-%d\n", desired_clk);
> >>           for (div = 1; div <= 10; div++) {
> >>               mult = div * desired_clk;
> >>
> >>               freq = clk_round_rate_test(clk, mult);
> >>               div_round_closest = DIV_ROUND_CLOSEST(freq, desired_clk);
> >>               candidate_div = max(one, (unsigned long)div_round_closest);
> >>               candidate_freq = freq / candidate_div;
> >>               diff = abs((long)desired_clk - candidate_freq);
> >>               pr_err("div-%d, mult-%d, freq-%d, div_round_closest-%d,
> >> candidate_div-%d, candidate_freq-%d, diff-%d\n",
> >>                   div, mult, freq, div_round_closest, candidate_div,
> >> candidate_freq, diff);
> >>               if (diff < best_diff) {
> >>                   pr_err("This is best so far\n");
> >>                   best_diff = diff;
> >>                   ser_clk = freq;
> >>                   best_div = candidate_div;
> >>               }
> >>           }
> >>           pr_err("\nbest_diff-%d, ser_clk-%d, best_div-%d\n",
> >>               best_diff, ser_clk, best_div);
> >>       }
> >>
> >> And here is the output
> >>
> >> [   17.835167] desired_clk-100
> >> [   17.839567] div-1, mult-100, freq-105, div_round_closest-1,
> >> candidate_div-1, candidate_freq-105, diff-5
> >> [   17.849220] This is best so far
> >> [   17.852458] div-2, mult-200, freq-204, div_round_closest-2,
> >> candidate_div-2, candidate_freq-102, diff-2
> >> [   17.862104] This is best so far
> >> [   17.865345] div-3, mult-300, freq-303, div_round_closest-3,
> >> candidate_div-3, candidate_freq-101, diff-1
> >> [   17.874995] This is best so far
> >> [   17.878237] div-4, mult-400, freq-402, div_round_closest-4,
> >> candidate_div-4, candidate_freq-100, diff-0
> >> [   17.887882] This is best so far
> >> [   17.891118] div-5, mult-500, freq-501, div_round_closest-5,
> >> candidate_div-5, candidate_freq-100, diff-0
> >> [   17.900770] div-6, mult-600, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.910415] div-7, mult-700, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.920057] div-8, mult-800, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.929703] div-9, mult-900, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.939353] div-10, mult-1000, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.949181]
> >> [   17.949181] best_diff-0, ser_clk-402, best_div-4
> > That doesn't look like a terrible result. I guess nominally 602 is a
> > better approximation, but if we're accepting that we're not going to
> > have an exact rate anyway then maybe being off by that tiny amount
> > doesn't matter and we'd do better with the slow clock (maybe saves
> > power?)
> Actually power saving was the anticipation behind returning first
> frequency in original patch, when we cant find exact frequency.

Right, except that if you just pick the first clock you find it would
be _wildly_ off. I guess if you really want to do this the right way,
you need to set a maximum tolerance and pick the first rate you find
that meets that tolerance. Random web search for "uart baud rate
tolerance" makes me believe that +/- 5% deviation is OK, but to be
safe you probably want something lower. Maybe 2%? So if the desired
clock is within 2% of a clock you can make, can you just pick that
one?


> >> Please note that we go past cases when we have an divider that can
> >> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one
> >> that doesn't.
> > Ah, good point. Luckily that's a 1-line fix, right?
>
> Apologies, I could not figure out how.

Ah, sorry. Not quite 1 line, but this (untested)


freq = clk_round_rate(clk, mult);

if (freq % desired_clk == 0) {
 ser_clk = freq;
 best_div = freq / desired_clk;
 break;
}

candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
candidate_freq = freq / candidate_div;
diff = abs((long)desired_clk - candidate_freq);
if (diff < best_diff) {
  best_diff = diff;
  ser_clk = freq;
  best_div = candidate_div;
}
Vijaya Krishna Nivarthi June 8, 2022, 6:33 p.m. UTC | #9
Hi,


On 6/8/2022 12:55 AM, Doug Anderson wrote:
> Hi,
>
> On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
>> Hi,
>>
>> On 6/7/2022 1:29 AM, Doug Anderson wrote:
>>
>> My only concern continues to be...
>>
>> Given ser_clk is the final frequency that this function is going to
>> return and best_div is going to be the clk_divider, is it ok if the
>> divider cant divide the frequency exactly?
>>
>> In other words, Can this function output combinations like (402,4)
>> (501,5) ?
>>
>> If ok, then we can go ahead with this patch or even previous perhaps.
> I don't see why not. You're basically just getting a resulting clock
> that's not an integral "Hz", right?
>
> So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600
> * 16) = 153600
>
> Let's imagine that we do all the math and we finally decide that our
> best bet is with the rate 922000 and a divider of 6. That means that
> the actual clock we'll make is 153666.67 when we _wanted_ 153600.
> There's no reason it needs to be integral, though, and 153666.67 would
> still be better than making 160000.
>
Thank you for clarification.
>>> power?)
>> Actually power saving was the anticipation behind returning first
>> frequency in original patch, when we cant find exact frequency.
> Right, except that if you just pick the first clock you find it would
> be _wildly_ off. I guess if you really want to do this the right way,
> you need to set a maximum tolerance and pick the first rate you find
> that meets that tolerance. Random web search for "uart baud rate
> tolerance" makes me believe that +/- 5% deviation is OK, but to be
> safe you probably want something lower. Maybe 2%? So if the desired
> clock is within 2% of a clock you can make, can you just pick that
> one?
Ok, 2% seems good.
>
>>>> Please note that we go past cases when we have an divider that can
>>>> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one
>>>> that doesn't.
>>> Ah, good point. Luckily that's a 1-line fix, right?
>> Apologies, I could not figure out how.
> Ah, sorry. Not quite 1 line, but this (untested)
>
>
> freq = clk_round_rate(clk, mult);
>
> if (freq % desired_clk == 0) {
>   ser_clk = freq;
>   best_div = freq / desired_clk;
>   break;
> }
>
> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
> candidate_freq = freq / candidate_div;
> diff = abs((long)desired_clk - candidate_freq);
> if (diff < best_diff) {
>    best_diff = diff;
>    ser_clk = freq;
>    best_div = candidate_div;
> }

But then once again, we would likely need 2 loops because while we are 
ok with giving up on search for best_div on finding something within 2% 
tolerance, we may not want to give up on exact match (freq % desired_clk 
== 0 )

So how about something like this with 2 loops (more optimised than 
previous version with 2 loops)? (untested)


     maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
     prev = 0;

     /* run through quicker loop anticipating to find an exact match */
     for (div = 1; div <= maxdiv; div++) {
         mult = (unsigned long long)div * desired_clk;
         if (mult > ULONG_MAX)
             break;

         freq = clk_round_rate(clk, max((unsigned long)mult, prev+1));
         if (!(freq % desired_clk)) {
             *clk_div = freq / desired_clk;
             return freq;
         }

         if (prev && prev == freq)
             break;

         prev = freq;
     }

     pr_warn("Can't find exact match frequency and divider\n");

     freq = 0;
     best_diff = ULONG_MAX;
     prev_candidate_div = -1;
     while (true) {
         prev = freq;
         freq = clk_round_rate(clk, freq+1);

         if (freq == prev)
             break; /* end of table */

         candidate_div = DIV_ROUND_CLOSEST(freq, desired_clk);
         /*
          * Since the frequencies are increasing, previous is better
          * if we have same divider, proceed to next in table
          */
         if (prev_candidate_div == candidate_div)
             continue;
         prev_candidate_div = candidate_div;

         if (candidate_div)
             candidate_freq = freq / candidate_div;
         else
             candidate_freq = freq;

         diff = abs((long)desired_clk - candidate_freq);
         if (diff < best_diff) {
             best_diff = diff;
             ser_clk = freq;
             *clk_div = candidate_div;
             if (diff * 50 < ser_clk) {
                 two_percent_tolerance = true;
                 break;
             }
         }
     }

     if (!two_percent_tolerance) {
         pr_warn("Can't find frequency within 2 percent tolerance\n");
     }

     return ser_clk;
}

Thank you.
Doug Anderson June 8, 2022, 10:37 p.m. UTC | #10
Hi,

On Wed, Jun 8, 2022 at 11:34 AM Vijaya Krishna Nivarthi
<quic_vnivarth@quicinc.com> wrote:
>
> Hi,
>
>
> On 6/8/2022 12:55 AM, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi
> > <quic_vnivarth@quicinc.com> wrote:
> >> Hi,
> >>
> >> On 6/7/2022 1:29 AM, Doug Anderson wrote:
> >>
> >> My only concern continues to be...
> >>
> >> Given ser_clk is the final frequency that this function is going to
> >> return and best_div is going to be the clk_divider, is it ok if the
> >> divider cant divide the frequency exactly?
> >>
> >> In other words, Can this function output combinations like (402,4)
> >> (501,5) ?
> >>
> >> If ok, then we can go ahead with this patch or even previous perhaps.
> > I don't see why not. You're basically just getting a resulting clock
> > that's not an integral "Hz", right?
> >
> > So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600
> > * 16) = 153600
> >
> > Let's imagine that we do all the math and we finally decide that our
> > best bet is with the rate 922000 and a divider of 6. That means that
> > the actual clock we'll make is 153666.67 when we _wanted_ 153600.
> > There's no reason it needs to be integral, though, and 153666.67 would
> > still be better than making 160000.
> >
> Thank you for clarification.
> >>> power?)
> >> Actually power saving was the anticipation behind returning first
> >> frequency in original patch, when we cant find exact frequency.
> > Right, except that if you just pick the first clock you find it would
> > be _wildly_ off. I guess if you really want to do this the right way,
> > you need to set a maximum tolerance and pick the first rate you find
> > that meets that tolerance. Random web search for "uart baud rate
> > tolerance" makes me believe that +/- 5% deviation is OK, but to be
> > safe you probably want something lower. Maybe 2%? So if the desired
> > clock is within 2% of a clock you can make, can you just pick that
> > one?
> Ok, 2% seems good.
> >
> >>>> Please note that we go past cases when we have an divider that can
> >>>> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one
> >>>> that doesn't.
> >>> Ah, good point. Luckily that's a 1-line fix, right?
> >> Apologies, I could not figure out how.
> > Ah, sorry. Not quite 1 line, but this (untested)
> >
> >
> > freq = clk_round_rate(clk, mult);
> >
> > if (freq % desired_clk == 0) {
> >   ser_clk = freq;
> >   best_div = freq / desired_clk;
> >   break;
> > }
> >
> > candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
> > candidate_freq = freq / candidate_div;
> > diff = abs((long)desired_clk - candidate_freq);
> > if (diff < best_diff) {
> >    best_diff = diff;
> >    ser_clk = freq;
> >    best_div = candidate_div;
> > }
>
> But then once again, we would likely need 2 loops because while we are
> ok with giving up on search for best_div on finding something within 2%
> tolerance, we may not want to give up on exact match (freq % desired_clk
> == 0 )

Ah, it took me a while to understand why two loops. It's because in
one case you're trying multiplies and in the other you're bumping up
to the next closest clock rate. I don't think you really need to do
that. Just test the (rate - 2%) and the rate. How about this (only
lightly tested):

    ser_clk = 0;
    maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
    div = 1;
    while (div < maxdiv) {
        mult = (unsigned long long)div * desired_clk;
        if (mult != (unsigned long)mult)
            break;

        two_percent = mult / 50;

        /*
         * Loop requesting (freq - 2%) and possibly (freq).
         *
         * We'll keep track of the lowest freq inexact match we found
         * but always try to find a perfect match. NOTE: this algorithm
         * could miss a slightly better freq if there's more than one
         * freq between (freq - 2%) and (freq) but (freq) can't be made
         * exactly, but that's OK.
         *
         * This absolutely relies on the fact that the Qualcomm clock
         * driver always rounds up.
         */
        test_freq = mult - two_percent;
        while (test_freq <= mult) {
            freq = clk_round_rate(clk, test_freq);

            /*
             * A dead-on freq is an insta-win. This implicitly
             * handles when "freq == mult"
             */
            if (!(freq % desired_clk)) {
                *clk_div = freq / desired_clk;
                return freq;
            }

            /*
             * Only time clock framework doesn't round up is if
             * we're past the max clock rate. We're done searching
             * if that's the case.
             */
            if (freq < test_freq)
                return ser_clk;

            /* Save the first (lowest freq) within 2% */
            if (!ser_clk && freq <= mult + two_percent) {
                ser_clk = freq;
                *clk_div = div;
            }

            /*
             * If we already rounded up past mult then this will
             * cause the loop to exit. If not then this will run
             * the loop a second time with exactly mult.
             */
            test_freq = max(freq + 1, mult);
        }

        /*
         * test_freq will always be bigger than mult by at least 1.
         * That means we can get the next divider with a DIV_ROUND_UP.
         * This has the advantage of skipping by a whole bunch of divs
         * If the clock framework already bypassed them.
         */
        div = DIV_ROUND_UP(test_freq, desired_clk);
        }

    return ser_clk;
Doug Anderson June 10, 2022, 5:20 p.m. UTC | #11
Hi,

On Fri, Jun 10, 2022 at 2:33 AM Vijaya Krishna Nivarthi (Temp) (QUIC)
<quic_vnivarth@quicinc.com> wrote:
>
> Hi,
>
> Re-sending (2nd attempt) as emails are bouncing...
>
>
> > >
> > > But then once again, we would likely need 2 loops because while we are
> > > ok with giving up on search for best_div on finding something within
> > > 2% tolerance, we may not want to give up on exact match (freq %
> > > desired_clk == 0 )
> >
> > Ah, it took me a while to understand why two loops. It's because in one case
> > you're trying multiplies and in the other you're bumping up to the next
> > closest clock rate. I don't think you really need to do that. Just test the (rate -
> > 2%) and the rate. How about this (only lightly tested):
> >
> >     ser_clk = 0;
> >     maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
> >     div = 1;
> >     while (div < maxdiv) {
>
>
> div <= maxdiv ?

Ah, sure.


> >         mult = (unsigned long long)div * desired_clk;
> >         if (mult != (unsigned long)mult)
> >             break;
> >
> >         two_percent = mult / 50;
> >
> >         /*
> >          * Loop requesting (freq - 2%) and possibly (freq).
> >          *
> >          * We'll keep track of the lowest freq inexact match we found
> >          * but always try to find a perfect match. NOTE: this algorithm
> >          * could miss a slightly better freq if there's more than one
> >          * freq between (freq - 2%) and (freq) but (freq) can't be made
> >          * exactly, but that's OK.
> >          *
> >          * This absolutely relies on the fact that the Qualcomm clock
> >          * driver always rounds up.
> >          */
> >         test_freq = mult - two_percent;
> >         while (test_freq <= mult) {
> >             freq = clk_round_rate(clk, test_freq);
> >
> >             /*
> >              * A dead-on freq is an insta-win. This implicitly
> >              * handles when "freq == mult"
> >              */
> >             if (!(freq % desired_clk)) {
> >                 *clk_div = freq / desired_clk;
> >                 return freq;
> >             }
> >
> >             /*
> >              * Only time clock framework doesn't round up is if
> >              * we're past the max clock rate. We're done searching
> >              * if that's the case.
> >              */
> >             if (freq < test_freq)
> >                 return ser_clk;
> >
> >             /* Save the first (lowest freq) within 2% */
> >             if (!ser_clk && freq <= mult + two_percent) {
> >                 ser_clk = freq;
> >                 *clk_div = div;
> >             }
>
> My last concern is with search happening only within 2% tolerance.
> Do we fail otherwise?
>
> This real case has best tolerance of 1.9% and seems close.
>
> [   17.963672] 20220530 desired_clk-51200000
> [   21.193550] 20220530 returning ser_clk-52174000, div-1, diff-974000
>
> Perhaps we can fallback on 1st clock rate?

I don't feel super comfortable just blindly falling back on the 1st
clock rate. It could be wildly (more than 5%) wrong, can't it?

IMO:
* If you're not comfortable with 2%, you could always pick 3% or 4%.
As I said, my random web search seemed to indicate that up to 5% was
perhaps OK.
* It's probably overkill, but you could abstract the whole search out
and try searching once for 2% and then try 4%?


-Doug
Vijaya Krishna Nivarthi (Temp) June 21, 2022, 5:58 p.m. UTC | #12
Hi,

For desired_clk = 100 and clock rates like 1st from below, DIV_ROUND_UP seems to cause missing candidate solutions.

static unsigned long clk_round_rate_test(struct clk *clk, unsigned long in_freq)
{
	//unsigned long root_freq[] = {301, 702, 1004};
	//unsigned long root_freq[] = {301, 702, 1004, 2000, 3000};
	//unsigned long root_freq[] = {50, 97, 99};
	//unsigned long root_freq[] = {50, 97, 99, 200};
	//unsigned long root_freq[] = {92, 110, 193, 230};
	//unsigned long root_freq[] = {92, 110, 193, 230, 300, 401};
	//unsigned long root_freq[] = {92, 110, 193, 230, 295, 296, 297, 401};
	//unsigned long root_freq[] = {92, 110, 193, 230, 295, 296, 297, 300, 401};
	//unsigned long root_freq[] = {197, 198, 199};
	unsigned long root_freq[] = {197, 198, 199, 200};
	int i;
	size_t n = sizeof root_freq / sizeof *root_freq;

	for (i = 0; i < n; i++) {
		if (root_freq[i] >= in_freq)
			return root_freq[i];
	}
	return root_freq[n-1];
}

I modified to handle such cases, optimised little and uploaded a patch.
It seems to work for all the cases like above.

Thank you,
Vijay/



> 
> Ah, it took me a while to understand why two loops. It's because in one case
> you're trying multiplies and in the other you're bumping up to the next
> closest clock rate. I don't think you really need to do that. Just test the (rate -
> 2%) and the rate. How about this (only lightly tested):
> 
>     ser_clk = 0;
>     maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
>     div = 1;
>     while (div < maxdiv) {
>         mult = (unsigned long long)div * desired_clk;
>         if (mult != (unsigned long)mult)
>             break;
> 
>         two_percent = mult / 50;
> 
>         /*
>          * Loop requesting (freq - 2%) and possibly (freq).
>          *
>          * We'll keep track of the lowest freq inexact match we found
>          * but always try to find a perfect match. NOTE: this algorithm
>          * could miss a slightly better freq if there's more than one
>          * freq between (freq - 2%) and (freq) but (freq) can't be made
>          * exactly, but that's OK.
>          *
>          * This absolutely relies on the fact that the Qualcomm clock
>          * driver always rounds up.
>          */
>         test_freq = mult - two_percent;
>         while (test_freq <= mult) {
>             freq = clk_round_rate(clk, test_freq);
> 
>             /*
>              * A dead-on freq is an insta-win. This implicitly
>              * handles when "freq == mult"
>              */
>             if (!(freq % desired_clk)) {
>                 *clk_div = freq / desired_clk;
>                 return freq;
>             }
> 
>             /*
>              * Only time clock framework doesn't round up is if
>              * we're past the max clock rate. We're done searching
>              * if that's the case.
>              */
>             if (freq < test_freq)
>                 return ser_clk;
> 
>             /* Save the first (lowest freq) within 2% */
>             if (!ser_clk && freq <= mult + two_percent) {
>                 ser_clk = freq;
>                 *clk_div = div;
>             }
> 
>             /*
>              * If we already rounded up past mult then this will
>              * cause the loop to exit. If not then this will run
>              * the loop a second time with exactly mult.
>              */
>             test_freq = max(freq + 1, mult);
>         }
> 
>         /*
>          * test_freq will always be bigger than mult by at least 1.
>          * That means we can get the next divider with a DIV_ROUND_UP.
>          * This has the advantage of skipping by a whole bunch of divs
>          * If the clock framework already bypassed them.
>          */
>         div = DIV_ROUND_UP(test_freq, desired_clk);
>         }
> 
>     return ser_clk;
Doug Anderson June 23, 2022, 11:11 p.m. UTC | #13
Hi,

On Tue, Jun 21, 2022 at 10:58 AM Vijaya Krishna Nivarthi (Temp)
<vnivarth@qti.qualcomm.com> wrote:
>
> Hi,
>
> For desired_clk = 100 and clock rates like 1st from below, DIV_ROUND_UP seems to cause missing candidate solutions.
>
> static unsigned long clk_round_rate_test(struct clk *clk, unsigned long in_freq)
> {
>         //unsigned long root_freq[] = {301, 702, 1004};
>         //unsigned long root_freq[] = {301, 702, 1004, 2000, 3000};
>         //unsigned long root_freq[] = {50, 97, 99};
>         //unsigned long root_freq[] = {50, 97, 99, 200};
>         //unsigned long root_freq[] = {92, 110, 193, 230};
>         //unsigned long root_freq[] = {92, 110, 193, 230, 300, 401};
>         //unsigned long root_freq[] = {92, 110, 193, 230, 295, 296, 297, 401};
>         //unsigned long root_freq[] = {92, 110, 193, 230, 295, 296, 297, 300, 401};
>         //unsigned long root_freq[] = {197, 198, 199};
>         unsigned long root_freq[] = {197, 198, 199, 200};
>         int i;
>         size_t n = sizeof root_freq / sizeof *root_freq;
>
>         for (i = 0; i < n; i++) {
>                 if (root_freq[i] >= in_freq)
>                         return root_freq[i];
>         }
>         return root_freq[n-1];
> }
>
> I modified to handle such cases, optimised little and uploaded a patch.
> It seems to work for all the cases like above.

I think it would have been simpler to just change this little section:

/* Save the first (lowest freq) within 2% */
actual_mult = DIV_ROUND_CLOSEST(freq, desired_clk) * desired_clk;
if (!ser_clk && freq <= actual_mult + two_percent) {
  ser_clk = freq;
  *clk_div = div;
}

That was the only bug, right? Then you could keep the DIV_ROUND_UP() solution?

-Doug
diff mbox series

Patch

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 4733a23..08f3ad4 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -943,11 +943,11 @@  static int qcom_geni_serial_startup(struct uart_port *uport)
 static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
 			unsigned int sampling_rate, unsigned int *clk_div)
 {
-	unsigned long ser_clk;
+	unsigned long ser_clk = 0;
 	unsigned long desired_clk;
 	unsigned long freq, prev;
 	unsigned long div, maxdiv;
-	int64_t mult;
+	unsigned long long mult;
 
 	desired_clk = baud * sampling_rate;
 	if (!desired_clk) {
@@ -959,8 +959,8 @@  static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
 	prev = 0;
 
 	for (div = 1; div <= maxdiv; div++) {
-		mult = div * desired_clk;
-		if (mult > ULONG_MAX)
+		mult = (unsigned long long)div * (unsigned long long)desired_clk;
+		if (mult > (unsigned long long)ULONG_MAX)
 			break;
 
 		freq = clk_round_rate(clk, (unsigned long)mult);