diff mbox series

[v6,14/15] scripts/simplebench: improve ascii table: add difference line

Message ID 20200918181951.21752-15-vsementsov@virtuozzo.com
State New
Series: preallocate filter

Commit Message

Vladimir Sementsov-Ogievskiy Sept. 18, 2020, 6:19 p.m. UTC
Performance improvements / degradations are usually discussed in
percentage. Let's make the script calculate it for us.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 4 deletions(-)

Comments

Max Reitz Sept. 25, 2020, 10:24 a.m. UTC | #1
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Performance improvements / degradations are usually discussed in
> percentage. Let's make the script calculate it for us.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
> index 56d3a91ea2..0ff05a38b8 100644
> --- a/scripts/simplebench/simplebench.py
> +++ b/scripts/simplebench/simplebench.py

[...]

> +            for j in range(0, i):
> +                env_j = results['envs'][j]
> +                res_j = case_results[env_j['id']]
> +
> +                if 'average' not in res_j:
> +                    # Failed result
> +                    cell += ' --'
> +                    continue
> +
> +                col_j = chr(ord('A') + j)
> +                avg_j = res_j['average']
> +                delta = (res['average'] - avg_j) / avg_j * 100

I was wondering why you’d subtract, when percentage differences usually
mean a quotient.  Then I realized that this would usually be written as:

(res['average'] / avg_j - 1) * 100
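The two spellings compute the same number; a minimal sketch (function names are illustrative, not from the patch):

```python
# Both formulations of a percentage difference are algebraically equal:
# (a - b) / b * 100  ==  (a / b - 1) * 100.  Names here are illustrative.
def percent_diff_sub(avg, avg_j):
    # Subtract first, then divide by the reference average
    return (avg - avg_j) / avg_j * 100

def percent_diff_quot(avg, avg_j):
    # Divide first, then shift the quotient by 1
    return (avg / avg_j - 1) * 100
```

So the patch's expression and the quotient form agree up to floating-point rounding; the difference is purely stylistic.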

> +                delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100

Why not use the new format_percent for both cases?

> +                cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'

I don’t know what I should think about ±delta_delta.  If I saw “Compared
to run A, this is +42.1%±2.0%”, I would think that you calculated the
difference between each run result, and then based on that array
calculated average and standard deviation.

Furthermore, I don’t even know what the delta_delta is supposed to tell
you.  It isn’t even a delta_delta, it’s an average_delta.

The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.
And that might be presented perhaps like “+42.1% Δ± +2.0%” (if delta
were the SD, “Δx̅=+42.1% Δσ=+2.0%” would also work; although, again, I do
interpret ± as the SD anyway).

Max
Vladimir Sementsov-Ogievskiy Sept. 25, 2020, 5:13 p.m. UTC | #2
25.09.2020 13:24, Max Reitz wrote:
> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>> Performance improvements / degradations are usually discussed in
>> percentage. Let's make the script calculate it for us.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
>>   1 file changed, 42 insertions(+), 4 deletions(-)
>>
>> diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
>> index 56d3a91ea2..0ff05a38b8 100644
>> --- a/scripts/simplebench/simplebench.py
>> +++ b/scripts/simplebench/simplebench.py
> 
> [...]
> 
>> +            for j in range(0, i):
>> +                env_j = results['envs'][j]
>> +                res_j = case_results[env_j['id']]
>> +
>> +                if 'average' not in res_j:
>> +                    # Failed result
>> +                    cell += ' --'
>> +                    continue
>> +
>> +                col_j = chr(ord('A') + j)
>> +                avg_j = res_j['average']
>> +                delta = (res['average'] - avg_j) / avg_j * 100
> 
> I was wondering why you’d subtract, when percentage differences usually
> mean a quotient.  Then I realized that this would usually be written as:
> 
> (res['average'] / avg_j - 1) * 100
> 
>> +                delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100
> 
> Why not use the new format_percent for both cases?

Because I want less precision here.

>> +                cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'
> 
> I don’t know what I should think about ±delta_delta.  If I saw “Compared
> to run A, this is +42.1%±2.0%”, I would think that you calculated the
> difference between each run result, and then based on that array
> calculated average and standard deviation.
> 
> Furthermore, I don’t even know what the delta_delta is supposed to tell
> you.  It isn’t even a delta_delta, it’s an average_delta.

Not average, but the sum of errors. And it shows the error for the delta.

> The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.

And this shows nothing.

Assume we have A = 10±2 and B = 15±2.

The difference is (15 - 10) ± (2 + 2) = 5 ± 4.
And your formula will give (2/2 - 1) * 100 = 0, which is wrong.
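The worked example above can be put in code; this is a standalone illustration of worst-case (sum-of-errors) propagation, with a hypothetical helper name, but the arithmetic matches the patch's delta/delta_delta:

```python
# Worst-case error propagation for a difference of two measurements:
# the averages subtract, the (maximum-deviation) errors add.
# diff_with_error is a hypothetical helper, not from the patch.
def diff_with_error(avg_a, err_a, avg_b, err_b):
    return avg_b - avg_a, err_a + err_b

# A = 10±2, B = 15±2  ->  B - A = 5±4
diff, err = diff_with_error(10, 2, 15, 2)

# Expressed relative to A's average, as the patch does:
delta = diff / 10 * 100       # +50%
delta_delta = err / 10 * 100  # ±40%
```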

Anyway, my code is a mess :)

> And that might be presented perhaps like “+42.1% Δ± +2.0%” (if delta
> were the SD, “Δx̅=+42.1% Δσ=+2.0%” would also work; although, again, I do
> interpret ± as the SD anyway).

I feel that I'm bad at statistics :( I'll learn a little and make a new version.

-- 
Best regards,
Vladimir
Max Reitz Oct. 1, 2020, 8:22 a.m. UTC | #3
On 25.09.20 19:13, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2020 13:24, Max Reitz wrote:
>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>> Performance improvements / degradations are usually discussed in
>>> percentage. Let's make the script calculate it for us.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
>>>   1 file changed, 42 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
>>> index 56d3a91ea2..0ff05a38b8 100644
>>> --- a/scripts/simplebench/simplebench.py
>>> +++ b/scripts/simplebench/simplebench.py
>>
>> [...]
>>
>>> +            for j in range(0, i):
>>> +                env_j = results['envs'][j]
>>> +                res_j = case_results[env_j['id']]
>>> +
>>> +                if 'average' not in res_j:
>>> +                    # Failed result
>>> +                    cell += ' --'
>>> +                    continue
>>> +
>>> +                col_j = chr(ord('A') + j)
>>> +                avg_j = res_j['average']
>>> +                delta = (res['average'] - avg_j) / avg_j * 100
>>
>> I was wondering why you’d subtract, when percentage differences usually
>> mean a quotient.  Then I realized that this would usually be written as:
>>
>> (res['average'] / avg_j - 1) * 100
>>
>>> +                delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100
>>
>> Why not use the new format_percent for both cases?
> 
> Because I want less precision here.
> 
>>> +                cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'
>>
>> I don’t know what I should think about ±delta_delta.  If I saw “Compared
>> to run A, this is +42.1%±2.0%”, I would think that you calculated the
>> difference between each run result, and then based on that array
>> calculated average and standard deviation.
>>
>> Furthermore, I don’t even know what the delta_delta is supposed to tell
>> you.  It isn’t even a delta_delta, it’s an average_delta.
> 
> Not average, but the sum of errors. And it shows the error for the delta.
> 
>> The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.
> 
> And this shows nothing.
> 
> Assume we have A = 10±2 and B = 15±2.
> 
> The difference is (15 - 10) ± (2 + 2) = 5 ± 4.
> And your formula will give (2/2 - 1) * 100 = 0, which is wrong.

Well, it’s the difference in delta (whatever “delta” means here).  I
wouldn’t call it wrong.

We want to compare two test runs, so if both have the same delta, then
the difference in delta is 0.  That’s how I understood it, hence my “Δ±”
notation below.  (This may be useful information, because perhaps one
may consider a big delta bad, and so if one run has less delta than
another one, that may be considered a better outcome.  Comparing
deltas has a purpose.)

I see I understood your intentions wrong, though; you want to just give
an error estimate for the difference of the means of both runs.  I have
to admit I don’t know how that works exactly, and it will probably
heavily depend on what “delta” is.

(Googling suggests that for the standard deviation, one would square
each SD to get the variance back, then divide by the respective sample
size, add, and take the square root.  But that’s for when you have two
distributions that you want to combine, but we want to compare here...
http://homework.uoregon.edu/pub/class/es202/ztest.html seems to suggest
the same for such a comparison, though.  I don’t know.)
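If each run's delta were a standard deviation with a known sample size n, the combination described above might be sketched as follows (hypothetical helper, not in the patch):

```python
import math

# Combined standard error of the difference of two means:
# square each SD back to a variance, divide by its sample size,
# add, and take the square root (as in a two-sample z-test).
def combined_se(sd_a, n_a, sd_b, n_b):
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
```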

(As for your current version, after more thinking it does seem right
when delta is the maximum deviation.  Or perhaps the deltas shouldn’t be
added then but the maximum should be used?  I’m just not sure.)

((Perhaps it doesn’t even matter.  “Don’t believe any statistics you
haven’t forged yourself”, and so on.))

Max

Patch

diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
index 56d3a91ea2..0ff05a38b8 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py
@@ -153,14 +153,22 @@  def bench(test_func, test_envs, test_cases, *args, **vargs):
 
 def ascii(results):
     """Return ASCII representation of bench() returned dict."""
-    from tabulate import tabulate
+    import tabulate
+
+    # We want leading whitespace for difference row cells (see below)
+    tabulate.PRESERVE_WHITESPACE = True
 
     dim = None
-    tab = [[""] + [c['id'] for c in results['envs']]]
+    tab = [
+        # Environment columns are named A, B, ...
+        [""] + [chr(ord('A') + i) for i in range(len(results['envs']))],
+        [""] + [c['id'] for c in results['envs']]
+    ]
     for case in results['cases']:
         row = [case['id']]
+        case_results = results['tab'][case['id']]
         for env in results['envs']:
-            res = results['tab'][case['id']][env['id']]
+            res = case_results[env['id']]
             if dim is None:
                 dim = res['dimension']
             else:
@@ -168,4 +176,34 @@  def ascii(results):
             row.append(ascii_one(res))
         tab.append(row)
 
-    return f'All results are in {dim}\n\n' + tabulate(tab)
+        # Add row of difference between column. For each column starting from
+        # B we calculate difference with all previous columns.
+        row = ['', '']  # case name and first column
+        for i in range(1, len(results['envs'])):
+            cell = ''
+            env = results['envs'][i]
+            res = case_results[env['id']]
+
+            if 'average' not in res:
+                # Failed result
+                row.append(cell)
+                continue
+
+            for j in range(0, i):
+                env_j = results['envs'][j]
+                res_j = case_results[env_j['id']]
+
+                if 'average' not in res_j:
+                    # Failed result
+                    cell += ' --'
+                    continue
+
+                col_j = chr(ord('A') + j)
+                avg_j = res_j['average']
+                delta = (res['average'] - avg_j) / avg_j * 100
+                delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100
+                cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'
+            row.append(cell)
+        tab.append(row)
+
+    return f'All results are in {dim}\n\n' + tabulate.tabulate(tab)
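As a standalone illustration, the difference-cell formatting this hunk adds can be reduced to the following sketch (the helper is an extraction for illustration, not part of the patch):

```python
# Format one comparison cell the way the patch's inner loop does:
# percentage difference of the averages, ± the summed errors as a
# percentage of the reference average.  Illustrative extraction only.
def diff_cell(avg, delta, avg_j, delta_j, col_j):
    pct = (avg - avg_j) / avg_j * 100
    pct_err = (delta + delta_j) / avg_j * 100
    return f' {col_j}{round(pct):+}±{round(pct_err)}%'
```

For B = 15±2 compared against column A = 10±2 this yields `' A+50±40%'`, matching the worked example from the thread.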