diff mbox series

[v1] selftests/mm: Log run_vmtests.sh results in TAP format

Message ID 20231214162434.3580009-1-ryan.roberts@arm.com
State Accepted
Commit a3c5cc5129ef55ac6c69f468e5ee6e4b0cd8179c
Headers show
Series [v1] selftests/mm: Log run_vmtests.sh results in TAP format | expand

Commit Message

Ryan Roberts Dec. 14, 2023, 4:24 p.m. UTC
When running tests on a CI system (e.g. LAVA) it is useful to output
test results in TAP format so that the CI can parse the fine-grained
results to show regressions. Many of the mm selftest binaries already
output using the TAP format. And the kselftests runner
(run_kselftest.sh) also uses the format. CI systems such as LAVA can
already handle nested TAP reports. However, with the mm selftests we
have 3 levels of nesting (run_kselftest.sh -> run_vmtests.sh ->
individual test binaries) and the middle level did not previously
support TAP, which breaks the parser.

Let's fix that by teaching run_vmtests.sh to output using the TAP
format. Ideally this would be opt-in via a command line argument to
avoid the possibility of breaking anyone's existing scripts that might
scrape the output. However, it is not possible to pass arguments to
tests invoked via run_kselftest.sh. So I've implemented an opt-out
option (-n), which will revert to the existing output format.

Future changes to this file should be aware of 2 new conventions:

 - output that is part of the TAP reporting is piped through tap_output
 - general output is piped through tap_prefix

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 tools/testing/selftests/mm/run_vmtests.sh | 51 +++++++++++++++++------
 1 file changed, 39 insertions(+), 12 deletions(-)

--
2.25.1

Comments

John Hubbard Dec. 16, 2023, 2:25 a.m. UTC | #1
On 12/15/23 06:28, Ryan Roberts wrote:
...
> I've kept all the existing "pretty" output and results summary as is, it just
> gets a hash in front of it when TAP is enabled.
> 
> so this:
> 
> -----------------------
> running ./hugepage-mmap
> -----------------------
> Returned address is 0xffff89e00000
> First hex is 0
> First hex is 3020100
> [PASS]
> SUMMARY: PASS=1 SKIP=0 FAIL=0
> 
> becomes this:
> 
> TAP version 13
> # -----------------------
> # running ./hugepage-mmap
> # -----------------------
> # Returned address is 0xffff89e00000
> # First hex is 0
> # First hex is 3020100
> # [PASS]
> ok 1 hugepage-mmap
> # SUMMARY: PASS=1 SKIP=0 FAIL=0
> 1..1
> 
> If you think the latter is ofensive, then I can do the wrapping as you suggest.

I applied this and ran the tests, all while carefully reminding myself
to "think like a human". :) And from that perspective, to me, the output
is effectively the same: the leading '#' characters do not really change
anything, from a readability point of view.

So IMHO you're on perfectly solid ground, if you just switch over
directly to this format.

Tested-by: John Hubbard <jhubbard@nvidia.com>


thanks,
John Hubbard Dec. 16, 2023, 2:40 a.m. UTC | #2
On 12/15/23 18:25, John Hubbard wrote:
> On 12/15/23 06:28, Ryan Roberts wrote:
> ...
>> I've kept all the existing "pretty" output and results summary as is, it just
>> gets a hash in front of it when TAP is enabled.
>>
>> so this:
>>
>> -----------------------
>> running ./hugepage-mmap
>> -----------------------
>> Returned address is 0xffff89e00000
>> First hex is 0
>> First hex is 3020100
>> [PASS]
>> SUMMARY: PASS=1 SKIP=0 FAIL=0
>>
>> becomes this:
>>
>> TAP version 13
>> # -----------------------
>> # running ./hugepage-mmap
>> # -----------------------
>> # Returned address is 0xffff89e00000
>> # First hex is 0
>> # First hex is 3020100
>> # [PASS]
>> ok 1 hugepage-mmap
>> # SUMMARY: PASS=1 SKIP=0 FAIL=0
>> 1..1
>>
>> If you think the latter is ofensive, then I can do the wrapping as you suggest.
> 
> I applied this and ran the tests, all while carefully reminding myself
> to "think like a human". :) And from that perspective, to me, the output
> is effectively the same: the leading '#' characters do not really change
> anything, from a readability point of view.
> 
> So IMHO you're on perfectly solid ground, if you just switch over
> directly to this format.
> 
> Tested-by: John Hubbard <jhubbard@nvidia.com>
> 

I should also point out that some of the subtests already attempt a TAP
output. So now we end up with TAP-within-TAP output for those programs.

For example:
     # -----------------------
     # running ./madv_populate
     # -----------------------
     # TAP version 13
     # 1..21
     # # [RUN] test_prot_read
     # ok 1 MADV_POPULATE_READ with PROT_READ
     # ok 2 MADV_POPULATE_WRITE with PROT_READ
     # # [RUN] test_prot_write
     # ok 3 MADV_POPULATE_READ with PROT_WRITE
     ...etc...

Note the double level of leading '#' characters.

Again, this is still readable enough for humans. But it should probably
be removed in subsequent patches to the subtests.


thanks,
Ryan Roberts Dec. 18, 2023, 11:32 a.m. UTC | #3
On 16/12/2023 02:40, John Hubbard wrote:
> On 12/15/23 18:25, John Hubbard wrote:
>> On 12/15/23 06:28, Ryan Roberts wrote:
>> ...
>>> I've kept all the existing "pretty" output and results summary as is, it just
>>> gets a hash in front of it when TAP is enabled.
>>>
>>> so this:
>>>
>>> -----------------------
>>> running ./hugepage-mmap
>>> -----------------------
>>> Returned address is 0xffff89e00000
>>> First hex is 0
>>> First hex is 3020100
>>> [PASS]
>>> SUMMARY: PASS=1 SKIP=0 FAIL=0
>>>
>>> becomes this:
>>>
>>> TAP version 13
>>> # -----------------------
>>> # running ./hugepage-mmap
>>> # -----------------------
>>> # Returned address is 0xffff89e00000
>>> # First hex is 0
>>> # First hex is 3020100
>>> # [PASS]
>>> ok 1 hugepage-mmap
>>> # SUMMARY: PASS=1 SKIP=0 FAIL=0
>>> 1..1
>>>
>>> If you think the latter is ofensive, then I can do the wrapping as you suggest.
>>
>> I applied this and ran the tests, all while carefully reminding myself
>> to "think like a human". :) And from that perspective, to me, the output
>> is effectively the same: the leading '#' characters do not really change
>> anything, from a readability point of view.
>>
>> So IMHO you're on perfectly solid ground, if you just switch over
>> directly to this format.

Great thanks for taking a look!

>>
>> Tested-by: John Hubbard <jhubbard@nvidia.com>
>>
> 
> I should also point out that some of the subtests already attempt a TAP
> output. So now we end up with TAP-within-TAP output for those programs.

It's actually TAP-in-TAP-in-TAP if you're running from run_kselftest.sh :)

> 
> For example:
>     # -----------------------
>     # running ./madv_populate
>     # -----------------------
>     # TAP version 13
>     # 1..21
>     # # [RUN] test_prot_read
>     # ok 1 MADV_POPULATE_READ with PROT_READ
>     # ok 2 MADV_POPULATE_WRITE with PROT_READ
>     # # [RUN] test_prot_write
>     # ok 3 MADV_POPULATE_READ with PROT_WRITE
>     ...etc...
> 
> Note the double level of leading '#' characters.
> 
> Again, this is still readable enough for humans. But it should probably
> be removed in subsequent patches to the subtests.

I personally don't agree with this. It would be difficult to flatten to a single
TAP instance because the top level doesn't have a clue how many test cases the
child is running. Trying to do this will make things more fragile and less
modular. LAVA can certainly deal with nested test cases and correctly parses
everything to test case names that contain the test name at each level of
nesting. The thing I was trying to solve with this patch was that previously the
top level (run_kselftest.sh) and the bottom level (individual mm test binaries)
were using TAP, but the middle level (run_vmtests.sh) wasn't, and this was
confusing the LAVA parser.

> 
> 
> thanks,
John Hubbard Dec. 19, 2023, 12:51 a.m. UTC | #4
On 12/18/23 03:32, Ryan Roberts wrote:
...
>> I should also point out that some of the subtests already attempt a TAP
>> output. So now we end up with TAP-within-TAP output for those programs.
> 
> It's actually TAP-in-TAP-in-TAP if you're running from run_kselftest.sh :)
> 
>>
>> For example:
>>      # -----------------------
>>      # running ./madv_populate
>>      # -----------------------
>>      # TAP version 13
>>      # 1..21
>>      # # [RUN] test_prot_read
>>      # ok 1 MADV_POPULATE_READ with PROT_READ
>>      # ok 2 MADV_POPULATE_WRITE with PROT_READ
>>      # # [RUN] test_prot_write
>>      # ok 3 MADV_POPULATE_READ with PROT_WRITE
>>      ...etc...
>>
>> Note the double level of leading '#' characters.
>>
>> Again, this is still readable enough for humans. But it should probably
>> be removed in subsequent patches to the subtests.
> 
> I personally don't agree with this. It would be difficult to flatten to a single
> TAP instance because the top level doesn't have a clue how many test cases the

That's not quite what I had in mind...

> child is running. Trying to do this will make things more fragile and less
> modular. LAVA can certainly deal with nested test cases and correctly parses
> everything to test case names that contain the test name at each level of
> nesting. The thing I was trying to solve with this patch was that previously the
> top level (run_kselftest.sh) and the bottom level (individual mm test binaries)
> were using TAP, but the middle level (run_vmtests.sh) wasn't, and this was
> confusing the LAVA parser.
> 

I was thinking more along these lines:

a) For the individual programs (binaries), there is actually neither need nor
desire to create TAP output at that level, because frameworks like LAVA only
care about running a lot of tests and parsing the output.

b) Therefore, just stop specifying TAP output at the leaf level, and let
run_vmtests.sh and run_kselftest.sh do it.

Looking at madv_populate.c, I see that it scatters calls to ksft_*() around.
And I was thinking that this is all just redundant, isn't it?


thanks,
John Hubbard Dec. 19, 2023, 12:55 a.m. UTC | #5
On 12/18/23 16:51, John Hubbard wrote:
> On 12/18/23 03:32, Ryan Roberts wrote:
> ...
>>> I should also point out that some of the subtests already attempt a TAP
>>> output. So now we end up with TAP-within-TAP output for those programs.
>>
>> It's actually TAP-in-TAP-in-TAP if you're running from run_kselftest.sh :)
>>
>>>
>>> For example:
>>>      # -----------------------
>>>      # running ./madv_populate
>>>      # -----------------------
>>>      # TAP version 13
>>>      # 1..21
>>>      # # [RUN] test_prot_read
>>>      # ok 1 MADV_POPULATE_READ with PROT_READ
>>>      # ok 2 MADV_POPULATE_WRITE with PROT_READ
>>>      # # [RUN] test_prot_write
>>>      # ok 3 MADV_POPULATE_READ with PROT_WRITE
>>>      ...etc...
>>>
>>> Note the double level of leading '#' characters.
>>>
>>> Again, this is still readable enough for humans. But it should probably
>>> be removed in subsequent patches to the subtests.
>>
>> I personally don't agree with this. It would be difficult to flatten to a single
>> TAP instance because the top level doesn't have a clue how many test cases the
> 
> That's not quite what I had in mind...
> 
>> child is running. Trying to do this will make things more fragile and less
>> modular. LAVA can certainly deal with nested test cases and correctly parses
>> everything to test case names that contain the test name at each level of
>> nesting. The thing I was trying to solve with this patch was that previously the
>> top level (run_kselftest.sh) and the bottom level (individual mm test binaries)
>> were using TAP, but the middle level (run_vmtests.sh) wasn't, and this was
>> confusing the LAVA parser.
>>
> 
> I was thinking more along these lines:
> 
> a) For the individual programs (binaries), there is actually neither need nor
> desire to create TAP output at that level, because frameworks like LAVA only
> care about running a lot of tests and parsing the output.
> 
> b) Therefore, just stop specifying TAP output at the leaf level, and let
> run_vmtests.sh and run_kselftest.sh do it.
> 
> Looking at madv_populate.c, I see that it scatters calls to ksft_*() around.
> And I was thinking that this is all just redundant, isn't it?
> 

Although I suppose that the counter argument is that the subtests in
madv_populate.c really *do* want to be specifically printed in TAP
format.

arggh, I guess this is just not worth fooling around with after all.
  

thanks,
Ryan Roberts Dec. 19, 2023, 8:33 a.m. UTC | #6
On 19/12/2023 00:55, John Hubbard wrote:
> On 12/18/23 16:51, John Hubbard wrote:
>> On 12/18/23 03:32, Ryan Roberts wrote:
>> ...
>>>> I should also point out that some of the subtests already attempt a TAP
>>>> output. So now we end up with TAP-within-TAP output for those programs.
>>>
>>> It's actually TAP-in-TAP-in-TAP if you're running from run_kselftest.sh :)
>>>
>>>>
>>>> For example:
>>>>      # -----------------------
>>>>      # running ./madv_populate
>>>>      # -----------------------
>>>>      # TAP version 13
>>>>      # 1..21
>>>>      # # [RUN] test_prot_read
>>>>      # ok 1 MADV_POPULATE_READ with PROT_READ
>>>>      # ok 2 MADV_POPULATE_WRITE with PROT_READ
>>>>      # # [RUN] test_prot_write
>>>>      # ok 3 MADV_POPULATE_READ with PROT_WRITE
>>>>      ...etc...
>>>>
>>>> Note the double level of leading '#' characters.
>>>>
>>>> Again, this is still readable enough for humans. But it should probably
>>>> be removed in subsequent patches to the subtests.
>>>
>>> I personally don't agree with this. It would be difficult to flatten to a single
>>> TAP instance because the top level doesn't have a clue how many test cases the
>>
>> That's not quite what I had in mind...
>>
>>> child is running. Trying to do this will make things more fragile and less
>>> modular. LAVA can certainly deal with nested test cases and correctly parses
>>> everything to test case names that contain the test name at each level of
>>> nesting. The thing I was trying to solve with this patch was that previously the
>>> top level (run_kselftest.sh) and the bottom level (individual mm test binaries)
>>> were using TAP, but the middle level (run_vmtests.sh) wasn't, and this was
>>> confusing the LAVA parser.
>>>
>>
>> I was thinking more along these lines:
>>
>> a) For the individual programs (binaries), there is actually neither need nor
>> desire to create TAP output at that level, because frameworks like LAVA only
>> care about running a lot of tests and parsing the output.
>>
>> b) Therefore, just stop specifying TAP output at the leaf level, and let
>> run_vmtests.sh and run_kselftest.sh do it.
>>
>> Looking at madv_populate.c, I see that it scatters calls to ksft_*() around.
>> And I was thinking that this is all just redundant, isn't it?
>>
> 
> Although I suppose that the counter argument is that the subtests in
> madv_populate.c really *do* want to be specifically printed in TAP
> format.
> 
> arggh, I guess this is just not worth fooling around with after all.

Yes; I wouldn't want to lose the fine granularity we have currently. For example
cow.c has ~900 test cases now that I've multiplied everything up for mTHP. 16 of
those are known to fail (hugetlb issue) and 1 is skipped. I wouldn't want to
reduce that down to a single cow test case that always fails; that's not helpful
to understand if I've regressed something.

But sounds like we are both on the same page now.


>  
> 
> thanks,
diff mbox series

Patch

diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 87f513f5cf91..246d53a5d7f2 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -5,6 +5,7 @@ 
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4

+count_total=0
 count_pass=0
 count_fail=0
 count_skip=0
@@ -17,6 +18,7 @@  usage: ${BASH_SOURCE[0]:-$0} [ options ]
   -a: run all tests, including extra ones
   -t: specify specific categories to tests to run
   -h: display this message
+  -n: disable TAP output

 The default behavior is to run required tests only.  If -a is specified,
 will run all tests.
@@ -77,12 +79,14 @@  EOF
 }

 RUN_ALL=false
+TAP_PREFIX="# "

-while getopts "aht:" OPT; do
+while getopts "aht:n" OPT; do
 	case ${OPT} in
 		"a") RUN_ALL=true ;;
 		"h") usage ;;
 		"t") VM_SELFTEST_ITEMS=${OPTARG} ;;
+		"n") TAP_PREFIX= ;;
 	esac
 done
 shift $((OPTIND -1))
@@ -184,30 +188,52 @@  fi
 VADDR64=0
 echo "$ARCH64STR" | grep "$ARCH" &>/dev/null && VADDR64=1

+tap_prefix() {
+	sed -e "s/^/${TAP_PREFIX}/"
+}
+
+tap_output() {
+	if [[ ! -z "$TAP_PREFIX" ]]; then
+		read str
+		echo $str
+	fi
+}
+
+pretty_name() {
+	echo "$*" | sed -e 's/^\(bash \)\?\.\///'
+}
+
 # Usage: run_test [test binary] [arbitrary test arguments...]
 run_test() {
 	if test_selected ${CATEGORY}; then
+		local test=$(pretty_name "$*")
 		local title="running $*"
 		local sep=$(echo -n "$title" | tr "[:graph:][:space:]" -)
-		printf "%s\n%s\n%s\n" "$sep" "$title" "$sep"
+		printf "%s\n%s\n%s\n" "$sep" "$title" "$sep" | tap_prefix

-		"$@"
-		local ret=$?
+		("$@" 2>&1) | tap_prefix
+		local ret=${PIPESTATUS[0]}
+		count_total=$(( count_total + 1 ))
 		if [ $ret -eq 0 ]; then
 			count_pass=$(( count_pass + 1 ))
-			echo "[PASS]"
+			echo "[PASS]" | tap_prefix
+			echo "ok ${count_total} ${test}" | tap_output
 		elif [ $ret -eq $ksft_skip ]; then
 			count_skip=$(( count_skip + 1 ))
-			echo "[SKIP]"
+			echo "[SKIP]" | tap_prefix
+			echo "ok ${count_total} ${test} # SKIP" | tap_output
 			exitcode=$ksft_skip
 		else
 			count_fail=$(( count_fail + 1 ))
-			echo "[FAIL]"
+			echo "[FAIL]" | tap_prefix
+			echo "not ok ${count_total} ${test} # exit=$ret" | tap_output
 			exitcode=1
 		fi
 	fi # test_selected
 }

+echo "TAP version 13" | tap_output
+
 CATEGORY="hugetlb" run_test ./hugepage-mmap

 shmmax=$(cat /proc/sys/kernel/shmmax)
@@ -231,9 +257,9 @@  CATEGORY="hugetlb" run_test ./hugetlb_fault_after_madv
 echo "$nr_hugepages_tmp" > /proc/sys/vm/nr_hugepages

 if test_selected "hugetlb"; then
-	echo "NOTE: These hugetlb tests provide minimal coverage.  Use"
-	echo "      https://github.com/libhugetlbfs/libhugetlbfs.git for"
-	echo "      hugetlb regression testing."
+	echo "NOTE: These hugetlb tests provide minimal coverage.  Use"	  | tap_prefix
+	echo "      https://github.com/libhugetlbfs/libhugetlbfs.git for" | tap_prefix
+	echo "      hugetlb regression testing."			  | tap_prefix
 fi

 CATEGORY="mmap" run_test ./map_fixed_noreplace
@@ -312,7 +338,7 @@  CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
 # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
 CATEGORY="madv_populate" run_test ./madv_populate

-echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
+(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
 CATEGORY="memfd_secret" run_test ./memfd_secret

 # KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
@@ -369,6 +395,7 @@  CATEGORY="mkdirty" run_test ./mkdirty

 CATEGORY="mdwe" run_test ./mdwe_test

-echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}"
+echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}" | tap_prefix
+echo "1..${count_total}" | tap_output

 exit $exitcode