hmm_test issues with latest mainline

Message ID 26017fe3-5ad7-6946-57db-e5ec48063ceb@suse.cz
State New

Commit Message

Vlastimil Babka Oct. 13, 2022, 4:54 p.m. UTC
Hi,

I've been trying the hmm_tests as of today's commit:

a185a0995518 ("Merge tag 'linux-kselftest-kunit-6.1-rc1-2' ...)

and ran into several issues that seemed worth reporting.

First, it seems the FIXTURE_TEARDOWN(hmm) in
tools/testing/selftests/vm/hmm-tests.c
using ASSERT_EQ(ret, 0); can run into an infinite loop of reporting the
assertion failure. Dunno if it's a kselftests issue or it's a bug to
use asserts in teardown. I hacked it up like this locally to proceed:


Next, there are some tests that fail (and thus also trigger the issue above)

#  RUN           hmm.hmm_device_private.exclusive ...
# hmm-tests.c:1702:exclusive:Expected ret (-16) == 0 (0)
close returned (-1) fd is (3)
# exclusive: Test failed at step #1
#          FAIL  hmm.hmm_device_private.exclusive
not ok 20 hmm.hmm_device_private.exclusive
#  RUN           hmm.hmm_device_private.exclusive_mprotect ...
# hmm-tests.c:1756:exclusive_mprotect:Expected ret (-16) == 0 (0)
close returned (-1) fd is (3)
# exclusive_mprotect: Test failed at step #1
#          FAIL  hmm.hmm_device_private.exclusive_mprotect
not ok 21 hmm.hmm_device_private.exclusive_mprotect
#  RUN           hmm.hmm_device_private.exclusive_cow ...
# hmm-tests.c:1809:exclusive_cow:Expected ret (-16) == 0 (0)
close returned (-1) fd is (3)
# exclusive_cow: Test failed at step #1
#          FAIL  hmm.hmm_device_private.exclusive_cow
not ok 22 hmm.hmm_device_private.exclusive_cow

I'll try to check more closely but maybe if you can reproduce it too, you'll
have more idea what's going on.
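
For reference, the `ret (-16)` in the failures above looks like a negated errno value, and on Linux errno 16 is EBUSY. A minimal sketch for decoding such return values, assuming the test helpers return `-errno` on failure (the function name `describe_ret` is made up for illustration):

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: turn a kernel-style return value (0 on success,
 * -errno on failure) into a human-readable message.
 */
static const char *describe_ret(int ret)
{
	return ret < 0 ? strerror(-ret) : "success";
}
```

Passing -16 to this helper yields the system's message for EBUSY, which suggests the exclusive-access requests are being refused because the pages are busy.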

The next thing is more of a question/documentation suggestion. Tons of tests
fail like this:

ok 24 hmm.hmm_device_private.hmm_cow_in_device
#  RUN           hmm.hmm_device_coherent.open_close ...
could not open hmm dmirror driver (/dev/hmm_dmirror2)
#      SKIP      DEVICE_COHERENT not available
#            OK  hmm.hmm_device_coherent.open_close

I assume this is because I ran "test_hmm.sh smoke" without the SPM parameters.
The help message doesn't say much about what to specify there for
<spm_addr_dev0> <spm_addr_dev1>. Do these tests need particular hardware
(unlike the rest)? Maybe it could be clarified.

Last thing: I noticed that all these DEVICE_COHERENT tests ultimately count as
OK rather than SKIPPED, which would probably be more appropriate.

# FAILED: 51 / 54 tests passed.
# Totals: pass:50 fail:3 xfail:0 xpass:0 skip:1 error:0

(the skip:1 is due to test 9 "#      SKIP      Huge page could not be allocated"
which is probably a misconfiguration on my part so I don't report that as an issue)

Thanks,
Vlastimil

Comments

David Hildenbrand Oct. 13, 2022, 5:01 p.m. UTC | #1
On 13.10.22 18:54, Vlastimil Babka wrote:
> Hi,
> 
> I've been trying the hmm_tests as of today's commit:
> 
> a185a0995518 ("Merge tag 'linux-kselftest-kunit-6.1-rc1-2' ...)
> 
> and run into several issues that seemed worth reporting.
> 
> First, it seems the FIXTURE_TEARDOWN(hmm) in
> tools/testing/selftests/vm/hmm-tests.c
> using ASSERT_EQ(ret, 0); can run into an infinite loop of reporting the
> assertion failure. Dunno if it's a kselftests issue or it's a bug to
> use asserts in teardown. I hacked it up like this locally to proceed:
> 
> --- a/tools/testing/selftests/vm/hmm-tests.c
> +++ b/tools/testing/selftests/vm/hmm-tests.c
> @@ -154,6 +154,11 @@ FIXTURE_TEARDOWN(hmm)
>   {
>   	int ret = close(self->fd);
>   
> +	if (ret != 0) {
> +		fprintf(stderr, "close returned (%d) fd is (%d)\n", ret,self->fd);
> +		exit(1);
> +	}
> +
>   	ASSERT_EQ(ret, 0);
>   	self->fd = -1;
>   }
> 
> Next, there are some tests that fail (and thus also trigger the issue above)
> 
> #  RUN           hmm.hmm_device_private.exclusive ...
> # hmm-tests.c:1702:exclusive:Expected ret (-16) == 0 (0)
> close returned (-1) fd is (3)
> # exclusive: Test failed at step #1
> #          FAIL  hmm.hmm_device_private.exclusive
> not ok 20 hmm.hmm_device_private.exclusive
> #  RUN           hmm.hmm_device_private.exclusive_mprotect ...
> # hmm-tests.c:1756:exclusive_mprotect:Expected ret (-16) == 0 (0)
> close returned (-1) fd is (3)
> # exclusive_mprotect: Test failed at step #1
> #          FAIL  hmm.hmm_device_private.exclusive_mprotect
> not ok 21 hmm.hmm_device_private.exclusive_mprotect
> #  RUN           hmm.hmm_device_private.exclusive_cow ...
> # hmm-tests.c:1809:exclusive_cow:Expected ret (-16) == 0 (0)
> close returned (-1) fd is (3)
> # exclusive_cow: Test failed at step #1
> #          FAIL  hmm.hmm_device_private.exclusive_cow
> not ok 22 hmm.hmm_device_private.exclusive_cow
> 
>

When did that test start failing? Was it still ok for 6.0?
Shuah Khan Oct. 13, 2022, 5:10 p.m. UTC | #2
On 10/13/22 11:01, David Hildenbrand wrote:
> On 13.10.22 18:54, Vlastimil Babka wrote:
>> Hi,
>>
>> I've been trying the hmm_tests as of today's commit:
>>
>> a185a0995518 ("Merge tag 'linux-kselftest-kunit-6.1-rc1-2' ...)
>>
>> and run into several issues that seemed worth reporting.
>>
>> First, it seems the FIXTURE_TEARDOWN(hmm) in
>> tools/testing/selftests/vm/hmm-tests.c
>> using ASSERT_EQ(ret, 0); can run into an infinite loop of reporting the
>> assertion failure. Dunno if it's a kselftests issue or it's a bug to
>> use asserts in teardown. I hacked it up like this locally to proceed:
>>
>> --- a/tools/testing/selftests/vm/hmm-tests.c
>> +++ b/tools/testing/selftests/vm/hmm-tests.c
>> @@ -154,6 +154,11 @@ FIXTURE_TEARDOWN(hmm)
>>   {
>>       int ret = close(self->fd);
>> +    if (ret != 0) {
>> +        fprintf(stderr, "close returned (%d) fd is (%d)\n", ret,self->fd);
>> +        exit(1);
>> +    }
>> +
>>       ASSERT_EQ(ret, 0);
>>       self->fd = -1;
>>   }
>>
>> Next, there are some tests that fail (and thus also trigger the issue above)
>>
>> #  RUN           hmm.hmm_device_private.exclusive ...
>> # hmm-tests.c:1702:exclusive:Expected ret (-16) == 0 (0)
>> close returned (-1) fd is (3)
>> # exclusive: Test failed at step #1
>> #          FAIL  hmm.hmm_device_private.exclusive
>> not ok 20 hmm.hmm_device_private.exclusive
>> #  RUN           hmm.hmm_device_private.exclusive_mprotect ...
>> # hmm-tests.c:1756:exclusive_mprotect:Expected ret (-16) == 0 (0)
>> close returned (-1) fd is (3)
>> # exclusive_mprotect: Test failed at step #1
>> #          FAIL  hmm.hmm_device_private.exclusive_mprotect
>> not ok 21 hmm.hmm_device_private.exclusive_mprotect
>> #  RUN           hmm.hmm_device_private.exclusive_cow ...
>> # hmm-tests.c:1809:exclusive_cow:Expected ret (-16) == 0 (0)
>> close returned (-1) fd is (3)
>> # exclusive_cow: Test failed at step #1
>> #          FAIL  hmm.hmm_device_private.exclusive_cow
>> not ok 22 hmm.hmm_device_private.exclusive_cow
>>
>>
> 
> When did that test start failing? Was it still ok for 6.0?
> 

commit 4fe89d07dcc2804c8b562f6c7896a45643d34b2f (tag: v6.0, linux/master)

# FAILED: 25 / 50 tests passed.
# Totals: pass:25 fail:25 xfail:0 xpass:0 skip:0 error:0

Looks good to me.

Possibly a change in 6.1, and we have to take the time to fix them all. :)

thanks,
-- Shuah
Vlastimil Babka Oct. 13, 2022, 5:12 p.m. UTC | #3
On 10/13/22 19:10, Shuah Khan wrote:
> On 10/13/22 11:01, David Hildenbrand wrote:
>> On 13.10.22 18:54, Vlastimil Babka wrote:
>>> Hi,
>>>
>>> I've been trying the hmm_tests as of today's commit:
>>>
>>> a185a0995518 ("Merge tag 'linux-kselftest-kunit-6.1-rc1-2' ...)
>>>
>>> and run into several issues that seemed worth reporting.
>>>
>>> First, it seems the FIXTURE_TEARDOWN(hmm) in
>>> tools/testing/selftests/vm/hmm-tests.c
>>> using ASSERT_EQ(ret, 0); can run into an infinite loop of reporting the
>>> assertion failure. Dunno if it's a kselftests issue or it's a bug to
>>> use asserts in teardown. I hacked it up like this locally to proceed:
>>>
>>> --- a/tools/testing/selftests/vm/hmm-tests.c
>>> +++ b/tools/testing/selftests/vm/hmm-tests.c
>>> @@ -154,6 +154,11 @@ FIXTURE_TEARDOWN(hmm)
>>>   {
>>>       int ret = close(self->fd);
>>> +    if (ret != 0) {
>>> +        fprintf(stderr, "close returned (%d) fd is (%d)\n", ret,self->fd);
>>> +        exit(1);
>>> +    }
>>> +
>>>       ASSERT_EQ(ret, 0);
>>>       self->fd = -1;
>>>   }
>>>
>>> Next, there are some tests that fail (and thus also trigger the issue above)
>>>
>>> #  RUN           hmm.hmm_device_private.exclusive ...
>>> # hmm-tests.c:1702:exclusive:Expected ret (-16) == 0 (0)
>>> close returned (-1) fd is (3)
>>> # exclusive: Test failed at step #1
>>> #          FAIL  hmm.hmm_device_private.exclusive
>>> not ok 20 hmm.hmm_device_private.exclusive
>>> #  RUN           hmm.hmm_device_private.exclusive_mprotect ...
>>> # hmm-tests.c:1756:exclusive_mprotect:Expected ret (-16) == 0 (0)
>>> close returned (-1) fd is (3)
>>> # exclusive_mprotect: Test failed at step #1
>>> #          FAIL  hmm.hmm_device_private.exclusive_mprotect
>>> not ok 21 hmm.hmm_device_private.exclusive_mprotect
>>> #  RUN           hmm.hmm_device_private.exclusive_cow ...
>>> # hmm-tests.c:1809:exclusive_cow:Expected ret (-16) == 0 (0)
>>> close returned (-1) fd is (3)
>>> # exclusive_cow: Test failed at step #1
>>> #          FAIL  hmm.hmm_device_private.exclusive_cow
>>> not ok 22 hmm.hmm_device_private.exclusive_cow
>>>
>>>
>>
>> When did that test start failing? Was it still ok for 6.0?

Didn't test yet, will try, in case it's my system/config specific thing.

>>
> 
> commit 4fe89d07dcc2804c8b562f6c7896a45643d34b2f (tag: v6.0, linux/master)
> 
> # FAILED: 25 / 50 tests passed.
> # Totals: pass:25 fail:25 xfail:0 xpass:0 skip:0 error:0
> 
> Looks good to me.

Hmm but there's 25 that failed? Or are those also misreported SKIPs?

> Possible change in 6.1 and we have to time fix them all. :)
> 
> thanks,
> -- Shuah
Vlastimil Babka Oct. 13, 2022, 5:45 p.m. UTC | #4
On 10/13/22 19:12, Vlastimil Babka wrote:
> On 10/13/22 19:10, Shuah Khan wrote:
>> On 10/13/22 11:01, David Hildenbrand wrote:
>>> On 13.10.22 18:54, Vlastimil Babka wrote:
>>>> Hi,
>>>>
>>>> I've been trying the hmm_tests as of today's commit:
>>>>
>>>> a185a0995518 ("Merge tag 'linux-kselftest-kunit-6.1-rc1-2' ...)
>>>>
>>>> and run into several issues that seemed worth reporting.
>>>>
>>>> First, it seems the FIXTURE_TEARDOWN(hmm) in
>>>> tools/testing/selftests/vm/hmm-tests.c
>>>> using ASSERT_EQ(ret, 0); can run into an infinite loop of reporting the
>>>> assertion failure. Dunno if it's a kselftests issue or it's a bug to
>>>> use asserts in teardown. I hacked it up like this locally to proceed:
>>>>
>>>> --- a/tools/testing/selftests/vm/hmm-tests.c
>>>> +++ b/tools/testing/selftests/vm/hmm-tests.c
>>>> @@ -154,6 +154,11 @@ FIXTURE_TEARDOWN(hmm)
>>>>   {
>>>>       int ret = close(self->fd);
>>>> +    if (ret != 0) {
>>>> +        fprintf(stderr, "close returned (%d) fd is (%d)\n", ret,self->fd);
>>>> +        exit(1);
>>>> +    }
>>>> +
>>>>       ASSERT_EQ(ret, 0);
>>>>       self->fd = -1;
>>>>   }
>>>>
>>>> Next, there are some tests that fail (and thus also trigger the issue above)
>>>>
>>>> #  RUN           hmm.hmm_device_private.exclusive ...
>>>> # hmm-tests.c:1702:exclusive:Expected ret (-16) == 0 (0)
>>>> close returned (-1) fd is (3)
>>>> # exclusive: Test failed at step #1
>>>> #          FAIL  hmm.hmm_device_private.exclusive
>>>> not ok 20 hmm.hmm_device_private.exclusive
>>>> #  RUN           hmm.hmm_device_private.exclusive_mprotect ...
>>>> # hmm-tests.c:1756:exclusive_mprotect:Expected ret (-16) == 0 (0)
>>>> close returned (-1) fd is (3)
>>>> # exclusive_mprotect: Test failed at step #1
>>>> #          FAIL  hmm.hmm_device_private.exclusive_mprotect
>>>> not ok 21 hmm.hmm_device_private.exclusive_mprotect
>>>> #  RUN           hmm.hmm_device_private.exclusive_cow ...
>>>> # hmm-tests.c:1809:exclusive_cow:Expected ret (-16) == 0 (0)
>>>> close returned (-1) fd is (3)
>>>> # exclusive_cow: Test failed at step #1
>>>> #          FAIL  hmm.hmm_device_private.exclusive_cow
>>>> not ok 22 hmm.hmm_device_private.exclusive_cow
>>>>
>>>>
>>>
>>> When did that test start failing? Was it still ok for 6.0?
> 
> Didn't test yet, will try, in case it's my system/config specific thing.

So it's actually all the same with v6.0 for me. The infinite loops, the test
failures, the misreported SKIPs.

#  RUN           hmm.hmm_device_private.exclusive ...
# hmm-tests.c:1673:exclusive:Expected ret (-16) == 0 (0)
hmm close returned (-1) fd is (3)
# exclusive: Test failed at step #1
#          FAIL  hmm.hmm_device_private.exclusive
not ok 20 hmm.hmm_device_private.exclusive
#  RUN           hmm.hmm_device_private.exclusive_mprotect ...
# hmm-tests.c:1727:exclusive_mprotect:Expected ret (-16) == 0 (0)
hmm close returned (-1) fd is (3)
# exclusive_mprotect: Test failed at step #1
#          FAIL  hmm.hmm_device_private.exclusive_mprotect
not ok 21 hmm.hmm_device_private.exclusive_mprotect
#  RUN           hmm.hmm_device_private.exclusive_cow ...
# hmm-tests.c:1780:exclusive_cow:Expected ret (-16) == 0 (0)
hmm close returned (-1) fd is (3)
# exclusive_cow: Test failed at step #1
#          FAIL  hmm.hmm_device_private.exclusive_cow
not ok 22 hmm.hmm_device_private.exclusive_cow
David Hildenbrand Oct. 13, 2022, 6 p.m. UTC | #5
>>>> When did that test start failing? Was it still ok for 6.0?
>>
>> Didn't test yet, will try, in case it's my system/config specific thing.
> 
> So it's actually all the same with v6.0 for me. The infinite loops, the test
> failures, the misreported SKIPs.
> 
> #  RUN           hmm.hmm_device_private.exclusive ...
> # hmm-tests.c:1673:exclusive:Expected ret (-16) == 0 (0)
> hmm close returned (-1) fd is (3)
> # exclusive: Test failed at step #1
> #          FAIL  hmm.hmm_device_private.exclusive
> not ok 20 hmm.hmm_device_private.exclusive
> #  RUN           hmm.hmm_device_private.exclusive_mprotect ...
> # hmm-tests.c:1727:exclusive_mprotect:Expected ret (-16) == 0 (0)
> hmm close returned (-1) fd is (3)
> # exclusive_mprotect: Test failed at step #1
> #          FAIL  hmm.hmm_device_private.exclusive_mprotect
> not ok 21 hmm.hmm_device_private.exclusive_mprotect
> #  RUN           hmm.hmm_device_private.exclusive_cow ...
> # hmm-tests.c:1780:exclusive_cow:Expected ret (-16) == 0 (0)
> hmm close returned (-1) fd is (3)
> # exclusive_cow: Test failed at step #1
> #          FAIL  hmm.hmm_device_private.exclusive_cow
> not ok 22 hmm.hmm_device_private.exclusive_cow
> 

Is the kernel compiled with support? I have the feeling that we might
simply be missing kernel support, and that this isn't handled gracefully ...
Shuah Khan Oct. 13, 2022, 7:38 p.m. UTC | #6
On 10/13/22 12:00, David Hildenbrand wrote:
>>>>> When did that test start failing? Was it still ok for 6.0?
>>>
>>> Didn't test yet, will try, in case it's my system/config specific thing.
>>
>> So it's actually all the same with v6.0 for me. The infinite loops, the test
>> failures, the misreported SKIPs.
>>

I am not seeing infinite loops, but I am seeing 25 failures, which could
be skips.

> 
> Is the kernel compiled with support. I have the feeling that we might simply miss kernel support and it's not handled gracefully ...
> 

Here is my config
CONFIG_HMM_MIRROR=y
# CONFIG_TEST_HMM is not set

Okay, here is what is going on: hmm_tests is supposed to be run
from the test_hmm.sh script. When I run the script, I see a message that
tells me what to do.

sudo ./test_hmm.sh
./test_hmm.sh: You must have the following enabled in your kernel:
CONFIG_TEST_HMM=m

Running ./hmm_tests gives me all the failures. So it appears that running
the hmm_tests executable directly won't work. This is expected, as
test_hmm.sh does the right setup before running the test. We have several
tests that do that.

Vlastimil, can you try this and let me know what you see? I will compile
with CONFIG_TEST_HMM=m and let you know what I see on my system.

thanks,
-- Shuah
Vlastimil Babka Oct. 13, 2022, 7:43 p.m. UTC | #7
On 10/13/2022 9:38 PM, Shuah Khan wrote:
> On 10/13/22 12:00, David Hildenbrand wrote:
>>>>>> When did that test start failing? Was it still ok for 6.0?
>>>>
>>>> Didn't test yet, will try, in case it's my system/config specific thing.
>>>
>>> So it's actually all the same with v6.0 for me. The infinite loops, the test
>>> failures, the misreported SKIPs.
>>>
> 
> I am not seeing infinite loops and seeing 25 failures which could
> be skips.
> 
>>
>> Is the kernel compiled with support. I have the feeling that we might simply miss kernel support and it's not handled gracefully ...
>>
> 
> Here is my config
> CONFIG_HMM_MIRROR=y
> # CONFIG_TEST_HMM is not set
> 
> Okay here is what is going on - hmm_tests are supposed to be run
> from test_hmm.sh script. When I run this I see a message that tells
> me what to do.
> 
> sudo ./test_hmm.sh
> ./test_hmm.sh: You must have the following enabled in your kernel:
> CONFIG_TEST_HMM=m
> 
> Running ./hmm_tests gives me all the failures. So it appears running
> hmm_tests executable won't work. This is expected as test_hmm.sh does
> the right setup before running the test. We have several tests that do
> that.
> 
> Vlastimil, can you try this and let me know what you see. I will compile
> with CONFIG_TEST_HMM=m and let you know what I see on my system.

Right, I didn't mention it, sorry. I did have CONFIG_TEST_HMM=m and was running
"test_hmm.sh smoke"

> thanks,
> -- Shuah
> 
> 
>
Alistair Popple Oct. 14, 2022, 1:45 a.m. UTC | #8
Vlastimil Babka <vbabka@suse.cz> writes:

> On 10/13/2022 9:38 PM, Shuah Khan wrote:
>> On 10/13/22 12:00, David Hildenbrand wrote:
>>>>>>> When did that test start failing? Was it still ok for 6.0?
>>>>>
>>>>> Didn't test yet, will try, in case it's my system/config specific thing.
>>>>
>>>> So it's actually all the same with v6.0 for me. The infinite loops, the test
>>>> failures, the misreported SKIPs.
>>>>
>>
>> I am not seeing infinite loops and seeing 25 failures which could
>> be skips.
>>
>>>
>>> Is the kernel compiled with support. I have the feeling that we might simply miss kernel support and it's not handled gracefully ...
>>>
>>
>> Here is my config
>> CONFIG_HMM_MIRROR=y
>> # CONFIG_TEST_HMM is not set
>>
>> Okay here is what is going on - hmm_tests are supposed to be run
>> from test_hmm.sh script. When I run this I see a message that tells
>> me what to do.
>>
>> sudo ./test_hmm.sh
>> ./test_hmm.sh: You must have the following enabled in your kernel:
>> CONFIG_TEST_HMM=m
>>
>> Running ./hmm_tests gives me all the failures. So it appears running
>> hmm_tests executable won't work. This is expected as test_hmm.sh does
>> the right setup before running the test. We have several tests that do
>> that.
>>
>> Vlastimil, can you try this and let me know what you see. I will compile
>> with CONFIG_TEST_HMM=m and let you know what I see on my system.
>
> Right, I didn't mention it, sorry. I did have CONFIG_TEST_HMM=m and was running
> "test_hmm.sh smoke"

FWIW I tend not to use that script on my development machine, mainly
because I either have the module built in or otherwise don't have
modules installed in a place modprobe knows about.

Anyway I am not seeing test failures running hmm-tests directly. However
I do observe both the issue of SKIP in FIXTURE_SETUP() being reported as
a pass in the summary, and the infinite loop on ASSERT failure in
FIXTURE_TEARDOWN.

There do seem to be some framework issues here which are causing this
behaviour. Consider the following representative snippet:

#include "../kselftest_harness.h"

#include <stdio.h>

FIXTURE(test) {};

FIXTURE_SETUP(test)
{
	SKIP(return, "skip");
}

FIXTURE_TEARDOWN(test)
{
	ASSERT_TRUE(0);
}

TEST_F(test, test)
{
	printf("Running test\n");
}

TEST_HARNESS_MAIN

In this case the test will still be run even though SKIP() was called in
FIXTURE_SETUP. The ASSERT_TRUE() during FIXTURE_TEARDOWN results in the
infinite loop. So it looks to me like calling SKIP from FIXTURE_SETUP
isn't supported, and calling ASSERT_*() in FIXTURE_TEARDOWN is also not
allowed/supported by the kselftest framework.
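
The teardown loop can be modelled without the harness. This is a minimal sketch, assuming a failing ASSERT_*() ends up doing a longjmp() back to the single setjmp() in the generated wrapper; every name in it is an illustrative stand-in rather than a harness identifier, and the number of failing teardowns is capped so that the model terminates:

```c
#include <setjmp.h>
#include <stdbool.h>

/* Cap the failures so the model terminates; the real loop never ends. */
#define FAILING_TEARDOWNS 4

static jmp_buf env;
static bool setup_completed;
static int teardown_calls;

/* Stand-in for a failing ASSERT_*(): jump back to the saved context. */
static void failing_assert(void)
{
	longjmp(env, 1);
}

/* Stand-in for FIXTURE_TEARDOWN(): asserts on the first few calls. */
static void model_teardown(void)
{
	teardown_calls++;
	if (teardown_calls <= FAILING_TEARDOWNS)
		failing_assert();
}

/* Same shape as the unpatched wrapper: one setjmp() guards everything. */
static int run_unpatched_wrapper(void)
{
	teardown_calls = 0;
	setup_completed = false;

	if (setjmp(env) == 0) {
		/* setup and the test body would run here */
		setup_completed = true;
	}
	/*
	 * A longjmp() from teardown resumes at the setjmp() above with a
	 * non-zero value, skips the block, and reaches teardown again.
	 */
	if (setup_completed)
		model_teardown();

	return teardown_calls;
}
```

With the cap at 4, run_unpatched_wrapper() returns 5: teardown is re-entered after every failed assertion and only stops once the assertion stops failing, which in hmm-tests it never does.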

Unlike hmm-tests, though, the above snippet reports correct pass/skip
statistics once the teardown assertion is removed. This is because there is
also a bug in hmm-tests. Currently we have:

   SKIP(exit(0), "DEVICE_COHERENT not available");

Which should really be:

   SKIP(return, "DEVICE_COHERENT not available");

Of course that results in an infinite loop due to the associated
assertion failure during teardown which is still called despite the SKIP
in setup. Not sure if this is why it was originally coded this way.
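
A minimal sketch of why calling exit(0) inside SKIP() hides the skip, assuming the harness runs each test in a forked child and classifies it by the child's exit status. The enum, the function name and the skip code 255 are assumptions made for illustration, not taken from this thread:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum model_result { MODEL_PASS, MODEL_SKIP, MODEL_FAIL };

/* Assumed exit code that the child uses to report a skipped test. */
#define MODEL_SKIP_EXIT_CODE 255

/*
 * Fork a child that exits with the given code, then classify it the way
 * a parent runner could: 0 is a pass, the skip code is a skip, and
 * anything else (including abnormal termination) is a failure.
 */
static enum model_result classify_child(int child_exit_code)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return MODEL_FAIL;
	if (pid == 0)
		_exit(child_exit_code);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return MODEL_FAIL;

	switch (WEXITSTATUS(status)) {
	case 0:
		return MODEL_PASS;
	case MODEL_SKIP_EXIT_CODE:
		return MODEL_SKIP;
	default:
		return MODEL_FAIL;
	}
}
```

Under these assumptions, a child that leaves through exit(0) is indistinguishable from a passing test, whereas returning from setup lets the runner see the skip flag and exit with the skip code.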

 - Alistair

>> thanks,
>> -- Shuah
>>
>>
>>
Alistair Popple Oct. 14, 2022, 3:21 a.m. UTC | #9
Seems like this would fix both the SKIP in FIXTURE_SETUP and ASSERT in
FIXTURE_TEARDOWN issues:

---

diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 25f4d54067c0..1998fe888f8f 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -387,12 +387,12 @@
 		if (setjmp(_metadata->env) == 0) { \
 			fixture_name##_setup(_metadata, &self, variant->data); \
 			/* Let setup failure terminate early. */ \
-			if (!_metadata->passed) \
+			if (!_metadata->passed || _metadata->skip) \
 				return; \
 			_metadata->setup_completed = true; \
 			fixture_name##_##test_name(_metadata, &self, variant->data); \
 		} \
-		if (_metadata->setup_completed) \
+		if (_metadata->setup_completed && setjmp(_metadata->env) == 0) \
 			fixture_name##_teardown(_metadata, &self, variant->data); \
 		__test_check_assert(_metadata); \
 	} \
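
The effect of the two changed lines can be checked in isolation. This is a minimal sketch that mirrors the shape of the patched wrapper, assuming a failing assertion does a longjmp() to the most recently saved context; the struct and function names are stand-ins, not harness identifiers:

```c
#include <setjmp.h>
#include <stdbool.h>

/* Stand-in for the per-test metadata the wrapper works on. */
struct model_meta {
	jmp_buf env;
	bool passed;
	bool skip;
	bool setup_completed;
	int body_runs;
	int teardown_runs;
};

/* Setup optionally marks the test as skipped, like SKIP(return, ...). */
static void model_setup(struct model_meta *m, bool skip_in_setup)
{
	m->skip = skip_in_setup;
}

static void model_body(struct model_meta *m)
{
	m->body_runs++;
}

/* Teardown always hits a failing assertion. */
static void model_teardown(struct model_meta *m)
{
	m->teardown_runs++;
	m->passed = false;
	longjmp(m->env, 1);
}

static void model_patched_wrapper(struct model_meta *m, bool skip_in_setup)
{
	if (setjmp(m->env) == 0) {
		model_setup(m, skip_in_setup);
		/* First change: a skip in setup also terminates early. */
		if (!m->passed || m->skip)
			return;
		m->setup_completed = true;
		model_body(m);
	}
	/* Second change: re-arm setjmp() so that a failing assertion in
	 * teardown lands here instead of re-entering teardown. */
	if (m->setup_completed) {
		if (setjmp(m->env) == 0)
			model_teardown(m);
	}
}

/* Run the model once, starting from the "passed" state. */
static struct model_meta run_model(bool skip_in_setup)
{
	struct model_meta m = { .passed = true };

	model_patched_wrapper(&m, skip_in_setup);
	return m;
}
```

In this model, run_model(false) runs the body once and teardown exactly once, and run_model(true) runs neither. The nested if around the second setjmp() is only there to keep the sketch within the contexts in which the C standard allows setjmp() to appear.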

Alistair Popple <apopple@nvidia.com> writes:

> Vlastimil Babka <vbabka@suse.cz> writes:
>
>> On 10/13/2022 9:38 PM, Shuah Khan wrote:
>>> On 10/13/22 12:00, David Hildenbrand wrote:
>>>>>>>> When did that test start failing? Was it still ok for 6.0?
>>>>>>
>>>>>> Didn't test yet, will try, in case it's my system/config specific thing.
>>>>>
>>>>> So it's actually all the same with v6.0 for me. The infinite loops, the test
>>>>> failures, the misreported SKIPs.
>>>>>
>>>
>>> I am not seeing infinite loops and seeing 25 failures which could
>>> be skips.
>>>
>>>>
>>>> Is the kernel compiled with support. I have the feeling that we might simply miss kernel support and it's not handled gracefully ...
>>>>
>>>
>>> Here is my config
>>> CONFIG_HMM_MIRROR=y
>>> # CONFIG_TEST_HMM is not set
>>>
>>> Okay here is what is going on - hmm_tests are supposed to be run
>>> from test_hmm.sh script. When I run this I see a message that tells
>>> me what to do.
>>>
>>> sudo ./test_hmm.sh
>>> ./test_hmm.sh: You must have the following enabled in your kernel:
>>> CONFIG_TEST_HMM=m
>>>
>>> Running ./hmm_tests gives me all the failures. So it appears running
>>> hmm_tests executable won't work. This is expected as test_hmm.sh does
>>> the right setup before running the test. We have several tests that do
>>> that.
>>>
>>> Vlastimil, can you try this and let me know what you see. I will compile
>>> with CONFIG_TEST_HMM=m and let you know what I see on my system.
>>
>> Right, I didn't mention it, sorry. I did have CONFIG_TEST_HMM=m and was running
>> "test_hmm.sh smoke"
>
> FWIW I tend not to use that script on my development machine, mainly
> because I either have the module built in or otherwise don't have
> modules installed in a place modprobe knows about.
>
> Anyway I am not seeing test failures running hmm-tests directly. However
> I do observe both the issue of SKIP in FIXTURE_SETUP() being reported as
> a pass in the summary, and the infinite loop on ASSERT failure in
> FIXTURE_TEARDOWN.
>
> There do seem to be some framework issues here which are causing this
> behaviour. Consider the following representative snippet:
>
> #include "../kselftest_harness.h"
>
> #include <stdio.h>
>
> FIXTURE(test) {};
>
> FIXTURE_SETUP(test)
> {
> 	SKIP(return, "skip");
> }
>
> FIXTURE_TEARDOWN(test)
> {
> 	ASSERT_TRUE(0);
> }
>
> TEST_F(test, test)
> {
> 	printf("Running test\n");
> }
>
> TEST_HARNESS_MAIN
>
> In this case the test will still be run even though SKIP() was called in
> FIXTURE_SETUP. The ASSERT_TRUE() during FIXTURE_TEARDOWN results in the
> infinite loop. So it looks to me like calling SKIP from FIXTURE_SETUP
> isn't supported, and calling ASSERT_*() in FIXTURE_TEARDOWN is also not
> allowed/supported by the kselftest framework.
>
> Unlike hmm-tests though the above snippet reports correct pass/skip
> statistics with the teardown assertion removed. This is because there is
> also a bug in hmm-tests. Currently we have:
>
>    SKIP(exit(0), "DEVICE_COHERENT not available");
>
> Which should really be:
>
>    SKIP(return, "DEVICE_COHERENT not available");
>
> Of course that results in an infinite loop due to the associated
> assertion failure during teardown which is still called despite the SKIP
> in setup. Not sure if this is why it was originally coded this way.
>
>  - Alistair
>
>>> thanks,
>>> -- Shuah
>>>
>>>
>>>
Vlastimil Babka Oct. 14, 2022, 6:45 a.m. UTC | #10
On 10/13/22 20:00, David Hildenbrand wrote:
>>>>> When did that test start failing? Was it still ok for 6.0?
>>>
>>> Didn't test yet, will try, in case it's my system/config specific thing.
>>
>> So it's actually all the same with v6.0 for me. The infinite loops, the test
>> failures, the misreported SKIPs.
>>
>> #  RUN           hmm.hmm_device_private.exclusive ...
>> # hmm-tests.c:1673:exclusive:Expected ret (-16) == 0 (0)
>> hmm close returned (-1) fd is (3)
>> # exclusive: Test failed at step #1
>> #          FAIL  hmm.hmm_device_private.exclusive
>> not ok 20 hmm.hmm_device_private.exclusive
>> #  RUN           hmm.hmm_device_private.exclusive_mprotect ...
>> # hmm-tests.c:1727:exclusive_mprotect:Expected ret (-16) == 0 (0)
>> hmm close returned (-1) fd is (3)
>> # exclusive_mprotect: Test failed at step #1
>> #          FAIL  hmm.hmm_device_private.exclusive_mprotect
>> not ok 21 hmm.hmm_device_private.exclusive_mprotect
>> #  RUN           hmm.hmm_device_private.exclusive_cow ...
>> # hmm-tests.c:1780:exclusive_cow:Expected ret (-16) == 0 (0)
>> hmm close returned (-1) fd is (3)
>> # exclusive_cow: Test failed at step #1
>> #          FAIL  hmm.hmm_device_private.exclusive_cow
>> not ok 22 hmm.hmm_device_private.exclusive_cow
>>
> 
> Is the kernel compiled with support. I have the feeling that we might simply
> miss kernel support and it's not handled gracefully ...

If you mean CONFIG_DEVICE_PRIVATE=y then it's there. Couldn't find anything
relevant that wouldn't be enabled.
Vlastimil Babka Oct. 14, 2022, 6:53 a.m. UTC | #11
On 10/14/22 05:21, Alistair Popple wrote:
> 
> Seems like this would fix both the SKIP in FIXTURE_SETUP and ASSERT in
> FIXTURE_TEARDOWN issues:

Yep, that fixed the infinite error loops for me, thanks.

...

>> Unlike hmm-tests though the above snippet reports correct pass/skip
>> statistics with the teardown assertion removed. This is because there is
>> also a bug in hmm-tests. Currently we have:
>>
>>    SKIP(exit(0), "DEVICE_COHERENT not available");
>>
>> Which should really be:
>>
>>    SKIP(return, "DEVICE_COHERENT not available");
>>

And with this on top, I got the skips due to DEVICE_COHERENT not available
counted correctly.

>> Of course that results in an infinite loop due to the associated
>> assertion failure during teardown which is still called despite the SKIP
>> in setup. Not sure if this is why it was originally coded this way.
>>
>>  - Alistair
>>
>>>> thanks,
>>>> -- Shuah
>>>>
>>>>
>>>>
Jason Gunthorpe Oct. 14, 2022, 12:01 p.m. UTC | #12
On Thu, Oct 13, 2022 at 06:54:24PM +0200, Vlastimil Babka wrote:
> Hi,
> 
> I've been trying the hmm_tests as of today's commit:
> 
> a185a0995518 ("Merge tag 'linux-kselftest-kunit-6.1-rc1-2' ...)
> 
> and run into several issues that seemed worth reporting.
> 
> First, it seems the FIXTURE_TEARDOWN(hmm) in
> tools/testing/selftests/vm/hmm-tests.c
> using ASSERT_EQ(ret, 0); can run into an infinite loop of reporting the
> assertion failure. Dunno if it's a kselftests issue or it's a bug to
> use asserts in teardown. I hacked it up like this locally to proceed:

I've seen this too in other tests, it is a kselftests bug/limitation,
AFAIK. You can't use assert macros in those functions.

Jason
Felix Kuehling Oct. 14, 2022, 3:03 p.m. UTC | #13
On 2022-10-14 at 08:01, Jason Gunthorpe wrote:
> On Thu, Oct 13, 2022 at 06:54:24PM +0200, Vlastimil Babka wrote:
>> Hi,
>>
>> I've been trying the hmm_tests as of today's commit:
>>
>> a185a0995518 ("Merge tag 'linux-kselftest-kunit-6.1-rc1-2' ...)
>>
>> and run into several issues that seemed worth reporting.
>>
>> First, it seems the FIXTURE_TEARDOWN(hmm) in
>> tools/testing/selftests/vm/hmm-tests.c
>> using ASSERT_EQ(ret, 0); can run into an infinite loop of reporting the
>> assertion failure. Dunno if it's a kselftests issue or it's a bug to
>> use asserts in teardown. I hacked it up like this locally to proceed:
> I've seen this too in other tests, it is a kselftests bug/limitation,
> AFAIK. You can't use assert macros in those functions.

I vaguely remember looking at this when I reviewed Alex's patches that 
added device-coherent support. We wanted to have these checks in the 
fixture setup so that we wouldn't have to duplicate them in all the tests.

I'm not sure if I missed it in review, and Alex missed it in testing, or 
if this is a regression that happened more recently. Sorry for the 
trouble. It looks like Alistair already figured out a fix.

Regards,
   Felix


>
> Jason
Jason Gunthorpe Oct. 14, 2022, 3:47 p.m. UTC | #14
On Fri, Oct 14, 2022 at 11:03:39AM -0400, Felix Kuehling wrote:
> On 2022-10-14 at 08:01, Jason Gunthorpe wrote:
> > On Thu, Oct 13, 2022 at 06:54:24PM +0200, Vlastimil Babka wrote:
> > > Hi,
> > > 
> > > I've been trying the hmm_tests as of today's commit:
> > > 
> > > a185a0995518 ("Merge tag 'linux-kselftest-kunit-6.1-rc1-2' ...)
> > > 
> > > and run into several issues that seemed worth reporting.
> > > 
> > > First, it seems the FIXTURE_TEARDOWN(hmm) in
> > > tools/testing/selftests/vm/hmm-tests.c
> > > using ASSERT_EQ(ret, 0); can run into an infinite loop of reporting the
> > > assertion failure. Dunno if it's a kselftests issue or it's a bug to
> > > use asserts in teardown. I hacked it up like this locally to proceed:
> > I've seen this too in other tests, it is a kselftests bug/limitation,
> > AFAIK. You can't use assert macros in those functions.
> 
> I vaguely remember looking at this when I reviewed Alex's patches that added
> device-coherent support. We wanted to have these checks in the fixture setup
> so that we wouldn't have to duplicate them in all the tests.
> 
> I'm not sure if I missed it in review, and Alex missed it in testing, or if
> this is a regression that happened more recently. Sorry for the trouble. It
> looks like Alistair already figured out a fix.

I think the design is fine; it is just surprising that you can't call
ASSERT/etc in the fixture code. Hopefully something like Alistair's
fix gets merged.

Jason

Patch

--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -154,6 +154,11 @@  FIXTURE_TEARDOWN(hmm)
 {
 	int ret = close(self->fd);
 
+	if (ret != 0) {
+		fprintf(stderr, "close returned (%d) fd is (%d)\n", ret,self->fd);
+		exit(1);
+	}
+
 	ASSERT_EQ(ret, 0);
 	self->fd = -1;
 }