diff mbox series

KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL

Message ID 20221103162302.4ba62d72@maurocar-mobl2
State New
Headers show
Series KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL | expand

Commit Message

Mauro Carvalho Chehab Nov. 3, 2022, 3:23 p.m. UTC
Hi,

I'm facing a couple of issues when testing KUnit with the i915 driver.

The DRM subsystem and the i915 driver has, for a long time, his own
way to do unit tests, which seems to be added before KUnit.

I'm now checking if it is worth start using KUnit at i915. So, I wrote
a RFC with some patches adding support for the tests we have to be
reported using Kernel TAP and KUnit.

There are basically 3 groups of tests there:

- mock tests - check i915 hardware-independent logic;
- live tests - run some hardware-specific tests;
- perf tests - check perf support - also hardware-dependent.

As they depend on i915 driver, they run only on x86, with PCI
stack enabled, but the mock tests run nicely via qemu.

The live and perf tests require a real hardware. As we run them
together with our CI, which, among other things, test module
unload/reload and test loading i915 driver with different
modprobe parameters, the KUnit tests should be able to run as
a module.

While testing KUnit, I noticed a couple of issues:

1. kunit.py parser is currently broken when used with modules

the parser expects "TAP version xx" output, but this won't
happen when loading the kunit test driver.

Are there any plans or patches fixing this issue?

2. current->mm is not initialized

Some tests do mmap(). They need the mm user context to be initialized,
but this is not happening right now.

Are there a way to properly initialize it for KUnit?

3. there's no test filters for modules

In order to be able to do proper CI automation, it is needed to
be able to control what tests will run or not. That's specially
interesting at development time where some tests may not apply
or not run properly on new hardware.

Are there any plans to add support for it at kunit_test_suites()
when the driver is built as module? Ideally, the best would be to
export a per-module filter_glob parameter on such cases.

4. there are actually 3 levels of tests on i915:
	- Level 1: mock, live, perf
	- Level 2: test group (mmap, fences, ...)
	- Level 3: unit tests

Currently, KUnit seems to have just two levels (test suite and tests).
Are there a way to add test groups there?

Regards,
Mauro

Forwarded message:

Date: Thu,  3 Nov 2022 14:51:38 +0000
From: Mauro Carvalho Chehab <mchehab@kernel.org>
To: 
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>, linux-kselftest@vger.kernel.org, Michał Winiarski <michal.winiarski@intel.com>, dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Daniel Latypov <dlatypov@google.com>, linux-kernel@vger.kernel.org, igt-dev@lists.freedesktop.org, Matthew Auld <matthew.auld@intel.com>, Daniel Vetter <daniel@ffwll.ch>, Rodrigo Vivi <rodrigo.vivi@intel.com>, skhan@linuxfoundation.org, Isabella Basso <isabbasso@riseup.net>, David Airlie <airlied@gmail.com>, Christian König <christian.koenig@amd.com>
Subject: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL


The mmap tests require mm in order to work. Failing to do that
will cause a crash:

[  316.820722] BUG: kernel NULL pointer dereference, address: 00000000000000e8
[  316.822517] #PF: supervisor write access in kernel mode
[  316.823430] #PF: error_code(0x0002) - not-present page
[  316.824390] PGD 0 P4D 0
[  316.825357] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  316.826350] CPU: 0 PID: 1517 Comm: kunit_try_catch Tainted: G     U           N 6.1.0-rc2-drm-266703e6f163+ #14
[  316.827503] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[  316.828633] RIP: 0010:down_write_killable+0x50/0x110
[  316.829756] Code: 24 10 45 31 c9 31 c9 41 b8 01 00 00 00 31 d2 31 f6 48 89 ef e8 e1 74 4a ff bf 01 00 00 00 e8 87 d6 46 ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 0f 94 c0 5a 84 c0 74 62 8b 05 49 12 e4 00 85 c0 74
[  316.830896] RSP: 0018:ffffc90001eabc58 EFLAGS: 00010246
[  316.832008] RAX: 0000000000000000 RBX: 00000000000000e8 RCX: 0000000000000000
[  316.833141] RDX: 0000000000000001 RSI: ffffffff81c94fc9 RDI: ffffffff81c94fc9
[  316.834195] RBP: 0000000000000158 R08: 0000000000000001 R09: 0000000000000000
[  316.835231] R10: 0000000000000000 R11: ffff8883a13350b8 R12: 0000000000000002
[  316.836259] R13: 0000000000000001 R14: 0000000000100000 R15: 00000000000000e8
[  316.837237] FS:  0000000000000000(0000) GS:ffff8883a3800000(0000) knlGS:0000000000000000
[  316.838214] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  316.839190] CR2: 00000000000000e8 CR3: 0000000002812003 CR4: 0000000000770ef0
[  316.840147] PKRU: 55555554
[  316.841099] Call Trace:
[  316.842047]  <TASK>
[  316.842990]  ? vm_mmap_pgoff+0x78/0x150
[  316.843936]  vm_mmap_pgoff+0x78/0x150
[  316.844884]  igt_mmap_offset+0x178/0x1b9 [i915]
[  316.846119]  __igt_mmap+0xfe/0x680 [i915]

Unfortunately, when KUnit module runs, it doesn't create an
user context, causing mmap tests to fail.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH RFC v2 0/8] at: https://lore.kernel.org/all/cover.1667486144.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Daniel Latypov Nov. 3, 2022, 10:43 p.m. UTC | #1
On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
<mauro.chehab@linux.intel.com> wrote:
>
> Hi,
>
> I'm facing a couple of issues when testing KUnit with the i915 driver.
>
> The DRM subsystem and the i915 driver has, for a long time, his own
> way to do unit tests, which seems to be added before KUnit.
>
> I'm now checking if it is worth start using KUnit at i915. So, I wrote
> a RFC with some patches adding support for the tests we have to be
> reported using Kernel TAP and KUnit.
>
> There are basically 3 groups of tests there:
>
> - mock tests - check i915 hardware-independent logic;
> - live tests - run some hardware-specific tests;
> - perf tests - check perf support - also hardware-dependent.
>
> As they depend on i915 driver, they run only on x86, with PCI
> stack enabled, but the mock tests run nicely via qemu.
>
> The live and perf tests require a real hardware. As we run them
> together with our CI, which, among other things, test module
> unload/reload and test loading i915 driver with different
> modprobe parameters, the KUnit tests should be able to run as
> a module.
>
> While testing KUnit, I noticed a couple of issues:
>
> 1. kunit.py parser is currently broken when used with modules
>
> the parser expects "TAP version xx" output, but this won't
> happen when loading the kunit test driver.
>
> Are there any plans or patches fixing this issue?

Partially.
Note: we need a header to look for so we can strip prefixes (like timestamps).

But there is a patch in the works to add a TAP header for each
subtest, hopefully in time for 6.2.
This is to match the KTAP spec:
https://kernel.org/doc/html/latest/dev-tools/ktap.html

That should fix it so you can parse one suite's results at a time.
I'm pretty sure it won't fix the case where there's multiple suites
and/or you're trying to parse all test results at once via

$ find /sys/kernel/debug/kunit/ -type f | xargs cat |
./tools/testing/kunit/kunit.py parse

I think that in-kernel code change + some more python changes could
make the above command work, but no one has actively started looking
at that just yet.
Hopefully we can pick this up and also get it done for 6.2 (unless I'm
underestimating how complicated this is).

>
> 2. current->mm is not initialized
>
> Some tests do mmap(). They need the mm user context to be initialized,
> but this is not happening right now.
>
> Are there a way to properly initialize it for KUnit?

Right, this is a consequence of how early built-in KUnit tests are run
after boot.
I think for now, the answer is to make the test module-only.

I know David had some ideas here, but I can't speak to them.

>
> 3. there's no test filters for modules
>
> In order to be able to do proper CI automation, it is needed to
> be able to control what tests will run or not. That's specially
> interesting at development time where some tests may not apply
> or not run properly on new hardware.
>
> Are there any plans to add support for it at kunit_test_suites()
> when the driver is built as module? Ideally, the best would be to
> export a per-module filter_glob parameter on such cases.

I think this is a good idea and is doable. (I think I said as much on
the other thread).

The thinking before was that people would make group tests together in modules.
But if you want to share a single module for many tests, this becomes
more useful.

This has some potential merge conflicts w/ other pending work.
I was also prototyping the ability to tell KUnit "run tests #2 - #5",
so that also touches the filtering code very heavily.
(The goal there is to have kunit.py able to shard up tests and boot
multiple kernels concurrently.)

>
> 4. there are actually 3 levels of tests on i915:
>         - Level 1: mock, live, perf
>         - Level 2: test group (mmap, fences, ...)
>         - Level 3: unit tests
>
> Currently, KUnit seems to have just two levels (test suite and tests).
> Are there a way to add test groups there?

Parameterized tests are the closest we have to a third-level of tests.
But other than that, the answer is no.

I'd need to get more familiar with the existing tests, but I'm pretty
sure parameters won't work for you.

And I don't know if this will get done.

Note: the kunit_parser.py code should be able to handle arbitrary
levels of tests in the output.
This restriction is purely in the in-kernel code.

I had brought up the idea of more layers of tests before.
It would also be useful for
a) sharing expensive setup between multiple tests
b) allowing more granular scope for cleanups (kunit_kmalloc and others)
c) more flexibility in dynamically generating subtests than
parameterized testing

There's some precedent in other unit testing frameworks, for example:
https://pkg.go.dev/testing#T.Run

Daniel
Mauro Carvalho Chehab Nov. 4, 2022, 7:49 a.m. UTC | #2
On Thu, 3 Nov 2022 15:43:26 -0700
Daniel Latypov <dlatypov@google.com> wrote:

> On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
> <mauro.chehab@linux.intel.com> wrote:
> >
> > Hi,
> >
> > I'm facing a couple of issues when testing KUnit with the i915 driver.
> >
> > The DRM subsystem and the i915 driver has, for a long time, his own
> > way to do unit tests, which seems to be added before KUnit.
> >
> > I'm now checking if it is worth start using KUnit at i915. So, I wrote
> > a RFC with some patches adding support for the tests we have to be
> > reported using Kernel TAP and KUnit.
> >
> > There are basically 3 groups of tests there:
> >
> > - mock tests - check i915 hardware-independent logic;
> > - live tests - run some hardware-specific tests;
> > - perf tests - check perf support - also hardware-dependent.
> >
> > As they depend on i915 driver, they run only on x86, with PCI
> > stack enabled, but the mock tests run nicely via qemu.
> >
> > The live and perf tests require a real hardware. As we run them
> > together with our CI, which, among other things, test module
> > unload/reload and test loading i915 driver with different
> > modprobe parameters, the KUnit tests should be able to run as
> > a module.
> >
> > While testing KUnit, I noticed a couple of issues:
> >
> > 1. kunit.py parser is currently broken when used with modules
> >
> > the parser expects "TAP version xx" output, but this won't
> > happen when loading the kunit test driver.
> >
> > Are there any plans or patches fixing this issue?  
> 
> Partially.
> Note: we need a header to look for so we can strip prefixes (like timestamps).
> 
> But there is a patch in the works to add a TAP header for each
> subtest, hopefully in time for 6.2.

Good to know.

> This is to match the KTAP spec:
> https://kernel.org/doc/html/latest/dev-tools/ktap.html

I see.

> That should fix it so you can parse one suite's results at a time.
> I'm pretty sure it won't fix the case where there's multiple suites
> and/or you're trying to parse all test results at once via
> 
> $ find /sys/kernel/debug/kunit/ -type f | xargs cat |
> ./tools/testing/kunit/kunit.py parse

Could you point me to the changeset? perhaps I can write a followup
patch addressing this case.

> I think that in-kernel code change + some more python changes could
> make the above command work, but no one has actively started looking
> at that just yet.
> Hopefully we can pick this up and also get it done for 6.2 (unless I'm
> underestimating how complicated this is).
> 
> >
> > 2. current->mm is not initialized
> >
> > Some tests do mmap(). They need the mm user context to be initialized,
> > but this is not happening right now.
> >
> > Are there a way to properly initialize it for KUnit?  
> 
> Right, this is a consequence of how early built-in KUnit tests are run
> after boot.
> I think for now, the answer is to make the test module-only.
> 
> I know David had some ideas here, but I can't speak to them.

This is happening when test-i915 is built as module as well.

I suspect that the function which initializes it is mm_alloc() inside 
kernel/fork.c:

	struct mm_struct *mm_alloc(void)
	{
	        struct mm_struct *mm;

	        mm = allocate_mm();
	        if (!mm)
	                return NULL;

	        memset(mm, 0, sizeof(*mm));
	        return mm_init(mm, current, current_user_ns());
	}

As modprobing a test won't fork until all tests run, this never runs.

It seems that the normal usage is at fs/exec.c:

	fs/exec.c:      bprm->mm = mm = mm_alloc();

but other places also call it:

	arch/arm/mach-rpc/ecard.c:      struct mm_struct * mm = mm_alloc();
	drivers/dma-buf/dma-resv.c:     struct mm_struct *mm = mm_alloc();
	include/linux/sched/mm.h:extern struct mm_struct *mm_alloc(void);
	mm/debug_vm_pgtable.c:  args->mm = mm_alloc();

Probably the solution would be to call it inside kunit executor code,
adding support for modules to use it.

> > 3. there's no test filters for modules
> >
> > In order to be able to do proper CI automation, it is needed to
> > be able to control what tests will run or not. That's specially
> > interesting at development time where some tests may not apply
> > or not run properly on new hardware.
> >
> > Are there any plans to add support for it at kunit_test_suites()
> > when the driver is built as module? Ideally, the best would be to
> > export a per-module filter_glob parameter on such cases.  
> 
> I think this is a good idea and is doable. (I think I said as much on
> the other thread).
> 
> The thinking before was that people would make group tests together in modules.
> But if you want to share a single module for many tests, this becomes
> more useful.

At least for this RFC, I opted to place everything we have already on
a single module. 

Perhaps I could create, instead, 3 separate modules. This way, I would gain
a "third level" and a poor man's way of filtering what test type
will run (mock, live or perf).

Yet, we will still need to be able to filter the unit tests, as this
is where all the fun happens.

> This has some potential merge conflicts w/ other pending work.
> I was also prototyping the ability to tell KUnit "run tests #2 - #5",
> so that also touches the filtering code very heavily.

Are you planning to allow this to support such feature also on modules?

> (The goal there is to have kunit.py able to shard up tests and boot
> multiple kernels concurrently.)
> 
> >
> > 4. there are actually 3 levels of tests on i915:
> >         - Level 1: mock, live, perf
> >         - Level 2: test group (mmap, fences, ...)
> >         - Level 3: unit tests
> >
> > Currently, KUnit seems to have just two levels (test suite and tests).
> > Are there a way to add test groups there?  
> 
> Parameterized tests are the closest we have to a third-level of tests.
> But other than that, the answer is no.
> 
> I'd need to get more familiar with the existing tests, but I'm pretty
> sure parameters won't work for you.

Our current approach with selftests is that each test can be disabled.
You can see how it currently works by taking a look at the __run_selftests
logic (level 2 on the above hierarchy) inside
drivers/gpu/drm/i915/selftests/i915_selftest.c:


	static int __run_selftests(const char *name,
				   struct selftest *st,
				   unsigned int count,
				   void *data)
	{
...
		/* Tests are listed in order in i915_*_selftests.h */
		for (; count--; st++) {
			if (!st->enabled)
				continue;
...
		pr_info(DRIVER_NAME ": Running %s\n", st->name);
		if (data)
			err = st->live(data);
		else
			err = st->mock();

The same also happens at subtests (level 3):

	int __i915_subtests(const char *caller,
			    int (*setup)(void *data),
			    int (*teardown)(int err, void *data),
			    const struct i915_subtest *st,
			    unsigned int count,
			    void *data)
	{
...
		if (!apply_subtest_filter(caller, st->name))
			continue;
...
		pr_info(DRIVER_NAME ": Running %s/%s\n", caller, st->name);
		GEM_TRACE("Running %s/%s\n", caller, st->name);

		err = teardown(st->func(data), data);
 
> And I don't know if this will get done.
> 
> Note: the kunit_parser.py code should be able to handle arbitrary
> levels of tests in the output.
> This restriction is purely in the in-kernel code.
> 
> I had brought up the idea of more layers of tests before.
> It would also be useful for
> a) sharing expensive setup between multiple tests
> b) allowing more granular scope for cleanups (kunit_kmalloc and others)
> c) more flexibility in dynamically generating subtests than
> parameterized testing

Yes, it makes sense to me. 

> There's some precedent in other unit testing frameworks, for example:
> https://pkg.go.dev/testing#T.Run

Regards,
Mauro
Mauro Carvalho Chehab Nov. 4, 2022, 12:47 p.m. UTC | #3
On Fri, 4 Nov 2022 08:49:55 +0100
Mauro Carvalho Chehab <mauro.chehab@linux.intel.com> wrote:

> On Thu, 3 Nov 2022 15:43:26 -0700
> Daniel Latypov <dlatypov@google.com> wrote:
> 
> > On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
> > <mauro.chehab@linux.intel.com> wrote:  
> > >
> > > Hi,
> > >
> > > I'm facing a couple of issues when testing KUnit with the i915 driver.
> > >
> > > The DRM subsystem and the i915 driver has, for a long time, his own
> > > way to do unit tests, which seems to be added before KUnit.
> > >
> > > I'm now checking if it is worth start using KUnit at i915. So, I wrote
> > > a RFC with some patches adding support for the tests we have to be
> > > reported using Kernel TAP and KUnit.
> > >
> > > There are basically 3 groups of tests there:
> > >
> > > - mock tests - check i915 hardware-independent logic;
> > > - live tests - run some hardware-specific tests;
> > > - perf tests - check perf support - also hardware-dependent.
> > >
> > > As they depend on i915 driver, they run only on x86, with PCI
> > > stack enabled, but the mock tests run nicely via qemu.
> > >
> > > The live and perf tests require a real hardware. As we run them
> > > together with our CI, which, among other things, test module
> > > unload/reload and test loading i915 driver with different
> > > modprobe parameters, the KUnit tests should be able to run as
> > > a module.
> > >
> > > While testing KUnit, I noticed a couple of issues:
> > >
> > > 1. kunit.py parser is currently broken when used with modules
> > >
> > > the parser expects "TAP version xx" output, but this won't
> > > happen when loading the kunit test driver.
> > >
> > > Are there any plans or patches fixing this issue?    
> > 
> > Partially.
> > Note: we need a header to look for so we can strip prefixes (like timestamps).
> > 
> > But there is a patch in the works to add a TAP header for each
> > subtest, hopefully in time for 6.2.  
> 
> Good to know.
> 
> > This is to match the KTAP spec:
> > https://kernel.org/doc/html/latest/dev-tools/ktap.html  
> 
> I see.
> 
> > That should fix it so you can parse one suite's results at a time.
> > I'm pretty sure it won't fix the case where there's multiple suites
> > and/or you're trying to parse all test results at once via
> > 
> > $ find /sys/kernel/debug/kunit/ -type f | xargs cat |
> > ./tools/testing/kunit/kunit.py parse  
> 
> Could you point me to the changeset? perhaps I can write a followup
> patch addressing this case.
> 
> > I think that in-kernel code change + some more python changes could
> > make the above command work, but no one has actively started looking
> > at that just yet.
> > Hopefully we can pick this up and also get it done for 6.2 (unless I'm
> > underestimating how complicated this is).
> >   
> > >
> > > 2. current->mm is not initialized
> > >
> > > Some tests do mmap(). They need the mm user context to be initialized,
> > > but this is not happening right now.
> > >
> > > Are there a way to properly initialize it for KUnit?    
> > 
> > Right, this is a consequence of how early built-in KUnit tests are run
> > after boot.
> > I think for now, the answer is to make the test module-only.
> > 
> > I know David had some ideas here, but I can't speak to them.  
> 
> This is happening when test-i915 is built as module as well.
> 
> I suspect that the function which initializes it is mm_alloc() inside 
> kernel/fork.c:
> 
> 	struct mm_struct *mm_alloc(void)
> 	{
> 	        struct mm_struct *mm;
> 
> 	        mm = allocate_mm();
> 	        if (!mm)
> 	                return NULL;
> 
> 	        memset(mm, 0, sizeof(*mm));
> 	        return mm_init(mm, current, current_user_ns());
> 	}
> 
> As modprobing a test won't fork until all tests run, this never runs.
> 
> It seems that the normal usage is at fs/exec.c:
> 
> 	fs/exec.c:      bprm->mm = mm = mm_alloc();
> 
> but other places also call it:
> 
> 	arch/arm/mach-rpc/ecard.c:      struct mm_struct * mm = mm_alloc();
> 	drivers/dma-buf/dma-resv.c:     struct mm_struct *mm = mm_alloc();
> 	include/linux/sched/mm.h:extern struct mm_struct *mm_alloc(void);
> 	mm/debug_vm_pgtable.c:  args->mm = mm_alloc();
> 
> Probably the solution would be to call it inside kunit executor code,
> adding support for modules to use it.


Hmm... it is not that simple... I tried the enclosed patch, but it caused
another issue at the live/mman/mmap test:

<snip>
[  152.815543] test_i915: 0000:00:02.0: it is a i915 device.
[  152.816456]     # Subtest: i915 live selftests
[  152.816463]     1..1
[  152.816835] kunit_try_run_case: allocating user context
[  152.816978] CPU: 1 PID: 1139 Comm: kunit_try_catch Tainted: G                 N 6.1.0-rc2-drm-110e9bebcbcc+ #20
[  152.817063] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[  152.817583] i915: Performing live_mman selftests with st_random_seed=0x11aaba4d st_timeout=500
[  152.817735] test_i915: Setting dangerous option KUnit live_mman - tainting kernel
[  152.817819] test_i915: Running live_mman on 0000:00:02.0
[  152.817899] i915: Running i915_gem_mman_live_selftests/igt_partial_tiling
[  153.346653] check_partial_mappings: timed out after tiling=0 stride=0
[  153.847696] check_partial_mappings: timed out after tiling=1 stride=262144
[  154.348615] check_partial_mappings: timed out after tiling=2 stride=262144
[  154.376677] i915: Running i915_gem_mman_live_selftests/igt_smoke_tiling
[  154.877686] igt_smoke_tiling: Completed 3465 trials
[  155.025764] i915: Running i915_gem_mman_live_selftests/igt_mmap_offset_exhaustion
[  155.050908] i915: Running i915_gem_mman_live_selftests/igt_mmap
[  155.052056] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  155.052080] #PF: supervisor instruction fetch in kernel mode
[  155.052095] #PF: error_code(0x0010) - not-present page
[  155.052110] PGD 0 P4D 0 
[  155.052121] Oops: 0010 [#1] PREEMPT SMP NOPTI
[  155.052135] CPU: 5 PID: 1139 Comm: kunit_try_catch Tainted: G     U           N 6.1.0-rc2-drm-110e9bebcbcc+ #20
[  155.052162] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[  155.052191] RIP: 0010:0x0
[  155.052207] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  155.052223] RSP: 0018:ffffc900019ebbe8 EFLAGS: 00010246
[  155.052238] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000100000
[  155.052257] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881111a6840
[  155.052275] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[  155.052292] R10: ffff8881049ad000 R11: 00000000ffffffff R12: 0000000000000002
[  155.052309] R13: ffff8881111a6840 R14: 0000000000100000 R15: 0000000000000000
[  155.052327] FS:  0000000000000000(0000) GS:ffff8883a3a80000(0000) knlGS:0000000000000000
[  155.052347] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.052361] CR2: ffffffffffffffd6 CR3: 000000011118c004 CR4: 0000000000770ee0
[  155.052379] PKRU: 55555554
[  155.052387] Call Trace:
[  155.052396]  <TASK>
[  155.052403]  get_unmapped_area+0x80/0x130
[  155.052422]  do_mmap+0xe5/0x530
[  155.052439]  vm_mmap_pgoff+0xab/0x150
[  155.052457]  igt_mmap_offset+0x133/0x1e0 [i915]
[  155.052875]  __igt_mmap+0xfe/0x680 [i915]
[  155.053233]  ? __i915_gem_object_create_user_ext+0x49c/0x550 [i915]
[  155.053614]  igt_mmap+0xd8/0x290 [i915]
[  155.054057]  ? __trace_bprintk+0x8c/0xa0
[  155.054080]  __i915_subtests.cold+0x53/0xd5 [i915]
[  155.054648]  ? __i915_nop_teardown+0x20/0x20 [i915]
[  155.055127]  ? __i915_live_setup+0x60/0x60 [i915]
[  155.055608]  ? singleton_release+0x40/0x40 [i915]
[  155.056060]  i915_gem_mman_live_selftests+0x4e/0x60 [i915]
[  155.056503]  run_pci_test.cold+0x4d/0x163 [test_i915]
[  155.056535]  ? kunit_try_catch_throw+0x20/0x20
[  155.056557]  live_mman+0x19/0x26 [test_i915]
[  155.056581]  kunit_try_run_case+0xf0/0x145
[  155.056607]  kunit_generic_run_threadfn_adapter+0x13/0x30
[  155.057715]  kthread+0xf2/0x120
[  155.058864]  ? kthread_complete_and_exit+0x20/0x20
[  155.060014]  ret_from_fork+0x1f/0x30
[  155.061108]  </TASK>
[  155.062174] Modules linked in: test_i915 x86_pkg_temp_thermal coretemp snd_hda_codec_hdmi mei_hdcp kvm_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep kvm snd_hda_core mei_me irqbypass wmi_bmof snd_pcm i2c_i801 mei i2c_smbus intel_lpss_pci crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 prime_numbers drm_buddy drm_display_helper drm_kms_helper syscopyarea e1000e sysfillrect sysimgblt ptp fb_sys_fops pps_core ttm video wmi fuse
[  155.064354] CR2: 0000000000000000
[  155.065413] ---[ end trace 0000000000000000 ]---
[  155.074555] RIP: 0010:0x0
[  155.075437] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  155.076313] RSP: 0018:ffffc900019ebbe8 EFLAGS: 00010246
[  155.077195] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000100000
[  155.078124] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881111a6840
[  155.079013] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[  155.079898] R10: ffff8881049ad000 R11: 00000000ffffffff R12: 0000000000000002
[  155.080785] R13: ffff8881111a6840 R14: 0000000000100000 R15: 0000000000000000
[  155.081668] FS:  0000000000000000(0000) GS:ffff8883a3a80000(0000) knlGS:0000000000000000
[  155.082565] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.083451] CR2: ffffffffffffffd6 CR3: 0000000110904006 CR4: 0000000000770ee0
[  155.084348] PKRU: 55555554
</snip>

It sounds that something else is needed to properly initialize the user
context.

Regards,
Mauro

---

[PATCH] kunit: allocate user context mm

Without that, tests envolving mmap won't work.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 90640a43cf62..809522e110c5 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -14,6 +14,7 @@
 #include <linux/moduleparam.h>
 #include <linux/panic.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/mm.h>
 #include <linux/sched.h>
 
 #include "debugfs.h"
@@ -381,9 +382,23 @@ static void kunit_try_run_case(void *data)
 	struct kunit *test = ctx->test;
 	struct kunit_suite *suite = ctx->suite;
 	struct kunit_case *test_case = ctx->test_case;
+	struct mm_struct *mm = NULL;
 
 	current->kunit_test = test;
 
+	if (!current->mm) {
+		pr_info("%s: allocating user context\n", __func__);
+		mm = mm_alloc();
+		if (!mm) {
+			kunit_err(suite, KUNIT_SUBTEST_INDENT
+				"# failed to allocate mm user context");
+			return;
+		}
+		current->mm = mm;
+	} else {
+		pr_info("%s: using already-existing user context\n", __func__);
+	}
+
 	/*
 	 * kunit_run_case_internal may encounter a fatal error; if it does,
 	 * abort will be called, this thread will exit, and finally the parent
@@ -392,6 +407,11 @@ static void kunit_try_run_case(void *data)
 	kunit_run_case_internal(test, suite, test_case);
 	/* This line may never be reached. */
 	kunit_run_case_cleanup(test, suite);
+
+	if (mm) {
+		mmdrop(mm);
+		current->mm = NULL;
+	}
 }
 
 static void kunit_catch_run_case(void *data)
Daniel Latypov Nov. 4, 2022, 7:20 p.m. UTC | #4
On Fri, Nov 4, 2022 at 12:50 AM Mauro Carvalho Chehab
<mauro.chehab@linux.intel.com> wrote:
>
> On Thu, 3 Nov 2022 15:43:26 -0700
> Daniel Latypov <dlatypov@google.com> wrote:
>
> > On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
> > <mauro.chehab@linux.intel.com> wrote:
> > >
> > > Hi,
> > >
> > > I'm facing a couple of issues when testing KUnit with the i915 driver.
> > >
> > > The DRM subsystem and the i915 driver has, for a long time, his own
> > > way to do unit tests, which seems to be added before KUnit.
> > >
> > > I'm now checking if it is worth start using KUnit at i915. So, I wrote
> > > a RFC with some patches adding support for the tests we have to be
> > > reported using Kernel TAP and KUnit.
> > >
> > > There are basically 3 groups of tests there:
> > >
> > > - mock tests - check i915 hardware-independent logic;
> > > - live tests - run some hardware-specific tests;
> > > - perf tests - check perf support - also hardware-dependent.
> > >
> > > As they depend on i915 driver, they run only on x86, with PCI
> > > stack enabled, but the mock tests run nicely via qemu.
> > >
> > > The live and perf tests require a real hardware. As we run them
> > > together with our CI, which, among other things, test module
> > > unload/reload and test loading i915 driver with different
> > > modprobe parameters, the KUnit tests should be able to run as
> > > a module.
> > >
> > > While testing KUnit, I noticed a couple of issues:
> > >
> > > 1. kunit.py parser is currently broken when used with modules
> > >
> > > the parser expects "TAP version xx" output, but this won't
> > > happen when loading the kunit test driver.
> > >
> > > Are there any plans or patches fixing this issue?
> >
> > Partially.
> > Note: we need a header to look for so we can strip prefixes (like timestamps).
> >
> > But there is a patch in the works to add a TAP header for each
> > subtest, hopefully in time for 6.2.
>
> Good to know.
>
> > This is to match the KTAP spec:
> > https://kernel.org/doc/html/latest/dev-tools/ktap.html
>
> I see.
>
> > That should fix it so you can parse one suite's results at a time.
> > I'm pretty sure it won't fix the case where there's multiple suites
> > and/or you're trying to parse all test results at once via
> >
> > $ find /sys/kernel/debug/kunit/ -type f | xargs cat |
> > ./tools/testing/kunit/kunit.py parse
>
> Could you point me to the changeset? perhaps I can write a followup
> patch addressing this case.

rmoar@google.com was working on them and should hopefully be able to
send them out real soon.
You should get CC'd on those.

I think the follow-up work is just crafting an example parser input
file and iterating until
  $ ./tools/testing/kunit/kunit.py parse < /tmp/example_input
produces our desired results.

>
> > I think that in-kernel code change + some more python changes could
> > make the above command work, but no one has actively started looking
> > at that just yet.
> > Hopefully we can pick this up and also get it done for 6.2 (unless I'm
> > underestimating how complicated this is).
> >
> > >
> > > 2. current->mm is not initialized
> > >
> > > Some tests do mmap(). They need the mm user context to be initialized,
> > > but this is not happening right now.
> > >
> > > Are there a way to properly initialize it for KUnit?
> >
> > Right, this is a consequence of how early built-in KUnit tests are run
> > after boot.
> > I think for now, the answer is to make the test module-only.
> >
> > I know David had some ideas here, but I can't speak to them.
>
> This is happening when test-i915 is built as module as well.

Oh, I didn't expect that at all.

>
> I suspect that the function which initializes it is mm_alloc() inside
> kernel/fork.c:
>
>         struct mm_struct *mm_alloc(void)
>         {
>                 struct mm_struct *mm;
>
>                 mm = allocate_mm();
>                 if (!mm)
>                         return NULL;
>
>                 memset(mm, 0, sizeof(*mm));
>                 return mm_init(mm, current, current_user_ns());
>         }
>
> As modprobing a test won't fork until all tests run, this never runs.
>
> It seems that the normal usage is at fs/exec.c:
>
>         fs/exec.c:      bprm->mm = mm = mm_alloc();
>
> but other places also call it:
>
>         arch/arm/mach-rpc/ecard.c:      struct mm_struct * mm = mm_alloc();
>         drivers/dma-buf/dma-resv.c:     struct mm_struct *mm = mm_alloc();
>         include/linux/sched/mm.h:extern struct mm_struct *mm_alloc(void);
>         mm/debug_vm_pgtable.c:  args->mm = mm_alloc();
>
> Probably the solution would be to call it inside kunit executor code,
> adding support for modules to use it.

I know basically nothing about the mm code.
I think I vaguely recall there being issues with this on UML or
something, but I could be totally wrong.

I'll wait for David to chime in when he can.

>
> > > 3. there's no test filters for modules
> > >
> > > In order to be able to do proper CI automation, it is needed to
> > > be able to control what tests will run or not. That's specially
> > > interesting at development time where some tests may not apply
> > > or not run properly on new hardware.
> > >
> > > Are there any plans to add support for it at kunit_test_suites()
> > > when the driver is built as module? Ideally, the best would be to
> > > export a per-module filter_glob parameter on such cases.
> >
> > I think this is a good idea and is doable. (I think I said as much on
> > the other thread).
> >
> > The thinking before was that people would make group tests together in modules.
> > But if you want to share a single module for many tests, this becomes
> > more useful.
>
> At least for this RFC, I opted to place everything we have already on
> a single module.
>
> Perhaps I could create, instead, 3 separate modules. This way, I would gain
> a "third level" and a poor man's way of filtering what test type
> will run (mock, live or perf).
>
> Yet, we will still need to be able to filter the unit tests, as this
> is where all the fun happens.
>
> > This has some potential merge conflicts w/ other pending work.
> > I was also prototyping the ability to tell KUnit "run tests #2 - #5",
> > so that also touches the filtering code very heavily.
>
> Are you planning to allow this to support such feature also on modules?

I was not expecting to.
The main benefit would be for automation to try sharding up tests
across multiple kernel boots. Human users would continue to use
name-based selection.

So this fits into how kunit.py works w/ UML invocations or QEMU VMs.
I don't see how this would be useful for module-based testing, where
it feels like you want to boot once and pick which modules to
modprobe.

Do you think it would be useful to have this for modules as well?

>
> > (The goal there is to have kunit.py able to shard up tests and boot
> > multiple kernels concurrently.)
> >
> > >
> > > 4. there are actually 3 levels of tests on i915:
> > >         - Level 1: mock, live, perf
> > >         - Level 2: test group (mmap, fences, ...)
> > >         - Level 3: unit tests
> > >
> > > Currently, KUnit seems to have just two levels (test suite and tests).
> > > Are there a way to add test groups there?
> >
> > Parameterized tests are the closest we have to a third-level of tests.
> > But other than that, the answer is no.
> >
> > I'd need to get more familiar with the existing tests, but I'm pretty
> > sure parameters won't work for you.
>
> Our current approach with selftests is that each test can be disabled.
> You can see how it currently works by taking a look at the __run_selftests
> logic (level 2 on the above hierarchy) inside
> drivers/gpu/drm/i915/selftests/i915_selftest.c:
>
>
>         static int __run_selftests(const char *name,
>                                    struct selftest *st,
>                                    unsigned int count,
>                                    void *data)
>         {
> ...
>                 /* Tests are listed in order in i915_*_selftests.h */
>                 for (; count--; st++) {
>                         if (!st->enabled)
>                                 continue;
> ...
>                 pr_info(DRIVER_NAME ": Running %s\n", st->name);
>                 if (data)
>                         err = st->live(data);
>                 else
>                         err = st->mock();
>
> The same also happens at subtests (level 3):
>
>         int __i915_subtests(const char *caller,
>                             int (*setup)(void *data),
>                             int (*teardown)(int err, void *data),
>                             const struct i915_subtest *st,
>                             unsigned int count,
>                             void *data)
>         {
> ...
>                 if (!apply_subtest_filter(caller, st->name))
>                         continue;
> ...
>                 pr_info(DRIVER_NAME ": Running %s/%s\n", caller, st->name);
>                 GEM_TRACE("Running %s/%s\n", caller, st->name);
>
>                 err = teardown(st->func(data), data);

Had a brief look.

Hmm, the idea of dynamically adding subtests wouldn't work well with
this more structured approach.
The filtering right now in KUnit is done directly on the suites/test
case objects before we run them.
We'd need to pass in this extra filter while running the tests, which
would need some thought to do cleanly.

Here's an idea of how you could roughly emulate this in KUnit right now:
1) could have each KUNIT_CASE map to a test group
2) could have some shared test code declare a module param 'subtest_filter'
3) you could check if the subtest name matches 'subtest_filter', and
use kunit_skip() otherwise

You'd be able to plumb in the subtest filter via:
$ kunit.py run --kernel_args 'subtest_filter=foo'

This feels a bit gross, but not as gross as I thought it might.


Daniel

>
> > And I don't know if this will get done.
> >
> > Note: the kunit_parser.py code should be able to handle arbitrary
> > levels of tests in the output.
> > This restriction is purely in the in-kernel code.
> >
> > I had brought up the idea of more layers of tests before.
> > It would also be useful for
> > a) sharing expensive setup between multiple tests
> > b) allowing more granular scope for cleanups (kunit_kmalloc and others)
> > c) more flexibility in dynamically generating subtests than
> > parameterized testing
>
> Yes, it makes sense to me.
>
> > There's some precedent in other unit testing frameworks, for example:
> > https://pkg.go.dev/testing#T.Run
>
> Regards,
> Mauro
Michał Winiarski Nov. 7, 2022, 6:38 p.m. UTC | #5
On Thu, Nov 03, 2022 at 04:23:02PM +0100, Mauro Carvalho Chehab wrote:
> Hi,
> 
> I'm facing a couple of issues when testing KUnit with the i915 driver.
> 
> The DRM subsystem and the i915 driver has, for a long time, his own
> way to do unit tests, which seems to be added before KUnit.
> 
> I'm now checking if it is worth start using KUnit at i915. So, I wrote
> a RFC with some patches adding support for the tests we have to be
> reported using Kernel TAP and KUnit.
> 
> There are basically 3 groups of tests there:
> 
> - mock tests - check i915 hardware-independent logic;
> - live tests - run some hardware-specific tests;
> - perf tests - check perf support - also hardware-dependent.
> 
> As they depend on i915 driver, they run only on x86, with PCI
> stack enabled, but the mock tests run nicely via qemu.
> 
> The live and perf tests require a real hardware. As we run them
> together with our CI, which, among other things, test module
> unload/reload and test loading i915 driver with different
> modprobe parameters, the KUnit tests should be able to run as
> a module.

Note that KUnit tests that are doing more of a functional/integration
testing (on "live" hardware) rather than unit testing (where hardware
interactions are mocked) are not very common.
Do we have other KUnit tests like this merged?
Some of the "live tests" are not even that, being more of a pure
hardware tests (e.g. live_workarounds, which is checking whether values
in MMIO regs stick over various HW state transitions).

I'm wondering, is KUnit the right tool for this job?

-Michał
Daniel Latypov Nov. 7, 2022, 11:16 p.m. UTC | #6
On Mon, Nov 7, 2022 at 10:38 AM Michał Winiarski
<michal.winiarski@intel.com> wrote:
>
> On Thu, Nov 03, 2022 at 04:23:02PM +0100, Mauro Carvalho Chehab wrote:
> > Hi,
> >
> > I'm facing a couple of issues when testing KUnit with the i915 driver.
> >
> > The DRM subsystem and the i915 driver has, for a long time, his own
> > way to do unit tests, which seems to be added before KUnit.
> >
> > I'm now checking if it is worth start using KUnit at i915. So, I wrote
> > a RFC with some patches adding support for the tests we have to be
> > reported using Kernel TAP and KUnit.
> >
> > There are basically 3 groups of tests there:
> >
> > - mock tests - check i915 hardware-independent logic;
> > - live tests - run some hardware-specific tests;
> > - perf tests - check perf support - also hardware-dependent.
> >
> > As they depend on i915 driver, they run only on x86, with PCI
> > stack enabled, but the mock tests run nicely via qemu.
> >
> > The live and perf tests require a real hardware. As we run them
> > together with our CI, which, among other things, test module
> > unload/reload and test loading i915 driver with different
> > modprobe parameters, the KUnit tests should be able to run as
> > a module.
>
> Note that KUnit tests that are doing more of a functional/integration
> testing (on "live" hardware) rather than unit testing (where hardware
> interactions are mocked) are not very common.
> Do we have other KUnit tests like this merged?

I don't think we have other tests like this.

> Some of the "live tests" are not even that, being more of a pure
> hardware tests (e.g. live_workarounds, which is checking whether values
> in MMIO regs stick over various HW state transitions).
>
> I'm wondering, is KUnit the right tool for this job?

The main focus of KUnit is for hw-independent tests.
So in theory: no.

But I can imagine it could be easier to write the validation via
KUNIT_EXPECT_EQ and friends as opposed to writing your own kernel
module w/ its own set of macros, etc.

So my first thought is: "if it works, then you can try using it."
(Might want to take steps like make sure they don't get enabled by
CONFIG_KUNIT_ALL_TESTS=y).

Talking with David, he seems to have echoed my thoughts.
David also suggested that maybe the test could use a fake of the hw by
default, but have an option to run against real hw when available.
I think that sounds like a good chunk of work, so I don't know if you
need to worry about that.

Daniel
Daniel Latypov Nov. 8, 2022, 1:40 a.m. UTC | #7
On Fri, Nov 4, 2022 at 12:20 PM Daniel Latypov <dlatypov@google.com> wrote:
> rmoar@google.com was working on them and should hopefully be able to
> send them out real soon.
> You should get CC'd on those.
>
> I think the follow-up work is just crafting an example parser input
> file and iterating until
>   $ ./tools/testing/kunit/kunit.py parse < /tmp/example_input
> produces our desired results.

I briefly played around with this on top of
https://lore.kernel.org/linux-kselftest/20221104194705.3245738-1-rmoar@google.com/
But instead, I tried to make it so kunit.py parse could be just given
a directory as input so we don't have to do `find | xargs cat`.

I can't get module tests to work locally, so I'll replicate the setup
with some regular files.
Say we have /tmp/results with two files:
file1 contains
  # Subtest: suite1
  KTAP version 1
  1..1
    ok 1 test
  ok 1 suite1
and file2 contains
  # Subtest: suite2
  KTAP version 1
  1..1
    ok 1 test
  ok 1 suite2

I think this should match what module output in debugfs would look like.
A key thing to note: the top level test counter resets between modules, iirc.

The patch below makes this almost work

$ ./tools/testing/kunit/kunit.py parse /tmp/results/
============================================================
 ==================== suite2 (1 subtest) ====================
[PASSED] test
===================== [PASSED] suite2 ======================
==================== suite1 (1 subtest) ====================
[PASSED] test
[ERROR] Test: suite1: Expected test number 2 but found 1
===================== [PASSED] suite1 ======================
============================================================
Testing complete. Ran 2 tests: passed: 2, errors: 1

But there's a few issues
a. the error about multiple top level tests with "test number 1"
b. how do we make this handle kernel output with prefixes (timestamps)
c. and what happens if files each have different length prefixes?

diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 4d4663fb578b..c5b2eb416c2d 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -189,6 +189,14 @@ def _map_to_overall_status(test_status:
kunit_parser.TestStatus) -> KunitStatus:
                return KunitStatus.SUCCESS
        return KunitStatus.TEST_FAILURE

+def _kernel_output_from_dir(dir: str) -> Iterable[str]:
+       yield 'KTAP version 1'
+       for root, dirs, files in os.walk(dir):
+               for sub in files:
+                       with open(os.path.join(root, sub),
errors='backslashreplace') as f:
+                               for line in f:
+                                       yield line
+
 def parse_tests(request: KunitParseRequest, metadata:
kunit_json.Metadata, input_data: Iterable[str]) -> Tuple[KunitResult,
kunit_parser.Test]:
        parse_start = time.time()

@@ -496,6 +504,8 @@ def main(argv):
                if cli_args.file is None:

sys.stdin.reconfigure(errors='backslashreplace')  # pytype:
disable=attribute-error
                        kunit_output = sys.stdin
+               elif os.path.isdir(cli_args.file):
+                       kunit_output = _kernel_output_from_dir(cli_args.file)
                else:
                        with open(cli_args.file, 'r',
errors='backslashreplace') as f:
                                kunit_output = f.read().splitlines()
David Gow Nov. 8, 2022, 4:30 a.m. UTC | #8
On Thu, Nov 3, 2022 at 11:23 PM Mauro Carvalho Chehab
<mauro.chehab@linux.intel.com> wrote:
>
> Hi,
>
> I'm facing a couple of issues when testing KUnit with the i915 driver.
>
> The DRM subsystem and the i915 driver has, for a long time, his own
> way to do unit tests, which seems to be added before KUnit.
>
> I'm now checking if it is worth start using KUnit at i915. So, I wrote
> a RFC with some patches adding support for the tests we have to be
> reported using Kernel TAP and KUnit.

Thanks very much for looking into this, and sorry for the delayed
response (I've been out sick).

I think Daniel has answered most of your questions (thanks, Daniel),
and I agree with pretty much everything he's said.

In short, I think that it'd be great to have the i915 tests use KUnit
where appropriate, and even where KUnit isn't the ideal tool, using
KTAP as a result format would be great.
I definitely think that there's a whole bunch of areas of i915 for
which KUnit makes sense: the more hardware independent unit tests
(things like swizzling/tiling, maybe some command-buffer creation /
validation, "utility" functions generally) are an obvious option. If
KUnit isn't working for those sorts of tests, that's clearly a
deficiency in KUnit that we'll want to rectify (though it might take
some time to do so).

The more hardware-specific stuff probably isn't as good a fit for
KUnit, but if using KUnit is the easiest way to do test
management/assertion macros/KTAP output/etc., then it may be worth
using whatever parts of it make sense. I'd prefer it if any tests
which depend strongly on specific hardware were marked as such, and
maybe lived under a different Kconfig option (which might not be
auto-enabled by KUNIT_ALL_TESTS). Though as long as the tests are
skipped if the hardware isn't present (which seems to be the case from
running them under qemu), it's not a real problem to have them. It's
not something we plan to "officially support", though, so if the
requirements of hardware-specific tests and more traditional unit
tests conflict, KUnit will lean towards supporting the
non-hardware-specific ones.

>
> There are basically 3 groups of tests there:
>
> - mock tests - check i915 hardware-independent logic;
> - live tests - run some hardware-specific tests;
> - perf tests - check perf support - also hardware-dependent.
>
> As they depend on i915 driver, they run only on x86, with PCI
> stack enabled, but the mock tests run nicely via qemu.
>
> The live and perf tests require a real hardware. As we run them
> together with our CI, which, among other things, test module
> unload/reload and test loading i915 driver with different
> modprobe parameters, the KUnit tests should be able to run as
> a module.
>
> While testing KUnit, I noticed a couple of issues:
>
> 1. kunit.py parser is currently broken when used with modules
>
> the parser expects "TAP version xx" output, but this won't
> happen when loading the kunit test driver.
>
> Are there any plans or patches fixing this issue?
>
Yeah: this is on our to-do list to fix, hopefully pretty soon.

> 2. current->mm is not initialized
>
> Some tests do mmap(). They need the mm user context to be initialized,
> but this is not happening right now.
>
> Are there a way to properly initialize it for KUnit?
>
This is something we've hit before and don't have a good solution for
(as you've found out). I'm not an expert on the mm subsystem, so while
it's something we want to support, I don't think anyone quite knows
how yet.

As a totally wild, untested guess, you may have some luck setting
current->mm = current->active_mm, or current->mm = &init_mm?

It's definitely true that, even when loaded from modules, current->mm
won't be set as KUnit tests run in their own kthreads. Maybe setting
mm = active_mm would let us carry that context with us from the module
loader to the test...

In any case, you're not the only person to hit this issue, so it's
definitely something we'd like to work out.

> 3. there's no test filters for modules
>
> In order to be able to do proper CI automation, it is needed to
> be able to control what tests will run or not. That's specially
> interesting at development time where some tests may not apply
> or not run properly on new hardware.
>
> Are there any plans to add support for it at kunit_test_suites()
> when the driver is built as module? Ideally, the best would be to
> export a per-module filter_glob parameter on such cases.
>

Again, this is on the to-do list. It may be implemented as a global
property which affects future module loads (and might be able to be
changed via, e.g., debugfs), rather than a per-module parameter, but
we haven't designed it yet.

Alas, module support has always seen a little less love than the
built-in UML/qemu-based mode, so it does tend to lag behind a little
bit with these sort of features, and tends to be tested less well.
Hopefully we can bring it up to scratch soon.

> 4. there are actually 3 levels of tests on i915:
>         - Level 1: mock, live, perf
>         - Level 2: test group (mmap, fences, ...)
>         - Level 3: unit tests
>
> Currently, KUnit seems to have just two levels (test suite and tests).
> Are there a way to add test groups there?

The closest thing we have at the moment is "parameterised tests",
which are really designed for the case where the same test code is
being run multiple times with different inputs. It should be possible
to use this to hack a third level in (have the "parameter" be an array
of name/function-pointer pairs), and kunit.py will parse the results
correctly, as KTAP doesn't have this limitation.

The other thing you could do is to treat each "test group" as a KUnit
suite, and just prefix them with "i915_{mock,life,perf}". This isn't
ideal, but you could eventually use the test filtering to split them
up.

Ultimately, supporting more deeply nested tests is something we're not
opposed to doing in KUnit, we've just not had any need for it thus
far, so haven't really looked into how we'd design and implement it.
Now there's a potential user, we can look into it, though it's likely
to be lower-priority here, given there are workarounds.


Thanks again, and I hope that helps a bit!

Cheers,
-- David
Mauro Carvalho Chehab Nov. 14, 2022, 10:59 a.m. UTC | #9
On Mon, 7 Nov 2022 15:16:17 -0800
Daniel Latypov <dlatypov@google.com> wrote:

> On Mon, Nov 7, 2022 at 10:38 AM Michał Winiarski
> <michal.winiarski@intel.com> wrote:
> >
> > On Thu, Nov 03, 2022 at 04:23:02PM +0100, Mauro Carvalho Chehab wrote:  
> > > Hi,
> > >
> > > I'm facing a couple of issues when testing KUnit with the i915 driver.
> > >
> > > The DRM subsystem and the i915 driver has, for a long time, his own
> > > way to do unit tests, which seems to be added before KUnit.
> > >
> > > I'm now checking if it is worth start using KUnit at i915. So, I wrote
> > > a RFC with some patches adding support for the tests we have to be
> > > reported using Kernel TAP and KUnit.
> > >
> > > There are basically 3 groups of tests there:
> > >
> > > - mock tests - check i915 hardware-independent logic;
> > > - live tests - run some hardware-specific tests;
> > > - perf tests - check perf support - also hardware-dependent.
> > >
> > > As they depend on i915 driver, they run only on x86, with PCI
> > > stack enabled, but the mock tests run nicely via qemu.
> > >
> > > The live and perf tests require a real hardware. As we run them
> > > together with our CI, which, among other things, test module
> > > unload/reload and test loading i915 driver with different
> > > modprobe parameters, the KUnit tests should be able to run as
> > > a module.  
> >
> > Note that KUnit tests that are doing more of a functional/integration
> > testing (on "live" hardware) rather than unit testing (where hardware
> > interactions are mocked) are not very common.
> > Do we have other KUnit tests like this merged?  
> 
> I don't think we have other tests like this.
> 
> > Some of the "live tests" are not even that, being more of a pure
> > hardware tests (e.g. live_workarounds, which is checking whether values
> > in MMIO regs stick over various HW state transitions).
> >
> > I'm wondering, is KUnit the right tool for this job?  
> 
> The main focus of KUnit is for hw-independent tests.
> So in theory: no.
> 
> But I can imagine it could be easier to write the validation via
> KUNIT_EXPECT_EQ and friends as opposed to writing your own kernel
> module w/ its own set of macros, etc.

Right now, i915 has its own way of doing that, for both hw-independent
and live tests. The current patches are keeping them, because it helps 
comparing both approaches, and won't disrupt CI jobs.

However, if we're willing to merge KUnit support, the best would be
to use the same macros/logic for both, as having two different unit
test frameworks used on i915 doesn't make much sense.

So yeah, using KUNIT_EXPECT_EQ and friends for live unit and hw tests
makes sense to me.

The main difference between i915 selftest and KUnit seems to be at
the way the tests are started:

- i915 selftest uses module parameters to run the driver on test mode.
  When such parameters are enabled, the probe() function will run the
  tests.

- from what I understood, KUnit uses a .kunit_test_suites code
  section, which does an early initialization of the tests, calling
  the test suite a lot earlier than probe(). 

Due to such differences, running KUnit against a real hardware requires
to load the real driver first, letting it probe the hardware. Once the
driver is loaded and have the hardware properly initialized, load a 
separate KUnit module that would call the test functions from the driver.
That's the approach took on this series.

It sounds possible to merge both approaches, by adding some helper
functions similar to what kunit_test_suites() do, but, instead of
using a separate segment, run the tests at probe() time.

That would mean a cleaner solution with the usage of all KUnit
macros and, in thesis, it will be equivalent to what i915 selftest
already does.

Another possibility would be to make sure that kunit_test_suites()
will fully initiate the user context, allowing mmap() to work.

> So my first thought is: "if it works, then you can try using it."
> (Might want to take steps like make sure they don't get enabled by
> CONFIG_KUNIT_ALL_TESTS=y).

Yeah, it makes sense to have hw-dependent tests not enabled with
KUNIT_ALL_TESTS. Another approach would be to just skip them when
hw is required[1]. Skipping the tests is needed anyway, as the
hardware probing may fail.

[1] Btw, despite the comments:

	/**
	 * kunit_skip() - Marks @test_or_suite as skipped
	 *
	 * @test_or_suite: The test context object.
	 * @fmt:  A printk() style format string.
	 *
	 * Skips the test. @fmt is given output as the test status
	 * comment, typically the reason the test was skipped.
	 *
	 * Test execution is halted after kunit_skip() is called.
	 */

kunit_skip() doesn't work inside .suite_init, so this would cause
compilation errors:

	static int i915_pci_init_suite(struct kunit_suite *suite)
	{
		if (!hw)
			kunit_skip(suite);
	}


> Talking with David, he seems to have echoed my thoughts.
> David also suggested that maybe the test could use a fake of the hw by
> default, but have an option to run against real hw when available.
> I think that sounds like a good chunk of work, so I don't know if you
> need to worry about that.

Emulating a GPU is a lot of work, and offers their own challenges.
Besides that, the goal here is to check the driver against real
hardware, which may have different steppings and versions. Running
on an emulated hardware defeats such purpose, and may introduce
bugs on its own.

Regards,
Mauro
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index add5ae56cd89..2c5f93e946b5 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -1845,6 +1845,11 @@  int i915_gem_mman_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_mmap_gpu),
 	};
 
+	if (!current->mm) {
+		pr_err("Test called without an user context!\n");
+		return -EINVAL;
+	}
+
 	return i915_live_subtests(tests, i915);
 }
 EXPORT_SYMBOL_NS_GPL(i915_gem_mman_live_selftests, I915_SELFTEST);