Message ID | 20250107-virtual_address_range-tests-v1-0-3834a2fb47fe@linutronix.de |
---|---|
Headers | show |
Series | selftests/mm: virtual_address_range: Two bugfixes and a cleanup | expand |
On 08.01.25 07:09, Dev Jain wrote: > > On 07/01/25 8:44 pm, Thomas Weißschuh wrote: >> During the execution of validate_complete_va_space() a lot of memory is >> on the VM subsystem. When running on a low memory subsystem an OOM may >> be triggered, when writing to the dump file as the filesystem may also >> require memory. >> >> On my test system with 1100MiB physical memory: >> >> Tasks state (memory values in pages): >> [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name >> [ 57] 0 57 34359215953 695 256 0 439 1064390656 0 0 virtual_address >> >> Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0 >> <snip> >> fault_in_iov_iter_readable+0x4a/0xd0 >> generic_perform_write+0x9c/0x280 >> shmem_file_write_iter+0x86/0x90 >> vfs_write+0x29c/0x480 >> ksys_write+0x6c/0xe0 >> do_syscall_64+0x9e/0x1a0 >> entry_SYSCALL_64_after_hwframe+0x77/0x7f >> >> Write the dumped data into /dev/null instead which does not require >> additional memory during write(), making the code simpler as a >> side-effect. >> >> Signed-off-by: Thomas Weißschuh<thomas.weissschuh@linutronix.de> >> --- >> tools/testing/selftests/mm/virtual_address_range.c | 6 ++---- >> 1 file changed, 2 insertions(+), 4 deletions(-) >> >> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c >> index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644 >> --- a/tools/testing/selftests/mm/virtual_address_range.c >> +++ b/tools/testing/selftests/mm/virtual_address_range.c >> @@ -103,10 +103,9 @@ static int validate_complete_va_space(void) >> FILE *file; >> int fd; >> >> - fd = open("va_dump", O_CREAT | O_WRONLY, 0600); >> - unlink("va_dump"); >> + fd = open("/dev/null", O_WRONLY); >> if (fd < 0) { >> - ksft_test_result_skip("cannot create or open dump file\n"); >> + ksft_test_result_skip("cannot create or open /dev/null\n"); >> ksft_finished(); >> } >> >> @@ -152,7 +151,6 @@ static int validate_complete_va_space(void) >> while (start_addr + hop < end_addr) { >> if (write(fd, (void *)(start_addr + hop), 1) != 1) >> return 1; >> - lseek(fd, 0, SEEK_SET); >> >> hop += MAP_CHUNK_SIZE; >> } >> > > The reason I had not used /dev/null was that write() was succeeding to /dev/null > even from an address not in my VA space. I was puzzled about this behaviour of > /dev/null and I chose to ignore it and just use a real file. > > To test this behaviour, run the following program: > > #include <stdio.h> > #include <stdlib.h> > #include <unistd.h> > #include <fcntl.h> > #include <sys/mman.h> > intmain() > { > intfd; > fd = open("va_dump", O_CREAT| O_WRONLY, 0600); > unlink("va_dump"); > // fd = open("/dev/null", O_WRONLY); > intret = munmap((void*)(1UL<< 30), 100); > if(!ret) > printf("munmap succeeded\n"); > intres = write(fd, (void*)(1UL<< 30), 1); > if(res == 1) > printf("write succeeded\n"); > return0; > } > The write will fail as expected, but if you comment out the va_dump > lines and use /dev/null, the write will succeed. What exactly do we want to achieve with the write? Verify that the output of /proc/self/map is reasonable and we can actually resolve a fault / map a page? Why not access the memory directly+signal handler or using /proc/self/mem, so you can avoid the temp file completely?
On 08.01.25 17:13, Thomas Weißschuh wrote: > On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote: >> On 08.01.25 09:05, Thomas Weißschuh wrote: >>> On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote: >>>> >>>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote: >>>>> If not enough physical memory is available the kernel may fail mmap(); >>>>> see __vm_enough_memory() and vm_commit_limit(). >>>>> In that case the logic in validate_complete_va_space() does not make >>>>> sense and will even incorrectly fail. >>>>> Instead skip the test if no mmap() succeeded. >>>>> >>>>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") >>>>> Cc: stable@vger.kernel.org >> >> CC stable on tests is ... odd. > > I thought it was fairly common, but it isn't. > Will drop it. As it's not really a "kernel BUG", it's rather uncommon. >> >> Note that with MAP_NORESRVE, most setups we care about will allow mapping as >> much as you want, but on access OOM will fire. > > Thanks for the hint. > >> So one could require that /proc/sys/vm/overcommit_memory is setup properly >> and use MAP_NORESRVE. > > Isn't the check for lchunks == 0 essentially exactly this? I assume paired with MAP_NORESERVE? Maybe, but it could be better to have something that says "if overcommit_memory is not setup properly I will SKIP this test", but otherwise I expect this to work and will FAIL if it doesn't". Or would you expect to run into lchunks == 0 even if overcommit_memory is setup properly and MAP_NORESERVE is used? (very very low memory that we cannot even create all the VMAs?) > >> Reading from anonymous memory will populate the shared zeropage. To mitigate >> OOM from "too many page tables", one could simply unmap the pieces as they >> are verified (or MAP_FIXED over them, to free page tables). > > The code has to figure out if a verified region was created by mmap(), > otherwise an munmap() could crash the process. > As the entries from /proc/self/maps may have been merged and (I assume) Yes, and partial unmap (in chunk granularity?) would split them again. > the ordering of mappings is not guaranteed, some bespoke logic to establish > the link will be needed. My thinking was that you simply process one /proc/self/maps entry in some chunks. After processing a chunk, you munmap() it. So you would process + munmap in chunks. > > Is it fine to rely on CONFIG_ANON_VMA_NAME? > That would make it much easier to implement. Can you elaborate how you would do it? > > Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even > in very low physical memory conditions. Cool.
The selftest started failing since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") was merged. While debugging I stumbled upon another bug and potential cleanup. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> --- Thomas Weißschuh (3): selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB selftests/mm: virtual_address_range: Avoid reading VVAR mappings selftests/mm: virtual_address_range: Dump to /dev/null tools/testing/selftests/mm/virtual_address_range.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) --- base-commit: fbfd64d25c7af3b8695201ebc85efe90be28c5a3 change-id: 20250107-virtual_address_range-tests-95843766fa97 Best regards,