[1/1] arm64/hugetlb: clear PG_dcache_clean if the page is dirty when munmap

Message ID 20160708135447.GB22099@e104818-lin.cambridge.arm.com
State New

Commit Message

Catalin Marinas July 8, 2016, 1:54 p.m. UTC
On Fri, Jul 08, 2016 at 11:36:57AM +0800, Leizhen (ThunderTown) wrote:
> On 2016/7/7 23:37, Catalin Marinas wrote:
> > On Thu, Jul 07, 2016 at 08:09:04PM +0800, Zhen Lei wrote:
> >> At present, PG_dcache_clean is only cleared when the related huge page
> >> is about to be freed. But sometimes, there maybe a process is in charge
> >> to copy binary codes into a shared memory, and notifies other processes
> >> to execute base on that. For the first time, there is no problem, because
> >> the default value of page->flags is PG_dcache_clean cleared. So the cache
> >> will be maintained at the time of set_pte_at for other processes. But if
> >> the content of the shared memory have been updated again, there is no
> >> cache operations, because the PG_dcache_clean is still set.
> >>
> >> For example:
> >> Process A
> >> 	open a hugetlbfs file
> >> 	mmap it as a shared memory
> >> 	copy some binary codes into it
> >> 	munmap
> >>
> >> Process B
> >> 	open the hugetlbfs file
> >> 	mmap it as a shared memory, executable
> >> 	invoke the functions in the shared memory
> >> 	munmap
> >>
> >> repeat the above steps.
> > 
> > Does this work as you would expect with small pages (and for example
> > shared file mmap)? I don't want to have a different behaviour between
> > small and huge pages.
> 
> The small pages also have this problem, I will try to fix it too.


Have you run the above tests on a standard file (with small pages)? It's
strange that we haven't hit this so far with gcc or something else
generating code (unless they don't use mmap but just sequential writes).

If both cases need solving, we might better move the fix in the
__sync_icache_dcache() function. Untested:

------------8<----------------
----------------8<---------------------

BTW, can you make your tests (source) available somewhere?

Thanks.

-- 
Catalin

Comments

Leizhen (ThunderTown) July 8, 2016, 3:24 p.m. UTC | #1
On 2016/7/8 21:54, Catalin Marinas wrote:
> On Fri, Jul 08, 2016 at 11:36:57AM +0800, Leizhen (ThunderTown) wrote:

>> On 2016/7/7 23:37, Catalin Marinas wrote:

>>> On Thu, Jul 07, 2016 at 08:09:04PM +0800, Zhen Lei wrote:

>>>> At present, PG_dcache_clean is only cleared when the related huge page

>>>> is about to be freed. But sometimes, there maybe a process is in charge

>>>> to copy binary codes into a shared memory, and notifies other processes

>>>> to execute base on that. For the first time, there is no problem, because

>>>> the default value of page->flags is PG_dcache_clean cleared. So the cache

>>>> will be maintained at the time of set_pte_at for other processes. But if

>>>> the content of the shared memory have been updated again, there is no

>>>> cache operations, because the PG_dcache_clean is still set.

>>>>

>>>> For example:

>>>> Process A

>>>> 	open a hugetlbfs file

>>>> 	mmap it as a shared memory

>>>> 	copy some binary codes into it

>>>> 	munmap

>>>>

>>>> Process B

>>>> 	open the hugetlbfs file

>>>> 	mmap it as a shared memory, executable

>>>> 	invoke the functions in the shared memory

>>>> 	munmap

>>>>

>>>> repeat the above steps.

>>>

>>> Does this work as you would expect with small pages (and for example

>>> shared file mmap)? I don't want to have a different behaviour between

>>> small and huge pages.

>>

>> The small pages also have this problem, I will try to fix it too.

> 

> Have you run the above tests on a standard file (with small pages)? It's

> strange that we haven't hit this so far with gcc or something else

> generating code (unless they don't use mmap but just sequential writes).

The test code has to be randomly generated, to make sure the content
of the I-cache is always stale. I have attached a simplified test case.

The main portion is excerpted below:
	srand(time(NULL));
	ptr = (unsigned int *)share_mem;
	*ptr++ = 0xd2800000;				//mov x0, #0
	for (i = 0, total = 0; i < 100; i++) {
		value = 0xfff & rand();
		total += value;
		*ptr++ = 0xb1000000 | (value << 10);	//adds x0, x0, #value
	}
	*ptr = 0xd65f03c0;				//ret
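
To make the excerpt above easier to follow, here is a minimal sketch of the same code generation written against an ordinary buffer, so the encoding can be checked on any host. The helper names emit_sum_code() and adds_imm12() are hypothetical and are not part of the attached test; the instruction words are the ones used in the excerpt.

```c
#include <stddef.h>
#include <stdint.h>

/* A64 instruction words used by the test case in this thread. */
#define INSN_MOV_X0_0	0xd2800000u	/* mov  x0, #0 */
#define INSN_ADDS_X0	0xb1000000u	/* adds x0, x0, #imm12 (imm12 in bits 21:10) */
#define INSN_RET	0xd65f03c0u	/* ret */

/*
 * Hypothetical helper: emit "mov x0, #0", one "adds" per immediate, then
 * "ret". buf must have room for n + 2 words. Returns the sum that the
 * generated code would leave in x0.
 */
uint32_t emit_sum_code(uint32_t *buf, const uint32_t *imm, size_t n)
{
	uint32_t total = 0;
	size_t i;

	*buf++ = INSN_MOV_X0_0;
	for (i = 0; i < n; i++) {
		uint32_t value = imm[i] & 0xfff;	/* keep only 12 bits */

		total += value;
		*buf++ = INSN_ADDS_X0 | (value << 10);
	}
	*buf = INSN_RET;
	return total;
}

/* Recover the 12-bit immediate from an "adds" word emitted above. */
uint32_t adds_imm12(uint32_t insn)
{
	return (insn >> 10) & 0xfff;
}
```

Because every run uses different immediates, a stale I-cache line shows up as a result that does not match the expected total.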

> 

> If both cases need solving, we might better move the fix in the

> __sync_icache_dcache() function. Untested:

Yes.

At first I also wanted to fix it as below. But I'm not sure when PageDirty
will be cleared, and if two or more processes mmap it as executable, the cache
operations will be duplicated. At present, I have not found any good place to clear
PG_dcache_clean. So the modification below may be the best choice: concise and clear.

> 

> ------------8<----------------

> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

> index dbd12ea8ce68..c753fa804165 100644

> --- a/arch/arm64/mm/flush.c

> +++ b/arch/arm64/mm/flush.c

> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

>  	if (!page_mapping(page))

>  		return;

>  

> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

> +	    PageDirty(page))

>  		sync_icache_aliases(page_address(page),

>  				    PAGE_SIZE << compound_order(page));

>  	else if (icache_is_aivivt())

> ----------------8<---------------------

> 

> BTW, can you make your tests (source) available somewhere?

Both cases worked well with this patch.

> 

> Thanks.

>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
#include <time.h>		/* time() for srand() */
#include <unistd.h>		/* lseek(), write(), close() */
#include <sys/mman.h>
#include <sys/stat.h>

#define FILENAME		"/mnt/huge/test_file"
#define TST_MMAP_SIZE		0x200000

typedef unsigned int (*TEST_FUNC_T)(void);

/*
 * mkdir -p /mnt/huge
 * echo 20 > /proc/sys/vm/nr_hugepages
 * mount none /mnt/huge -t hugetlbfs -o pagesize=2048K
 */
int main(void)
{
	int i;
	int fd;
	int ret;
	void *share_mem;
	size_t size;
	struct stat sb;
	TEST_FUNC_T func_ptr;
	unsigned int value, total;
	unsigned int *ptr;

	fd = open(FILENAME, O_RDWR | O_CREAT, 0644);	/* O_CREAT requires a mode */
	if (fd == -1) {
		printf("Open file %s failed P1: %s\n", FILENAME, strerror(errno));
		return 1;
	}

	lseek(fd, TST_MMAP_SIZE - 1, SEEK_SET);  
	write(fd, "", 1);

	share_mem = mmap(NULL, TST_MMAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (share_mem == MAP_FAILED) {
		printf("Call mmap failed P1: %s\n", strerror(errno));
		exit(1);
	}

	close(fd);

	srand(time(NULL));
	ptr = (unsigned int *)share_mem;
	*ptr++ = 0xd2800000;				//mov x0, #0
	for (i = 0, total = 0; i < 100; i++) {
		value = 0xfff & rand();
		total += value;
		*ptr++ = 0xb1000000 | (value << 10);	//adds x0, x0, #value
	}
	*ptr = 0xd65f03c0;				//ret

	//__clear_cache((char *)share_mem, (char *)share_mem + 0x200);

	ret = msync(share_mem, TST_MMAP_SIZE, MS_SYNC);
	if (ret) {
		printf("Call msync failed: %s\n", strerror(errno));
		exit(1);
	}

	ret = munmap(share_mem, TST_MMAP_SIZE);
	if (ret) {
		printf("Call munmap failed P1: %s\n", strerror(errno));
		exit(1);
	}

	fd = open(FILENAME, O_RDONLY);	/* S_IXUSR is a mode bit, not an open flag */
	if (fd == -1) {
		printf("Open file %s failed P2: %s\n", FILENAME, strerror(errno));
		return 1;
	}

	if (fstat(fd, &sb) == -1) {
		printf("Call fstat failed: %s\n", strerror(errno));
		exit(1);
	}
	size = sb.st_size;

	func_ptr = (TEST_FUNC_T)mmap(NULL, size, PROT_EXEC, MAP_SHARED, fd, 0);
	if (func_ptr == MAP_FAILED) {
		printf("Call mmap failed P2: %s\n", strerror(errno));
		exit(1);
	}

	close(fd);

	value = func_ptr();
	printf("Test is %s: The result is 0x%x, expect = 0x%x\n", (value == total) ? "Passed" : "Failed", value, total);

	ret = munmap((void *)func_ptr, size);	/* unmap the executable mapping */
	if (ret) {
		printf("Call munmap failed P2: %s\n", strerror(errno));
		exit(1);
	}

	return 0;
}
Catalin Marinas July 8, 2016, 4:13 p.m. UTC | #2
On Fri, Jul 08, 2016 at 11:24:26PM +0800, Leizhen (ThunderTown) wrote:
> On 2016/7/8 21:54, Catalin Marinas wrote:

> > On Fri, Jul 08, 2016 at 11:36:57AM +0800, Leizhen (ThunderTown) wrote:

> >> On 2016/7/7 23:37, Catalin Marinas wrote:

> >>> On Thu, Jul 07, 2016 at 08:09:04PM +0800, Zhen Lei wrote:

> >>>> At present, PG_dcache_clean is only cleared when the related huge page

> >>>> is about to be freed. But sometimes, there maybe a process is in charge

> >>>> to copy binary codes into a shared memory, and notifies other processes

> >>>> to execute base on that. For the first time, there is no problem, because

> >>>> the default value of page->flags is PG_dcache_clean cleared. So the cache

> >>>> will be maintained at the time of set_pte_at for other processes. But if

> >>>> the content of the shared memory have been updated again, there is no

> >>>> cache operations, because the PG_dcache_clean is still set.

> >>>>

> >>>> For example:

> >>>> Process A

> >>>> 	open a hugetlbfs file

> >>>> 	mmap it as a shared memory

> >>>> 	copy some binary codes into it

> >>>> 	munmap

> >>>>

> >>>> Process B

> >>>> 	open the hugetlbfs file

> >>>> 	mmap it as a shared memory, executable

> >>>> 	invoke the functions in the shared memory

> >>>> 	munmap

> >>>>

> >>>> repeat the above steps.

> >>>

> >>> Does this work as you would expect with small pages (and for example

> >>> shared file mmap)? I don't want to have a different behaviour between

> >>> small and huge pages.

> >>

> >> The small pages also have this problem, I will try to fix it too.

[...]
> > If both cases need solving, we might better move the fix in the

> > __sync_icache_dcache() function. Untested:

>

> At first I also want to fix it as below. But I'm not sure which time the PageDirty

> will be cleared, and if two or more processes mmap it as executable, cache operations

> will be duplicated. At present, I really have not found any good place to clear

> PG_dcache_clean. So the below modification may be the best choice, concisely and clearly.

> 

> > ------------8<----------------

> > diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

> > index dbd12ea8ce68..c753fa804165 100644

> > --- a/arch/arm64/mm/flush.c

> > +++ b/arch/arm64/mm/flush.c

> > @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

> >  	if (!page_mapping(page))

> >  		return;

> >  

> > -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

> > +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

> > +	    PageDirty(page))

> >  		sync_icache_aliases(page_address(page),

> >  				    PAGE_SIZE << compound_order(page));

> >  	else if (icache_is_aivivt())

> > ----------------8<---------------------

> > 

> > BTW, can you make your tests (source) available somewhere?

>

> Both cases worked well with this patch.


Now I'm even more confused ;). IIUC, after an msync() in user space we
should flush the pages to disk via write_cache_pages(). This function
calls clear_page_dirty_for_io() after which PageDirty() is no longer
true. I can't tell how a subsequent mmap() can see the written pages as
dirty.
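
To spell out the condition being questioned here, the following is a toy user-space model of the check in __sync_icache_dcache() before and with the proposed change. It is only a sketch: struct toy_page and the needs_sync_*() helpers are hypothetical stand-ins, not kernel code, and they model nothing beyond the two flags.

```c
#include <stdbool.h>

/* Toy stand-in for the two page flags discussed here; not struct page. */
struct toy_page {
	bool dcache_clean;	/* models PG_dcache_clean */
	bool dirty;		/* models PageDirty() */
};

/* Models test_and_set_bit(): set the flag, return its previous value. */
static bool test_and_set_clean(struct toy_page *p)
{
	bool old = p->dcache_clean;

	p->dcache_clean = true;
	return old;
}

/* Condition before the patch: sync only when the flag was found clear. */
bool needs_sync_old(struct toy_page *p)
{
	return !test_and_set_clean(p);
}

/* Condition with the patch: also sync while the page is still dirty. */
bool needs_sync_patched(struct toy_page *p)
{
	return !test_and_set_clean(p) || p->dirty;
}
```

In this model, once the flag is set and the dirty state has been cleared by write-back, neither condition triggers cache maintenance again.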

-- 
Catalin
Leizhen (ThunderTown) July 11, 2016, 12:43 p.m. UTC | #3
On 2016/7/9 0:13, Catalin Marinas wrote:
> On Fri, Jul 08, 2016 at 11:24:26PM +0800, Leizhen (ThunderTown) wrote:

>> On 2016/7/8 21:54, Catalin Marinas wrote:

>>> On Fri, Jul 08, 2016 at 11:36:57AM +0800, Leizhen (ThunderTown) wrote:

>>>> On 2016/7/7 23:37, Catalin Marinas wrote:

>>>>> On Thu, Jul 07, 2016 at 08:09:04PM +0800, Zhen Lei wrote:

>>>>>> At present, PG_dcache_clean is only cleared when the related huge page

>>>>>> is about to be freed. But sometimes, there maybe a process is in charge

>>>>>> to copy binary codes into a shared memory, and notifies other processes

>>>>>> to execute base on that. For the first time, there is no problem, because

>>>>>> the default value of page->flags is PG_dcache_clean cleared. So the cache

>>>>>> will be maintained at the time of set_pte_at for other processes. But if

>>>>>> the content of the shared memory have been updated again, there is no

>>>>>> cache operations, because the PG_dcache_clean is still set.

>>>>>>

>>>>>> For example:

>>>>>> Process A

>>>>>> 	open a hugetlbfs file

>>>>>> 	mmap it as a shared memory

>>>>>> 	copy some binary codes into it

>>>>>> 	munmap

>>>>>>

>>>>>> Process B

>>>>>> 	open the hugetlbfs file

>>>>>> 	mmap it as a shared memory, executable

>>>>>> 	invoke the functions in the shared memory

>>>>>> 	munmap

>>>>>>

>>>>>> repeat the above steps.

>>>>>

>>>>> Does this work as you would expect with small pages (and for example

>>>>> shared file mmap)? I don't want to have a different behaviour between

>>>>> small and huge pages.

>>>>

>>>> The small pages also have this problem, I will try to fix it too.

> [...]

>>> If both cases need solving, we might better move the fix in the

>>> __sync_icache_dcache() function. Untested:

>>

>> At first I also want to fix it as below. But I'm not sure which time the PageDirty

>> will be cleared, and if two or more processes mmap it as executable, cache operations

>> will be duplicated. At present, I really have not found any good place to clear

>> PG_dcache_clean. So the below modification may be the best choice, concisely and clearly.

>>

>>> ------------8<----------------

>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

>>> index dbd12ea8ce68..c753fa804165 100644

>>> --- a/arch/arm64/mm/flush.c

>>> +++ b/arch/arm64/mm/flush.c

>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

>>>  	if (!page_mapping(page))

>>>  		return;

>>>  

>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

>>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

>>> +	    PageDirty(page))

>>>  		sync_icache_aliases(page_address(page),

>>>  				    PAGE_SIZE << compound_order(page));

>>>  	else if (icache_is_aivivt())

>>> ----------------8<---------------------

>>>

>>> BTW, can you make your tests (source) available somewhere?

>>

>> Both cases worked well with this patch.

> 

> Now I'm even more confused ;). IIUC, after an msync() in user space we

> should flush the pages to disk via write_cache_pages(). This function

> calls clear_page_dirty_for_io() after which PageDirty() is no longer

> true. I can't tell how a subsequent mmap() can see the written pages as

> dirty.

> 


According to my tracing, both cases invoke an empty fsync function (noop_fsync).

int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
	......
	return file->f_op->fsync(file, start, end, datasync);
}

const struct file_operations hugetlbfs_file_operations = {
	.fsync			= noop_fsync,

static const struct file_operations shmem_file_operations = {
	.mmap		= shmem_mmap,
#ifdef CONFIG_TMPFS
	.fsync		= noop_fsync,
Catalin Marinas July 12, 2016, 3:35 p.m. UTC | #4
On Mon, Jul 11, 2016 at 08:43:32PM +0800, Leizhen (ThunderTown) wrote:
> On 2016/7/9 0:13, Catalin Marinas wrote:

> > On Fri, Jul 08, 2016 at 11:24:26PM +0800, Leizhen (ThunderTown) wrote:

> >> On 2016/7/8 21:54, Catalin Marinas wrote:

> >>> ------------8<----------------

> >>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

> >>> index dbd12ea8ce68..c753fa804165 100644

> >>> --- a/arch/arm64/mm/flush.c

> >>> +++ b/arch/arm64/mm/flush.c

> >>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

> >>>  	if (!page_mapping(page))

> >>>  		return;

> >>>  

> >>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

> >>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

> >>> +	    PageDirty(page))

> >>>  		sync_icache_aliases(page_address(page),

> >>>  				    PAGE_SIZE << compound_order(page));

> >>>  	else if (icache_is_aivivt())

> >>> ----------------8<---------------------

> >>>

> >>> BTW, can you make your tests (source) available somewhere?

> >>

> >> Both cases worked well with this patch.

> > 

> > Now I'm even more confused ;). IIUC, after an msync() in user space we

> > should flush the pages to disk via write_cache_pages(). This function

> > calls clear_page_dirty_for_io() after which PageDirty() is no longer

> > true. I can't tell how a subsequent mmap() can see the written pages as

> > dirty.

> 

> As my tracing, both cases invoked empty function.

> 

> int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)

> 	......

> 	return file->f_op->fsync(file, start, end, datasync);

> }

> 

> const struct file_operations hugetlbfs_file_operations = {

> 	.fsync			= noop_fsync,

> 

> static const struct file_operations shmem_file_operations = {

> 	.mmap		= shmem_mmap,

> #ifdef CONFIG_TMPFS

> 	.fsync		= noop_fsync,


I was referring to standard filesystem (e.g. ext4) writes where, IIUC,
the PageDirty() status is cleared after I/O but it's not necessarily
removed from the page cache.

-- 
Catalin
Leizhen (ThunderTown) July 20, 2016, 2:46 a.m. UTC | #5
On 2016/7/12 23:35, Catalin Marinas wrote:
> On Mon, Jul 11, 2016 at 08:43:32PM +0800, Leizhen (ThunderTown) wrote:

>> On 2016/7/9 0:13, Catalin Marinas wrote:

>>> On Fri, Jul 08, 2016 at 11:24:26PM +0800, Leizhen (ThunderTown) wrote:

>>>> On 2016/7/8 21:54, Catalin Marinas wrote:

>>>>> ------------8<----------------

>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

>>>>> index dbd12ea8ce68..c753fa804165 100644

>>>>> --- a/arch/arm64/mm/flush.c

>>>>> +++ b/arch/arm64/mm/flush.c

>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

>>>>>  	if (!page_mapping(page))

>>>>>  		return;

>>>>>  

>>>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

>>>>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

>>>>> +	    PageDirty(page))

>>>>>  		sync_icache_aliases(page_address(page),

>>>>>  				    PAGE_SIZE << compound_order(page));

>>>>>  	else if (icache_is_aivivt())

>>>>> ----------------8<---------------------

Hi, Catalin:
  Do you plan to send this patch? My colleagues told me that if our patches are quite
different, it should be Signed-off-by you.

  I searched the whole Linux source: __sync_icache_dcache is only called by set_pte_at,
and its check conditions (especially pte_exec) will limit its impact.

	if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
		__sync_icache_dcache(pte, addr);

>>>>>

>>>>> BTW, can you make your tests (source) available somewhere?

>>>>

>>>> Both cases worked well with this patch.

>>>

>>> Now I'm even more confused ;). IIUC, after an msync() in user space we

>>> should flush the pages to disk via write_cache_pages(). This function

>>> calls clear_page_dirty_for_io() after which PageDirty() is no longer

>>> true. I can't tell how a subsequent mmap() can see the written pages as

>>> dirty.

>>

>> As my tracing, both cases invoked empty function.

>>

>> int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)

>> 	......

>> 	return file->f_op->fsync(file, start, end, datasync);

>> }

>>

>> const struct file_operations hugetlbfs_file_operations = {

>> 	.fsync			= noop_fsync,

>>

>> static const struct file_operations shmem_file_operations = {

>> 	.mmap		= shmem_mmap,

>> #ifdef CONFIG_TMPFS

>> 	.fsync		= noop_fsync,

> 

> I was referring to standard filesystem (e.g. ext4) writes where, IIUC,

> the PageDirty() status is cleared after I/O but it's not necessarily

> removed from the page cache.

>
Catalin Marinas July 20, 2016, 9:19 a.m. UTC | #6
On Wed, Jul 20, 2016 at 10:46:27AM +0800, Leizhen (ThunderTown) wrote:
> >>>> On 2016/7/8 21:54, Catalin Marinas wrote:

> >>>>> ------------8<----------------

> >>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

> >>>>> index dbd12ea8ce68..c753fa804165 100644

> >>>>> --- a/arch/arm64/mm/flush.c

> >>>>> +++ b/arch/arm64/mm/flush.c

> >>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

> >>>>>  	if (!page_mapping(page))

> >>>>>  		return;

> >>>>>  

> >>>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

> >>>>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

> >>>>> +	    PageDirty(page))

> >>>>>  		sync_icache_aliases(page_address(page),

> >>>>>  				    PAGE_SIZE << compound_order(page));

> >>>>>  	else if (icache_is_aivivt())

> >>>>> ----------------8<---------------------

> 

> Do you plan to send this patch? My colleagues told me that if our

> patches are quite different, it should be Signed-off-by you.


The reason I'm not sending it is that I don't fully understand how it
solves the problem for a shared file mmap(), not just hugetlbfs. As I
said in an earlier email: after an msync() in user space we
should flush the pages to disk via write_cache_pages(). This function
calls clear_page_dirty_for_io() after which PageDirty() is no longer
true. I can't tell how a subsequent mmap() can see the written pages as
dirty.

> I searched all Linux source code, __sync_icache_dcache is only called

> by set_pte_at, and some check conditions(especially pte_exec) will

> limit its impact.

> 

> 	if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))

> 		__sync_icache_dcache(pte, addr);


Yes, and set_pte_at() would be called as a result of a page fault when
accessing the mmap'ed file.

-- 
Catalin
Leizhen (ThunderTown) Aug. 22, 2016, 4:19 a.m. UTC | #7
On 2016/7/20 17:19, Catalin Marinas wrote:
> On Wed, Jul 20, 2016 at 10:46:27AM +0800, Leizhen (ThunderTown) wrote:

>>>>>> On 2016/7/8 21:54, Catalin Marinas wrote:

>>>>>>> ------------8<----------------

>>>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

>>>>>>> index dbd12ea8ce68..c753fa804165 100644

>>>>>>> --- a/arch/arm64/mm/flush.c

>>>>>>> +++ b/arch/arm64/mm/flush.c

>>>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

>>>>>>>  	if (!page_mapping(page))

>>>>>>>  		return;

>>>>>>>  

>>>>>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

>>>>>>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

>>>>>>> +	    PageDirty(page))

>>>>>>>  		sync_icache_aliases(page_address(page),

>>>>>>>  				    PAGE_SIZE << compound_order(page));

>>>>>>>  	else if (icache_is_aivivt())

>>>>>>> ----------------8<---------------------

>>

>> Do you plan to send this patch? My colleagues told me that if our

>> patches are quite different, it should be Signed-off-by you.

> 

> The reason I'm not sending it is that I don't fully understand how it

> solves the problem for a shared file mmap(), not just hugetlbfs. As I

> said in an earlier email: after an msync() in user space we

> should flush the pages to disk via write_cache_pages(). This function

Hi Catalin:
   I'm sorry, this was my mistake. The previous small-pages test was actually run on ramfs.
Today I ran the test on a hard-disk filesystem, and it worked well without this patch.

Summarized as follows:
small pages on ramfs: needs this patch
small pages on hard-disk fs: does not need this patch
hugetlbfs: needs this patch



> calls clear_page_dirty_for_io() after which PageDirty() is no longer

> true. I can't tell how a subsequent mmap() can see the written pages as

> dirty.

> 

>> I searched all Linux source code, __sync_icache_dcache is only called

>> by set_pte_at, and some check conditions(especially pte_exec) will

>> limit its impact.

>>

>> 	if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))

>> 		__sync_icache_dcache(pte, addr);

> 

> Yes, and set_pte_at() would be called as a result of a page fault when

> accessing the mmap'ed file.

>
Leizhen (ThunderTown) Aug. 24, 2016, 9 a.m. UTC | #8
On 2016/8/24 1:28, Catalin Marinas wrote:
> On Mon, Aug 22, 2016 at 12:19:04PM +0800, Leizhen (ThunderTown) wrote:

>> On 2016/7/20 17:19, Catalin Marinas wrote:

>>> On Wed, Jul 20, 2016 at 10:46:27AM +0800, Leizhen (ThunderTown) wrote:

>>>>>>>> On 2016/7/8 21:54, Catalin Marinas wrote:

>>>>>>>>> ------------8<----------------

>>>>>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

>>>>>>>>> index dbd12ea8ce68..c753fa804165 100644

>>>>>>>>> --- a/arch/arm64/mm/flush.c

>>>>>>>>> +++ b/arch/arm64/mm/flush.c

>>>>>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

>>>>>>>>>  	if (!page_mapping(page))

>>>>>>>>>  		return;

>>>>>>>>>  

>>>>>>>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

>>>>>>>>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

>>>>>>>>> +	    PageDirty(page))

>>>>>>>>>  		sync_icache_aliases(page_address(page),

>>>>>>>>>  				    PAGE_SIZE << compound_order(page));

>>>>>>>>>  	else if (icache_is_aivivt())

>>>>>>>>> ----------------8<---------------------

>>>>

>>>> Do you plan to send this patch? My colleagues told me that if our

>>>> patches are quite different, it should be Signed-off-by you.

>>>

>>> The reason I'm not sending it is that I don't fully understand how it

>>> solves the problem for a shared file mmap(), not just hugetlbfs. As I

>>> said in an earlier email: after an msync() in user space we

>>> should flush the pages to disk via write_cache_pages(). This function

>> Hi Catalin:

>>    I'm so sorry for my fault. The previous small pages test result I actually ran on ramfs.

>> Today, I ran the case on harddisk fs, it worked well without this patch.

>>

>> Summarized as follows:

>> small pages on ramfs: need this patch

>> small pages on harddisk fs: no need this patch

>> hugetlbfs: need this patch

> 

> I would add:

> 

> small pages over nfs: fails with or without this patch

> 

> (tested on Juno, Cortex-A57; seems to be fixed if I remove the

> PG_dcache_clean test altogether but, well, we end up over-flushing)

> 

> I assume that when using a hard drive, it goes through the block I/O

> layer and we may have a flush_dcache_page() called when the kernel is

> about to read a page that has been mapped in user space. This would

> clear the PG_dcache_clean bit and subsequent __sync_icache_dcache()

> would perform cache maintenance.

> 

> Could you try on your system the test case without the msync() call? I'm

According to my test results, without msync the test case may fail.

10-175-112-211:~ # ./tst_small_page_no_msync
Test is Failed: The result is 0x316b9, expect = 0x365a5
10-175-112-211:~ # ./tst_small_page_no_msync
Test is Failed: The result is 0x31023, expect = 0x31efa
10-175-112-211:~ # ./tst_small_page_no_msync
Test is Passed: The result is 0x31efa, expect = 0x31efa

10-175-112-211:~ # ./tst_small_page
Test is Passed: The result is 0x31eb7, expect = 0x31eb7
10-175-112-211:~ # ./tst_small_page
Test is Passed: The result is 0x3111f, expect = 0x3111f
10-175-112-211:~ # ./tst_small_page
Test is Passed: The result is 0x3111f, expect = 0x3111f

> not sure whether munmap() would trigger an immediate write-back, in

> which case we may see the issue even with the filesystem on a hard

> drive.

>
Catalin Marinas Aug. 24, 2016, 10:30 a.m. UTC | #9
On Wed, Aug 24, 2016 at 05:00:50PM +0800, Leizhen (ThunderTown) wrote:
> 

> 

> On 2016/8/24 1:28, Catalin Marinas wrote:

> > On Mon, Aug 22, 2016 at 12:19:04PM +0800, Leizhen (ThunderTown) wrote:

> >> On 2016/7/20 17:19, Catalin Marinas wrote:

> >>> On Wed, Jul 20, 2016 at 10:46:27AM +0800, Leizhen (ThunderTown) wrote:

> >>>>>>>> On 2016/7/8 21:54, Catalin Marinas wrote:

> >>>>>>>>> ------------8<----------------

> >>>>>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

> >>>>>>>>> index dbd12ea8ce68..c753fa804165 100644

> >>>>>>>>> --- a/arch/arm64/mm/flush.c

> >>>>>>>>> +++ b/arch/arm64/mm/flush.c

> >>>>>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

> >>>>>>>>>  	if (!page_mapping(page))

> >>>>>>>>>  		return;

> >>>>>>>>>  

> >>>>>>>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

> >>>>>>>>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

> >>>>>>>>> +	    PageDirty(page))

> >>>>>>>>>  		sync_icache_aliases(page_address(page),

> >>>>>>>>>  				    PAGE_SIZE << compound_order(page));

> >>>>>>>>>  	else if (icache_is_aivivt())

> >>>>>>>>> ----------------8<---------------------

> >>>>

> >>>> Do you plan to send this patch? My colleagues told me that if our

> >>>> patches are quite different, it should be Signed-off-by you.

> >>>

> >>> The reason I'm not sending it is that I don't fully understand how it

> >>> solves the problem for a shared file mmap(), not just hugetlbfs. As I

> >>> said in an earlier email: after an msync() in user space we

> >>> should flush the pages to disk via write_cache_pages(). This function

> >> Hi Catalin:

> >>    I'm so sorry for my fault. The previous small pages test result I actually ran on ramfs.

> >> Today, I ran the case on harddisk fs, it worked well without this patch.

> >>

> >> Summarized as follows:

> >> small pages on ramfs: need this patch

> >> small pages on harddisk fs: no need this patch

> >> hugetlbfs: need this patch

> > 

> > I would add:

> > 

> > small pages over nfs: fails with or without this patch

> > 

> > (tested on Juno, Cortex-A57; seems to be fixed if I remove the

> > PG_dcache_clean test altogether but, well, we end up over-flushing)

> > 

> > I assume that when using a hard drive, it goes through the block I/O

> > layer and we may have a flush_dcache_page() called when the kernel is

> > about to read a page that has been mapped in user space. This would

> > clear the PG_dcache_clean bit and subsequent __sync_icache_dcache()

> > would perform cache maintenance.

> > 

> > Could you try on your system the test case without the msync() call? I'm

> 

> According to my test results: without msync, the test case may failed.


Thanks. Just to be clear, does the test generate a file on a hard
drive?

> 10-175-112-211:~ # ./tst_small_page_no_msync

> Test is Failed: The result is 0x316b9, expect = 0x365a5

> 10-175-112-211:~ # ./tst_small_page_no_msync

> Test is Failed: The result is 0x31023, expect = 0x31efa

> 10-175-112-211:~ # ./tst_small_page_no_msync

> Test is Passed: The result is 0x31efa, expect = 0x31efa

> 

> 10-175-112-211:~ # ./tst_small_page

> Test is Passed: The result is 0x31eb7, expect = 0x31eb7

> 10-175-112-211:~ # ./tst_small_page

> Test is Passed: The result is 0x3111f, expect = 0x3111f

> 10-175-112-211:~ # ./tst_small_page

> Test is Passed: The result is 0x3111f, expect = 0x3111f


How many tests did you run for the "passed" case? With NFS it may
sometimes take minutes before a failure (I use the "watch" command with a
slightly modified test to return non-zero in case of value mismatch).

While we indeed see failures on multiple filesystem types, I wonder
whether this test case is actually expected to work. If I modify the
test to pass O_TRUNC to open(), I can no longer see failures. So any
standard tool that copies/creates executable files (gcc, dpkg, cp, rsync
etc.) wouldn't encounter such issues since they truncate the original
file and old page cache pages would be removed.
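
For reference, the O_TRUNC variant could look roughly like the sketch below. The helpers open_truncated() and file_size() are hypothetical names, and the 0644 mode is an assumption; only the O_TRUNC flag itself comes from the discussion above.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path for rewriting and discard its old
 * contents. Returns the file descriptor, or -1 with errno set by open().
 */
int open_truncated(const char *path)
{
	return open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
}

/* Size of the file behind fd, or -1 if fstat() fails. */
off_t file_size(int fd)
{
	struct stat sb;

	if (fstat(fd, &sb) == -1)
		return -1;
	return sb.st_size;
}
```

After open_truncated() the file is empty, so the writer has to extend it again before mapping it.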

Do you have a real use-case where a task mmap's an executable file,
modifies it in place and expects another task to see the new
instructions without user-space cache maintenance?

-- 
Catalin
Catalin Marinas Aug. 25, 2016, 9:30 a.m. UTC | #10
On Thu, Aug 25, 2016 at 09:42:26AM +0800, Leizhen (ThunderTown) wrote:
> On 2016/8/24 18:30, Catalin Marinas wrote:

> >>>>>>>>>> On 2016/7/8 21:54, Catalin Marinas wrote:

> >>>>>>>>>>> ------------8<----------------

> >>>>>>>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

> >>>>>>>>>>> index dbd12ea8ce68..c753fa804165 100644

> >>>>>>>>>>> --- a/arch/arm64/mm/flush.c

> >>>>>>>>>>> +++ b/arch/arm64/mm/flush.c

> >>>>>>>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

> >>>>>>>>>>>  	if (!page_mapping(page))

> >>>>>>>>>>>  		return;

> >>>>>>>>>>>  

> >>>>>>>>>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

> >>>>>>>>>>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

> >>>>>>>>>>> +	    PageDirty(page))

> >>>>>>>>>>>  		sync_icache_aliases(page_address(page),

> >>>>>>>>>>>  				    PAGE_SIZE << compound_order(page));

> >>>>>>>>>>>  	else if (icache_is_aivivt())

> >>>>>>>>>>> ----------------8<---------------------

[...]
> > While we indeed see failures on multiple filesystem types, I wonder

> > whether this test case is actually expected to work. If I modify the

> > test to pass O_TRUNC to open(), I can no longer see failures. So any

> > standard tool that copies/creates executable files (gcc, dpkg, cp, rsync

> > etc.) wouldn't encounter such issues since they truncate the original

> > file and old page cache pages would be removed.

> > 

> > Do you have a real use-case where a task mmap's an executable file,

> > modifies it in place and expects another task to see the new

> > instructions without user-space cache maintenance?

> 

> No, it's just a test case created by testers.


In this case I propose we ignore this patch and you adjust the test to
use O_TRUNC, at least until we find a real scenario where this would
matter.

-- 
Catalin
Leizhen (ThunderTown) Aug. 25, 2016, 11:27 a.m. UTC | #11
On 2016/8/25 17:30, Catalin Marinas wrote:
> On Thu, Aug 25, 2016 at 09:42:26AM +0800, Leizhen (ThunderTown) wrote:

>> On 2016/8/24 18:30, Catalin Marinas wrote:

>>>>>>>>>>>> On 2016/7/8 21:54, Catalin Marinas wrote:

>>>>>>>>>>>>> ------------8<----------------

>>>>>>>>>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c

>>>>>>>>>>>>> index dbd12ea8ce68..c753fa804165 100644

>>>>>>>>>>>>> --- a/arch/arm64/mm/flush.c

>>>>>>>>>>>>> +++ b/arch/arm64/mm/flush.c

>>>>>>>>>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)

>>>>>>>>>>>>>  	if (!page_mapping(page))

>>>>>>>>>>>>>  		return;

>>>>>>>>>>>>>  

>>>>>>>>>>>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))

>>>>>>>>>>>>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||

>>>>>>>>>>>>> +	    PageDirty(page))

>>>>>>>>>>>>>  		sync_icache_aliases(page_address(page),

>>>>>>>>>>>>>  				    PAGE_SIZE << compound_order(page));

>>>>>>>>>>>>>  	else if (icache_is_aivivt())

>>>>>>>>>>>>> ----------------8<---------------------

> [...]

>>> While we indeed see failures on multiple filesystem types, I wonder

>>> whether this test case is actually expected to work. If I modify the

>>> test to pass O_TRUNC to open(), I can no longer see failures. So any

>>> standard tool that copies/creates executable files (gcc, dpkg, cp, rsync

>>> etc.) wouldn't encounter such issues since they truncate the original

>>> file and old page cache pages would be removed.

>>>

>>> Do you have a real use-case where a task mmap's an executable file,

>>> modifies it in place and expects another task to see the new

>>> instructions without user-space cache maintenance?

>>

>> No, it's just a test case created by testers.

> 

> In this case I propose we ignore this patch and you adjust the test to

> use O_TRUNC, at least until we find a real scenario where this would

> matter.

OK, thanks. We currently add __clear_cache in user space.
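
A minimal sketch of what such a user-space call can look like, assuming GCC or Clang, whose __builtin___clear_cache() is the builtin behind __clear_cache(). The helper name publish_code() is hypothetical, and where exactly the call belongs in a multi-process setup is not something this thread settles.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes of machine code into dst, then ask
 * the compiler runtime to make the instruction stream coherent with the
 * data just written to [dst, dst + n).
 */
void publish_code(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	__builtin___clear_cache((char *)dst, (char *)dst + n);
}
```

In the attached test this corresponds to the commented-out __clear_cache() line that follows the code generation loop.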

>

Patch

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index dbd12ea8ce68..c753fa804165 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -75,7 +75,8 @@  void __sync_icache_dcache(pte_t pte, unsigned long addr)
 	if (!page_mapping(page))
 		return;
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||
+	    PageDirty(page))
 		sync_icache_aliases(page_address(page),
 				    PAGE_SIZE << compound_order(page));
 	else if (icache_is_aivivt())