Message ID | 20201009195033.3208459-23-ira.weiny@intel.com |
---|---|
State | New |
Series | PMEM: Introduce stray write protection for PMEM |
On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
> On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
> > The kmap() calls in this FS are localized to a single thread. To avoid
> > the overhead of global PKRS updates use the new kmap_thread() call.
> >
> > @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
> >
> >  static inline void f2fs_copy_page(struct page *src, struct page *dst)
> >  {
> > -	char *src_kaddr = kmap(src);
> > -	char *dst_kaddr = kmap(dst);
> > +	char *src_kaddr = kmap_thread(src);
> > +	char *dst_kaddr = kmap_thread(dst);
> >
> >  	memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > -	kunmap(dst);
> > -	kunmap(src);
> > +	kunmap_thread(dst);
> > +	kunmap_thread(src);
> >  }
>
> Wouldn't it make more sense to switch cases like this to kmap_atomic()?
> The pages are only mapped to do a memcpy(), then they're immediately unmapped.

Maybe you missed the earlier thread from Thomas trying to do something
similar for rather different reasons ...

https://lore.kernel.org/lkml/20200919091751.011116649@linutronix.de/
On Sat, Oct 10, 2020 at 01:39:54AM +0100, Matthew Wilcox wrote:
> On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
> > On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
> > > The kmap() calls in this FS are localized to a single thread. To avoid
> > > the overhead of global PKRS updates use the new kmap_thread() call.
> > >
> > > @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
> > >
> > >  static inline void f2fs_copy_page(struct page *src, struct page *dst)
> > >  {
> > > -	char *src_kaddr = kmap(src);
> > > -	char *dst_kaddr = kmap(dst);
> > > +	char *src_kaddr = kmap_thread(src);
> > > +	char *dst_kaddr = kmap_thread(dst);
> > >
> > >  	memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > > -	kunmap(dst);
> > > -	kunmap(src);
> > > +	kunmap_thread(dst);
> > > +	kunmap_thread(src);
> > >  }
> >
> > Wouldn't it make more sense to switch cases like this to kmap_atomic()?
> > The pages are only mapped to do a memcpy(), then they're immediately unmapped.
>
> Maybe you missed the earlier thread from Thomas trying to do something
> similar for rather different reasons ...
>
> https://lore.kernel.org/lkml/20200919091751.011116649@linutronix.de/

I did miss it. I'm not subscribed to any of the mailing lists it was sent to.

Anyway, it shouldn't matter. Patchsets should be standalone, and not require
reading random prior threads on linux-kernel to understand.

And I still don't really understand. After this patchset, there is still code
nearly identical to the above (doing a temporary mapping just for a memcpy) that
would still be using kmap_atomic(). Is the idea that later, such code will be
converted to use kmap_thread() instead? If not, why use one over the other?

- Eric
On Fri, 2020-10-09 at 14:34 -0700, Eric Biggers wrote:
> On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > The kmap() calls in this FS are localized to a single thread. To
> > avoid the overhead of global PKRS updates use the new
> > kmap_thread() call.
> >
> > Cc: Jaegeuk Kim <jaegeuk@kernel.org>
> > Cc: Chao Yu <chao@kernel.org>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  fs/f2fs/f2fs.h | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index d9e52a7f3702..ff72a45a577e 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
> >
> >  static inline void f2fs_copy_page(struct page *src, struct page *dst)
> >  {
> > -	char *src_kaddr = kmap(src);
> > -	char *dst_kaddr = kmap(dst);
> > +	char *src_kaddr = kmap_thread(src);
> > +	char *dst_kaddr = kmap_thread(dst);
> >
> >  	memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > -	kunmap(dst);
> > -	kunmap(src);
> > +	kunmap_thread(dst);
> > +	kunmap_thread(src);
> >  }
>
> Wouldn't it make more sense to switch cases like this to
> kmap_atomic()?
> The pages are only mapped to do a memcpy(), then they're immediately
> unmapped.

On a VIPT/VIVT architecture, this is horrendously wasteful. You're
taking something that was mapped at colour c_src, mapping it to a new
address src_kaddr, which is likely a different colour and necessitates
flushing the original c_src, then you copy it to dst_kaddr, which is
also likely a different colour from c_dst, so dst_kaddr has to be
flushed on kunmap and c_dst has to be invalidated on kmap. What we
should have is an architectural primitive for doing this, something
like kmemcopy_arch(dst, src). PIPT architectures can implement it as
the above (possibly losing kmap if they don't need it) but VIPT/VIVT
architectures can set up a correctly coloured mapping so they can
simply copy from c_src to c_dst without any need to flush and the data
arrives cache hot at c_dst.

James
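To make the colour argument concrete, here is a small userspace sketch of the address arithmetic a VIPT/VIVT implementation of the proposed kmemcopy_arch() might use to pick a temporary kernel address whose cache colour matches the page's existing mapping. Everything here is an assumption for illustration: the 4 KiB page / 16 KiB cache-way geometry and the helpers page_colour() and coloured_slot() are hypothetical and do not correspond to any existing kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (hypothetical): 4 KiB pages and a virtually indexed
 * cache way of 16 KiB, which gives 4 page colours. */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define CACHE_WAY_SZ (16UL * 1024)
#define NR_COLOURS   (CACHE_WAY_SZ / PAGE_SIZE)

/* The colour is the part of the cache index above the page offset.
 * Two virtual aliases of one physical page only share cache lines
 * when their colours are equal. */
static unsigned long page_colour(uintptr_t vaddr)
{
	return (unsigned long)(vaddr >> PAGE_SHIFT) & (NR_COLOURS - 1);
}

/* Hypothetical helper: within a window of NR_COLOURS consecutive pages
 * starting at window_base, return the page whose colour matches
 * existing_vaddr, so a temporary mapping placed there does not alias. */
static uintptr_t coloured_slot(uintptr_t window_base, uintptr_t existing_vaddr)
{
	unsigned long base = page_colour(window_base);
	unsigned long want = page_colour(existing_vaddr);
	/* NR_COLOURS is a power of two, so the mask yields
	 * (want - base) mod NR_COLOURS even when want < base. */
	unsigned long offset = (want - base) & (NR_COLOURS - 1);
	uintptr_t slot = window_base + ((uintptr_t)offset << PAGE_SHIFT);

	assert(page_colour(slot) == want);
	return slot;
}
```

For example, with a mapping window at 0xc0001000 (colour 1) and a source page mapped at 0x40003000 (colour 3), coloured_slot() returns 0xc0003000, which also has colour 3. A real implementation would additionally have to install the PTE and treat the destination page the same way; this sketch only shows the colour selection.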
On Sun, Oct 11, 2020 at 11:56:35PM -0700, Ira Weiny wrote:
> >
> > And I still don't really understand. After this patchset, there is still code
> > nearly identical to the above (doing a temporary mapping just for a memcpy) that
> > would still be using kmap_atomic().
>
> I don't understand. You mean there would be other call sites calling:
>
> kmap_atomic()
> memcpy()
> kunmap_atomic()

Yes, there are tons of places that do this. Try 'git grep -A6 kmap_atomic'
and look for memcpy().

Hence why I'm asking what will be the "recommended" way to do this...
kmap_thread() or kmap_atomic()?

> And since I don't know the call site details if there are kmap_thread() calls
> which are better off as kmap_atomic() calls I think it is worth converting
> them. But I made the assumption that kmap users would already be calling
> kmap_atomic() if they could (because it is more efficient).

Not necessarily. In cases where either one is correct, people might not have
put much thought into which of kmap() and kmap_atomic() they are using.

- Eric
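The kmap_atomic()/memcpy()/kunmap_atomic() pattern discussed above has one ordering rule worth spelling out if f2fs_copy_page() were converted as Eric suggests: kunmap_atomic() takes the mapped address rather than the page, and nested atomic mappings must be released in the reverse order they were taken. The sketch below models only that stack discipline in userspace. The struct page stand-in, mock_kmap_atomic(), mock_kunmap_atomic(), copy_page_atomic() and the slot limit are all hypothetical; the real kernel code also disables preemption and page faults and installs real page-table entries.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096
#define KM_MAX_SLOTS 16 /* assumed nesting limit; the real one is arch-specific */

/* Userspace stand-in for struct page: it simply owns one page of bytes. */
struct page {
	unsigned char data[PAGE_SIZE];
};

/* Mock per-CPU stack of atomic mapping slots. Only the slot ordering is
 * modelled; nothing here touches preemption, page faults or page tables. */
static void *km_stack[KM_MAX_SLOTS];
static int km_top;

static void *mock_kmap_atomic(struct page *page)
{
	assert(km_top < KM_MAX_SLOTS);
	km_stack[km_top++] = page->data;
	return page->data;
}

/* Like kunmap_atomic(), this takes the mapped address, not the page,
 * and it must release the most recently taken mapping first. */
static void mock_kunmap_atomic(void *kaddr)
{
	assert(km_top > 0 && km_stack[km_top - 1] == kaddr);
	km_top--;
}

/* f2fs_copy_page() reworked in the style Eric suggests: map both pages,
 * copy, then unmap in LIFO order (dst first, because it was mapped last). */
static void copy_page_atomic(struct page *src, struct page *dst)
{
	char *src_kaddr = mock_kmap_atomic(src);
	char *dst_kaddr = mock_kmap_atomic(dst);

	memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
	mock_kunmap_atomic(dst_kaddr);
	mock_kunmap_atomic(src_kaddr);
}
```

In the kernel the corresponding sequence would be kmap_atomic(src), kmap_atomic(dst), memcpy(), then kunmap_atomic(dst_kaddr) followed by kunmap_atomic(src_kaddr). Swapping the two unmaps is exactly the mistake the assert in mock_kunmap_atomic() catches.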
On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
> kmap_atomic() is always preferred over kmap()/kmap_thread().
> kmap_atomic() is _much_ more lightweight since its TLB invalidation is
> always CPU-local and never broadcast.
>
> So, basically, unless you *must* sleep while the mapping is in place,
> kmap_atomic() is preferred.

But kmap_atomic() disables preemption, so the _ideal_ interface would map
it only locally, then on preemption make it global. I don't even know
if that _can_ be done. But this email makes it seem like kmap_atomic()
has no downsides.
On Mon, Oct 12, 2020 at 12:53:54PM -0700, Ira Weiny wrote:
> On Mon, Oct 12, 2020 at 05:44:38PM +0100, Matthew Wilcox wrote:
> > On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
> > > kmap_atomic() is always preferred over kmap()/kmap_thread().
> > > kmap_atomic() is _much_ more lightweight since its TLB invalidation is
> > > always CPU-local and never broadcast.
> > >
> > > So, basically, unless you *must* sleep while the mapping is in place,
> > > kmap_atomic() is preferred.
> >
> > But kmap_atomic() disables preemption, so the _ideal_ interface would map
> > it only locally, then on preemption make it global. I don't even know
> > if that _can_ be done. But this email makes it seem like kmap_atomic()
> > has no downsides.
>
> And that is IIUC what Thomas was trying to solve.
>
> Also, Linus brought up that kmap_atomic() has quirks in nesting.[1]
>
> From what I can see all of these discussions support the need to have something
> between kmap() and kmap_atomic().
>
> However, the reason behind converting call sites to kmap_thread() are different
> between Thomas' patch set and mine. Both require more kmap granularity.
> However, they do so with different reasons and underlying implementations but
> with the _same_ resulting semantics; a thread local mapping which is
> preemptable.[2] Therefore they each focus on changing different call sites.
>
> While this patch set is huge I think it serves a valuable purpose to identify a
> large number of call sites which are candidates for this new semantic.

Yes, I agree. My problem with this patch-set is that it ties it to some
Intel feature that almost nobody cares about. Maybe we should care about
it, but you didn't try very hard to make anyone care about it in the
cover letter.

For a future patch-set, I'd like to see you just introduce the new API.
Then you can optimise the Intel implementation of it afterwards. Those
patch-sets have entirely different reviewers.
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d9e52a7f3702..ff72a45a577e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
 
 static inline void f2fs_copy_page(struct page *src, struct page *dst)
 {
-	char *src_kaddr = kmap(src);
-	char *dst_kaddr = kmap(dst);
+	char *src_kaddr = kmap_thread(src);
+	char *dst_kaddr = kmap_thread(dst);
 
 	memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
-	kunmap(dst);
-	kunmap(src);
+	kunmap_thread(dst);
+	kunmap_thread(src);
 }
 
 static inline void f2fs_put_page(struct page *page, int unlock)