Message ID: 20250523161522.409504-4-mhklinux@outlook.com
State: New
Series: fbdev: Add deferred I/O support for contiguous kernel memory framebuffers
Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.15-rc7 next-20250523]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/mhkelley58-gmail-com/mm-Export-vmf_insert_mixed_mkwrite/20250524-001707
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20250523161522.409504-4-mhklinux%40outlook.com
patch subject: [PATCH v3 3/4] fbdev/deferred-io: Support contiguous kernel memory framebuffers
config: arm-randconfig-001-20250524 (https://download.01.org/0day-ci/archive/20250524/202505241553.VSXoFOvX-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250524/202505241553.VSXoFOvX-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version
of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505241553.VSXoFOvX-lkp@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "vmf_insert_pfn" [drivers/video/fbdev/core/fb.ko] undefined!
>> ERROR: modpost: "vmf_insert_mixed_mkwrite" [drivers/video/fbdev/core/fb.ko] undefined!
On Fri, May 23, 2025 at 09:15:21AM -0700, mhkelley58@gmail.com wrote:
> Commit 37b4837959cb ("video: deferred io with physically contiguous
> memory") from the year 2008 purported to add support for contiguous
> kernel memory framebuffers. The motivating device, sh_mobile_lcdcfb, uses
> dma_alloc_coherent() to allocate framebuffer memory, which is likely to
> use alloc_pages(). It's unclear to me how this commit actually worked at
> the time, unless dma_alloc_coherent() was pulling from a CMA pool instead
> of alloc_pages(). Or perhaps alloc_pages() worked differently on the
> arm32 architecture on which sh_mobile_lcdcfb is used.
>
> In any case, for x86 and arm64 today, commit 37b4837959cb9 is not
> sufficient to support contiguous kernel memory framebuffers. The problem
> can be seen with the hyperv_fb driver, which may allocate the framebuffer
> memory using vmalloc() or alloc_pages(), depending on the configuration
> of the Hyper-V guest VM (Gen 1 vs. Gen 2) and the size of the framebuffer.

That language is far too nice. The existing users of fb_defio are all
gravely broken because they violate the DMA API restriction not to poke
into the memory. You can't speculate about what you get from
dma_alloc_coherent(), and it can change behind you all the time.

> Fix this limitation by adding defio support for contiguous kernel memory
> framebuffers. A driver with a framebuffer allocated from contiguous
> kernel memory must set the FBINFO_KMEMFB flag to indicate such.

Honestly, the right thing is to invert the flag. What hyperv_fb is doing
here -- taking kernel memory from the direct mapping or from vmalloc -- is
fine. What the other drivers do is completely broken crap. So add a flag
FBINFO_BROKEN_CRAP to maybe keep the guessing. Or just disable it because
it's dangerous.
Hi

Am 03.06.25 um 03:49 schrieb Michael Kelley:
[...]
>> Will the VMA have VM_PFNMAP or VM_MIXEDMAP set? PFN_SPECIAL is a
>> horrible hack.
>>
>> In another thread, you mention that you use PFN_SPECIAL to bypass the
>> check in vm_mixed_ok(), so VM_MIXEDMAP is likely not set?
> The VMA has VM_PFNMAP set, not VM_MIXEDMAP. It seemed like
> VM_MIXEDMAP is somewhat of a superset of VM_PFNMAP, but maybe that's
> a wrong impression. vm_mixed_ok() does a thorough job of validating the
> use of __vm_insert_mixed(), and since what I did was allowed, I thought
> perhaps it was OK. Your feedback has set me straight, and that's what I
> needed. :-)
>
> But the whole approach is moot with Alistair Popple's patch set that
> eliminates pfn_t. Is there an existing mm API that will do mkwrite on a
> special PTE in a VM_PFNMAP VMA? I didn't see one, but maybe I missed
> it. If there's not one, I'll take a crack at adding it in the next version of my
> patch set.

What is the motivation behind this work? The driver or fbdev as a whole
does not have much of a future anyway.

I'd like to suggest removing hyperv_fb entirely in favor of hyperv_drm?

Best regards
Thomas

>
> Michael
From: Thomas Zimmermann <tzimmermann@suse.de> Sent: Monday, June 2, 2025 11:25 PM
>
> Hi
>
> Am 03.06.25 um 03:49 schrieb Michael Kelley:
> [...]
> >> Will the VMA have VM_PFNMAP or VM_MIXEDMAP set? PFN_SPECIAL is a
> >> horrible hack.
> >>
> >> In another thread, you mention that you use PFN_SPECIAL to bypass the
> >> check in vm_mixed_ok(), so VM_MIXEDMAP is likely not set?
> > The VMA has VM_PFNMAP set, not VM_MIXEDMAP. It seemed like
> > VM_MIXEDMAP is somewhat of a superset of VM_PFNMAP, but maybe that's
> > a wrong impression. vm_mixed_ok() does a thorough job of validating the
> > use of __vm_insert_mixed(), and since what I did was allowed, I thought
> > perhaps it was OK. Your feedback has set me straight, and that's what I
> > needed. :-)
> >
> > But the whole approach is moot with Alistair Popple's patch set that
> > eliminates pfn_t. Is there an existing mm API that will do mkwrite on a
> > special PTE in a VM_PFNMAP VMA? I didn't see one, but maybe I missed
> > it. If there's not one, I'll take a crack at adding it in the next version of my
> > patch set.
>
> What is the motivation behind this work? The driver or fbdev as a whole
> does not have much of a future anyway.
>
> I'd like to suggest removing hyperv_fb entirely in favor of hypervdrm?
>

Yes, I think that's the longer term direction. A couple months ago I had an
email conversation with Saurabh Sengar from the Microsoft Linux team where
he raised this idea. I think the Microsoft folks will need to drive the
deprecation process, as they need to coordinate with the distro vendors who
publish images for running on local Hyper-V and in the Azure cloud. And my
understanding is that the Linux kernel process would want the driver to be
available but marked "deprecated" for a year or so before it actually goes
away.

I do have some concerns about the maturity of the hyperv_drm driver "around
the edges". For example, somebody just recently submitted a patch to flush
output on panic. I have less familiarity with hyperv_drm vs. hyperv_fb, so
some of my concern is probably due to that. We might need to do a review of
hyperv_drm and see if there's anything else to deal with before hyperv_fb
goes away.

This all got started when I was looking at a problem with hyperv_fb, and I
found several other related problems, some of which also existed in
hyperv_drm. You've seen several small'ish fixes from me and Saurabh as a
result, and this issue with mmap()'ing /dev/fb0 is the last one of that
set. This fix is definitely a bit bigger, but it's the right fix. On the
flip side, if we really get on a path to deprecate hyperv_fb, there are
hack fixes for the mmap problem that are smaller and contained to
hyperv_fb. I would be OK with a hack fix in that case.

Michael
On Wed, Jun 04, 2025 at 10:12:45AM +0200, Thomas Zimmermann wrote:
> Hi
>
> Am 03.06.25 um 19:50 schrieb Michael Kelley:
> > From: Thomas Zimmermann <tzimmermann@suse.de> Sent: Monday, June 2, 2025 11:25 PM
> > > Hi
> > >
> > > Am 03.06.25 um 03:49 schrieb Michael Kelley:
> > > [...]
> > > > > Will the VMA have VM_PFNMAP or VM_MIXEDMAP set? PFN_SPECIAL is a
> > > > > horrible hack.
> > > > >
> > > > > In another thread, you mention that you use PFN_SPECIAL to bypass the
> > > > > check in vm_mixed_ok(), so VM_MIXEDMAP is likely not set?
> > > > The VMA has VM_PFNMAP set, not VM_MIXEDMAP. It seemed like
> > > > VM_MIXEDMAP is somewhat of a superset of VM_PFNMAP, but maybe that's
> > > > a wrong impression. vm_mixed_ok() does a thorough job of validating the
> > > > use of __vm_insert_mixed(), and since what I did was allowed, I thought
> > > > perhaps it was OK. Your feedback has set me straight, and that's what I
> > > > needed. :-)
> > > >
> > > > But the whole approach is moot with Alistair Popple's patch set that
> > > > eliminates pfn_t. Is there an existing mm API that will do mkwrite on a
> > > > special PTE in a VM_PFNMAP VMA? I didn't see one, but maybe I missed
> > > > it. If there's not one, I'll take a crack at adding it in the next version of my
> > > > patch set.
> > > What is the motivation behind this work? The driver or fbdev as a whole
> > > does not have much of a future anyway.
> > >
> > > I'd like to suggest removing hyperv_fb entirely in favor of hypervdrm?
> > >
> > Yes, I think that's the longer term direction. A couple months ago I had an
> > email conversation with Saurabh Sengar from the Microsoft Linux team where
> > he raised this idea. I think the Microsoft folks will need to drive the deprecation
> > process, as they need to coordinate with the distro vendors who publish
> > images for running on local Hyper-V and in the Azure cloud. And my
> > understanding is that the Linux kernel process would want the driver to
> > be available but marked "deprecated" for a year or so before it actually
> > goes away.
>
> We (DRM upstream) recently considered moving some fbdev drivers to
> drivers/staging or marking them with !DRM if a DRM driver is available.
> Hyperv_fb would be a candidate.
>
> At least at SUSE, we ship hyperv_drm instead of hyperv_fb. This works well on
> the various generations of the hyperv system. Much of our userspace would
> not be able to use hyperv_fb anyway.

Yeah, investing into fbdev drivers, especially when some mm surgery seems
needed, does not sound like a good idea to me overall.

> > I do have some concerns about the maturity of the hyperv_drm driver
> > "around the edges". For example, somebody just recently submitted a
> > patch to flush output on panic. I have less familiarity with hyperv_drm vs.
> > hyperv_fb, so some of my concern is probably due to that. We might
> > need to do a review of hyperv_drm and see if there's anything else to
> > deal with before hyperv_fb goes away.
>
> The panic output is a feature that we recently added to the kernel. It
> allows a DRM driver to display a final error message in the case of a kernel
> panic (think of blue screens on Windows). Drivers require a minimum of
> support to make it work. That's what the hyperv_drm patches were about.

I'm also happy to help with any other issues and shortfalls of drm vs
fbdev. There are some, but I thought it was mostly around some of the low
bit color formats that really old devices want, and not anything that
hyperv would need.

Cheers, Sima

> Best regards
> Thomas
>
> > This all got started when I was looking at a problem with hyperv_fb,
> > and I found several other related problems, some of which also existed
> > in hyperv_drm. You've seen several small'ish fixes from me and Saurabh
> > as a result, and this issue with mmap()'ing /dev/fb0 is the last one of that
> > set. This fix is definitely a bit bigger, but it's the right fix. On the flip side,
> > if we really get on a path to deprecate hyperv_fb, there are hack fixes for
> > the mmap problem that are smaller and contained to hyperv_fb. I would
> > be OK with a hack fix in that case.
> >
> > Michael
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
From: Michael Kelley <mhklinux@outlook.com> Sent: Tuesday, June 3, 2025 10:25 AM
>
> From: David Hildenbrand <david@redhat.com> Sent: Tuesday, June 3, 2025 12:55 AM
> >
> > On 03.06.25 03:49, Michael Kelley wrote:
> > > From: David Hildenbrand <david@redhat.com> Sent: Monday, June 2, 2025 2:48 AM
> > >> [snip]
> > >>> @@ -182,20 +221,34 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
> > >>> 	}
> > >>>
> > >>> 	/*
> > >>> -	 * We want the page to remain locked from ->page_mkwrite until
> > >>> -	 * the PTE is marked dirty to avoid mapping_wrprotect_range()
> > >>> -	 * being called before the PTE is updated, which would leave
> > >>> -	 * the page ignored by defio.
> > >>> -	 * Do this by locking the page here and informing the caller
> > >>> -	 * about it with VM_FAULT_LOCKED.
> > >>> +	 * The PTE must be marked writable before the defio deferred work runs
> > >>> +	 * again and potentially marks the PTE write-protected. If the order
> > >>> +	 * should be switched, the PTE would become writable without defio
> > >>> +	 * tracking the page, leaving the page forever ignored by defio.
> > >>> +	 *
> > >>> +	 * For vmalloc() framebuffers, the associated struct page is locked
> > >>> +	 * before releasing the defio lock. mm will later mark the PTE writable
> > >>> +	 * and release the struct page lock. The struct page lock prevents
> > >>> +	 * the page from prematurely being marked write-protected.
> > >>> +	 *
> > >>> +	 * For FBINFO_KMEMFB framebuffers, mm assumes there is no struct page,
> > >>> +	 * so the PTE must be marked writable while the defio lock is held.
> > >>> 	 */
> > >>> -	lock_page(pageref->page);
> > >>> +	if (info->flags & FBINFO_KMEMFB) {
> > >>> +		unsigned long pfn = page_to_pfn(pageref->page);
> > >>> +
> > >>> +		ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address,
> > >>> +				__pfn_to_pfn_t(pfn, PFN_SPECIAL));
> > >>
> > >> Will the VMA have VM_PFNMAP or VM_MIXEDMAP set? PFN_SPECIAL is a
> > >> horrible hack.
> > >>
> > >> In another thread, you mention that you use PFN_SPECIAL to bypass the
> > >> check in vm_mixed_ok(), so VM_MIXEDMAP is likely not set?
> > >
> > > The VMA has VM_PFNMAP set, not VM_MIXEDMAP. It seemed like
> > > VM_MIXEDMAP is somewhat of a superset of VM_PFNMAP, but maybe that's
> > > a wrong impression.
> >
> > VM_PFNMAP: nothing is refcounted except anon pages
> >
> > VM_MIXEDMAP: anything with a "struct page" (pfn_valid()) is refcounted
> >
> > pte_special() is a way for GUP-fast to distinguish these refcounted (can
> > GUP) from non-refcounted (cannot GUP) pages mapped by PTEs without any
> > locks or the VMA being available.
> >
> > Setting pte_special() in VM_MIXEDMAP on ptes that have a "struct page"
> > (pfn_valid()) is likely very bogus.
>
> OK, good to know.
>
> > > vm_mixed_ok() does a thorough job of validating the
> > > use of __vm_insert_mixed(), and since what I did was allowed, I thought
> > > perhaps it was OK. Your feedback has set me straight, and that's what I
> > > needed. :-)
> >
> > What exactly are you trying to achieve? :)
> >
> > If it's mapping a page with a "struct page" and *not* refcounting it,
> > then vmf_insert_pfn() is the current way to achieve that in a VM_PFNMAP
> > mapping. It will set pte_special() automatically for you.
>
> Yes, that's what I'm using to initially create the special PTE in the
> .fault callback.
>
> > > But the whole approach is moot with Alistair Popple's patch set that
> > > eliminates pfn_t. Is there an existing mm API that will do mkwrite on a
> > > special PTE in a VM_PFNMAP VMA? I didn't see one, but maybe I missed
> > > it. If there's not one, I'll take a crack at adding it in the next version of my
> > > patch set.
> >
> > I assume you'd want vmf_insert_pfn_mkwrite(), correct? Probably
> > vmf_insert_pfn_prot() can be used by adding PAGE_WRITE to pgprot. (maybe
> > :) )
>
> Ok, I'll look at that more closely. The sequence is that the special
> PTE gets created with vmf_insert_pfn(). Then when the page is first
> written to, the .pfn_mkwrite callback is invoked by mm. The question
> is the best way for that callback to mark the existing PTE as writable.

FWIW, vmf_insert_pfn_prot() won't work. It calls insert_pfn() with the
"mkwrite" parameter set to 'false', in which case insert_pfn() does
nothing if the PTE already exists.

So I would need to create a new API that does appropriate validation
for a VM_PFNMAP VMA, and then calls insert_pfn() with the "mkwrite"
parameter set to 'true'.

Michael
On 04.06.25 23:58, Michael Kelley wrote:
> From: Michael Kelley <mhklinux@outlook.com> Sent: Tuesday, June 3, 2025 10:25 AM
>>
>> From: David Hildenbrand <david@redhat.com> Sent: Tuesday, June 3, 2025 12:55 AM
>>>
>>> On 03.06.25 03:49, Michael Kelley wrote:
>>>> From: David Hildenbrand <david@redhat.com> Sent: Monday, June 2, 2025 2:48 AM
>>>>>
>
> [snip]
>
>>>>>> @@ -182,20 +221,34 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
>>>>>> 	}
>>>>>>
>>>>>> 	/*
>>>>>> -	 * We want the page to remain locked from ->page_mkwrite until
>>>>>> -	 * the PTE is marked dirty to avoid mapping_wrprotect_range()
>>>>>> -	 * being called before the PTE is updated, which would leave
>>>>>> -	 * the page ignored by defio.
>>>>>> -	 * Do this by locking the page here and informing the caller
>>>>>> -	 * about it with VM_FAULT_LOCKED.
>>>>>> +	 * The PTE must be marked writable before the defio deferred work runs
>>>>>> +	 * again and potentially marks the PTE write-protected. If the order
>>>>>> +	 * should be switched, the PTE would become writable without defio
>>>>>> +	 * tracking the page, leaving the page forever ignored by defio.
>>>>>> +	 *
>>>>>> +	 * For vmalloc() framebuffers, the associated struct page is locked
>>>>>> +	 * before releasing the defio lock. mm will later mark the PTE writable
>>>>>> +	 * and release the struct page lock. The struct page lock prevents
>>>>>> +	 * the page from prematurely being marked write-protected.
>>>>>> +	 *
>>>>>> +	 * For FBINFO_KMEMFB framebuffers, mm assumes there is no struct page,
>>>>>> +	 * so the PTE must be marked writable while the defio lock is held.
>>>>>> 	 */
>>>>>> -	lock_page(pageref->page);
>>>>>> +	if (info->flags & FBINFO_KMEMFB) {
>>>>>> +		unsigned long pfn = page_to_pfn(pageref->page);
>>>>>> +
>>>>>> +		ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address,
>>>>>> +				__pfn_to_pfn_t(pfn, PFN_SPECIAL));
>>>>>
>>>>> Will the VMA have VM_PFNMAP or VM_MIXEDMAP set? PFN_SPECIAL is a
>>>>> horrible hack.
>>>>>
>>>>> In another thread, you mention that you use PFN_SPECIAL to bypass the
>>>>> check in vm_mixed_ok(), so VM_MIXEDMAP is likely not set?
>>>>
>>>> The VMA has VM_PFNMAP set, not VM_MIXEDMAP. It seemed like
>>>> VM_MIXEDMAP is somewhat of a superset of VM_PFNMAP, but maybe that's
>>>> a wrong impression.
>>>
>>> VM_PFNMAP: nothing is refcounted except anon pages
>>>
>>> VM_MIXEDMAP: anything with a "struct page" (pfn_valid()) is refcounted
>>>
>>> pte_special() is a way for GUP-fast to distinguish these refcounted (can
>>> GUP) from non-refcounted (cannot GUP) pages mapped by PTEs without any
>>> locks or the VMA being available.
>>>
>>> Setting pte_special() in VM_MIXEDMAP on ptes that have a "struct page"
>>> (pfn_valid()) is likely very bogus.
>>
>> OK, good to know.
>>
>>>> vm_mixed_ok() does a thorough job of validating the
>>>> use of __vm_insert_mixed(), and since what I did was allowed, I thought
>>>> perhaps it was OK. Your feedback has set me straight, and that's what I
>>>> needed. :-)
>>>
>>> What exactly are you trying to achieve? :)
>>>
>>> If it's mapping a page with a "struct page" and *not* refcounting it,
>>> then vmf_insert_pfn() is the current way to achieve that in a VM_PFNMAP
>>> mapping. It will set pte_special() automatically for you.
>>
>> Yes, that's what I'm using to initially create the special PTE in the
>> .fault callback.
>>
>>>> But the whole approach is moot with Alistair Popple's patch set that
>>>> eliminates pfn_t. Is there an existing mm API that will do mkwrite on a
>>>> special PTE in a VM_PFNMAP VMA? I didn't see one, but maybe I missed
>>>> it. If there's not one, I'll take a crack at adding it in the next version of my
>>>> patch set.
>>>
>>> I assume you'd want vmf_insert_pfn_mkwrite(), correct? Probably
>>> vmf_insert_pfn_prot() can be used by adding PAGE_WRITE to pgprot. (maybe
>>> :) )
>>
>> Ok, I'll look at that more closely. The sequence is that the special
>> PTE gets created with vmf_insert_pfn(). Then when the page is first
>> written to, the .pfn_mkwrite callback is invoked by mm. The question
>> is the best way for that callback to mark the existing PTE as writable.
>
> FWIW, vmf_insert_pfn_prot() won't work. It calls insert_pfn() with
> the "mkwrite" parameter set to 'false', in which case insert_pfn()
> does nothing if the PTE already exists.

Ah, you are worried about the "already exists but is R/O" case.

> So I would need to create a new API that does appropriate validation
> for a VM_PFNMAP VMA, and then calls insert_pfn() with the "mkwrite"
> parameter set to 'true'.

IMHO, nothing would speak against vmf_insert_pfn_mkwrite(). Much better
than using that "mixed" ... beauty of a function.
From: Thomas Zimmermann <tzimmermann@suse.de> Sent: Thursday, June 5, 2025 8:36 AM
>
> Hi
>
> Am 04.06.25 um 23:43 schrieb Michael Kelley:
> [...]
> > Nonetheless, there's an underlying issue. A main cause of the difference
> > is the number of messages to Hyper-V to update dirty regions. With
> > hyperv_fb using deferred I/O, the messages are limited to 20/second, so
> > the total number of messages to Hyper-V is about 480. But hyperv_drm
> > appears to send 3 messages to Hyper-V for each line of output, or a total of
> > about 3,000,000 messages (~90K/second). That's a lot of additional load
> > on the Hyper-V host, and it adds the 10 seconds of additional elapsed
> > time seen in the guest. There's also this ugly output in dmesg because the
> > ring buffer for sending messages to the Hyper-V host gets full -- Hyper-V
> > doesn't always keep up, at least not on my local laptop where I'm
> > testing:
> >
> > [12574.327615] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
> > [12574.327684] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
> > [12574.327760] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
> > [12574.327841] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
> > [12597.016128] hyperv_sendpacket: 6211 callbacks suppressed
> > [12597.016133] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
> > [12597.016172] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
> > [12597.016220] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
> > [12597.016267] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] *ERROR* Unable to send packet via vmbus; error -11
> >
> > hyperv_drm could be fixed to not output the ugly messages, but there's
> > still the underlying issue of overrunning the ring buffer, and excessively
> > hammering on the host. If we could get hyperv_drm doing deferred I/O, I
> > would feel much better about going full-on with deprecating hyperv_fb.
>
> I try to address the problem with the patches at
>
> https://lore.kernel.org/dri-devel/20250605152637.98493-1-tzimmermann@suse.de/
>
> Testing and feedback is much appreciated.

Nice! I ran the same test case with your patches, and everything works
well. The hyperv_drm numbers are now pretty much the same as the hyperv_fb
numbers for both elapsed time and system CPU time -- within a few percent.
For hyperv_drm, there's no longer a gap in the elapsed time and system CPU
time. No errors due to the guest-to-host ring buffer being full. Total
messages to Hyper-V for hyperv_drm are now a few hundred instead of 3M.

The hyperv_drm message count is still a little higher than for hyperv_fb,
presumably because the simulated vblank rate in hyperv_drm is higher than
the 20 Hz rate used by hyperv_fb deferred I/O. But the overall numbers are
small enough that the difference is in the noise.

Question: what is the default value for the simulated vblank rate? Just
curious ...

Michael
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 4fc93f253e06..f8ae91a1c4df 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -8,11 +8,40 @@
  * for more details.
  */
 
+/*
+ * Deferred I/O ("defio") allows framebuffers that are mmap()'ed to user space
+ * to batch user space writes into periodic updates to the underlying
+ * framebuffer hardware or other implementation (such as with a virtualized
+ * framebuffer in a VM). At each batch interval, a callback is invoked in the
+ * framebuffer's kernel driver, and the callback is supplied with a list of
+ * pages that have been modified in the preceding interval. The callback can
+ * use this information to update the framebuffer hardware as necessary. The
+ * batching can improve performance and reduce the overhead of updating the
+ * hardware.
+ *
+ * Defio is supported on framebuffers allocated using vmalloc() and allocated
+ * as contiguous kernel memory using alloc_pages() or kmalloc(). These
+ * memory allocations all have corresponding "struct page"s. Framebuffers
+ * allocated using dma_alloc_coherent() should not be used with defio.
+ * Such allocations should be treated as a black box owned by the DMA
+ * layer, and should not be deconstructed into individual pages as defio
+ * does. Framebuffers in MMIO space are *not* supported because MMIO space
+ * does not have corresponding "struct page"s.
+ *
+ * For framebuffers allocated using vmalloc(), struct fb_info must have
+ * "screen_buffer" set to the vmalloc address of the framebuffer. For
+ * framebuffers allocated from contiguous kernel memory, FBINFO_KMEMFB must
+ * be set, and "fix.smem_start" must be set to the physical address of the
+ * frame buffer. In both cases, "fix.smem_len" must be set to the framebuffer
+ * size in bytes.
+ */
+
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/string.h>
 #include <linux/mm.h>
+#include <linux/pfn_t.h>
 #include <linux/vmalloc.h>
 #include <linux/delay.h>
 #include <linux/interrupt.h>
@@ -37,7 +66,7 @@ static struct page *fb_deferred_io_get_page(struct fb_info *info, unsigned long
 	else if (info->fix.smem_start)
 		page = pfn_to_page((info->fix.smem_start + offs) >> PAGE_SHIFT);
 
-	if (page)
+	if (page && !(info->flags & FBINFO_KMEMFB))
 		get_page(page);
 
 	return page;
@@ -137,6 +166,15 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
 
 	BUG_ON(!info->fbdefio->mapping);
 
+	if (info->flags & FBINFO_KMEMFB)
+		/*
+		 * In this path, the VMA is marked VM_PFNMAP, so mm assumes
+		 * there is no struct page associated with the page. The
+		 * PFN must be directly inserted and the created PTE will be
+		 * marked "special".
+		 */
+		return vmf_insert_pfn(vmf->vma, vmf->address, page_to_pfn(page));
+
 	vmf->page = page;
 	return 0;
 }
@@ -163,13 +201,14 @@ EXPORT_SYMBOL_GPL(fb_deferred_io_fsync);
 
 /*
  * Adds a page to the dirty list. Call this from struct
- * vm_operations_struct.page_mkwrite.
+ * vm_operations_struct.page_mkwrite or .pfn_mkwrite.
  */
-static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long offset,
+static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, struct vm_fault *vmf,
 					    struct page *page)
 {
 	struct fb_deferred_io *fbdefio = info->fbdefio;
 	struct fb_deferred_io_pageref *pageref;
+	unsigned long offset = vmf->pgoff << PAGE_SHIFT;
 	vm_fault_t ret;
 
 	/* protect against the workqueue changing the page list */
@@ -182,20 +221,34 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
 	}
 
 	/*
-	 * We want the page to remain locked from ->page_mkwrite until
-	 * the PTE is marked dirty to avoid mapping_wrprotect_range()
-	 * being called before the PTE is updated, which would leave
-	 * the page ignored by defio.
-	 * Do this by locking the page here and informing the caller
-	 * about it with VM_FAULT_LOCKED.
+	 * The PTE must be marked writable before the defio deferred work runs
+	 * again and potentially marks the PTE write-protected. If the order
+	 * should be switched, the PTE would become writable without defio
+	 * tracking the page, leaving the page forever ignored by defio.
+	 *
+	 * For vmalloc() framebuffers, the associated struct page is locked
+	 * before releasing the defio lock. mm will later mark the PTE writable
+	 * and release the struct page lock. The struct page lock prevents
+	 * the page from prematurely being marked write-protected.
+	 *
+	 * For FBINFO_KMEMFB framebuffers, mm assumes there is no struct page,
+	 * so the PTE must be marked writable while the defio lock is held.
 	 */
-	lock_page(pageref->page);
+	if (info->flags & FBINFO_KMEMFB) {
+		unsigned long pfn = page_to_pfn(pageref->page);
+
+		ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address,
+					       __pfn_to_pfn_t(pfn, PFN_SPECIAL));
+	} else {
+		lock_page(pageref->page);
+		ret = VM_FAULT_LOCKED;
+	}
 	mutex_unlock(&fbdefio->lock);
 
 	/* come back after delay to process the deferred IO */
 	schedule_delayed_work(&info->deferred_work, fbdefio->delay);
-	return VM_FAULT_LOCKED;
+	return ret;
 
 err_mutex_unlock:
 	mutex_unlock(&fbdefio->lock);
@@ -207,10 +260,10 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
  * @fb_info: The fbdev info structure
  * @vmf: The VM fault
  *
- * This is a callback we get when userspace first tries to
- * write to the page. We schedule a workqueue. That workqueue
- * will eventually mkclean the touched pages and execute the
- * deferred framebuffer IO. Then if userspace touches a page
+ * This is a callback we get when userspace first tries to write to a
+ * page. We schedule a workqueue. That workqueue will eventually do
+ * mapping_wrprotect_range() on the written pages and execute the
+ * deferred framebuffer IO. Then if userspace writes to a page
  * again, we repeat the same scheme.
  *
  * Returns:
@@ -218,12 +271,11 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
  */
 static vm_fault_t fb_deferred_io_page_mkwrite(struct fb_info *info, struct vm_fault *vmf)
 {
-	unsigned long offset = vmf->pgoff << PAGE_SHIFT;
 	struct page *page = vmf->page;
 
 	file_update_time(vmf->vma->vm_file);
 
-	return fb_deferred_io_track_page(info, offset, page);
+	return fb_deferred_io_track_page(info, vmf, page);
 }
 
 /* vm_ops->page_mkwrite handler */
@@ -234,9 +286,25 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
 	return fb_deferred_io_page_mkwrite(info, vmf);
 }
 
+/*
+ * Similar to fb_deferred_io_mkwrite(), but for first writes to pages
+ * in VMAs that have VM_PFNMAP set.
+ */
+static vm_fault_t fb_deferred_io_pfn_mkwrite(struct vm_fault *vmf)
+{
+	struct fb_info *info = vmf->vma->vm_private_data;
+	unsigned long offset = vmf->pgoff << PAGE_SHIFT;
+	struct page *page = phys_to_page(info->fix.smem_start + offset);
+
+	file_update_time(vmf->vma->vm_file);
+
+	return fb_deferred_io_track_page(info, vmf, page);
+}
+
 static const struct vm_operations_struct fb_deferred_io_vm_ops = {
 	.fault		= fb_deferred_io_fault,
 	.page_mkwrite	= fb_deferred_io_mkwrite,
+	.pfn_mkwrite	= fb_deferred_io_pfn_mkwrite,
 };
 
 static const struct address_space_operations fb_deferred_io_aops = {
@@ -246,11 +314,31 @@ static const struct address_space_operations fb_deferred_io_aops = {
 int fb_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma)
 {
 	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
+	vm_flags_t flags = VM_DONTEXPAND | VM_DONTDUMP;
 
 	vma->vm_ops = &fb_deferred_io_vm_ops;
-	vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
-	if (!(info->flags & FBINFO_VIRTFB))
-		vm_flags_set(vma, VM_IO);
+	if (info->flags & FBINFO_KMEMFB) {
+		/*
+		 * I/O fault path calls vmf_insert_pfn(), which bug checks
+		 * if the vma is not marked shared. mmap'ing the framebuffer
+		 * as PRIVATE doesn't really make sense anyway, though doing
+		 * so isn't harmful for vmalloc() framebuffers. So there's
+		 * no prohibition for that case.
		 */
+		if (!(vma->vm_flags & VM_SHARED))
+			return -EINVAL;
+		/*
+		 * Set VM_PFNMAP so mm code will not try to manage the pages'
+		 * lifecycles. We don't want individual pages to be freed
+		 * based on refcount. Instead the memory must be returned to
+		 * the free pool in the usual way. Cf. the implementation of
+		 * remap_pfn_range() and remap_pfn_range_internal().
+		 */
+		flags |= VM_PFNMAP | VM_IO;
+	} else if (!(info->flags & FBINFO_VIRTFB)) {
+		flags |= VM_IO;
+	}
+	vm_flags_set(vma, flags);
 	vma->vm_private_data = info;
 
 	return 0;
 }