Message ID | 20230906073902.4229-2-adrian.hunter@intel.com |
---|---|
State | New |
Headers | show |
Series | Do not map unaccepted memory | expand |
On 9/6/23 00:39, Adrian Hunter wrote: > @@ -559,7 +567,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma, > * pages without a reason. > */ > idx = srcu_read_lock(&vmcore_cb_srcu); > - if (!list_empty(&vmcore_cb_list)) > + if (!list_empty(&vmcore_cb_list) || > + range_contains_unaccepted_memory(paddr, paddr + size)) > ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot); > else > ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot); The whole callback mechanism which fs/proc/vmcore.c::pfn_is_ram() implements seems to be in place to ensure that there aren't a billion different "ram" checks in here. Is there a reason you can't register_vmcore_cb() a callback to check for unaccepted memory?
On 7/09/23 18:39, Dave Hansen wrote: > On 9/6/23 00:39, Adrian Hunter wrote: >> @@ -559,7 +567,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma, >> * pages without a reason. >> */ >> idx = srcu_read_lock(&vmcore_cb_srcu); >> - if (!list_empty(&vmcore_cb_list)) >> + if (!list_empty(&vmcore_cb_list) || >> + range_contains_unaccepted_memory(paddr, paddr + size)) >> ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot); >> else >> ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot); > > The whole callback mechanism which fs/proc/vmcore.c::pfn_is_ram() > implements seems to be in place to ensure that there aren't a billion > different "ram" checks in here. > > Is there a reason you can't register_vmcore_cb() a callback to check for > unaccepted memory? Someone asked for the change to be in arch-independent code... ;-)
On 9/7/23 08:44, Adrian Hunter wrote: > On 7/09/23 18:39, Dave Hansen wrote: >> On 9/6/23 00:39, Adrian Hunter wrote: >>> @@ -559,7 +567,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma, >>> * pages without a reason. >>> */ >>> idx = srcu_read_lock(&vmcore_cb_srcu); >>> - if (!list_empty(&vmcore_cb_list)) >>> + if (!list_empty(&vmcore_cb_list) || >>> + range_contains_unaccepted_memory(paddr, paddr + size)) >>> ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot); >>> else >>> ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot); >> The whole callback mechanism which fs/proc/vmcore.c::pfn_is_ram() >> implements seems to be in place to ensure that there aren't a billion >> different "ram" checks in here. >> >> Is there a reason you can't register_vmcore_cb() a callback to check for >> unaccepted memory? > Someone asked for the change to be in arch-independent code... 😉 That doesn't really answer my question. virtio_mem_init_kdump(), for instance, is in arch-independent code.
On 06.09.23 09:39, Adrian Hunter wrote: > Support for unaccepted memory was added recently, refer commit > dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby > a virtual machine may need to accept memory before it can be used. > > Do not map unaccepted memory because it can cause the guest to fail. > > For /proc/vmcore, which is read-only, this means a read or mmap of > unaccepted memory will return zeros. Does a second (kdump) kernel that exposes /proc/vmcore reliably get access to the information whether memory of the first kernel is unaccepted (IOW, not its memory, but the memory of the first kernel it is supposed to expose via /proc/vmcore)? I recall there might be other kdump-related issues for TDX and friends to solve. Especially, which information the second kernel gets provided by the first kernel. So can this patch even be tested reasonably (IOW, get into a kdump kernel in an environment where the first kernel has unaccepted memory, and verify that unaccepted memory is handled accordingly? ... while kdump doing anything reasonable in such an environment at all?)
On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote: > On 06.09.23 09:39, Adrian Hunter wrote: > > Support for unaccepted memory was added recently, refer commit > > dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby > > a virtual machine may need to accept memory before it can be used. > > > > Do not map unaccepted memory because it can cause the guest to fail. > > > > For /proc/vmcore, which is read-only, this means a read or mmap of > > unaccepted memory will return zeros. > > Does a second (kdump) kernel that exposes /proc/vmcore reliably get access > to the information whether memory of the first kernel is unaccepted (IOW, > not its memory, but the memory of the first kernel it is supposed to expose > via /proc/vmcore)? There are few patches in my queue to few related issue, but generally, yes, the information is available to the target kernel via EFI configuration table.
On 11.09.23 10:41, Kirill A. Shutemov wrote: > On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote: >> On 06.09.23 09:39, Adrian Hunter wrote: >>> Support for unaccepted memory was added recently, refer commit >>> dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby >>> a virtual machine may need to accept memory before it can be used. >>> >>> Do not map unaccepted memory because it can cause the guest to fail. >>> >>> For /proc/vmcore, which is read-only, this means a read or mmap of >>> unaccepted memory will return zeros. >> >> Does a second (kdump) kernel that exposes /proc/vmcore reliably get access >> to the information whether memory of the first kernel is unaccepted (IOW, >> not its memory, but the memory of the first kernel it is supposed to expose >> via /proc/vmcore)? > > There are few patches in my queue to few related issue, but generally, > yes, the information is available to the target kernel via EFI > configuration table. I assume that table provided by the first kernel, and not read directly from HW, correct?
On Mon, Sep 11, 2023 at 10:42:51AM +0200, David Hildenbrand wrote: > On 11.09.23 10:41, Kirill A. Shutemov wrote: > > On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote: > > > On 06.09.23 09:39, Adrian Hunter wrote: > > > > Support for unaccepted memory was added recently, refer commit > > > > dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby > > > > a virtual machine may need to accept memory before it can be used. > > > > > > > > Do not map unaccepted memory because it can cause the guest to fail. > > > > > > > > For /proc/vmcore, which is read-only, this means a read or mmap of > > > > unaccepted memory will return zeros. > > > > > > Does a second (kdump) kernel that exposes /proc/vmcore reliably get access > > > to the information whether memory of the first kernel is unaccepted (IOW, > > > not its memory, but the memory of the first kernel it is supposed to expose > > > via /proc/vmcore)? > > > > There are few patches in my queue to few related issue, but generally, > > yes, the information is available to the target kernel via EFI > > configuration table. > > I assume that table provided by the first kernel, and not read directly from > HW, correct? The table is constructed by the EFI stub in the first kernel based on EFI memory map.
On 11.09.23 11:27, Kirill A. Shutemov wrote: > On Mon, Sep 11, 2023 at 10:42:51AM +0200, David Hildenbrand wrote: >> On 11.09.23 10:41, Kirill A. Shutemov wrote: >>> On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote: >>>> On 06.09.23 09:39, Adrian Hunter wrote: >>>>> Support for unaccepted memory was added recently, refer commit >>>>> dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby >>>>> a virtual machine may need to accept memory before it can be used. >>>>> >>>>> Do not map unaccepted memory because it can cause the guest to fail. >>>>> >>>>> For /proc/vmcore, which is read-only, this means a read or mmap of >>>>> unaccepted memory will return zeros. >>>> >>>> Does a second (kdump) kernel that exposes /proc/vmcore reliably get access >>>> to the information whether memory of the first kernel is unaccepted (IOW, >>>> not its memory, but the memory of the first kernel it is supposed to expose >>>> via /proc/vmcore)? >>> >>> There are few patches in my queue to few related issue, but generally, >>> yes, the information is available to the target kernel via EFI >>> configuration table. >> >> I assume that table provided by the first kernel, and not read directly from >> HW, correct? > > The table is constructed by the EFI stub in the first kernel based on EFI > memory map. > Okay, should work then once that's done by the first kernel. Maybe include this patch in your series?
On Mon, Sep 11, 2023 at 11:50:31AM +0200, David Hildenbrand wrote: > On 11.09.23 11:27, Kirill A. Shutemov wrote: > > On Mon, Sep 11, 2023 at 10:42:51AM +0200, David Hildenbrand wrote: > > > On 11.09.23 10:41, Kirill A. Shutemov wrote: > > > > On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote: > > > > > On 06.09.23 09:39, Adrian Hunter wrote: > > > > > > Support for unaccepted memory was added recently, refer commit > > > > > > dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby > > > > > > a virtual machine may need to accept memory before it can be used. > > > > > > > > > > > > Do not map unaccepted memory because it can cause the guest to fail. > > > > > > > > > > > > For /proc/vmcore, which is read-only, this means a read or mmap of > > > > > > unaccepted memory will return zeros. > > > > > > > > > > Does a second (kdump) kernel that exposes /proc/vmcore reliably get access > > > > > to the information whether memory of the first kernel is unaccepted (IOW, > > > > > not its memory, but the memory of the first kernel it is supposed to expose > > > > > via /proc/vmcore)? > > > > > > > > There are few patches in my queue to few related issue, but generally, > > > > yes, the information is available to the target kernel via EFI > > > > configuration table. > > > > > > I assume that table provided by the first kernel, and not read directly from > > > HW, correct? > > > > The table is constructed by the EFI stub in the first kernel based on EFI > > memory map. > > > > Okay, should work then once that's done by the first kernel. > > Maybe include this patch in your series? Can do. But the other two patches are not related to kexec. Hm.
On 11.09.23 12:05, Kirill A. Shutemov wrote: > On Mon, Sep 11, 2023 at 11:50:31AM +0200, David Hildenbrand wrote: >> On 11.09.23 11:27, Kirill A. Shutemov wrote: >>> On Mon, Sep 11, 2023 at 10:42:51AM +0200, David Hildenbrand wrote: >>>> On 11.09.23 10:41, Kirill A. Shutemov wrote: >>>>> On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote: >>>>>> On 06.09.23 09:39, Adrian Hunter wrote: >>>>>>> Support for unaccepted memory was added recently, refer commit >>>>>>> dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby >>>>>>> a virtual machine may need to accept memory before it can be used. >>>>>>> >>>>>>> Do not map unaccepted memory because it can cause the guest to fail. >>>>>>> >>>>>>> For /proc/vmcore, which is read-only, this means a read or mmap of >>>>>>> unaccepted memory will return zeros. >>>>>> >>>>>> Does a second (kdump) kernel that exposes /proc/vmcore reliably get access >>>>>> to the information whether memory of the first kernel is unaccepted (IOW, >>>>>> not its memory, but the memory of the first kernel it is supposed to expose >>>>>> via /proc/vmcore)? >>>>> >>>>> There are few patches in my queue to few related issue, but generally, >>>>> yes, the information is available to the target kernel via EFI >>>>> configuration table. >>>> >>>> I assume that table provided by the first kernel, and not read directly from >>>> HW, correct? >>> >>> The table is constructed by the EFI stub in the first kernel based on EFI >>> memory map. >>> >> >> Okay, should work then once that's done by the first kernel. >> >> Maybe include this patch in your series? > > Can do. But the other two patches are not related to kexec. Hm. Yes, the others can go in separately. But this here really needs other kexec/kdump changes.
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index 1fb213f379a5..a28da2033ce8 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -118,6 +118,13 @@ static bool pfn_is_ram(unsigned long pfn) return ret; } +static bool pfn_is_unaccepted_memory(unsigned long pfn) +{ + phys_addr_t paddr = pfn << PAGE_SHIFT; + + return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE); +} + static int open_vmcore(struct inode *inode, struct file *file) { spin_lock(&vmcore_cb_lock); @@ -150,7 +157,7 @@ ssize_t read_from_oldmem(struct iov_iter *iter, size_t count, nr_bytes = count; /* If pfn is not ram, return zeros for sparse dump files */ - if (!pfn_is_ram(pfn)) { + if (!pfn_is_ram(pfn) || pfn_is_unaccepted_memory(pfn)) { tmp = iov_iter_zero(nr_bytes, iter); } else { if (encrypted) @@ -511,7 +518,7 @@ static int remap_oldmem_pfn_checked(struct vm_area_struct *vma, pos_end = pfn + (size >> PAGE_SHIFT); for (pos = pos_start; pos < pos_end; ++pos) { - if (!pfn_is_ram(pos)) { + if (!pfn_is_ram(pos) || pfn_is_unaccepted_memory(pos)) { /* * We hit a page which is not ram. Remap the continuous * region between pos_start and pos-1 and replace @@ -552,6 +559,7 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma, unsigned long from, unsigned long pfn, unsigned long size, pgprot_t prot) { + phys_addr_t paddr = pfn << PAGE_SHIFT; int ret, idx; /* @@ -559,7 +567,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma, * pages without a reason. */ idx = srcu_read_lock(&vmcore_cb_srcu); - if (!list_empty(&vmcore_cb_list)) + if (!list_empty(&vmcore_cb_list) || + range_contains_unaccepted_memory(paddr, paddr + size)) ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot); else ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);
Support for unaccepted memory was added recently, refer commit dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby a virtual machine may need to accept memory before it can be used. Do not map unaccepted memory because it can cause the guest to fail. For /proc/vmcore, which is read-only, this means a read or mmap of unaccepted memory will return zeros. Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> --- fs/proc/vmcore.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)