diff mbox series

[1/3] proc/vmcore: Do not map unaccepted memory

Message ID 20230906073902.4229-2-adrian.hunter@intel.com
State New
Headers show
Series Do not map unaccepted memory | expand

Commit Message

Adrian Hunter Sept. 6, 2023, 7:39 a.m. UTC
Support for unaccepted memory was added recently, refer commit
dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby
a virtual machine may need to accept memory before it can be used.

Do not map unaccepted memory because it can cause the guest to fail.

For /proc/vmcore, which is read-only, this means a read or mmap of
unaccepted memory will return zeros.

Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 fs/proc/vmcore.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Dave Hansen Sept. 7, 2023, 3:39 p.m. UTC | #1
On 9/6/23 00:39, Adrian Hunter wrote:
> @@ -559,7 +567,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
>  	 * pages without a reason.
>  	 */
>  	idx = srcu_read_lock(&vmcore_cb_srcu);
> -	if (!list_empty(&vmcore_cb_list))
> +	if (!list_empty(&vmcore_cb_list) ||
> +	    range_contains_unaccepted_memory(paddr, paddr + size))
>  		ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
>  	else
>  		ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);

The whole callback mechanism which fs/proc/vmcore.c::pfn_is_ram()
implements seems to be in place to ensure that there aren't a billion
different "ram" checks in here.

Is there a reason you can't register_vmcore_cb() a callback to check for
unaccepted memory?
Adrian Hunter Sept. 7, 2023, 3:44 p.m. UTC | #2
On 7/09/23 18:39, Dave Hansen wrote:
> On 9/6/23 00:39, Adrian Hunter wrote:
>> @@ -559,7 +567,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
>>  	 * pages without a reason.
>>  	 */
>>  	idx = srcu_read_lock(&vmcore_cb_srcu);
>> -	if (!list_empty(&vmcore_cb_list))
>> +	if (!list_empty(&vmcore_cb_list) ||
>> +	    range_contains_unaccepted_memory(paddr, paddr + size))
>>  		ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
>>  	else
>>  		ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);
> 
> The whole callback mechanism which fs/proc/vmcore.c::pfn_is_ram()
> implements seems to be in place to ensure that there aren't a billion
> different "ram" checks in here.
> 
> Is there a reason you can't register_vmcore_cb() a callback to check for
> unaccepted memory?

Someone asked for the change to be in arch-independent code... ;-)
Dave Hansen Sept. 7, 2023, 3:51 p.m. UTC | #3
On 9/7/23 08:44, Adrian Hunter wrote:
> On 7/09/23 18:39, Dave Hansen wrote:
>> On 9/6/23 00:39, Adrian Hunter wrote:
>>> @@ -559,7 +567,8 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
>>>  	 * pages without a reason.
>>>  	 */
>>>  	idx = srcu_read_lock(&vmcore_cb_srcu);
>>> -	if (!list_empty(&vmcore_cb_list))
>>> +	if (!list_empty(&vmcore_cb_list) ||
>>> +	    range_contains_unaccepted_memory(paddr, paddr + size))
>>>  		ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
>>>  	else
>>>  		ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);
>> The whole callback mechanism which fs/proc/vmcore.c::pfn_is_ram()
>> implements seems to be in place to ensure that there aren't a billion
>> different "ram" checks in here.
>>
>> Is there a reason you can't register_vmcore_cb() a callback to check for
>> unaccepted memory?
> Someone asked for the change to be in arch-independent code... 😉

That doesn't really answer my question.  virtio_mem_init_kdump(), for
instance, is in arch-independent code.
David Hildenbrand Sept. 11, 2023, 8:03 a.m. UTC | #4
On 06.09.23 09:39, Adrian Hunter wrote:
> Support for unaccepted memory was added recently, refer commit
> dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby
> a virtual machine may need to accept memory before it can be used.
> 
> Do not map unaccepted memory because it can cause the guest to fail.
> 
> For /proc/vmcore, which is read-only, this means a read or mmap of
> unaccepted memory will return zeros.

Does a second (kdump) kernel that exposes /proc/vmcore reliably get 
access to the information whether memory of the first kernel is 
unaccepted (IOW, not its memory, but the memory of the first kernel it 
is supposed to expose via /proc/vmcore)?

I recall there might be other kdump-related issues for TDX and friends 
to solve. Especially, which information the second kernel gets provided 
by the first kernel.

So can this patch even be tested reasonably (IOW, get into a kdump 
kernel in an environment where the first kernel has unaccepted memory, 
and verify that unaccepted memory is handled accordingly? ... while 
kdump doing anything reasonable in such an environment at all?)
Kirill A. Shutemov Sept. 11, 2023, 8:41 a.m. UTC | #5
On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote:
> On 06.09.23 09:39, Adrian Hunter wrote:
> > Support for unaccepted memory was added recently, refer commit
> > dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby
> > a virtual machine may need to accept memory before it can be used.
> > 
> > Do not map unaccepted memory because it can cause the guest to fail.
> > 
> > For /proc/vmcore, which is read-only, this means a read or mmap of
> > unaccepted memory will return zeros.
> 
> Does a second (kdump) kernel that exposes /proc/vmcore reliably get access
> to the information whether memory of the first kernel is unaccepted (IOW,
> not its memory, but the memory of the first kernel it is supposed to expose
> via /proc/vmcore)?

There are few patches in my queue to few related issue, but generally,
yes, the information is available to the target kernel via EFI
configuration table.
David Hildenbrand Sept. 11, 2023, 8:42 a.m. UTC | #6
On 11.09.23 10:41, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote:
>> On 06.09.23 09:39, Adrian Hunter wrote:
>>> Support for unaccepted memory was added recently, refer commit
>>> dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby
>>> a virtual machine may need to accept memory before it can be used.
>>>
>>> Do not map unaccepted memory because it can cause the guest to fail.
>>>
>>> For /proc/vmcore, which is read-only, this means a read or mmap of
>>> unaccepted memory will return zeros.
>>
>> Does a second (kdump) kernel that exposes /proc/vmcore reliably get access
>> to the information whether memory of the first kernel is unaccepted (IOW,
>> not its memory, but the memory of the first kernel it is supposed to expose
>> via /proc/vmcore)?
> 
> There are few patches in my queue to few related issue, but generally,
> yes, the information is available to the target kernel via EFI
> configuration table.

I assume that table provided by the first kernel, and not read directly 
from HW, correct?
Kirill A. Shutemov Sept. 11, 2023, 9:27 a.m. UTC | #7
On Mon, Sep 11, 2023 at 10:42:51AM +0200, David Hildenbrand wrote:
> On 11.09.23 10:41, Kirill A. Shutemov wrote:
> > On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote:
> > > On 06.09.23 09:39, Adrian Hunter wrote:
> > > > Support for unaccepted memory was added recently, refer commit
> > > > dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby
> > > > a virtual machine may need to accept memory before it can be used.
> > > > 
> > > > Do not map unaccepted memory because it can cause the guest to fail.
> > > > 
> > > > For /proc/vmcore, which is read-only, this means a read or mmap of
> > > > unaccepted memory will return zeros.
> > > 
> > > Does a second (kdump) kernel that exposes /proc/vmcore reliably get access
> > > to the information whether memory of the first kernel is unaccepted (IOW,
> > > not its memory, but the memory of the first kernel it is supposed to expose
> > > via /proc/vmcore)?
> > 
> > There are few patches in my queue to few related issue, but generally,
> > yes, the information is available to the target kernel via EFI
> > configuration table.
> 
> I assume that table provided by the first kernel, and not read directly from
> HW, correct?

The table is constructed by the EFI stub in the first kernel based on EFI
memory map.
David Hildenbrand Sept. 11, 2023, 9:50 a.m. UTC | #8
On 11.09.23 11:27, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 10:42:51AM +0200, David Hildenbrand wrote:
>> On 11.09.23 10:41, Kirill A. Shutemov wrote:
>>> On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote:
>>>> On 06.09.23 09:39, Adrian Hunter wrote:
>>>>> Support for unaccepted memory was added recently, refer commit
>>>>> dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby
>>>>> a virtual machine may need to accept memory before it can be used.
>>>>>
>>>>> Do not map unaccepted memory because it can cause the guest to fail.
>>>>>
>>>>> For /proc/vmcore, which is read-only, this means a read or mmap of
>>>>> unaccepted memory will return zeros.
>>>>
>>>> Does a second (kdump) kernel that exposes /proc/vmcore reliably get access
>>>> to the information whether memory of the first kernel is unaccepted (IOW,
>>>> not its memory, but the memory of the first kernel it is supposed to expose
>>>> via /proc/vmcore)?
>>>
>>> There are few patches in my queue to few related issue, but generally,
>>> yes, the information is available to the target kernel via EFI
>>> configuration table.
>>
>> I assume that table provided by the first kernel, and not read directly from
>> HW, correct?
> 
> The table is constructed by the EFI stub in the first kernel based on EFI
> memory map.
> 

Okay, should work then once that's done by the first kernel.

Maybe include this patch in your series?
Kirill A. Shutemov Sept. 11, 2023, 10:05 a.m. UTC | #9
On Mon, Sep 11, 2023 at 11:50:31AM +0200, David Hildenbrand wrote:
> On 11.09.23 11:27, Kirill A. Shutemov wrote:
> > On Mon, Sep 11, 2023 at 10:42:51AM +0200, David Hildenbrand wrote:
> > > On 11.09.23 10:41, Kirill A. Shutemov wrote:
> > > > On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote:
> > > > > On 06.09.23 09:39, Adrian Hunter wrote:
> > > > > > Support for unaccepted memory was added recently, refer commit
> > > > > > dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby
> > > > > > a virtual machine may need to accept memory before it can be used.
> > > > > > 
> > > > > > Do not map unaccepted memory because it can cause the guest to fail.
> > > > > > 
> > > > > > For /proc/vmcore, which is read-only, this means a read or mmap of
> > > > > > unaccepted memory will return zeros.
> > > > > 
> > > > > Does a second (kdump) kernel that exposes /proc/vmcore reliably get access
> > > > > to the information whether memory of the first kernel is unaccepted (IOW,
> > > > > not its memory, but the memory of the first kernel it is supposed to expose
> > > > > via /proc/vmcore)?
> > > > 
> > > > There are few patches in my queue to few related issue, but generally,
> > > > yes, the information is available to the target kernel via EFI
> > > > configuration table.
> > > 
> > > I assume that table provided by the first kernel, and not read directly from
> > > HW, correct?
> > 
> > The table is constructed by the EFI stub in the first kernel based on EFI
> > memory map.
> > 
> 
> Okay, should work then once that's done by the first kernel.
> 
> Maybe include this patch in your series?

Can do. But the other two patches are not related to kexec. Hm.
David Hildenbrand Sept. 11, 2023, 2:33 p.m. UTC | #10
On 11.09.23 12:05, Kirill A. Shutemov wrote:
> On Mon, Sep 11, 2023 at 11:50:31AM +0200, David Hildenbrand wrote:
>> On 11.09.23 11:27, Kirill A. Shutemov wrote:
>>> On Mon, Sep 11, 2023 at 10:42:51AM +0200, David Hildenbrand wrote:
>>>> On 11.09.23 10:41, Kirill A. Shutemov wrote:
>>>>> On Mon, Sep 11, 2023 at 10:03:36AM +0200, David Hildenbrand wrote:
>>>>>> On 06.09.23 09:39, Adrian Hunter wrote:
>>>>>>> Support for unaccepted memory was added recently, refer commit
>>>>>>> dcdfdd40fa82 ("mm: Add support for unaccepted memory"), whereby
>>>>>>> a virtual machine may need to accept memory before it can be used.
>>>>>>>
>>>>>>> Do not map unaccepted memory because it can cause the guest to fail.
>>>>>>>
>>>>>>> For /proc/vmcore, which is read-only, this means a read or mmap of
>>>>>>> unaccepted memory will return zeros.
>>>>>>
>>>>>> Does a second (kdump) kernel that exposes /proc/vmcore reliably get access
>>>>>> to the information whether memory of the first kernel is unaccepted (IOW,
>>>>>> not its memory, but the memory of the first kernel it is supposed to expose
>>>>>> via /proc/vmcore)?
>>>>>
>>>>> There are few patches in my queue to few related issue, but generally,
>>>>> yes, the information is available to the target kernel via EFI
>>>>> configuration table.
>>>>
>>>> I assume that table provided by the first kernel, and not read directly from
>>>> HW, correct?
>>>
>>> The table is constructed by the EFI stub in the first kernel based on EFI
>>> memory map.
>>>
>>
>> Okay, should work then once that's done by the first kernel.
>>
>> Maybe include this patch in your series?
> 
> Can do. But the other two patches are not related to kexec. Hm.

Yes, the others can go in separately. But this here really needs other 
kexec/kdump changes.
diff mbox series

Patch

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 1fb213f379a5..a28da2033ce8 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -118,6 +118,13 @@  static bool pfn_is_ram(unsigned long pfn)
 	return ret;
 }
 
+static bool pfn_is_unaccepted_memory(unsigned long pfn)
+{
+	phys_addr_t paddr = pfn << PAGE_SHIFT;
+
+	return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE);
+}
+
 static int open_vmcore(struct inode *inode, struct file *file)
 {
 	spin_lock(&vmcore_cb_lock);
@@ -150,7 +157,7 @@  ssize_t read_from_oldmem(struct iov_iter *iter, size_t count,
 			nr_bytes = count;
 
 		/* If pfn is not ram, return zeros for sparse dump files */
-		if (!pfn_is_ram(pfn)) {
+		if (!pfn_is_ram(pfn) || pfn_is_unaccepted_memory(pfn)) {
 			tmp = iov_iter_zero(nr_bytes, iter);
 		} else {
 			if (encrypted)
@@ -511,7 +518,7 @@  static int remap_oldmem_pfn_checked(struct vm_area_struct *vma,
 	pos_end = pfn + (size >> PAGE_SHIFT);
 
 	for (pos = pos_start; pos < pos_end; ++pos) {
-		if (!pfn_is_ram(pos)) {
+		if (!pfn_is_ram(pos) || pfn_is_unaccepted_memory(pos)) {
 			/*
 			 * We hit a page which is not ram. Remap the continuous
 			 * region between pos_start and pos-1 and replace
@@ -552,6 +559,7 @@  static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
 			    unsigned long from, unsigned long pfn,
 			    unsigned long size, pgprot_t prot)
 {
+	phys_addr_t paddr = pfn << PAGE_SHIFT;
 	int ret, idx;
 
 	/*
@@ -559,7 +567,8 @@  static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
 	 * pages without a reason.
 	 */
 	idx = srcu_read_lock(&vmcore_cb_srcu);
-	if (!list_empty(&vmcore_cb_list))
+	if (!list_empty(&vmcore_cb_list) ||
+	    range_contains_unaccepted_memory(paddr, paddr + size))
 		ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
 	else
 		ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);