[v3,00/11] KVM: Mapping guest_memfd backed memory at the host for software protected VMs

Message ID 20250211121128.703390-1-tabba@google.com

Message

Fuad Tabba Feb. 11, 2025, 12:11 p.m. UTC
Changes since v2 [1]:
- Added more documentation
- Hooked the folio_put() callback as a stub with a warning
- Tidied up and refactored
- Rebased on Linux 6.14-rc2

The purpose of this series is to serve as a base for _restricted_
mmap() support for guest_memfd-backed memory at the host [2]. It
would allow experimentation with what that support would look like
in the safe environment of the software VM types, which are intended
for testing and development.

For more background and how to test this series, please refer to v2 [1].

Cheers,
/fuad

[1] https://lore.kernel.org/all/20250129172320.950523-1-tabba@google.com/
[2] https://lore.kernel.org/all/20250117163001.2326672-1-tabba@google.com/
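
To make the intended usage concrete, below is a minimal userspace
sketch (illustrative only): it assumes this series is applied, the
KVM_GMEM_SHARED_MEM Kconfig option is enabled, and the VM type allows
host mapping of guest_memfd; error handling is omitted.

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/kvm.h>

  /* Create a guest_memfd on an existing VM fd and map it at the host. */
  static void *map_gmem(int vm_fd, uint64_t size)
  {
          struct kvm_create_guest_memfd gmem = {
                  .size  = size,  /* must be a multiple of the page size */
                  .flags = 0,
          };
          int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

          /* mmap() on the guest_memfd fd is what this series enables. */
          return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                      gmem_fd, 0);
  }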

Fuad Tabba (11):
  mm: Consolidate freeing of typed folios on final folio_put()
  KVM: guest_memfd: Handle final folio_put() of guest_memfd pages
  KVM: guest_memfd: Allow host to map guest_memfd() pages
  KVM: guest_memfd: Add KVM capability to check if guest_memfd is shared
  KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed
    memory
  KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd
    shared memory
  KVM: arm64: Refactor user_mem_abort() calculation of force_pte
  KVM: arm64: Handle guest_memfd()-backed guest page faults
  KVM: arm64: Introduce KVM_VM_TYPE_ARM_SW_PROTECTED machine type
  KVM: arm64: Enable mapping guest_memfd in arm64
  KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is
    allowed

 Documentation/virt/kvm/api.rst                |   5 +
 arch/arm64/include/asm/kvm_host.h             |  10 ++
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/arm.c                          |   5 +
 arch/arm64/kvm/mmu.c                          |  91 ++++++++++------
 arch/x86/include/asm/kvm_host.h               |   5 +
 arch/x86/kvm/Kconfig                          |   3 +-
 include/linux/kvm_host.h                      |  28 ++++-
 include/linux/page-flags.h                    |  32 ++++++
 include/uapi/linux/kvm.h                      |   7 ++
 mm/debug.c                                    |   1 +
 mm/swap.c                                     |  32 +++++-
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  |  75 +++++++++++--
 tools/testing/selftests/kvm/lib/kvm_util.c    |   3 +-
 virt/kvm/Kconfig                              |   4 +
 virt/kvm/guest_memfd.c                        | 100 ++++++++++++++++++
 virt/kvm/kvm_main.c                           |   9 +-
 18 files changed, 360 insertions(+), 52 deletions(-)


base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3

Comments

Quentin Perret Feb. 11, 2025, 4:29 p.m. UTC | #1
On Tuesday 11 Feb 2025 at 16:17:25 (+0000), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Tue, 11 Feb 2025 at 16:12, Quentin Perret <qperret@google.com> wrote:
> >
> > Hi Fuad,
> >
> > On Tuesday 11 Feb 2025 at 12:11:25 (+0000), Fuad Tabba wrote:
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 117937a895da..f155d3781e08 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -652,6 +652,12 @@ struct kvm_enable_cap {
> > >  #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK        0xffULL
> > >  #define KVM_VM_TYPE_ARM_IPA_SIZE(x)          \
> > >       ((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> > > +
> > > +#define KVM_VM_TYPE_ARM_SW_PROTECTED (1UL << 9)
> >
> > FWIW, the downstream Android code has used bit 31 since forever
> > for that.
> >
> > Although I very much believe that upstream should not care about the
> > downstream mess in general, in this particular instance bit 9 really
> > isn't superior in any way, and there's a bunch of existing userspace
> > code that uses bit 31 today as we speak. It is very much Android's
> > problem to update these userspace programs if we do go with bit 9
> > upstream, but I don't really see how that would benefit upstream
> > either.
> >
> > So, given that there is no maintenance cost for upstream to use bit 31
> > instead of 9, I'd vote for using bit 31 and ease the landing with
> > existing userspace code, unless folks are really opinionated with this
> > stuff :)
> 
> My thinking is that this bit does _not_ mean pKVM. It means an
> experimental software VM that is similar to the x86
> KVM_X86_SW_PROTECTED_VM. Hence why I didn't choose bit 31.
> 
> From Documentation/virt/kvm/api.rst (for x86):
> 
> '''
> Note, KVM_X86_SW_PROTECTED_VM is currently only for development and testing.
> Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
> production.  The behavior and effective ABI for software-protected VMs is
> unstable.
> '''
> 
> which is similar to the documentation I added here.

Aha, I see, but are we going to allocate _another_ bit for protected VMs
proper once they're supported? Or just update the doc for the existing
bit? If the latter, then I guess this discussion can still happen :)

Thanks,
Quentin
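
Whichever bit is finally chosen, the userspace side only depends on
the #define; below is a minimal sketch assuming the definition from
the patch quoted above (error handling omitted):

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int create_sw_protected_vm(void)
  {
          int kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);

          /* On arm64, bits [7:0] of the type carry the IPA size (0 = default). */
          unsigned long type = KVM_VM_TYPE_ARM_SW_PROTECTED |
                               KVM_VM_TYPE_ARM_IPA_SIZE(0);

          return ioctl(kvm_fd, KVM_CREATE_VM, type);
  }
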
Quentin Perret Feb. 11, 2025, 5:09 p.m. UTC | #2
Hi Patrick,

On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> I was hoping that SW_PROTECTED_VM will be the VM type that something
> like Firecracker could use, e.g. an interface to guest_memfd specifically
> _without_ pKVM, as Fuad was saying.

I had, probably incorrectly, assumed that we'd eventually want to allow
gmem for all VMs, including traditional KVM VMs that don't have anything
special. Perhaps the gmem support could be exposed via a KVM_CAP in this
case?

Anyway, no objection to the proposed approach in this patch assuming we
will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
bit 31 :).

Thanks,
Quentin
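
For reference, a KVM_CAP-based discovery could look like the sketch
below; KVM_CHECK_EXTENSION and KVM_CAP_GUEST_MEMFD are existing uapi,
while the shared-memory capability name is taken from patch 4 of this
series and should be treated as illustrative:

  #include <stdbool.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static bool gmem_shared_supported(int vm_fd)
  {
          /* Is guest_memfd supported at all? */
          if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD) <= 0)
                  return false;

          /* Capability name as added by patch 4 of this series (illustrative). */
          return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_SHARED_MEM) > 0;
  }
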
Fuad Tabba Feb. 13, 2025, 8:24 a.m. UTC | #3
Hi Peter,

On Wed, 12 Feb 2025 at 21:24, Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Feb 11, 2025 at 12:11:19PM +0000, Fuad Tabba wrote:
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 54e959e7d68f..4e759e8020c5 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -124,3 +124,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
> >  config HAVE_KVM_ARCH_GMEM_INVALIDATE
> >         bool
> >         depends on KVM_PRIVATE_MEM
> > +
> > +config KVM_GMEM_SHARED_MEM
> > +       select KVM_PRIVATE_MEM
> > +       bool
>
> No strong opinion here, but this might not be straightforward enough for
> any reader to know why a shared mem option will select a private mem..
>
> I wonder would it be clearer if we could have a config for gmem alone, and
> select that option no matter how gmem would be consumed.  Then the two
> options above could select it.
>
> I'm not sure whether there're too many guest-memfd stuff hard-coded to
> PRIVATE_MEM, actually that's what I hit myself both in qemu & kvm when I
> wanted to try guest-memfd on QEMU as purely shared (aka no conversions, no
> duplicated backends, but in-place).  So pretty much a pure question to ask
> here.

Yes, the fact that guest_memfd was initially called private mem has
left a few things like this behind, e.g., config options and function
names. It has caused (and will probably continue to cause) confusion.

To avoid mixing the bikeshedding over names into the patch series
adding mmap support (i.e., this one), I am planning to send a
separate patch series to handle the naming issue.

> The other thing is, currently guest-memfd binding only allows 1:1 binding
> to kvm memslots for a specific offset range of gmem, rather than being able
> to be mapped in multiple memslots:
>
> kvm_gmem_bind():
>         if (!xa_empty(&gmem->bindings) &&
>             xa_find(&gmem->bindings, &start, end - 1, XA_PRESENT)) {
>                 filemap_invalidate_unlock(inode->i_mapping);
>                 goto err;
>         }
>
> I didn't dig further yet, but I feel like this won't trivially work with
> things like SMRAM when in-place, which can map the same portion of a gmem
> range more than once.  I wonder if this is a hard limit for guest-memfd,
> and whether you hit anything similar when working on this series.

I haven't thought about this much, but it could be something to tackle later on.

Thank you,
/fuad

> Thanks,
>
> --
> Peter Xu
>
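
For readers following the binding discussion, the sketch below shows
the existing uapi for binding a guest_memfd range to a memslot
(KVM_SET_USER_MEMORY_REGION2, error handling omitted). With the xarray
check quoted above, a second slot binding an overlapping offset range
of the same guest_memfd is rejected, which is what makes aliasing
cases such as SMRAM awkward today.

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int bind_gmem_slot(int vm_fd, uint32_t slot, uint64_t gpa,
                            uint64_t size, uint64_t uaddr,
                            int gmem_fd, uint64_t gmem_offset)
  {
          struct kvm_userspace_memory_region2 region = {
                  .slot = slot,
                  .flags = KVM_MEM_GUEST_MEMFD,
                  .guest_phys_addr = gpa,
                  .memory_size = size,
                  .userspace_addr = uaddr,  /* host VA for the non-gmem view */
                  .guest_memfd = gmem_fd,
                  .guest_memfd_offset = gmem_offset,
          };

          return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
  }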

Quentin Perret Feb. 14, 2025, 11:13 a.m. UTC | #4
On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
> Hi Patrick,
> 
> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> > I was hoping that SW_PROTECTED_VM will be the VM type that something
> > like Firecracker could use, e.g. an interface to guest_memfd specifically
> > _without_ pKVM, as Fuad was saying.
> 
> I had, probably incorrectly, assumed that we'd eventually want to allow
> gmem for all VMs, including traditional KVM VMs that don't have anything
> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
> case?
> 
> Anyway, no objection to the proposed approach in this patch assuming we
> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
> bit 31 :).

Thinking about this a bit deeper, I am still wondering what this new
SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
both guest-memfd backed memslots and traditional HVA-backed memslots, we
could just make normal KVM guests accept guest-memfd memslots and get
the same thing? Is there any reason not to do that instead? Even though
SW_PROTECTED VMs are documented as 'unstable', the reality is this is
UAPI and you can bet it will end up being relied upon, so I would prefer
to have a solid reason for introducing this new VM type.

Cheers,
Quentin
Fuad Tabba Feb. 14, 2025, 11:33 a.m. UTC | #5
Hi Quentin,

On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
>
> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
> > Hi Patrick,
> >
> > On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> > > I was hoping that SW_PROTECTED_VM will be the VM type that something
> > > like Firecracker could use, e.g. an interface to guest_memfd specifically
> > > _without_ pKVM, as Fuad was saying.
> >
> > I had, probably incorrectly, assumed that we'd eventually want to allow
> > gmem for all VMs, including traditional KVM VMs that don't have anything
> > special. Perhaps the gmem support could be exposed via a KVM_CAP in this
> > case?
> >
> > Anyway, no objection to the proposed approach in this patch assuming we
> > will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
> > bit 31 :).
>
> Thinking about this a bit deeper, I am still wondering what this new
> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
> both guest-memfd backed memslots and traditional HVA-backed memslots, we
> could just make normal KVM guests accept guest-memfd memslots and get
> the same thing? Is there any reason not to do that instead? Even though
> SW_PROTECTED VMs are documented as 'unstable', the reality is this is
> UAPI and you can bet it will end up being relied upon, so I would prefer
> to have a solid reason for introducing this new VM type.

The more I think about it, I agree with you. I think that reasonable
behavior (for kvm/arm64) would be to allow using guest_memfd with all
VM types. If the VM type is a non-protected type, then its memory is
considered shared by default and is mappable, as long as the
kconfig option is enabled. If the VM is protected, then the memory is
not shared by default.

What do you think Patrick? Do you need an explicit VM type?

Cheers,
/fuad

> Cheers,
> Quentin
Patrick Roy Feb. 14, 2025, 12:37 p.m. UTC | #6
On Fri, 2025-02-14 at 11:33 +0000, Fuad Tabba wrote:
> Hi Quentin,
> 
> On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
>>
>> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
>>> Hi Patrick,
>>>
>>> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
>>>> I was hoping that SW_PROTECTED_VM will be the VM type that something
>>>> like Firecracker could use, e.g. an interface to guest_memfd specifically
>>>> _without_ pKVM, as Fuad was saying.
>>>
>>> I had, probably incorrectly, assumed that we'd eventually want to allow
>>> gmem for all VMs, including traditional KVM VMs that don't have anything
>>> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
>>> case?
>>>
>>> Anyway, no objection to the proposed approach in this patch assuming we
>>> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
>>> bit 31 :).
>>
>> Thinking about this a bit deeper, I am still wondering what this new
>> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
>> both guest-memfd backed memslots and traditional HVA-backed memslots, we
>> could just make normal KVM guests accept guest-memfd memslots and get
>> the same thing? Is there any reason not to do that instead? Even though
>> SW_PROTECTED VMs are documented as 'unstable', the reality is this is
>> UAPI and you can bet it will end up being relied upon, so I would prefer
>> to have a solid reason for introducing this new VM type.
> 
> The more I think about it, I agree with you. I think that reasonable
> behavior (for kvm/arm64) would be to allow using guest_memfd with all
> VM types. If the VM type is a non-protected type, then its memory is
> considered shared by default and is mappable --- as long as the
> kconfig option is enabled. If VM is protected then the memory is not
> shared by default.
> 
> What do you think Patrick? Do you need an explicit VM type?

Mhh, no, if "normal" VMs support guest_memfd, then that works too. I
suggested the VM type because that's how x86 works
(KVM_X86_SW_PROTECTED_VM), but never actually stopped to think about
whether it makes sense for ARM. Maybe Sean knows something we're missing?

I wonder whether having the "default sharedness" depend on the VM type
works out though - whether a range of gmem is shared or private is a
property of the guest_memfd instance, not the VM it's attached to, so I
guess the default behavior needs to be based solely on the guest_memfd
as well (and then if someone tries to attach a gmem to a VM whose desired
level of protection doesn't match the guest_memfd's configuration, that
operation would fail)?

Tangentially related, does KVM_GMEM_SHARED to you mean "guest_memfd also
supports shared sections", or "guest_memfd does not support private
memory anymore"? (The difference being that in the former,
KVM_GMEM_SHARED would later get the ability to convert ranges to private,
and the EOPNOTSUPP is just a transient state until conversion support is
merged.) It doesn't matter for my use case, but I got curious as some other
threads implied the second option to me and I ended up wondering why.

Best,
Patrick

> Cheers,
> /fuad
> 
>> Cheers,
>> Quentin
Fuad Tabba Feb. 14, 2025, 1:11 p.m. UTC | #7
Hi Patrick,

On Fri, 14 Feb 2025 at 12:37, Patrick Roy <roypat@amazon.co.uk> wrote:
>
>
>
> On Fri, 2025-02-14 at 11:33 +0000, Fuad Tabba wrote:
> > Hi Quentin,
> >
> > On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
> >>
> >> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
> >>> Hi Patrick,
> >>>
> >>> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> >>>> I was hoping that SW_PROTECTED_VM will be the VM type that something
> >>>> like Firecracker could use, e.g. an interface to guest_memfd specifically
> >>>> _without_ pKVM, as Fuad was saying.
> >>>
> >>> I had, probably incorrectly, assumed that we'd eventually want to allow
> >>> gmem for all VMs, including traditional KVM VMs that don't have anything
> >>> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
> >>> case?
> >>>
> >>> Anyway, no objection to the proposed approach in this patch assuming we
> >>> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
> >>> bit 31 :).
> >>
> >> Thinking about this a bit deeper, I am still wondering what this new
> >> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
> >> both guest-memfd backed memslots and traditional HVA-backed memslots, we
> >> could just make normal KVM guests accept guest-memfd memslots and get
> >> the same thing? Is there any reason not to do that instead? Even though
> >> SW_PROTECTED VMs are documented as 'unstable', the reality is this is
> >> UAPI and you can bet it will end up being relied upon, so I would prefer
> >> to have a solid reason for introducing this new VM type.
> >
> > The more I think about it, I agree with you. I think that reasonable
> > behavior (for kvm/arm64) would be to allow using guest_memfd with all
> > VM types. If the VM type is a non-protected type, then its memory is
> > considered shared by default and is mappable --- as long as the
> > kconfig option is enabled. If VM is protected then the memory is not
> > shared by default.
> >
> > What do you think Patrick? Do you need an explicit VM type?
>
> Mhh, no, if "normal" VMs support guest_memfd, then that works too. I
> suggested the VM type because that's how x86 works
> (KVM_X86_SW_PROTECTED_VM), but never actually stopped to think about
> whether it makes sense for ARM. Maybe Sean knows something we're missing?
>
> I wonder whether having the "default sharedness" depend on the vm type
> works out though - whether a range of gmem is shared or private is a
> property of the guest_memfd instance, not the VM it's attached to, so I
> guess the default behavior needs to be based solely on the guest_memfd
> as well (and then if someone tries to attach a gmem to a VM whose desire
> of protection doesnt match the guest_memfd's configuration, that
> operation would fail)?

Each guest_memfd is associated with a KVM instance. Although it could
migrate, it would be weird for a guest_memfd instance to migrate
between different types of VM, or at least, migrate between VMs that
have different confidentiality requirements.


> Tangentially related, does KVM_GMEM_SHARED to you mean "guest_memfd also
> supports shared sections", or "guest_memfd does not support private
> memory anymore"? (the difference being that in the former, then
> KVM_GMEM_SHARED would later get the ability to convert ranges private,
> and the EOPNOSUPP is just a transient state until conversion support is
> merged) - doesnt matter for my usecase, but I got curious as some other
> threads implied the second option to me and I ended up wondering why.

My thinking (and implementation in the other patch series) is that
KVM_GMEM_SHARED (back then called KVM_GMEM_MAPPABLE) allows sharing in
place/mapping, without adding restrictions.

Cheers,
/fuad

> Best,
> Patrick
>
> > Cheers,
> > /fuad
> >
> >> Cheers,
> >> Quentin
Patrick Roy Feb. 14, 2025, 1:18 p.m. UTC | #8
On Fri, 2025-02-14 at 13:11 +0000, Fuad Tabba wrote:
> Hi Patrick,
> 
> On Fri, 14 Feb 2025 at 12:37, Patrick Roy <roypat@amazon.co.uk> wrote:
>>
>>
>>
>> On Fri, 2025-02-14 at 11:33 +0000, Fuad Tabba wrote:
>>> Hi Quentin,
>>>
>>> On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
>>>>
>>>> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
>>>>> Hi Patrick,
>>>>>
>>>>> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
>>>>>> I was hoping that SW_PROTECTED_VM will be the VM type that something
>>>>>> like Firecracker could use, e.g. an interface to guest_memfd specifically
>>>>>> _without_ pKVM, as Fuad was saying.
>>>>>
>>>>> I had, probably incorrectly, assumed that we'd eventually want to allow
>>>>> gmem for all VMs, including traditional KVM VMs that don't have anything
>>>>> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
>>>>> case?
>>>>>
>>>>> Anyway, no objection to the proposed approach in this patch assuming we
>>>>> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
>>>>> bit 31 :).
>>>>
>>>> Thinking about this a bit deeper, I am still wondering what this new
>>>> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
>>>> both guest-memfd backed memslots and traditional HVA-backed memslots, we
>>>> could just make normal KVM guests accept guest-memfd memslots and get
>>>> the same thing? Is there any reason not to do that instead? Even though
>>>> SW_PROTECTED VMs are documented as 'unstable', the reality is this is
>>>> UAPI and you can bet it will end up being relied upon, so I would prefer
>>>> to have a solid reason for introducing this new VM type.
>>>
>>> The more I think about it, I agree with you. I think that reasonable
>>> behavior (for kvm/arm64) would be to allow using guest_memfd with all
>>> VM types. If the VM type is a non-protected type, then its memory is
>>> considered shared by default and is mappable --- as long as the
>>> kconfig option is enabled. If VM is protected then the memory is not
>>> shared by default.
>>>
>>> What do you think Patrick? Do you need an explicit VM type?
>>
>> Mhh, no, if "normal" VMs support guest_memfd, then that works too. I
>> suggested the VM type because that's how x86 works
>> (KVM_X86_SW_PROTECTED_VM), but never actually stopped to think about
>> whether it makes sense for ARM. Maybe Sean knows something we're missing?
>>
>> I wonder whether having the "default sharedness" depend on the vm type
>> works out though - whether a range of gmem is shared or private is a
>> property of the guest_memfd instance, not the VM it's attached to, so I
>> guess the default behavior needs to be based solely on the guest_memfd
>> as well (and then if someone tries to attach a gmem to a VM whose desire
>> of protection doesnt match the guest_memfd's configuration, that
>> operation would fail)?
> 
> Each guest_memfd is associated with a KVM instance. Although it could
> migrate, it would be weird for a guest_memfd instance to migrate
> between different types of VM, or at least, migrate between VMs that
> have different confidentiality requirements.

Ahh, right, I keep forgetting that CREATE_GUEST_MEMFD() is a vm ioctl. My
bad, sorry!

>> Tangentially related, does KVM_GMEM_SHARED to you mean "guest_memfd also
>> supports shared sections", or "guest_memfd does not support private
>> memory anymore"? (the difference being that in the former, then
>> KVM_GMEM_SHARED would later get the ability to convert ranges private,
>> and the EOPNOSUPP is just a transient state until conversion support is
>> merged) - doesnt matter for my usecase, but I got curious as some other
>> threads implied the second option to me and I ended up wondering why.
> 
> My thinking (and implementation in the other patch series) is that
> KVM_GMEM_SHARED (back then called KVM_GMEM_MAPPABLE) allows sharing in
> place/mapping, without adding restrictions.

That makes sense to me, thanks for the explanation!

> Cheers,
> /fuad
> 
>> Best,
>> Patrick
>>
>>> Cheers,
>>> /fuad
>>>
>>>> Cheers,
>>>> Quentin
Sean Christopherson Feb. 14, 2025, 3:12 p.m. UTC | #9
On Fri, Feb 14, 2025, Patrick Roy wrote:
> On Fri, 2025-02-14 at 13:11 +0000, Fuad Tabba wrote:
> > On Fri, 14 Feb 2025 at 12:37, Patrick Roy <roypat@amazon.co.uk> wrote:
> >> On Fri, 2025-02-14 at 11:33 +0000, Fuad Tabba wrote:
> >>> Hi Quentin,
> >>>
> >>> On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
> >>>>
> >>>> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
> >>>>> Hi Patrick,
> >>>>>
> >>>>> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> >>>>>> I was hoping that SW_PROTECTED_VM will be the VM type that something
> >>>>>> like Firecracker could use, e.g. an interface to guest_memfd specifically
> >>>>>> _without_ pKVM, as Fuad was saying.
> >>>>>
> >>>>> I had, probably incorrectly, assumed that we'd eventually want to allow
> >>>>> gmem for all VMs, including traditional KVM VMs that don't have anything
> >>>>> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
> >>>>> case?
> >>>>>
> >>>>> Anyway, no objection to the proposed approach in this patch assuming we
> >>>>> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
> >>>>> bit 31 :).
> >>>>
> >>>> Thinking about this a bit deeper, I am still wondering what this new
> >>>> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
> >>>> both guest-memfd backed memslots and traditional HVA-backed memslots, we
> >>>> could just make normal KVM guests accept guest-memfd memslots and get
> >>>> the same thing? Is there any reason not to do that instead?

Once guest_memfd can be mmap()'d, no.  KVM_X86_SW_PROTECTED_VM was added for
testing and development of guest_memfd largely because KVM can't support a "real"
VM if KVM can't read/write guest memory through its normal mechanisms.  The gap
is most apparent on x86, but it holds true for arm64 as well.

> >>>> Even though SW_PROTECTED VMs are documented as 'unstable', the reality
> >>>> is this is UAPI and you can bet it will end up being relied upon, so I
> >>>> would prefer to have a solid reason for introducing this new VM type.
> >>>
> >>> The more I think about it, I agree with you. I think that reasonable
> >>> behavior (for kvm/arm64) would be to allow using guest_memfd with all
> >>> VM types. If the VM type is a non-protected type, then its memory is
> >>> considered shared by default and is mappable --- as long as the
> >>> kconfig option is enabled. If VM is protected then the memory is not
> >>> shared by default.

This aligns with what I see happening for x86, except that for non-protected VMs
there will be no shared vs. private, because such VMs won't have a concept of
private memory.
Vlastimil Babka Feb. 17, 2025, 9:33 a.m. UTC | #10
On 2/11/25 13:11, Fuad Tabba wrote:
> Some folio types, such as hugetlb, handle freeing their own
> folios. Moreover, guest_memfd will require being notified once a
> folio's reference count reaches 0 to facilitate shared to private
> folio conversion, without the folio actually being freed at that
> point.
> 
> As a first step towards that, this patch consolidates freeing
> folios that have a type. The first user is hugetlb folios. Later
> in this patch series, guest_memfd will become the second user of
> this.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
David Hildenbrand Feb. 20, 2025, 11:17 a.m. UTC | #11
On 11.02.25 13:11, Fuad Tabba wrote:
> Some folio types, such as hugetlb, handle freeing their own
> folios. Moreover, guest_memfd will require being notified once a
> folio's reference count reaches 0 to facilitate shared to private
> folio conversion, without the folio actually being freed at that
> point.
> 
> As a first step towards that, this patch consolidates freeing
> folios that have a type. The first user is hugetlb folios. Later
> in this patch series, guest_memfd will become the second user of
> this.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>