
[Xen-devel,RFC,21/22] xen/arm: p2m: Re-implement p2m_set_mem_access using p2m_{set, get}_entry

Message ID 1469717505-8026-22-git-send-email-julien.grall@arm.com
State Superseded

Commit Message

Julien Grall July 28, 2016, 2:51 p.m. UTC
The function p2m_set_mem_access can be re-implemented using the generic
functions p2m_get_entry and __p2m_set_entry.

Note that because of the implementation of p2m_get_entry, a TLB
invalidation instruction will be issued for each 4KB page. Therefore the
performance of memaccess will be impacted; however, the function is now
safe on all processors.

Also the function apply_p2m_changes is dropped completely as it is not
used anymore.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
Cc: Tamas K Lengyel <tamas@tklengyel.com>

---
    I have not run any performance tests with memaccess so far, but I
    expect a significant and unavoidable impact because of how memaccess
    has been designed to work around a hardware limitation. Note that it
    might be possible to re-work memaccess to operate on superpages, but
    this should be done in a separate patch.
---
 xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
 1 file changed, 38 insertions(+), 291 deletions(-)

Comments

Julien Grall July 28, 2016, 3:16 p.m. UTC | #1
On 28/07/16 16:04, Razvan Cojocaru wrote:
> On 07/28/2016 05:51 PM, Julien Grall wrote:
>> The function p2m_set_mem_access can be re-implemented using the generic
>> functions p2m_get_entry and __p2m_set_entry.
>>
>> Note that because of the implementation of p2m_get_entry, a TLB
>> invalidation instruction will be issued for each 4KB page. Therefore the
>> performance of memaccess will be impacted, however the function is now
>> safe on all the processors.
>>
>> Also the function apply_p2m_changes is dropped completely as it is not
>> unused anymore.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>>
>> ---
>>     I have not ran any performance test with memaccess for now, but I
>>     expect an important and unavoidable impact because of how memaccess
>>     has been designed to workaround hardware limitation. Note that might
>>     be possible to re-work memaccess work on superpage but this should
>>     be done in a separate patch.
>> ---
>>  xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
>>  1 file changed, 38 insertions(+), 291 deletions(-)
>
> Thanks for the CC!

Hi Razvan,

> This seems to only impact ARM, are there any planned changes for x86
> along these lines as well?

The break-before-make sequence is required by the ARM architecture. I
don't know the x86 architecture well; you can ask the x86 maintainers
whether something similar is necessary.
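
For a single live stage-2 entry, the sequence boils down to something
like the sketch below. This is only an illustration: p2m_write_pte() and
p2m_flush_tlb_sync() are the helpers used in xen/arch/arm/p2m.c, but the
wrapper itself and the exact lpae_t field accesses are simplified.

    /* Illustrative only -- not the actual Xen implementation. */
    static void bbm_write_entry(struct p2m_domain *p2m, lpae_t *entry,
                                lpae_t new)
    {
        lpae_t invalid = *entry;

        /* 1. Break: replace the live entry with an invalid one. */
        invalid.p2m.valid = 0;
        p2m_write_pte(entry, invalid, p2m->clean_pte);

        /* 2. Invalidate any TLB entry still caching the old mapping. */
        p2m_flush_tlb_sync(p2m);

        /* 3. Make: install the new entry. */
        p2m_write_pte(entry, new, p2m->clean_pte);
    }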

Regards,
Julien Grall Aug. 1, 2016, 3:40 p.m. UTC | #2
Hi Razvan,

On 28/07/16 16:04, Razvan Cojocaru wrote:
> On 07/28/2016 05:51 PM, Julien Grall wrote:
>> The function p2m_set_mem_access can be re-implemented using the generic
>> functions p2m_get_entry and __p2m_set_entry.
>>
>> Note that because of the implementation of p2m_get_entry, a TLB
>> invalidation instruction will be issued for each 4KB page. Therefore the
>> performance of memaccess will be impacted, however the function is now
>> safe on all the processors.
>>
>> Also the function apply_p2m_changes is dropped completely as it is not
>> unused anymore.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>>
>> ---
>>     I have not ran any performance test with memaccess for now, but I
>>     expect an important and unavoidable impact because of how memaccess
>>     has been designed to workaround hardware limitation. Note that might
>>     be possible to re-work memaccess work on superpage but this should
>>     be done in a separate patch.
>> ---
>>  xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
>>  1 file changed, 38 insertions(+), 291 deletions(-)
>
> Thanks for the CC!
>
> This seems to only impact ARM, are there any planned changes for x86
> along these lines as well?

Actually, it might be possible to remove the TLB for each 4KB entry in 
the memaccess case.

After reading the ARM ARM again multiple times (D4-1732 in ARM DDI
0487A.i) and speaking with various ARM folks, changing the permissions
(i.e. read, write, execute) does not require the break-before-make
sequence.

However, I noticed a latent bug in the memaccess code when the
permission restrictions are removed/changed.

In the current implementation (i.e. without this series), the TLB
invalidation is deferred until the p2m is released. Until that time, a
vCPU may still run with the previous permissions and trap into the
hypervisor with a permission fault. However, as the permissions have
changed, p2m_memaccess_check may fail and we will inject an abort into
the guest.

The problem is very similar to [1]. This will still be true with this
series applied if I relax the use of the break-before-make sequence.
The two ways I can see to fix this are either to replay the instruction
(we would trap again if the permissions were still not correct) or to
keep the break-before-make sequence.

The former would be cleaner, given that a stage-2 permission fault can
only happen because of memaccess for now. Any opinions?
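
To illustrate the first option, the abort path would end up looking
roughly like the sketch below. The helper name is made up for the
example (it would wrap the existing p2m_mem_access_check()), and the
real trap handler obviously carries more state:

    /* Sketch only -- names and semantics simplified. */
    static void handle_stage2_perm_fault(paddr_t gpa, vaddr_t gva)
    {
        /* Give memaccess a chance to handle the fault (vm_event, pause, ...). */
        if ( memaccess_handle_fault(gpa, gva) )   /* hypothetical wrapper */
            return;

        /*
         * memaccess did not recognise the fault: assume we raced with a
         * permission change whose TLB invalidation is still pending.
         * Return to the guest and replay the instruction instead of
         * injecting an abort; once the stale TLB entry is gone, the
         * retry either succeeds or traps again with correct permissions.
         */
        return;
    }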

Regards,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg03133.html
Julien Grall Aug. 1, 2016, 4:15 p.m. UTC | #3
On 01/08/16 16:59, Tamas K Lengyel wrote:
> On Mon, Aug 1, 2016 at 9:40 AM, Julien Grall <julien.grall@arm.com> wrote:
> IMHO we should just pause the domain while mem_access permissions are
> being changed. On x86 it is not necessary as the mem_access
> bookkeeping is kept in the ept pte entries but since on ARM the two
> things are disjoint it is required. Even on x86 though - while it's
> not strictly necessary - I think it would be a good idea to just pause
> the domain as from a user perspective it would be more intuitive then
> keeping the vCPUs running.

The problem is not the bookkeeping, but how the TLBs work. I never
mentioned the radix tree in my previous mail because storing the access
in the entry would not help here. The entry may have been updated but
the TLBs not yet invalidated.

x86 does not seem to be affected because if the mem access check fails,
the instruction is replayed (see hvm_hap_nested_page_fault). This is
exactly the solution I suggested, which is IMHO the best.

I don't think pausing all the vCPUs of a domain is a good solution; the
overhead would be too high for modifying only one page (see
p2m_mem_access_check for instance).

Regards,
Julien Grall Aug. 1, 2016, 4:33 p.m. UTC | #4
On 01/08/16 17:27, Tamas K Lengyel wrote:
> On Mon, Aug 1, 2016 at 10:15 AM, Julien Grall <julien.grall@arm.com> wrote:
>>
>>
>> On 01/08/16 16:59, Tamas K Lengyel wrote:
>>>
>>> On Mon, Aug 1, 2016 at 9:40 AM, Julien Grall <julien.grall@arm.com> wrote:
>>> IMHO we should just pause the domain while mem_access permissions are
>>> being changed. On x86 it is not necessary as the mem_access
>>> bookkeeping is kept in the ept pte entries but since on ARM the two
>>> things are disjoint it is required. Even on x86 though - while it's
>>> not strictly necessary - I think it would be a good idea to just pause
>>> the domain as from a user perspective it would be more intuitive then
>>> keeping the vCPUs running.
>>
>>
>> The problem is not because of the bookkeeping, but how the TLBs work. I
>> never mentioned the radix tree in my previous mail because storing the
>> access in the entry will not help here. The entry may have been updated but
>> the TLBs not yet invalidated.
>>
>> x86 does not seem to be affected because if the mem access check fail, the
>> instruction is replayed (see hvm_hap_nested_page_fault). This is exactly the
>> solution I suggested, which is IHMO the best.
>
> I see. Yes, indeed that sounds like the route to take here.

I am looking forward to seeing a patch.

>>
>> I don't think pausing all the vCPUs of a domain is a good solution, the
>> overhead will be too high for modifying only one page (see
>> p2m_mem_access_check for instance).
>>
>
> IMHO the overhead of pausing the domain for setting mem_access
> permissions is not critical as it is done only on occasion. We can
> certainly leave the default behavior like this but I'll probably also
> introduce a new XENMEM_access_op_set_access_sync op that will pause
> the domain for the duration of the op.

Oh, you meant for a user app to set the memaccess. I guess it is fine. I 
mentioned p2m_mem_access_check because this function is supposed to be 
called often in the data/instruction abort handlers, so pausing a domain 
here would be a big overhead.

Regards,
Mark Rutland Aug. 1, 2016, 4:34 p.m. UTC | #5
Hi Julien, 

On Mon, Aug 01, 2016 at 04:40:50PM +0100, Julien Grall wrote:
> Hi Razvan,
> 
> On 28/07/16 16:04, Razvan Cojocaru wrote:
> >On 07/28/2016 05:51 PM, Julien Grall wrote:
> >>The function p2m_set_mem_access can be re-implemented using the generic
> >>functions p2m_get_entry and __p2m_set_entry.
> >>
> >>Note that because of the implementation of p2m_get_entry, a TLB
> >>invalidation instruction will be issued for each 4KB page. Therefore the
> >>performance of memaccess will be impacted, however the function is now
> >>safe on all the processors.
> >>
> >>Also the function apply_p2m_changes is dropped completely as it is not
> >>unused anymore.
> >>
> >>Signed-off-by: Julien Grall <julien.grall@arm.com>
> >>Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
> >>Cc: Tamas K Lengyel <tamas@tklengyel.com>
> >>
> >>---
> >>    I have not ran any performance test with memaccess for now, but I
> >>    expect an important and unavoidable impact because of how memaccess
> >>    has been designed to workaround hardware limitation. Note that might
> >>    be possible to re-work memaccess work on superpage but this should
> >>    be done in a separate patch.
> >>---
> >> xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
> >> 1 file changed, 38 insertions(+), 291 deletions(-)
> >
> >Thanks for the CC!
> >
> >This seems to only impact ARM, are there any planned changes for x86
> >along these lines as well?
> 
> Actually, it might be possible to remove the TLB for each 4KB entry
> in the memaccess case.

Could you clarify what you mean by "remove the TLB"? Do you mean a TLBI
instruction?

> After I read again multiple time the ARM ARM (D4-1732 in ARM DDI
> 0487A.i) and spoke with various ARM folks, changing the permission
> (i.e read, write, execute) does not require the break-before-make
> sequence.

Regardless of whether Break-Before-Make (BBM) is required, TLB
invalidation is still required to ensure the new permissions have taken
effect after *any* modification to page tables, unless:

* The prior entry was not permitted to be held in a TLB, and:
* No TLB held an entry for the address range.

> However, I noticed a latent bug in the memaccess code when the
> permission restriction are removed/changed.
> 
> In the current implementation (i.e without this series), the TLB
> invalidation is deferred until the p2m is released. Until that time,
> a vCPU may still run with the previous permission and trap into the
> hypervisor the permission fault. However, as the permission as
> changed, p2m_memaccess_check may fail and we will inject an abort to
> the guest.
> 
> The problem is very similar to [1]. This will still be true with
> this series applied if I relaxed the use of the break-before-make
> sequence. The two ways I can see to fix this are either try again
> the instruction (we would trap again if the permission was not
> correct) or keep the break-before-make.

Why can you not have TLB invalidation without the full BBM sequence?

Thanks,
Mark.
Julien Grall Aug. 1, 2016, 4:57 p.m. UTC | #6
On 01/08/16 17:34, Mark Rutland wrote:
> Hi Julien,

Hi Mark,

> On Mon, Aug 01, 2016 at 04:40:50PM +0100, Julien Grall wrote:
>> Hi Razvan,
>>
>> On 28/07/16 16:04, Razvan Cojocaru wrote:
>>> On 07/28/2016 05:51 PM, Julien Grall wrote:
>>>> The function p2m_set_mem_access can be re-implemented using the generic
>>>> functions p2m_get_entry and __p2m_set_entry.
>>>>
>>>> Note that because of the implementation of p2m_get_entry, a TLB
>>>> invalidation instruction will be issued for each 4KB page. Therefore the
>>>> performance of memaccess will be impacted, however the function is now
>>>> safe on all the processors.
>>>>
>>>> Also the function apply_p2m_changes is dropped completely as it is not
>>>> unused anymore.
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>>>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>>>>
>>>> ---
>>>>    I have not ran any performance test with memaccess for now, but I
>>>>    expect an important and unavoidable impact because of how memaccess
>>>>    has been designed to workaround hardware limitation. Note that might
>>>>    be possible to re-work memaccess work on superpage but this should
>>>>    be done in a separate patch.
>>>> ---
>>>> xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
>>>> 1 file changed, 38 insertions(+), 291 deletions(-)
>>>
>>> Thanks for the CC!
>>>
>>> This seems to only impact ARM, are there any planned changes for x86
>>> along these lines as well?
>>
>> Actually, it might be possible to remove the TLB for each 4KB entry
>> in the memaccess case.
>
> Could you clarify what you mean by "remove the TLB"? Do you mean a TLBI
> instruction?

I meant a TLBI instruction, sorry for the confusion.

>> After I read again multiple time the ARM ARM (D4-1732 in ARM DDI
>> 0487A.i) and spoke with various ARM folks, changing the permission
>> (i.e read, write, execute) does not require the break-before-make
>> sequence.
>
> Regardless of whether Break-Before-Make (BBM) is required, TLB
> invalidation is still required to ensure the new permissions have taken
> effect after *any* modification to page tables, unless:
>
> * The prior entry was not permitted to be held in a TLB, and:
> * No TLB held an entry for the address range.

Agreed; however, we only need one TLBI instruction per batch (assuming
there is no superpage shattering) rather than one per entry in this case.
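
In other words, something along these lines (sketch only, reusing the
helpers visible in the existing code; not the final implementation):

    /*
     * Update the permissions of a batch of valid 4KB leaf entries and
     * issue a single flush at the end, rather than one per entry.
     */
    static void set_access_batch(struct p2m_domain *p2m, lpae_t *entries,
                                 unsigned int nr, p2m_access_t a)
    {
        unsigned int i;

        for ( i = 0; i < nr; i++ )
        {
            lpae_t pte = entries[i];

            if ( !p2m_valid(pte) )
                continue;

            p2m_set_permission(&pte, pte.p2m.type, a);
            p2m_write_pte(&entries[i], pte, p2m->clean_pte);
        }

        /* One flush for the whole batch (no superpage shattering assumed). */
        p2m_flush_tlb_sync(p2m);
    }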

>
>> However, I noticed a latent bug in the memaccess code when the
>> permission restriction are removed/changed.
>>
>> In the current implementation (i.e without this series), the TLB
>> invalidation is deferred until the p2m is released. Until that time,
>> a vCPU may still run with the previous permission and trap into the
>> hypervisor the permission fault. However, as the permission as
>> changed, p2m_memaccess_check may fail and we will inject an abort to
>> the guest.
>>
>> The problem is very similar to [1]. This will still be true with
>> this series applied if I relaxed the use of the break-before-make
>> sequence. The two ways I can see to fix this are either try again
>> the instruction (we would trap again if the permission was not
>> correct) or keep the break-before-make.
>
> Why can you not have TLB invalidation without the full BBM sequence?

I agree that in general a TLBI instruction does not require the full BBM
sequence. If we ever need the TLB invalidation per entry, it would be
better to keep BBM to keep the code streamlined.

However, in this case I think it would be better to return to the guest
and replay the instruction if a data/instruction abort has been received.

Regards,
Mark Rutland Aug. 1, 2016, 5:26 p.m. UTC | #7
On Mon, Aug 01, 2016 at 05:57:50PM +0100, Julien Grall wrote:
> On 01/08/16 17:34, Mark Rutland wrote:
> >On Mon, Aug 01, 2016 at 04:40:50PM +0100, Julien Grall wrote:
> >>After I read again multiple time the ARM ARM (D4-1732 in ARM DDI
> >>0487A.i) and spoke with various ARM folks, changing the permission
> >>(i.e read, write, execute) does not require the break-before-make
> >>sequence.
> >
> >Regardless of whether Break-Before-Make (BBM) is required, TLB
> >invalidation is still required to ensure the new permissions have taken
> >effect after *any* modification to page tables, unless:
> >
> >* The prior entry was not permitted to be held in a TLB, and:
> >* No TLB held an entry for the address range.
> 
> Agreed, however we only need one TLBI instruction (assuming there is
> no superpage shattering) per-batch rather than one per-entry in this
> case.

I got Cc'd to a reply without the original patch context, so I'm not
sure what the case is here. I'm not exactly sure what you mean by
"per-batch".

Assuming that you've (only) changed the permissions (i.e. the AP bits
and XN bits) of a number of stage-2 leaf entries, you need either:

* A single TLBI VMALLE1IS

  Note that this removes *all* stage-2 or combined stage-1&2 entries
  from TLBs, and thus is stronger than necessary.

* Per entry, a TLBI IPAS2LE1IS

  e.g. 

    for_each_entry(x)
      modify_ap_bits(x);
    dsb(ishst);
    tlbi(ipas2le1is);
    dsb(ish);

> >>In the current implementation (i.e without this series), the TLB
> >>invalidation is deferred until the p2m is released. Until that time,
> >>a vCPU may still run with the previous permission and trap into the
> >>hypervisor the permission fault. However, as the permission as
> >>changed, p2m_memaccess_check may fail and we will inject an abort to
> >>the guest.
> >>
> >>The problem is very similar to [1]. This will still be true with
> >>this series applied if I relaxed the use of the break-before-make
> >>sequence. The two ways I can see to fix this are either try again
> >>the instruction (we would trap again if the permission was not
> >>correct) or keep the break-before-make.
> >
> >Why can you not have TLB invalidation without the full BBM sequence?
> 
> I agree that in general TLBI instruction does not require the full
> BBM sequence. If we ever need the TLB invalidation per entry, it
> will be better to keep BBM to keep the code streamlined.

If this is not performance-critical, this sounds fine.

This does unnecessarily penalise some cases, though.

e.g. a guest that only performs reads from pages whose write permission
is being removed.

> However, in this case I think it will be better to return to the
> guest and replay the instruction if a data/instruction abort has
> been received.

That sounds like what we do in Linux.

On a fault, if the page tables are such that the fault should not occur,
we assume we raced with concurrent modification, and return to the
faulting instruction. Eventually (once the TLB maintenance is
completed), the guest will make forward progress.

We have locking around page table manipulation, but only have to take
them in the (hopefully) unlikely case of a race on the affected memory
area.

Thanks,
Mark.
Mark Rutland Aug. 1, 2016, 6:22 p.m. UTC | #8
Hi,

I realised I made a confusing mistake in my last reply; clarification below.

On Mon, Aug 01, 2016 at 06:26:54PM +0100, Mark Rutland wrote:
> On Mon, Aug 01, 2016 at 05:57:50PM +0100, Julien Grall wrote:
> > however we only need one TLBI instruction (assuming there is
> > no superpage shattering) per-batch rather than one per-entry in this
> > case.
> 
> I got Cc'd to a reply without the original patch context, so I'm not
> sure what the case is here. I'm not exactly sure what you mean by
> "per-batch".
> 
> Assuming that you've (only) changed the permissions (i.e. the AP bits
> and XN bits) of a number of stage-2 leaf entries, you need either:

[...]

> * Per entry, a TLBI IPAS2LE1IS
> 
>   e.g. 
> 
>     for_each_entry(x)
>       modify_ap_bits(x);
>     dsb(ishst);
>     tlbi(ipas2le1is);
>     dsb(ish);

Here I was trying to have the bare minimum barriers necessary, but in focussing
on that I failed to add the required loop to have a TLBI per entry.

The above should have been:

  for_each_entry(x)
    modify_ap_bits(x);
  dsb(ishst);
  for_each_entry(x)
    tlbi(ipas2le1is, x);
  dsb(ish);

Assuming the necessary bit fiddling for the TLBI to affect the IPA of x, with
the right VMID, etc.

Thanks,
Mark.
Julien Grall Aug. 2, 2016, 9:58 a.m. UTC | #9
On 01/08/2016 19:22, Mark Rutland wrote:
> Hi,

Hi Mark,

> I realised I made a confusing mistake in my last reply; clarification below.
>
> On Mon, Aug 01, 2016 at 06:26:54PM +0100, Mark Rutland wrote:
>> On Mon, Aug 01, 2016 at 05:57:50PM +0100, Julien Grall wrote:
>>> however we only need one TLBI instruction (assuming there is
>>> no superpage shattering) per-batch rather than one per-entry in this
>>> case.
>>
>> I got Cc'd to a reply without the original patch context, so I'm not
>> sure what the case is here. I'm not exactly sure what you mean by
>> "per-batch".

Sorry for that. I CCed you in case I did not summarize the conversation
we had correctly.

The page table handling code can be found in patch #18 [1].

>>
>> Assuming that you've (only) changed the permissions (i.e. the AP bits
>> and XN bits) of a number of stage-2 leaf entries, you need either:
>
> [...]
>
>> * Per entry, a TLBI IPAS2LE1IS
>>
>>   e.g.
>>
>>     for_each_entry(x)
>>       modify_ap_bits(x);
>>     dsb(ishst);
>>     tlbi(ipas2le1is);
>>     dsb(ish);
>
> Here I was trying to have the bare minimum barriers necessary, but in focussing
> on that I failed to add the required loop to have a TLBI per entry.
>
> The above should have been:
>
>   for_each_entry(x)
>     modify_ap_bits(x);
>   dsb(ishst);
>   for_each_entry(x)
>     tlbi(ipas2le1is, x);
>   dsb(ish);

I have a question related to this example. Is there a threshold beyond
which invalidating all the TLB entries for a given VMID/ASID is worthwhile?

>
> Assuming the necessary bit fiddling for the TLBI to affect the IPA of x, with
> the right VMID, etc.


Regards,


[1] https://lists.xen.org/archives/html/xen-devel/2016-07/msg02966.html
Mark Rutland Aug. 2, 2016, 10:26 a.m. UTC | #10
On Tue, Aug 02, 2016 at 10:58:00AM +0100, Julien Grall wrote:
> On 01/08/2016 19:22, Mark Rutland wrote:
> >On Mon, Aug 01, 2016 at 06:26:54PM +0100, Mark Rutland wrote:
> >>On Mon, Aug 01, 2016 at 05:57:50PM +0100, Julien Grall wrote:
> >>>however we only need one TLBI instruction (assuming there is
> >>>no superpage shattering) per-batch rather than one per-entry in this
> >>>case.
> >>
> >>I got Cc'd to a reply without the original patch context, so I'm not
> >>sure what the case is here. I'm not exactly sure what you mean by
> >>"per-batch".
> 
> Sorry for that. I CCed in case I did not summarize correctly the
> conversation we had.
> 
> The page table handling code can be found in patch #18 [1].

If I've understood, you're asking if you can do a TLBI VMALLE1IS per
batch of leaf entry updates in p2m_set_entry?

As below, if only the AP and/or XN bits are changing, that should be
safe. If any other fields are being altered (inc. the output address,
even for intermediate entries), that may not be safe.

> >>Assuming that you've (only) changed the permissions (i.e. the AP bits
> >>and XN bits) of a number of stage-2 leaf entries, you need either:

> >>* Per entry, a TLBI IPAS2LE1IS
> >>
> >>  e.g.

> >  for_each_entry(x)
> >    modify_ap_bits(x);
> >  dsb(ishst);
> >  for_each_entry(x)
> >    tlbi(ipas2le1is, x);
> >  dsb(ish);
> 
> I have a question related to this example. Is there a threshold
> where invalidate all the TLB entry for a given VMID/ASID is worth?

Strictly speaking, "yes", but the value is going to depend on
implementation and workload, so there's no "good" value as such provided
by the architecture.

In Linux, we end up doing so in some cases to avoid softlockups. Look
for MAX_TLB_RANGE in arch/arm64/include/asm/tlbflush.h. There are some
more details in commit 05ac65305437e8ef ("arm64: fix soft lockup due to
large tlb flush range").
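
The pattern there is roughly the following (paraphrased, not the exact
kernel code; the constant value and the per-page helper name are from
memory / hypothetical):

    #define MAX_TLB_RANGE   (1024UL << PAGE_SHIFT)   /* value from memory */

    static inline void flush_tlb_range_sketch(struct mm_struct *mm,
                                              unsigned long start,
                                              unsigned long end)
    {
        if ((end - start) > MAX_TLB_RANGE) {
            /* Too many pages: flushing the whole ASID is cheaper. */
            flush_tlb_mm(mm);
            return;
        }

        /* Otherwise, per-page TLBI ...IS over the range, then DSB ISH. */
        flush_tlb_pages_by_va(mm, start, end);    /* hypothetical helper */
    }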

Thanks,
Mark.

> [1] https://lists.xen.org/archives/html/xen-devel/2016-07/msg02966.html
Julien Grall Sept. 7, 2016, 6:56 a.m. UTC | #11
Hi Razvan,

On 06/09/2016 20:16, Razvan Cojocaru wrote:
> On 09/06/16 22:06, Stefano Stabellini wrote:
>> On Thu, 28 Jul 2016, Julien Grall wrote:
>>>> The function p2m_set_mem_access can be re-implemented using the generic
>>>> functions p2m_get_entry and __p2m_set_entry.
>>>>
>>>> Note that because of the implementation of p2m_get_entry, a TLB
>>>> invalidation instruction will be issued for each 4KB page. Therefore the
>>>> performance of memaccess will be impacted, however the function is now
>>>> safe on all the processors.
>>>>
>>>> Also the function apply_p2m_changes is dropped completely as it is not
>>>> unused anymore.
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>>>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>
> How far is this patch from landing into staging? Considering the recent
> discussion about the patch I'm working on, this would certainly impact
> the upcoming ARM part of it.

I expect this to be in Xen 4.8. I also realized that without this patch
it will be harder to implement the ARM side.

Given that, I would be fine writing the ARM side once this series is out
and your patch is ready.

Regards,

Patch

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 707c7be..16ed393 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1081,295 +1081,6 @@  static int p2m_set_entry(struct p2m_domain *p2m,
     return rc;
 }
 
-#define P2M_ONE_DESCEND        0
-#define P2M_ONE_PROGRESS_NOP   0x1
-#define P2M_ONE_PROGRESS       0x10
-
-static int p2m_shatter_page(struct p2m_domain *p2m,
-                            lpae_t *entry,
-                            unsigned int level)
-{
-    const paddr_t level_shift = level_shifts[level];
-    int rc = p2m_create_table(p2m, entry, level_shift - PAGE_SHIFT);
-
-    if ( !rc )
-    {
-        p2m->stats.shattered[level]++;
-        p2m->stats.mappings[level]--;
-        p2m->stats.mappings[level+1] += LPAE_ENTRIES;
-    }
-
-    return rc;
-}
-
-/*
- * 0   == (P2M_ONE_DESCEND) continue to descend the tree
- * +ve == (P2M_ONE_PROGRESS_*) handled at this level, continue, flush,
- *        entry, addr and maddr updated.  Return value is an
- *        indication of the amount of work done (for preemption).
- * -ve == (-Exxx) error.
- */
-static int apply_one_level(struct domain *d,
-                           lpae_t *entry,
-                           unsigned int level,
-                           enum p2m_operation op,
-                           paddr_t start_gpaddr,
-                           paddr_t end_gpaddr,
-                           paddr_t *addr,
-                           paddr_t *maddr,
-                           bool_t *flush,
-                           p2m_type_t t,
-                           p2m_access_t a)
-{
-    const paddr_t level_size = level_sizes[level];
-
-    struct p2m_domain *p2m = &d->arch.p2m;
-    lpae_t pte;
-    const lpae_t orig_pte = *entry;
-    int rc;
-
-    BUG_ON(level > 3);
-
-    switch ( op )
-    {
-    case MEMACCESS:
-        if ( level < 3 )
-        {
-            if ( !p2m_valid(orig_pte) )
-            {
-                *addr += level_size;
-                return P2M_ONE_PROGRESS_NOP;
-            }
-
-            /* Shatter large pages as we descend */
-            if ( p2m_mapping(orig_pte) )
-            {
-                rc = p2m_shatter_page(p2m, entry, level);
-                if ( rc < 0 )
-                    return rc;
-            } /* else: an existing table mapping -> descend */
-
-            return P2M_ONE_DESCEND;
-        }
-        else
-        {
-            pte = orig_pte;
-
-            if ( p2m_valid(pte) )
-            {
-                rc = p2m_mem_access_radix_set(p2m, _gfn(paddr_to_pfn(*addr)),
-                                              a);
-                if ( rc < 0 )
-                    return rc;
-
-                p2m_set_permission(&pte, pte.p2m.type, a);
-                p2m_write_pte(entry, pte, p2m->clean_pte);
-            }
-
-            *addr += level_size;
-            *flush = true;
-            return P2M_ONE_PROGRESS;
-        }
-    }
-
-    BUG(); /* Should never get here */
-}
-
-/*
- * The page is only used by the P2M code which is protected by the p2m->lock.
- * So we can avoid to use atomic helpers.
- */
-static void update_reference_mapping(struct page_info *page,
-                                     lpae_t old_entry,
-                                     lpae_t new_entry)
-{
-    if ( p2m_valid(old_entry) && !p2m_valid(new_entry) )
-        page->u.inuse.p2m_refcount--;
-    else if ( !p2m_valid(old_entry) && p2m_valid(new_entry) )
-        page->u.inuse.p2m_refcount++;
-}
-
-static int apply_p2m_changes(struct domain *d,
-                     enum p2m_operation op,
-                     gfn_t sgfn,
-                     unsigned long nr,
-                     mfn_t smfn,
-                     uint32_t mask,
-                     p2m_type_t t,
-                     p2m_access_t a)
-{
-    paddr_t start_gpaddr = pfn_to_paddr(gfn_x(sgfn));
-    paddr_t end_gpaddr = pfn_to_paddr(gfn_x(sgfn) + nr);
-    paddr_t maddr = pfn_to_paddr(mfn_x(smfn));
-    int rc, ret;
-    struct p2m_domain *p2m = &d->arch.p2m;
-    lpae_t *mappings[4] = { NULL, NULL, NULL, NULL };
-    struct page_info *pages[4] = { NULL, NULL, NULL, NULL };
-    paddr_t addr;
-    unsigned int level = 0;
-    unsigned int cur_root_table = ~0;
-    unsigned int cur_offset[4] = { ~0, ~0, ~0, ~0 };
-    unsigned int count = 0;
-    const unsigned int preempt_count_limit = (op == MEMACCESS) ? 1 : 0x2000;
-    const bool_t preempt = !is_idle_vcpu(current);
-    bool_t flush = false;
-    PAGE_LIST_HEAD(free_pages);
-    struct page_info *pg;
-
-    p2m_write_lock(p2m);
-
-    /* Static mapping. P2M_ROOT_PAGES > 1 are handled below */
-    if ( P2M_ROOT_PAGES == 1 )
-    {
-        mappings[P2M_ROOT_LEVEL] = __map_domain_page(p2m->root);
-        pages[P2M_ROOT_LEVEL] = p2m->root;
-    }
-
-    addr = start_gpaddr;
-    while ( addr < end_gpaddr )
-    {
-        int root_table;
-        const unsigned int offsets[4] = {
-            zeroeth_table_offset(addr),
-            first_table_offset(addr),
-            second_table_offset(addr),
-            third_table_offset(addr)
-        };
-
-        /*
-         * Check if current iteration should be possibly preempted.
-         * Since count is initialised to 0 above we are guaranteed to
-         * always make at least one pass as long as preempt_count_limit is
-         * initialized with a value >= 1.
-         */
-        if ( preempt && count >= preempt_count_limit
-             && hypercall_preempt_check() )
-        {
-            switch ( op )
-            {
-            case MEMACCESS:
-            {
-                /*
-                 * Preempt setting mem_access permissions as required by XSA-89,
-                 * if it's not the last iteration.
-                 */
-                uint32_t progress = paddr_to_pfn(addr) - gfn_x(sgfn) + 1;
-
-                if ( nr > progress && !(progress & mask) )
-                {
-                    rc = progress;
-                    goto out;
-                }
-                break;
-            }
-
-            default:
-                break;
-            };
-
-            /*
-             * Reset current iteration counter.
-             */
-            count = 0;
-        }
-
-        if ( P2M_ROOT_PAGES > 1 )
-        {
-            int i;
-            /*
-             * Concatenated root-level tables. The table number will be the
-             * offset at the previous level. It is not possible to concatenate
-             * a level-0 root.
-             */
-            ASSERT(P2M_ROOT_LEVEL > 0);
-            root_table = offsets[P2M_ROOT_LEVEL - 1];
-            if ( root_table >= P2M_ROOT_PAGES )
-            {
-                rc = -EINVAL;
-                goto out;
-            }
-
-            if ( cur_root_table != root_table )
-            {
-                if ( mappings[P2M_ROOT_LEVEL] )
-                    unmap_domain_page(mappings[P2M_ROOT_LEVEL]);
-                mappings[P2M_ROOT_LEVEL] =
-                    __map_domain_page(p2m->root + root_table);
-                pages[P2M_ROOT_LEVEL] = p2m->root + root_table;
-                cur_root_table = root_table;
-                /* Any mapping further down is now invalid */
-                for ( i = P2M_ROOT_LEVEL; i < 4; i++ )
-                    cur_offset[i] = ~0;
-            }
-        }
-
-        for ( level = P2M_ROOT_LEVEL; level < 4; level++ )
-        {
-            unsigned offset = offsets[level];
-            lpae_t *entry = &mappings[level][offset];
-            lpae_t old_entry = *entry;
-
-            ret = apply_one_level(d, entry,
-                                  level, op,
-                                  start_gpaddr, end_gpaddr,
-                                  &addr, &maddr, &flush,
-                                  t, a);
-            if ( ret < 0 ) { rc = ret ; goto out; }
-            count += ret;
-
-            if ( ret != P2M_ONE_PROGRESS_NOP )
-                update_reference_mapping(pages[level], old_entry, *entry);
-
-            /* L3 had better have done something! We cannot descend any further */
-            BUG_ON(level == 3 && ret == P2M_ONE_DESCEND);
-            if ( ret != P2M_ONE_DESCEND ) break;
-
-            BUG_ON(!p2m_valid(*entry));
-
-            if ( cur_offset[level] != offset )
-            {
-                /* Update mapping for next level */
-                int i;
-                if ( mappings[level+1] )
-                    unmap_domain_page(mappings[level+1]);
-                mappings[level+1] = map_domain_page(_mfn(entry->p2m.base));
-                pages[level+1] = mfn_to_page(entry->p2m.base);
-                cur_offset[level] = offset;
-                /* Any mapping further down is now invalid */
-                for ( i = level+1; i < 4; i++ )
-                    cur_offset[i] = ~0;
-            }
-            /* else: next level already valid */
-        }
-
-        BUG_ON(level > 3);
-    }
-
-    rc = 0;
-
-out:
-    if ( flush )
-    {
-        p2m_flush_tlb_sync(&d->arch.p2m);
-        ret = iommu_iotlb_flush(d, gfn_x(sgfn), nr);
-        if ( !rc )
-            rc = ret;
-    }
-
-    while ( (pg = page_list_remove_head(&free_pages)) )
-        free_domheap_page(pg);
-
-    for ( level = P2M_ROOT_LEVEL; level < 4; level ++ )
-    {
-        if ( mappings[level] )
-            unmap_domain_page(mappings[level]);
-    }
-
-    p2m_write_unlock(p2m);
-
-    return rc;
-}
-
 static inline int p2m_insert_mapping(struct domain *d,
                                      gfn_t start_gfn,
                                      unsigned long nr,
@@ -2069,6 +1780,7 @@  long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr,
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     p2m_access_t a;
+    unsigned int order;
     long rc = 0;
 
     static const p2m_access_t memaccess[] = {
@@ -2111,8 +1823,43 @@  long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr,
         return 0;
     }
 
-    rc = apply_p2m_changes(d, MEMACCESS, gfn_add(gfn, start),
-                           (nr - start), INVALID_MFN, mask, 0, a);
+    p2m_write_lock(p2m);
+
+    for ( gfn = gfn_add(gfn, start); nr > start; gfn = gfn_add(gfn, 1UL << order) )
+    {
+        p2m_type_t t;
+        mfn_t mfn = p2m_get_entry(p2m, gfn, &t, NULL, &order);
+
+        /* Skip hole */
+        if ( mfn_eq(mfn, INVALID_MFN) )
+        {
+            /*
+             * the order corresponds to the order of the mapping in the
+             * page table. so we need to align the gfn before
+             * incrementing.
+             */
+            gfn = _gfn(gfn_x(gfn) & ~((1UL << order) - 1));
+            continue;
+        }
+        else
+        {
+            order = 0;
+            rc = __p2m_set_entry(p2m, gfn, 0, mfn, t, a);
+            if ( rc )
+                break;
+        }
+
+        start += (1UL << order);
+        /* Check for continuation if it is not the last iteration */
+        if ( nr > start && !(start & mask) && hypercall_preempt_check() )
+        {
+            rc = start;
+            break;
+        }
+    }
+
+    p2m_write_unlock(p2m);
+
     if ( rc < 0 )
         return rc;
     else if ( rc > 0 )