Message ID | 20200929085550.30926-1-eric.auger@redhat.com |
---|---|
Headers | show |
Series | NVMe passthrough: Take into account host IOVA reserved regions | expand |
On Tue, Sep 29, 2020 at 10:55:50AM +0200, Eric Auger wrote: > diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c > index ba0ee6e21c..71145970f3 100644 > --- a/util/vfio-helpers.c > +++ b/util/vfio-helpers.c > @@ -667,6 +667,50 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s) > return true; > } > > +static int > +qemu_vfio_find_fixed_iova(QEMUVFIOState *s, size_t size, uint64_t *iova) > +{ > + int i; > + > + for (i = 0; i < s->nb_iova_ranges; i++) { > + if (s->usable_iova_ranges[i].end < s->low_water_mark) { > + continue; > + } > + s->low_water_mark = > + MAX(s->low_water_mark, s->usable_iova_ranges[i].start); > + > + if (s->usable_iova_ranges[i].end - s->low_water_mark + 1 >= size || > + s->usable_iova_ranges[i].end - s->low_water_mark + 1 == 0) { I don't understand the == 0 case. It seems like we are allocating an IOVA beyond usable_iova_ranges[i].end?
Hi Stefan, On 9/29/20 5:59 PM, Stefan Hajnoczi wrote: > On Tue, Sep 29, 2020 at 10:55:50AM +0200, Eric Auger wrote: >> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c >> index ba0ee6e21c..71145970f3 100644 >> --- a/util/vfio-helpers.c >> +++ b/util/vfio-helpers.c >> @@ -667,6 +667,50 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s) >> return true; >> } >> >> +static int >> +qemu_vfio_find_fixed_iova(QEMUVFIOState *s, size_t size, uint64_t *iova) >> +{ >> + int i; >> + >> + for (i = 0; i < s->nb_iova_ranges; i++) { >> + if (s->usable_iova_ranges[i].end < s->low_water_mark) { >> + continue; >> + } >> + s->low_water_mark = >> + MAX(s->low_water_mark, s->usable_iova_ranges[i].start); >> + >> + if (s->usable_iova_ranges[i].end - s->low_water_mark + 1 >= size || >> + s->usable_iova_ranges[i].end - s->low_water_mark + 1 == 0) { > > I don't understand the == 0 case. It seems like we are allocating an > IOVA beyond usable_iova_ranges[i].end?> It is meant to handle the case were low_water_mark = 0 and s->usable_iova_ranges[0].end = ULLONG_MAX (I know it cannot exist at the moment but may happen in the future) where we get an overflow. Given the if (s->usable_iova_ranges[i].end < s->low_water_mark) { continue; } I think this prevents us from allocating beyond usable_iova_ranges[i].end or do I miss something? Thanks Eric
On Tue, Sep 29, 2020 at 09:44:48PM +0200, Auger Eric wrote: > Hi Stefan, > > On 9/29/20 5:59 PM, Stefan Hajnoczi wrote: > > On Tue, Sep 29, 2020 at 10:55:50AM +0200, Eric Auger wrote: > >> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c > >> index ba0ee6e21c..71145970f3 100644 > >> --- a/util/vfio-helpers.c > >> +++ b/util/vfio-helpers.c > >> @@ -667,6 +667,50 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s) > >> return true; > >> } > >> > >> +static int > >> +qemu_vfio_find_fixed_iova(QEMUVFIOState *s, size_t size, uint64_t *iova) > >> +{ > >> + int i; > >> + > >> + for (i = 0; i < s->nb_iova_ranges; i++) { > >> + if (s->usable_iova_ranges[i].end < s->low_water_mark) { > >> + continue; > >> + } > >> + s->low_water_mark = > >> + MAX(s->low_water_mark, s->usable_iova_ranges[i].start); > >> + > >> + if (s->usable_iova_ranges[i].end - s->low_water_mark + 1 >= size || > >> + s->usable_iova_ranges[i].end - s->low_water_mark + 1 == 0) { > > > > I don't understand the == 0 case. It seems like we are allocating an > > IOVA beyond usable_iova_ranges[i].end?> > It is meant to handle the case were low_water_mark = 0 and > s->usable_iova_ranges[0].end = ULLONG_MAX (I know it cannot exist at the > moment but may happen in the future) where we get an overflow. Given the > if (s->usable_iova_ranges[i].end < s->low_water_mark) { > continue; > } > I think this prevents us from allocating beyond > usable_iova_ranges[i].end or do I miss something? Yes, you are right. Here are the constraints: e >= l j = max(l, s) e - j + 1 < s e - j + 1 == 0 Assume l >= s so we can replace j with l: e >= l e - l + 1 < s e - l + 1 == 0 The case I'm worried about is when the iova range cannot fit s bytes. The last condition is only true when e = l - 1, but this violates the first condition e >= l. So the problem scenario cannot occur. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
On Tue, Sep 29, 2020 at 10:55:48AM +0200, Eric Auger wrote: > The current IOVA allocator allocates within the [0x10000, 1ULL << 39] > window, without paying attention to the host IOVA reserved regions. > This prevents NVMe passthtrough from working on ARM as the fixed > IOVAs rapidly grow up to the MSI reserved region [0x8000000, 0x8100000] > causing some VFIO MAP DMA failures. This series collects the usable > IOVA regions using VFIO GET_INFO (this requires the host to support > VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE) and rework the fixed and > temporary IOVA allocators to avoid those latter. > > For the time being we do not change the arbitrary min/max IOVAs. > In theory they could be dynamically determined but the kernel > currently fails to expose some HW limitations described in the ACPI > tables (such as PCI root complex Device Memory Address Size Limit). > See kernel thread related to "[RFC 0/3] iommu: Reserved regions for > IOVAs beyond dma_mask and iommu aperture" for more details: > https://lkml.org/lkml/2020/9/28/1102 > > Best Regards > > Eric > > This series can be found at: > https://github.com/eauger/qemu/tree/nvme_resv_v2 > > This was tested on ARM only. > > History: > v1 -> v2: > - remove "util/vfio-helpers: Dynamically compute the min/max IOVA" to > relax the kernel dependency > - Fix cabapbility enumeration loop > - set s->usable_iova_ranges=NULL to avoid double free > - handle possible u64 wrap > > Eric Auger (2): > util/vfio-helpers: Collect IOVA reserved regions > util/vfio-helpers: Rework the IOVA allocator to avoid IOVA reserved > regions > > util/vfio-helpers.c | 129 +++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 123 insertions(+), 6 deletions(-) > > -- > 2.21.3 > Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan