Message ID | 20180319072726.4273-1-ard.biesheuvel@linaro.org |
---|---|
State | New |
Headers | show |
Series | irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis | expand |
On 19 March 2018 at 15:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > Commit 4f2c7583e33e upstream. > > When struct its_device instances are created, the nr_ites member > will be set to a power of 2 that equals or exceeds the requested > number of MSIs passed to the msi_prepare() callback. At the same > time, the LPI map is allocated to be some multiple of 32 in size, > where the allocated size may be less than the requested size > depending on whether a contiguous range of sufficient size is > available in the global LPI bitmap. > > This may result in the situation where the nr_ites < nr_lpis, and > since nr_ites is what we program into the hardware when we map the > device, the additional LPIs will be non-functional. > > For bog standard hardware, this does not really matter. However, > in cases where ITS device IDs are shared between different PCIe > devices, we may end up allocating these additional LPIs without > taking into account that they don't actually work. > > So let's make nr_ites at least 32. This ensures that all allocated > LPIs are 'live', and that its_alloc_device_irq() will fail when > attempts are made to allocate MSIs beyond what was allocated in > the first place. > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > [maz: updated comment] > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > [ardb: trivial tweak of unrelated context] > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > --- Please apply to v4.9 > drivers/irqchip/irq-gic-v3-its.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c > index acb9d250a905..ac15e5d5d9b2 100644 > --- a/drivers/irqchip/irq-gic-v3-its.c > +++ b/drivers/irqchip/irq-gic-v3-its.c > @@ -684,7 +684,7 @@ static struct irq_chip its_irq_chip = { > * This gives us (((1UL << id_bits) - 8192) >> 5) possible allocations. > */ > #define IRQS_PER_CHUNK_SHIFT 5 > -#define IRQS_PER_CHUNK (1 << IRQS_PER_CHUNK_SHIFT) > +#define IRQS_PER_CHUNK (1UL << IRQS_PER_CHUNK_SHIFT) > > static unsigned long *lpi_bitmap; > static u32 lpi_chunks; > @@ -1320,11 +1320,10 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id, > > dev = kzalloc(sizeof(*dev), GFP_KERNEL); > /* > - * At least one bit of EventID is being used, hence a minimum > - * of two entries. No, the architecture doesn't let you > - * express an ITT with a single entry. > + * We allocate at least one chunk worth of LPIs bet device, > + * and thus that many ITEs. The device may require less though. > */ > - nr_ites = max(2UL, roundup_pow_of_two(nvecs)); > + nr_ites = max(IRQS_PER_CHUNK, roundup_pow_of_two(nvecs)); > sz = nr_ites * its->ite_size; > sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1; > itt = kzalloc(sz, GFP_KERNEL); > -- > 2.11.0 >
On Mon, Mar 19, 2018 at 03:28:40PM +0800, Ard Biesheuvel wrote: > On 19 March 2018 at 15:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > > Commit 4f2c7583e33e upstream. > > > > When struct its_device instances are created, the nr_ites member > > will be set to a power of 2 that equals or exceeds the requested > > number of MSIs passed to the msi_prepare() callback. At the same > > time, the LPI map is allocated to be some multiple of 32 in size, > > where the allocated size may be less than the requested size > > depending on whether a contiguous range of sufficient size is > > available in the global LPI bitmap. > > > > This may result in the situation where the nr_ites < nr_lpis, and > > since nr_ites is what we program into the hardware when we map the > > device, the additional LPIs will be non-functional. > > > > For bog standard hardware, this does not really matter. However, > > in cases where ITS device IDs are shared between different PCIe > > devices, we may end up allocating these additional LPIs without > > taking into account that they don't actually work. > > > > So let's make nr_ites at least 32. This ensures that all allocated > > LPIs are 'live', and that its_alloc_device_irq() will fail when > > attempts are made to allocate MSIs beyond what was allocated in > > the first place. > > > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > [maz: updated comment] > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > > [ardb: trivial tweak of unrelated context] > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > --- > > Please apply to v4.9 What about 4.14.y and 4.15.y? Why only add it to one really old tree, you don't want people updating to a newer kernel and have a regression, right? thanks, greg k-h
On Mon, Mar 19, 2018 at 08:37:07AM +0100, Greg KH wrote: > On Mon, Mar 19, 2018 at 03:28:40PM +0800, Ard Biesheuvel wrote: > > On 19 March 2018 at 15:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > > > Commit 4f2c7583e33e upstream. > > > > > > When struct its_device instances are created, the nr_ites member > > > will be set to a power of 2 that equals or exceeds the requested > > > number of MSIs passed to the msi_prepare() callback. At the same > > > time, the LPI map is allocated to be some multiple of 32 in size, > > > where the allocated size may be less than the requested size > > > depending on whether a contiguous range of sufficient size is > > > available in the global LPI bitmap. > > > > > > This may result in the situation where the nr_ites < nr_lpis, and > > > since nr_ites is what we program into the hardware when we map the > > > device, the additional LPIs will be non-functional. > > > > > > For bog standard hardware, this does not really matter. However, > > > in cases where ITS device IDs are shared between different PCIe > > > devices, we may end up allocating these additional LPIs without > > > taking into account that they don't actually work. > > > > > > So let's make nr_ites at least 32. This ensures that all allocated > > > LPIs are 'live', and that its_alloc_device_irq() will fail when > > > attempts are made to allocate MSIs beyond what was allocated in > > > the first place. > > > > > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > > [maz: updated comment] > > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > > > [ardb: trivial tweak of unrelated context] > > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > > --- > > > > Please apply to v4.9 > > What about 4.14.y and 4.15.y? Why only add it to one really old tree, > you don't want people updating to a newer kernel and have a regression, > right? Ah, now I see your other email, sorry, nevermind...
On 19 March 2018 at 15:37, Greg KH <gregkh@linuxfoundation.org> wrote: > On Mon, Mar 19, 2018 at 08:37:07AM +0100, Greg KH wrote: >> On Mon, Mar 19, 2018 at 03:28:40PM +0800, Ard Biesheuvel wrote: >> > On 19 March 2018 at 15:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> > > Commit 4f2c7583e33e upstream. >> > > >> > > When struct its_device instances are created, the nr_ites member >> > > will be set to a power of 2 that equals or exceeds the requested >> > > number of MSIs passed to the msi_prepare() callback. At the same >> > > time, the LPI map is allocated to be some multiple of 32 in size, >> > > where the allocated size may be less than the requested size >> > > depending on whether a contiguous range of sufficient size is >> > > available in the global LPI bitmap. >> > > >> > > This may result in the situation where the nr_ites < nr_lpis, and >> > > since nr_ites is what we program into the hardware when we map the >> > > device, the additional LPIs will be non-functional. >> > > >> > > For bog standard hardware, this does not really matter. However, >> > > in cases where ITS device IDs are shared between different PCIe >> > > devices, we may end up allocating these additional LPIs without >> > > taking into account that they don't actually work. >> > > >> > > So let's make nr_ites at least 32. This ensures that all allocated >> > > LPIs are 'live', and that its_alloc_device_irq() will fail when >> > > attempts are made to allocate MSIs beyond what was allocated in >> > > the first place. >> > > >> > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >> > > [maz: updated comment] >> > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> >> > > [ardb: trivial tweak of unrelated context] >> > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >> > > --- >> > >> > Please apply to v4.9 >> >> What about 4.14.y and 4.15.y? Why only add it to one really old tree, >> you don't want people updating to a newer kernel and have a regression, >> right? > > Ah, now I see your other email, sorry, nevermind... No worries. Actually, this version applies cleanly to v4.4 as well, so could you please add it there as well?
On Mon, Mar 19, 2018 at 03:47:27PM +0800, Ard Biesheuvel wrote: > On 19 March 2018 at 15:37, Greg KH <gregkh@linuxfoundation.org> wrote: > > On Mon, Mar 19, 2018 at 08:37:07AM +0100, Greg KH wrote: > >> On Mon, Mar 19, 2018 at 03:28:40PM +0800, Ard Biesheuvel wrote: > >> > On 19 March 2018 at 15:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > >> > > Commit 4f2c7583e33e upstream. > >> > > > >> > > When struct its_device instances are created, the nr_ites member > >> > > will be set to a power of 2 that equals or exceeds the requested > >> > > number of MSIs passed to the msi_prepare() callback. At the same > >> > > time, the LPI map is allocated to be some multiple of 32 in size, > >> > > where the allocated size may be less than the requested size > >> > > depending on whether a contiguous range of sufficient size is > >> > > available in the global LPI bitmap. > >> > > > >> > > This may result in the situation where the nr_ites < nr_lpis, and > >> > > since nr_ites is what we program into the hardware when we map the > >> > > device, the additional LPIs will be non-functional. > >> > > > >> > > For bog standard hardware, this does not really matter. However, > >> > > in cases where ITS device IDs are shared between different PCIe > >> > > devices, we may end up allocating these additional LPIs without > >> > > taking into account that they don't actually work. > >> > > > >> > > So let's make nr_ites at least 32. This ensures that all allocated > >> > > LPIs are 'live', and that its_alloc_device_irq() will fail when > >> > > attempts are made to allocate MSIs beyond what was allocated in > >> > > the first place. > >> > > > >> > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > >> > > [maz: updated comment] > >> > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > >> > > [ardb: trivial tweak of unrelated context] > >> > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > >> > > --- > >> > > >> > Please apply to v4.9 > >> > >> What about 4.14.y and 4.15.y? Why only add it to one really old tree, > >> you don't want people updating to a newer kernel and have a regression, > >> right? > > > > Ah, now I see your other email, sorry, nevermind... > > No worries. > > Actually, this version applies cleanly to v4.4 as well, so could you > please add it there as well? Now queued up everywhere, thanks. greg k-h
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index acb9d250a905..ac15e5d5d9b2 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -684,7 +684,7 @@ static struct irq_chip its_irq_chip = { * This gives us (((1UL << id_bits) - 8192) >> 5) possible allocations. */ #define IRQS_PER_CHUNK_SHIFT 5 -#define IRQS_PER_CHUNK (1 << IRQS_PER_CHUNK_SHIFT) +#define IRQS_PER_CHUNK (1UL << IRQS_PER_CHUNK_SHIFT) static unsigned long *lpi_bitmap; static u32 lpi_chunks; @@ -1320,11 +1320,10 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id, dev = kzalloc(sizeof(*dev), GFP_KERNEL); /* - * At least one bit of EventID is being used, hence a minimum - * of two entries. No, the architecture doesn't let you - * express an ITT with a single entry. + * We allocate at least one chunk worth of LPIs bet device, + * and thus that many ITEs. The device may require less though. */ - nr_ites = max(2UL, roundup_pow_of_two(nvecs)); + nr_ites = max(IRQS_PER_CHUNK, roundup_pow_of_two(nvecs)); sz = nr_ites * its->ite_size; sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1; itt = kzalloc(sz, GFP_KERNEL);