Message ID: 20210702150555.2401722-1-ming.lei@redhat.com
Series: blk-mq: fix blk_mq_alloc_request_hctx
On Fri, Jul 02, 2021 at 11:55:40AM -0400, Michael S. Tsirkin wrote:
> On Fri, Jul 02, 2021 at 11:05:54PM +0800, Ming Lei wrote:
> > blk-mq needs to know if the device uses managed irq, so add one field
> > to virtio_device for recording if the device uses managed irq.
> >
> > If the driver uses managed irq, this flag has to be set so it can be
> > passed to blk-mq.
> >
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > Cc: virtualization@lists.linux-foundation.org
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
>
> The API seems somewhat confusing. virtio does not request
> a managed irq as such, does it? I think it's a decision taken
> by the irq core.

vp_request_msix_vectors():

	if (desc) {
		flags |= PCI_IRQ_AFFINITY;
		desc->pre_vectors++; /* virtio config vector */
		vdev->use_managed_irq = true;
	}

	err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
					     nvectors, flags, desc);

Managed irqs are used once PCI_IRQ_AFFINITY is passed to
pci_alloc_irq_vectors_affinity().

> Any way to query the irq to find out if it's managed?

So far the managed info isn't exported via /proc/irq, but if an irq is
managed, its smp_affinity and smp_affinity_list attributes can't be
changed. If irq debugfs is enabled, this info can be retrieved there.

Thanks,
Ming
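
For reference, a minimal in-kernel sketch of such a query, assuming only the
generic accessors irq_get_irq_data() and irqd_affinity_is_managed() from
<linux/irq.h>; the helper name is made up here and is not part of the series:

#include <linux/irq.h>

/*
 * Illustrative helper only (not in the series): returns true if the
 * affinity of @irq is managed by the irq core, i.e. writes to
 * /proc/irq/<irq>/smp_affinity will be rejected.
 */
static bool foo_irq_is_managed(unsigned int irq)
{
	struct irq_data *data = irq_get_irq_data(irq);

	return data && irqd_affinity_is_managed(data);
}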
On 2021/7/2 11:05 PM, Ming Lei wrote:
> blk-mq needs to know if the device uses managed irq, so add one field
> to virtio_device for recording if the device uses managed irq.
>
> If the driver uses managed irq, this flag has to be set so it can be
> passed to blk-mq.
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/virtio_blk.c         | 2 ++
>  drivers/scsi/virtio_scsi.c         | 1 +
>  drivers/virtio/virtio_pci_common.c | 1 +
>  include/linux/virtio.h             | 1 +
>  4 files changed, 5 insertions(+)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index e4bd3b1fc3c2..33b9c80ac475 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -764,6 +764,8 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	vblk->tag_set.queue_depth = queue_depth;
>  	vblk->tag_set.numa_node = NUMA_NO_NODE;
>  	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> +	if (vdev->use_managed_irq)
> +		vblk->tag_set.flags |= BLK_MQ_F_MANAGED_IRQ;

I'm not familiar with blk-mq, but the name is kind of confusing. I guess
"BLK_MQ_F_AFFINITY_MANAGED_IRQ" is better? (Consider that we already have
"IRQD_AFFINITY_MANAGED".) That at least helps distinguish it from devres
(device-managed) IRQs.

Thanks
On 02/07/2021 16:05, Ming Lei wrote:
> blk-mq needs this information of using managed irq for improving
> deactivating hctx, so add such flag to 'struct Scsi_Host', then
> drivers can pass such flag to blk-mq via scsi_mq_setup_tags().
>
> The rule is that driver has to tell blk-mq if managed irq is used.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

As was said before, can we have something like this instead of relying on
the LLDs to do the setting:

--------->8------------

diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index b595a94c4d16..2037a5b69fe1 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -37,7 +37,7 @@ int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev,
 		for_each_cpu(cpu, mask)
 			qmap->mq_map[cpu] = qmap->queue_offset + queue;
 	}
-
+	qmap->drain_hwq = 1;
 	return 0;

 fallback:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 46fe5b45a8b8..7b610e680e08 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3408,7 +3408,8 @@ static int blk_mq_update_queue_map(struct blk_mq_tag_set *set)
 	set->map[HCTX_TYPE_DEFAULT].nr_queues = set->nr_hw_queues;

 	if (set->ops->map_queues && !is_kdump_kernel()) {
-		int i;
+		struct blk_mq_queue_map *qmap = &set->map[HCTX_TYPE_DEFAULT];
+		int i, ret;

 		/*
 		 * transport .map_queues is usually done in the following
@@ -3427,7 +3428,12 @@ static int blk_mq_update_queue_map(struct blk_mq_tag_set *set)
 		for (i = 0; i < set->nr_maps; i++)
 			blk_mq_clear_mq_map(&set->map[i]);

-		return set->ops->map_queues(set);
+		ret = set->ops->map_queues(set);
+		if (ret)
+			return ret;
+		if (qmap->drain_hwq)
+			set->flags |= BLK_MQ_F_MANAGED_IRQ;
+		return 0;
 	} else {
 		BUG_ON(set->nr_maps > 1);
 		return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 65ed99b3f2f0..2b2e71ff43e0 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -193,6 +193,7 @@ struct blk_mq_queue_map {
 	unsigned int *mq_map;
 	unsigned int nr_queues;
 	unsigned int queue_offset;
+	bool drain_hwq;
 };
On Mon, Jul 05, 2021 at 10:25:38AM +0100, John Garry wrote:
> On 02/07/2021 16:05, Ming Lei wrote:
> > blk-mq needs this information of using managed irq for improving
> > deactivating hctx, so add such flag to 'struct Scsi_Host', then
> > drivers can pass such flag to blk-mq via scsi_mq_setup_tags().
> >
> > The rule is that driver has to tell blk-mq if managed irq is used.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
>
> As was said before, can we have something like this instead of relying on
> the LLDs to do the setting:
>
> --------->8------------
>
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index b595a94c4d16..2037a5b69fe1 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -37,7 +37,7 @@ int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev,
>  		for_each_cpu(cpu, mask)
>  			qmap->mq_map[cpu] = qmap->queue_offset + queue;
>  	}
> -
> +	qmap->drain_hwq = 1;

The thing is that blk_mq_pci_map_queues() is allowed to be called for
non-managed irqs. Also some managed irq consumers don't use
blk_mq_pci_map_queues().

So this way just provides a hint about managed irq use, but we really
need to get this flag set if the driver uses managed irq.

Thanks,
Ming
On Mon, Jul 05, 2021 at 05:55:49PM +0800, Ming Lei wrote:
> The thing is that blk_mq_pci_map_queues() is allowed to be called for
> non-managed irqs. Also some managed irq consumers don't use
> blk_mq_pci_map_queues().
>
> So this way just provides a hint about managed irq use, but we really
> need to get this flag set if the driver uses managed irq.

blk_mq_pci_map_queues is absolutely intended to only be used by
managed irqs. I wonder if we can enforce that somehow?
On Fri, Jul 02, 2021 at 11:05:54PM +0800, Ming Lei wrote:
> blk-mq needs to know if the device uses managed irq, so add one field
> to virtio_device for recording if the device uses managed irq.
>
> If the driver uses managed irq, this flag has to be set so it can be
> passed to blk-mq.

I don't think all this boilerplate code makes a whole lot of sense.
I think we need to record this information deep down in the irq code by
setting a flag in struct device only if pci_alloc_irq_vectors_affinity
actually managed to allocate multiple vectors and the PCI_IRQ_AFFINITY
flag was set. Then blk-mq can look at that flag, and also check that
more than one queue is in use and work based on that.
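
A rough sketch of what that suggestion could look like, purely for
illustration: the irq_affinity_managed field in struct device,
__pci_alloc_irq_vectors(), and blk_mq_check_managed_irq() are hypothetical
names that do not exist in mainline, while BLK_MQ_F_MANAGED_IRQ is the flag
introduced by this series.

#include <linux/pci.h>
#include <linux/blk-mq.h>

/*
 * Hypothetical sketch, not mainline code: record the decision where it
 * is actually taken, i.e. when the PCI core allocates the vectors.
 */
int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
				   unsigned int max_vecs, unsigned int flags,
				   struct irq_affinity *affd)
{
	/* stands in for the existing MSI-X/MSI/legacy allocation path */
	int nvecs = __pci_alloc_irq_vectors(dev, min_vecs, max_vecs, flags, affd);

	/* only mark the device when the irq core really manages the affinity */
	if (nvecs > 1 && (flags & PCI_IRQ_AFFINITY))
		dev->dev.irq_affinity_managed = true;	/* hypothetical field */

	return nvecs;
}

/* consumer side: how blk-mq could consult the flag */
static void blk_mq_check_managed_irq(struct blk_mq_tag_set *set,
				     struct pci_dev *pdev)
{
	if (pdev->dev.irq_affinity_managed && set->nr_hw_queues > 1)
		set->flags |= BLK_MQ_F_MANAGED_IRQ;
}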
On Tue, Jul 06, 2021 at 07:37:19AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 05, 2021 at 05:55:49PM +0800, Ming Lei wrote:
> > The thing is that blk_mq_pci_map_queues() is allowed to be called for
> > non-managed irqs. Also some managed irq consumers don't use
> > blk_mq_pci_map_queues().
> >
> > So this way just provides a hint about managed irq use, but we really
> > need to get this flag set if the driver uses managed irq.
>
> blk_mq_pci_map_queues is absolutely intended to only be used by
> managed irqs. I wonder if we can enforce that somehow?

It may break some scsi drivers.

And blk_mq_pci_map_queues() just calls pci_irq_get_affinity() to
retrieve the irq's affinity, and the irq can be a non-managed irq,
whose affinity is set via either irq_set_affinity_hint() from the
kernel or /proc/irq/.

Thanks,
Ming
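
For context, the helper under discussion looks roughly like the following
(simplified from block/blk-mq-pci.c of that era, matching the hunk quoted
earlier in the thread); nothing in it requires the vectors to be managed:

int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev,
			  int offset)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	for (queue = 0; queue < qmap->nr_queues; queue++) {
		/* works for managed and non-managed vectors alike */
		mask = pci_irq_get_affinity(pdev, queue + offset);
		if (!mask)
			goto fallback;

		for_each_cpu(cpu, mask)
			qmap->mq_map[cpu] = qmap->queue_offset + queue;
	}

	return 0;

fallback:
	WARN_ON_ONCE(qmap->nr_queues > 1);
	blk_mq_clear_mq_map(qmap);
	return 0;
}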
On Tue, Jul 06, 2021 at 07:42:03AM +0200, Christoph Hellwig wrote:
> On Fri, Jul 02, 2021 at 11:05:54PM +0800, Ming Lei wrote:
> > blk-mq needs to know if the device uses managed irq, so add one field
> > to virtio_device for recording if the device uses managed irq.
> >
> > If the driver uses managed irq, this flag has to be set so it can be
> > passed to blk-mq.
>
> I don't think all this boilerplate code makes a whole lot of sense.
> I think we need to record this information deep down in the irq code by
> setting a flag in struct device only if pci_alloc_irq_vectors_affinity
> actually managed to allocate multiple vectors and the PCI_IRQ_AFFINITY
> flag was set. Then blk-mq can look at that flag, and also check that
> more than one queue is in use and work based on that.

How can blk-mq look at that flag? Usually blk-mq doesn't play with the
physical device (HBA). So far almost all physical properties (segment,
max_hw_sectors, ...) are not provided to blk-mq directly; instead they
go through the queue limits abstraction.

Thanks,
Ming
On 7/6/21 9:41 AM, Ming Lei wrote:
> On Tue, Jul 06, 2021 at 07:37:19AM +0200, Christoph Hellwig wrote:
>> On Mon, Jul 05, 2021 at 05:55:49PM +0800, Ming Lei wrote:
>>> The thing is that blk_mq_pci_map_queues() is allowed to be called for
>>> non-managed irqs. Also some managed irq consumers don't use
>>> blk_mq_pci_map_queues().
>>>
>>> So this way just provides a hint about managed irq use, but we really
>>> need to get this flag set if the driver uses managed irq.
>>
>> blk_mq_pci_map_queues is absolutely intended to only be used by
>> managed irqs. I wonder if we can enforce that somehow?
>
> It may break some scsi drivers.
>
> And blk_mq_pci_map_queues() just calls pci_irq_get_affinity() to
> retrieve the irq's affinity, and the irq can be a non-managed irq,
> whose affinity is set via either irq_set_affinity_hint() from the
> kernel or /proc/irq/.
>
But that's static, right? I.e. blk_mq_pci_map_queues() will be called
once during module init; so what happens if the user changes the
mapping later on? How will that be transferred to the driver?

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
On Tue, Jul 06 2021 at 07:42, Christoph Hellwig wrote:
> On Fri, Jul 02, 2021 at 11:05:54PM +0800, Ming Lei wrote:
>> blk-mq needs to know if the device uses managed irq, so add one field
>> to virtio_device for recording if the device uses managed irq.
>>
>> If the driver uses managed irq, this flag has to be set so it can be
>> passed to blk-mq.
>
> I don't think all this boilerplate code makes a whole lot of sense.
> I think we need to record this information deep down in the irq code by
> setting a flag in struct device only if pci_alloc_irq_vectors_affinity
> actually managed to allocate multiple vectors and the PCI_IRQ_AFFINITY
> flag was set. Then blk-mq can look at that flag, and also check that
> more than one queue is in use and work based on that.

Ack.
On Wed, Jul 07, 2021 at 11:06:25AM +0200, Thomas Gleixner wrote:
> On Tue, Jul 06 2021 at 07:42, Christoph Hellwig wrote:
> > On Fri, Jul 02, 2021 at 11:05:54PM +0800, Ming Lei wrote:
> >> blk-mq needs to know if the device uses managed irq, so add one field
> >> to virtio_device for recording if the device uses managed irq.
> >>
> >> If the driver uses managed irq, this flag has to be set so it can be
> >> passed to blk-mq.
> >
> > I don't think all this boilerplate code makes a whole lot of sense.
> > I think we need to record this information deep down in the irq code by
> > setting a flag in struct device only if pci_alloc_irq_vectors_affinity
> > actually managed to allocate multiple vectors and the PCI_IRQ_AFFINITY
> > flag was set. Then blk-mq can look at that flag, and also check that
> > more than one queue is in use and work based on that.
>
> Ack.

The problem is how blk-mq can look at that flag, since the device
representing the controller which allocates irq vectors isn't visible
to blk-mq.

Usually blk-mq (the block layer) provides the queue limits abstraction
for drivers to tell it any physical property (dma segment, max sectors,
...), which is why this patch takes a very similar approach to check
whether the HBA uses managed irqs.

Thanks,
Ming
On Tue, Jul 06, 2021 at 12:32:27PM +0200, Hannes Reinecke wrote:
> On 7/6/21 9:41 AM, Ming Lei wrote:
> > On Tue, Jul 06, 2021 at 07:37:19AM +0200, Christoph Hellwig wrote:
> > > On Mon, Jul 05, 2021 at 05:55:49PM +0800, Ming Lei wrote:
> > > > The thing is that blk_mq_pci_map_queues() is allowed to be called for
> > > > non-managed irqs. Also some managed irq consumers don't use
> > > > blk_mq_pci_map_queues().
> > > >
> > > > So this way just provides a hint about managed irq use, but we really
> > > > need to get this flag set if the driver uses managed irq.
> > >
> > > blk_mq_pci_map_queues is absolutely intended to only be used by
> > > managed irqs. I wonder if we can enforce that somehow?
> >
> > It may break some scsi drivers.
> >
> > And blk_mq_pci_map_queues() just calls pci_irq_get_affinity() to
> > retrieve the irq's affinity, and the irq can be a non-managed irq,
> > whose affinity is set via either irq_set_affinity_hint() from the
> > kernel or /proc/irq/.
> >
> But that's static, right? I.e. blk_mq_pci_map_queues() will be called
> once during module init; so what happens if the user changes the
> mapping later on? How will that be transferred to the driver?

Yeah, that may not work well enough, but it still works since
non-managed irqs support migration. And there are several SCSI drivers
which provide a module parameter to enable/disable managed irqs, while
blk_mq_pci_map_queues() is always called for mapping queues.

Thanks,
Ming
On Wed, Jul 07, 2021 at 05:42:54PM +0800, Ming Lei wrote:
> The problem is how blk-mq can look at that flag, since the device
> representing the controller which allocates irq vectors isn't visible
> to blk-mq.

In blk_mq_pci_map_queues and similar helpers.
On Wed, Jul 07, 2021 at 04:05:29PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 07, 2021 at 05:42:54PM +0800, Ming Lei wrote:
> > The problem is how blk-mq can look at that flag, since the device
> > representing the controller which allocates irq vectors isn't visible
> > to blk-mq.
>
> In blk_mq_pci_map_queues and similar helpers.

Firstly, it depends on whether drivers call into these helpers at all,
so this way is fragile.

Secondly, I don't think it is good to expose specific physical devices
to blk-mq, which shouldn't deal with physical devices directly. Also,
all three helpers just duplicate the same logic except for retrieving
each vector's affinity from the specific physical device. I will think
about how to clean them up.

Thanks,
Ming
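
One shape such a cleanup could take, as an illustrative sketch only (none of
these names exist in the kernel as-is): a single mapping loop parameterised
by a callback that returns the affinity mask of a hw queue, with the
PCI/virtio/RDMA variants reduced to thin wrappers around it.

#include <linux/blk-mq.h>
#include <linux/pci.h>

typedef const struct cpumask *(*blk_mq_hw_queue_affinity_fn)(void *dev_data,
							     int queue);

/* shared cpu -> hw queue mapping loop, hypothetical helper */
static int blk_mq_map_hw_queues(struct blk_mq_queue_map *qmap, void *dev_data,
				int offset,
				blk_mq_hw_queue_affinity_fn get_affinity)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	for (queue = 0; queue < qmap->nr_queues; queue++) {
		mask = get_affinity(dev_data, queue + offset);
		if (!mask)
			return blk_mq_map_queues(qmap);	/* default mapping */

		for_each_cpu(cpu, mask)
			qmap->mq_map[cpu] = qmap->queue_offset + queue;
	}

	return 0;
}

/* the PCI variant then only supplies the affinity lookup */
static const struct cpumask *pci_hw_queue_affinity(void *dev_data, int queue)
{
	return pci_irq_get_affinity(dev_data, queue);
}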