Message ID | 1537801594-207139-1-git-send-email-john.garry@huawei.com
---|---
Series | hisi_sas: Misc bugfixes and an optimisation patch
On Wed, Oct 10, 2018 at 09:58:25PM -0400, Martin K. Petersen wrote:
> > This is because the IPTT index must be a unique value per HBA. However,
> > if we switched to SCSI MQ, the block layer tag becomes unique per queue,
> > and not per HBA.
>
> That doesn't sound right.

blk-mq tags are always per-host (which has actually caused problems for
ATA, which is now using its own per-device tags).
On 11/10/2018 11:15, Christoph Hellwig wrote:
> On Thu, Oct 11, 2018 at 10:59:11AM +0100, John Garry wrote:
>>
>>> blk-mq tags are always per-host (which has actually caused problems for
>>> ATA, which is now using its own per-device tags).
>>>
>>
>> So, for example, if Scsi_host.can_queue = 2048 and Scsi_host.nr_hw_queues =
>> 16, then rq tags are still in range [0, 2048) for that HBA, i.e. invariant
>> on queue count?
>
> Yes, if can_queue is 2048 you will get tags from 0..2047.
>

I should be clear about some things before discussing this further. Our
device has 16 hw queues, and each command we send to any queue in the
device must have a tag which is unique across all hw queues for that
device, in the range [0, 2048) - it's called an IPTT. So
Scsi_host.can_queue = 2048.

However, today we only expose a single queue to the upper layer (for an
unrelated LLDD error-handling restriction). We hope to expose all 16
queues in future, which is what I meant by "enabling SCSI MQ in the
driver". However, with 6/7, this creates a problem, below.

> IFF your device needs different tags for different queues it can use
> the blk_mq_unique_tag helper to generate a unique global tag.

So this helper can't help, as fundamentally the issue is that the tag
field in struct request is unique per hardware queue, but not across all
hw queues. Indeed blk_mq_unique_tag() does give a unique global tag, but
it cannot be used for the IPTT.

OTOH, we could expose 16 queues to the upper layer and drop 6/7, but we
found it performs worse.

> But unless you actually have multiple hardware queues that latter part
> is rather irrelevant to start with.

Thanks,
John
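For context, blk_mq_unique_tag() packs the hw queue index into the upper
16 bits of a 32-bit value and the per-queue tag into the lower 16 bits.
A minimal sketch of decoding it (the three helpers are real blk-mq API;
the wrapper function is illustrative only):

#include <linux/blk-mq.h>
#include <linux/printk.h>

/*
 * Illustrative only: the combined value is unique across the host, but
 * the low 16 bits (the per-queue tag) are not - which is why it cannot
 * stand in for an IPTT that must fall in [0, 2048).
 */
static void example_decode_unique_tag(struct request *rq)
{
	u32 unique = blk_mq_unique_tag(rq);
	u16 hwq = blk_mq_unique_tag_to_hwq(unique);	/* hw queue index */
	u16 tag = blk_mq_unique_tag_to_tag(unique);	/* per-queue tag */

	pr_debug("hwq %u tag %u unique 0x%08x\n", hwq, tag, unique);
}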
On Thu, Oct 11, 2018 at 02:12:11PM +0100, John Garry wrote:
> On 11/10/2018 11:15, Christoph Hellwig wrote:
> > On Thu, Oct 11, 2018 at 10:59:11AM +0100, John Garry wrote:
> > >
> > > > blk-mq tags are always per-host (which has actually caused problems for
> > > > ATA, which is now using its own per-device tags).
> > > >
> > >
> > > So, for example, if Scsi_host.can_queue = 2048 and Scsi_host.nr_hw_queues =
> > > 16, then rq tags are still in range [0, 2048) for that HBA, i.e. invariant
> > > on queue count?
> >
> > Yes, if can_queue is 2048 you will get tags from 0..2047.
> >
>
> I should be clear about some things before discussing this further. Our
> device has 16 hw queues, and each command we send to any queue in the
> device must have a tag which is unique across all hw queues for that
> device, in the range [0, 2048) - it's called an IPTT. So
> Scsi_host.can_queue = 2048.

Could you describe a bit about IPTT? The 16 hw queues look like the reply
queues in other drivers, such as megaraid_sas, but given that all 16
reply queues share one tagset, the hw queue number has to be 1 from
blk-mq's view.

> However, today we only expose a single queue to the upper layer (for an
> unrelated LLDD error-handling restriction). We hope to expose all 16
> queues in future, which is what I meant by "enabling SCSI MQ in the
> driver". However, with 6/7, this creates a problem, below.

If the tag of each request from all hw queues has to be unique, you can't
expose all 16 queues.

> > IFF your device needs different tags for different queues it can use
> > the blk_mq_unique_tag helper to generate a unique global tag.
>
> So this helper can't help, as fundamentally the issue is that the tag
> field in struct request is unique per hardware queue, but not across all
> hw queues. Indeed blk_mq_unique_tag() does give a unique global tag, but
> it cannot be used for the IPTT.
>
> OTOH, we could expose 16 queues to the upper layer and drop 6/7, but we
> found it performs worse.

We discussed this issue before, but have not found a good solution yet
for exposing multiple hw queues to blk-mq. However, we still get good
performance in the case of the none scheduler with the following patches:

8824f62246be blk-mq: fail the request in case issue failure
6ce3dd6eec11 blk-mq: issue directly if hw queue isn't busy in case of 'none'

Thanks,
Ming
On 11/10/2018 14:32, Ming Lei wrote:
> On Thu, Oct 11, 2018 at 02:12:11PM +0100, John Garry wrote:
>> On 11/10/2018 11:15, Christoph Hellwig wrote:
>>> On Thu, Oct 11, 2018 at 10:59:11AM +0100, John Garry wrote:
>>>>
>>>>> blk-mq tags are always per-host (which has actually caused problems for
>>>>> ATA, which is now using its own per-device tags).
>>>>>
>>>>
>>>> So, for example, if Scsi_host.can_queue = 2048 and Scsi_host.nr_hw_queues =
>>>> 16, then rq tags are still in range [0, 2048) for that HBA, i.e. invariant
>>>> on queue count?
>>>
>>> Yes, if can_queue is 2048 you will get tags from 0..2047.
>>>
>>
>> I should be clear about some things before discussing this further. Our
>> device has 16 hw queues, and each command we send to any queue in the
>> device must have a tag which is unique across all hw queues for that
>> device, in the range [0, 2048) - it's called an IPTT. So
>> Scsi_host.can_queue = 2048.
>
> Could you describe a bit about IPTT?
>

IPTT is an "Initiator Tag". It is a tag which maps to the context of a hw
queue command, and is related to the SAS protocol Initiator Port tag. I
think that most SAS HBAs have a similar concept.

IPTTs are limited, and must be recycled when an IO completes. Our hw
supports up to 2048, so we have a limit of 2048 commands issued at any
point in time.

Previously we had been managing the IPTT in the LLDD, but found that the
rq tag can be used as the IPTT (as in 6/7), which gave a good performance
boost.

> Looks like the 16 hw queues are like reply queues in other drivers,
> such as megaraid_sas, but given that all 16 reply queues share one
> tagset, the hw queue number has to be 1 from blk-mq's view.
>
>> However, today we only expose a single queue to the upper layer (for an
>> unrelated LLDD error-handling restriction). We hope to expose all 16
>> queues in future, which is what I meant by "enabling SCSI MQ in the
>> driver". However, with 6/7, this creates a problem, below.
>
> If the tag of each request from all hw queues has to be unique, you
> can't expose all 16 queues.

Well, we can if we generate and manage the IPTT in the LLDD, as we had
been doing. If we want to use the rq tag - which 6/7 is for - then we
can't.

>>> IFF your device needs different tags for different queues it can use
>>> the blk_mq_unique_tag helper to generate a unique global tag.
>>
>> So this helper can't help, as fundamentally the issue is that the tag
>> field in struct request is unique per hardware queue, but not across all
>> hw queues. Indeed blk_mq_unique_tag() does give a unique global tag, but
>> it cannot be used for the IPTT.
>>
>> OTOH, we could expose 16 queues to the upper layer and drop 6/7, but we
>> found it performs worse.
>
> We discussed this issue before, but have not found a good solution yet
> for exposing multiple hw queues to blk-mq.

I just think it's unfortunate that enabling blk-mq means the LLDD loses
this tag which is unique across all queues in the range
[0, Scsi_host.can_queue), so much so that we found performance better by
not exposing multiple queues and continuing to use the single rq tag...

> However, we still get good performance in the case of the none
> scheduler with the following patches:
>
> 8824f62246be blk-mq: fail the request in case issue failure
> 6ce3dd6eec11 blk-mq: issue directly if hw queue isn't busy in case of 'none'
>

I think these patches would have been included in our testing. I need to
check.

> Thanks,
> Ming

Thanks,
John
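For illustration, "managing the IPTT in the LLDD" amounts to a
driver-private allocator over the 2048-entry pool, roughly along these
lines (a hedged sketch with made-up names, not the actual hisi_sas code):

#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/spinlock.h>
#include <linux/errno.h>

#define MY_IPTT_COUNT	2048	/* hw supports IPTT values [0, 2048) */

struct my_hba {
	spinlock_t lock;
	DECLARE_BITMAP(iptt_bitmap, MY_IPTT_COUNT);
};

/* Allocate a free IPTT; every in-flight command on any hw queue holds one. */
static int my_iptt_alloc(struct my_hba *hba)
{
	int iptt;

	spin_lock(&hba->lock);
	iptt = find_first_zero_bit(hba->iptt_bitmap, MY_IPTT_COUNT);
	if (iptt >= MY_IPTT_COUNT) {
		spin_unlock(&hba->lock);
		return -ENOMEM;	/* all 2048 IPTTs in flight */
	}
	set_bit(iptt, hba->iptt_bitmap);
	spin_unlock(&hba->lock);

	return iptt;
}

/* Recycle the IPTT on IO completion (atomic, so no lock needed here). */
static void my_iptt_free(struct my_hba *hba, int iptt)
{
	clear_bit(iptt, hba->iptt_bitmap);
}

The single bitmap and lock shared by all CPUs is also why this approach
stops scaling - the contention point Ming raises further down the thread.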
On 11/10/2018 02:58, Martin K. Petersen wrote:
>
> John,
>

Hi Martin,

>> However it does block us in future from enabling SCSI MQ in the driver.
>
> We're going to remove the legacy I/O path so I'm not particularly keen
> on merging something that's going in the opposite direction.

What I really meant was that we cannot expose multiple queues to the
upper layer; this is not related to the legacy IO path.

>> This is because the IPTT index must be a unique value per HBA. However,
>> if we switched to SCSI MQ, the block layer tag becomes unique per queue,
>> and not per HBA.
>
> That doesn't sound right.

Again, my wording probably was not accurate. I meant that the rq tag is
not unique across all hw queues.

As I mentioned in the thread that spawned from this, we actually can't
expose multiple hw queues at the moment. And, if we did, we would find a
performance drop due to having to go back to managing this IPTT
internally.

So how to handle this? We're going to continue to work towards exposing
multiple queues to the upper layer, but for the moment I think patch 6/7
is a good change.

Please let me know.

Thanks,
John
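In outline, patch 6/7 takes the block layer tag as the IPTT rather than
running a private allocator. A hedged sketch of the idea (hypothetical
driver names; on 2018-era kernels the request is reached via
scmd->request):

#include <linux/blkdev.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>

static int my_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
{
	/*
	 * Sketch only: with a single exposed hw queue and
	 * .can_queue = 2048, rq->tag is already unique in [0, 2048)
	 * across the whole host, so it can stand in for the IPTT with
	 * no driver-side allocation or locking.
	 */
	int iptt = scmd->request->tag;

	/* ... write iptt into the command descriptor and deliver to hw ... */

	return 0;
}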
On Fri, Oct 12, 2018 at 10:02:57AM +0100, John Garry wrote:
> Hi Ming,
>
> > In theory, you still may generate and manage the IPTT in the LLDD by
> > simply ignoring rq->tag, meantime enabling SCSI_MQ with 16 hw queues.
>
> Well, at the moment we can't expose all 16 hw queues to the upper layer
> anyway, due to an ordering restriction imposed by the HW on the LLDD. We
> have a plan to solve it.
>
> Regardless, we still found performance better by using the rq tag
> instead of exposing all 16 queues.
>
> > However, I am not sure how much this way may improve performance, and
> > it may degrade IO perf. If 16 hw queues are exposed to blk-mq,
> > 16 * .can_queue requests may be queued to the driver, and allocation &
> > free on the single IPTT pool will become a bottleneck.
>
> Right, this IPTT management doesn't scale (especially for our host with
> 2 sockets @ 48/64 cores each). So if the upper layer is already
> generating a tag which we can use, it is good to use it.
>
> > Per my experiment on host tagset, it might be a good tradeoff to
> > allocate one hw queue for each node to avoid the remote access on the
> > dispatch data/request structures for this case, but your IPTT pool is
> > still shared by all CPUs, so maybe you can try the smart sbitmap:
> >
> > https://www.spinics.net/lists/linux-scsi/msg117920.html
>
> I don't understand this about allocating a hw queue per node. Surely
> having and using all available queues in an intelligent way means less
> queue contention, right?

Yes.

> Looking at this change:
>
> @@ -5761,6 +5762,11 @@ static int hpsa_scsi_host_alloc(struct ctlr_info *h)
>  static int hpsa_scsi_add_host(struct ctlr_info *h)
>  {
>  	int rv;
> +	/* 256 tags should be high enough to saturate device */
> +	int max_queues = DIV_ROUND_UP(h->scsi_host->can_queue, 256);
> +
> +	/* per NUMA node hw queue */
> +	h->scsi_host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);
>
> I assume h->scsi_host->nr_hw_queues was zero-initialised previously, so
> we're now using more than one queue, but why the limit?

From my tests on null_blk/scsi_debug, a per-node hw queue improves IOPS
much more obviously. Also, you may still manage the IPTT in the LLDD, and
contention on the single IPTT pool shouldn't be very serious, given there
are usually few NUMA nodes.

Thanks,
Ming
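To make the quoted hpsa sizing concrete, a small worked example
(illustrative numbers only, not from the patch itself):

#include <linux/kernel.h>	/* DIV_ROUND_UP, min_t */

/*
 * Illustrative only: with can_queue = 2048 and a 256-tag budget per
 * queue, max_queues = DIV_ROUND_UP(2048, 256) = 8, so a 2-node system
 * gets nr_hw_queues = min(2, 8) = 2 - one hw queue per NUMA node,
 * unless the tag space is too small to give each queue a useful depth.
 */
static int example_nr_hw_queues(int can_queue, int nr_nodes)
{
	int max_queues = DIV_ROUND_UP(can_queue, 256);

	return min_t(int, nr_nodes, max_queues);
}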
John,

> As I mentioned in the thread that spawned from this, we actually can't
> expose multiple hw queues at the moment. And, if we did, we would find a
> performance drop due to having to go back to managing this IPTT
> internally.
>
> So how to handle this? We're going to continue to work towards exposing
> multiple queues to the upper layer, but for the moment I think patch 6/7
> is a good change.

Thanks for the discussion. That's exactly what I was looking for.

Applied to 4.20/scsi-queue.

--
Martin K. Petersen	Oracle Linux Engineering