Message ID | 1537801594-207139-1-git-send-email-john.garry@huawei.com
---|---
Series | hisi_sas: Misc bugfixes and an optimisation patch
On Wed, Oct 10, 2018 at 09:58:25PM -0400, Martin K. Petersen wrote:
> > This is because the IPTT index must be a unique value per HBA. However,
> > if we switched to SCSI MQ, the block layer tag becomes unique per queue,
> > and not per HBA.
>
> That doesn't sound right.

blk-mq tags are always per-host (which has actually caused problems for
ATA, which is now using its own per-device tags).
On 11/10/2018 11:15, Christoph Hellwig wrote:
> On Thu, Oct 11, 2018 at 10:59:11AM +0100, John Garry wrote:
>>
>>> blk-mq tags are always per-host (which has actually caused problems for
>>> ATA, which is now using its own per-device tags).
>>>
>>
>> So, for example, if Scsi_host.can_queue = 2048 and Scsi_host.nr_hw_queues =
>> 16, then rq tags are still in range [0, 2048) for that HBA, i.e. invariant
>> on queue count?
>
> Yes, if can_queue is 2048 you will get tags from 0..2047.
>

I should be clear about some things before discussing this further. Our
device has 16 hw queues, and each command we send to any queue in the
device must have a tag which is unique across all hw queues for that
device, in the range [0, 2048) - it's called an IPTT. So
Scsi_host.can_queue = 2048.

However, today we only expose a single queue to the upper layer (for an
unrelated LLDD error-handling restriction). We hope to expose all 16
queues in future, which is what I meant by "enabling SCSI MQ in the
driver". However, with 6/7, this creates a problem, below.

> IFF your device needs different tags for different queues it can use
> the blk_mq_unique_tag helper to generate a unique global tag.

So this helper can't help, as fundamentally the issue is that the tag
field in struct request is unique per hardware queue, but not across all
hw queues. Indeed blk_mq_unique_tag() does give a unique global tag, but
it cannot be used for the IPTT.

OTOH, we could expose 16 queues to the upper layer and drop 6/7, but we
found it performs worse.

> But unless you actually have multiple hardware queues that latter part
> is rather irrelevant to start with.

Thanks,
John
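For context, blk_mq_unique_tag() packs the hw queue index into the upper
16 bits of a 32-bit value and the per-queue tag into the lower 16 bits.
A minimal sketch of decoding it (the three helpers are real blk-mq API;
the wrapper function is illustrative only):

#include <linux/blk-mq.h>
#include <linux/printk.h>

/*
 * Illustrative only: the combined value is unique across the host, but
 * the low 16 bits (the per-queue tag) are not - which is why it cannot
 * stand in for an IPTT that must fall in [0, 2048).
 */
static void example_decode_unique_tag(struct request *rq)
{
	u32 unique = blk_mq_unique_tag(rq);
	u16 hwq = blk_mq_unique_tag_to_hwq(unique);	/* hw queue index */
	u16 tag = blk_mq_unique_tag_to_tag(unique);	/* per-queue tag */

	pr_debug("hwq %u tag %u unique 0x%08x\n", hwq, tag, unique);
}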
On Thu, Oct 11, 2018 at 02:12:11PM +0100, John Garry wrote:
> On 11/10/2018 11:15, Christoph Hellwig wrote:
> > On Thu, Oct 11, 2018 at 10:59:11AM +0100, John Garry wrote:
> > >
> > > > blk-mq tags are always per-host (which has actually caused problems for
> > > > ATA, which is now using its own per-device tags).
> > > >
> > >
> > > So, for example, if Scsi_host.can_queue = 2048 and Scsi_host.nr_hw_queues =
> > > 16, then rq tags are still in range [0, 2048) for that HBA, i.e. invariant
> > > on queue count?
> >
> > Yes, if can_queue is 2048 you will get tags from 0..2047.
> >
>
> I should be clear about some things before discussing this further. Our
> device has 16 hw queues, and each command we send to any queue in the
> device must have a tag which is unique across all hw queues for that
> device, in the range [0, 2048) - it's called an IPTT. So
> Scsi_host.can_queue = 2048.

Could you describe a bit about IPTT? The 16 hw queues look like the reply
queues in other drivers, such as megaraid_sas, but given that all 16
reply queues share one tagset, the hw queue number has to be 1 from
blk-mq's view.

> However, today we only expose a single queue to the upper layer (for an
> unrelated LLDD error-handling restriction). We hope to expose all 16
> queues in future, which is what I meant by "enabling SCSI MQ in the
> driver". However, with 6/7, this creates a problem, below.

If the tag of each request from all hw queues has to be unique, you can't
expose all 16 queues.

> > IFF your device needs different tags for different queues it can use
> > the blk_mq_unique_tag helper to generate a unique global tag.
>
> So this helper can't help, as fundamentally the issue is that the tag
> field in struct request is unique per hardware queue, but not across all
> hw queues. Indeed blk_mq_unique_tag() does give a unique global tag, but
> it cannot be used for the IPTT.
>
> OTOH, we could expose 16 queues to the upper layer and drop 6/7, but we
> found it performs worse.

We discussed this issue before, but have not found a good solution yet
for exposing multiple hw queues to blk-mq. However, we still get good
performance in the case of the none scheduler with the following patches:

8824f62246be blk-mq: fail the request in case issue failure
6ce3dd6eec11 blk-mq: issue directly if hw queue isn't busy in case of 'none'

Thanks,
Ming
On 11/10/2018 14:32, Ming Lei wrote:
> On Thu, Oct 11, 2018 at 02:12:11PM +0100, John Garry wrote:
>> On 11/10/2018 11:15, Christoph Hellwig wrote:
>>> On Thu, Oct 11, 2018 at 10:59:11AM +0100, John Garry wrote:
>>>>
>>>>> blk-mq tags are always per-host (which has actually caused problems for
>>>>> ATA, which is now using its own per-device tags).
>>>>>
>>>>
>>>> So, for example, if Scsi_host.can_queue = 2048 and Scsi_host.nr_hw_queues =
>>>> 16, then rq tags are still in range [0, 2048) for that HBA, i.e. invariant
>>>> on queue count?
>>>
>>> Yes, if can_queue is 2048 you will get tags from 0..2047.
>>>
>>
>> I should be clear about some things before discussing this further. Our
>> device has 16 hw queues, and each command we send to any queue in the
>> device must have a tag which is unique across all hw queues for that
>> device, in the range [0, 2048) - it's called an IPTT. So
>> Scsi_host.can_queue = 2048.
>
> Could you describe a bit about IPTT?
>

IPTT is an "Initiator Tag". It is a tag which maps to the context of a hw
queue command, and is related to the SAS protocol Initiator Port tag. I
think that most SAS HBAs have a similar concept.

IPTTs are limited, and must be recycled when an IO completes. Our hw
supports up to 2048, so we have a limit of 2048 commands issued at any
point in time.

Previously we had been managing the IPTT in the LLDD, but found that the
rq tag can be used as the IPTT (as in 6/7), which gave a good performance
boost.

> Looks like the 16 hw queues are like reply queues in other drivers,
> such as megaraid_sas, but given that all 16 reply queues share one
> tagset, the hw queue number has to be 1 from blk-mq's view.
>
>> However, today we only expose a single queue to the upper layer (for an
>> unrelated LLDD error-handling restriction). We hope to expose all 16
>> queues in future, which is what I meant by "enabling SCSI MQ in the
>> driver". However, with 6/7, this creates a problem, below.
>
> If the tag of each request from all hw queues has to be unique, you
> can't expose all 16 queues.

Well, we can if we generate and manage the IPTT in the LLDD, as we had
been doing. If we want to use the rq tag - which 6/7 is for - then we
can't.

>>> IFF your device needs different tags for different queues it can use
>>> the blk_mq_unique_tag helper to generate a unique global tag.
>>
>> So this helper can't help, as fundamentally the issue is that the tag
>> field in struct request is unique per hardware queue, but not across all
>> hw queues. Indeed blk_mq_unique_tag() does give a unique global tag, but
>> it cannot be used for the IPTT.
>>
>> OTOH, we could expose 16 queues to the upper layer and drop 6/7, but we
>> found it performs worse.
>
> We discussed this issue before, but have not found a good solution yet
> for exposing multiple hw queues to blk-mq.

I just think it's unfortunate that enabling blk-mq means the LLDD loses
this tag which is unique across all queues in the range
[0, Scsi_host.can_queue), so much so that we found performance better by
not exposing multiple queues and continuing to use the single rq tag...

> However, we still get good performance in the case of the none
> scheduler with the following patches:
>
> 8824f62246be blk-mq: fail the request in case issue failure
> 6ce3dd6eec11 blk-mq: issue directly if hw queue isn't busy in case of 'none'
>

I think these patches would have been included in our testing. I need to
check.

> Thanks,
> Ming

Thanks,
John
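For illustration, "managing the IPTT in the LLDD" amounts to a
driver-private allocator over the 2048-entry pool, roughly along these
lines (a hedged sketch with made-up names, not the actual hisi_sas code):

#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/spinlock.h>
#include <linux/errno.h>

#define MY_IPTT_COUNT	2048	/* hw supports IPTT values [0, 2048) */

struct my_hba {
	spinlock_t lock;
	DECLARE_BITMAP(iptt_bitmap, MY_IPTT_COUNT);
};

/* Allocate a free IPTT; every in-flight command on any hw queue holds one. */
static int my_iptt_alloc(struct my_hba *hba)
{
	int iptt;

	spin_lock(&hba->lock);
	iptt = find_first_zero_bit(hba->iptt_bitmap, MY_IPTT_COUNT);
	if (iptt >= MY_IPTT_COUNT) {
		spin_unlock(&hba->lock);
		return -ENOMEM;	/* all 2048 IPTTs in flight */
	}
	set_bit(iptt, hba->iptt_bitmap);
	spin_unlock(&hba->lock);

	return iptt;
}

/* Recycle the IPTT on IO completion (atomic, so no lock needed here). */
static void my_iptt_free(struct my_hba *hba, int iptt)
{
	clear_bit(iptt, hba->iptt_bitmap);
}

The single bitmap and lock shared by all CPUs is also why this approach
stops scaling - the contention point Ming raises further down the thread.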
On 11/10/2018 02:58, Martin K. Petersen wrote:
>
> John,
>

Hi Martin,

>> However it does block us in future from enabling SCSI MQ in the driver.
>
> We're going to remove the legacy I/O path so I'm not particularly keen
> on merging something that's going in the opposite direction.

What I really meant was that we cannot expose multiple queues to the
upper layer; this is not related to the legacy IO path.

>> This is because the IPTT index must be a unique value per HBA. However,
>> if we switched to SCSI MQ, the block layer tag becomes unique per queue,
>> and not per HBA.
>
> That doesn't sound right.

Again, my wording probably was not accurate. I meant that the rq tag is
not unique across all hw queues.

As I mentioned in the thread that spawned from this, we actually can't
expose multiple hw queues at the moment. And, if we did, we would find a
performance drop due to having to go back to managing this IPTT
internally.

So how to handle this? We're going to continue to work towards exposing
multiple queues to the upper layer, but for the moment I think patch 6/7
is a good change.

Please let me know.

Thanks,
John
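In outline, patch 6/7 takes the block layer tag as the IPTT rather than
running a private allocator. A hedged sketch of the idea (hypothetical
driver names; on 2018-era kernels the request is reached via
scmd->request):

#include <linux/blkdev.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>

static int my_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
{
	/*
	 * Sketch only: with a single exposed hw queue and
	 * .can_queue = 2048, rq->tag is already unique in [0, 2048)
	 * across the whole host, so it can stand in for the IPTT with
	 * no driver-side allocation or locking.
	 */
	int iptt = scmd->request->tag;

	/* ... write iptt into the command descriptor and deliver to hw ... */

	return 0;
}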
On Fri, Oct 12, 2018 at 10:02:57AM +0100, John Garry wrote:
> Hi Ming,
>
> > In theory, you still may generate and manage the IPTT in the LLDD by
> > simply ignoring rq->tag, meantime enabling SCSI_MQ with 16 hw queues.
>
> Well, at the moment we can't expose all 16 hw queues to the upper layer
> anyway, due to an ordering restriction imposed by the HW on the LLDD. We
> have a plan to solve it.
>
> Regardless, we still found performance better by using the rq tag
> instead of exposing all 16 queues.
>
> > However, I am not sure how much this way may improve performance, and
> > it may degrade IO perf. If 16 hw queues are exposed to blk-mq,
> > 16 * .can_queue requests may be queued to the driver, and allocation &
> > free on the single IPTT pool will become a bottleneck.
>
> Right, this IPTT management doesn't scale (especially for our host with
> 2 sockets @ 48/64 cores each). So if the upper layer is already
> generating a tag which we can use, it is good to use it.
>
> > Per my experiment on host tagset, it might be a good tradeoff to
> > allocate one hw queue for each node to avoid the remote access on the
> > dispatch data/request structures for this case, but your IPTT pool is
> > still shared by all CPUs, so maybe you can try the smart sbitmap:
> >
> > https://www.spinics.net/lists/linux-scsi/msg117920.html
>
> I don't understand this about allocating a hw queue per node. Surely
> having and using all available queues in an intelligent way means less
> queue contention, right?

Yes.

> Looking at this change:
>
> @@ -5761,6 +5762,11 @@ static int hpsa_scsi_host_alloc(struct ctlr_info *h)
>  static int hpsa_scsi_add_host(struct ctlr_info *h)
>  {
>  	int rv;
> +	/* 256 tags should be high enough to saturate device */
> +	int max_queues = DIV_ROUND_UP(h->scsi_host->can_queue, 256);
> +
> +	/* per NUMA node hw queue */
> +	h->scsi_host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);
>
> I assume h->scsi_host->nr_hw_queues was zero-initialised previously, so
> we're now using more than one queue, but why the limit?

From my tests on null_blk/scsi_debug, a per-node hw queue improves IOPS
much more obviously. Also, you may still manage the IPTT in the LLDD, and
contention on the single IPTT pool shouldn't be very serious, given there
are usually few NUMA nodes.

Thanks,
Ming
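To make the quoted hpsa sizing concrete, a small worked example
(illustrative numbers only, not from the patch itself):

#include <linux/kernel.h>	/* DIV_ROUND_UP, min_t */

/*
 * Illustrative only: with can_queue = 2048 and a 256-tag budget per
 * queue, max_queues = DIV_ROUND_UP(2048, 256) = 8, so a 2-node system
 * gets nr_hw_queues = min(2, 8) = 2 - one hw queue per NUMA node,
 * unless the tag space is too small to give each queue a useful depth.
 */
static int example_nr_hw_queues(int can_queue, int nr_nodes)
{
	int max_queues = DIV_ROUND_UP(can_queue, 256);

	return min_t(int, nr_nodes, max_queues);
}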
John,

> As I mentioned in the thread that spawned from this, we actually can't
> expose multiple hw queues at the moment. And, if we did, we would find a
> performance drop due to having to go back to managing this IPTT
> internally.
>
> So how to handle this? We're going to continue to work towards exposing
> multiple queues to the upper layer, but for the moment I think patch 6/7
> is a good change.

Thanks for the discussion. That's exactly what I was looking for.

Applied to 4.20/scsi-queue.

--
Martin K. Petersen	Oracle Linux Engineering