[0/5] scsi: fnic: use blk_mq_tagset_busy_iter() to walk scsi commands

Message ID 20210421075543.1919826-1-ming.lei@redhat.com

Message

Ming Lei April 21, 2021, 7:55 a.m. UTC
Hello Guys,

fnic walks SCSI commands in its failure handling as shown below, which
is obviously wrong, because the caller of scsi_host_find_tag() has to
guarantee that the tag is active.

        for (tag = 0; tag < fnic->fnic_max_tag_id; tag++) {
                ...
                sc = scsi_host_find_tag(fnic->lport->host, tag);
                ...
        }

Fix the issue by using blk_mq_tagset_busy_iter() to walk
request/scsi_command.
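
The difference between the two walks can be sketched in userspace. This is
only a model under stated assumptions, not fnic or block-layer code: every
`model_*` name is hypothetical, the `busy` array stands in for the tag
allocation state, and `slot` stands in for whatever a per-tag lookup would
return, including leftovers from commands that have already completed.

```c
/* Userspace model only: none of the model_* names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_TAGS 8

struct model_cmd {
	int tag;
};

struct model_tagset {
	bool busy[MODEL_MAX_TAGS];              /* true while the tag is allocated */
	struct model_cmd *slot[MODEL_MAX_TAGS]; /* last command seen on the tag; may be stale */
};

/* Called once per visited command; return true to keep iterating. */
typedef bool (*model_iter_fn)(struct model_cmd *cmd, void *data);

/*
 * Shape of the loop quoted above: look up every tag in range, whether or
 * not it is currently allocated. Stale commands reach the callback.
 * Returns the number of commands handed to the callback.
 */
static int model_walk_all_tags(struct model_tagset *set, model_iter_fn fn,
			       void *data)
{
	int visited = 0;

	for (int tag = 0; tag < MODEL_MAX_TAGS; tag++) {
		struct model_cmd *cmd = set->slot[tag];

		if (!cmd)
			continue;
		visited++;
		if (!fn(cmd, data))
			break;
	}
	return visited;
}

/*
 * Busy-iterator shape: the walker itself skips tags that are not
 * allocated, so the callback only ever sees in-flight commands.
 */
static int model_walk_busy_tags(struct model_tagset *set, model_iter_fn fn,
				void *data)
{
	int visited = 0;

	for (int tag = 0; tag < MODEL_MAX_TAGS; tag++) {
		if (!set->busy[tag] || !set->slot[tag])
			continue;
		visited++;
		if (!fn(set->slot[tag], data))
			break;
	}
	return visited;
}

/* Example callback: counts the commands it is handed. */
static bool model_count_cb(struct model_cmd *cmd, void *data)
{
	(void)cmd;
	(*(int *)data)++;
	return true;
}
```

With three slots populated and only one of them marked busy, the first
walker hands all three commands to the callback and the second hands over
just the busy one. That is the property the series relies on: the iterator,
not the driver loop, decides which tags are active.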

thanks,
Ming


Ming Lei (5):
  scsi: fnic: use blk_mq_tagset_busy_iter() to walk scsi commands in
    fnic_terminate_rport_io
  scsi: fnic: use blk_mq_tagset_busy_iter() to walk scsi commands in
    fnic_clean_pending_aborts
  scsi: fnic: use blk_mq_tagset_busy_iter() to walk scsi commands in
    fnic_cleanup_io
  scsi: fnic: use blk_mq_tagset_busy_iter() to walk scsi commands in
    fnic_rport_exch_reset
  scsi: fnic: use blk_mq_tagset_busy_iter() to walk scsi commands in
    fnic_is_abts_pending

 drivers/scsi/fnic/fnic_scsi.c | 933 ++++++++++++++++++----------------
 1 file changed, 493 insertions(+), 440 deletions(-)

Cc: Satish Kharat <satishkh@cisco.com>
Cc: Karan Tilak Kumar <kartilak@cisco.com>
Cc: David Jeffery <djeffery@redhat.com>

Comments

Ming Lei April 22, 2021, 2:17 p.m. UTC | #1
On Thu, Apr 22, 2021 at 08:08:41AM +0800, Ming Lei wrote:
> On Wed, Apr 21, 2021 at 10:14:56PM +0200, Hannes Reinecke wrote:
> > Well, this is actually not that easy for fnic.
> > Problem is the reset hack hch put in some time ago (cf
> > fnic_host_start_tag()), which will cause any TMF to use a tag which is _not_
> > visible to the busy iter.
> 
> 'git grep -n fnic_host_start_tag ./' shows nothing.
> 
> > That will cause the iter to miss any TMF, with unpredictable results if a
> > TMF is running at the same time as, say, a link bounce.
> 
> In Linus' tree and the next tree, I don't see the issue you describe.
> 
> > 
> > I have folded this as part of my patchset for reserved commands in SCSI;
> > that way fnic can use 'normal' tags for TMFs, which are then visible to the
> > busy iter and life's good.
> 
> No, this series is a bug fix, so it can't depend on your reserved
> commands patchset for SCSI, and it needs to be backported to the stable
> trees too.

Hi Hannes,

We have customer reports of this issue. Could you please let us know
whether you will post a bug-fix-only version?

Thanks,
Ming

Hannes Reinecke April 26, 2021, 7:30 a.m. UTC | #2
On 4/22/21 4:17 PM, Ming Lei wrote:
> Hi Hannes,
>
> We have customer reports of this issue. Could you please let us know
> whether you will post a bug-fix-only version?

So do we; indeed, we have received the same bug reports.
I'll split off those patches and send them as a stand-alone
patchset.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)

Ming Lei April 26, 2021, 8:04 a.m. UTC | #3
On Mon, Apr 26, 2021 at 09:30:45AM +0200, Hannes Reinecke wrote:
> On 4/22/21 4:17 PM, Ming Lei wrote:

> > Hi Hannes,
> >
> > We have customer reports of this issue. Could you please let us know
> > whether you will post a bug-fix-only version?
> >
> So do we; indeed, we have received the same bug reports.
> I'll split off those patches and send them as a stand-alone
> patchset.

That is great!

Glad to review it once it is posted.


Thanks,
Ming