diff mbox series

scsi: dm-mpath: do not fail paths when the target returns ALUA state transition

Message ID CAHZQxyLY3vNeuNiEHC3SzWzBgUaN-ZPOYyZ3bA=Ah63WYwgdfw@mail.gmail.com
State Superseded
Headers show
Series scsi: dm-mpath: do not fail paths when the target returns ALUA state transition | expand

Commit Message

Brian Bunker July 15, 2021, 4:57 p.m. UTC
When paths return an ALUA state transition, do not fail those paths.
The expectation is that the transition is short lived until the new
ALUA state is entered. There might not be other paths in an online
state to serve the request which can lead to an unexpected I/O error
on the multipath device.

Signed-off-by: Brian Bunker <brian@purestorage.com>
Acked-by: Krishna Kant <krishna.kant@purestorage.com>
Acked-by: Seamus Connor <sconnor@purestorage.com>
--
                if (!atomic_read(&m->nr_valid_paths) &&
--

Thanks,
Brian

Brian Bunker
PURE Storage, Inc.
brian@purestorage.com

Comments

Brian Bunker Aug. 3, 2021, 10:09 p.m. UTC | #1
Martin and Hannes,

Any resolution to this issue?

Thanks,
Brian

On Fri, Jul 16, 2021 at 11:39 AM Brian Bunker <brian@purestorage.com> wrote:
>

> On Fri, Jul 16, 2021 at 1:25 AM Martin Wilck <mwilck@suse.com> wrote:

> >

> > Hannes,

> >

> > On Fr, 2021-07-16 at 08:27 +0200, Hannes Reinecke wrote:

> > > On 7/15/21 6:57 PM, Brian Bunker wrote:

> > > > When paths return an ALUA state transition, do not fail those paths.

> > > > The expectation is that the transition is short lived until the new

> > > > ALUA state is entered. There might not be other paths in an online

> > > > state to serve the request which can lead to an unexpected I/O error

> > > > on the multipath device.

> > > >

> > > > Signed-off-by: Brian Bunker <brian@purestorage.com>

> > > > Acked-by: Krishna Kant <krishna.kant@purestorage.com>

> > > > Acked-by: Seamus Connor <sconnor@purestorage.com>

> > > > --

> > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c

> > > > index bced42f082b0..28948cc481f9 100644

> > > > --- a/drivers/md/dm-mpath.c

> > > > +++ b/drivers/md/dm-mpath.c

> > > > @@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target

> > > > *ti, struct request *clone,

> > > >          if (error && blk_path_error(error)) {

> > > >                  struct multipath *m = ti->private;

> > > >

> > > > -               if (error == BLK_STS_RESOURCE)

> > > > +               if (error == BLK_STS_RESOURCE || error ==

> > > > BLK_STS_AGAIN)

> > > >                          r = DM_ENDIO_DELAY_REQUEUE;

> > > >                  else

> > > >                          r = DM_ENDIO_REQUEUE;

> > > >

> > > > -               if (pgpath)

> > > > +               if (pgpath && (error != BLK_STS_AGAIN))

> > > >                          fail_path(pgpath);

> > > >

> > > >                  if (!atomic_read(&m->nr_valid_paths) &&

> > > > --

> > >

> > > Sorry, but this will lead to regressions during failover for arrays

> > > taking longer time (some take up to 30 minutes for a complete

> > > failover).

> > > And for those it's absolutely crucial to _not_ retry I/O on the paths

> > > in

> > > transitioning.

> >

> > This won't happen.

> >

> > As argued in https://marc.info/?l=linux-scsi&m=162625194203635&w=2,

> > your previous patches avoid the request being requeued on the SCSI

> > queue, even with Brian's patch on top. IMO that means that the deadlock

> > situation analyzed earlier can't occur. If you disagree, please explain

> > in detail.

> >

> > By not failing the path, the request can be requeued on the dm level,

> > and dm can decide to try the transitioning path again. But the request

> > wouldn't be added to the SCSI queue, because alua_prep_fn() would

> > prevent that.

> >

> > So, in the worst case, dm would retry queueing the request on the

> > transitioning path over and over again. By adding a delay, we avoid

> > busy-looping. This has basically the same effect as queueing on the dm

> > layer in the first place: the request stays queued on the dm level most

> > of the time. Except for the fact that the queueing would stop earlier:

> > as soon as the kernel notices that the path is not transitioning any

> > more. By not failing the dm paths, we don't depend on user space to

> > reinstate them, which is the right thing to do for a transitioning

> > state IMO.

> >

> > > And you already admitted that 'queue_if_no_path' would resolve this

> > > problem, so why not update the device configuration in multipath-

> > > tools

> > > to have the correct setting for your array?

>

> The reason I don't like queue_if_no_path for a solution is that there

> are times we do want to tail all of the paths as quickly as possible

> (e.g. a cluster sharing a resource). If we add some non zero value to

> no_path_retry we would be forcing that configuration to unnecessarily

> wait, and those customers will see this delay as a regression. This is

> where a distinction between an ALUA state of standby or unavailable

> vs. a transition ALUA state is attractive.

>

> >

> > I think Brian is right that setting transitioning paths to failed state

> > in dm is highly questionable.

> >

> > So far, in dm-multipath, we haven't set paths to failed state because

> > of ALUA state transitions. We've been mapping ALUA states to priorities

> > instead. We don't even fail paths in ALUA "unavailable" state, so why

> > do it for "transitioning"?

> >

> > Thanks

> > Martin

> >

> >

>

> Thanks,

> Brian

>

> --

> Brian Bunker

> PURE Storage, Inc.

> brian@purestorage.com




-- 
Brian Bunker
PURE Storage, Inc.
brian@purestorage.com
diff mbox series

Patch

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index bced42f082b0..28948cc481f9 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1652,12 +1652,12 @@  static int multipath_end_io(struct dm_target
*ti, struct request *clone,
        if (error && blk_path_error(error)) {
                struct multipath *m = ti->private;

-               if (error == BLK_STS_RESOURCE)
+               if (error == BLK_STS_RESOURCE || error == BLK_STS_AGAIN)
                        r = DM_ENDIO_DELAY_REQUEUE;
                else
                        r = DM_ENDIO_REQUEUE;

-               if (pgpath)
+               if (pgpath && (error != BLK_STS_AGAIN))
                        fail_path(pgpath);