Message ID | 20220203064009.1795344-1-song@kernel.org |
---|---|
Headers | show |
Series | block: scsi: introduce and use BLK_STS_OFFLINE | expand |
CC linux-block (it was a typo in the original email) On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote: > > We have a use case where HDDs are regularly power on/off to perserve power. > When a drive is being removed, we often see errors like > > [ 172.803279] I/O error, dev sda, sector 3137184 > > These messages are confusing for automations that grep dmesg, as they look > very similar to real HDD error. > > Solve this issue with a new block state BLK_STS_OFFLINE. After the change, > the error message looks like > > [ 172.803279] device offline error, dev sda, sector 3137184 > > so that the automations won't confuse them with real I/O error. > > Song Liu (2): > block: introduce BLK_STS_OFFLINE > scsi: use BLK_STS_OFFLINE for not fully online devices > > block/blk-core.c | 1 + > drivers/scsi/scsi_lib.c | 2 +- > include/linux/blk_types.h | 7 +++++++ > 3 files changed, 9 insertions(+), 1 deletion(-) > > -- > 2.30.2
On 2/3/22 12:24 AM, Hannes Reinecke wrote: > On 2/3/22 07:52, Song Liu wrote: >> CC linux-block (it was a typo in the original email) >> >> On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote: >>> >>> Currently, drivers reports BLK_STS_IOERR for devices that are not full >>> online or being removed. This behavior could cause confusion for users, >>> as they are not really I/O errors from the device. >>> >>> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device >>> offline error" in dmesg instead of "I/O error". >>> >>> Signed-off-by: Song Liu <song@kernel.org> >>> --- >>> block/blk-core.c | 1 + >>> include/linux/blk_types.h | 7 +++++++ >>> 2 files changed, 8 insertions(+) >>> >>> diff --git a/block/blk-core.c b/block/blk-core.c >>> index 61f6a0dc4511..24035dd2eef1 100644 >>> --- a/block/blk-core.c >>> +++ b/block/blk-core.c >>> @@ -164,6 +164,7 @@ static const struct { >>> [BLK_STS_RESOURCE] = { -ENOMEM, "kernel resource" }, >>> [BLK_STS_DEV_RESOURCE] = { -EBUSY, "device resource" }, >>> [BLK_STS_AGAIN] = { -EAGAIN, "nonblocking retry" }, >>> + [BLK_STS_OFFLINE] = { -EIO, "device offline" }, >>> >>> /* device mapper special case, should not leak out: */ >>> [BLK_STS_DM_REQUEUE] = { -EREMCHG, "dm internal retry" }, >>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h >>> index fe065c394fff..5561e58d158a 100644 >>> --- a/include/linux/blk_types.h >>> +++ b/include/linux/blk_types.h >>> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t; >>> */ >>> #define BLK_STS_ZONE_ACTIVE_RESOURCE ((__force blk_status_t)16) >>> >>> +/* >>> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline >>> + * or is being taken offline. This could help differentiate the case where a >>> + * device is intentionally being shut down from a real I/O error. >>> + */ >>> +#define BLK_STS_OFFLINE ((__force blk_status_t)17) >>> + >>> /** >>> * blk_path_error - returns true if error may be path related >>> * @error: status the request was completed with >>> -- >>> 2.30.2 >>> > Please do not overload EIO here. > EIO already is a catch-all error if we don't know any better, but for > the 'device offline' case we do (or rather should). > Please map it onto 'ENODEV' or 'ENXIO'. It's deliberately EIO as not to force a change in behavior. I don't mind using something else, but that should be a separate change then.
Hi Hannes and Jens, > On Feb 3, 2022, at 5:47 AM, Jens Axboe <axboe@kernel.dk> wrote: > > On 2/3/22 12:24 AM, Hannes Reinecke wrote: >> On 2/3/22 07:52, Song Liu wrote: >>> CC linux-block (it was a typo in the original email) >>> >>> On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote: >>>> >>>> Currently, drivers reports BLK_STS_IOERR for devices that are not full >>>> online or being removed. This behavior could cause confusion for users, >>>> as they are not really I/O errors from the device. >>>> >>>> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device >>>> offline error" in dmesg instead of "I/O error". >>>> >>>> Signed-off-by: Song Liu <song@kernel.org> >>>> --- >>>> block/blk-core.c | 1 + >>>> include/linux/blk_types.h | 7 +++++++ >>>> 2 files changed, 8 insertions(+) >>>> >>>> diff --git a/block/blk-core.c b/block/blk-core.c >>>> index 61f6a0dc4511..24035dd2eef1 100644 >>>> --- a/block/blk-core.c >>>> +++ b/block/blk-core.c >>>> @@ -164,6 +164,7 @@ static const struct { >>>> [BLK_STS_RESOURCE] = { -ENOMEM, "kernel resource" }, >>>> [BLK_STS_DEV_RESOURCE] = { -EBUSY, "device resource" }, >>>> [BLK_STS_AGAIN] = { -EAGAIN, "nonblocking retry" }, >>>> + [BLK_STS_OFFLINE] = { -EIO, "device offline" }, >>>> >>>> /* device mapper special case, should not leak out: */ >>>> [BLK_STS_DM_REQUEUE] = { -EREMCHG, "dm internal retry" }, >>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h >>>> index fe065c394fff..5561e58d158a 100644 >>>> --- a/include/linux/blk_types.h >>>> +++ b/include/linux/blk_types.h >>>> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t; >>>> */ >>>> #define BLK_STS_ZONE_ACTIVE_RESOURCE ((__force blk_status_t)16) >>>> >>>> +/* >>>> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline >>>> + * or is being taken offline. This could help differentiate the case where a >>>> + * device is intentionally being shut down from a real I/O error. >>>> + */ >>>> +#define BLK_STS_OFFLINE ((__force blk_status_t)17) >>>> + >>>> /** >>>> * blk_path_error - returns true if error may be path related >>>> * @error: status the request was completed with >>>> -- >>>> 2.30.2 >>>> >> Please do not overload EIO here. >> EIO already is a catch-all error if we don't know any better, but for >> the 'device offline' case we do (or rather should). >> Please map it onto 'ENODEV' or 'ENXIO'. > > It's deliberately EIO as not to force a change in behavior. I don't mind > using something else, but that should be a separate change then. Thanks for these feedbacks. Shall I send v2 with an extra patch that changes EIO to ENODEV/ENXIO? Or shall we do that in a follow up patch? Also, any preference between ENODEV and ENXIO? Thanks, Song
On 2/3/22 10:23 AM, Song Liu wrote: > Hi Hannes and Jens, > >> On Feb 3, 2022, at 5:47 AM, Jens Axboe <axboe@kernel.dk> wrote: >> >> On 2/3/22 12:24 AM, Hannes Reinecke wrote: >>> On 2/3/22 07:52, Song Liu wrote: >>>> CC linux-block (it was a typo in the original email) >>>> >>>> On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote: >>>>> >>>>> Currently, drivers reports BLK_STS_IOERR for devices that are not full >>>>> online or being removed. This behavior could cause confusion for users, >>>>> as they are not really I/O errors from the device. >>>>> >>>>> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device >>>>> offline error" in dmesg instead of "I/O error". >>>>> >>>>> Signed-off-by: Song Liu <song@kernel.org> >>>>> --- >>>>> block/blk-core.c | 1 + >>>>> include/linux/blk_types.h | 7 +++++++ >>>>> 2 files changed, 8 insertions(+) >>>>> >>>>> diff --git a/block/blk-core.c b/block/blk-core.c >>>>> index 61f6a0dc4511..24035dd2eef1 100644 >>>>> --- a/block/blk-core.c >>>>> +++ b/block/blk-core.c >>>>> @@ -164,6 +164,7 @@ static const struct { >>>>> [BLK_STS_RESOURCE] = { -ENOMEM, "kernel resource" }, >>>>> [BLK_STS_DEV_RESOURCE] = { -EBUSY, "device resource" }, >>>>> [BLK_STS_AGAIN] = { -EAGAIN, "nonblocking retry" }, >>>>> + [BLK_STS_OFFLINE] = { -EIO, "device offline" }, >>>>> >>>>> /* device mapper special case, should not leak out: */ >>>>> [BLK_STS_DM_REQUEUE] = { -EREMCHG, "dm internal retry" }, >>>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h >>>>> index fe065c394fff..5561e58d158a 100644 >>>>> --- a/include/linux/blk_types.h >>>>> +++ b/include/linux/blk_types.h >>>>> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t; >>>>> */ >>>>> #define BLK_STS_ZONE_ACTIVE_RESOURCE ((__force blk_status_t)16) >>>>> >>>>> +/* >>>>> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline >>>>> + * or is being taken offline. This could help differentiate the case where a >>>>> + * device is intentionally being shut down from a real I/O error. >>>>> + */ >>>>> +#define BLK_STS_OFFLINE ((__force blk_status_t)17) >>>>> + >>>>> /** >>>>> * blk_path_error - returns true if error may be path related >>>>> * @error: status the request was completed with >>>>> -- >>>>> 2.30.2 >>>>> >>> Please do not overload EIO here. >>> EIO already is a catch-all error if we don't know any better, but for >>> the 'device offline' case we do (or rather should). >>> Please map it onto 'ENODEV' or 'ENXIO'. >> >> It's deliberately EIO as not to force a change in behavior. I don't mind >> using something else, but that should be a separate change then. > > Thanks for these feedbacks. Shall I send v2 with an extra patch that > changes EIO to ENODEV/ENXIO? Or shall we do that in a follow up patch? > Also, any preference between ENODEV and ENXIO? Yeah I think so, and perhaps put a mention in this patch on why EIO is chosen to not change the user visible return value.
On 2/3/22 18:23, Song Liu wrote: > Hi Hannes and Jens, > >> On Feb 3, 2022, at 5:47 AM, Jens Axboe <axboe@kernel.dk> wrote: >> >> On 2/3/22 12:24 AM, Hannes Reinecke wrote: >>> On 2/3/22 07:52, Song Liu wrote: >>>> CC linux-block (it was a typo in the original email) >>>> >>>> On Wed, Feb 2, 2022 at 10:40 PM Song Liu <song@kernel.org> wrote: >>>>> >>>>> Currently, drivers reports BLK_STS_IOERR for devices that are not full >>>>> online or being removed. This behavior could cause confusion for users, >>>>> as they are not really I/O errors from the device. >>>>> >>>>> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device >>>>> offline error" in dmesg instead of "I/O error". >>>>> >>>>> Signed-off-by: Song Liu <song@kernel.org> >>>>> --- >>>>> block/blk-core.c | 1 + >>>>> include/linux/blk_types.h | 7 +++++++ >>>>> 2 files changed, 8 insertions(+) >>>>> >>>>> diff --git a/block/blk-core.c b/block/blk-core.c >>>>> index 61f6a0dc4511..24035dd2eef1 100644 >>>>> --- a/block/blk-core.c >>>>> +++ b/block/blk-core.c >>>>> @@ -164,6 +164,7 @@ static const struct { >>>>> [BLK_STS_RESOURCE] = { -ENOMEM, "kernel resource" }, >>>>> [BLK_STS_DEV_RESOURCE] = { -EBUSY, "device resource" }, >>>>> [BLK_STS_AGAIN] = { -EAGAIN, "nonblocking retry" }, >>>>> + [BLK_STS_OFFLINE] = { -EIO, "device offline" }, >>>>> >>>>> /* device mapper special case, should not leak out: */ >>>>> [BLK_STS_DM_REQUEUE] = { -EREMCHG, "dm internal retry" }, >>>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h >>>>> index fe065c394fff..5561e58d158a 100644 >>>>> --- a/include/linux/blk_types.h >>>>> +++ b/include/linux/blk_types.h >>>>> @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t; >>>>> */ >>>>> #define BLK_STS_ZONE_ACTIVE_RESOURCE ((__force blk_status_t)16) >>>>> >>>>> +/* >>>>> + * BLK_STS_OFFLINE is returned from the driver when the target device is offline >>>>> + * or is being taken offline. This could help differentiate the case where a >>>>> + * device is intentionally being shut down from a real I/O error. >>>>> + */ >>>>> +#define BLK_STS_OFFLINE ((__force blk_status_t)17) >>>>> + >>>>> /** >>>>> * blk_path_error - returns true if error may be path related >>>>> * @error: status the request was completed with >>>>> -- >>>>> 2.30.2 >>>>> >>> Please do not overload EIO here. >>> EIO already is a catch-all error if we don't know any better, but for >>> the 'device offline' case we do (or rather should). >>> Please map it onto 'ENODEV' or 'ENXIO'. >> >> It's deliberately EIO as not to force a change in behavior. I don't mind >> using something else, but that should be a separate change then. > > Thanks for these feedbacks. Shall I send v2 with an extra patch that > changes EIO to ENODEV/ENXIO? Or shall we do that in a follow up patch? > Also, any preference between ENODEV and ENXIO? > Please make it an addtional patch, and use ENODEV as a return value. For this patch you can add: Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes