
IO error on DIF/DIX supported array

Message ID DM4PR18MB5220E0AF6564DF1A1F126EF0D2DB9@DM4PR18MB5220.namprd18.prod.outlook.com
State New
Series IO error on DIF/DIX supported array

Commit Message

Saurav Kashyap Feb. 7, 2023, 11:20 a.m. UTC
Hi Martin,
We have observed IO failures on a 3PAR array that supports DIF/DIX with upstream code. The error is seen only when IO is done on DM devices; no error is observed if IO is done directly on /dev/sdX.
I added some prints to understand the problem and found that the SCSI_PROT_IP_CHECKSUM flag is not set in scmnd->prot_flags. It should be set, since BIP_IP_CHECKSUM is set by bio_integrity_prep.

--------------------<START: IO to /dev/sdc>----------------
[Mon Feb 6 17:54:56 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
[Mon Feb 6 17:54:56 2023] SK: sd_setup_protect_cmnd setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
[Mon Feb 6 17:54:56 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
[Mon Feb 6 17:54:56 2023] SK: sd_setup_protect_cmnd setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
-------------------<END: IO to /dev/sdc>-----------------

----------------<START: IO to dm-10>---------------------
[Mon Feb 6 17:55:13 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
[Mon Feb 6 17:55:13 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff976fa15fa490 bip_flags=0x0
[Mon Feb 6 17:55:13 2023] dm-10: guard tag error at sector 0 (rcvd 0000, want ffff)
[Mon Feb 6 17:55:13 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff978f0752c180 bip_flags=0x11
[Mon Feb 6 17:55:13 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff976fc87fef10 bip_flags=0x0
[Mon Feb 6 17:55:13 2023] dm-10: guard tag error at sector 0 (rcvd 0000, want ffff)
[Mon Feb 6 17:55:13 2023] Buffer I/O error on dev dm-10, logical block 0, async page read
-----------------<END: IO to dm-10>------------------------

I noticed that the bio pointer changes when IO is done through a dm device. I added more debug prints in __bio_clone and bio_integrity_clone and concluded that bip_flags are not being copied in the bio_integrity_clone routine.

--------------------
[Tue Feb  7 14:15:47 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff891ecc5fa840 bip_flags=0x11
[Tue Feb  7 14:15:47 2023] SK: __bio_clone: bio=ffff891ed97b5990 bio_src=ffff891ecc5fa840
[Tue Feb  7 14:15:47 2023] SK: bio_integrity_clone: bip=ffff891ecc5fd500 bip_src=ffff891ecc5fcb40 bip_flags=0x0 src_bip_flags=0x11
[Tue Feb  7 14:15:47 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff891ed97b5990 bip_flags=0x0
[Tue Feb  7 14:15:47 2023] dm-3: guard tag error at sector 0 (rcvd 0000, want ffff)
[Tue Feb  7 14:15:47 2023] Buffer I/O error on dev dm-3, logical block 0, async page read
----------------------------------

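If I add a change to copy the flags, the following BUG_ON in slub.c is reported:
------------------<code>-------------
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 3f5685c00e36..07e7443c7be3 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -418,6 +418,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,

        bip->bip_vcnt = bip_src->bip_vcnt;
        bip->bip_iter = bip_src->bip_iter;
+       bip->bip_flags = bip_src->bip_flags;

        return 0;
 }
----------------<code>---------------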

------------------<BUG_ON>--------------
[  751.838432] kernel BUG at mm/slub.c:435!
[  751.838440] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  751.838443] CPU: 49 PID: 981 Comm: kworker/49:1H Kdump: loaded Not tainted 6.2.0-rc1+ #14
[  751.838447] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.5.6 10/06/2021
[  751.838448] Workqueue: kintegrityd bio_integrity_verify_fn
[  751.838458] RIP: 0010:__slab_free+0x1ae/0x300
[  751.838467] Code: 4c 89 e6 48 89 ef 5d 41 5c 41 5d 41 5e 41 5f e9 d8 fb ff ff 48 83 c4 60 4c 89 f7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 62 3b 00 00 <0f> 0b 80 4c 24 4b 80 e9 ea fe ff ff 4c 89 fa 4d 89 d7 4c 8b 54 24
[  751.838469] RSP: 0018:ffffbb674fcf7dd0 EFLAGS: 00010246
[  751.838472] RAX: ffff9c320d3546e0 RBX: ffff9c325302e480 RCX: 000000008040003f
[  751.838473] RDX: ffffffc10e1546c0 RSI: ffffdfb30434d500 RDI: ffff9c3200042500
[  751.838475] RBP: ffff9c3200042500 R08: 0000000000000001 R09: ffffffffb4fbf08a
[  751.838476] R10: ffffbb674fcf7ca0 R11: ffffffffb65e4ac8 R12: ffffdfb30434d500
[  751.838477] R13: ffff9c320d3546c0 R14: ffff9c320d3546c0 R15: ffff9c320d3546c0
[  751.838479] FS:  0000000000000000(0000) GS:ffff9c70ff840000(0000) knlGS:0000000000000000
[  751.838481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  751.838482] CR2: 00007fe84efedb00 CR3: 000000015472a000 CR4: 0000000000350ee0
[  751.838484] Call Trace:
[  751.838485]  <TASK>
[  751.838487]  ? bio_integrity_process+0x14f/0x1c0
[  751.838494]  ? __pfx_t10_pi_type1_verify_ip+0x10/0x10 [t10_pi]
[  751.838501]  bio_integrity_free+0xaa/0xb0
[  751.838504]  bio_integrity_verify_fn+0x40/0x50
[  751.838507]  process_one_work+0x1e5/0x3b0
[  751.838513]  ? __pfx_worker_thread+0x10/0x10
[  751.838515]  worker_thread+0x50/0x3a0
[  751.838518]  ? __pfx_worker_thread+0x10/0x10
[  751.838520]  kthread+0xd9/0x100
[  751.838525]  ? __pfx_kthread+0x10/0x10
[  751.838528]  ret_from_fork+0x2c/0x50
[  751.838535]  </TASK>
----------------------<BUG_ON>---------------

Queries
1) Is there a specific reason for not copying the bip_flags in bio_integrity_clone function?
2) If bip_flags needs to be copied, is there something else that needs to be done to take care of the BUG_ON?
3) If not, then what would be the right solution to fix the IO error caused by the SCSI_PROT_IP_CHECKSUM flag not being set?


Thanks,
~Saurav

Comments

Saurav Kashyap Feb. 14, 2023, 6:58 a.m. UTC | #1
Hi Martin,
Any inputs on this one?

Thanks,
~Saurav

> -----Original Message-----
> From: Saurav Kashyap
> Sent: Tuesday, February 7, 2023 4:50 PM
> To: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: linux-scsi <linux-scsi@vger.kernel.org>; Girish Basrur
> <gbasrur@marvell.com>
> Subject: IO error on DIF/DIX supported array
> 
> Hi Martin,
> We have observed IO failures on a 3PAR array that supports DIF/DIX with
> upstream code. The error is seen only when IO is done on DM devices; no
> error is observed if IO is done directly on /dev/sdX.
> I added some prints to understand the problem and found that the
> SCSI_PROT_IP_CHECKSUM flag is not set in scmnd->prot_flags. It should be
> set, since BIP_IP_CHECKSUM is set by bio_integrity_prep.
> 
> --------------------<START: IO to /dev/sdc>----------------
> [Mon Feb 6 17:54:56 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> [Mon Feb 6 17:54:56 2023] SK: sd_setup_protect_cmnd setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> [Mon Feb 6 17:54:56 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> [Mon Feb 6 17:54:56 2023] SK: sd_setup_protect_cmnd setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> -------------------<END: IO to /dev/sdc>-----------------
> 
> ----------------<START: IO to dm-10>---------------------
> [Mon Feb 6 17:55:13 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> [Mon Feb 6 17:55:13 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff976fa15fa490 bip_flags=0x0
> [Mon Feb 6 17:55:13 2023] dm-10: guard tag error at sector 0 (rcvd 0000, want ffff)
> [Mon Feb 6 17:55:13 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff978f0752c180 bip_flags=0x11
> [Mon Feb 6 17:55:13 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff976fc87fef10 bip_flags=0x0
> [Mon Feb 6 17:55:13 2023] dm-10: guard tag error at sector 0 (rcvd 0000, want ffff)
> [Mon Feb 6 17:55:13 2023] Buffer I/O error on dev dm-10, logical block 0, async page read
> -----------------<END: IO to dm-10>------------------------
> 
> I noticed that the bio pointer changes when IO is done through a dm device.
> I added more debug prints in __bio_clone and bio_integrity_clone and
> concluded that bip_flags are not being copied in the bio_integrity_clone
> routine.
> 
> --------------------
> [Tue Feb  7 14:15:47 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff891ecc5fa840 bip_flags=0x11
> [Tue Feb  7 14:15:47 2023] SK: __bio_clone: bio=ffff891ed97b5990 bio_src=ffff891ecc5fa840
> [Tue Feb  7 14:15:47 2023] SK: bio_integrity_clone: bip=ffff891ecc5fd500 bip_src=ffff891ecc5fcb40 bip_flags=0x0 src_bip_flags=0x11
> [Tue Feb  7 14:15:47 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff891ed97b5990 bip_flags=0x0
> [Tue Feb  7 14:15:47 2023] dm-3: guard tag error at sector 0 (rcvd 0000, want ffff)
> [Tue Feb  7 14:15:47 2023] Buffer I/O error on dev dm-3, logical block 0, async page read
> ----------------------------------
> 
> If I add a change to copy the flags, the following BUG_ON in slub.c is reported:
> ------------------<code>-------------
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index 3f5685c00e36..07e7443c7be3 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -418,6 +418,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
> 
>         bip->bip_vcnt = bip_src->bip_vcnt;
>         bip->bip_iter = bip_src->bip_iter;
> +       bip->bip_flags = bip_src->bip_flags;
> 
>         return 0;
>  }
> ----------------<code>---------------
> 
> ------------------<BUG_ON>--------------
> [  751.838432] kernel BUG at mm/slub.c:435!
> [  751.838440] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [  751.838443] CPU: 49 PID: 981 Comm: kworker/49:1H Kdump: loaded Not tainted 6.2.0-rc1+ #14
> [  751.838447] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.5.6 10/06/2021
> [  751.838448] Workqueue: kintegrityd bio_integrity_verify_fn
> [  751.838458] RIP: 0010:__slab_free+0x1ae/0x300
> [  751.838467] Code: 4c 89 e6 48 89 ef 5d 41 5c 41 5d 41 5e 41 5f e9 d8 fb ff ff 48 83 c4 60 4c 89 f7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 62 3b 00 00 <0f> 0b 80 4c 24 4b 80 e9 ea fe ff ff 4c 89 fa 4d 89 d7 4c 8b 54 24
> [  751.838469] RSP: 0018:ffffbb674fcf7dd0 EFLAGS: 00010246
> [  751.838472] RAX: ffff9c320d3546e0 RBX: ffff9c325302e480 RCX: 000000008040003f
> [  751.838473] RDX: ffffffc10e1546c0 RSI: ffffdfb30434d500 RDI: ffff9c3200042500
> [  751.838475] RBP: ffff9c3200042500 R08: 0000000000000001 R09: ffffffffb4fbf08a
> [  751.838476] R10: ffffbb674fcf7ca0 R11: ffffffffb65e4ac8 R12: ffffdfb30434d500
> [  751.838477] R13: ffff9c320d3546c0 R14: ffff9c320d3546c0 R15: ffff9c320d3546c0
> [  751.838479] FS:  0000000000000000(0000) GS:ffff9c70ff840000(0000) knlGS:0000000000000000
> [  751.838481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  751.838482] CR2: 00007fe84efedb00 CR3: 000000015472a000 CR4: 0000000000350ee0
> [  751.838484] Call Trace:
> [  751.838485]  <TASK>
> [  751.838487]  ? bio_integrity_process+0x14f/0x1c0
> [  751.838494]  ? __pfx_t10_pi_type1_verify_ip+0x10/0x10 [t10_pi]
> [  751.838501]  bio_integrity_free+0xaa/0xb0
> [  751.838504]  bio_integrity_verify_fn+0x40/0x50
> [  751.838507]  process_one_work+0x1e5/0x3b0
> [  751.838513]  ? __pfx_worker_thread+0x10/0x10
> [  751.838515]  worker_thread+0x50/0x3a0
> [  751.838518]  ? __pfx_worker_thread+0x10/0x10
> [  751.838520]  kthread+0xd9/0x100
> [  751.838525]  ? __pfx_kthread+0x10/0x10
> [  751.838528]  ret_from_fork+0x2c/0x50
> [  751.838535]  </TASK>
> ----------------------<BUG_ON>---------------
> 
> Queries
> 1) Is there a specific reason for not copying the bip_flags in
> bio_integrity_clone function?
> 2) If bip_flags needs to be copied, is there something else that needs to
> be done to take care of the BUG_ON?
> 3) If not, then what would be the right solution to fix the IO error caused
> by the SCSI_PROT_IP_CHECKSUM flag not being set?
> 
> 
> Thanks,
> ~Saurav
Martin K. Petersen Feb. 15, 2023, 2:15 a.m. UTC | #2
Saurav,

> 1) Is there a specific reason for not copying the bip_flags in
> bio_integrity_clone function?

No, that's a bug. Not sure how that line got lost. Can you please try
the following patch?

Thanks!
Saurav Kashyap Feb. 15, 2023, 5:12 a.m. UTC | #3
Hi Martin,
Thanks a lot, the patch below worked. Kindly post it.

Thanks,
~Saurav

> -----Original Message-----
> From: Martin K. Petersen <martin.petersen@oracle.com>
> Sent: Wednesday, February 15, 2023 7:46 AM
> To: Saurav Kashyap <skashyap@marvell.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>; linux-scsi <linux-scsi@vger.kernel.org>; Girish Basrur <gbasrur@marvell.com>
> Subject: [EXT] Re: IO error on DIF/DIX supported array
> 
> Saurav,
> 
> > 1) Is there a specific reason for not copying the bip_flags in
> > bio_integrity_clone function?
> 
> No, that's a bug. Not sure how that line got lost. Can you please try
> the following patch?
> 
> Thanks!
> 
> --
> Martin K. Petersen	Oracle Linux Engineering
> 
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index 3f5685c00e36..91ffee6fc8cb 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -418,6 +418,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
> 
>  	bip->bip_vcnt = bip_src->bip_vcnt;
>  	bip->bip_iter = bip_src->bip_iter;
> +	bip->bip_flags = bip_src->bip_flags & ~BIP_BLOCK_INTEGRITY;
> 
>  	return 0;
>  }

Patch

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 3f5685c00e36..07e7443c7be3 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -418,6 +418,7 @@  int bio_integrity_clone(struct bio *bio, struct bio *bio_src,

        bip->bip_vcnt = bip_src->bip_vcnt;
        bip->bip_iter = bip_src->bip_iter;
+       bip->bip_flags = bip_src->bip_flags;

        return 0;
 }