[v2] nvme-fc: initialize nvme fc ctrl ops

Message ID 20230221095708.29094-1-njavali@marvell.com
State New

Commit Message

Nilesh Javali Feb. 21, 2023, 9:57 a.m. UTC
The system crashed while performing qla2xxx NVMe discovery, with the
call trace below:

qla2xxx [0000:21:00.0]-2102:12: qla_nvme_register_remote: traddr=nn-0x245e00a098f4684a:pn-0x245f00a098f4684a PortID:5a247a
qla2xxx [0000:21:00.0]-2102:12: qla_nvme_register_remote: traddr=nn-0x245e00a098f4684a:pn-0x246100a098f4684a PortID:5a2d6e
BUG: kernel NULL pointer dereference, address: 0000000000000010
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 61 PID: 6064 Comm: nvme Kdump: loaded Not tainted 6.2.0-rc1 #3
Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.5.6 10/06/2021
RIP: 0010:nvme_alloc_admin_tag_set+0x51/0x120 [nvme_core]
Code: 00 00 00 00 81 c1 b0 00 00 00 48 c7 86 a8 00 00 00 00 00 00 00
      c1 e9 03 f3 48 ab 4c 89 46 38 c7 46 44 1e 00 00 00 48 8b 45 30
      <f6> 40 10 01 74 07 c7 46 48 01 00 00 00 8b 45 5c c7 43 58 40 00 00
RSP: 0018:ffffafe6cd7cbd10 EFLAGS: 00010212
RAX: 0000000000000000 RBX: ffff898e0c39c050 RCX: 0000000000000000
RDX: 00000000000001d8 RSI: ffff898e0c39c050 RDI: ffff898e0c39c100
RBP: ffff898e0c39c398 R08: ffffffffc0afe1a0 R09: ffff896ed602a600
R10: 0000000000000010 R11: f000000000000000 R12: ffff898e06ea2600
R13: ffff898e070ffbc0 R14: ffff898e0c39c040 R15: ffff898e0c39c398
FS:  00007f9368279780(0000) GS:ffff89ad7fb40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000210c570000 CR4: 0000000000350ee0
Call Trace:
nvme_fc_init_ctrl+0x328/0x460 [nvme_fc]
nvme_fc_create_ctrl+0x1b0/0x260 [nvme_fc]
nvmf_create_ctrl+0x141/0x240 [nvme_fabrics]
nvmf_dev_write+0x81/0xe0 [nvme_fabrics]
vfs_write+0xc5/0x3b0
? syscall_exit_work+0x103/0x130
? syscall_exit_to_user_mode+0x12/0x30
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x90
? exc_page_fault+0x62/0x150
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f936813e967
Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f
      1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05
      <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff4197a468 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000560862447d40 RCX: 00007f936813e967
RDX: 000000000000012b RSI: 0000560862447d40 RDI: 0000000000000003
RBP: 0000000000000003 R08: 000000000000012b R09: 0000560862447d40
R10: 0000000000000000 R11: 0000000000000246 R12: 00005608624473f0
R13: 000000000000012b R14: 00007f93682e6100 R15: 00007f93682e613d

Initialize the nvme_fc_ctrl_ops before allocating the nvme admin
tag set.

Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers")
Cc: stable@vger.kernel.org
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
v2:
- Correct the cleanup path
- Add Cc tag

 drivers/nvme/host/fc.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

Comments

Niklas Cassel Feb. 22, 2023, 4:11 p.m. UTC | #1
On Wed, Feb 22, 2023 at 05:59:54AM +0000, Nilesh Javali wrote:
> Christoph,
> 
> Thanks for pointing to the commit.
> I do not see this commit in Martin's tree 6.3/scsi-staging or 6.3/scsi-queue branches.
> 
> Martin,
> 
> The 6.3/scsi-staging and 6.3/scsi-queue branches are still at 6.2.0-rc1.
> That could be the reason we hit the NVMe discovery NULL pointer dereference issue.
> Are there any plans to pull the commit below into the 6.3/scsi-staging or 6.3/scsi-queue branches?
> Or am I missing something here?

Hello Nilesh,

What you are missing is that NVMe is not SCSI :)

Consult the MAINTAINERS file to see the correct (git) tree to climb:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/MAINTAINERS?h=v6.2#n14922

While SCSI sends pull requests straight to Linus,
NVMe sends pull requests first to Jens, who then sends them on to Linus.


Kind regards,
Niklas

> 
> Thanks,
> Nilesh
> 
> > -----Original Message-----
> > From: Christoph Hellwig <hch@lst.de>
> > Sent: Tuesday, February 21, 2023 11:05 PM
> > To: Nilesh Javali <njavali@marvell.com>
> > Cc: linux-nvme@lists.infradead.org; martin.petersen@oracle.com; linux-
> > scsi@vger.kernel.org; GR-QLogic-Storage-Upstream <GR-QLogic-Storage-
> > Upstream@marvell.com>; Bikash Hazarika <bhazarika@marvell.com>; Anil
> > Gurumurthy <agurumurthy@marvell.com>; Shreyas Deodhar
> > <sdeodhar@marvell.com>; Christoph Hellwig <hch@lst.de>
> > Subject: [EXT] Re: [PATCH v2] nvme-fc: initialize nvme fc ctrl ops
> > 
> > External Email
> > 
> > ----------------------------------------------------------------------
> > On Tue, Feb 21, 2023 at 01:57:08AM -0800, Nilesh Javali wrote:
> > > CPU: 61 PID: 6064 Comm: nvme Kdump: loaded Not tainted 6.2.0-rc1 #3
> > 
> > Well, that's a really old -rc.
> > 
> > This should be fixed by:
> > 
> > commit 98e3528012cd571c48bbae7c7c0f868823254b6c
> > Author: Ross Lagerwall <ross.lagerwall@citrix.com>
> > Date:   Fri Jan 20 17:43:54 2023 +0000
> > 
> >     nvme-fc: fix initialization order
>

Martin K. Petersen Feb. 22, 2023, 5:04 p.m. UTC | #2
Hi Nilesh!

> The 6.3/scsi-staging and 6.3/scsi-queue branches are still at
> 6.2.0-rc1.  That could be the reason we hit the NVMe discovery NULL
> pointer dereference issue.  Are there any plans to pull the commit
> below into the 6.3/scsi-staging or 6.3/scsi-queue branches?  Or am I
> missing something here?

Except in very rare circumstances, the SCSI submission trees stay at
-rc1 forever. I generally don't bring in stuff from other trees to avoid
problems if those trees subsequently have to rebase.

It sounds like you should be testing either linux-next or maybe a local
ephemeral integration branch featuring the various topic areas that are
important to you (SCSI fixes + staging, block, NVMe).

Nilesh Javali Feb. 23, 2023, 3:43 p.m. UTC | #3
Niklas and Martin,

Thank you very much for the pointers.

Thanks,
Nilesh


Patch

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 4564f16a0b20..53297cad49ea 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3521,13 +3521,6 @@  nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 
 	nvme_fc_init_queue(ctrl, 0);
 
-	ret = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
-			&nvme_fc_admin_mq_ops,
-			struct_size((struct nvme_fcp_op_w_sgl *)NULL, priv,
-				    ctrl->lport->ops->fcprqst_priv_sz));
-	if (ret)
-		goto out_free_queues;
-
 	/*
 	 * Would have been nice to init io queues tag set as well.
 	 * However, we require interaction from the controller
@@ -3537,7 +3530,14 @@  nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 
 	ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_fc_ctrl_ops, 0);
 	if (ret)
-		goto out_cleanup_tagset;
+		goto out_free_queues;
+
+	ret = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
+			&nvme_fc_admin_mq_ops,
+			struct_size((struct nvme_fcp_op_w_sgl *)NULL, priv,
+				    ctrl->lport->ops->fcprqst_priv_sz));
+	if (ret)
+		goto out_free_queues;
 
 	/* at this point, teardown path changes to ref counting on nvme ctrl */
 
@@ -3592,8 +3592,6 @@  nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 
 	return ERR_PTR(-EIO);
 
-out_cleanup_tagset:
-	nvme_remove_admin_tag_set(&ctrl->ctrl);
 out_free_queues:
 	kfree(ctrl->queues);
 out_free_ida: