[1/2] target: core: fix race during ACL removal

Message ID	20220720112852.11440-2-d.bogdanov@yadro.com
State	Superseded
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Dmitry Bogdanov <d.bogdanov@yadro.com> To: Martin Petersen <martin.petersen@oracle.com>, <target-devel@vger.kernel.org> CC: <linux-scsi@vger.kernel.org>, <linux@yadro.com>, Dmitry Bogdanov <d.bogdanov@yadro.com> Subject: [PATCH 1/2] target: core: fix race during ACL removal Date: Wed, 20 Jul 2022 14:28:51 +0300 Message-ID: <20220720112852.11440-2-d.bogdanov@yadro.com> In-Reply-To: <20220720112852.11440-1-d.bogdanov@yadro.com> References: <20220720112852.11440-1-d.bogdanov@yadro.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Precedence: bulk
Series	target: core: fix race during ACL removal \| expand [0/2] target: core: fix race during ACL removal [1/2] target: core: fix race during ACL removal [2/2] target: core: de-RCU of se_lun and se_lun acl

Message ID

20220720112852.11440-2-d.bogdanov@yadro.com

State

Superseded

Headers

From: Dmitry Bogdanov <d.bogdanov@yadro.com>
To: Martin Petersen <martin.petersen@oracle.com>,
        <target-devel@vger.kernel.org>
CC: <linux-scsi@vger.kernel.org>, <linux@yadro.com>,
        Dmitry Bogdanov <d.bogdanov@yadro.com>
Subject: [PATCH 1/2] target: core: fix race during ACL removal
Date: Wed, 20 Jul 2022 14:28:51 +0300
Message-ID: <20220720112852.11440-2-d.bogdanov@yadro.com>
In-Reply-To: <20220720112852.11440-1-d.bogdanov@yadro.com>
References: <20220720112852.11440-1-d.bogdanov@yadro.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Precedence: bulk

Series

target: core: fix race during ACL removal | expand

Commit Message

Dmitry Bogdanov July 20, 2022, 11:28 a.m. UTC

Under huge load there is a possibility of race condition in updating
se_dev_entry object in ACL removal procedure.

 NIP [c0080000154093d0] transport_lookup_cmd_lun+0x1f8/0x3d0 [target_core_mod]
 LR [c00800001542ab34] target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
 Call Trace:
   target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
   target_submit_cmd+0x44/0x60 [target_core_mod]
   tcm_qla2xxx_handle_cmd+0x88/0xe0 [tcm_qla2xxx]
   qlt_do_work+0x2e4/0x3d0 [qla2xxx]
   process_one_work+0x298/0x5c

Despite usage of RCU primitives with deve->se_lun pointer, it has not
became dereference-safe. Because deve->se_lun is updated not
synchronized with a reader. That change might be in a release function
called by synchronize_rcu(). But, in fact, there is no sence to NULL that
pointer for deleting deve. So a better solution is to remove that NULLing.

Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
---
 drivers/target/target_core_device.c | 3 ---
 1 file changed, 3 deletions(-)

Comments

Mike Christie July 26, 2022, 9:40 p.m. UTC | #1

On 7/20/22 6:28 AM, Dmitry Bogdanov wrote:
> Under huge load there is a possibility of race condition in updating
> se_dev_entry object in ACL removal procedure.
> 
>  NIP [c0080000154093d0] transport_lookup_cmd_lun+0x1f8/0x3d0 [target_core_mod]
>  LR [c00800001542ab34] target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
>  Call Trace:
>    target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
>    target_submit_cmd+0x44/0x60 [target_core_mod]
>    tcm_qla2xxx_handle_cmd+0x88/0xe0 [tcm_qla2xxx]
>    qlt_do_work+0x2e4/0x3d0 [qla2xxx]
>    process_one_work+0x298/0x5c
> 
> Despite usage of RCU primitives with deve->se_lun pointer, it has not
> became dereference-safe. Because deve->se_lun is updated not
> synchronized with a reader. That change might be in a release function
> called by synchronize_rcu(). But, in fact, there is no sence to NULL that
> pointer for deleting deve. So a better solution is to remove that NULLing.
> 
Hey,

For the line above about there being no reason to NULL the pointer are you
saying:


We should have had a NULL check like:

transport_lookup_cmd_lun:

.....

se_lun = rcu_dereference(deve->se_lun);
if (!se_lun) {
	rcu_read_unlock();
	return TCM_NON_EXISTENT_LUN;
}

and it would have prevented the crash.

But we don't need that or the se_lun RCU use, because the hlist_del_rcu and
kfree_rcu calls in core_disable_device_list_for_node will make sure that
transport_lookup_cmd_lun either finds a se_dev_entry and while in use will
never be freed from under us, or transport_lookup_cmd_lun will never see a
se_dev_entry.


If that's what you meant then I agree.



> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
> ---
>  drivers/target/target_core_device.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
> index 25f33eb25337..335f8cbfe633 100644
> --- a/drivers/target/target_core_device.c
> +++ b/drivers/target/target_core_device.c
> @@ -434,9 +434,6 @@ void core_disable_device_list_for_node(
>  	kref_put(&orig->pr_kref, target_pr_kref_release);
>  	wait_for_completion(&orig->pr_comp);
>  
> -	rcu_assign_pointer(orig->se_lun, NULL);
> -	rcu_assign_pointer(orig->se_lun_acl, NULL);
> -
>  	kfree_rcu(orig, rcu_head);
>  
>  	core_scsi3_free_pr_reg_from_nacl(dev, nacl);

Dmitry Bogdanov July 27, 2022, 6:58 a.m. UTC | #2

On Tue, Jul 26, 2022 at 04:40:16PM -0500, Mike Christie wrote:
> 
> On 7/20/22 6:28 AM, Dmitry Bogdanov wrote:
> > Under huge load there is a possibility of race condition in updating
> > se_dev_entry object in ACL removal procedure.
> >
> >  NIP [c0080000154093d0] transport_lookup_cmd_lun+0x1f8/0x3d0 [target_core_mod]
> >  LR [c00800001542ab34] target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
> >  Call Trace:
> >    target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
> >    target_submit_cmd+0x44/0x60 [target_core_mod]
> >    tcm_qla2xxx_handle_cmd+0x88/0xe0 [tcm_qla2xxx]
> >    qlt_do_work+0x2e4/0x3d0 [qla2xxx]
> >    process_one_work+0x298/0x5c
> >
> > Despite usage of RCU primitives with deve->se_lun pointer, it has not
> > became dereference-safe. Because deve->se_lun is updated not
> > synchronized with a reader. That change might be in a release function
> > called by synchronize_rcu(). But, in fact, there is no sence to NULL that
> > pointer for deleting deve. So a better solution is to remove that NULLing.
> >
> Hey,
> 
> For the line above about there being no reason to NULL the pointer are you
> saying:
> 
> 
> We should have had a NULL check like:
> 
> transport_lookup_cmd_lun:
> 
> .....
> 
> se_lun = rcu_dereference(deve->se_lun);
> if (!se_lun) {
>         rcu_read_unlock();
>         return TCM_NON_EXISTENT_LUN;
> }
> 
> and it would have prevented the crash.
> 
> But we don't need that or the se_lun RCU use, because the hlist_del_rcu and
> kfree_rcu calls in core_disable_device_list_for_node will make sure that
> transport_lookup_cmd_lun either finds a se_dev_entry and while in use will
> never be freed from under us, or transport_lookup_cmd_lun will never see a
> se_dev_entry.
> 
> 
> If that's what you meant then I agree.
Yes, that is exactly how I thought. All access to deve->se_lun is
already under rcu_read_lock. And either deve->se_lun is always valid or
deve is not valid itself and will not be found in the list_for_*.
Will add more details in the commit message.

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 25f33eb25337..335f8cbfe633 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -434,9 +434,6 @@  void core_disable_device_list_for_node(
 	kref_put(&orig->pr_kref, target_pr_kref_release);
 	wait_for_completion(&orig->pr_comp);
 
-	rcu_assign_pointer(orig->se_lun, NULL);
-	rcu_assign_pointer(orig->se_lun_acl, NULL);
-
 	kfree_rcu(orig, rcu_head);
 
 	core_scsi3_free_pr_reg_from_nacl(dev, nacl);

[1/2] target: core: fix race during ACL removal

Commit Message

Comments

Patch