diff mbox series

target: Fix a double put in transport_free_session

Message ID 20210323025851.11782-1-lyl2019@mail.ustc.edu.cn
State New
Headers show
Series target: Fix a double put in transport_free_session | expand

Commit Message

Lv Yunlong March 23, 2021, 2:58 a.m. UTC
In transport_free_session, se_nacl is got from se_sess
with the initial reference. If se_nacl->acl_sess_list is
empty, se_nacl->dynamic_stop is set to true. Then the first
target_put_nacl(se_nacl) will drop the initial reference
and free se_nacl. Later there is a second target_put_nacl()
to put se_nacl. It may cause error in race.

My patch sets se_nacl->dynamic_stop to false to avoid the
double put.

Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
---
 drivers/target/target_core_transport.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Mike Christie March 23, 2021, 4:28 p.m. UTC | #1
On 3/22/21 9:58 PM, Lv Yunlong wrote:
> In transport_free_session, se_nacl is got from se_sess
> with the initial reference. If se_nacl->acl_sess_list is
> empty, se_nacl->dynamic_stop is set to true. Then the first
> target_put_nacl(se_nacl) will drop the initial reference
> and free se_nacl. Later there is a second target_put_nacl()
> to put se_nacl. It may cause error in race.
>> My patch sets se_nacl->dynamic_stop to false to avoid the
> double put.
> 
> Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
> ---
>  drivers/target/target_core_transport.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 5ecb9f18a53d..c266defe694f 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess)
>  		}
>  		mutex_unlock(&se_tpg->acl_node_mutex);
>  
> -		if (se_nacl->dynamic_stop)
> +		if (se_nacl->dynamic_stop) {
>  			target_put_nacl(se_nacl);
> +			se_nacl->dynamic_stop = false;
> +		}
>  
>  		target_put_nacl(se_nacl);
Could you describe the race a little more?

Is the race:

1. thread1 called core_tpg_check_initiator_node_acl and found the acl.
sess->se_node_acl is set to the found acl.
2. thread2 is running transport_free_session. It now grabs the acl_node_mutex
and sees se_nacl->acl_sess_list is empty.
3. thread2 does the dynamic_stop=true operations in transport_free_session.
4. thread1 now calls transport_register_session now adds the sess to acl's
acl_sess_list.

Later when the session that thread 1 created is deleted dynamic_stop is still
set, so we do an extra target_put_nacl?

I'm not sure your patch will handle this race. When we delete the session thread1
created dynamic_node_acl is still set, so this:

                mutex_lock(&se_tpg->acl_node_mutex);
                if (se_nacl->dynamic_node_acl &&
                    !se_tfo->tpg_check_demo_mode_cache(se_tpg)) {
                        spin_lock_irqsave(&se_nacl->nacl_sess_lock, flags);
                        if (list_empty(&se_nacl->acl_sess_list))
                                se_nacl->dynamic_stop = true;

can set dynamic_stop to true again and we can end up doing the extra put still.

On top of the extra put we also do

list_del(&se_nacl->acl_list);

twice so we have to handle that as well.

Is there also another bug in this code. If someone adds an acl while there is a
dynamic acl in place core_tpg_add_initiator_node_acl will clear dynamic_node_acl
but we leave the extra reference, so later when transport_free_session is called
we will not do the extra put.
Lv Yunlong March 25, 2021, 7:48 a.m. UTC | #2
> -----原始邮件-----

> 发件人: michael.christie@oracle.com

> 发送时间: 2021-03-24 00:28:35 (星期三)

> 收件人: "Lv Yunlong" <lyl2019@mail.ustc.edu.cn>, martin.petersen@oracle.com

> 抄送: linux-scsi@vger.kernel.org, target-devel@vger.kernel.org, linux-kernel@vger.kernel.org

> 主题: Re: [PATCH] target: Fix a double put in transport_free_session

> 

> On 3/22/21 9:58 PM, Lv Yunlong wrote:

> > In transport_free_session, se_nacl is got from se_sess

> > with the initial reference. If se_nacl->acl_sess_list is

> > empty, se_nacl->dynamic_stop is set to true. Then the first

> > target_put_nacl(se_nacl) will drop the initial reference

> > and free se_nacl. Later there is a second target_put_nacl()

> > to put se_nacl. It may cause error in race.

> >> My patch sets se_nacl->dynamic_stop to false to avoid the

> > double put.

> > 

> > Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>

> > ---

> >  drivers/target/target_core_transport.c | 4 +++-

> >  1 file changed, 3 insertions(+), 1 deletion(-)

> > 

> > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c

> > index 5ecb9f18a53d..c266defe694f 100644

> > --- a/drivers/target/target_core_transport.c

> > +++ b/drivers/target/target_core_transport.c

> > @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess)

> >  		}

> >  		mutex_unlock(&se_tpg->acl_node_mutex);

> >  

> > -		if (se_nacl->dynamic_stop)

> > +		if (se_nacl->dynamic_stop) {

> >  			target_put_nacl(se_nacl);

> > +			se_nacl->dynamic_stop = false;

> > +		}

> >  

> >  		target_put_nacl(se_nacl);

> Could you describe the race a little more?

> 

> Is the race:

> 

> 1. thread1 called core_tpg_check_initiator_node_acl and found the acl.

> sess->se_node_acl is set to the found acl.

> 2. thread2 is running transport_free_session. It now grabs the acl_node_mutex

> and sees se_nacl->acl_sess_list is empty.

> 3. thread2 does the dynamic_stop=true operations in transport_free_session.

> 4. thread1 now calls transport_register_session now adds the sess to acl's

> acl_sess_list.

> 

> Later when the session that thread 1 created is deleted dynamic_stop is still

> set, so we do an extra target_put_nacl?

> 

> I'm not sure your patch will handle this race. When we delete the session thread1

> created dynamic_node_acl is still set, so this:

> 

>                 mutex_lock(&se_tpg->acl_node_mutex);

>                 if (se_nacl->dynamic_node_acl &&

>                     !se_tfo->tpg_check_demo_mode_cache(se_tpg)) {

>                         spin_lock_irqsave(&se_nacl->nacl_sess_lock, flags);

>                         if (list_empty(&se_nacl->acl_sess_list))

>                                 se_nacl->dynamic_stop = true;

> 

> can set dynamic_stop to true again and we can end up doing the extra put still.

> 

> On top of the extra put we also do

> 

> list_del(&se_nacl->acl_list);

> 

> twice so we have to handle that as well.

> 

> Is there also another bug in this code. If someone adds an acl while there is a

> dynamic acl in place core_tpg_add_initiator_node_acl will clear dynamic_node_acl

> but we leave the extra reference, so later when transport_free_session is called

> we will not do the extra put.

> 


Ok, thanks for your answer. According the description above, i think it is a false
positive now.

Thanks.
Mike Christie March 25, 2021, 5:24 p.m. UTC | #3
On 3/25/21 2:48 AM, lyl2019@mail.ustc.edu.cn wrote:
> 

> 

> 

>> -----原始邮件-----

>> 发件人: michael.christie@oracle.com

>> 发送时间: 2021-03-24 00:28:35 (星期三)

>> 收件人: "Lv Yunlong" <lyl2019@mail.ustc.edu.cn>, martin.petersen@oracle.com

>> 抄送: linux-scsi@vger.kernel.org, target-devel@vger.kernel.org, linux-kernel@vger.kernel.org

>> 主题: Re: [PATCH] target: Fix a double put in transport_free_session

>>

>> On 3/22/21 9:58 PM, Lv Yunlong wrote:

>>> In transport_free_session, se_nacl is got from se_sess

>>> with the initial reference. If se_nacl->acl_sess_list is

>>> empty, se_nacl->dynamic_stop is set to true. Then the first

>>> target_put_nacl(se_nacl) will drop the initial reference

>>> and free se_nacl. Later there is a second target_put_nacl()

>>> to put se_nacl. It may cause error in race.

>>>> My patch sets se_nacl->dynamic_stop to false to avoid the

>>> double put.

>>>

>>> Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>

>>> ---

>>>  drivers/target/target_core_transport.c | 4 +++-

>>>  1 file changed, 3 insertions(+), 1 deletion(-)

>>>

>>> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c

>>> index 5ecb9f18a53d..c266defe694f 100644

>>> --- a/drivers/target/target_core_transport.c

>>> +++ b/drivers/target/target_core_transport.c

>>> @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess)

>>>  		}

>>>  		mutex_unlock(&se_tpg->acl_node_mutex);

>>>  

>>> -		if (se_nacl->dynamic_stop)

>>> +		if (se_nacl->dynamic_stop) {

>>>  			target_put_nacl(se_nacl);

>>> +			se_nacl->dynamic_stop = false;

>>> +		}

>>>  

>>>  		target_put_nacl(se_nacl);

>> Could you describe the race a little more?

>>

>> Is the race:

>>

>> 1. thread1 called core_tpg_check_initiator_node_acl and found the acl.

>> sess->se_node_acl is set to the found acl.

>> 2. thread2 is running transport_free_session. It now grabs the acl_node_mutex

>> and sees se_nacl->acl_sess_list is empty.

>> 3. thread2 does the dynamic_stop=true operations in transport_free_session.

>> 4. thread1 now calls transport_register_session now adds the sess to acl's

>> acl_sess_list.

>>

>> Later when the session that thread 1 created is deleted dynamic_stop is still

>> set, so we do an extra target_put_nacl?

>>

>> I'm not sure your patch will handle this race. When we delete the session thread1

>> created dynamic_node_acl is still set, so this:

>>

>>                 mutex_lock(&se_tpg->acl_node_mutex);

>>                 if (se_nacl->dynamic_node_acl &&

>>                     !se_tfo->tpg_check_demo_mode_cache(se_tpg)) {

>>                         spin_lock_irqsave(&se_nacl->nacl_sess_lock, flags);

>>                         if (list_empty(&se_nacl->acl_sess_list))

>>                                 se_nacl->dynamic_stop = true;

>>

>> can set dynamic_stop to true again and we can end up doing the extra put still.

>>

>> On top of the extra put we also do

>>

>> list_del(&se_nacl->acl_list);

>>

>> twice so we have to handle that as well.

>>

>> Is there also another bug in this code. If someone adds an acl while there is a

>> dynamic acl in place core_tpg_add_initiator_node_acl will clear dynamic_node_acl

>> but we leave the extra reference, so later when transport_free_session is called

>> we will not do the extra put.

>>

> 

> Ok, thanks for your answer. According the description above, i think it is a false

> positive now.

> 


Did you hit this bug, are you using an inspection tool, or did you find this by code
review?

I think there was a misunderstanding. I was saying it looks like a race could happen.
There is no protection in lio core.

I think it's hard to hit because most drivers do not allow the combo:

tpg_check_demo_mode == true
tpg_check_demo_mode_cache = false

It looks like those settings are allowed with tcm_qla2xxx and usb, but:

usb - has a mutex around creation and removal so we can't race.
tcm qla - I don't know this driver will enough, but I cc'd the maintainer.
Lv Yunlong March 26, 2021, 11:11 a.m. UTC | #4
> -----原始邮件-----

> 发件人: "Mike Christie" <michael.christie@oracle.com>

> 发送时间: 2021-03-26 01:24:58 (星期五)

> 收件人: lyl2019@mail.ustc.edu.cn

> 抄送: martin.petersen@oracle.com, linux-scsi@vger.kernel.org, target-devel@vger.kernel.org, linux-kernel@vger.kernel.org, "Nilesh Javali" <njavali@marvell.com>

> 主题: Re: [PATCH] target: Fix a double put in transport_free_session

> 

> On 3/25/21 2:48 AM, lyl2019@mail.ustc.edu.cn wrote:

> > 

> > 

> > 

> >> -----原始邮件-----

> >> 发件人: michael.christie@oracle.com

> >> 发送时间: 2021-03-24 00:28:35 (星期三)

> >> 收件人: "Lv Yunlong" <lyl2019@mail.ustc.edu.cn>, martin.petersen@oracle.com

> >> 抄送: linux-scsi@vger.kernel.org, target-devel@vger.kernel.org, linux-kernel@vger.kernel.org

> >> 主题: Re: [PATCH] target: Fix a double put in transport_free_session

> >>

> >> On 3/22/21 9:58 PM, Lv Yunlong wrote:

> >>> In transport_free_session, se_nacl is got from se_sess

> >>> with the initial reference. If se_nacl->acl_sess_list is

> >>> empty, se_nacl->dynamic_stop is set to true. Then the first

> >>> target_put_nacl(se_nacl) will drop the initial reference

> >>> and free se_nacl. Later there is a second target_put_nacl()

> >>> to put se_nacl. It may cause error in race.

> >>>> My patch sets se_nacl->dynamic_stop to false to avoid the

> >>> double put.

> >>>

> >>> Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>

> >>> ---

> >>>  drivers/target/target_core_transport.c | 4 +++-

> >>>  1 file changed, 3 insertions(+), 1 deletion(-)

> >>>

> >>> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c

> >>> index 5ecb9f18a53d..c266defe694f 100644

> >>> --- a/drivers/target/target_core_transport.c

> >>> +++ b/drivers/target/target_core_transport.c

> >>> @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess)

> >>>  		}

> >>>  		mutex_unlock(&se_tpg->acl_node_mutex);

> >>>  

> >>> -		if (se_nacl->dynamic_stop)

> >>> +		if (se_nacl->dynamic_stop) {

> >>>  			target_put_nacl(se_nacl);

> >>> +			se_nacl->dynamic_stop = false;

> >>> +		}

> >>>  

> >>>  		target_put_nacl(se_nacl);

> >> Could you describe the race a little more?

> >>

> >> Is the race:

> >>

> >> 1. thread1 called core_tpg_check_initiator_node_acl and found the acl.

> >> sess->se_node_acl is set to the found acl.

> >> 2. thread2 is running transport_free_session. It now grabs the acl_node_mutex

> >> and sees se_nacl->acl_sess_list is empty.

> >> 3. thread2 does the dynamic_stop=true operations in transport_free_session.

> >> 4. thread1 now calls transport_register_session now adds the sess to acl's

> >> acl_sess_list.

> >>

> >> Later when the session that thread 1 created is deleted dynamic_stop is still

> >> set, so we do an extra target_put_nacl?

> >>

> >> I'm not sure your patch will handle this race. When we delete the session thread1

> >> created dynamic_node_acl is still set, so this:

> >>

> >>                 mutex_lock(&se_tpg->acl_node_mutex);

> >>                 if (se_nacl->dynamic_node_acl &&

> >>                     !se_tfo->tpg_check_demo_mode_cache(se_tpg)) {

> >>                         spin_lock_irqsave(&se_nacl->nacl_sess_lock, flags);

> >>                         if (list_empty(&se_nacl->acl_sess_list))

> >>                                 se_nacl->dynamic_stop = true;

> >>

> >> can set dynamic_stop to true again and we can end up doing the extra put still.

> >>

> >> On top of the extra put we also do

> >>

> >> list_del(&se_nacl->acl_list);

> >>

> >> twice so we have to handle that as well.

> >>

> >> Is there also another bug in this code. If someone adds an acl while there is a

> >> dynamic acl in place core_tpg_add_initiator_node_acl will clear dynamic_node_acl

> >> but we leave the extra reference, so later when transport_free_session is called

> >> we will not do the extra put.

> >>

> > 

> > Ok, thanks for your answer. According the description above, i think it is a false

> > positive now.

> > 

> 

> Did you hit this bug, are you using an inspection tool, or did you find this by code

> review?

> 

> I think there was a misunderstanding. I was saying it looks like a race could happen.

> There is no protection in lio core.

> 

> I think it's hard to hit because most drivers do not allow the combo:

> 

> tpg_check_demo_mode == true

> tpg_check_demo_mode_cache = false

> 

> It looks like those settings are allowed with tcm_qla2xxx and usb, but:

> 

> usb - has a mutex around creation and removal so we can't race.

> tcm qla - I don't know this driver will enough, but I cc'd the maintainer.


This bug is detected by a static analyzer tool.

Thanks.
Maurizio Lombardi March 26, 2021, 12:31 p.m. UTC | #5
Dne 23. 03. 21 v 3:58 Lv Yunlong napsal(a):
> In transport_free_session, se_nacl is got from se_sess

> with the initial reference. If se_nacl->acl_sess_list is

> empty, se_nacl->dynamic_stop is set to true. Then the first

> target_put_nacl(se_nacl) will drop the initial reference

> and free se_nacl. Later there is a second target_put_nacl()

> to put se_nacl. It may cause error in race.

> 

> My patch sets se_nacl->dynamic_stop to false to avoid the

> double put.

> 

> Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>

> ---

>  drivers/target/target_core_transport.c | 4 +++-

>  1 file changed, 3 insertions(+), 1 deletion(-)

> 

> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c

> index 5ecb9f18a53d..c266defe694f 100644

> --- a/drivers/target/target_core_transport.c

> +++ b/drivers/target/target_core_transport.c

> @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess)

>  		}

>  		mutex_unlock(&se_tpg->acl_node_mutex);

>  

> -		if (se_nacl->dynamic_stop)

> +		if (se_nacl->dynamic_stop) {

>  			target_put_nacl(se_nacl);

> +			se_nacl->dynamic_stop = false;

> +		}

>  

>  		target_put_nacl(se_nacl);

>  	}

> 


FYI,

I have received a bug report against the 5.8 kernel about task hangs that seems to involve the nacl "dynamic_stop" code

Mar  4 16:49:44 gzboot kernel: [186322.177819] INFO: task targetcli:2359053 blocked for more than 120 seconds.
Mar  4 16:49:44 gzboot kernel: [186322.178862]       Tainted: P           O      5.8.0-44-generic #50-Ubuntu
Mar  4 16:49:44 gzboot kernel: [186322.179746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  4 16:49:44 gzboot kernel: [186322.180583] targetcli       D    0 2359053 2359052 0x00000000
Mar  4 16:49:44 gzboot kernel: [186322.180586] Call Trace:
Mar  4 16:49:44 gzboot kernel: [186322.180592]  __schedule+0x212/0x5d0
Mar  4 16:49:44 gzboot kernel: [186322.180595]  ? usleep_range+0x90/0x90
Mar  4 16:49:44 gzboot kernel: [186322.180596]  schedule+0x55/0xc0
Mar  4 16:49:44 gzboot kernel: [186322.180597]  schedule_timeout+0x10f/0x160
Mar  4 16:49:44 gzboot kernel: [186322.180601]  ? evict+0x14c/0x1b0
Mar  4 16:49:44 gzboot kernel: [186322.180602]  __wait_for_common+0xa8/0x150
Mar  4 16:49:44 gzboot kernel: [186322.180603]  wait_for_completion+0x24/0x30
Mar  4 16:49:44 gzboot kernel: [186322.180637]  core_tpg_del_initiator_node_acl+0x8e/0x120 [target_core_mod]
Mar  4 16:49:44 gzboot kernel: [186322.180643]  target_fabric_nacl_base_release+0x26/0x30 [target_core_mod]
Mar  4 16:49:44 gzboot kernel: [186322.180647]  config_item_cleanup+0x5d/0xf0
Mar  4 16:49:44 gzboot kernel: [186322.180649]  config_item_put+0x2d/0x40
Mar  4 16:49:44 gzboot kernel: [186322.180651]  configfs_rmdir+0x1d8/0x350
Mar  4 16:49:44 gzboot kernel: [186322.180653]  vfs_rmdir.part.0+0x66/0x190
Mar  4 16:49:44 gzboot kernel: [186322.180654]  do_rmdir+0x1b4/0x200
Mar  4 16:49:44 gzboot kernel: [186322.180655]  __x64_sys_rmdir+0x17/0x20
Mar  4 16:49:44 gzboot kernel: [186322.180657]  do_syscall_64+0x49/0xc0
Mar  4 16:49:44 gzboot kernel: [186322.180659]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar  4 16:49:44 gzboot kernel: [186322.180660] RIP: 0033:0x7f30cf1ca9eb
Mar  4 16:49:44 gzboot kernel: [186322.180661] RSP: 002b:00007ffd72030bd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
Mar  4 16:49:44 gzboot kernel: [186322.180662] RAX: ffffffffffffffda RBX: 00000000ffffff9c RCX: 00007f30cf1ca9eb
Mar  4 16:49:44 gzboot kernel: [186322.180663] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f30cc1e2a50
Mar  4 16:49:44 gzboot kernel: [186322.180664] RBP: 0000000000a4b7a0 R08: 0000000000000000 R09: 00007ffd72030b7f
Mar  4 16:49:44 gzboot kernel: [186322.180664] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd72030c00
Mar  4 16:49:44 gzboot kernel: [186322.180665] R13: 00007f30cdd706a8 R14: 00007f30ced00cc0 R15: 00007f30cdd70698


Maurizio
Mike Christie March 26, 2021, 4:24 p.m. UTC | #6
On 3/26/21 7:31 AM, Maurizio Lombardi wrote:
> 

> 

> Dne 23. 03. 21 v 3:58 Lv Yunlong napsal(a):

>> In transport_free_session, se_nacl is got from se_sess

>> with the initial reference. If se_nacl->acl_sess_list is

>> empty, se_nacl->dynamic_stop is set to true. Then the first

>> target_put_nacl(se_nacl) will drop the initial reference

>> and free se_nacl. Later there is a second target_put_nacl()

>> to put se_nacl. It may cause error in race.

>>

>> My patch sets se_nacl->dynamic_stop to false to avoid the

>> double put.

>>

>> Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>

>> ---

>>  drivers/target/target_core_transport.c | 4 +++-

>>  1 file changed, 3 insertions(+), 1 deletion(-)

>>

>> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c

>> index 5ecb9f18a53d..c266defe694f 100644

>> --- a/drivers/target/target_core_transport.c

>> +++ b/drivers/target/target_core_transport.c

>> @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess)

>>  		}

>>  		mutex_unlock(&se_tpg->acl_node_mutex);

>>  

>> -		if (se_nacl->dynamic_stop)

>> +		if (se_nacl->dynamic_stop) {

>>  			target_put_nacl(se_nacl);

>> +			se_nacl->dynamic_stop = false;

>> +		}

>>  

>>  		target_put_nacl(se_nacl);

>>  	}

>>

> 

> FYI,

> 

> I have received a bug report against the 5.8 kernel about task hangs that seems to involve the nacl "dynamic_stop" code

> 

> Mar  4 16:49:44 gzboot kernel: [186322.177819] INFO: task targetcli:2359053 blocked for more than 120 seconds.

> Mar  4 16:49:44 gzboot kernel: [186322.178862]       Tainted: P           O      5.8.0-44-generic #50-Ubuntu

> Mar  4 16:49:44 gzboot kernel: [186322.179746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

> Mar  4 16:49:44 gzboot kernel: [186322.180583] targetcli       D    0 2359053 2359052 0x00000000

> Mar  4 16:49:44 gzboot kernel: [186322.180586] Call Trace:

> Mar  4 16:49:44 gzboot kernel: [186322.180592]  __schedule+0x212/0x5d0

> Mar  4 16:49:44 gzboot kernel: [186322.180595]  ? usleep_range+0x90/0x90

> Mar  4 16:49:44 gzboot kernel: [186322.180596]  schedule+0x55/0xc0

> Mar  4 16:49:44 gzboot kernel: [186322.180597]  schedule_timeout+0x10f/0x160

> Mar  4 16:49:44 gzboot kernel: [186322.180601]  ? evict+0x14c/0x1b0

> Mar  4 16:49:44 gzboot kernel: [186322.180602]  __wait_for_common+0xa8/0x150

> Mar  4 16:49:44 gzboot kernel: [186322.180603]  wait_for_completion+0x24/0x30

> Mar  4 16:49:44 gzboot kernel: [186322.180637]  core_tpg_del_initiator_node_acl+0x8e/0x120 [target_core_mod]


We would need more details. We can hit that normally when the target driver
waits for stuck IO to complete before calling transport_deregister_session.
I think if you hit the put bug, then we would have seen the refcount warning
in refcount.h fire before the hung task because we do an extra put.

Did the user switch to ACLs? Did you see my comment about it looks like there
is a bug if the user were to add an acl while dynamic was used for the same
initiatorname. In that case, we do not do the put to match the one taken
core_tpg_check_initiator_node_acl. In that case we would hit your hang since
no one ever does the last put.
diff mbox series

Patch

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 5ecb9f18a53d..c266defe694f 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -584,8 +584,10 @@  void transport_free_session(struct se_session *se_sess)
 		}
 		mutex_unlock(&se_tpg->acl_node_mutex);
 
-		if (se_nacl->dynamic_stop)
+		if (se_nacl->dynamic_stop) {
 			target_put_nacl(se_nacl);
+			se_nacl->dynamic_stop = false;
+		}
 
 		target_put_nacl(se_nacl);
 	}