diff mbox series

ACPICA: avoid double free when object already has a zero reference count

Message ID 20201028200523.111028-1-mark.asselstine@windriver.com
State New
Headers show
Series ACPICA: avoid double free when object already has a zero reference count | expand

Commit Message

Mark Asselstine Oct. 28, 2020, 8:05 p.m. UTC
The first trip into acpi_ut_update_ref_count() for an object where
'object->common.reference_count' is 1 and we are performing a
REF_DECREMENT will result in 'new_count' being 0 and thus the object
is deleted via acpi_ut_delete_internal_obj().

If for some reason we make a subsequent trip into
acpi_ut_update_ref_count() with the same object,
object->common.reference_count' will be 0 and performing a
REF_DECREMENT will produce a warning msg "Reference Count is already
zero, cannot decrement", 'new_count' will again be 0 and the already
deleted object will be attempted to be deleted again via
acpi_ut_delete_internal_obj().

Since the object deletion doesn't NULL the object the calls to
acpi_ut_delete_internal_obj(), acpi_ut_delete_object_desc(),
acpi_os_release_object(), kmem_cache_free() will operate on the object
as if it hasn't been deleted. In many cases this can result in no
issues, but if you are using the slab and a new object has been
created with the same address this can be the cause slab corruption.

Adding a check if we are decrementing to 0 for the first time and only
calling acpi_ut_delete_internal_obj() in this case will prevent
another attempt at deleting the object.

Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
---
 drivers/acpi/acpica/utdelete.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Rafael J. Wysocki Oct. 29, 2020, 2:05 p.m. UTC | #1
+Erik and Bob

On Thu, Oct 29, 2020 at 3:05 AM Mark Asselstine
<mark.asselstine@windriver.com> wrote:
>
> The first trip into acpi_ut_update_ref_count() for an object where
> 'object->common.reference_count' is 1 and we are performing a
> REF_DECREMENT will result in 'new_count' being 0 and thus the object
> is deleted via acpi_ut_delete_internal_obj().
>
> If for some reason we make a subsequent trip into
> acpi_ut_update_ref_count() with the same object,
> object->common.reference_count' will be 0 and performing a
> REF_DECREMENT will produce a warning msg "Reference Count is already
> zero, cannot decrement", 'new_count' will again be 0 and the already
> deleted object will be attempted to be deleted again via
> acpi_ut_delete_internal_obj().
>
> Since the object deletion doesn't NULL the object the calls to
> acpi_ut_delete_internal_obj(), acpi_ut_delete_object_desc(),
> acpi_os_release_object(), kmem_cache_free() will operate on the object
> as if it hasn't been deleted. In many cases this can result in no
> issues, but if you are using the slab and a new object has been
> created with the same address this can be the cause slab corruption.
>
> Adding a check if we are decrementing to 0 for the first time and only
> calling acpi_ut_delete_internal_obj() in this case will prevent
> another attempt at deleting the object.
>
> Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
> ---
>  drivers/acpi/acpica/utdelete.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/acpica/utdelete.c b/drivers/acpi/acpica/utdelete.c
> index 4c0d4e434196..c6b860fd9eb5 100644
> --- a/drivers/acpi/acpica/utdelete.c
> +++ b/drivers/acpi/acpica/utdelete.c
> @@ -421,9 +421,9 @@ acpi_ut_update_ref_count(union acpi_operand_object *object, u32 action)
>                                       ACPI_GET_FUNCTION_NAME, object,
>                                       object->common.type, new_count));
>
> -               /* Actually delete the object on a reference count of zero */
> +               /* If we haven't already, actually delete the object on a reference count of zero */
>
> -               if (new_count == 0) {
> +               if (new_count == 0 && original_count != 0) {
>                         acpi_ut_delete_internal_obj(object);
>                 }
>                 message = "Decrement";
> --
> 2.17.1
>
Erik Kaneda Nov. 2, 2020, 6:11 p.m. UTC | #2
> -----Original Message-----

> From: Rafael J. Wysocki <rafael@kernel.org>

> Sent: Thursday, October 29, 2020 7:06 AM

> To: Mark Asselstine <mark.asselstine@windriver.com>

> Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Kaneda, Erik

> <erik.kaneda@intel.com>; Moore, Robert <robert.moore@intel.com>

> Subject: Re: [PATCH] ACPICA: avoid double free when object already has a

> zero reference count

> 

> +Erik and Bob

> 

> On Thu, Oct 29, 2020 at 3:05 AM Mark Asselstine

> <mark.asselstine@windriver.com> wrote:

> >

> > The first trip into acpi_ut_update_ref_count() for an object where

> > 'object->common.reference_count' is 1 and we are performing a

> > REF_DECREMENT will result in 'new_count' being 0 and thus the object

> > is deleted via acpi_ut_delete_internal_obj().

> >

> > If for some reason we make a subsequent trip into

> > acpi_ut_update_ref_count() with the same object,

> > object->common.reference_count' will be 0 and performing a

> > REF_DECREMENT will produce a warning msg "Reference Count is already

> > zero, cannot decrement", 'new_count' will again be 0 and the already

> > deleted object will be attempted to be deleted again via

> > acpi_ut_delete_internal_obj().


Mark, Do you have an example of AML/ASL that you used to determine this double free?

Thanks,
Erik

> >

> > Since the object deletion doesn't NULL the object the calls to

> > acpi_ut_delete_internal_obj(), acpi_ut_delete_object_desc(),

> > acpi_os_release_object(), kmem_cache_free() will operate on the object

> > as if it hasn't been deleted. In many cases this can result in no

> > issues, but if you are using the slab and a new object has been

> > created with the same address this can be the cause slab corruption.

> >

> > Adding a check if we are decrementing to 0 for the first time and only

> > calling acpi_ut_delete_internal_obj() in this case will prevent

> > another attempt at deleting the object.

> >

> > Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>

> > ---

> >  drivers/acpi/acpica/utdelete.c | 4 ++--

> >  1 file changed, 2 insertions(+), 2 deletions(-)

> >

> > diff --git a/drivers/acpi/acpica/utdelete.c b/drivers/acpi/acpica/utdelete.c

> > index 4c0d4e434196..c6b860fd9eb5 100644

> > --- a/drivers/acpi/acpica/utdelete.c

> > +++ b/drivers/acpi/acpica/utdelete.c

> > @@ -421,9 +421,9 @@ acpi_ut_update_ref_count(union

> acpi_operand_object *object, u32 action)

> >                                       ACPI_GET_FUNCTION_NAME, object,

> >                                       object->common.type, new_count));

> >

> > -               /* Actually delete the object on a reference count of zero */

> > +               /* If we haven't already, actually delete the object on a reference

> count of zero */

> >

> > -               if (new_count == 0) {

> > +               if (new_count == 0 && original_count != 0) {

> >                         acpi_ut_delete_internal_obj(object);

> >                 }

> >                 message = "Decrement";

> > --

> > 2.17.1

> >
Mark Asselstine Nov. 2, 2020, 6:51 p.m. UTC | #3
On Monday, November 2, 2020 1:11:22 P.M. EST Kaneda, Erik wrote:
> 

> 

> > -----Original Message-----

> > From: Rafael J. Wysocki <rafael@kernel.org>

> > Sent: Thursday, October 29, 2020 7:06 AM

> > To: Mark Asselstine <mark.asselstine@windriver.com>

> > Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Kaneda, Erik

> > <erik.kaneda@intel.com>; Moore, Robert <robert.moore@intel.com>

> > Subject: Re: [PATCH] ACPICA: avoid double free when object already has a

> > zero reference count

> >

> >

> >

> > +Erik and Bob

> >

> >

> >

> > On Thu, Oct 29, 2020 at 3:05 AM Mark Asselstine

> > <mark.asselstine@windriver.com> wrote:

> > 

> > >

> > >

> > > The first trip into acpi_ut_update_ref_count() for an object where

> > > 'object->common.reference_count' is 1 and we are performing a

> > > REF_DECREMENT will result in 'new_count' being 0 and thus the object

> > > is deleted via acpi_ut_delete_internal_obj().

> > >

> > >

> > >

> > > If for some reason we make a subsequent trip into

> > > acpi_ut_update_ref_count() with the same object,

> > > object->common.reference_count' will be 0 and performing a

> > > REF_DECREMENT will produce a warning msg "Reference Count is already

> > > zero, cannot decrement", 'new_count' will again be 0 and the already

> > > deleted object will be attempted to be deleted again via

> > > acpi_ut_delete_internal_obj().

> 

> 

> Mark, Do you have an example of AML/ASL that you used to determine this

> double free?


Unfortunately no. It is a rare occurance and a consequence of several actions 
taking place at the same time during boot, including a PCI rescan. 
Unfortunately due to circumstances I am sure you would rather not have to be 
concerned about, we have so far had to focus our efforts on an older kernel 
which also has the preempt-rt patchset applied. It is for this reason I also 
didn't include the dmesg and eventual kernel BUG_ON in my submission.

It is unclear at this time if the additional locking or other changes that 
have been merged since the kernel version we are on would prevent this from 
occuring on an up to date kernel.

I have reviewed the code in the latest linux kernel and as far as I can tell 
the deficiency is still present. If you do go into acpi_ut_update_ref_count() 
with an object with a reference count of 0 and an action of REF_DECREMENT the 
following may be called:

acpi_ut_update_ref_count
  -> acpi_ut_delete_internal_obj
    -> acpi_ut_delete_object_desc
      -> acpi_os_release_object
        -> kmem_cache_free

I completely understand if you have concerns about the change since I can't 
hand you a reproducer. I was hoping the merrits of the change would stand on 
their own as there is no reason to call acpi_ut_delete_internal_obj() if we 
have already done so, even if the rest of the call chain was well behaved. In 
our case the eventual slab corruption ended up affecting the 'Acpi-Operand' 
dedicated cache. We see the corruption happen shortly after we see the msg 
warning that the reference count is already at zero.

Please let me know if there is anything else I can provide to help out and 
thanks for your time reviewing this change.

Thanks,
Mark

 
> Thanks,

> Erik

> 

> 

> > >

> > >

> > > Since the object deletion doesn't NULL the object the calls to

> > > acpi_ut_delete_internal_obj(), acpi_ut_delete_object_desc(),

> > > acpi_os_release_object(), kmem_cache_free() will operate on the object

> > > as if it hasn't been deleted. In many cases this can result in no

> > > issues, but if you are using the slab and a new object has been

> > > created with the same address this can be the cause slab corruption.

> > >

> > >

> > >

> > > Adding a check if we are decrementing to 0 for the first time and only

> > > calling acpi_ut_delete_internal_obj() in this case will prevent

> > > another attempt at deleting the object.

> > >

> > >

> > >

> > > Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>

> > > ---

> > > 

> > >  drivers/acpi/acpica/utdelete.c | 4 ++--

> > >  1 file changed, 2 insertions(+), 2 deletions(-)

> > >

> > >

> > >

> > > diff --git a/drivers/acpi/acpica/utdelete.c

> > > b/drivers/acpi/acpica/utdelete.c

 index 4c0d4e434196..c6b860fd9eb5
> > > 100644

> > > --- a/drivers/acpi/acpica/utdelete.c

> > > +++ b/drivers/acpi/acpica/utdelete.c

> > > @@ -421,9 +421,9 @@ acpi_ut_update_ref_count(union

> > 

> > acpi_operand_object *object, u32 action)

> > 

> > >                                       ACPI_GET_FUNCTION_NAME, object,

> > >                                       object->common.type, new_count));

> > >

> > >

> > >

> > > -               /* Actually delete the object on a reference count of

> > > zero */

 +               /* If we haven't already, actually delete the
> > > object on a reference> 

> > count of zero */

> > 

> > >

> > >

> > > -               if (new_count == 0) {

> > > +               if (new_count == 0 && original_count != 0) {

> > > 

> > >                         acpi_ut_delete_internal_obj(object);

> > >                 

> > >                 }

> > >                 message = "Decrement";

> > > 

> > > --

> > > 2.17.1

> > >

> > >
Erik Kaneda Nov. 3, 2020, 6:28 p.m. UTC | #4
> -----Original Message-----
> From: Mark Asselstine <mark.asselstine@windriver.com>
> Sent: Monday, November 2, 2020 10:51 AM
> To: Rafael J. Wysocki <rafael@kernel.org>; Kaneda, Erik
> <erik.kaneda@intel.com>
> Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Moore, Robert
> <robert.moore@intel.com>
> Subject: Re: [PATCH] ACPICA: avoid double free when object already has a
> zero reference count
> 
> On Monday, November 2, 2020 1:11:22 P.M. EST Kaneda, Erik wrote:
> >
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: Thursday, October 29, 2020 7:06 AM
> > > To: Mark Asselstine <mark.asselstine@windriver.com>
> > > Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Kaneda, Erik
> > > <erik.kaneda@intel.com>; Moore, Robert <robert.moore@intel.com>
> > > Subject: Re: [PATCH] ACPICA: avoid double free when object already has
> a
> > > zero reference count
> > >
> > >
> > >
> > > +Erik and Bob
> > >
> > >
> > >
> > > On Thu, Oct 29, 2020 at 3:05 AM Mark Asselstine
> > > <mark.asselstine@windriver.com> wrote:
> > >
> > > >
> > > >
> > > > The first trip into acpi_ut_update_ref_count() for an object where
> > > > 'object->common.reference_count' is 1 and we are performing a
> > > > REF_DECREMENT will result in 'new_count' being 0 and thus the object
> > > > is deleted via acpi_ut_delete_internal_obj().
> > > >
> > > >
> > > >
> > > > If for some reason we make a subsequent trip into
> > > > acpi_ut_update_ref_count() with the same object,
> > > > object->common.reference_count' will be 0 and performing a
> > > > REF_DECREMENT will produce a warning msg "Reference Count is
> already
> > > > zero, cannot decrement", 'new_count' will again be 0 and the already
> > > > deleted object will be attempted to be deleted again via
> > > > acpi_ut_delete_internal_obj().
> >
> >
> > Mark, Do you have an example of AML/ASL that you used to determine
> this
> > double free?
> 
> Unfortunately no. It is a rare occurance and a consequence of several actions
> taking place at the same time during boot, including a PCI rescan.
> Unfortunately due to circumstances I am sure you would rather not have to
> be
> concerned about, we have so far had to focus our efforts on an older kernel
> which also has the preempt-rt patchset applied. It is for this reason I also
> didn't include the dmesg and eventual kernel BUG_ON in my submission.
> 
> It is unclear at this time if the additional locking or other changes that
> have been merged since the kernel version we are on would prevent this
> from
> occuring on an up to date kernel.
> 
> I have reviewed the code in the latest linux kernel and as far as I can tell
> the deficiency is still present. If you do go into acpi_ut_update_ref_count()
> with an object with a reference count of 0 and an action of REF_DECREMENT
> the
> following may be called:
> 
> acpi_ut_update_ref_count
>   -> acpi_ut_delete_internal_obj
>     -> acpi_ut_delete_object_desc
>       -> acpi_os_release_object
>         -> kmem_cache_free
> 
> I completely understand if you have concerns about the change since I can't
> hand you a reproducer. I was hoping the merrits of the change would stand
> on
> their own as there is no reason to call acpi_ut_delete_internal_obj() if we
> have already done so, even if the rest of the call chain was well behaved. In
> our case the eventual slab corruption ended up affecting the 'Acpi-Operand'
> dedicated cache. We see the corruption happen shortly after we see the msg
> warning that the reference count is already at zero.
> 
> Please let me know if there is anything else I can provide to help out and
> thanks for your time reviewing this change.

Since this is such a sensitive part of the codebase, I would like to see if it's possible to come up with ASL that reproduces this on our userspace interpreter (acpiexec).

I can try reproduce this from your explanations but it would be helpful to also get an acpidump and dmesg so that we can understand which named objects are causing this issue. This gives us a clue on how to reproduce this in user space.

Could you provide an acpidump of the machine and 2 dmesg logs? I need one normal dmesg that shows this error and a custom kernel built with CONFIG_ACPI_DEBUG=y and booting with the following commandline parameters:

acpi.debug_level=0x80000 acpi.debug_layer=0xffffffff

Thanks,
Erik
> 
> Thanks,
> Mark
> 
> 
> > Thanks,
> > Erik
> >
> >
> > > >
> > > >
> > > > Since the object deletion doesn't NULL the object the calls to
> > > > acpi_ut_delete_internal_obj(), acpi_ut_delete_object_desc(),
> > > > acpi_os_release_object(), kmem_cache_free() will operate on the
> object
> > > > as if it hasn't been deleted. In many cases this can result in no
> > > > issues, but if you are using the slab and a new object has been
> > > > created with the same address this can be the cause slab corruption.
> > > >
> > > >
> > > >
> > > > Adding a check if we are decrementing to 0 for the first time and only
> > > > calling acpi_ut_delete_internal_obj() in this case will prevent
> > > > another attempt at deleting the object.
> > > >
> > > >
> > > >
> > > > Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
> > > > ---
> > > >
> > > >  drivers/acpi/acpica/utdelete.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > >
> > > >
> > > > diff --git a/drivers/acpi/acpica/utdelete.c
> > > > b/drivers/acpi/acpica/utdelete.c
>  index 4c0d4e434196..c6b860fd9eb5
> > > > 100644
> > > > --- a/drivers/acpi/acpica/utdelete.c
> > > > +++ b/drivers/acpi/acpica/utdelete.c
> > > > @@ -421,9 +421,9 @@ acpi_ut_update_ref_count(union
> > >
> > > acpi_operand_object *object, u32 action)
> > >
> > > >                                       ACPI_GET_FUNCTION_NAME, object,
> > > >                                       object->common.type, new_count));
> > > >
> > > >
> > > >
> > > > -               /* Actually delete the object on a reference count of
> > > > zero */
>  +               /* If we haven't already, actually delete the
> > > > object on a reference>
> > > count of zero */
> > >
> > > >
> > > >
> > > > -               if (new_count == 0) {
> > > > +               if (new_count == 0 && original_count != 0) {
> > > >
> > > >                         acpi_ut_delete_internal_obj(object);
> > > >
> > > >                 }
> > > >                 message = "Decrement";
> > > >
> > > > --
> > > > 2.17.1
> > > >
> > > >
> 
> 
>
Mark Asselstine Nov. 3, 2020, 6:52 p.m. UTC | #5
On Tuesday, November 3, 2020 1:28:14 P.M. EST Kaneda, Erik wrote:
> 
> > -----Original Message-----
> > From: Mark Asselstine <mark.asselstine@windriver.com>
> > Sent: Monday, November 2, 2020 10:51 AM
> > To: Rafael J. Wysocki <rafael@kernel.org>; Kaneda, Erik
> > <erik.kaneda@intel.com>
> > Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Moore, Robert
> > <robert.moore@intel.com>
> > Subject: Re: [PATCH] ACPICA: avoid double free when object already has a
> > zero reference count
> > 
> > On Monday, November 2, 2020 1:11:22 P.M. EST Kaneda, Erik wrote:
> > > > -----Original Message-----
> > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > Sent: Thursday, October 29, 2020 7:06 AM
> > > > To: Mark Asselstine <mark.asselstine@windriver.com>
> > > > Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Kaneda, Erik
> > > > <erik.kaneda@intel.com>; Moore, Robert <robert.moore@intel.com>
> > > > Subject: Re: [PATCH] ACPICA: avoid double free when object already has
> > 
> > a
> > 
> > > > zero reference count
> > > > 
> > > > 
> > > > 
> > > > +Erik and Bob
> > > > 
> > > > 
> > > > 
> > > > On Thu, Oct 29, 2020 at 3:05 AM Mark Asselstine
> > > > 
> > > > <mark.asselstine@windriver.com> wrote:
> > > > > The first trip into acpi_ut_update_ref_count() for an object where
> > > > > 'object->common.reference_count' is 1 and we are performing a
> > > > > REF_DECREMENT will result in 'new_count' being 0 and thus the object
> > > > > is deleted via acpi_ut_delete_internal_obj().
> > > > > 
> > > > > 
> > > > > 
> > > > > If for some reason we make a subsequent trip into
> > > > > acpi_ut_update_ref_count() with the same object,
> > > > > object->common.reference_count' will be 0 and performing a
> > > > > REF_DECREMENT will produce a warning msg "Reference Count is
> > 
> > already
> > 
> > > > > zero, cannot decrement", 'new_count' will again be 0 and the already
> > > > > deleted object will be attempted to be deleted again via
> > > > > acpi_ut_delete_internal_obj().
> > > 
> > > Mark, Do you have an example of AML/ASL that you used to determine
> > 
> > this
> > 
> > > double free?
> > 
> > Unfortunately no. It is a rare occurance and a consequence of several
> > actions taking place at the same time during boot, including a PCI
> > rescan. Unfortunately due to circumstances I am sure you would rather not
> > have to be
> > concerned about, we have so far had to focus our efforts on an older
> > kernel
> > which also has the preempt-rt patchset applied. It is for this reason I
> > also didn't include the dmesg and eventual kernel BUG_ON in my
> > submission.
> > 
> > It is unclear at this time if the additional locking or other changes that
> > have been merged since the kernel version we are on would prevent this
> > from
> > occuring on an up to date kernel.
> > 
> > I have reviewed the code in the latest linux kernel and as far as I can
> > tell the deficiency is still present. If you do go into
> > acpi_ut_update_ref_count() with an object with a reference count of 0 and
> > an action of REF_DECREMENT the
> > following may be called:
> > 
> > acpi_ut_update_ref_count
> > 
> >   -> acpi_ut_delete_internal_obj
> >   
> >     -> acpi_ut_delete_object_desc
> >     
> >       -> acpi_os_release_object
> >       
> >         -> kmem_cache_free
> > 
> > I completely understand if you have concerns about the change since I
> > can't
> > hand you a reproducer. I was hoping the merrits of the change would stand
> > on
> > their own as there is no reason to call acpi_ut_delete_internal_obj() if
> > we
> > have already done so, even if the rest of the call chain was well behaved.
> > In our case the eventual slab corruption ended up affecting the
> > 'Acpi-Operand' dedicated cache. We see the corruption happen shortly
> > after we see the msg warning that the reference count is already at zero.
> > 
> > Please let me know if there is anything else I can provide to help out and
> > thanks for your time reviewing this change.
> 
> Since this is such a sensitive part of the codebase, I would like to see if
> it's possible to come up with ASL that reproduces this on our userspace
> interpreter (acpiexec).
> 
> I can try reproduce this from your explanations but it would be helpful to
> also get an acpidump and dmesg so that we can understand which named
> objects are causing this issue. This gives us a clue on how to reproduce
> this in user space.
> 
> Could you provide an acpidump of the machine and 2 dmesg logs? I need one
> normal dmesg that shows this error and a custom kernel built with
> CONFIG_ACPI_DEBUG=y and booting with the following commandline parameters:
> 
> acpi.debug_level=0x80000 acpi.debug_layer=0xffffffff

This should be possible. Give me a day or two to put something together. Again 
my intention, as when dealing with any maintainer, is to be sensitive of your 
time and given the kernel version skew and such I am not sure how well it will 
map to the latest kernel code. But if you can pull out some clues to help with 
this mapping I am not going to impede your offer of assistance.

Thanks,
MarkA

> 
> Thanks,
> Erik
> 
> > Thanks,
> > Mark
> > 
> > > Thanks,
> > > Erik
> > > 
> > > > > Since the object deletion doesn't NULL the object the calls to
> > > > > acpi_ut_delete_internal_obj(), acpi_ut_delete_object_desc(),
> > > > > acpi_os_release_object(), kmem_cache_free() will operate on the
> > 
> > object
> > 
> > > > > as if it hasn't been deleted. In many cases this can result in no
> > > > > issues, but if you are using the slab and a new object has been
> > > > > created with the same address this can be the cause slab corruption.
> > > > > 
> > > > > 
> > > > > 
> > > > > Adding a check if we are decrementing to 0 for the first time and
> > > > > only
> > > > > calling acpi_ut_delete_internal_obj() in this case will prevent
> > > > > another attempt at deleting the object.
> > > > > 
> > > > > 
> > > > > 
> > > > > Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
> > > > > ---
> > > > > 
> > > > >  drivers/acpi/acpica/utdelete.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/acpi/acpica/utdelete.c
> > > > > b/drivers/acpi/acpica/utdelete.c
> >  
> >  index 4c0d4e434196..c6b860fd9eb5
> >  
> > > > > 100644
> > > > > --- a/drivers/acpi/acpica/utdelete.c
> > > > > +++ b/drivers/acpi/acpica/utdelete.c
> > > > > @@ -421,9 +421,9 @@ acpi_ut_update_ref_count(union
> > > > 
> > > > acpi_operand_object *object, u32 action)
> > > > 
> > > > >                                       ACPI_GET_FUNCTION_NAME,
> > > > >                                       object,
> > > > >                                       object->common.type,
> > > > >                                       new_count));
> > > > > 
> > > > > -               /* Actually delete the object on a reference count
> > > > > of
> > > > > zero */
> >  
> >  +               /* If we haven't already, actually delete the
> >  
> > > > > object on a reference>
> > > > 
> > > > count of zero */
> > > > 
> > > > > -               if (new_count == 0) {
> > > > > +               if (new_count == 0 && original_count != 0) {
> > > > > 
> > > > >                         acpi_ut_delete_internal_obj(object);
> > > > >                 
> > > > >                 }
> > > > >                 message = "Decrement";
> > > > > 
> > > > > --
> > > > > 2.17.1
Mark Asselstine Nov. 19, 2020, 10:30 p.m. UTC | #6
On Tuesday, November 3, 2020 1:52:28 P.M. EST Mark Asselstine wrote:
> On Tuesday, November 3, 2020 1:28:14 P.M. EST Kaneda, Erik wrote:

> > > -----Original Message-----

> > > From: Mark Asselstine <mark.asselstine@windriver.com>

> > > Sent: Monday, November 2, 2020 10:51 AM

> > > To: Rafael J. Wysocki <rafael@kernel.org>; Kaneda, Erik

> > > <erik.kaneda@intel.com>

> > > Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Moore, Robert

> > > <robert.moore@intel.com>

> > > Subject: Re: [PATCH] ACPICA: avoid double free when object already has a

> > > zero reference count

> > > 

> > > On Monday, November 2, 2020 1:11:22 P.M. EST Kaneda, Erik wrote:

> > > > > -----Original Message-----

> > > > > From: Rafael J. Wysocki <rafael@kernel.org>

> > > > > Sent: Thursday, October 29, 2020 7:06 AM

> > > > > To: Mark Asselstine <mark.asselstine@windriver.com>

> > > > > Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Kaneda,

> > > > > Erik

> > > > > <erik.kaneda@intel.com>; Moore, Robert <robert.moore@intel.com>

> > > > > Subject: Re: [PATCH] ACPICA: avoid double free when object already

> > > > > has

> > > 

> > > a

> > > 

> > > > > zero reference count

> > > > > 

> > > > > 

> > > > > 

> > > > > +Erik and Bob

> > > > > 

> > > > > 

> > > > > 

> > > > > On Thu, Oct 29, 2020 at 3:05 AM Mark Asselstine

> > > > > 

> > > > > <mark.asselstine@windriver.com> wrote:

> > > > > > The first trip into acpi_ut_update_ref_count() for an object where

> > > > > > 'object->common.reference_count' is 1 and we are performing a

> > > > > > REF_DECREMENT will result in 'new_count' being 0 and thus the

> > > > > > object

> > > > > > is deleted via acpi_ut_delete_internal_obj().

> > > > > > 

> > > > > > 

> > > > > > 

> > > > > > If for some reason we make a subsequent trip into

> > > > > > acpi_ut_update_ref_count() with the same object,

> > > > > > object->common.reference_count' will be 0 and performing a

> > > > > > REF_DECREMENT will produce a warning msg "Reference Count is

> > > 

> > > already

> > > 

> > > > > > zero, cannot decrement", 'new_count' will again be 0 and the

> > > > > > already

> > > > > > deleted object will be attempted to be deleted again via

> > > > > > acpi_ut_delete_internal_obj().

> > > > 

> > > > Mark, Do you have an example of AML/ASL that you used to determine

> > > 

> > > this

> > > 

> > > > double free?

> > > 

> > > Unfortunately no. It is a rare occurance and a consequence of several

> > > actions taking place at the same time during boot, including a PCI

> > > rescan. Unfortunately due to circumstances I am sure you would rather

> > > not

> > > have to be

> > > concerned about, we have so far had to focus our efforts on an older

> > > kernel

> > > which also has the preempt-rt patchset applied. It is for this reason I

> > > also didn't include the dmesg and eventual kernel BUG_ON in my

> > > submission.

> > > 

> > > It is unclear at this time if the additional locking or other changes

> > > that

> > > have been merged since the kernel version we are on would prevent this

> > > from

> > > occuring on an up to date kernel.

> > > 

> > > I have reviewed the code in the latest linux kernel and as far as I can

> > > tell the deficiency is still present. If you do go into

> > > acpi_ut_update_ref_count() with an object with a reference count of 0

> > > and

> > > an action of REF_DECREMENT the

> > > following may be called:

> > > 

> > > acpi_ut_update_ref_count

> > > 

> > >   -> acpi_ut_delete_internal_obj

> > >   

> > >     -> acpi_ut_delete_object_desc

> > >     

> > >       -> acpi_os_release_object

> > >       

> > >         -> kmem_cache_free

> > > 

> > > I completely understand if you have concerns about the change since I

> > > can't

> > > hand you a reproducer. I was hoping the merrits of the change would

> > > stand

> > > on

> > > their own as there is no reason to call acpi_ut_delete_internal_obj() if

> > > we

> > > have already done so, even if the rest of the call chain was well

> > > behaved.

> > > In our case the eventual slab corruption ended up affecting the

> > > 'Acpi-Operand' dedicated cache. We see the corruption happen shortly

> > > after we see the msg warning that the reference count is already at

> > > zero.

> > > 

> > > Please let me know if there is anything else I can provide to help out

> > > and

> > > thanks for your time reviewing this change.

> > 

> > Since this is such a sensitive part of the codebase, I would like to see

> > if

> > it's possible to come up with ASL that reproduces this on our userspace

> > interpreter (acpiexec).

> > 

> > I can try reproduce this from your explanations but it would be helpful to

> > also get an acpidump and dmesg so that we can understand which named

> > objects are causing this issue. This gives us a clue on how to reproduce

> > this in user space.

> > 

> > Could you provide an acpidump of the machine and 2 dmesg logs? I need one

> > normal dmesg that shows this error and a custom kernel built with

> > CONFIG_ACPI_DEBUG=y and booting with the following commandline parameters:

> > 

> > acpi.debug_level=0x80000 acpi.debug_layer=0xffffffff

> 

> This should be possible. Give me a day or two to put something together.

> Again my intention, as when dealing with any maintainer, is to be sensitive

> of your time and given the kernel version skew and such I am not sure how

> well it will map to the latest kernel code. But if you can pull out some

> clues to help with this mapping I am not going to impede your offer of

> assistance.


I have supplied what I could directly to the Intel folks. Unfortunately, as I 
had stated previously our reproduction case uses an older kernel so can't 
easily map to the current state of code in mainline.

I go back to my original assertion that the same condition which is being 
caught by the warning should be used to prevent a second attempt at deleting 
an already deleted object. Attempting a second deletion has been observed to 
be troublesome as the underlying memory could now be in used by a valid object 
and slab corruption is highly likely.

I have also supplied a github pull request to the acpica repo as this issue 
could also manifest itself negatively in other OSes.

https://github.com/acpica/acpica/pull/652

Mark

> 

> Thanks,

> MarkA

> 

> > Thanks,

> > Erik

> > 

> > > Thanks,

> > > Mark

> > > 

> > > > Thanks,

> > > > Erik

> > > > 

> > > > > > Since the object deletion doesn't NULL the object the calls to

> > > > > > acpi_ut_delete_internal_obj(), acpi_ut_delete_object_desc(),

> > > > > > acpi_os_release_object(), kmem_cache_free() will operate on the

> > > 

> > > object

> > > 

> > > > > > as if it hasn't been deleted. In many cases this can result in no

> > > > > > issues, but if you are using the slab and a new object has been

> > > > > > created with the same address this can be the cause slab

> > > > > > corruption.

> > > > > > 

> > > > > > 

> > > > > > 

> > > > > > Adding a check if we are decrementing to 0 for the first time and

> > > > > > only

> > > > > > calling acpi_ut_delete_internal_obj() in this case will prevent

> > > > > > another attempt at deleting the object.

> > > > > > 

> > > > > > 

> > > > > > 

> > > > > > Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>

> > > > > > ---

> > > > > > 

> > > > > >  drivers/acpi/acpica/utdelete.c | 4 ++--

> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)

> > > > > > 

> > > > > > diff --git a/drivers/acpi/acpica/utdelete.c

> > > > > > b/drivers/acpi/acpica/utdelete.c

> > >  

> > >  index 4c0d4e434196..c6b860fd9eb5

> > >  

> > > > > > 100644

> > > > > > --- a/drivers/acpi/acpica/utdelete.c

> > > > > > +++ b/drivers/acpi/acpica/utdelete.c

> > > > > > @@ -421,9 +421,9 @@ acpi_ut_update_ref_count(union

> > > > > 

> > > > > acpi_operand_object *object, u32 action)

> > > > > 

> > > > > >                                       ACPI_GET_FUNCTION_NAME,

> > > > > >                                       object,

> > > > > >                                       object->common.type,

> > > > > >                                       new_count));

> > > > > > 

> > > > > > -               /* Actually delete the object on a reference count

> > > > > > of

> > > > > > zero */

> > >  

> > >  +               /* If we haven't already, actually delete the

> > >  

> > > > > > object on a reference>

> > > > > 

> > > > > count of zero */

> > > > > 

> > > > > > -               if (new_count == 0) {

> > > > > > +               if (new_count == 0 && original_count != 0) {

> > > > > > 

> > > > > >                         acpi_ut_delete_internal_obj(object);

> > > > > >                 

> > > > > >                 }

> > > > > >                 message = "Decrement";

> > > > > > 

> > > > > > --

> > > > > > 2.17.1
diff mbox series

Patch

diff --git a/drivers/acpi/acpica/utdelete.c b/drivers/acpi/acpica/utdelete.c
index 4c0d4e434196..c6b860fd9eb5 100644
--- a/drivers/acpi/acpica/utdelete.c
+++ b/drivers/acpi/acpica/utdelete.c
@@ -421,9 +421,9 @@  acpi_ut_update_ref_count(union acpi_operand_object *object, u32 action)
 				      ACPI_GET_FUNCTION_NAME, object,
 				      object->common.type, new_count));
 
-		/* Actually delete the object on a reference count of zero */
+		/* If we haven't already, actually delete the object on a reference count of zero */
 
-		if (new_count == 0) {
+		if (new_count == 0 && original_count != 0) {
 			acpi_ut_delete_internal_obj(object);
 		}
 		message = "Decrement";