[v2] s390/vfio-ap: fix memory leak in mdev remove callback

Message ID 20210510214837.359717-1-akrowiak@linux.ibm.com
State Superseded

Commit Message

Anthony Krowiak May 10, 2021, 9:48 p.m. UTC
The mdev remove callback for the vfio_ap device driver bails out with
-EBUSY if the mdev is in use by a KVM guest. The intended purpose was
to prevent the mdev from being removed while in use; however, returning a
non-zero rc does not prevent removal. This could result in a memory leak
of the resources allocated when the mdev was created. In addition, the
KVM guest will still have access to the AP devices assigned to the mdev
even though the mdev no longer exists.

To prevent this scenario, cleanup will be done - including unplugging the
AP adapters, domains and control domains - regardless of whether the mdev
is in use by a KVM guest or not.

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)
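
The behaviour described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the model_* names, the live_mdevs counter and the has_ap_access flag are hypothetical stand-ins and are not the driver's data structures. It assumes, as the commit message states, that the caller tears the device down even when the remove callback returns non-zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the guest and the mediated device. */
struct model_guest { bool has_ap_access; };
struct model_mdev { struct model_guest *guest; };

/* Counts live allocations so that a leak becomes visible. */
static int live_mdevs;

static struct model_mdev *model_create(struct model_guest *guest)
{
	struct model_mdev *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->guest = guest;
	if (guest)
		guest->has_ap_access = true;
	live_mdevs++;
	return m;
}

/*
 * Old pattern: refuse while a guest is attached. Nothing is freed and
 * the guest keeps its access, yet the caller removes the device anyway.
 */
static int model_remove_old(struct model_mdev *m)
{
	if (m->guest)
		return -EBUSY;
	free(m);
	live_mdevs--;
	return 0;
}

/* New pattern: always detach the guest, then free the allocation. */
static int model_remove_new(struct model_mdev *m)
{
	if (m->guest) {
		m->guest->has_ap_access = false;
		m->guest = NULL;
	}
	free(m);
	live_mdevs--;
	assert(live_mdevs >= 0);
	return 0;
}
```

With a guest attached, model_remove_old() leaves both the allocation and the guest's access in place, while model_remove_new() releases both regardless of whether a guest is attached.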

Comments

Anthony Krowiak May 10, 2021, 9:56 p.m. UTC | #1
On 5/10/21 5:48 PM, Tony Krowiak wrote:
> The mdev remove callback for the vfio_ap device driver bails out with
> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
> to prevent the mdev from being removed while in use; however, returning a
> non-zero rc does not prevent removal. This could result in a memory leak
> of the resources allocated when the mdev was created. In addition, the
> KVM guest will still have access to the AP devices assigned to the mdev
> even though the mdev no longer exists.
>
> To prevent this scenario, cleanup will be done - including unplugging the
> AP adapters, domains and control domains - regardless of whether the mdev
> is in use by a KVM guest or not.
>
> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> Cc: stable@vger.kernel.org
> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>

The Signed-off-by was erroneously put in by the git send-email
command. Please take this out of your reply-all responses. Thanks.

> ---
>   drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
>   1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index b2c7e10dfdcd..f90c9103dac2 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,6 +26,7 @@
>   
>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>   static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);
>   
>   static int match_apqn(struct device *dev, const void *data)
>   {
> @@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>   
>   	mutex_lock(&matrix_dev->lock);
> -
> -	/*
> -	 * If the KVM pointer is in flux or the guest is running, disallow
> -	 * un-assignment of control domain.
> -	 */
> -	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> -		mutex_unlock(&matrix_dev->lock);
> -		return -EBUSY;
> -	}
> -
> -	vfio_ap_mdev_reset_queues(mdev);
> +	vfio_ap_mdev_unset_kvm(matrix_mdev);
>   	list_del(&matrix_mdev->node);
>   	kfree(matrix_mdev);
>   	mdev_set_drvdata(mdev, NULL);
Cornelia Huck May 12, 2021, 10:35 a.m. UTC | #2
On Mon, 10 May 2021 17:48:37 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The mdev remove callback for the vfio_ap device driver bails out with
> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
> to prevent the mdev from being removed while in use; however, returning a
> non-zero rc does not prevent removal. This could result in a memory leak
> of the resources allocated when the mdev was created. In addition, the
> KVM guest will still have access to the AP devices assigned to the mdev
> even though the mdev no longer exists.
>
> To prevent this scenario, cleanup will be done - including unplugging the
> AP adapters, domains and control domains - regardless of whether the mdev
> is in use by a KVM guest or not.
>
> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> Cc: stable@vger.kernel.org
> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)

With the S-o-b fixed,

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Jason Gunthorpe May 12, 2021, 12:41 p.m. UTC | #3
On Mon, May 10, 2021 at 05:48:37PM -0400, Tony Krowiak wrote:
> The mdev remove callback for the vfio_ap device driver bails out with
> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
> to prevent the mdev from being removed while in use; however, returning a
> non-zero rc does not prevent removal. This could result in a memory leak
> of the resources allocated when the mdev was created. In addition, the
> KVM guest will still have access to the AP devices assigned to the mdev
> even though the mdev no longer exists.
>
> To prevent this scenario, cleanup will be done - including unplugging the
> AP adapters, domains and control domains - regardless of whether the mdev
> is in use by a KVM guest or not.
>
> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> Cc: stable@vger.kernel.org
> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)

Can you please ensure this goes to a -rc branch or through Alex's
tree?

Thanks,
Jason
Christian Borntraeger May 12, 2021, 3:32 p.m. UTC | #4
On 12.05.21 14:41, Jason Gunthorpe wrote:
> On Mon, May 10, 2021 at 05:48:37PM -0400, Tony Krowiak wrote:
>> The mdev remove callback for the vfio_ap device driver bails out with
>> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
>> to prevent the mdev from being removed while in use; however, returning a
>> non-zero rc does not prevent removal. This could result in a memory leak
>> of the resources allocated when the mdev was created. In addition, the
>> KVM guest will still have access to the AP devices assigned to the mdev
>> even though the mdev no longer exists.
>>
>> To prevent this scenario, cleanup will be done - including unplugging the
>> AP adapters, domains and control domains - regardless of whether the mdev
>> is in use by a KVM guest or not.
>>
>> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
>>   1 file changed, 2 insertions(+), 11 deletions(-)
>
> Can you please ensure this goes to a -rc branch or through Alex's
> tree?

So you want this in 5.13-rc?
I can apply this to the s390 tree if that is ok.
Christian Borntraeger May 12, 2021, 4:49 p.m. UTC | #5
On 10.05.21 23:48, Tony Krowiak wrote:
> The mdev remove callback for the vfio_ap device driver bails out with
> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
> to prevent the mdev from being removed while in use; however, returning a
> non-zero rc does not prevent removal. This could result in a memory leak
> of the resources allocated when the mdev was created. In addition, the
> KVM guest will still have access to the AP devices assigned to the mdev
> even though the mdev no longer exists.
>
> To prevent this scenario, cleanup will be done - including unplugging the
> AP adapters, domains and control domains - regardless of whether the mdev
> is in use by a KVM guest or not.
>
> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> Cc: stable@vger.kernel.org
> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>

Applied to the internal s390 tree. Let's give it some days of testing coverage
from CI and others.

Vasily, can you add this for the s390/fixes branch early next week if nothing
goes wrong?

> ---
>   drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
>   1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index b2c7e10dfdcd..f90c9103dac2 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,6 +26,7 @@
>
>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>   static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);
>
>   static int match_apqn(struct device *dev, const void *data)
>   {
> @@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
>   	mutex_lock(&matrix_dev->lock);
> -
> -	/*
> -	 * If the KVM pointer is in flux or the guest is running, disallow
> -	 * un-assignment of control domain.
> -	 */
> -	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> -		mutex_unlock(&matrix_dev->lock);
> -		return -EBUSY;
> -	}
> -
> -	vfio_ap_mdev_reset_queues(mdev);
> +	vfio_ap_mdev_unset_kvm(matrix_mdev);
>   	list_del(&matrix_mdev->node);
>   	kfree(matrix_mdev);
>   	mdev_set_drvdata(mdev, NULL);
>
Jason Gunthorpe May 12, 2021, 4:50 p.m. UTC | #6
On Wed, May 12, 2021 at 05:32:52PM +0200, Christian Borntraeger wrote:
>
>
> On 12.05.21 14:41, Jason Gunthorpe wrote:
> > On Mon, May 10, 2021 at 05:48:37PM -0400, Tony Krowiak wrote:
> > > The mdev remove callback for the vfio_ap device driver bails out with
> > > -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
> > > to prevent the mdev from being removed while in use; however, returning a
> > > non-zero rc does not prevent removal. This could result in a memory leak
> > > of the resources allocated when the mdev was created. In addition, the
> > > KVM guest will still have access to the AP devices assigned to the mdev
> > > even though the mdev no longer exists.
> > >
> > > To prevent this scenario, cleanup will be done - including unplugging the
> > > AP adapters, domains and control domains - regardless of whether the mdev
> > > is in use by a KVM guest or not.
> > >
> > > Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
> > >   drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
> > >   1 file changed, 2 insertions(+), 11 deletions(-)
> >
> > Can you please ensure this goes to a -rc branch or through Alex's
> > tree?
>
> So you want this in 5.13-rc?
> I can apply this to the s390 tree if that is ok.

Yes please

Jason
Halil Pasic May 12, 2021, 6:35 p.m. UTC | #7
On Mon, 10 May 2021 17:48:37 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The mdev remove callback for the vfio_ap device driver bails out with
> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
> to prevent the mdev from being removed while in use; however, returning a
> non-zero rc does not prevent removal. This could result in a memory leak
> of the resources allocated when the mdev was created. In addition, the
> KVM guest will still have access to the AP devices assigned to the mdev
> even though the mdev no longer exists.
>
> To prevent this scenario, cleanup will be done - including unplugging the
> AP adapters, domains and control domains - regardless of whether the mdev
> is in use by a KVM guest or not.
>
> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> Cc: stable@vger.kernel.org
> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index b2c7e10dfdcd..f90c9103dac2 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,6 +26,7 @@
>
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);
>
>  static int match_apqn(struct device *dev, const void *data)
>  {
> @@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>
>  	mutex_lock(&matrix_dev->lock);
> -
> -	/*
> -	 * If the KVM pointer is in flux or the guest is running, disallow
> -	 * un-assignment of control domain.
> -	 */
> -	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> -		mutex_unlock(&matrix_dev->lock);
> -		return -EBUSY;
> -	}
> -
> -	vfio_ap_mdev_reset_queues(mdev);
> +	vfio_ap_mdev_unset_kvm(matrix_mdev);
>  	list_del(&matrix_mdev->node);
>  	kfree(matrix_mdev);

Are we at risk of handle_pqap() in arch/s390/kvm/priv.c using an
already freed pqap_hook (which is a member of the matrix_mdev pointee
that is freed just above my comment)?

I'm aware of the fact that vfio_ap_mdev_unset_kvm() does a
matrix_mdev->kvm->arch.crypto.pqap_hook = NULL but that is
AFAICT not done under any lock relevant for handle_pqap(). I guess
the idea is that the check cited below

static int handle_pqap(struct kvm_vcpu *vcpu)
[..]
        /*
         * Verify that the hook callback is registered, lock the owner
         * and call the hook.
         */
        if (vcpu->kvm->arch.crypto.pqap_hook) {
                if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
                        return -EOPNOTSUPP;
                ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
                module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
                if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
                        kvm_s390_set_psw_cc(vcpu, 3);
                return ret;
        }

is going to catch it, but I'm not sure it is guaranteed to catch it.
Opinions?

Regards,
Halil


>  	mdev_set_drvdata(mdev, NULL);
Anthony Krowiak May 13, 2021, 2:18 p.m. UTC | #8
On 5/12/21 8:41 AM, Jason Gunthorpe wrote:
> On Mon, May 10, 2021 at 05:48:37PM -0400, Tony Krowiak wrote:
>> The mdev remove callback for the vfio_ap device driver bails out with
>> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
>> to prevent the mdev from being removed while in use; however, returning a
>> non-zero rc does not prevent removal. This could result in a memory leak
>> of the resources allocated when the mdev was created. In addition, the
>> KVM guest will still have access to the AP devices assigned to the mdev
>> even though the mdev no longer exists.
>>
>> To prevent this scenario, cleanup will be done - including unplugging the
>> AP adapters, domains and control domains - regardless of whether the mdev
>> is in use by a KVM guest or not.
>>
>> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
>>   1 file changed, 2 insertions(+), 11 deletions(-)
> Can you please ensure this goes to a -rc branch or through Alex's
> tree?

I'm sorry, I don't know what a -rc branch is nor how to push this
to Alex's tree, but I'd be happy to do so if you tell me how :)

>
> Thanks,
> Jason
Anthony Krowiak May 13, 2021, 2:19 p.m. UTC | #9
On 5/12/21 11:32 AM, Christian Borntraeger wrote:
>
>
> On 12.05.21 14:41, Jason Gunthorpe wrote:
>> On Mon, May 10, 2021 at 05:48:37PM -0400, Tony Krowiak wrote:
>>> The mdev remove callback for the vfio_ap device driver bails out with
>>> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
>>> to prevent the mdev from being removed while in use; however, returning a
>>> non-zero rc does not prevent removal. This could result in a memory leak
>>> of the resources allocated when the mdev was created. In addition, the
>>> KVM guest will still have access to the AP devices assigned to the mdev
>>> even though the mdev no longer exists.
>>>
>>> To prevent this scenario, cleanup will be done - including unplugging the
>>> AP adapters, domains and control domains - regardless of whether the mdev
>>> is in use by a KVM guest or not.
>>>
>>> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
>>> ---
>>>   drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
>>>   1 file changed, 2 insertions(+), 11 deletions(-)
>>
>> Can you please ensure this goes to a -rc branch or through Alex's
>> tree?
>
> So you want this in 5.13-rc?
> I can apply this to the s390 tree if that is ok.

If it is in time for 5.13-rc, then yes, go ahead and
apply it.
Anthony Krowiak May 13, 2021, 2:35 p.m. UTC | #10
On 5/12/21 2:35 PM, Halil Pasic wrote:
> On Mon, 10 May 2021 17:48:37 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The mdev remove callback for the vfio_ap device driver bails out with
>> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
>> to prevent the mdev from being removed while in use; however, returning a
>> non-zero rc does not prevent removal. This could result in a memory leak
>> of the resources allocated when the mdev was created. In addition, the
>> KVM guest will still have access to the AP devices assigned to the mdev
>> even though the mdev no longer exists.
>>
>> To prevent this scenario, cleanup will be done - including unplugging the
>> AP adapters, domains and control domains - regardless of whether the mdev
>> is in use by a KVM guest or not.
>>
>> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
>>   1 file changed, 2 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index b2c7e10dfdcd..f90c9103dac2 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -26,6 +26,7 @@
>>
>>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>   static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);
>>
>>   static int match_apqn(struct device *dev, const void *data)
>>   {
>> @@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>
>>   	mutex_lock(&matrix_dev->lock);
>> -
>> -	/*
>> -	 * If the KVM pointer is in flux or the guest is running, disallow
>> -	 * un-assignment of control domain.
>> -	 */
>> -	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
>> -		mutex_unlock(&matrix_dev->lock);
>> -		return -EBUSY;
>> -	}
>> -
>> -	vfio_ap_mdev_reset_queues(mdev);
>> +	vfio_ap_mdev_unset_kvm(matrix_mdev);
>>   	list_del(&matrix_mdev->node);
>>   	kfree(matrix_mdev);
> Are we at risk of handle_pqap() in arch/s390/kvm/priv.c using an
> already freed pqap_hook (which is a member of the matrix_mdev pointee
> that is freed just above my comment)?
>
> I'm aware of the fact that vfio_ap_mdev_unset_kvm() does a
> matrix_mdev->kvm->arch.crypto.pqap_hook = NULL but that is
> AFAICT not done under any lock relevant for handle_pqap(). I guess
> the idea is that the check cited below
>
> static int handle_pqap(struct kvm_vcpu *vcpu)
> [..]
>          /*
>           * Verify that the hook callback is registered, lock the owner
>           * and call the hook.
>           */
>          if (vcpu->kvm->arch.crypto.pqap_hook) {
>                  if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
>                          return -EOPNOTSUPP;
>                  ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>                  module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>                  if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
>                          kvm_s390_set_psw_cc(vcpu, 3);
>                  return ret;
>          }
>
> is going to catch it, but I'm not sure it is guaranteed to catch it.
> Opinions?

The hook itself - handle_pqap() function in vfio_ap_ops.c - also checks
to see if the reference to the hook is set and terminates with an error
if it is not. If the hook is invoked subsequent to the remove callback above,
all should be fine since the check is also done under the matrix_dev->lock.

>
> Regards,
> Halil
>
>
>>   	mdev_set_drvdata(mdev, NULL);
Jason Gunthorpe May 13, 2021, 5:25 p.m. UTC | #11
On Thu, May 13, 2021 at 10:18:44AM -0400, Tony Krowiak wrote:
>
>
> On 5/12/21 8:41 AM, Jason Gunthorpe wrote:
> > On Mon, May 10, 2021 at 05:48:37PM -0400, Tony Krowiak wrote:
> > > The mdev remove callback for the vfio_ap device driver bails out with
> > > -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
> > > to prevent the mdev from being removed while in use; however, returning a
> > > non-zero rc does not prevent removal. This could result in a memory leak
> > > of the resources allocated when the mdev was created. In addition, the
> > > KVM guest will still have access to the AP devices assigned to the mdev
> > > even though the mdev no longer exists.
> > >
> > > To prevent this scenario, cleanup will be done - including unplugging the
> > > AP adapters, domains and control domains - regardless of whether the mdev
> > > is in use by a KVM guest or not.
> > >
> > > Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
> > >   drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
> > >   1 file changed, 2 insertions(+), 11 deletions(-)
> > Can you please ensure this goes to a -rc branch or through Alex's
> > tree?
>
> I'm sorry, I don't know what a -rc branch is nor how to push this
> to Alex's tree, but I'd be happy to do so if you tell me how :)

If Christian takes it for 5.13-rc then that is OK

Otherwise please ask AlexW to take it

Thanks,
Jason
Halil Pasic May 13, 2021, 5:32 p.m. UTC | #12
On Thu, 13 May 2021 14:25:09 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> > I'm sorry, I don't know what a -rc branch is nor how to push this
> > to Alex's tree, but I'd be happy to do so if you tell me how :)
>
> If Christian takes it for 5.13-rc then that is OK

I'm pretty confident that Christian is handling this. IMHO there is
no need to bother Alex with it.

Regards,
Halil
Jason Gunthorpe May 13, 2021, 5:34 p.m. UTC | #13
On Thu, May 13, 2021 at 07:32:03PM +0200, Halil Pasic wrote:
> On Thu, 13 May 2021 14:25:09 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > > I'm sorry, I don't know what a -rc branch is nor how to push this
> > > to Alex's tree, but I'd be happy to do so if you tell me how :)
> >
> > If Christian takes it for 5.13-rc then that is OK
>
> I'm pretty confident that Christian is handling this. IMHO there is
> no need to bother Alex with it.

We need to ensure there are not a bunch of conflicts with the VFIO
tree that will have another batch of work on this driver for the next
merge window.
 
Jason
Halil Pasic May 13, 2021, 5:45 p.m. UTC | #14
On Thu, 13 May 2021 10:35:05 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 5/12/21 2:35 PM, Halil Pasic wrote:
> > On Mon, 10 May 2021 17:48:37 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >
> >> The mdev remove callback for the vfio_ap device driver bails out with
> >> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was
> >> to prevent the mdev from being removed while in use; however, returning a
> >> non-zero rc does not prevent removal. This could result in a memory leak
> >> of the resources allocated when the mdev was created. In addition, the
> >> KVM guest will still have access to the AP devices assigned to the mdev
> >> even though the mdev no longer exists.
> >>
> >> To prevent this scenario, cleanup will be done - including unplugging the
> >> AP adapters, domains and control domains - regardless of whether the mdev
> >> is in use by a KVM guest or not.
> >>
> >> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>
> >> ---
> >>   drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
> >>   1 file changed, 2 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index b2c7e10dfdcd..f90c9103dac2 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -26,6 +26,7 @@
> >>
> >>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> >>   static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> >> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);
> >>
> >>   static int match_apqn(struct device *dev, const void *data)
> >>   {
> >> @@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
> >>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >>
> >>   	mutex_lock(&matrix_dev->lock);
> >> -
> >> -	/*
> >> -	 * If the KVM pointer is in flux or the guest is running, disallow
> >> -	 * un-assignment of control domain.
> >> -	 */
> >> -	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
> >> -		mutex_unlock(&matrix_dev->lock);
> >> -		return -EBUSY;
> >> -	}
> >> -
> >> -	vfio_ap_mdev_reset_queues(mdev);
> >> +	vfio_ap_mdev_unset_kvm(matrix_mdev);
> >>   	list_del(&matrix_mdev->node);
> >>   	kfree(matrix_mdev);
> > Are we at risk of handle_pqap() in arch/s390/kvm/priv.c using an
> > already freed pqap_hook (which is a member of the matrix_mdev pointee
> > that is freed just above my comment)?
> >
> > I'm aware of the fact that vfio_ap_mdev_unset_kvm() does a
> > matrix_mdev->kvm->arch.crypto.pqap_hook = NULL but that is
> > AFAICT not done under any lock relevant for handle_pqap(). I guess
> > the idea is that the check cited below
> >
> > static int handle_pqap(struct kvm_vcpu *vcpu)
> > [..]
> >          /*
> >           * Verify that the hook callback is registered, lock the owner
> >           * and call the hook.
> >           */
> >          if (vcpu->kvm->arch.crypto.pqap_hook) {
> >                  if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> >                          return -EOPNOTSUPP;
> >                  ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> >                  module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
> >                  if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
> >                          kvm_s390_set_psw_cc(vcpu, 3);
> >                  return ret;
> >          }
> >
> > is going to catch it, but I'm not sure it is guaranteed to catch it.
> > Opinions?
>
> The hook itself - handle_pqap() function in vfio_ap_ops.c - also checks
> to see if the reference to the hook is set and terminates with an error
> if it is not. If the hook is invoked subsequent to the remove callback above,
> all should be fine since the check is also done under the matrix_dev->lock.
>

I don't quite understand your logic. Let us assume matrix_mdev was freed,
but vcpu->kvm->arch.crypto.pqap_hook still points to what used to be
(*matrix_mdev).pqap_hook. In that case the function pointer
vcpu->kvm->arch.crypto.pqap_hook->hook is used after it was freed, and
may not point to the handle_pqap() function in vfio_ap_ops.c, thus the
check you are referring to ain't necessarily relevant. That is,
if you mean the check in the handle_pqap() function in vfio_ap_ops.c; if
you mean the check in handle_pqap() in arch/s390/kvm/priv.c, that one is
not done under the matrix_dev->lock. Or do I have a hole somewhere in my
reasoning?

Regards,
Halil
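
The concern raised above can be sketched in userspace. The following is only an illustrative model under stated assumptions, not the kernel code and not a proposed patch: the model_* names and the hook_lock mutex are hypothetical. It shows one way to close such a window, namely having the reader dereference the hook pointer only while holding the same lock the remover takes before clearing the pointer and freeing the owner.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the hook lives inside the owning object. */
struct model_hook { int (*hook)(int arg); };
struct model_mdev { struct model_hook pqap_hook; };
struct model_kvm {
	pthread_mutex_t hook_lock;	/* assumed lock shared by both sides */
	struct model_hook *pqap_hook;	/* points into a model_mdev, or NULL */
};

static int model_handler(int arg)
{
	return arg + 1;
}

/* Allocate the owner and publish its hook under the lock. */
static struct model_mdev *model_attach(struct model_kvm *kvm)
{
	struct model_mdev *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->pqap_hook.hook = model_handler;
	pthread_mutex_lock(&kvm->hook_lock);
	kvm->pqap_hook = &m->pqap_hook;
	pthread_mutex_unlock(&kvm->hook_lock);
	return m;
}

/* Unpublish the hook under the lock, and only then free its owner. */
static void model_remove(struct model_kvm *kvm, struct model_mdev *m)
{
	pthread_mutex_lock(&kvm->hook_lock);
	if (kvm->pqap_hook == &m->pqap_hook)
		kvm->pqap_hook = NULL;
	pthread_mutex_unlock(&kvm->hook_lock);
	free(m);
}

/* Reader: check and call the hook while holding the lock. */
static int model_intercept(struct model_kvm *kvm, int arg)
{
	int ret = -1;	/* returned when no hook is registered */

	pthread_mutex_lock(&kvm->hook_lock);
	if (kvm->pqap_hook)
		ret = kvm->pqap_hook->hook(arg);
	pthread_mutex_unlock(&kvm->hook_lock);
	return ret;
}
```

Without a lock shared by both sides, the reader could pass the NULL check, the remover could then clear the pointer and free the owner, and the reader would call through freed memory. Holding the lock across both the check and the call rules out that interleaving in this model.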
Anthony Krowiak May 13, 2021, 7:23 p.m. UTC | #15
On 5/13/21 1:45 PM, Halil Pasic wrote:
> On Thu, 13 May 2021 10:35:05 -0400

> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>

>> On 5/12/21 2:35 PM, Halil Pasic wrote:

>>> On Mon, 10 May 2021 17:48:37 -0400

>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>>>   

>>>> The mdev remove callback for the vfio_ap device driver bails out with

>>>> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was

>>>> to prevent the mdev from being removed while in use; however, returning a

>>>> non-zero rc does not prevent removal. This could result in a memory leak

>>>> of the resources allocated when the mdev was created. In addition, the

>>>> KVM guest will still have access to the AP devices assigned to the mdev

>>>> even though the mdev no longer exists.

>>>>

>>>> To prevent this scenario, cleanup will be done - including unplugging the

>>>> AP adapters, domains and control domains - regardless of whether the mdev

>>>> is in use by a KVM guest or not.

>>>>

>>>> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")

>>>> Cc: stable@vger.kernel.org

>>>> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>

>>>> ---

>>>>    drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------

>>>>    1 file changed, 2 insertions(+), 11 deletions(-)

>>>>

>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c

>>>> index b2c7e10dfdcd..f90c9103dac2 100644

>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c

>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c

>>>> @@ -26,6 +26,7 @@

>>>>

>>>>    static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);

>>>>    static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);

>>>> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);

>>>>

>>>>    static int match_apqn(struct device *dev, const void *data)

>>>>    {

>>>> @@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)

>>>>    	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

>>>>

>>>>    	mutex_lock(&matrix_dev->lock);

>>>> -

>>>> -	/*

>>>> -	 * If the KVM pointer is in flux or the guest is running, disallow

>>>> -	 * un-assignment of control domain.

>>>> -	 */

>>>> -	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {

>>>> -		mutex_unlock(&matrix_dev->lock);

>>>> -		return -EBUSY;

>>>> -	}

>>>> -

>>>> -	vfio_ap_mdev_reset_queues(mdev);

>>>> +	vfio_ap_mdev_unset_kvm(matrix_mdev);

>>>>    	list_del(&matrix_mdev->node);

>>>>    	kfree(matrix_mdev);

>>> Are we at risk of handle_pqap() in arch/s390/kvm/priv.c using an

>>> already freed pqap_hook (which is a member of the matrix_mdev pointee

>>> that is freed just above my comment).

>>>

>>> I'm aware of the fact that vfio_ap_mdev_unset_kvm() does a

>>> matrix_mdev->kvm->arch.crypto.pqap_hook = NULL but that is

>>> AFAICT not done under any lock relevant for handle_pqap(). I guess

>>> the idea is that the check cited below

>>>

>>> static int handle_pqap(struct kvm_vcpu *vcpu)

>>> [..]

>>>           /*

>>>            * Verify that the hook callback is registered, lock the owner

>>>            * and call the hook.

>>>            */

>>>           if (vcpu->kvm->arch.crypto.pqap_hook) {

>>>                   if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))

>>>                           return -EOPNOTSUPP;

>>>                   ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);

>>>                   module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);

>>>                   if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)

>>>                           kvm_s390_set_psw_cc(vcpu, 3);

>>>                   return ret;

>>>           }

>>>

>>> is going to catch it, but I'm not sure it is guaranteed to catch it.

>>> Opinions?

>> The hook itself - handle_pqap() function in vfio_ap_ops.c - also checks

>> to see if the reference to the hook is set and terminates with an error

>> if it

>> is not. If the hook is invoked subsequent to the remove callback above,

>> all should be fine since the check is also done under the matrix_dev->lock.

>>

> I don't quite understand your logic. Let us assume matrix_mdev was freed,

> but vcpu->kvm->arch.crypto.pqap_hook still points to what used to be

> (*matrix_mdev).pqap_hook. In that case the function pointer

> vcpu->kvm->arch.crypto.pqap_hook->hook is used after it was freed, and

> may not point to the handle_pqap() function in vfio_ap_ops.c, thus the

> check you are referring to ain't necessarily relevant. That is

> if you mean the check in the  handle_pqap() function in vfio_ap_ops.c; if

> you mean the check in handle_pqap() in arch/s390/kvm/priv.c, that one is

> not done under the matrix_dev->lock. Or do I have a hole somewhere in my

> reasoning?


What I am saying is the vcpu->kvm->arch.crypto.pqap_hook
will either be NULL or point to the handle_pqap() function in the
vfio_ap driver. In the latter case, the handler in the driver will get
called and try to acquire the matrix_dev->lock. The function that
sets the vcpu->kvm->arch.crypto.pqap_hook to NULL also takes that
lock. If the pointer is still active, then the handler will do its thing.
If not, then the handler will return without enabling or disabling
IRQs. That should not be a problem since the unset_kvm function
resets the queues which will disable the IRQs.

I don't see how the vcpu->kvm->arch.crypto.pqap_hook can point to
anything other than the handler or be NULL unless KVM is gone. Based
on my observations of the behavior, unless the remove callback can be
invoked by some means other than a request from userspace via the
sysfs remove attribute, it will not get called until the file
descriptor is closed, in which case the release callback will also
call unset_kvm. I think you are worrying about something that will
likely never happen.

>

> Regards,

> Halil

>
Halil Pasic May 14, 2021, 12:15 a.m. UTC | #16
On Thu, 13 May 2021 15:23:27 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 5/13/21 1:45 PM, Halil Pasic wrote:

> > On Thu, 13 May 2021 10:35:05 -0400

> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >  

> >> On 5/12/21 2:35 PM, Halil Pasic wrote:  

> >>> On Mon, 10 May 2021 17:48:37 -0400

> >>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >>>     

> >>>> The mdev remove callback for the vfio_ap device driver bails out with

> >>>> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was

> >>>> to prevent the mdev from being removed while in use; however, returning a

> >>>> non-zero rc does not prevent removal. This could result in a memory leak

> >>>> of the resources allocated when the mdev was created. In addition, the

> >>>> KVM guest will still have access to the AP devices assigned to the mdev

> >>>> even though the mdev no longer exists.

> >>>>

> >>>> To prevent this scenario, cleanup will be done - including unplugging the

> >>>> AP adapters, domains and control domains - regardless of whether the mdev

> >>>> is in use by a KVM guest or not.

> >>>>

> >>>> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")

> >>>> Cc: stable@vger.kernel.org

> >>>> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>

> >>>> ---

> >>>>    drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------

> >>>>    1 file changed, 2 insertions(+), 11 deletions(-)

> >>>>

> >>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c

> >>>> index b2c7e10dfdcd..f90c9103dac2 100644

> >>>> --- a/drivers/s390/crypto/vfio_ap_ops.c

> >>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c

> >>>> @@ -26,6 +26,7 @@

> >>>>

> >>>>    static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);

> >>>>    static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);

> >>>> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);

> >>>>

> >>>>    static int match_apqn(struct device *dev, const void *data)

> >>>>    {

> >>>> @@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)

> >>>>    	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

> >>>>

> >>>>    	mutex_lock(&matrix_dev->lock);

> >>>> -

> >>>> -	/*

> >>>> -	 * If the KVM pointer is in flux or the guest is running, disallow

> >>>> -	 * un-assignment of control domain.

> >>>> -	 */

> >>>> -	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {

> >>>> -		mutex_unlock(&matrix_dev->lock);

> >>>> -		return -EBUSY;

> >>>> -	}

> >>>> -

> >>>> -	vfio_ap_mdev_reset_queues(mdev);

> >>>> +	vfio_ap_mdev_unset_kvm(matrix_mdev);

> >>>>    	list_del(&matrix_mdev->node);

> >>>>    	kfree(matrix_mdev);  

> >>> Are we at risk of handle_pqap() in arch/s390/kvm/priv.c using an

> >>> already freed pqap_hook (which is a member of the matrix_mdev pointee

> >>> that is freed just above my comment).

> >>>

> >>> I'm aware of the fact that vfio_ap_mdev_unset_kvm() does a

> >>> matrix_mdev->kvm->arch.crypto.pqap_hook = NULL but that is

> >>> AFAICT not done under any lock relevant for handle_pqap(). I guess

> >>> the idea is that the check cited below

> >>>

> >>> static int handle_pqap(struct kvm_vcpu *vcpu)

> >>> [..]

> >>>           /*

> >>>            * Verify that the hook callback is registered, lock the owner

> >>>            * and call the hook.

> >>>            */

> >>>           if (vcpu->kvm->arch.crypto.pqap_hook) {

> >>>                   if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))

> >>>                           return -EOPNOTSUPP;

> >>>                   ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);

> >>>                   module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);

> >>>                   if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)

> >>>                           kvm_s390_set_psw_cc(vcpu, 3);

> >>>                   return ret;

> >>>           }

> >>>

> >>> is going to catch it, but I'm not sure it is guaranteed to catch it.

> >>> Opinions?  

> >> The hook itself - handle_pqap() function in vfio_ap_ops.c - also checks

> >> to see if the reference to the hook is set and terminates with an error

> >> if it

> >> is not. If the hook is invoked subsequent to the remove callback above,

> >> all should be fine since the check is also done under the matrix_dev->lock.

> >>  

> > I don't quite understand your logic. Let us assume matrix_mdev was freed,

> > but vcpu->kvm->arch.crypto.pqap_hook still points to what used to be

> > (*matrix_mdev).pqap_hook. In that case the function pointer

> > vcpu->kvm->arch.crypto.pqap_hook->hook is used after it was freed, and

> > may not point to the handle_pqap() function in vfio_ap_ops.c, thus the

> > check you are referring to ain't necessarily relevant. That is

> > if you mean the check in the  handle_pqap() function in vfio_ap_ops.c; if

> > you mean the check in handle_pqap() in arch/s390/kvm/priv.c, that one is

> > not done under the matrix_dev->lock. Or do I have a hole somewhere in my

> > reasoning?  

> 

> What I am saying is the vcpu->kvm->arch.crypto.pqap_hook

> will either be NULL or point to the handle_pqap() function in the

> vfio_ap driver.


Please read the code again. In my reading of the code
vcpu->kvm->arch.crypto.pqap_hook is never supposed to point to (nor does
it point to) the handle_pqap() function defined in vfio_ap_ops.c. It points
to the pqap_hook member of struct ap_matrix_mdev (the type of the member
is struct kvm_s390_module_hook, which in turn has a function pointer
member called hook, which is supposed to hold the address of
handle_pqap() function defined in vfio_ap_ops.c, and thus point to
it).
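
For readers following along, the pointer chain described above can be
sketched in plain C. This is a simplified userspace illustration, not
the kernel definitions: the field lists are trimmed, and
struct kvm_crypto_sketch, attach_hook() and demo_handle_pqap() are
hypothetical names. The owner field is a void * standing in for
struct module *.

```c
#include <assert.h>
#include <stddef.h>

struct kvm_vcpu;	/* opaque in this sketch */

struct kvm_s390_module_hook {
	int (*hook)(struct kvm_vcpu *vcpu);	/* meant to hold the driver's handle_pqap() */
	void *owner;				/* stand-in for struct module * */
};

struct ap_matrix_mdev {
	struct kvm_s390_module_hook pqap_hook;	/* embedded, so freed with matrix_mdev */
};

/* Hypothetical stand-in for the kvm->arch.crypto part that matters here. */
struct kvm_crypto_sketch {
	struct kvm_s390_module_hook *pqap_hook;	/* points at the embedded member */
};

/* Hypothetical handler playing the role of handle_pqap() in vfio_ap_ops.c. */
static int demo_handle_pqap(struct kvm_vcpu *vcpu)
{
	(void)vcpu;
	return 0;
}

/* Wire the KVM-side pointer to the structure embedded in matrix_mdev. */
static void attach_hook(struct kvm_crypto_sketch *crypto,
			struct ap_matrix_mdev *matrix_mdev)
{
	assert(crypto != NULL && matrix_mdev != NULL);
	matrix_mdev->pqap_hook.hook = demo_handle_pqap;
	matrix_mdev->pqap_hook.owner = NULL;
	crypto->pqap_hook = &matrix_mdev->pqap_hook;
}
```

The point of the sketch is that the KVM-side pointer targets memory
embedded in matrix_mdev, so it is only valid for as long as matrix_mdev
has not been freed.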

Because of this, I don't think the rest of your argument is valid.
Furthermore I believe we first need to get to common ground on this
one before proceeding any further. If you happen to preserve your
opinion after checking again, I think we should try to discuss this
offline, as one of us is likely looking at the wrong code.

Regards,
Halil

> In the latter case, the handler in the driver will get

> called and try to acquire the matrix_dev->lock. The function that

> sets the vcpu->kvm->arch.crypto.pqap_hook to NULL also takes that

> lock. If the pointer is still active, then the handler will do its thing.

> If not, then the handler will return without enabling or disabling

> IRQs. That should not be a problem since the unset_kvm function

> resets the queues which will disable the IRQs.

> 

> I don't see how

> the vcpu->kvm->arch.crypto.pqap_hook can point to anything

> other than the handler or be NULL unless KVM is gone. Based on

> my observations of the behavior, unless there is some

> other way for the remove callback to be invoked other than in

> response to a request from userspace via the sysfs remove

> attribute, it will not get called until the file descriptor is

> closed in which case the release callback will also unset_kvm.

> I think you are worrying about something that will likely never

> happen.

> 

> >

> > Regards,

> > Halil

> >  

>
Anthony Krowiak May 17, 2021, 1:37 p.m. UTC | #17
On 5/13/21 8:15 PM, Halil Pasic wrote:
> On Thu, 13 May 2021 15:23:27 -0400

> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>

>> On 5/13/21 1:45 PM, Halil Pasic wrote:

>>> On Thu, 13 May 2021 10:35:05 -0400

>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>>>   

>>>> On 5/12/21 2:35 PM, Halil Pasic wrote:

>>>>> On Mon, 10 May 2021 17:48:37 -0400

>>>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>>>>>      

>>>>>> The mdev remove callback for the vfio_ap device driver bails out with

>>>>>> -EBUSY if the mdev is in use by a KVM guest. The intended purpose was

>>>>>> to prevent the mdev from being removed while in use; however, returning a

>>>>>> non-zero rc does not prevent removal. This could result in a memory leak

>>>>>> of the resources allocated when the mdev was created. In addition, the

>>>>>> KVM guest will still have access to the AP devices assigned to the mdev

>>>>>> even though the mdev no longer exists.

>>>>>>

>>>>>> To prevent this scenario, cleanup will be done - including unplugging the

>>>>>> AP adapters, domains and control domains - regardless of whether the mdev

>>>>>> is in use by a KVM guest or not.

>>>>>>

>>>>>> Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")

>>>>>> Cc: stable@vger.kernel.org

>>>>>> Signed-off-by: Tony Krowiak <akrowiak@stny.rr.com>

>>>>>> ---

>>>>>>     drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------

>>>>>>     1 file changed, 2 insertions(+), 11 deletions(-)

>>>>>>

>>>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c

>>>>>> index b2c7e10dfdcd..f90c9103dac2 100644

>>>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c

>>>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c

>>>>>> @@ -26,6 +26,7 @@

>>>>>>

>>>>>>     static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);

>>>>>>     static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);

>>>>>> +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);

>>>>>>

>>>>>>     static int match_apqn(struct device *dev, const void *data)

>>>>>>     {

>>>>>> @@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)

>>>>>>     	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

>>>>>>

>>>>>>     	mutex_lock(&matrix_dev->lock);

>>>>>> -

>>>>>> -	/*

>>>>>> -	 * If the KVM pointer is in flux or the guest is running, disallow

>>>>>> -	 * un-assignment of control domain.

>>>>>> -	 */

>>>>>> -	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {

>>>>>> -		mutex_unlock(&matrix_dev->lock);

>>>>>> -		return -EBUSY;

>>>>>> -	}

>>>>>> -

>>>>>> -	vfio_ap_mdev_reset_queues(mdev);

>>>>>> +	vfio_ap_mdev_unset_kvm(matrix_mdev);

>>>>>>     	list_del(&matrix_mdev->node);

>>>>>>     	kfree(matrix_mdev);

>>>>> Are we at risk of handle_pqap() in arch/s390/kvm/priv.c using an

>>>>> already freed pqap_hook (which is a member of the matrix_mdev pointee

>>>>> that is freed just above my comment).

>>>>>

>>>>> I'm aware of the fact that vfio_ap_mdev_unset_kvm() does a

>>>>> matrix_mdev->kvm->arch.crypto.pqap_hook = NULL but that is

>>>>> AFAICT not done under any lock relevant for handle_pqap(). I guess

>>>>> the idea is that the check cited below

>>>>>

>>>>> static int handle_pqap(struct kvm_vcpu *vcpu)

>>>>> [..]

>>>>>            /*

>>>>>             * Verify that the hook callback is registered, lock the owner

>>>>>             * and call the hook.

>>>>>             */

>>>>>            if (vcpu->kvm->arch.crypto.pqap_hook) {

>>>>>                    if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))

>>>>>                            return -EOPNOTSUPP;

>>>>>                    ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);

>>>>>                    module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);

>>>>>                    if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)

>>>>>                            kvm_s390_set_psw_cc(vcpu, 3);

>>>>>                    return ret;

>>>>>            }

>>>>>

>>>>> is going to catch it, but I'm not sure it is guaranteed to catch it.

>>>>> Opinions?

>>>> The hook itself - handle_pqap() function in vfio_ap_ops.c - also checks

>>>> to see if the reference to the hook is set and terminates with an error

>>>> if it

>>>> is not. If the hook is invoked subsequent to the remove callback above,

>>>> all should be fine since the check is also done under the matrix_dev->lock.

>>>>   

>>> I don't quite understand your logic. Let us assume matrix_mdev was freed,

>>> but vcpu->kvm->arch.crypto.pqap_hook still points to what used to be

>>> (*matrix_mdev).pqap_hook. In that case the function pointer

>>> vcpu->kvm->arch.crypto.pqap_hook->hook is used after it was freed, and

>>> may not point to the handle_pqap() function in vfio_ap_ops.c, thus the

>>> check you are referring to ain't necessarily relevant. That is

>>> if you mean the check in the  handle_pqap() function in vfio_ap_ops.c; if

>>> you mean the check in handle_pqap() in arch/s390/kvm/priv.c, that one is

>>> not done under the matrix_dev->lock. Or do I have a hole somewhere in my

>>> reasoning?

>> What I am saying is the vcpu->kvm->arch.crypto.pqap_hook

>> will either be NULL or point to the handle_pqap() function in the

>> vfio_ap driver.

> Please read the code again. In my reading of the code

> vcpu->kvm->arch.crypto.pqap_hook is never supposed to point to (nor does

> it point to) the handle_pqap() function defined in vfio_ap_ops.c. It points

> to the pqap_hook member of struct ap_matrix_mdev (the type of the member

> is struct kvm_s390_module_hook, which in turn has a function pointer

> member called hook, which is supposed to hold the address of

> handle_pqap() function defined in vfio_ap_ops.c, and thus point to

> it).


You are correct, we are looking at the same code.

>

> Because of this, I don't think the rest of your argument is valid.


Okay, so your concern is that between the point in time the
vcpu->kvm->arch.crypto.pqap_hook pointer is checked in
priv.c and the point in time the handle_pqap() function
in vfio_ap_ops.c is called, the memory allocated for the
matrix_mdev containing the struct kvm_s390_module_hook
may get freed, thus rendering the function pointer invalid.
While not impossible, that seems extremely unlikely to
happen. Can you articulate a scenario where that could
even occur?

> Furthermore I believe we first need to get to common ground on this

> one before proceeding any further. If you happen to preserve your

> opinion after checking again, I think we should try to discuss this

> offline, as one of us is likely looking at the wrong code.

>

> Regards,

> Halil

>

>> In the latter case, the handler in the driver will get

>> called and try to acquire the matrix_dev->lock. The function that

>> sets the vcpu->kvm->arch.crypto.pqap_hook to NULL also takes that

>> lock. If the pointer is still active, then the handler will do its thing.

>> If not, then the handler will return without enabling or disabling

>> IRQs. That should not be a problem since the unset_kvm function

>> resets the queues which will disable the IRQs.

>>

>> I don't see how

>> the vcpu->kvm->arch.crypto.pqap_hook can point to anything

>> other than the handler or be NULL unless KVM is gone. Based on

>> my observations of the behavior, unless there is some

>> other way for the remove callback to be invoked other than in

>> response to a request from userspace via the sysfs remove

>> attribute, it will not get called until the file descriptor is

>> closed in which case the release callback will also unset_kvm.

>> I think you are worrying about something that will likely never

>> happen.

>>

>>> Regards,

>>> Halil

>>>
Halil Pasic May 17, 2021, 7:10 p.m. UTC | #18
On Mon, 17 May 2021 09:37:42 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >

> > Because of this, I don't think the rest of your argument is valid.  

> 

> Okay, so your concern is that between the point in time the

> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

> priv.c and the point in time the handle_pqap() function

> in vfio_ap_ops.c is called, the memory allocated for the

> matrix_mdev containing the struct kvm_s390_module_hook

> may get freed, thus rendering the function pointer invalid.

> While not impossible, that seems extremely unlikely to

> happen. Can you articulate a scenario where that could

> even occur?


Malicious userspace. We tend to do the pqap aqic just once
in the guest right after the queue is detected. I do agree
it ain't very likely to happen during normal operation. But why are
you asking?

I'm not sure I understood correctly what kind of scenario you are
asking for. PQAP AQIC and mdev remove are independent events
originating in userspace, so AFAIK we may not assume that the
execution of the two won't overlap, nor are we allowed to make
assumptions about how the execution of these two overlaps (except for
the things we explicitly ensure -- e.g. some parts are made mutually
exclusive using the matrix_dev->lock).

Regards,
Halil
Christian Borntraeger May 18, 2021, 9:30 a.m. UTC | #19
On 17.05.21 21:10, Halil Pasic wrote:
> On Mon, 17 May 2021 09:37:42 -0400

> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> 

>>>

>>> Because of this, I don't think the rest of your argument is valid.

>>

>> Okay, so your concern is that between the point in time the

>> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

>> priv.c and the point in time the handle_pqap() function

>> in vfio_ap_ops.c is called, the memory allocated for the

>> matrix_mdev containing the struct kvm_s390_module_hook

>> may get freed, thus rendering the function pointer invalid.

>> While not impossible, that seems extremely unlikely to

>> happen. Can you articulate a scenario where that could

>> even occur?

> 

> Malicious userspace. We tend to do the pqap aqic just once

> in the guest right after the queue is detected. I do agree

> it ain't very likely to happen during normal operation. But why are

> you asking?


Would it help if the code in priv.c read the hook once
and then worked only on the copy? We could protect that with RCU
and call synchronize_rcu() in vfio_ap_mdev_unset_kvm() after
unsetting the pointer?
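
A minimal userspace sketch of the "read the hook once and work on the
copy" idea, using C11 atomics rather than the kernel's RCU primitives.
All names here (hook_sketch, published_hook, publish_hook,
call_hook_once, double_it) are hypothetical. The snapshot alone only
avoids re-reading a pointer that may have been cleared in between; it
does not keep the pointed-to structure alive, which is what the grace
period would be for.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct hook_sketch {
	int (*hook)(int arg);
};

/* Shared pointer that a remover may set to NULL at any time. */
static _Atomic(struct hook_sketch *) published_hook;

/* Hypothetical handler used only to exercise the sketch. */
static int double_it(int arg)
{
	return 2 * arg;
}

static void publish_hook(struct hook_sketch *h)
{
	assert(h != NULL && h->hook != NULL);
	atomic_store(&published_hook, h);
}

static int call_hook_once(int arg)
{
	/* Load the shared pointer exactly once; use only the local copy below. */
	struct hook_sketch *h = atomic_load(&published_hook);

	if (!h)
		return -EOPNOTSUPP;
	return h->hook(arg);
}
```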
> 

> I'm not sure I understood correctly what kind of a scenario are

> you asking for. PQAP AQIC and mdev remove are independent

> events originated in userspace, so AFAIK we may not assume

> that the execution of two won't overlap, nor are we allowed

> to make assumptions on how does the execution of these two

> overlap (except for the things we explicitly ensure -- e.g.

> some parts are made mutually exclusive using the matrix_dev->lock

> lock).

> 

> Regards,

> Halil

>
Anthony Krowiak May 18, 2021, 1:41 p.m. UTC | #20
On 5/17/21 3:10 PM, Halil Pasic wrote:
> On Mon, 17 May 2021 09:37:42 -0400

> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>

>>> Because of this, I don't think the rest of your argument is valid.

>> Okay, so your concern is that between the point in time the

>> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

>> priv.c and the point in time the handle_pqap() function

>> in vfio_ap_ops.c is called, the memory allocated for the

>> matrix_mdev containing the struct kvm_s390_module_hook

>> may get freed, thus rendering the function pointer invalid.

>> While not impossible, that seems extremely unlikely to

>> happen. Can you articulate a scenario where that could

>> even occur?

> Malicious userspace. We tend to do the pqap aqic just once

> in the guest right after the queue is detected. I do agree

> it ain't very likely to happen during normal operation. But why are

> you asking?


I'm just trying to wrap my head around how this can
happen given the incredibly small window between
access to the pointer to the structure containing the
function pointer and access to the function pointer
itself.

>

> I'm not sure I understood correctly what kind of a scenario are

> you asking for. PQAP AQIC and mdev remove are independent

> events originated in userspace, so AFAIK we may not assume

> that the execution of two won't overlap, nor are we allowed

> to make assumptions on how does the execution of these two

> overlap (except for the things we explicitly ensure -- e.g.

> some parts are made mutually exclusive using the matrix_dev->lock

> lock).


It looks like we need a way to control access to the
struct kvm_s390_module_hook. I'm looking into
Christian's suggestion for using RCU as well as other
solutions.

>

> Regards,

> Halil

>
Anthony Krowiak May 18, 2021, 1:42 p.m. UTC | #21
On 5/18/21 5:30 AM, Christian Borntraeger wrote:
>

>

> On 17.05.21 21:10, Halil Pasic wrote:

>> On Mon, 17 May 2021 09:37:42 -0400

>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>>

>>>>

>>>> Because of this, I don't think the rest of your argument is valid.

>>>

>>> Okay, so your concern is that between the point in time the

>>> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

>>> priv.c and the point in time the handle_pqap() function

>>> in vfio_ap_ops.c is called, the memory allocated for the

>>> matrix_mdev containing the struct kvm_s390_module_hook

>>> may get freed, thus rendering the function pointer invalid.

>>> While not impossible, that seems extremely unlikely to

>>> happen. Can you articulate a scenario where that could

>>> even occur?

>>

>> Malicious userspace. We tend to do the pqap aqic just once

>> in the guest right after the queue is detected. I do agree

>> it ain't very likely to happen during normal operation. But why are

>> you asking?

>

> Would it help, if the code in priv.c would read the hook once

> and then only work on the copy? We could protect that with rcu

> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

> unsetting the pointer?


I'll look into this.

>>

>> I'm not sure I understood correctly what kind of a scenario are

>> you asking for. PQAP AQIC and mdev remove are independent

>> events originated in userspace, so AFAIK we may not assume

>> that the execution of two won't overlap, nor are we allowed

>> to make assumptions on how does the execution of these two

>> overlap (except for the things we explicitly ensure -- e.g.

>> some parts are made mutually exclusive using the matrix_dev->lock

>> lock).

>>

>> Regards,

>> Halil

>>
Christian Borntraeger May 18, 2021, 1:59 p.m. UTC | #22
On 18.05.21 15:42, Tony Krowiak wrote:
> 

> 

> On 5/18/21 5:30 AM, Christian Borntraeger wrote:

>>

>>

>> On 17.05.21 21:10, Halil Pasic wrote:

>>> On Mon, 17 May 2021 09:37:42 -0400

>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>>>

>>>>>

>>>>> Because of this, I don't think the rest of your argument is valid.

>>>>

>>>> Okay, so your concern is that between the point in time the

>>>> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

>>>> priv.c and the point in time the handle_pqap() function

>>>> in vfio_ap_ops.c is called, the memory allocated for the

>>>> matrix_mdev containing the struct kvm_s390_module_hook

>>>> may get freed, thus rendering the function pointer invalid.

>>>> While not impossible, that seems extremely unlikely to

>>>> happen. Can you articulate a scenario where that could

>>>> even occur?

>>>

>>> Malicious userspace. We tend to do the pqap aqic just once

>>> in the guest right after the queue is detected. I do agree

>>> it ain't very likely to happen during normal operation. But why are

>>> you asking?

>>

>> Would it help, if the code in priv.c would read the hook once

>> and then only work on the copy? We could protect that with rcu

>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

>> unsetting the pointer?

> 

> I'll look into this.


I think it could work. In priv.c, use rcu_read_lock(), save the
pointer, do the check and the call, then call rcu_read_unlock().
In vfio_ap, use rcu_assign_pointer() to set the pointer and,
after setting it to NULL, call synchronize_rcu().

Halil, I think we can do this as an addon patch as it makes
sense to have this callback pointer protected independent of
this patch. Agree?
Halil Pasic May 18, 2021, 3:33 p.m. UTC | #23
On Tue, 18 May 2021 15:59:36 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 18.05.21 15:42, Tony Krowiak wrote:

> > 

> > 

> > On 5/18/21 5:30 AM, Christian Borntraeger wrote:  

> >>

> >>

> >> On 17.05.21 21:10, Halil Pasic wrote:  

> >>> On Mon, 17 May 2021 09:37:42 -0400

> >>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >>>  

> >>>>>

> >>>>> Because of this, I don't think the rest of your argument is valid.  

> >>>>

> >>>> Okay, so your concern is that between the point in time the

> >>>> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

> >>>> priv.c and the point in time the handle_pqap() function

> >>>> in vfio_ap_ops.c is called, the memory allocated for the

> >>>> matrix_mdev containing the struct kvm_s390_module_hook

> >>>> may get freed, thus rendering the function pointer invalid.

> >>>> While not impossible, that seems extremely unlikely to

> >>>> happen. Can you articulate a scenario where that could

> >>>> even occur?  

> >>>

> >>> Malicious userspace. We tend to do the pqap aqic just once

> >>> in the guest right after the queue is detected. I do agree

> >>> it ain't very likely to happen during normal operation. But why are

> >>> you asking?  

> >>

> >> Would it help, if the code in priv.c would read the hook once

> >> and then only work on the copy? We could protect that with rcu

> >> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

> >> unsetting the pointer?


Unfortunately just "the hook" is ambiguous in this context. We
have kvm->arch.crypto.pqap_hook that is supposed to point to
a struct kvm_s390_module_hook member of struct ap_matrix_mdev 
which is also called pqap_hook. And struct kvm_s390_module_hook
has a function pointer member named "hook".

> > 

> > I'll look into this.  

> 

> I think it could work. in priv.c use rcu_read_lock, save the

> pointer, do the check and call, call rcu_read_unlock.

> In vfio_ap use rcu_assign_pointer to set the pointer and

> after setting it to zero call synchronize_rcu.


In my opinion, we should make the accesses to the
kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm
not sure if that is what you are proposing. How do we usually
do synchronisation on the stuff that lives in kvm->arch?

BTW, something as simple as a cmpxchg which boils down to the
CSG instruction for us would suffice in this case (or forcing
any interlocked update type construct). 

> 

> Halil, I think we can do this as an addon patch as it makes

> sense to have this callback pointer protected independent of

> this patch. Agree?


Unfortunately I didn't quite get to the bottom of what exactly gets
leaked. My intuition is that trading a leak for a use-after-free
is in general not a good idea. In this particular case, assuming
userspace is well behaved, the use-after-free is very unlikely,
but then I don't consider the leak to be awfully likely either. A
well-behaved userspace should not attempt to remove the mdev while
it is associated with a guest. We documented that in:
Documentation/s390/vfio-ap.rst
"""
  remove:
    deallocates the mediated matrix device's ap_matrix_mdev structure. This will
    be allowed only if a running guest is not using the mdev.
"""
BTW this patch should probably change that piece of documentation as
well.

In this case, because the leak is much likelier than the use-after-free
(assuming a non-malicious userspace), the trade may be worth it. Yet my
independent opinion is that I would prefer this fixed in one go and
properly. But I do trust your judgement more than mine (especially in
matters like this). So feel free to go ahead (i.e. I'm not going to NACK
this).

Regards,
Halil
Christian Borntraeger May 18, 2021, 5:01 p.m. UTC | #24
On 18.05.21 17:33, Halil Pasic wrote:
> On Tue, 18 May 2021 15:59:36 +0200

> Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> 

>> On 18.05.21 15:42, Tony Krowiak wrote:

>>>

>>>

>>> On 5/18/21 5:30 AM, Christian Borntraeger wrote:

>>>>

>>>>

>>>> On 17.05.21 21:10, Halil Pasic wrote:

>>>>> On Mon, 17 May 2021 09:37:42 -0400

>>>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>>>>>   

>>>>>>>

>>>>>>> Because of this, I don't think the rest of your argument is valid.

>>>>>>

>>>>>> Okay, so your concern is that between the point in time the

>>>>>> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

>>>>>> priv.c and the point in time the handle_pqap() function

>>>>>> in vfio_ap_ops.c is called, the memory allocated for the

>>>>>> matrix_mdev containing the struct kvm_s390_module_hook

>>>>>> may get freed, thus rendering the function pointer invalid.

>>>>>> While not impossible, that seems extremely unlikely to

>>>>>> happen. Can you articulate a scenario where that could

>>>>>> even occur?

>>>>>

>>>>> Malicious userspace. We tend to do the pqap aqic just once

>>>>> in the guest right after the queue is detected. I do agree

>>>>> it ain't very likely to happen during normal operation. But why are

>>>>> you asking?

>>>>

>>>> Would it help, if the code in priv.c would read the hook once

>>>> and then only work on the copy? We could protect that with rcu

>>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

>>>> unsetting the pointer?

> 

> Unfortunately just "the hook" is ambiguous in this context. We

> have kvm->arch.crypto.pqap_hook that is supposed to point to

> a struct kvm_s390_module_hook member of struct ap_matrix_mdev

> which is also called pqap_hook. And struct kvm_s390_module_hook

> has function pointer member named "hook".


I was referring to the full struct.
> 

>>>

>>> I'll look into this.

>>

>> I think it could work. in priv.c use rcu_readlock, save the

>> pointer, do the check and call, call rcu_read_unlock.

>> In vfio_ap use rcu_assign_pointer to set the pointer and

>> after setting it to zero call sychronize_rcu.

> 

> In my opinion, we should make the accesses to the

> kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm

> not sure if that is what you are proposing. How do we usually

> do synchronisation on the stuff that lives in kvm->arch?

> 


RCU is a method of synchronization. We make sure that the structure
pqap_hook is still valid as long as we are inside the rcu read
lock. So the idea is: clear the pointer, wait until all old readers
have finished and then proceed with getting rid of the structure.
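
The "clear pointer, wait for old readers, then free" ordering can be
sketched in plain userspace C. This is only a toy, assuming a reader
counter as a crude stand-in for a grace period; it is not RCU and not
the kernel code. All names (toy_hook, toy_handle, toy_publish,
toy_unpublish_and_free, toy_new_hook) are hypothetical.

```c
/* Userspace toy, not kernel code: a reader counter stands in for RCU. */
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kvm_s390_module_hook. */
struct toy_hook {
	int (*hook)(int arg);
};

/* Plays the role of kvm->arch.crypto.pqap_hook. */
static _Atomic(struct toy_hook *) published_hook;
static atomic_int active_readers;

/* Reader side: mark ourselves active, then sample the pointer once. */
static struct toy_hook *toy_read_lock(void)
{
	atomic_fetch_add(&active_readers, 1);
	return atomic_load(&published_hook);
}

static void toy_read_unlock(void)
{
	atomic_fetch_sub(&active_readers, 1);
}

/* Call the hook if one is published; -1 plays the role of -EOPNOTSUPP. */
int toy_handle(int arg)
{
	struct toy_hook *h = toy_read_lock();
	int ret = -1;

	if (h)
		ret = h->hook(arg); /* h stays valid until toy_read_unlock() */
	toy_read_unlock();
	return ret;
}

void toy_publish(struct toy_hook *h)
{
	atomic_store(&published_hook, h);
}

/* Writer side: clear the pointer, wait for old readers, then free. */
void toy_unpublish_and_free(void)
{
	struct toy_hook *old = atomic_exchange(&published_hook, NULL);

	while (atomic_load(&active_readers) > 0)
		sched_yield(); /* crude stand-in for synchronize_rcu() */
	free(old);
}

static int toy_double(int arg)
{
	return 2 * arg;
}

/* Allocate a hook structure that doubles its argument. */
struct toy_hook *toy_new_hook(void)
{
	struct toy_hook *h = malloc(sizeof(*h));

	if (h)
		h->hook = toy_double;
	return h;
}
```

Any reader that sampled the old pointer bumped the counter before the
writer cleared it, so the writer cannot reach free() until that reader
has left. Readers arriving later only ever see NULL. Unlike this toy, a
classic RCU read-side section must not sleep, so the real hook call
cannot simply sit inside rcu_read_lock()/rcu_read_unlock().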
Anthony Krowiak May 18, 2021, 6:14 p.m. UTC | #25
On 5/18/21 9:59 AM, Christian Borntraeger wrote:
>

>

> On 18.05.21 15:42, Tony Krowiak wrote:

>>

>>

>> On 5/18/21 5:30 AM, Christian Borntraeger wrote:

>>>

>>>

>>> On 17.05.21 21:10, Halil Pasic wrote:

>>>> On Mon, 17 May 2021 09:37:42 -0400

>>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>>>>

>>>>>>

>>>>>> Because of this, I don't think the rest of your argument is valid.

>>>>>

>>>>> Okay, so your concern is that between the point in time the

>>>>> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

>>>>> priv.c and the point in time the handle_pqap() function

>>>>> in vfio_ap_ops.c is called, the memory allocated for the

>>>>> matrix_mdev containing the struct kvm_s390_module_hook

>>>>> may get freed, thus rendering the function pointer invalid.

>>>>> While not impossible, that seems extremely unlikely to

>>>>> happen. Can you articulate a scenario where that could

>>>>> even occur?

>>>>

>>>> Malicious userspace. We tend to do the pqap aqic just once

>>>> in the guest right after the queue is detected. I do agree

>>>> it ain't very likely to happen during normal operation. But why are

>>>> you asking?

>>>

>>> Would it help, if the code in priv.c would read the hook once

>>> and then only work on the copy? We could protect that with rcu

>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

>>> unsetting the pointer?

>>

>> I'll look into this.

>

> I think it could work. in priv.c use rcu_readlock, save the

> pointer, do the check and call, call rcu_read_unlock.

> In vfio_ap use rcu_assign_pointer to set the pointer and

> after setting it to zero call sychronize_rcu.

>

> Halil, I think we can do this as an addon patch as it makes

> sense to have this callback pointer protected independent of

> this patch. Agree?


I agree that this is a viable option; however, this does not
guarantee that the matrix_mdev is not freed, thus rendering
the function pointer to the interception handler invalid, unless
that is also included within the rcu_read_lock/rcu_read_unlock.
That is not possible given that the matrix_mdev is freed within
the remove callback and the pointer to the structure that
contains the interception handler function pointer is cleared
in the vfio_ap_mdev_unset_kvm() function. I am working on
a patch and should be able to post it before EOD or first thing
tomorrow.
Christian Borntraeger May 18, 2021, 6:22 p.m. UTC | #26
On 18.05.21 20:14, Tony Krowiak wrote:
> 

> 

> On 5/18/21 9:59 AM, Christian Borntraeger wrote:

>>

>>

>> On 18.05.21 15:42, Tony Krowiak wrote:

>>>

>>>

>>> On 5/18/21 5:30 AM, Christian Borntraeger wrote:

>>>>

>>>>

>>>> On 17.05.21 21:10, Halil Pasic wrote:

>>>>> On Mon, 17 May 2021 09:37:42 -0400

>>>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>>>>>

>>>>>>>

>>>>>>> Because of this, I don't think the rest of your argument is valid.

>>>>>>

>>>>>> Okay, so your concern is that between the point in time the

>>>>>> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

>>>>>> priv.c and the point in time the handle_pqap() function

>>>>>> in vfio_ap_ops.c is called, the memory allocated for the

>>>>>> matrix_mdev containing the struct kvm_s390_module_hook

>>>>>> may get freed, thus rendering the function pointer invalid.

>>>>>> While not impossible, that seems extremely unlikely to

>>>>>> happen. Can you articulate a scenario where that could

>>>>>> even occur?

>>>>>

>>>>> Malicious userspace. We tend to do the pqap aqic just once

>>>>> in the guest right after the queue is detected. I do agree

>>>>> it ain't very likely to happen during normal operation. But why are

>>>>> you asking?

>>>>

>>>> Would it help, if the code in priv.c would read the hook once

>>>> and then only work on the copy? We could protect that with rcu

>>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

>>>> unsetting the pointer?

>>>

>>> I'll look into this.

>>

>> I think it could work. in priv.c use rcu_readlock, save the

>> pointer, do the check and call, call rcu_read_unlock.

>> In vfio_ap use rcu_assign_pointer to set the pointer and

>> after setting it to zero call sychronize_rcu.

>>

>> Halil, I think we can do this as an addon patch as it makes

>> sense to have this callback pointer protected independent of

>> this patch. Agree?

> 

> I agree that this is a viable option; however, this does not

> guarantee that the matrix_mdev is not freed thus rendering

> the function pointer to the interception handler invalid unless

> that is also included within the rcu_readlock/rcu_read_unlock.


The trick should be the synchronize_rcu. This will put the deleting
code (vfio_ap_mdev_unset_kvm) to sleep until the rcu read section
has finished. So if you first set the pointer to zero, then call
synchronize_rcu, the code will only progress once all users of
the old pointer have finished.

> That is not possible given the matrix_mdev is freed within

> the remove callback and the pointer to the structure that

> contains the interception handler function pointer is cleared

> in the vfio_ap_mdev_unset_kvm() function. I am working on

> a patch and should be able to post it before EOD or first thing

> tomorrow.

>
Anthony Krowiak May 18, 2021, 6:40 p.m. UTC | #27
On 5/18/21 2:22 PM, Christian Borntraeger wrote:
>

>

> On 18.05.21 20:14, Tony Krowiak wrote:

>>

>>

>> On 5/18/21 9:59 AM, Christian Borntraeger wrote:

>>>

>>>

>>> On 18.05.21 15:42, Tony Krowiak wrote:

>>>>

>>>>

>>>> On 5/18/21 5:30 AM, Christian Borntraeger wrote:

>>>>>

>>>>>

>>>>> On 17.05.21 21:10, Halil Pasic wrote:

>>>>>> On Mon, 17 May 2021 09:37:42 -0400

>>>>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:

>>>>>>

>>>>>>>>

>>>>>>>> Because of this, I don't think the rest of your argument is valid.

>>>>>>>

>>>>>>> Okay, so your concern is that between the point in time the

>>>>>>> vcpu->kvm->arch.crypto.pqap_hook pointer is checked in

>>>>>>> priv.c and the point in time the handle_pqap() function

>>>>>>> in vfio_ap_ops.c is called, the memory allocated for the

>>>>>>> matrix_mdev containing the struct kvm_s390_module_hook

>>>>>>> may get freed, thus rendering the function pointer invalid.

>>>>>>> While not impossible, that seems extremely unlikely to

>>>>>>> happen. Can you articulate a scenario where that could

>>>>>>> even occur?

>>>>>>

>>>>>> Malicious userspace. We tend to do the pqap aqic just once

>>>>>> in the guest right after the queue is detected. I do agree

>>>>>> it ain't very likely to happen during normal operation. But why are

>>>>>> you asking?

>>>>>

>>>>> Would it help, if the code in priv.c would read the hook once

>>>>> and then only work on the copy? We could protect that with rcu

>>>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

>>>>> unsetting the pointer?

>>>>

>>>> I'll look into this.

>>>

>>> I think it could work. in priv.c use rcu_readlock, save the

>>> pointer, do the check and call, call rcu_read_unlock.

>>> In vfio_ap use rcu_assign_pointer to set the pointer and

>>> after setting it to zero call sychronize_rcu.

>>>

>>> Halil, I think we can do this as an addon patch as it makes

>>> sense to have this callback pointer protected independent of

>>> this patch. Agree?

>>

>> I agree that this is a viable option; however, this does not

>> guarantee that the matrix_mdev is not freed thus rendering

>> the function pointer to the interception handler invalid unless

>> that is also included within the rcu_readlock/rcu_read_unlock.

>

> The trick should be the sychronize_rcu. This will put the deleting

> code (vfio_ap_mdev_unset_kvm) to sleep until the rcu read section

> has finished. So if you first set the pointer to zero, then call

> synchronize_rcu the code will only progress until all users of

> the old poiner have finished.


Yes, that is my understanding too.

>

>> That is not possible given the matrix_mdev is freed within

>> the remove callback and the pointer to the structure that

>> contains the interception handler function pointer is cleared

>> in the vfio_ap_mdev_unset_kvm() function. I am working on

>> a patch and should be able to post it before EOD or first thing

>> tomorrow.

>>
Halil Pasic May 18, 2021, 11:27 p.m. UTC | #28
On Tue, 18 May 2021 19:01:42 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 18.05.21 17:33, Halil Pasic wrote:

> > On Tue, 18 May 2021 15:59:36 +0200

> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:

[..]
> >>>>

> >>>> Would it help, if the code in priv.c would read the hook once

> >>>> and then only work on the copy? We could protect that with rcu

> >>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

> >>>> unsetting the pointer?  

> > 

> > Unfortunately just "the hook" is ambiguous in this context. We

> > have kvm->arch.crypto.pqap_hook that is supposed to point to

> > a struct kvm_s390_module_hook member of struct ap_matrix_mdev

> > which is also called pqap_hook. And struct kvm_s390_module_hook

> > has function pointer member named "hook".  

> 

> I was referring to the full struct.

> >   

> >>>

> >>> I'll look into this.  

> >>

> >> I think it could work. in priv.c use rcu_readlock, save the

> >> pointer, do the check and call, call rcu_read_unlock.

> >> In vfio_ap use rcu_assign_pointer to set the pointer and

> >> after setting it to zero call sychronize_rcu.  

> > 

> > In my opinion, we should make the accesses to the

> > kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm

> > not sure if that is what you are proposing. How do we usually

> > do synchronisation on the stuff that lives in kvm->arch?

> >   

> 

> RCU is a method of synchronization. We  make sure that structure

> pqap_hook is still valid as long as we are inside the rcu read

> lock. So the idea is: clear pointer, wait until all old readers

> have finished and the proceed with getting rid of the structure.


Yes, I know that RCU is a method of synchronization, but I'm not
very familiar with it. I'm a little confused by "read the hook
once and then work on a copy". I guess I would have to read up
on RCU again to get clarity. I intend to brush up my RCU knowledge
once the patch comes along. I would be glad to have your help when
reviewing an RCU-based solution for this.

Regards,
Halil
Christian Borntraeger May 19, 2021, 8:17 a.m. UTC | #29
On 19.05.21 01:27, Halil Pasic wrote:
> On Tue, 18 May 2021 19:01:42 +0200

> Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> 

>> On 18.05.21 17:33, Halil Pasic wrote:

>>> On Tue, 18 May 2021 15:59:36 +0200

>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> [..]

>>>>>>

>>>>>> Would it help, if the code in priv.c would read the hook once

>>>>>> and then only work on the copy? We could protect that with rcu

>>>>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

>>>>>> unsetting the pointer?

>>>

>>> Unfortunately just "the hook" is ambiguous in this context. We

>>> have kvm->arch.crypto.pqap_hook that is supposed to point to

>>> a struct kvm_s390_module_hook member of struct ap_matrix_mdev

>>> which is also called pqap_hook. And struct kvm_s390_module_hook

>>> has function pointer member named "hook".

>>

>> I was referring to the full struct.

>>>    

>>>>>

>>>>> I'll look into this.

>>>>

>>>> I think it could work. in priv.c use rcu_readlock, save the

>>>> pointer, do the check and call, call rcu_read_unlock.

>>>> In vfio_ap use rcu_assign_pointer to set the pointer and

>>>> after setting it to zero call sychronize_rcu.

>>>

>>> In my opinion, we should make the accesses to the

>>> kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm

>>> not sure if that is what you are proposing. How do we usually

>>> do synchronisation on the stuff that lives in kvm->arch?

>>>    

>>

>> RCU is a method of synchronization. We  make sure that structure

>> pqap_hook is still valid as long as we are inside the rcu read

>> lock. So the idea is: clear pointer, wait until all old readers

>> have finished and the proceed with getting rid of the structure.

> 

> Yes I know that RCU is a method of synchronization, but I'm not

> very familiar with it. I'm a little confused by "read the hook

> once and then work on a copy". I guess, I would have to read up

> on the RCU again to get clarity. I intend to brush up my RCU knowledge

> once the patch comes along. I would be glad to have your help when

> reviewing an RCU based solution for this.


Just had a quick look. It's not trivial, as the hook function itself
takes a mutex and an rcu section must not sleep. Will have a deeper
look.
Christian Borntraeger May 19, 2021, 11:22 a.m. UTC | #30
On 19.05.21 10:17, Christian Borntraeger wrote:
> 

> 

> On 19.05.21 01:27, Halil Pasic wrote:

>> On Tue, 18 May 2021 19:01:42 +0200

>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:

>>

>>> On 18.05.21 17:33, Halil Pasic wrote:

>>>> On Tue, 18 May 2021 15:59:36 +0200

>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:

>> [..]

>>>>>>>

>>>>>>> Would it help, if the code in priv.c would read the hook once

>>>>>>> and then only work on the copy? We could protect that with rcu

>>>>>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

>>>>>>> unsetting the pointer?

>>>>

>>>> Unfortunately just "the hook" is ambiguous in this context. We

>>>> have kvm->arch.crypto.pqap_hook that is supposed to point to

>>>> a struct kvm_s390_module_hook member of struct ap_matrix_mdev

>>>> which is also called pqap_hook. And struct kvm_s390_module_hook

>>>> has function pointer member named "hook".

>>>

>>> I was referring to the full struct.

>>>>>>

>>>>>> I'll look into this.

>>>>>

>>>>> I think it could work. in priv.c use rcu_readlock, save the

>>>>> pointer, do the check and call, call rcu_read_unlock.

>>>>> In vfio_ap use rcu_assign_pointer to set the pointer and

>>>>> after setting it to zero call sychronize_rcu.

>>>>

>>>> In my opinion, we should make the accesses to the

>>>> kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm

>>>> not sure if that is what you are proposing. How do we usually

>>>> do synchronisation on the stuff that lives in kvm->arch?

>>>

>>> RCU is a method of synchronization. We  make sure that structure

>>> pqap_hook is still valid as long as we are inside the rcu read

>>> lock. So the idea is: clear pointer, wait until all old readers

>>> have finished and the proceed with getting rid of the structure.

>>

>> Yes I know that RCU is a method of synchronization, but I'm not

>> very familiar with it. I'm a little confused by "read the hook

>> once and then work on a copy". I guess, I would have to read up

>> on the RCU again to get clarity. I intend to brush up my RCU knowledge

>> once the patch comes along. I would be glad to have your help when

>> reviewing an RCU based solution for this.

> 

> Just had a quick look. Its not trivial, as the hook function itself

> takes a mutex and an rcu section must not sleep. Will have a deeper

> look.



As a quick hack something like this could work. The whole locking is pretty
complicated and this makes it even more complex so we might want to do
a cleanup/locking rework later on.


diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 9928f785c677..fde6e02aab54 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -609,6 +609,7 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
   */
  static int handle_pqap(struct kvm_vcpu *vcpu)
  {
+       struct kvm_s390_module_hook *pqap_hook;
         struct ap_queue_status status = {};
         unsigned long reg0;
         int ret;
@@ -657,14 +658,21 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
          * Verify that the hook callback is registered, lock the owner
          * and call the hook.
          */
-       if (vcpu->kvm->arch.crypto.pqap_hook) {
-               if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
+       rcu_read_lock();
+       pqap_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
+       if (pqap_hook) {
+               if (!try_module_get(pqap_hook->owner)) {
+                       rcu_read_unlock();
                         return -EOPNOTSUPP;
-               ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
-               module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
+               }
+               rcu_read_unlock();
+               ret = pqap_hook->hook(vcpu);
+               module_put(pqap_hook->owner);
                 if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
                         kvm_s390_set_psw_cc(vcpu, 3);
                 return ret;
+       } else {
+               rcu_read_unlock();
         }
         /*
          * A vfio_driver must register a hook.
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index f90c9103dac2..a7124abd6aed 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1194,6 +1194,7 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
                 mutex_lock(&matrix_dev->lock);
                 vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
                 matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+               synchronize_rcu();
                 kvm_put_kvm(matrix_mdev->kvm);
                 matrix_mdev->kvm = NULL;
                 matrix_mdev->kvm_busy = false;
Halil Pasic May 19, 2021, 11:25 a.m. UTC | #31
On Wed, 19 May 2021 10:17:49 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 19.05.21 01:27, Halil Pasic wrote:

> > On Tue, 18 May 2021 19:01:42 +0200

> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> >   

> >> On 18.05.21 17:33, Halil Pasic wrote:  

> >>> On Tue, 18 May 2021 15:59:36 +0200

> >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:  

> > [..]  

> >>>>>>

> >>>>>> Would it help, if the code in priv.c would read the hook once

> >>>>>> and then only work on the copy? We could protect that with rcu

> >>>>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

> >>>>>> unsetting the pointer?  

> >>>

> >>> Unfortunately just "the hook" is ambiguous in this context. We

> >>> have kvm->arch.crypto.pqap_hook that is supposed to point to

> >>> a struct kvm_s390_module_hook member of struct ap_matrix_mdev

> >>> which is also called pqap_hook. And struct kvm_s390_module_hook

> >>> has function pointer member named "hook".  

> >>

> >> I was referring to the full struct.  

> >>>      

> >>>>>

> >>>>> I'll look into this.  

> >>>>

> >>>> I think it could work. in priv.c use rcu_readlock, save the

> >>>> pointer, do the check and call, call rcu_read_unlock.

> >>>> In vfio_ap use rcu_assign_pointer to set the pointer and

> >>>> after setting it to zero call sychronize_rcu.  

> >>>

> >>> In my opinion, we should make the accesses to the

> >>> kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm

> >>> not sure if that is what you are proposing. How do we usually

> >>> do synchronisation on the stuff that lives in kvm->arch?

> >>>      

> >>

> >> RCU is a method of synchronization. We  make sure that structure

> >> pqap_hook is still valid as long as we are inside the rcu read

> >> lock. So the idea is: clear pointer, wait until all old readers

> >> have finished and the proceed with getting rid of the structure.  

> > 

> > Yes I know that RCU is a method of synchronization, but I'm not

> > very familiar with it. I'm a little confused by "read the hook

> > once and then work on a copy". I guess, I would have to read up

> > on the RCU again to get clarity. I intend to brush up my RCU knowledge

> > once the patch comes along. I would be glad to have your help when

> > reviewing an RCU based solution for this.  

> 

> Just had a quick look. Its not trivial, as the hook function itself

> takes a mutex and an rcu section must not sleep. Will have a deeper

> look.


I refreshed my RCU knowledge and RCU seems to be a reasonable choice
here. I don't think we have to make the rcu read section span the
call to the callback. That is, something like the following

--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -613,6 +613,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
        unsigned long reg0;
        int ret;
        uint8_t fc;
+       int (*pqap_hook)(struct kvm_vcpu *vcpu);
 
        /* Verify that the AP instruction are available */
        if (!ap_instructions_available())
@@ -657,14 +658,21 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
         * Verify that the hook callback is registered, lock the owner
         * and call the hook.
         */
+       rcu_read_lock();
        if (vcpu->kvm->arch.crypto.pqap_hook) {
-               if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
+               if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) {
+                       rcu_read_unlock();
                        return -EOPNOTSUPP;
-               ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
+               }
+               pqap_hook = READ_ONCE(vcpu->kvm->arch.crypto.pqap_hook->hook);
+               rcu_read_unlock();
+               ret = pqap_hook(vcpu);
                module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
                if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
                        kvm_s390_set_psw_cc(vcpu, 3);
                return ret;
+       } else {
+               rcu_read_unlock();
        }
        /*
         * A vfio_driver must register a hook.

Should be sufficient. The module get ensures that the pointee is still
around for the duration of the call. The handle_pqap() from
vfio_ap_ops.c checks vcpu->kvm->arch.crypto.pqap_hook under the same
lock that is used to set it to NULL, and bails out if it is NULL. It
is a bit convoluted, but it should work.

Regards,
Halil
Halil Pasic May 19, 2021, 12:59 p.m. UTC | #32
On Wed, 19 May 2021 13:22:56 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 19.05.21 10:17, Christian Borntraeger wrote:

> > 

> > 

> > On 19.05.21 01:27, Halil Pasic wrote:  

> >> On Tue, 18 May 2021 19:01:42 +0200

> >> Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> >>  

> >>> On 18.05.21 17:33, Halil Pasic wrote:  

> >>>> On Tue, 18 May 2021 15:59:36 +0200

> >>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:  

> >> [..]  

> >>>>>>>

> >>>>>>> Would it help, if the code in priv.c would read the hook once

> >>>>>>> and then only work on the copy? We could protect that with rcu

> >>>>>>> and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

> >>>>>>> unsetting the pointer?  

> >>>>

> >>>> Unfortunately just "the hook" is ambiguous in this context. We

> >>>> have kvm->arch.crypto.pqap_hook that is supposed to point to

> >>>> a struct kvm_s390_module_hook member of struct ap_matrix_mdev

> >>>> which is also called pqap_hook. And struct kvm_s390_module_hook

> >>>> has function pointer member named "hook".  

> >>>

> >>> I was referring to the full struct.  

> >>>>>>

> >>>>>> I'll look into this.  

> >>>>>

> >>>>> I think it could work. in priv.c use rcu_readlock, save the

> >>>>> pointer, do the check and call, call rcu_read_unlock.

> >>>>> In vfio_ap use rcu_assign_pointer to set the pointer and

> >>>>> after setting it to zero call sychronize_rcu.  

> >>>>

> >>>> In my opinion, we should make the accesses to the

> >>>> kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm

> >>>> not sure if that is what you are proposing. How do we usually

> >>>> do synchronisation on the stuff that lives in kvm->arch?  

> >>>

> >>> RCU is a method of synchronization. We  make sure that structure

> >>> pqap_hook is still valid as long as we are inside the rcu read

> >>> lock. So the idea is: clear pointer, wait until all old readers

> >>> have finished and the proceed with getting rid of the structure.  

> >>

> >> Yes I know that RCU is a method of synchronization, but I'm not

> >> very familiar with it. I'm a little confused by "read the hook

> >> once and then work on a copy". I guess, I would have to read up

> >> on the RCU again to get clarity. I intend to brush up my RCU knowledge

> >> once the patch comes along. I would be glad to have your help when

> >> reviewing an RCU based solution for this.  

> > 

> > Just had a quick look. Its not trivial, as the hook function itself

> > takes a mutex and an rcu section must not sleep. Will have a deeper

> > look.  

> 

> 

> As a quick hack something like this could work. The whole locking is pretty

> complicated and this makes it even more complex so we might want to do

> a cleanup/locking rework later on.

> 


Hm, seems our emails crossed mid air...

> 

> index 9928f785c677..fde6e02aab54 100644

> --- a/arch/s390/kvm/priv.c

> +++ b/arch/s390/kvm/priv.c

> @@ -609,6 +609,7 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)

>    */

>   static int handle_pqap(struct kvm_vcpu *vcpu)

>   {

> +       struct kvm_s390_module_hook *pqap_hook;

>          struct ap_queue_status status = {};

>          unsigned long reg0;

>          int ret;

> @@ -657,14 +658,21 @@ static int handle_pqap(struct kvm_vcpu *vcpu)

>           * Verify that the hook callback is registered, lock the owner

>           * and call the hook.

>           */

> -       if (vcpu->kvm->arch.crypto.pqap_hook) {

> -               if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))

> +       rcu_read_lock();

> +       pqap_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);

> +       if (pqap_hook) {

> +               if (!try_module_get(pqap_hook->owner)) {

> +                       rcu_read_unlock();

>                          return -EOPNOTSUPP;

> -               ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);

> -               module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);

> +               }


Up to this point the local pqap_hook is guaranteed to point to a valid
object if not NULL, ...
> +               rcu_read_unlock();


... and after this point IMHO it is not.

> +               ret = pqap_hook->hook(vcpu);


So IMHO the pointer dereference here is still problematic, but that can
be fixed easily as I described in the email I sent 3 minutes after
yours. IMHO we need a local copy of vcpu->kvm->arch.crypto.pqap_hook->hook
taken within the rcu read critical section. Do you agree?

Regards,
Halil

> +               module_put(pqap_hook->owner);

>                  if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)

>                          kvm_s390_set_psw_cc(vcpu, 3);

>                  return ret;

> +       } else {

> +               rcu_read_unlock();

>          }

>          /*

>           * A vfio_driver must register a hook.

> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c

> index f90c9103dac2..a7124abd6aed 100644

> --- a/drivers/s390/crypto/vfio_ap_ops.c

> +++ b/drivers/s390/crypto/vfio_ap_ops.c

> @@ -1194,6 +1194,7 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)

>                  mutex_lock(&matrix_dev->lock);

>                  vfio_ap_mdev_reset_queues(matrix_mdev->mdev);

>                  matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;

> +               synchronize_rcu();

>                  kvm_put_kvm(matrix_mdev->kvm);

>                  matrix_mdev->kvm = NULL;

>                  matrix_mdev->kvm_busy = false;
Jason Gunthorpe May 19, 2021, 1:02 p.m. UTC | #33
On Wed, May 19, 2021 at 01:22:56PM +0200, Christian Borntraeger wrote:
> 

> 

> On 19.05.21 10:17, Christian Borntraeger wrote:

> > 

> > 

> > On 19.05.21 01:27, Halil Pasic wrote:

> > > On Tue, 18 May 2021 19:01:42 +0200

> > > Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> > > 

> > > > On 18.05.21 17:33, Halil Pasic wrote:

> > > > > On Tue, 18 May 2021 15:59:36 +0200

> > > > > Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> > > [..]

> > > > > > > > 

> > > > > > > > Would it help, if the code in priv.c would read the hook once

> > > > > > > > and then only work on the copy? We could protect that with rcu

> > > > > > > > and do a synchronize rcu in vfio_ap_mdev_unset_kvm after

> > > > > > > > unsetting the pointer?

> > > > > 

> > > > > Unfortunately just "the hook" is ambiguous in this context. We

> > > > > have kvm->arch.crypto.pqap_hook that is supposed to point to

> > > > > a struct kvm_s390_module_hook member of struct ap_matrix_mdev

> > > > > which is also called pqap_hook. And struct kvm_s390_module_hook

> > > > > has function pointer member named "hook".

> > > > 

> > > > I was referring to the full struct.

> > > > > > > 

> > > > > > > I'll look into this.

> > > > > > 

> > > > > > I think it could work. in priv.c use rcu_readlock, save the

> > > > > > pointer, do the check and call, call rcu_read_unlock.

> > > > > > In vfio_ap use rcu_assign_pointer to set the pointer and

> > > > > > after setting it to zero call sychronize_rcu.

> > > > > 

> > > > > In my opinion, we should make the accesses to the

> > > > > kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm

> > > > > not sure if that is what you are proposing. How do we usually

> > > > > do synchronisation on the stuff that lives in kvm->arch?

> > > > 

> > > > RCU is a method of synchronization. We  make sure that structure

> > > > pqap_hook is still valid as long as we are inside the rcu read

> > > > lock. So the idea is: clear pointer, wait until all old readers

> > > > have finished and the proceed with getting rid of the structure.

> > > 

> > > Yes I know that RCU is a method of synchronization, but I'm not

> > > very familiar with it. I'm a little confused by "read the hook

> > > once and then work on a copy". I guess, I would have to read up

> > > on the RCU again to get clarity. I intend to brush up my RCU knowledge

> > > once the patch comes along. I would be glad to have your help when

> > > reviewing an RCU based solution for this.

> > 

> > Just had a quick look. Its not trivial, as the hook function itself

> > takes a mutex and an rcu section must not sleep. Will have a deeper

> > look.

> 

> 

> As a quick hack something like this could work. The whole locking is pretty

> complicated and this makes it even more complex so we might want to do

> a cleanup/locking rework later on.

> 

> 

> index 9928f785c677..fde6e02aab54 100644

> +++ b/arch/s390/kvm/priv.c

> @@ -609,6 +609,7 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)

>   */

>  static int handle_pqap(struct kvm_vcpu *vcpu)

>  {

> +       struct kvm_s390_module_hook *pqap_hook;

>         struct ap_queue_status status = {};

>         unsigned long reg0;

>         int ret;

> @@ -657,14 +658,21 @@ static int handle_pqap(struct kvm_vcpu *vcpu)

>          * Verify that the hook callback is registered, lock the owner

>          * and call the hook.

>          */

> -       if (vcpu->kvm->arch.crypto.pqap_hook) {

> -               if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))

> +       rcu_read_lock();

> +       pqap_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);

> +       if (pqap_hook) {

> +               if (!try_module_get(pqap_hook->owner)) {


module locking doesn't prevent driver unbinding

> +                       rcu_read_unlock();

>                         return -EOPNOTSUPP;

> -               ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);

> -               module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);

> +               }

> +               rcu_read_unlock();

> +               ret = pqap_hook->hook(vcpu);


So taking the pointer out of the rcu still isn't protected.

Unless this is super performance critical, just use a rw sem

Jason

Patch

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index b2c7e10dfdcd..f90c9103dac2 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -26,6 +26,7 @@ 
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);
 
 static int match_apqn(struct device *dev, const void *data)
 {
@@ -366,17 +367,7 @@  static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
 	mutex_lock(&matrix_dev->lock);
-
-	/*
-	 * If the KVM pointer is in flux or the guest is running, disallow
-	 * un-assignment of control domain.
-	 */
-	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
-		mutex_unlock(&matrix_dev->lock);
-		return -EBUSY;
-	}
-
-	vfio_ap_mdev_reset_queues(mdev);
+	vfio_ap_mdev_unset_kvm(matrix_mdev);
 	list_del(&matrix_mdev->node);
 	kfree(matrix_mdev);
 	mdev_set_drvdata(mdev, NULL);