diff mbox series

[RFC,3/8] qaic: Create char dev

Message ID 1589465266-20056-4-git-send-email-jhugo@codeaurora.org
State New
Headers show
Series Qualcomm Cloud AI 100 driver | expand

Commit Message

Jeffrey Hugo May 14, 2020, 2:07 p.m. UTC
Now that we can fully boot the device, we should start making it usable.
The primary interface to the device will be via a char dev.  Add the
necessary framework to detect when the device is fully booted, and create
the char dev at that point.  The device is only usable when it is fully
booted.  The char dev does nothing useful yet, but we can easily build on
this to provide functionality.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 drivers/misc/qaic/qaic.h     |  19 +++
 drivers/misc/qaic/qaic_drv.c | 303 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 321 insertions(+), 1 deletion(-)

Comments

Greg KH May 14, 2020, 2:12 p.m. UTC | #1
On Thu, May 14, 2020 at 08:07:41AM -0600, Jeffrey Hugo wrote:
>  /* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
>  
> +#include <linux/cdev.h>
> +#include <linux/idr.h>
> +#include <linux/list.h>
> +#include <linux/kref.h>
> +#include <linux/mhi.h>
>  #include <linux/module.h>
>  #include <linux/msi.h>
> +#include <linux/mutex.h>
>  #include <linux/pci.h>
>  #include <linux/pci_ids.h>
>  
> @@ -13,9 +19,242 @@
>  #define PCI_DEV_AIC100			0xa100
>  
>  #define QAIC_NAME			"Qualcomm Cloud AI 100"
> +#define QAIC_MAX_MINORS			256

Why have a max?

Why not just use a misc device so you make the logic a lot simple, no
class or chardev logic to mess with at all.

thanks,

greg k-h
Jeffrey Hugo May 14, 2020, 4:24 p.m. UTC | #2
On 5/14/2020 9:56 AM, Greg KH wrote:
> On Thu, May 14, 2020 at 09:05:30AM -0600, Jeffrey Hugo wrote:
>> Wow, thank you for the near immediate response.  I'm am quite impressed.
>>
>> On 5/14/2020 8:12 AM, Greg KH wrote:
>>> On Thu, May 14, 2020 at 08:07:41AM -0600, Jeffrey Hugo wrote:
>>>>    /* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
>>>> +#include <linux/cdev.h>
>>>> +#include <linux/idr.h>
>>>> +#include <linux/list.h>
>>>> +#include <linux/kref.h>
>>>> +#include <linux/mhi.h>
>>>>    #include <linux/module.h>
>>>>    #include <linux/msi.h>
>>>> +#include <linux/mutex.h>
>>>>    #include <linux/pci.h>
>>>>    #include <linux/pci_ids.h>
>>>> @@ -13,9 +19,242 @@
>>>>    #define PCI_DEV_AIC100			0xa100
>>>>    #define QAIC_NAME			"Qualcomm Cloud AI 100"
>>>> +#define QAIC_MAX_MINORS			256
>>>
>>> Why have a max?
>>>
>>> Why not just use a misc device so you make the logic a lot simple, no
>>> class or chardev logic to mess with at all.
>>
>> It was our understanding that the preference is not to add new misc devices.
> 
> Huh, who said that?  Not the char/misc maintainer (i.e. me) :)
> 
>> As I go and try to find a supporting reference for that, I cannot find one,
>> so I'm not entirely sure where that idea came from.
>>
>> In addition, we see that the Habana Labs driver also uses chardev, and has
>> chosen the same max.  We assumed that since their driver is already
>> accepted, using the same mechanisms where applicable would be the preferred
>> approach.
> 
> They had good reasons why not to use a chardev and convinced me of it.
> If you can't come up with them, then stick with a misc for now please.

Interesting.  I didn't see any discussion on this.

>> Specific to the max, 256 was chosen as being a factor larger than the
>> largest system we have, therefore we figured it wouldn't be hit for a long
>> while even as we try to have a look at what might happen down the road.
>> Looking at the Habana code, it looks like they have the same value for much
>> of the same reasons, although their usecases may vary from ours somewhat.
> 
> Max numbers for no good reason are not a good thing to have.
> 
>> At this time, I don't think we have a strong requirement for a chardev, so
>> we could investigate a switch over to a misc dev if you would prefer that
>> over following the Habana Labs precedent.  All I ask is a confirmation that
>> is the approach you would like to see going forward after reviewing the
>> above.
> 
> Please use misc.

Ok, will investigate.
Greg KH May 16, 2020, 7:01 a.m. UTC | #3
On Fri, May 15, 2020 at 03:08:59PM -0600, Jeffrey Hugo wrote:
> 2. There are a limited number of dynamic minor numbers for misc devs (64),
> so if you are expecting more devices than that, a misc dev is not
> appropiate.  Also, these minors are shared with other misc dev users, so
> depending on the system configuration, you might have significantly less
> than 64 minors available for use.

I'm pretty sure we can have more than 64 misc devices, that limitation
should have been removed a while ago.  Try it and see :)

greg k-h
Jeffrey Hugo May 16, 2020, 9:29 p.m. UTC | #4
On 5/16/2020 1:01 AM, Greg KH wrote:
> On Fri, May 15, 2020 at 03:08:59PM -0600, Jeffrey Hugo wrote:
>> 2. There are a limited number of dynamic minor numbers for misc devs (64),
>> so if you are expecting more devices than that, a misc dev is not
>> appropiate.  Also, these minors are shared with other misc dev users, so
>> depending on the system configuration, you might have significantly less
>> than 64 minors available for use.
> 
> I'm pretty sure we can have more than 64 misc devices, that limitation
> should have been removed a while ago.  Try it and see :)

In total, there can be more tha 64 misc devices.  However my previous 
comment was specific to dynamic minors (ie devices which do not have an 
assigned minor).  The limit on dynamic minors still apears to be 64. 
Looking at the code -

DYNAMIC_MINORS is still 64
https://elixir.bootlin.com/linux/v5.7-rc5/source/drivers/char/misc.c#L63

I see the same in -next

DYNAMIC_MINORS is used to size a bitmap - one bit for each dynamic minor 
misc device that exists at one particular point in time.  After all 64 
bits are consumed by misc_register() by clients requesting a dynamic 
minor, no more dynamic minor misc devices can be registered until some 
are unregistered.

What am I missing?
diff mbox series

Patch

diff --git a/drivers/misc/qaic/qaic.h b/drivers/misc/qaic/qaic.h
index 379aa82..58ca167 100644
--- a/drivers/misc/qaic/qaic.h
+++ b/drivers/misc/qaic/qaic.h
@@ -6,13 +6,32 @@ 
 #ifndef QAICINTERNAL_H_
 #define QAICINTERNAL_H_
 
+#include <linux/cdev.h>
+#include <linux/kref.h>
 #include <linux/mhi.h>
+#include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/srcu.h>
+
+struct qaic_user {
+	pid_t			handle;
+	struct qaic_device	*qdev;
+	struct list_head	node;
+	struct srcu_struct	qdev_lock;
+	struct kref		ref_count;
+};
 
 struct qaic_device {
 	struct pci_dev		*pdev;
 	int			bars;
 	void __iomem		*bar_0;
 	struct mhi_controller	*mhi_cntl;
+	struct mhi_device	*cntl_ch;
+	struct cdev		*cdev;
+	struct device		*dev;
+	bool			in_reset;
+	struct srcu_struct	dev_lock;
+	struct list_head	users;
+	struct mutex		users_mutex;
 };
 #endif /* QAICINTERNAL_H_ */
diff --git a/drivers/misc/qaic/qaic_drv.c b/drivers/misc/qaic/qaic_drv.c
index b624daa..6e4b936 100644
--- a/drivers/misc/qaic/qaic_drv.c
+++ b/drivers/misc/qaic/qaic_drv.c
@@ -2,8 +2,14 @@ 
 
 /* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
 
+#include <linux/cdev.h>
+#include <linux/idr.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/mhi.h>
 #include <linux/module.h>
 #include <linux/msi.h>
+#include <linux/mutex.h>
 #include <linux/pci.h>
 #include <linux/pci_ids.h>
 
@@ -13,9 +19,242 @@ 
 #define PCI_DEV_AIC100			0xa100
 
 #define QAIC_NAME			"Qualcomm Cloud AI 100"
+#define QAIC_MAX_MINORS			256
 
+static int qaic_major;
+static struct class *qaic_class;
+static DEFINE_IDR(qaic_devs);
+static DEFINE_MUTEX(qaic_devs_lock);
 static bool link_up;
 
+static int qaic_device_open(struct inode *inode, struct file *filp);
+static int qaic_device_release(struct inode *inode, struct file *filp);
+
+static const struct file_operations qaic_ops = {
+	.owner = THIS_MODULE,
+	.open = qaic_device_open,
+	.release = qaic_device_release,
+};
+
+static void free_usr(struct kref *kref)
+{
+	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
+
+	list_del(&usr->node);
+	cleanup_srcu_struct(&usr->qdev_lock);
+	kfree(usr);
+}
+
+static int qaic_device_open(struct inode *inode, struct file *filp)
+{
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	int rcu_id;
+	int ret;
+
+	ret = mutex_lock_interruptible(&qaic_devs_lock);
+	if (ret)
+		return ret;
+	qdev = idr_find(&qaic_devs, iminor(inode));
+	mutex_unlock(&qaic_devs_lock);
+
+	pci_dbg(qdev->pdev, "%s pid:%d\n", __func__, current->pid);
+
+	rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return -ENODEV;
+	}
+
+	usr = kmalloc(sizeof(*usr), GFP_KERNEL);
+	if (!usr)
+		return -ENOMEM;
+
+	usr->handle = current->pid;
+	usr->qdev = qdev;
+	init_srcu_struct(&usr->qdev_lock);
+	kref_init(&usr->ref_count);
+
+	ret = mutex_lock_interruptible(&qdev->users_mutex);
+	if (ret) {
+		cleanup_srcu_struct(&usr->qdev_lock);
+		kfree(usr);
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return ret;
+	}
+
+	list_add(&usr->node, &qdev->users);
+	mutex_unlock(&qdev->users_mutex);
+
+	filp->private_data = usr;
+	nonseekable_open(inode, filp);
+
+	srcu_read_unlock(&qdev->dev_lock, rcu_id);
+	return 0;
+}
+
+static int qaic_device_release(struct inode *inode, struct file *filp)
+{
+	struct qaic_user *usr = filp->private_data;
+	struct qaic_device *qdev = usr->qdev;
+	int qdev_rcu_id;
+	int usr_rcu_id;
+
+	usr_rcu_id = srcu_read_lock(&usr->qdev_lock);
+	if (qdev) {
+		qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+		if (!qdev->in_reset) {
+			pci_dbg(qdev->pdev, "%s pid:%d\n", __func__,
+								current->pid);
+		}
+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+
+		srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+		mutex_lock(&qdev->users_mutex);
+		kref_put(&usr->ref_count, free_usr);
+		mutex_unlock(&qdev->users_mutex);
+	} else {
+		srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+		/* safe to do without the mutex because reset already has ref */
+		kref_put(&usr->ref_count, free_usr);
+	}
+
+	filp->private_data = NULL;
+	return 0;
+}
+
+static int qaic_mhi_probe(struct mhi_device *mhi_dev,
+			  const struct mhi_device_id *id)
+{
+	struct qaic_device *qdev;
+	dev_t devno;
+	int ret;
+
+	/*
+	 * Invoking this function indicates that the control channel to the
+	 * device is available.  We use that as a signal to indicate that
+	 * the device side firmware has booted.  The device side firmware
+	 * manages the device resources, so we need to communicate with it
+	 * via the control channel in order to utilize the device.  Therefore
+	 * we wait until this signal to create the char dev that userspace will
+	 * use to control the device, because without the device side firmware,
+	 * userspace can't do anything useful.
+	 */
+
+	qdev = (struct qaic_device *)pci_get_drvdata(
+				to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
+
+	pci_dbg(qdev->pdev, "%s\n", __func__);
+	qdev->in_reset = false;
+
+	dev_set_drvdata(&mhi_dev->dev, qdev);
+	qdev->cntl_ch = mhi_dev;
+
+	mutex_lock(&qaic_devs_lock);
+	ret = idr_alloc(&qaic_devs, qdev, 0, QAIC_MAX_MINORS, GFP_KERNEL);
+	mutex_unlock(&qaic_devs_lock);
+
+	if (ret < 0) {
+		pci_dbg(qdev->pdev, "%s: idr_alloc failed %d\n", __func__, ret);
+		goto err;
+	}
+
+	devno = MKDEV(qaic_major, ret);
+
+	qdev->cdev = cdev_alloc();
+	if (!qdev->cdev) {
+		pci_dbg(qdev->pdev, "%s: cdev_alloc failed\n", __func__);
+		ret = -ENOMEM;
+		goto free_idr;
+	}
+
+	qdev->cdev->owner = THIS_MODULE;
+	qdev->cdev->ops = &qaic_ops;
+	ret = cdev_add(qdev->cdev, devno, 1);
+	if (ret) {
+		pci_dbg(qdev->pdev, "%s: cdev_add failed %d\n", __func__, ret);
+		goto free_cdev;
+	}
+
+	qdev->dev = device_create(qaic_class, NULL, devno, NULL,
+				  "qaic_aic100_%04x:%02x:%02x.%d",
+				  pci_domain_nr(qdev->pdev->bus),
+				  qdev->pdev->bus->number,
+				  PCI_SLOT(qdev->pdev->devfn),
+				  PCI_FUNC(qdev->pdev->devfn));
+	if (IS_ERR(qdev->dev)) {
+		ret = PTR_ERR(qdev->dev);
+		pci_dbg(qdev->pdev, "%s: device_create failed %d\n", __func__, ret);
+		goto free_cdev;
+	}
+
+	dev_set_drvdata(qdev->dev, qdev);
+
+	return 0;
+
+free_cdev:
+	cdev_del(qdev->cdev);
+free_idr:
+	mutex_lock(&qaic_devs_lock);
+	idr_remove(&qaic_devs, MINOR(devno));
+	mutex_unlock(&qaic_devs_lock);
+err:
+	return ret;
+}
+
+static void qaic_mhi_remove(struct mhi_device *mhi_dev)
+{
+}
+
+static void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
+				struct mhi_result *mhi_result)
+{
+}
+
+static void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
+				struct mhi_result *mhi_result)
+{
+}
+
+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
+{
+	struct qaic_user *usr;
+	struct qaic_user *u;
+	dev_t devno;
+
+	qdev->in_reset = true;
+	synchronize_srcu(&qdev->dev_lock);
+
+	/*
+	 * while the usr still has access to the qdev, use the mutex to add
+	 * a reference for later.  This makes sure the usr can't disappear on
+	 * us at the wrong time.  The mutex use in close() system call handling
+	 * makes sure the usr will be valid or complete not exist here.
+	 */
+	mutex_lock(&qdev->users_mutex);
+	list_for_each_entry_safe(usr, u, &qdev->users, node)
+		kref_get(&usr->ref_count);
+	mutex_unlock(&qdev->users_mutex);
+
+	/* remove chardev to prevent new users from coming in */
+	if (qdev->dev) {
+		devno = qdev->dev->devt;
+		qdev->dev = NULL;
+		device_destroy(qaic_class, devno);
+		cdev_del(qdev->cdev);
+		mutex_lock(&qaic_devs_lock);
+		idr_remove(&qaic_devs, MINOR(devno));
+		mutex_unlock(&qaic_devs_lock);
+	}
+
+	/* make existing users get unresolvable errors until they close FDs */
+	list_for_each_entry_safe(usr, u, &qdev->users, node) {
+		usr->qdev = NULL;
+		synchronize_srcu(&usr->qdev_lock);
+		kref_put(&usr->ref_count, free_usr);
+	}
+}
+
 static int qaic_pci_probe(struct pci_dev *pdev,
 			  const struct pci_device_id *id)
 {
@@ -33,6 +272,9 @@  static int qaic_pci_probe(struct pci_dev *pdev,
 
 	pci_set_drvdata(pdev, qdev);
 	qdev->pdev = pdev;
+	init_srcu_struct(&qdev->dev_lock);
+	INIT_LIST_HEAD(&qdev->users);
+	mutex_init(&qdev->users_mutex);
 
 	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
 
@@ -107,6 +349,7 @@  static int qaic_pci_probe(struct pci_dev *pdev,
 enable_fail:
 	pci_set_drvdata(pdev, NULL);
 bar_fail:
+	cleanup_srcu_struct(&qdev->dev_lock);
 	kfree(qdev);
 qdev_fail:
 	return ret;
@@ -120,6 +363,7 @@  static void qaic_pci_remove(struct pci_dev *pdev)
 	if (!qdev)
 		return;
 
+	qaic_dev_reset_clean_local_state(qdev);
 	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
 	pci_free_irq_vectors(pdev);
 	iounmap(qdev->bar_0);
@@ -127,9 +371,27 @@  static void qaic_pci_remove(struct pci_dev *pdev)
 	pci_release_selected_regions(pdev, qdev->bars);
 	pci_disable_device(pdev);
 	pci_set_drvdata(pdev, NULL);
+	cleanup_srcu_struct(&qdev->dev_lock);
 	kfree(qdev);
 }
 
+static const struct mhi_device_id qaic_mhi_match_table[] = {
+	{ .chan = "QAIC_CONTROL", },
+	{},
+};
+
+static struct mhi_driver qaic_mhi_driver = {
+	.id_table = qaic_mhi_match_table,
+	.remove = qaic_mhi_remove,
+	.probe = qaic_mhi_probe,
+	.ul_xfer_cb = qaic_mhi_ul_xfer_cb,
+	.dl_xfer_cb = qaic_mhi_dl_xfer_cb,
+	.driver = {
+		.name = "qaic_mhi",
+		.owner = THIS_MODULE,
+	},
+};
+
 static const struct pci_device_id ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
 	{ 0, }
@@ -146,13 +408,48 @@  static struct pci_driver qaic_pci_driver = {
 static int __init qaic_init(void)
 {
 	int ret;
+	dev_t dev;
 
 	pr_debug("qaic: init\n");
 
+	ret = alloc_chrdev_region(&dev, 0, QAIC_MAX_MINORS, QAIC_NAME);
+	if (ret < 0) {
+		pr_debug("qaic: alloc_chrdev_region failed %d\n", ret);
+		goto out;
+	}
+
+	qaic_major = MAJOR(dev);
+
+	qaic_class = class_create(THIS_MODULE, QAIC_NAME);
+	if (IS_ERR(qaic_class)) {
+		ret = PTR_ERR(qaic_class);
+		pr_debug("qaic: class_create failed %d\n", ret);
+		goto free_major;
+	}
+
+	ret = mhi_driver_register(&qaic_mhi_driver);
+	if (ret) {
+		pr_debug("qaic: mhi_driver_register failed %d\n", ret);
+		goto free_class;
+	}
+
 	ret = pci_register_driver(&qaic_pci_driver);
 
-	pr_debug("qaic: init success\n");
+	if (ret) {
+		pr_debug("qaic: pci_register_driver failed %d\n", ret);
+		goto free_mhi;
+	}
 
+	pr_debug("qaic: init success\n");
+	goto out;
+
+free_mhi:
+	mhi_driver_unregister(&qaic_mhi_driver);
+free_class:
+	class_destroy(qaic_class);
+free_major:
+	unregister_chrdev_region(MKDEV(qaic_major, 0), QAIC_MAX_MINORS);
+out:
 	return ret;
 }
 
@@ -161,6 +458,10 @@  static void __exit qaic_exit(void)
 	pr_debug("qaic: exit\n");
 	link_up = true;
 	pci_unregister_driver(&qaic_pci_driver);
+	mhi_driver_unregister(&qaic_mhi_driver);
+	class_destroy(qaic_class);
+	unregister_chrdev_region(MKDEV(qaic_major, 0), QAIC_MAX_MINORS);
+	idr_destroy(&qaic_devs);
 }
 
 module_init(qaic_init);