Message ID | 20220419113435.246203-13-krzysztof.kozlowski@linaro.org |
---|---|
State | Accepted |
Commit | 42cd402b8fd4672b692400fe5f9eecd55d2794ac |
Series | Fix broken usage of driver_override (and kfree of static memory) |
Hi Krzysztof,

On 19.04.2022 13:34, Krzysztof Kozlowski wrote:
> The driver_override field from platform driver should not be initialized
> from static memory (string literal) because the core later kfree() it,
> for example when driver_override is set via sysfs.
>
> Use dedicated helper to set driver_override properly.
>
> Fixes: 950a7388f02b ("rpmsg: Turn name service into a stand alone driver")
> Fixes: c0cdc19f84a4 ("rpmsg: Driver for user space endpoint interface")
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

This patch landed recently in linux-next as commit 42cd402b8fd4 ("rpmsg:
Fix kfree() of static memory on setting driver_override"). In my tests I
found that it triggers the following issue during boot of the
DragonBoard410c SBC (arch/arm64/boot/dts/qcom/apq8016-sbc.dtb):

------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 1 PID: 8 at kernel/locking/mutex.c:582 __mutex_lock+0x1ec/0x430
Modules linked in:
CPU: 1 PID: 8 Comm: kworker/u8:0 Not tainted 5.18.0-rc4-next-20220429 #11815
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
Workqueue: events_unbound deferred_probe_work_func
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mutex_lock+0x1ec/0x430
lr : __mutex_lock+0x1ec/0x430
..
Call trace:
 __mutex_lock+0x1ec/0x430
 mutex_lock_nested+0x38/0x64
 driver_set_override+0x124/0x150
 qcom_smd_register_edge+0x2a8/0x4ec
 qcom_smd_probe+0x54/0x80
 platform_probe+0x68/0xe0
 really_probe.part.0+0x9c/0x29c
 __driver_probe_device+0x98/0x144
 driver_probe_device+0xac/0x14c
 __device_attach_driver+0xb8/0x120
 bus_for_each_drv+0x78/0xd0
 __device_attach+0xd8/0x180
 device_initial_probe+0x14/0x20
 bus_probe_device+0x9c/0xa4
 deferred_probe_work_func+0x88/0xc4
 process_one_work+0x288/0x6bc
 worker_thread+0x248/0x450
 kthread+0x118/0x11c
 ret_from_fork+0x10/0x20
irq event stamp: 3599
hardirqs last enabled at (3599): [<ffff80000919053c>] _raw_spin_unlock_irqrestore+0x98/0x9c
hardirqs last disabled at (3598): [<ffff800009190ba4>] _raw_spin_lock_irqsave+0xc0/0xcc
softirqs last enabled at (3554): [<ffff800008010470>] _stext+0x470/0x5e8
softirqs last disabled at (3549): [<ffff8000080a4514>] __irq_exit_rcu+0x180/0x1ac
---[ end trace 0000000000000000 ]---

I don't see any direct relation between the $subject and the above log,
but reverting the $subject on top of linux next-20220429 hides/fixes it.
Maybe there is a kind of memory trashing somewhere there and your change
only revealed it?

Best regards

On 29/04/2022 14:29, Marek Szyprowski wrote:
> This patch landed recently in linux-next as commit 42cd402b8fd4 ("rpmsg:
> Fix kfree() of static memory on setting driver_override"). In my tests I
> found that it triggers the following issue during boot of the
> DragonBoard410c SBC (arch/arm64/boot/dts/qcom/apq8016-sbc.dtb):
>
> [...]
>
> I don't see any direct relation between the $subject and the above log,
> but reverting the $subject on top of linux next-20220429 hides/fixes it.
> Maybe there is a kind of memory trashing somewhere there and your change
> only revealed it?

Thanks for the report. I think the error path of my patch is wrong - I
should not kfree(rpdev->driver_override) from the rpmsg code. That's the
only thing I see now...

Could you test following patch and tell if it helps?
https://pastebin.ubuntu.com/p/rp3q9Z5fXj/

-----
diff --git a/drivers/rpmsg/rpmsg_internal.h b/drivers/rpmsg/rpmsg_internal.h
index 3e81642238d2..1e2ad944e2ec 100644
--- a/drivers/rpmsg/rpmsg_internal.h
+++ b/drivers/rpmsg/rpmsg_internal.h
@@ -102,11 +102,7 @@ static inline int rpmsg_ctrldev_register_device(struct rpmsg_device *rpdev)
 	if (ret)
 		return ret;
 
-	ret = rpmsg_register_device(rpdev);
-	if (ret)
-		kfree(rpdev->driver_override);
-
-	return ret;
+	return rpmsg_register_device(rpdev);
 }
 
 #endif
diff --git a/drivers/rpmsg/rpmsg_ns.c b/drivers/rpmsg/rpmsg_ns.c
index 8eb8f328237e..f26078467899 100644
--- a/drivers/rpmsg/rpmsg_ns.c
+++ b/drivers/rpmsg/rpmsg_ns.c
@@ -31,11 +31,7 @@ int rpmsg_ns_register_device(struct rpmsg_device *rpdev)
 	rpdev->src = RPMSG_NS_ADDR;
 	rpdev->dst = RPMSG_NS_ADDR;
 
-	ret = rpmsg_register_device(rpdev);
-	if (ret)
-		kfree(rpdev->driver_override);
-
-	return ret;
+	return rpmsg_register_device(rpdev);
 }
 EXPORT_SYMBOL(rpmsg_ns_register_device);

On 29.04.2022 16:16, Krzysztof Kozlowski wrote:
> [...]
> Thanks for the report. I think the error path of my patch is wrong - I
> should not kfree(rpdev->driver_override) from the rpmsg code. That's the
> only thing I see now...
>
> Could you test following patch and tell if it helps?
> https://pastebin.ubuntu.com/p/rp3q9Z5fXj/

This doesn't help, the issue is still reported.

Best regards

On 29/04/2022 16:51, Marek Szyprowski wrote:
> On 29.04.2022 16:16, Krzysztof Kozlowski wrote:
>> [...]
>> Could you test following patch and tell if it helps?
>> https://pastebin.ubuntu.com/p/rp3q9Z5fXj/
>
> This doesn't help, the issue is still reported.

I think I screwed up this part of the code. The new helper uses
device_lock() (the mutexes you see in the backtrace), but in rpmsg it is
called before device_register(), which initializes the device.

I don't have a device using qcom-smd rpmsg, so it's a bit tricky to
reproduce.

Best regards,
Krzysztof
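
The diagnosis in the last reply can be illustrated with a small userspace model. This is a sketch, not kernel code: `model_mutex`, `model_device`, `model_device_init()`, `model_set_override()` and `model_run()` are hypothetical names invented for this illustration. It assumes a debug mutex that stores its own address in a `magic` field when it is initialized, mirroring the `lock->magic != lock` check from the warning, and shows that locking a zero-filled, not-yet-initialized lock trips that check while locking after initialization does not.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a debug mutex: 'magic' points back at the
 * lock itself once the lock has been initialized. */
struct model_mutex {
	struct model_mutex *magic;
	int locked;
};

/* Hypothetical stand-in for a device that embeds its own lock. */
struct model_device {
	struct model_mutex mutex;
	char override[32];
};

static int warnings;	/* number of simulated lock-debug warnings */

static void model_mutex_lock(struct model_mutex *lock)
{
	/* Same shape as the check quoted in the report. */
	if (lock->magic != lock) {
		warnings++;
		fprintf(stderr, "WARN: lock->magic != lock\n");
	}
	lock->locked = 1;
}

static void model_mutex_unlock(struct model_mutex *lock)
{
	assert(lock->locked);
	lock->locked = 0;
}

/* Models only the "initializes the device" step named in the thread. */
static void model_device_init(struct model_device *dev)
{
	dev->mutex.magic = &dev->mutex;
	dev->mutex.locked = 0;
}

/* Models a helper that takes the device lock while storing the name. */
static void model_set_override(struct model_device *dev, const char *name)
{
	model_mutex_lock(&dev->mutex);
	snprintf(dev->override, sizeof(dev->override), "%s", name);
	model_mutex_unlock(&dev->mutex);
}

/* Returns how many warnings the chosen call order produces. */
int model_run(int init_first)
{
	struct model_device dev;

	memset(&dev, 0, sizeof(dev));	/* zero-filled, like a fresh allocation */
	warnings = 0;

	if (init_first)
		model_device_init(&dev);
	model_set_override(&dev, "rpmsg_ns");
	if (!init_first)
		model_device_init(&dev);

	return warnings;
}
```

In this model, `model_run(0)` (override set before initialization) counts one warning and `model_run(1)` counts none. It only demonstrates the ordering problem described above; it makes no claim about how the rpmsg code should be restructured.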

diff --git a/drivers/rpmsg/rpmsg_internal.h b/drivers/rpmsg/rpmsg_internal.h
index d4b23fd019a8..3e81642238d2 100644
--- a/drivers/rpmsg/rpmsg_internal.h
+++ b/drivers/rpmsg/rpmsg_internal.h
@@ -94,10 +94,19 @@ int rpmsg_release_channel(struct rpmsg_device *rpdev,
  */
 static inline int rpmsg_ctrldev_register_device(struct rpmsg_device *rpdev)
 {
+	int ret;
+
 	strcpy(rpdev->id.name, "rpmsg_ctrl");
-	rpdev->driver_override = "rpmsg_ctrl";
+	ret = driver_set_override(&rpdev->dev, &rpdev->driver_override,
+				  rpdev->id.name, strlen(rpdev->id.name));
+	if (ret)
+		return ret;
+
+	ret = rpmsg_register_device(rpdev);
+	if (ret)
+		kfree(rpdev->driver_override);
 
-	return rpmsg_register_device(rpdev);
+	return ret;
 }
 
 #endif
diff --git a/drivers/rpmsg/rpmsg_ns.c b/drivers/rpmsg/rpmsg_ns.c
index 762ff1ae279f..8eb8f328237e 100644
--- a/drivers/rpmsg/rpmsg_ns.c
+++ b/drivers/rpmsg/rpmsg_ns.c
@@ -20,12 +20,22 @@
  */
 int rpmsg_ns_register_device(struct rpmsg_device *rpdev)
 {
+	int ret;
+
 	strcpy(rpdev->id.name, "rpmsg_ns");
-	rpdev->driver_override = "rpmsg_ns";
+	ret = driver_set_override(&rpdev->dev, &rpdev->driver_override,
+				  rpdev->id.name, strlen(rpdev->id.name));
+	if (ret)
+		return ret;
+
 	rpdev->src = RPMSG_NS_ADDR;
 	rpdev->dst = RPMSG_NS_ADDR;
 
-	return rpmsg_register_device(rpdev);
+	ret = rpmsg_register_device(rpdev);
+	if (ret)
+		kfree(rpdev->driver_override);
+
+	return ret;
 }
 EXPORT_SYMBOL(rpmsg_ns_register_device);
 
diff --git a/include/linux/rpmsg.h b/include/linux/rpmsg.h
index 02fa9116cd60..20c8cd1cde21 100644
--- a/include/linux/rpmsg.h
+++ b/include/linux/rpmsg.h
@@ -41,7 +41,9 @@ struct rpmsg_channel_info {
  * rpmsg_device - device that belong to the rpmsg bus
  * @dev: the device struct
  * @id: device id (used to match between rpmsg drivers and devices)
- * @driver_override: driver name to force a match
+ * @driver_override: driver name to force a match; do not set directly,
+ *                   because core frees it; use driver_set_override() to
+ *                   set or clear it.
  * @src: local address
  * @dst: destination address
  * @ept: the rpmsg endpoint of this channel
@@ -51,7 +53,7 @@ struct rpmsg_channel_info {
 struct rpmsg_device {
 	struct device dev;
 	struct rpmsg_device_id id;
-	char *driver_override;
+	const char *driver_override;
 	u32 src;
 	u32 dst;
 	struct rpmsg_endpoint *ept;
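
The ownership rule behind the patch (a field that will later be freed must never point at a string literal) can be sketched in plain userspace C. The function below is a hypothetical analogue written for illustration only; `model_set_override_str()` is not a kernel API, it uses `malloc()`/`free()` in place of the kernel allocator, and it does no locking. The one property it is meant to show is that the slot only ever holds `NULL` or heap memory, so freeing the previous value is always safe.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of a "set override" helper.
 * '*slot' must be NULL or a pointer previously stored by this function.
 * Returns 0 on success, -1 if the allocation fails.
 */
int model_set_override_str(char **slot, const char *s, size_t len)
{
	char *copy;

	if (len == 0) {
		/* Model choice: an empty string clears the override. */
		free(*slot);
		*slot = NULL;
		return 0;
	}

	copy = malloc(len + 1);
	if (!copy)
		return -1;	/* previous value is left untouched */

	memcpy(copy, s, len);
	copy[len] = '\0';

	free(*slot);		/* safe: old value is heap memory or NULL */
	*slot = copy;
	return 0;
}
```

Assigning a literal directly (for example `slot = "rpmsg_ns";`) and later passing the slot to `free()` would be undefined behaviour in this model, which is the userspace counterpart of the kfree()-of-static-memory problem named in the commit subject.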