diff mbox series

media: qcom: camss: fix error path on configuration of power domains

Message ID 20240806221204.1560258-1-vladimir.zapolskiy@linaro.org
State Superseded
Headers show
Series media: qcom: camss: fix error path on configuration of power domains | expand

Commit Message

Vladimir Zapolskiy Aug. 6, 2024, 10:12 p.m. UTC
There is a chance to meet runtime issues during configuration of CAMSS
power domains, because on the error path dev_pm_domain_detach() is
unexpectedly called with NULL or error pointer.

Fixes: 23aa4f0cd327 ("media: qcom: camss: Move VFE power-domain specifics into vfe.c")
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
---
 drivers/media/platform/qcom/camss/camss.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

Comments

Vladimir Zapolskiy Aug. 6, 2024, 11:27 p.m. UTC | #1
Hi Bryan.

On 8/7/24 02:15, Bryan O'Donoghue wrote:
> On 06/08/2024 23:12, Vladimir Zapolskiy wrote:
>> There is a chance to meet runtime issues during configuration of CAMSS
>> power domains, because on the error path dev_pm_domain_detach() is
>> unexpectedly called with NULL or error pointer.
>>
>> Fixes: 23aa4f0cd327 ("media: qcom: camss: Move VFE power-domain specifics into vfe.c")
>> Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
> 
> Have you tested this with and without named power domains in your dts ?
> The logic here is complex to support both the legacy non-named case and
> the updated named required case.

The problem and the fix are pretty straightforward, if you notice any issues
with it, please let me know.

As it's said in the commit description the problem is unrelated to named/not named
power domains, I tested the fix only on a platform without "power-domain-names"
property in camss device tree node.

> Could you also provide a backtrace of a failing camss_configure_pd() for
> the commit log.

Sure, I believe anyone can get a backtrace simply by disabling camcc at build time,
so that camss power domain supplies disappear:

[   13.541205] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001a2
[   13.550224] Mem abort info:
[   13.553110]   ESR = 0x0000000096000004
[   13.556975]   EC = 0x25: DABT (current EL), IL = 32 bits
[   13.562438]   SET = 0, FnV = 0
[   13.565580]   EA = 0, S1PTW = 0
[   13.568813]   FSC = 0x04: level 0 translation fault
[   13.573824] Data abort info:
[   13.576787]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[   13.582424]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   13.587614]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   13.593074] user pgtable: 4k pages, 48-bit VAs, pgdp=000000088a55a000
[   13.599693] [00000000000001a2] pgd=0000000000000000, p4d=0000000000000000
[   13.606666] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[   13.613104] Modules linked in:

<snip>

[   13.632753] Workqueue: events_unbound deferred_probe_work_func
[   13.638776] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[   13.645926] pc : dev_pm_domain_detach+0x8/0x48
[   13.650521] lr : camss_probe+0x374/0x9c0
[   13.654577] sp : ffff800086ec3ab0
[   13.657985] x29: ffff800086ec3ab0 x28: ffff8000855079c0 x27: ffff800085507000
[   13.665329] x26: ffff000800c4b040 x25: 0000000000000000 x24: ffff00080aa72c20
[   13.672659] x23: ffff800083060588 x22: ffff000801397010 x21: ffff800083060588
[   13.679989] x20: 00000000ffffff92 x19: ffff00080aa72880 x18: ffffffffffffffff
[   13.687318] x17: 6e6570656420676e x16: 69726f6e6769202c x15: 0000000000000000
[   13.694648] x14: 000000000000003e x13: 0000000000000000 x12: 0000000000000000
[   13.701988] x11: ffff00080b350460 x10: ffff00080b350248 x9 : ffff800081ddcde4
[   13.709318] x8 : ffff00080b350270 x7 : 0000000000000001 x6 : 8000003ff0000000
[   13.716658] x5 : ffff00080149a300 x4 : ffff000a72592b70 x3 : 0000000000076404
[   13.723998] x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffffffffffffff92
[   13.731338] Call trace:
[   13.733865]  dev_pm_domain_detach+0x8/0x48
[   13.738081]  platform_probe+0x70/0xf0
[   13.741864]  really_probe+0xc4/0x2a8
[   13.745556]  __driver_probe_device+0x80/0x140
[   13.750045]  driver_probe_device+0x48/0x170
[   13.754355]  __device_attach_driver+0xc0/0x148
[   13.758937]  bus_for_each_drv+0x88/0xf0
[   13.762894]  __device_attach+0xb0/0x1c0
[   13.766852]  device_initial_probe+0x1c/0x30
[   13.771165]  bus_probe_device+0xb4/0xc0
[   13.775124]  deferred_probe_work_func+0x90/0xd0
[   13.779787]  process_one_work+0x164/0x3e0
[   13.783920]  worker_thread+0x310/0x420
[   13.787777]  kthread+0x120/0x130
[   13.791123]  ret_from_fork+0x10/0x20
[   13.794821] Code: 828a2cb8 ffff8000 aa1e03e9 d503201f (f9410802)
[   13.801088] ---[ end trace 0000000000000000 ]---


--
Best wishes,
Vladimir
Vladimir Zapolskiy Aug. 6, 2024, 11:37 p.m. UTC | #2
On 8/7/24 02:30, Bryan O'Donoghue wrote:
> On 07/08/2024 00:27, Vladimir Zapolskiy wrote:
>> Hi Bryan.
>>
>> On 8/7/24 02:15, Bryan O'Donoghue wrote:
>>> On 06/08/2024 23:12, Vladimir Zapolskiy wrote:
>>>> There is a chance to meet runtime issues during configuration of CAMSS
>>>> power domains, because on the error path dev_pm_domain_detach() is
>>>> unexpectedly called with NULL or error pointer.
>>>>
>>>> Fixes: 23aa4f0cd327 ("media: qcom: camss: Move VFE power-domain
>>>> specifics into vfe.c")
>>>> Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
>>>
>>> Have you tested this with and without named power domains in your dts ?
>>> The logic here is complex to support both the legacy non-named case and
>>> the updated named required case.
>>
>> The problem and the fix are pretty straightforward, if you notice any
>> issues
>> with it, please let me know.
>>
>> As it's said in the commit description the problem is unrelated to
>> named/not named
>> power domains, I tested the fix only on a platform without
>> "power-domain-names"
>> property in camss device tree node.
>>
>>> Could you also provide a backtrace of a failing camss_configure_pd() for
>>> the commit log.
>>
>> Sure, I believe anyone can get a backtrace simply by disabling camcc at
>> build time,
>> so that camss power domain supplies disappear:
> 
> Ah OK, that's how, proof positive if its not tested, its not working,
> I've extensively tested both named and non-named pds but, yep never with
> camcc switched off.
> 
>>
>> [   13.541205] Unable to handle kernel NULL pointer dereference at
>> virtual address 00000000000001a2
>> [   13.550224] Mem abort info:
>> [   13.553110]   ESR = 0x0000000096000004
>> [   13.556975]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [   13.562438]   SET = 0, FnV = 0
>> [   13.565580]   EA = 0, S1PTW = 0
>> [   13.568813]   FSC = 0x04: level 0 translation fault
>> [   13.573824] Data abort info:
>> [   13.576787]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> [   13.582424]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> [   13.587614]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [   13.593074] user pgtable: 4k pages, 48-bit VAs, pgdp=000000088a55a000
>> [   13.599693] [00000000000001a2] pgd=0000000000000000,
>> p4d=0000000000000000
>> [   13.606666] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> [   13.613104] Modules linked in:
>>
>> <snip>
>>
>> [   13.632753] Workqueue: events_unbound deferred_probe_work_func
>> [   13.638776] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS
>> BTYPE=--)
>> [   13.645926] pc : dev_pm_domain_detach+0x8/0x48
>> [   13.650521] lr : camss_probe+0x374/0x9c0
>> [   13.654577] sp : ffff800086ec3ab0
>> [   13.657985] x29: ffff800086ec3ab0 x28: ffff8000855079c0 x27:
>> ffff800085507000
>> [   13.665329] x26: ffff000800c4b040 x25: 0000000000000000 x24:
>> ffff00080aa72c20
>> [   13.672659] x23: ffff800083060588 x22: ffff000801397010 x21:
>> ffff800083060588
>> [   13.679989] x20: 00000000ffffff92 x19: ffff00080aa72880 x18:
>> ffffffffffffffff
>> [   13.687318] x17: 6e6570656420676e x16: 69726f6e6769202c x15:
>> 0000000000000000
>> [   13.694648] x14: 000000000000003e x13: 0000000000000000 x12:
>> 0000000000000000
>> [   13.701988] x11: ffff00080b350460 x10: ffff00080b350248 x9 :
>> ffff800081ddcde4
>> [   13.709318] x8 : ffff00080b350270 x7 : 0000000000000001 x6 :
>> 8000003ff0000000
>> [   13.716658] x5 : ffff00080149a300 x4 : ffff000a72592b70 x3 :
>> 0000000000076404
>> [   13.723998] x2 : 0000000000000000 x1 : 0000000000000001 x0 :
>> ffffffffffffff92
>> [   13.731338] Call trace:
>> [   13.733865]  dev_pm_domain_detach+0x8/0x48
>> [   13.738081]  platform_probe+0x70/0xf0
>> [   13.741864]  really_probe+0xc4/0x2a8
>> [   13.745556]  __driver_probe_device+0x80/0x140
>> [   13.750045]  driver_probe_device+0x48/0x170
>> [   13.754355]  __device_attach_driver+0xc0/0x148
>> [   13.758937]  bus_for_each_drv+0x88/0xf0
>> [   13.762894]  __device_attach+0xb0/0x1c0
>> [   13.766852]  device_initial_probe+0x1c/0x30
>> [   13.771165]  bus_probe_device+0xb4/0xc0
>> [   13.775124]  deferred_probe_work_func+0x90/0xd0
>> [   13.779787]  process_one_work+0x164/0x3e0
>> [   13.783920]  worker_thread+0x310/0x420
>> [   13.787777]  kthread+0x120/0x130
>> [   13.791123]  ret_from_fork+0x10/0x20
>> [   13.794821] Code: 828a2cb8 ffff8000 aa1e03e9 d503201f (f9410802)
>> [   13.801088] ---[ end trace 0000000000000000 ]---
> 
> I'd be obliged if you could add to your commit log and verify everything
> works for you with both named and unnamed power-domains.
> 

No objections to resend the change with an updated commit message, since
it raised a question, I can add information about a method how to reproduce
the bug.

However I would like to know your opinion about the change itself, are there
any noticeable issues? Thank you in advance!

--
Best wishes,
Vladimir
Bryan O'Donoghue Aug. 7, 2024, 9:55 a.m. UTC | #3
On 07/08/2024 09:52, Vladimir Zapolskiy wrote:
> On 8/7/24 11:39, Bryan O'Donoghue wrote:
>> On 07/08/2024 00:37, Vladimir Zapolskiy wrote:
>>> On 8/7/24 02:30, Bryan O'Donoghue wrote:
>>>> On 07/08/2024 00:27, Vladimir Zapolskiy wrote:
>>>>> Hi Bryan.
>>>>>
>>>>> On 8/7/24 02:15, Bryan O'Donoghue wrote:
>>>>>> On 06/08/2024 23:12, Vladimir Zapolskiy wrote:
>>>>>>> There is a chance to meet runtime issues during configuration of 
>>>>>>> CAMSS
>>>>>>> power domains, because on the error path dev_pm_domain_detach() is
>>>>>>> unexpectedly called with NULL or error pointer.
>>>>>>>
>>>>>>> Fixes: 23aa4f0cd327 ("media: qcom: camss: Move VFE power-domain
>>>>>>> specifics into vfe.c")
>>>>>>> Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
>>>>>>
>>>>>> Have you tested this with and without named power domains in your 
>>>>>> dts ?
>>>>>> The logic here is complex to support both the legacy non-named 
>>>>>> case and
>>>>>> the updated named required case.
>>>>>
>>>>> The problem and the fix are pretty straightforward, if you notice any
>>>>> issues
>>>>> with it, please let me know.
>>>>>
>>>>> As it's said in the commit description the problem is unrelated to
>>>>> named/not named
>>>>> power domains, I tested the fix only on a platform without
>>>>> "power-domain-names"
>>>>> property in camss device tree node.
>>>>>
>>>>>> Could you also provide a backtrace of a failing camss_configure_pd()
>>>>>> for
>>>>>> the commit log.
>>>>>
>>>>> Sure, I believe anyone can get a backtrace simply by disabling 
>>>>> camcc at
>>>>> build time,
>>>>> so that camss power domain supplies disappear:
>>>>
>>>> Ah OK, that's how, proof positive if its not tested, its not working,
>>>> I've extensively tested both named and non-named pds but, yep never 
>>>> with
>>>> camcc switched off.
>>>>
>>>>>
>>>>> [   13.541205] Unable to handle kernel NULL pointer dereference at
>>>>> virtual address 00000000000001a2
>>>>> [   13.550224] Mem abort info:
>>>>> [   13.553110]   ESR = 0x0000000096000004
>>>>> [   13.556975]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>> [   13.562438]   SET = 0, FnV = 0
>>>>> [   13.565580]   EA = 0, S1PTW = 0
>>>>> [   13.568813]   FSC = 0x04: level 0 translation fault
>>>>> [   13.573824] Data abort info:
>>>>> [   13.576787]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>> [   13.582424]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>> [   13.587614]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>> [   13.593074] user pgtable: 4k pages, 48-bit VAs, 
>>>>> pgdp=000000088a55a000
>>>>> [   13.599693] [00000000000001a2] pgd=0000000000000000,
>>>>> p4d=0000000000000000
>>>>> [   13.606666] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>> [   13.613104] Modules linked in:
>>>>>
>>>>> <snip>
>>>>>
>>>>> [   13.632753] Workqueue: events_unbound deferred_probe_work_func
>>>>> [   13.638776] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS
>>>>> BTYPE=--)
>>>>> [   13.645926] pc : dev_pm_domain_detach+0x8/0x48
>>>>> [   13.650521] lr : camss_probe+0x374/0x9c0
>>>>> [   13.654577] sp : ffff800086ec3ab0
>>>>> [   13.657985] x29: ffff800086ec3ab0 x28: ffff8000855079c0 x27:
>>>>> ffff800085507000
>>>>> [   13.665329] x26: ffff000800c4b040 x25: 0000000000000000 x24:
>>>>> ffff00080aa72c20
>>>>> [   13.672659] x23: ffff800083060588 x22: ffff000801397010 x21:
>>>>> ffff800083060588
>>>>> [   13.679989] x20: 00000000ffffff92 x19: ffff00080aa72880 x18:
>>>>> ffffffffffffffff
>>>>> [   13.687318] x17: 6e6570656420676e x16: 69726f6e6769202c x15:
>>>>> 0000000000000000
>>>>> [   13.694648] x14: 000000000000003e x13: 0000000000000000 x12:
>>>>> 0000000000000000
>>>>> [   13.701988] x11: ffff00080b350460 x10: ffff00080b350248 x9 :
>>>>> ffff800081ddcde4
>>>>> [   13.709318] x8 : ffff00080b350270 x7 : 0000000000000001 x6 :
>>>>> 8000003ff0000000
>>>>> [   13.716658] x5 : ffff00080149a300 x4 : ffff000a72592b70 x3 :
>>>>> 0000000000076404
>>>>> [   13.723998] x2 : 0000000000000000 x1 : 0000000000000001 x0 :
>>>>> ffffffffffffff92
>>>>> [   13.731338] Call trace:
>>>>> [   13.733865]  dev_pm_domain_detach+0x8/0x48
>>>>> [   13.738081]  platform_probe+0x70/0xf0
>>>>> [   13.741864]  really_probe+0xc4/0x2a8
>>>>> [   13.745556]  __driver_probe_device+0x80/0x140
>>>>> [   13.750045]  driver_probe_device+0x48/0x170
>>>>> [   13.754355]  __device_attach_driver+0xc0/0x148
>>>>> [   13.758937]  bus_for_each_drv+0x88/0xf0
>>>>> [   13.762894]  __device_attach+0xb0/0x1c0
>>>>> [   13.766852]  device_initial_probe+0x1c/0x30
>>>>> [   13.771165]  bus_probe_device+0xb4/0xc0
>>>>> [   13.775124]  deferred_probe_work_func+0x90/0xd0
>>>>> [   13.779787]  process_one_work+0x164/0x3e0
>>>>> [   13.783920]  worker_thread+0x310/0x420
>>>>> [   13.787777]  kthread+0x120/0x130
>>>>> [   13.791123]  ret_from_fork+0x10/0x20
>>>>> [   13.794821] Code: 828a2cb8 ffff8000 aa1e03e9 d503201f (f9410802)
>>>>> [   13.801088] ---[ end trace 0000000000000000 ]---
>>>>
>>>> I'd be obliged if you could add to your commit log and verify 
>>>> everything
>>>> works for you with both named and unnamed power-domains.
>>>>
>>>
>>> No objections to resend the change with an updated commit message, since
>>> it raised a question, I can add information about a method how to 
>>> reproduce
>>> the bug.
>>>
>>> However I would like to know your opinion about the change itself, are
>>> there
>>> any noticeable issues? Thank you in advance!
>>>
>>> -- 
>>> Best wishes,
>>> Vladimir
>>
>> Why not just
>>
>> diff --git a/drivers/media/platform/qcom/camss/camss.c
>> b/drivers/media/platform/qcom/camss/camss.c
>> index 51b1d3550421a..9990af675190c 100644
>> --- a/drivers/media/platform/qcom/camss/camss.c
>> +++ b/drivers/media/platform/qcom/camss/camss.c
>> @@ -2162,7 +2162,8 @@ static int camss_configure_pd(struct camss *camss)
>>           return 0;
>>
>>    fail_pm:
>> -       dev_pm_domain_detach(camss->genpd, true);
>> +       if (camss->genpd)
>> +               dev_pm_domain_detach(camss->genpd, true);
>>
>>
>> ?
>>
> 
> Because your change is invalid strictly speaking, again you've missed
> an error pointer case, but that's secondary, since it could be improved.
> 
> However your change brings more of unnecessary complexity, because it
> increases both cyclomatic complexity and increases LoC, when my change
> reduces the values in both these metrics.
> 
> My change makes the code way simpler, hopefully I managed to explain it.

Cyclomatic complexity you know, I checked

pmccabe drivers/media/platform/qcom/camss/camss.c

base:
12	12	37	2088	81	drivers/media/platform/qcom/camss/camss.c(2088): 
camss_configure_pd

vlad:
12	12	35	2088	78	drivers/media/platform/qcom/camss/camss.c(2088): 
camss_configure_pd

bod:
13	13	38	2088	82	drivers/media/platform/qcom/camss/camss.c(2088): 
camss_configure_pd

That's convincing enough for me, please add the kernel backtrace on your V2.

---
bod
diff mbox series

Patch

diff --git a/drivers/media/platform/qcom/camss/camss.c b/drivers/media/platform/qcom/camss/camss.c
index 51b1d3550421..aa894be1461d 100644
--- a/drivers/media/platform/qcom/camss/camss.c
+++ b/drivers/media/platform/qcom/camss/camss.c
@@ -2130,10 +2130,8 @@  static int camss_configure_pd(struct camss *camss)
 	if (camss->res->pd_name) {
 		camss->genpd = dev_pm_domain_attach_by_name(camss->dev,
 							    camss->res->pd_name);
-		if (IS_ERR(camss->genpd)) {
-			ret = PTR_ERR(camss->genpd);
-			goto fail_pm;
-		}
+		if (IS_ERR(camss->genpd))
+			return PTR_ERR(camss->genpd);
 	}
 
 	if (!camss->genpd) {
@@ -2143,14 +2141,13 @@  static int camss_configure_pd(struct camss *camss)
 		 */
 		camss->genpd = dev_pm_domain_attach_by_id(camss->dev,
 							  camss->genpd_num - 1);
+		if (IS_ERR(camss->genpd))
+			return PTR_ERR(camss->genpd);
 	}
-	if (IS_ERR_OR_NULL(camss->genpd)) {
-		if (!camss->genpd)
-			ret = -ENODEV;
-		else
-			ret = PTR_ERR(camss->genpd);
-		goto fail_pm;
-	}
+
+	if (!camss->genpd)
+		return -ENODEV;
+
 	camss->genpd_link = device_link_add(camss->dev, camss->genpd,
 					    DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME |
 					    DL_FLAG_RPM_ACTIVE);