Message ID | 20241003160454.3017229-1-Basavaraj.Natikar@amd.com |
---|---|
State | New |
Headers | show |
Series | HID: amd_sfh: Return immediately if no sensor is found | expand |
[CCing the three reporters and the regressions list] On 03.10.24 18:04, Basavaraj Natikar wrote: > There is no need for additional cleanup, as all resources are managed. > Additionally, if no sensor is found, there will be no initialization of > HID devices. Therefore, return immediately if no sensor is detected. I'm not a reviewer, so feel free to ignore the follow comment: I think the patch description should mentioned that this bug caused Memory Errors / Page Faults / btrfs going read-only / btrfs disk corruption, as that is a crucial detail for later and downstreams that need to consider when deciding about backporting. > Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331 Some reported-by tags IMHO would be appropriate to give credit; all three reporters already agreed to use their email address in public. There meanwhile is also one comment in the bugzilla ticket that could be read as a tested-by tag. Maybe a Link: to https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/ might be appropriate as well. Ohh, and participation in stable is optional, but given the severeness on the problem: would you maybe be willing to add a stable tag to the commit to ensure this is backported to affected stable series quickly? Ciao, Thorsten > Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> > --- > drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +-- > drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 4 +++- > 2 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c > index 4b59687ff5d8..3fcb971d5fda 100644 > --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c > @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata) > (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) { > dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", > cl_data->is_any_sensor_enabled); > - rc = -EOPNOTSUPP; > - goto cleanup; > + return -EOPNOTSUPP; > } > > for (i = 0; i < cl_data->num_hid_devices; i++) { > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > index 0c28ca349bcd..1300f122b524 100644 > --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) > > rc = amd_sfh_hid_client_init(mp2); > if (rc) { > - amd_sfh_clear_intr(mp2); > + if (rc != -EOPNOTSUPP) > + amd_sfh_clear_intr(mp2); > + > dev_err(&pdev->dev, "amd_sfh_hid_client_init failed err %d\n", rc); > return; > }
On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote: > [CCing the three reporters and the regressions list] > > On 03.10.24 18:04, Basavaraj Natikar wrote: > > There is no need for additional cleanup, as all resources are managed. > > Additionally, if no sensor is found, there will be no initialization of > > HID devices. Therefore, return immediately if no sensor is detected. > > I'm not a reviewer, so feel free to ignore the follow comment: > > I think the patch description should mentioned that this bug caused > Memory Errors / Page Faults / btrfs going read-only / btrfs disk > corruption, as that is a crucial detail for later and downstreams that > need to consider when deciding about backporting. > > > Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization") > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331 > > Some reported-by tags IMHO would be appropriate to give credit; all > three reporters already agreed to use their email address in public. > > There meanwhile is also one comment in the bugzilla ticket that could be > read as a tested-by tag. > > Maybe a Link: to > https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/ > might be appropriate as well. > > Ohh, and participation in stable is optional, but given the severeness > on the problem: would you maybe be willing to add a stable tag to the > commit to ensure this is backported to affected stable series quickly? Fully agree here. It would definitely help if the submitter of the patch keeps track of all of these instead of relying on the maintainers or Thorsten to do the tedious work. I was about to apply the patch, but I still have one remark on the fix itself. > > Ciao, Thorsten > > > Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> > > --- > > drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +-- > > drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 4 +++- > > 2 files changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c > > index 4b59687ff5d8..3fcb971d5fda 100644 > > --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c > > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c > > @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata) > > (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) { > > dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", > > cl_data->is_any_sensor_enabled); > > - rc = -EOPNOTSUPP; > > - goto cleanup; > > + return -EOPNOTSUPP; so cleanup is doing: cleanup: amd_sfh_hid_client_deinit(privdata); for (i = 0; i < cl_data->num_hid_devices; i++) { devm_kfree(dev, cl_data->feature_report[i]); devm_kfree(dev, in_data->input_report[i]); devm_kfree(dev, cl_data->report_descr[i]); } return rc; Would that means that the memory corruption appears during amd_sfh_hid_client_deinit(), or... > > } > > > > for (i = 0; i < cl_data->num_hid_devices; i++) { > > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > > index 0c28ca349bcd..1300f122b524 100644 > > --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > > @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) > > > > rc = amd_sfh_hid_client_init(mp2); > > if (rc) { > > - amd_sfh_clear_intr(mp2); > > + if (rc != -EOPNOTSUPP) > > + amd_sfh_clear_intr(mp2); ... or during amd_sfh_clear_intr()? This very much looks like a band-aid (I know it is because you can not reproduce, not blaming anyone), but I'd like to know a little bit more if the bug is not appearing anywhere else in the normal processing of the driver itself. Also a comment explaining why this is the only case where amd_sfh_clear_intr() should not be called would be appreciated. So all in all, I have a feeling one of these 2 functions is not making a proper check and I would rather fix the root cause, not the symptoms. Cheers, Benjamin PS: sorry, I know this is a long standing issue, but I'd rather not paper over a bigger issue :/ > > + > > dev_err(&pdev->dev, "amd_sfh_hid_client_init failed err %d\n", rc); > > return; > > } >
On 10/4/2024 2:42 PM, Benjamin Tissoires wrote: > On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote: >> [CCing the three reporters and the regressions list] >> >> On 03.10.24 18:04, Basavaraj Natikar wrote: >>> There is no need for additional cleanup, as all resources are managed. >>> Additionally, if no sensor is found, there will be no initialization of >>> HID devices. Therefore, return immediately if no sensor is detected. >> I'm not a reviewer, so feel free to ignore the follow comment: >> >> I think the patch description should mentioned that this bug caused >> Memory Errors / Page Faults / btrfs going read-only / btrfs disk >> corruption, as that is a crucial detail for later and downstreams that >> need to consider when deciding about backporting. >> >>> Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization") >>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331 >> Some reported-by tags IMHO would be appropriate to give credit; all >> three reporters already agreed to use their email address in public. >> >> There meanwhile is also one comment in the bugzilla ticket that could be >> read as a tested-by tag. >> >> Maybe a Link: to >> https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/ >> might be appropriate as well. >> >> Ohh, and participation in stable is optional, but given the severeness >> on the problem: would you maybe be willing to add a stable tag to the >> commit to ensure this is backported to affected stable series quickly? > Fully agree here. It would definitely help if the submitter of the patch > keeps track of all of these instead of relying on the maintainers or > Thorsten to do the tedious work. > > I was about to apply the patch, but I still have one remark on the fix > itself. > > >> Ciao, Thorsten >> >>> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> >>> --- >>> drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +-- >>> drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 4 +++- >>> 2 files changed, 4 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c >>> index 4b59687ff5d8..3fcb971d5fda 100644 >>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c >>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c >>> @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata) >>> (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) { >>> dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", >>> cl_data->is_any_sensor_enabled); >>> - rc = -EOPNOTSUPP; >>> - goto cleanup; >>> + return -EOPNOTSUPP; > so cleanup is doing: > cleanup: > amd_sfh_hid_client_deinit(privdata); > for (i = 0; i < cl_data->num_hid_devices; i++) { > devm_kfree(dev, cl_data->feature_report[i]); > devm_kfree(dev, in_data->input_report[i]); > devm_kfree(dev, cl_data->report_descr[i]); > } > return rc; > > Would that means that the memory corruption appears during > amd_sfh_hid_client_deinit(), or... > >>> } >>> >>> for (i = 0; i < cl_data->num_hid_devices; i++) { >>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>> index 0c28ca349bcd..1300f122b524 100644 >>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>> @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) >>> >>> rc = amd_sfh_hid_client_init(mp2); >>> if (rc) { >>> - amd_sfh_clear_intr(mp2); >>> + if (rc != -EOPNOTSUPP) >>> + amd_sfh_clear_intr(mp2); > ... or during amd_sfh_clear_intr()? > > This very much looks like a band-aid (I know it is because you can not > reproduce, not blaming anyone), but I'd like to know a little bit more > if the bug is not appearing anywhere else in the normal processing of > the driver itself. > > Also a comment explaining why this is the only case where > amd_sfh_clear_intr() should not be called would be appreciated. > > So all in all, I have a feeling one of these 2 functions is not making a > proper check and I would rather fix the root cause, not the symptoms. Sure Benjamin, I have added the latest cleanup patch in the bug ID to see if that helps resolve the issue and to find the root cause analysis. Thanks -- Basavaraj
diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c index 4b59687ff5d8..3fcb971d5fda 100644 --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata) (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) { dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", cl_data->is_any_sensor_enabled); - rc = -EOPNOTSUPP; - goto cleanup; + return -EOPNOTSUPP; } for (i = 0; i < cl_data->num_hid_devices; i++) { diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c index 0c28ca349bcd..1300f122b524 100644 --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) rc = amd_sfh_hid_client_init(mp2); if (rc) { - amd_sfh_clear_intr(mp2); + if (rc != -EOPNOTSUPP) + amd_sfh_clear_intr(mp2); + dev_err(&pdev->dev, "amd_sfh_hid_client_init failed err %d\n", rc); return; }
There is no need for additional cleanup, as all resources are managed. Additionally, if no sensor is found, there will be no initialization of HID devices. Therefore, return immediately if no sensor is detected. Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331 Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> --- drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +-- drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 4 +++- 2 files changed, 4 insertions(+), 3 deletions(-)