mbox series

[v5,00/17] Add Qualcomm Minidump kernel driver related support

Message ID 1694290578-17733-1-git-send-email-quic_mojha@quicinc.com
Headers show
Series Add Qualcomm Minidump kernel driver related support | expand

Message

Mukesh Ojha Sept. 9, 2023, 8:16 p.m. UTC
Hi All,

This is to continuation from the conversation happened at v4

https://lore.kernel.org/lkml/632c5b97-4a91-c3e8-1e6c-33d6c4f6454f@quicinc.com/

https://lore.kernel.org/lkml/695133e6-105f-de2a-5559-555cea0a0462@quicinc.com/

We have put abstract on LPC on this topic as well as initiated a mail thread
with other SoC vendors but did not get much traction on it.

https://lore.kernel.org/lkml/0199db00-1b1d-0c63-58ff-03efae02cb21@quicinc.com/

We explored most of possiblity present in kernel to address this issue[1] but
solution like kdump/fadump does not seems safe/secure/performant from our
perspective.

Hence, with this series we tried to make the minidump kernel driver, simple
and tied with pstore frontends, so that it collects the present available
frontends data like dmesg, ftrace, pmsg, ftrace., Also, we will be working
towards enhancing generic pstore to capture more debug data which will be
helpful for first hand of debugging that can benefit both other pstore users
as well as us as minidump users.

One of the proposal made here,
https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/

Looking forward for your comments.

Thanks,
Mukesh

[1]
Minidump is a best effort mechanism to collect useful and predefined data
for first level of debugging on end user devices running on Qualcomm SoCs.
It is built on the premise that System on Chip (SoC) or subsystem part of
SoC crashes, due to a range of hardware and software bugs. Hence, the
ability to collect accurate data is only a best-effort. The data collected
could be invalid or corrupted, data collection itself could fail, and so on.

Qualcomm devices in engineering mode provides a mechanism for generating
full system ramdumps for post mortem debugging. But in some cases it's
however not feasible to capture the entire content of RAM. The minidump
mechanism provides the means for selecting which snippets should be
included in the ramdump.

The core of SMEM based minidump feature is part of Qualcomm's boot
firmware code. It initializes shared memory (SMEM), which is a part of
DDR and allocates a small section of SMEM to minidump table i.e also
called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...)
has their own table of segments to be included in the minidump and all
get their reference from G-ToC. Each segment/region has some details
like name, physical address and it's size etc. and it could be anywhere
scattered in the DDR.

Existing upstream Qualcomm remoteproc driver[1] already supports SMEM
based minidump feature for remoteproc instances like ADSP, MODEM, ...
where predefined selective segments of subsystem region can be dumped
as part of coredump collection which generates smaller size artifacts
compared to complete coredump of subsystem on crash.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142

In addition to managing and querying the APSS minidump description,
the Linux driver maintains a ELF header in a segment. This segment
gets updated with section/program header whenever a new entry gets
registered.

Changes in v5:
 - On suggestion from Pavan.k, to have single function call for minidump collection
   from remoteproc driver, separated the logic to have separate minidump file called
   qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to 
   qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion
   during region unregister in this series, will pursue it in next series.

 - To simplify the minidump driver, removed the complication for frontend and different
   backend from Greg suggestion, will pursue this once main driver gets mainlined.

 - Move the dynamic ramoops region allocation from Device tree approach to command line
   approch with the introduction command line parsing and memblock reservation during
   early boot up; Not added documentation about it yet, will add if it gets positive
   response.

 - Exporting linux banner from kernel to make minidump build also as module, however,
   minidump is a debug module and should be kernel built to get most debug information
   from kernel.

 - Tried to address comments given on dload patch series. 

Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/
 - Redesigned the driver and divided the driver into front end and backend (smem) so
   that any new backend can be attached easily to avoid code duplication.
 - Patch reordering as per the driver and subsystem to easier review of the code.
 - Removed minidump specific code from remoteproc to minidump smem based driver.
 - Enabled the all the driver as modules.
 - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad]
 - Address comments made qcom_pstore_minidump driver and given its Device tree
   same set of properties as ramoops. [Luca/Kees]
 - Added patch for MAINTAINER file.
 - Include defconfig change as one patch as per [Krzysztof] suggestion.
 - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion.
 - Addressed comments made on dload mode patch v6 version
   https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/

Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/
 - Addressed most of the comments by Srini on v2 and refactored the minidump driver.
    - Added platform device support
    - Unregister region support.
 - Added update region for clients.
 - Added pending region support.
 - Modified the documentation guide accordingly.
 - Added qcom_pstore_ramdump client driver which happen to add ramoops platform
   device and also registers ramoops region with minidump.
 - Added download mode patch series with this minidump series.
    https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/

Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/
 - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
 - Addressed comments made by [srinivas.kandagatla]
 - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
   region in minidump.
 - Fixed issue reported by kernel test robot.

Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/

Testing of the patches has been done on sm8450 target after enabling config like
CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up.

 echo mini > /sys/module/qcom_scm/parameters/download_mode

Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and
and put the device in download mode) on command prompt.

Default storage type is set to via USB, so minidump would be downloaded with the
help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has
backed minidump boot firmware support.

This will make the device go to download mode and collect the minidump on to the
attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm
package manager kit).

After that we will see a bunch of predefined registered region as binary blobs files
starts with md_* downloaded on the x86 machine on given location in PCAT tool from
the target device, more about this can be found in qualcomm minidump guide patch.

Mukesh Ojha (17):
  docs: qcom: Add qualcomm minidump guide
  soc: qcom: Add qcom_rproc_minidump module
  remoteproc: qcom_q6v5_pas: Use qcom_rproc_minidump()
  remoteproc: qcom: Remove minidump related data from qcom_common.c
  init: export linux_banner data variable
  soc: qcom: Add Qualcomm APSS minidump kernel driver
  soc: qcom: minidump: Add pending region registration
  arm64: mm: Add dynamic ramoops region support through command line
  pstore/ram: Use dynamic ramoops reserve resource
  pstore: Add pstore_region_defined() helper and export it
  qcom_minidump: Register ramoops region with minidump
  MAINTAINERS: Add entry for minidump related files
  firmware: qcom_scm: provide a read-modify-write function
  pinctrl: qcom: Use qcom_scm_io_update_field()
  firmware: scm: Modify only the download bits in TCSR register
  firmware: qcom_scm: Refactor code to support multiple download mode
  firmware: qcom_scm: Add multiple download mode support

 Documentation/admin-guide/index.rst         |   1 +
 Documentation/admin-guide/qcom_minidump.rst | 272 +++++++++++
 MAINTAINERS                                 |  10 +
 arch/arm64/mm/init.c                        |  94 ++++
 drivers/firmware/Kconfig                    |  11 -
 drivers/firmware/qcom_scm.c                 |  90 +++-
 drivers/pinctrl/qcom/pinctrl-msm.c          |  10 +-
 drivers/remoteproc/Kconfig                  |   1 +
 drivers/remoteproc/qcom_common.c            | 150 ------
 drivers/remoteproc/qcom_q6v5_pas.c          |   3 +-
 drivers/soc/qcom/Kconfig                    |  24 +
 drivers/soc/qcom/Makefile                   |   3 +
 drivers/soc/qcom/qcom_minidump.c            | 727 ++++++++++++++++++++++++++++
 drivers/soc/qcom/qcom_minidump_internal.h   |  74 +++
 drivers/soc/qcom/qcom_ramoops_minidump.c    |  88 ++++
 drivers/soc/qcom/qcom_ramoops_minidump.h    |  10 +
 drivers/soc/qcom/qcom_rproc_minidump.c      | 111 +++++
 drivers/soc/qcom/smem.c                     |  18 +
 fs/pstore/platform.c                        |  15 +
 fs/pstore/ram.c                             |  52 +-
 include/linux/firmware/qcom/qcom_scm.h      |   2 +
 include/linux/init.h                        |   3 +
 include/linux/pstore.h                      |   6 +
 include/linux/pstore_ram.h                  |   2 +
 include/linux/soc/qcom/smem.h               |   2 +
 include/soc/qcom/qcom_minidump.h            |  56 +++
 init/version-timestamp.c                    |   3 +
 27 files changed, 1659 insertions(+), 179 deletions(-)
 create mode 100644 Documentation/admin-guide/qcom_minidump.rst
 create mode 100644 drivers/soc/qcom/qcom_minidump.c
 create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h
 create mode 100644 drivers/soc/qcom/qcom_ramoops_minidump.c
 create mode 100644 drivers/soc/qcom/qcom_ramoops_minidump.h
 create mode 100644 drivers/soc/qcom/qcom_rproc_minidump.c
 create mode 100644 include/soc/qcom/qcom_minidump.h

Comments

Kathiravan Thirumoorthy Sept. 11, 2023, 7:59 a.m. UTC | #1
On 9/10/2023 1:46 AM, Mukesh Ojha wrote:
> Hi All,
>
> This is to continuation from the conversation happened at v4
>
> https://lore.kernel.org/lkml/632c5b97-4a91-c3e8-1e6c-33d6c4f6454f@quicinc.com/
>
> https://lore.kernel.org/lkml/695133e6-105f-de2a-5559-555cea0a0462@quicinc.com/
>
> We have put abstract on LPC on this topic as well as initiated a mail thread
> with other SoC vendors but did not get much traction on it.
>
> https://lore.kernel.org/lkml/0199db00-1b1d-0c63-58ff-03efae02cb21@quicinc.com/
>
> We explored most of possiblity present in kernel to address this issue[1] but
> solution like kdump/fadump does not seems safe/secure/performant from our
> perspective.
>
> Hence, with this series we tried to make the minidump kernel driver, simple
> and tied with pstore frontends, so that it collects the present available
> frontends data like dmesg, ftrace, pmsg, ftrace., Also, we will be working
> towards enhancing generic pstore to capture more debug data which will be
> helpful for first hand of debugging that can benefit both other pstore users
> as well as us as minidump users.
>
> One of the proposal made here,
> https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/
>
> Looking forward for your comments.
>
> Thanks,
> Mukesh
>
> [1]
> Minidump is a best effort mechanism to collect useful and predefined data
> for first level of debugging on end user devices running on Qualcomm SoCs.
> It is built on the premise that System on Chip (SoC) or subsystem part of
> SoC crashes, due to a range of hardware and software bugs. Hence, the
> ability to collect accurate data is only a best-effort. The data collected
> could be invalid or corrupted, data collection itself could fail, and so on.
>
> Qualcomm devices in engineering mode provides a mechanism for generating
> full system ramdumps for post mortem debugging. But in some cases it's
> however not feasible to capture the entire content of RAM. The minidump
> mechanism provides the means for selecting which snippets should be
> included in the ramdump.
>
> The core of SMEM based minidump feature is part of Qualcomm's boot
> firmware code. It initializes shared memory (SMEM), which is a part of
> DDR and allocates a small section of SMEM to minidump table i.e also
> called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...)
> has their own table of segments to be included in the minidump and all
> get their reference from G-ToC. Each segment/region has some details
> like name, physical address and it's size etc. and it could be anywhere
> scattered in the DDR.
>
> Existing upstream Qualcomm remoteproc driver[1] already supports SMEM
> based minidump feature for remoteproc instances like ADSP, MODEM, ...
> where predefined selective segments of subsystem region can be dumped
> as part of coredump collection which generates smaller size artifacts
> compared to complete coredump of subsystem on crash.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142
>
> In addition to managing and querying the APSS minidump description,
> the Linux driver maintains a ELF header in a segment. This segment
> gets updated with section/program header whenever a new entry gets
> registered.
>
> Changes in v5:
>   - On suggestion from Pavan.k, to have single function call for minidump collection
>     from remoteproc driver, separated the logic to have separate minidump file called
>     qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to
>     qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion
>     during region unregister in this series, will pursue it in next series.
>
>   - To simplify the minidump driver, removed the complication for frontend and different
>     backend from Greg suggestion, will pursue this once main driver gets mainlined.
>
>   - Move the dynamic ramoops region allocation from Device tree approach to command line
>     approch with the introduction command line parsing and memblock reservation during
>     early boot up; Not added documentation about it yet, will add if it gets positive
>     response.
>
>   - Exporting linux banner from kernel to make minidump build also as module, however,
>     minidump is a debug module and should be kernel built to get most debug information
>     from kernel.
>
>   - Tried to address comments given on dload patch series.
>
> Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/
>   - Redesigned the driver and divided the driver into front end and backend (smem) so
>     that any new backend can be attached easily to avoid code duplication.
>   - Patch reordering as per the driver and subsystem to easier review of the code.
>   - Removed minidump specific code from remoteproc to minidump smem based driver.
>   - Enabled the all the driver as modules.
>   - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad]
>   - Address comments made qcom_pstore_minidump driver and given its Device tree
>     same set of properties as ramoops. [Luca/Kees]
>   - Added patch for MAINTAINER file.
>   - Include defconfig change as one patch as per [Krzysztof] suggestion.
>   - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion.
>   - Addressed comments made on dload mode patch v6 version
>     https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
>
> Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/
>   - Addressed most of the comments by Srini on v2 and refactored the minidump driver.
>      - Added platform device support
>      - Unregister region support.
>   - Added update region for clients.
>   - Added pending region support.
>   - Modified the documentation guide accordingly.
>   - Added qcom_pstore_ramdump client driver which happen to add ramoops platform
>     device and also registers ramoops region with minidump.
>   - Added download mode patch series with this minidump series.
>      https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
>
> Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/
>   - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
>   - Addressed comments made by [srinivas.kandagatla]
>   - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
>     region in minidump.
>   - Fixed issue reported by kernel test robot.
>
> Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/
>
> Testing of the patches has been done on sm8450 target after enabling config like
> CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up.
>
>   echo mini > /sys/module/qcom_scm/parameters/download_mode
>
> Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and
> and put the device in download mode) on command prompt.
>
> Default storage type is set to via USB, so minidump would be downloaded with the
> help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has
> backed minidump boot firmware support.
>
> This will make the device go to download mode and collect the minidump on to the
> attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm
> package manager kit).
>
> After that we will see a bunch of predefined registered region as binary blobs files
> starts with md_* downloaded on the x86 machine on given location in PCAT tool from
> the target device, more about this can be found in qualcomm minidump guide patch.
>
> Mukesh Ojha (17):
>    docs: qcom: Add qualcomm minidump guide
>    soc: qcom: Add qcom_rproc_minidump module
>    remoteproc: qcom_q6v5_pas: Use qcom_rproc_minidump()
>    remoteproc: qcom: Remove minidump related data from qcom_common.c
>    init: export linux_banner data variable
>    soc: qcom: Add Qualcomm APSS minidump kernel driver
>    soc: qcom: minidump: Add pending region registration
>    arm64: mm: Add dynamic ramoops region support through command line
>    pstore/ram: Use dynamic ramoops reserve resource
>    pstore: Add pstore_region_defined() helper and export it
>    qcom_minidump: Register ramoops region with minidump
>    MAINTAINERS: Add entry for minidump related files


Based on last discussion[1], I posted the V5 of the below three 
patches[2] separately. I don't mind clubbing these again back to 
Minidump series as long as three patches can be picked up separately.


>    firmware: qcom_scm: provide a read-modify-write function
>    pinctrl: qcom: Use qcom_scm_io_update_field()
>    firmware: scm: Modify only the download bits in TCSR register

[1] 
https://lore.kernel.org/linux-arm-msm/d77f5601-2b08-a7c7-1400-7ab68b8add3a@quicinc.com/

[2] 
https://lore.kernel.org/linux-arm-msm/20230720070408.1093698-1-quic_kathirav@quicinc.com/


>    firmware: qcom_scm: Refactor code to support multiple download mode
>    firmware: qcom_scm: Add multiple download mode support
>
>   Documentation/admin-guide/index.rst         |   1 +
>   Documentation/admin-guide/qcom_minidump.rst | 272 +++++++++++
>   MAINTAINERS                                 |  10 +
>   arch/arm64/mm/init.c                        |  94 ++++
>   drivers/firmware/Kconfig                    |  11 -
>   drivers/firmware/qcom_scm.c                 |  90 +++-
>   drivers/pinctrl/qcom/pinctrl-msm.c          |  10 +-
>   drivers/remoteproc/Kconfig                  |   1 +
>   drivers/remoteproc/qcom_common.c            | 150 ------
>   drivers/remoteproc/qcom_q6v5_pas.c          |   3 +-
>   drivers/soc/qcom/Kconfig                    |  24 +
>   drivers/soc/qcom/Makefile                   |   3 +
>   drivers/soc/qcom/qcom_minidump.c            | 727 ++++++++++++++++++++++++++++
>   drivers/soc/qcom/qcom_minidump_internal.h   |  74 +++
>   drivers/soc/qcom/qcom_ramoops_minidump.c    |  88 ++++
>   drivers/soc/qcom/qcom_ramoops_minidump.h    |  10 +
>   drivers/soc/qcom/qcom_rproc_minidump.c      | 111 +++++
>   drivers/soc/qcom/smem.c                     |  18 +
>   fs/pstore/platform.c                        |  15 +
>   fs/pstore/ram.c                             |  52 +-
>   include/linux/firmware/qcom/qcom_scm.h      |   2 +
>   include/linux/init.h                        |   3 +
>   include/linux/pstore.h                      |   6 +
>   include/linux/pstore_ram.h                  |   2 +
>   include/linux/soc/qcom/smem.h               |   2 +
>   include/soc/qcom/qcom_minidump.h            |  56 +++
>   init/version-timestamp.c                    |   3 +
>   27 files changed, 1659 insertions(+), 179 deletions(-)
>   create mode 100644 Documentation/admin-guide/qcom_minidump.rst
>   create mode 100644 drivers/soc/qcom/qcom_minidump.c
>   create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h
>   create mode 100644 drivers/soc/qcom/qcom_ramoops_minidump.c
>   create mode 100644 drivers/soc/qcom/qcom_ramoops_minidump.h
>   create mode 100644 drivers/soc/qcom/qcom_rproc_minidump.c
>   create mode 100644 include/soc/qcom/qcom_minidump.h
>
Kathiravan Thirumoorthy Sept. 11, 2023, 8:11 a.m. UTC | #2
On 9/10/2023 1:46 AM, Mukesh Ojha wrote:
> Crashdump collection is based on the DLOAD bit of TCSR register.
> To retain other bits, we read the register and modify only the
> DLOAD bit as the other bits have their own significance.
>
> Co-developed-by: Poovendhan Selvaraj <quic_poovendh@quicinc.com>
> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> ---


This change doesn't cleanly apply on next-20230911. Please rebase it.

Validated this change on IPQ9574 and IPQ5332 and system is entering into 
the download mode.

Tested-by: Kathiravan Thirumoorthy <quic_kathirav@quicinc.com> # IPQ9574 
and IPQ5332


>   drivers/firmware/qcom_scm.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
> index 5ea8fc4fd4e8..eda92f713019 100644
> --- a/drivers/firmware/qcom_scm.c
> +++ b/drivers/firmware/qcom_scm.c
> @@ -5,6 +5,8 @@
>   #include <linux/platform_device.h>
>   #include <linux/init.h>
>   #include <linux/interrupt.h>
> +#include <linux/bitfield.h>
> +#include <linux/bits.h>
>   #include <linux/completion.h>
>   #include <linux/cpumask.h>
>   #include <linux/export.h>
> @@ -30,6 +32,10 @@ module_param(download_mode, bool, 0);
>   #define SCM_HAS_IFACE_CLK	BIT(1)
>   #define SCM_HAS_BUS_CLK		BIT(2)
>   
> +#define QCOM_DLOAD_MASK		GENMASK(5, 4)
> +#define QCOM_DLOAD_FULLDUMP	0x1
> +#define QCOM_DLOAD_NODUMP	0x0
> +
>   struct qcom_scm {
>   	struct device *dev;
>   	struct clk *core_clk;
> @@ -444,6 +450,7 @@ static int __qcom_scm_set_dload_mode(struct device *dev, bool enable)
>   
>   static void qcom_scm_set_download_mode(bool enable)
>   {
> +	u32 val = enable ? QCOM_DLOAD_FULLDUMP : QCOM_DLOAD_NODUMP;
>   	bool avail;
>   	int ret = 0;
>   
> @@ -453,8 +460,9 @@ static void qcom_scm_set_download_mode(bool enable)
>   	if (avail) {
>   		ret = __qcom_scm_set_dload_mode(__scm->dev, enable);
>   	} else if (__scm->dload_mode_addr) {
> -		ret = qcom_scm_io_writel(__scm->dload_mode_addr,
> -				enable ? QCOM_SCM_BOOT_SET_DLOAD_MODE : 0);
> +		ret = qcom_scm_io_update_field(__scm->dload_mode_addr,
> +					       QCOM_DLOAD_MASK,
> +					       FIELD_PREP(QCOM_DLOAD_MASK, val));
>   	} else {
>   		dev_err(__scm->dev,
>   			"No available mechanism for setting download mode\n");
Kathiravan Thirumoorthy Sept. 11, 2023, 8:16 a.m. UTC | #3
On 9/10/2023 1:46 AM, Mukesh Ojha wrote:
> Crashdump collection is based on the DLOAD bit of TCSR register.
> To retain other bits, we read the register and modify only the
> DLOAD bit as the other bits have their own significance.
>
> Co-developed-by: Poovendhan Selvaraj <quic_poovendh@quicinc.com>


Poovendhan's signed-off is also required. Please refer [1]

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Documentation/process/submitting-patches.rst#n489


> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> ---
>   drivers/firmware/qcom_scm.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
> index 5ea8fc4fd4e8..eda92f713019 100644
> --- a/drivers/firmware/qcom_scm.c
> +++ b/drivers/firmware/qcom_scm.c
> @@ -5,6 +5,8 @@
>   #include <linux/platform_device.h>
>   #include <linux/init.h>
>   #include <linux/interrupt.h>
> +#include <linux/bitfield.h>
> +#include <linux/bits.h>
>   #include <linux/completion.h>
>   #include <linux/cpumask.h>
>   #include <linux/export.h>
> @@ -30,6 +32,10 @@ module_param(download_mode, bool, 0);
>   #define SCM_HAS_IFACE_CLK	BIT(1)
>   #define SCM_HAS_BUS_CLK		BIT(2)
>   
> +#define QCOM_DLOAD_MASK		GENMASK(5, 4)
> +#define QCOM_DLOAD_FULLDUMP	0x1
> +#define QCOM_DLOAD_NODUMP	0x0
> +
>   struct qcom_scm {
>   	struct device *dev;
>   	struct clk *core_clk;
> @@ -444,6 +450,7 @@ static int __qcom_scm_set_dload_mode(struct device *dev, bool enable)
>   
>   static void qcom_scm_set_download_mode(bool enable)
>   {
> +	u32 val = enable ? QCOM_DLOAD_FULLDUMP : QCOM_DLOAD_NODUMP;
>   	bool avail;
>   	int ret = 0;
>   
> @@ -453,8 +460,9 @@ static void qcom_scm_set_download_mode(bool enable)
>   	if (avail) {
>   		ret = __qcom_scm_set_dload_mode(__scm->dev, enable);
>   	} else if (__scm->dload_mode_addr) {
> -		ret = qcom_scm_io_writel(__scm->dload_mode_addr,
> -				enable ? QCOM_SCM_BOOT_SET_DLOAD_MODE : 0);
> +		ret = qcom_scm_io_update_field(__scm->dload_mode_addr,
> +					       QCOM_DLOAD_MASK,
> +					       FIELD_PREP(QCOM_DLOAD_MASK, val));
>   	} else {
>   		dev_err(__scm->dev,
>   			"No available mechanism for setting download mode\n");
Bagas Sanjaya Sept. 11, 2023, 8:52 a.m. UTC | #4
On Sun, Sep 10, 2023 at 01:46:01AM +0530, Mukesh Ojha wrote:
> Hi All,
> 
> This is to continuation from the conversation happened at v4
> 
> https://lore.kernel.org/lkml/632c5b97-4a91-c3e8-1e6c-33d6c4f6454f@quicinc.com/
> 
> https://lore.kernel.org/lkml/695133e6-105f-de2a-5559-555cea0a0462@quicinc.com/
> 
> We have put abstract on LPC on this topic as well as initiated a mail thread
> with other SoC vendors but did not get much traction on it.
> 
> https://lore.kernel.org/lkml/0199db00-1b1d-0c63-58ff-03efae02cb21@quicinc.com/
> 
> We explored most of possiblity present in kernel to address this issue[1] but
> solution like kdump/fadump does not seems safe/secure/performant from our
> perspective.
> 
> Hence, with this series we tried to make the minidump kernel driver, simple
> and tied with pstore frontends, so that it collects the present available
> frontends data like dmesg, ftrace, pmsg, ftrace., Also, we will be working
> towards enhancing generic pstore to capture more debug data which will be
> helpful for first hand of debugging that can benefit both other pstore users
> as well as us as minidump users.
> 
> One of the proposal made here,
> https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/
> 
> Looking forward for your comments.
> 
> Thanks,
> Mukesh
> 
> [1]
> Minidump is a best effort mechanism to collect useful and predefined data
> for first level of debugging on end user devices running on Qualcomm SoCs.
> It is built on the premise that System on Chip (SoC) or subsystem part of
> SoC crashes, due to a range of hardware and software bugs. Hence, the
> ability to collect accurate data is only a best-effort. The data collected
> could be invalid or corrupted, data collection itself could fail, and so on.
> 
> Qualcomm devices in engineering mode provides a mechanism for generating
> full system ramdumps for post mortem debugging. But in some cases it's
> however not feasible to capture the entire content of RAM. The minidump
> mechanism provides the means for selecting which snippets should be
> included in the ramdump.
> 
> The core of SMEM based minidump feature is part of Qualcomm's boot
> firmware code. It initializes shared memory (SMEM), which is a part of
> DDR and allocates a small section of SMEM to minidump table i.e also
> called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...)
> has their own table of segments to be included in the minidump and all
> get their reference from G-ToC. Each segment/region has some details
> like name, physical address and it's size etc. and it could be anywhere
> scattered in the DDR.
> 
> Existing upstream Qualcomm remoteproc driver[1] already supports SMEM
> based minidump feature for remoteproc instances like ADSP, MODEM, ...
> where predefined selective segments of subsystem region can be dumped
> as part of coredump collection which generates smaller size artifacts
> compared to complete coredump of subsystem on crash.
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142
> 
> In addition to managing and querying the APSS minidump description,
> the Linux driver maintains a ELF header in a segment. This segment
> gets updated with section/program header whenever a new entry gets
> registered.
> 
> Changes in v5:
>  - On suggestion from Pavan.k, to have single function call for minidump collection
>    from remoteproc driver, separated the logic to have separate minidump file called
>    qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to 
>    qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion
>    during region unregister in this series, will pursue it in next series.
> 
>  - To simplify the minidump driver, removed the complication for frontend and different
>    backend from Greg suggestion, will pursue this once main driver gets mainlined.
> 
>  - Move the dynamic ramoops region allocation from Device tree approach to command line
>    approch with the introduction command line parsing and memblock reservation during
>    early boot up; Not added documentation about it yet, will add if it gets positive
>    response.
> 
>  - Exporting linux banner from kernel to make minidump build also as module, however,
>    minidump is a debug module and should be kernel built to get most debug information
>    from kernel.
> 
>  - Tried to address comments given on dload patch series. 
> 
> Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/
>  - Redesigned the driver and divided the driver into front end and backend (smem) so
>    that any new backend can be attached easily to avoid code duplication.
>  - Patch reordering as per the driver and subsystem to easier review of the code.
>  - Removed minidump specific code from remoteproc to minidump smem based driver.
>  - Enabled the all the driver as modules.
>  - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad]
>  - Address comments made qcom_pstore_minidump driver and given its Device tree
>    same set of properties as ramoops. [Luca/Kees]
>  - Added patch for MAINTAINER file.
>  - Include defconfig change as one patch as per [Krzysztof] suggestion.
>  - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion.
>  - Addressed comments made on dload mode patch v6 version
>    https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
> 
> Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/
>  - Addressed most of the comments by Srini on v2 and refactored the minidump driver.
>     - Added platform device support
>     - Unregister region support.
>  - Added update region for clients.
>  - Added pending region support.
>  - Modified the documentation guide accordingly.
>  - Added qcom_pstore_ramdump client driver which happen to add ramoops platform
>    device and also registers ramoops region with minidump.
>  - Added download mode patch series with this minidump series.
>     https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
> 
> Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/
>  - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
>  - Addressed comments made by [srinivas.kandagatla]
>  - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
>    region in minidump.
>  - Fixed issue reported by kernel test robot.
> 
> Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/
> 
> Testing of the patches has been done on sm8450 target after enabling config like
> CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up.
> 
>  echo mini > /sys/module/qcom_scm/parameters/download_mode
> 
> Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and
> and put the device in download mode) on command prompt.
> 
> Default storage type is set to via USB, so minidump would be downloaded with the
> help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has
> backed minidump boot firmware support.
> 
> This will make the device go to download mode and collect the minidump on to the
> attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm
> package manager kit).
> 
> After that we will see a bunch of predefined registered region as binary blobs files
> starts with md_* downloaded on the x86 machine on given location in PCAT tool from
> the target device, more about this can be found in qualcomm minidump guide patch.
> 

I tried to apply this series on top of 535a265d7f0dd50 (as suggested by
`b4 am -l -g`), but it conflicts on patch [04/17]. Please specify the
exact base commit or another series for which this series is based on.

Thanks.
Mukesh Ojha Sept. 11, 2023, 10:39 a.m. UTC | #5
On 9/11/2023 2:22 PM, Bagas Sanjaya wrote:
> On Sun, Sep 10, 2023 at 01:46:01AM +0530, Mukesh Ojha wrote:
>> Hi All,
>>
>> This is to continuation from the conversation happened at v4
>>
>> https://lore.kernel.org/lkml/632c5b97-4a91-c3e8-1e6c-33d6c4f6454f@quicinc.com/
>>
>> https://lore.kernel.org/lkml/695133e6-105f-de2a-5559-555cea0a0462@quicinc.com/
>>
>> We have put abstract on LPC on this topic as well as initiated a mail thread
>> with other SoC vendors but did not get much traction on it.
>>
>> https://lore.kernel.org/lkml/0199db00-1b1d-0c63-58ff-03efae02cb21@quicinc.com/
>>
>> We explored most of possiblity present in kernel to address this issue[1] but
>> solution like kdump/fadump does not seems safe/secure/performant from our
>> perspective.
>>
>> Hence, with this series we tried to make the minidump kernel driver, simple
>> and tied with pstore frontends, so that it collects the present available
>> frontends data like dmesg, ftrace, pmsg, ftrace., Also, we will be working
>> towards enhancing generic pstore to capture more debug data which will be
>> helpful for first hand of debugging that can benefit both other pstore users
>> as well as us as minidump users.
>>
>> One of the proposal made here,
>> https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/
>>
>> Looking forward for your comments.
>>
>> Thanks,
>> Mukesh
>>
>> [1]
>> Minidump is a best effort mechanism to collect useful and predefined data
>> for first level of debugging on end user devices running on Qualcomm SoCs.
>> It is built on the premise that System on Chip (SoC) or subsystem part of
>> SoC crashes, due to a range of hardware and software bugs. Hence, the
>> ability to collect accurate data is only a best-effort. The data collected
>> could be invalid or corrupted, data collection itself could fail, and so on.
>>
>> Qualcomm devices in engineering mode provides a mechanism for generating
>> full system ramdumps for post mortem debugging. But in some cases it's
>> however not feasible to capture the entire content of RAM. The minidump
>> mechanism provides the means for selecting which snippets should be
>> included in the ramdump.
>>
>> The core of SMEM based minidump feature is part of Qualcomm's boot
>> firmware code. It initializes shared memory (SMEM), which is a part of
>> DDR and allocates a small section of SMEM to minidump table i.e also
>> called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...)
>> has their own table of segments to be included in the minidump and all
>> get their reference from G-ToC. Each segment/region has some details
>> like name, physical address and it's size etc. and it could be anywhere
>> scattered in the DDR.
>>
>> Existing upstream Qualcomm remoteproc driver[1] already supports SMEM
>> based minidump feature for remoteproc instances like ADSP, MODEM, ...
>> where predefined selective segments of subsystem region can be dumped
>> as part of coredump collection which generates smaller size artifacts
>> compared to complete coredump of subsystem on crash.
>>
>> [1]
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142
>>
>> In addition to managing and querying the APSS minidump description,
>> the Linux driver maintains a ELF header in a segment. This segment
>> gets updated with section/program header whenever a new entry gets
>> registered.
>>
>> Changes in v5:
>>   - On suggestion from Pavan.k, to have single function call for minidump collection
>>     from remoteproc driver, separated the logic to have separate minidump file called
>>     qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to
>>     qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion
>>     during region unregister in this series, will pursue it in next series.
>>
>>   - To simplify the minidump driver, removed the complication for frontend and different
>>     backend from Greg suggestion, will pursue this once main driver gets mainlined.
>>
>>   - Move the dynamic ramoops region allocation from Device tree approach to command line
>>     approch with the introduction command line parsing and memblock reservation during
>>     early boot up; Not added documentation about it yet, will add if it gets positive
>>     response.
>>
>>   - Exporting linux banner from kernel to make minidump build also as module, however,
>>     minidump is a debug module and should be kernel built to get most debug information
>>     from kernel.
>>
>>   - Tried to address comments given on dload patch series.
>>
>> Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/
>>   - Redesigned the driver and divided the driver into front end and backend (smem) so
>>     that any new backend can be attached easily to avoid code duplication.
>>   - Patch reordering as per the driver and subsystem to easier review of the code.
>>   - Removed minidump specific code from remoteproc to minidump smem based driver.
>>   - Enabled the all the driver as modules.
>>   - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad]
>>   - Address comments made qcom_pstore_minidump driver and given its Device tree
>>     same set of properties as ramoops. [Luca/Kees]
>>   - Added patch for MAINTAINER file.
>>   - Include defconfig change as one patch as per [Krzysztof] suggestion.
>>   - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion.
>>   - Addressed comments made on dload mode patch v6 version
>>     https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
>>
>> Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/
>>   - Addressed most of the comments by Srini on v2 and refactored the minidump driver.
>>      - Added platform device support
>>      - Unregister region support.
>>   - Added update region for clients.
>>   - Added pending region support.
>>   - Modified the documentation guide accordingly.
>>   - Added qcom_pstore_ramdump client driver which happen to add ramoops platform
>>     device and also registers ramoops region with minidump.
>>   - Added download mode patch series with this minidump series.
>>      https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
>>
>> Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/
>>   - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
>>   - Addressed comments made by [srinivas.kandagatla]
>>   - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
>>     region in minidump.
>>   - Fixed issue reported by kernel test robot.
>>
>> Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/
>>
>> Testing of the patches has been done on sm8450 target after enabling config like
>> CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up.
>>
>>   echo mini > /sys/module/qcom_scm/parameters/download_mode
>>
>> Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and
>> and put the device in download mode) on command prompt.
>>
>> Default storage type is set to via USB, so minidump would be downloaded with the
>> help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has
>> backed minidump boot firmware support.
>>
>> This will make the device go to download mode and collect the minidump on to the
>> attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm
>> package manager kit).
>>
>> After that we will see a bunch of predefined registered region as binary blobs files
>> starts with md_* downloaded on the x86 machine on given location in PCAT tool from
>> the target device, more about this can be found in qualcomm minidump guide patch.
>>
> 
> I tried to apply this series on top of 535a265d7f0dd50 (as suggested by
> `b4 am -l -g`), but it conflicts on patch [04/17]. Please specify the
> exact base commit or another series for which this series is based on.

Apologies !
I just realized, it was 6.5-rc7, but let me rebase version of the series;

Sorry, for all the reviewed done so far, i will definitely take care of 
them or reply.

-Mukesh
> 
> Thanks.
>
Krzysztof Kozlowski Sept. 11, 2023, 11:01 a.m. UTC | #6
On 09/09/2023 22:16, Mukesh Ojha wrote:
> Minidump is a best effort mechanism to collect useful and predefined
> data for first level of debugging on end user devices running on
> Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
> or subsystem part of SoC crashes, due to a range of hardware and
> software bugs. Hence, the ability to collect accurate data is only
> a best-effort. The data collected could be invalid or corrupted,
> data collection itself could fail, and so on.

...

> +static int qcom_apss_md_table_init(struct minidump *md,
> +				   struct minidump_subsystem *mdss_toc)
> +{
> +	struct minidump_ss_data *mdss_data;
> +
> +	mdss_data = devm_kzalloc(md->dev, sizeof(*mdss_data), GFP_KERNEL);
> +	if (!mdss_data)
> +		return -ENOMEM;
> +
> +	mdss_data->md_ss_toc = mdss_toc;
> +	mdss_data->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES,
> +					     sizeof(struct minidump_region),
> +					     GFP_KERNEL);
> +	if (!mdss_data->md_regions)
> +		return -ENOMEM;
> +
> +	mdss_toc = mdss_data->md_ss_toc;
> +	mdss_toc->regions_baseptr = cpu_to_le64(virt_to_phys(mdss_data->md_regions));
> +	mdss_toc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED);
> +	mdss_toc->status = cpu_to_le32(1);
> +	mdss_toc->region_count = cpu_to_le32(0);
> +
> +	/* Tell bootloader not to encrypt the regions of this subsystem */
> +	mdss_toc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE);
> +	mdss_toc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ);
> +
> +	md->apss_data = mdss_data;
> +
> +	return 0;
> +}
> +
> +static int qcom_apss_minidump_probe(struct platform_device *pdev)
> +{
> +	struct minidump_global_toc *mdgtoc;
> +	struct minidump *md;
> +	size_t size;
> +	int ret;
> +
> +	md = devm_kzalloc(&pdev->dev, sizeof(struct minidump), GFP_KERNEL);

sizeof(*)

Didn't you get such comments already?


> +	if (!md)
> +		return -ENOMEM;
> +
> +	md->dev = &pdev->dev;
> +	mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);
> +	if (IS_ERR(mdgtoc)) {
> +		ret = PTR_ERR(mdgtoc);
> +		dev_err(md->dev, "Couldn't find minidump smem item: %d\n", ret);
> +		return ret;

The syntax is:
return dev_err_probe

> +	}
> +
> +	if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
> +		dev_err(md->dev, "minidump table is not initialized: %d\n", ret);

ret is uninitialized here. Please use automated tools for checking your
code:
coccinelle, smatch and sparse

> +		return -EINVAL;
> +	}
> +
> +	mutex_init(&md->md_lock);
> +	ret = qcom_apss_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_APSS_DESC]);
> +	if (ret) {
> +		dev_err(md->dev, "apss minidump initialization failed: %d\n", ret);
> +		return ret;
> +	}
> +
> +	/* First entry would be ELF header */
> +	ret = qcom_md_add_elfheader(md);
> +	if (ret) {
> +		dev_err(md->dev, "Failed to add elf header: %d\n", ret);
> +		memset(md->apss_data->md_ss_toc, 0, sizeof(struct minidump_subsystem));

Why do you need it?

> +		return ret;
> +	}
> +
> +	platform_set_drvdata(pdev, md);
> +
> +	return ret;
> +}
> +
> +static int qcom_apss_minidump_remove(struct platform_device *pdev)
> +{
> +	struct minidump *md = platform_get_drvdata(pdev);
> +	struct minidump_ss_data *mdss_data;
> +
> +	mdss_data = md->apss_data;
> +	memset(mdss_data->md_ss_toc, cpu_to_le32(0), sizeof(struct minidump_subsystem));

Why do you need it?

> +	md = NULL;

That's useless assignment.

> +
> +	return 0;
> +}
> +
> +static struct platform_driver qcom_minidump_driver = {
> +	.probe = qcom_apss_minidump_probe,
> +	.remove = qcom_apss_minidump_remove,
> +	.driver  = {
> +		.name = "qcom-minidump-smem",
> +	},
> +};
> +
> +module_platform_driver(qcom_minidump_driver);
> +
> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver");
> +MODULE_LICENSE("GPL");
> +MODULE_ALIAS("platform:qcom-minidump-smem");

Add a proper ID table instead of re-inventing it with module aliases.

Best regards,
Krzysztof
Krzysztof Kozlowski Sept. 11, 2023, 11:07 a.m. UTC | #7
On 09/09/2023 22:16, Mukesh Ojha wrote:
> +/**
> + * qcom_minidump_region_register() - Register region in APSS Minidump table.
> + * @region: minidump region.
> + *
> + * Return: On success, it returns 0 and negative error value on failure.
> + */
> +int qcom_minidump_region_register(const struct qcom_minidump_region *region)
> +{
> +	struct minidump *md;
> +	int ret;
> +
> +	md = qcom_smem_minidump_ready();
> +	if (!md)
> +		return -EPROBE_DEFER;
> +
> +	if (!qcom_minidump_valid_region(region))
> +		return -EINVAL;
> +
> +	mutex_lock(&md->md_lock);
> +	ret = qcom_md_region_register(md, region);
> +	if (ret)
> +		goto unlock;
> +
> +	qcom_md_update_elfheader(md, region);
> +unlock:
> +	mutex_unlock(&md->md_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(qcom_minidump_region_register);

NAK, there is no user for this.

Drop all exports from minidump drivers. Your upstream driver *must not*
expose stuff to your vendor drivers.

Best regards,
Krzysztof
Krzysztof Kozlowski Sept. 11, 2023, 11:08 a.m. UTC | #8
On 09/09/2023 22:16, Mukesh Ojha wrote:
>  static int qcom_apss_minidump_probe(struct platform_device *pdev)
>  {
>  	struct minidump_global_toc *mdgtoc;
> @@ -571,7 +688,10 @@ static int qcom_apss_minidump_probe(struct platform_device *pdev)
>  		return ret;
>  	}
>  
> +	mutex_lock(&md_plist.plock);
>  	platform_set_drvdata(pdev, md);

Why this is locked?

> +	qcom_apss_register_pending_regions(md);

Why this one is locked? It seems ordering of your operations is not
correct if you need to lock the providers probe().

> +	mutex_unlock(&md_plist.plock);


Best regards,
Krzysztof
Bagas Sanjaya Sept. 11, 2023, 1:14 p.m. UTC | #9
On 11/09/2023 17:39, Mukesh Ojha wrote:
> 
> 
> On 9/11/2023 2:22 PM, Bagas Sanjaya wrote:
>> On Sun, Sep 10, 2023 at 01:46:01AM +0530, Mukesh Ojha wrote:
>>> Hi All,
>>>
>>> This is to continuation from the conversation happened at v4
>>>
>>> https://lore.kernel.org/lkml/632c5b97-4a91-c3e8-1e6c-33d6c4f6454f@quicinc.com/
>>>
>>> https://lore.kernel.org/lkml/695133e6-105f-de2a-5559-555cea0a0462@quicinc.com/
>>>
>>> We have put abstract on LPC on this topic as well as initiated a mail thread
>>> with other SoC vendors but did not get much traction on it.
>>>
>>> https://lore.kernel.org/lkml/0199db00-1b1d-0c63-58ff-03efae02cb21@quicinc.com/
>>>
>>> We explored most of possiblity present in kernel to address this issue[1] but
>>> solution like kdump/fadump does not seems safe/secure/performant from our
>>> perspective.
>>>
>>> Hence, with this series we tried to make the minidump kernel driver, simple
>>> and tied with pstore frontends, so that it collects the present available
>>> frontends data like dmesg, ftrace, pmsg, ftrace., Also, we will be working
>>> towards enhancing generic pstore to capture more debug data which will be
>>> helpful for first hand of debugging that can benefit both other pstore users
>>> as well as us as minidump users.
>>>
>>> One of the proposal made here,
>>> https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/
>>>
>>> Looking forward for your comments.
>>>
>>> Thanks,
>>> Mukesh
>>>
>>> [1]
>>> Minidump is a best effort mechanism to collect useful and predefined data
>>> for first level of debugging on end user devices running on Qualcomm SoCs.
>>> It is built on the premise that System on Chip (SoC) or subsystem part of
>>> SoC crashes, due to a range of hardware and software bugs. Hence, the
>>> ability to collect accurate data is only a best-effort. The data collected
>>> could be invalid or corrupted, data collection itself could fail, and so on.
>>>
>>> Qualcomm devices in engineering mode provides a mechanism for generating
>>> full system ramdumps for post mortem debugging. But in some cases it's
>>> however not feasible to capture the entire content of RAM. The minidump
>>> mechanism provides the means for selecting which snippets should be
>>> included in the ramdump.
>>>
>>> The core of SMEM based minidump feature is part of Qualcomm's boot
>>> firmware code. It initializes shared memory (SMEM), which is a part of
>>> DDR and allocates a small section of SMEM to minidump table i.e also
>>> called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...)
>>> has their own table of segments to be included in the minidump and all
>>> get their reference from G-ToC. Each segment/region has some details
>>> like name, physical address and it's size etc. and it could be anywhere
>>> scattered in the DDR.
>>>
>>> Existing upstream Qualcomm remoteproc driver[1] already supports SMEM
>>> based minidump feature for remoteproc instances like ADSP, MODEM, ...
>>> where predefined selective segments of subsystem region can be dumped
>>> as part of coredump collection which generates smaller size artifacts
>>> compared to complete coredump of subsystem on crash.
>>>
>>> [1]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142
>>>
>>> In addition to managing and querying the APSS minidump description,
>>> the Linux driver maintains a ELF header in a segment. This segment
>>> gets updated with section/program header whenever a new entry gets
>>> registered.
>>>
>>> Changes in v5:
>>>   - On suggestion from Pavan.k, to have single function call for minidump collection
>>>     from remoteproc driver, separated the logic to have separate minidump file called
>>>     qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to
>>>     qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion
>>>     during region unregister in this series, will pursue it in next series.
>>>
>>>   - To simplify the minidump driver, removed the complication for frontend and different
>>>     backend from Greg suggestion, will pursue this once main driver gets mainlined.
>>>
>>>   - Move the dynamic ramoops region allocation from Device tree approach to command line
>>>     approch with the introduction command line parsing and memblock reservation during
>>>     early boot up; Not added documentation about it yet, will add if it gets positive
>>>     response.
>>>
>>>   - Exporting linux banner from kernel to make minidump build also as module, however,
>>>     minidump is a debug module and should be kernel built to get most debug information
>>>     from kernel.
>>>
>>>   - Tried to address comments given on dload patch series.
>>>
>>> Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/
>>>   - Redesigned the driver and divided the driver into front end and backend (smem) so
>>>     that any new backend can be attached easily to avoid code duplication.
>>>   - Patch reordering as per the driver and subsystem to easier review of the code.
>>>   - Removed minidump specific code from remoteproc to minidump smem based driver.
>>>   - Enabled the all the driver as modules.
>>>   - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad]
>>>   - Address comments made qcom_pstore_minidump driver and given its Device tree
>>>     same set of properties as ramoops. [Luca/Kees]
>>>   - Added patch for MAINTAINER file.
>>>   - Include defconfig change as one patch as per [Krzysztof] suggestion.
>>>   - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion.
>>>   - Addressed comments made on dload mode patch v6 version
>>>     https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
>>>
>>> Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/
>>>   - Addressed most of the comments by Srini on v2 and refactored the minidump driver.
>>>      - Added platform device support
>>>      - Unregister region support.
>>>   - Added update region for clients.
>>>   - Added pending region support.
>>>   - Modified the documentation guide accordingly.
>>>   - Added qcom_pstore_ramdump client driver which happen to add ramoops platform
>>>     device and also registers ramoops region with minidump.
>>>   - Added download mode patch series with this minidump series.
>>>      https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
>>>
>>> Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/
>>>   - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
>>>   - Addressed comments made by [srinivas.kandagatla]
>>>   - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
>>>     region in minidump.
>>>   - Fixed issue reported by kernel test robot.
>>>
>>> Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/
>>>
>>> Testing of the patches has been done on sm8450 target after enabling config like
>>> CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up.
>>>
>>>   echo mini > /sys/module/qcom_scm/parameters/download_mode
>>>
>>> Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and
>>> and put the device in download mode) on command prompt.
>>>
>>> Default storage type is set to via USB, so minidump would be downloaded with the
>>> help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has
>>> backed minidump boot firmware support.
>>>
>>> This will make the device go to download mode and collect the minidump on to the
>>> attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm
>>> package manager kit).
>>>
>>> After that we will see a bunch of predefined registered region as binary blobs files
>>> starts with md_* downloaded on the x86 machine on given location in PCAT tool from
>>> the target device, more about this can be found in qualcomm minidump guide patch.
>>>
>>
>> I tried to apply this series on top of 535a265d7f0dd50 (as suggested by
>> `b4 am -l -g`), but it conflicts on patch [04/17]. Please specify the
>> exact base commit or another series for which this series is based on.
> 
> Apologies !
> I just realized, it was 6.5-rc7, but let me rebase version of the series;
> 
> Sorry, for all the reviewed done so far, i will definitely take care of them or reply.
> 

OK, see you in v6!
Jeff Johnson Sept. 11, 2023, 6:59 p.m. UTC | #10
On 9/9/2023 1:16 PM, Mukesh Ojha wrote:
> +/**
> + * struct minidump_pregion - Minidump pending region
> + * @list       : Pending region list pointer
> + * @region     : APSS minidump client region

does kernel-doc parse this correctly? should not be whitespace between 
@ID and ":"

refer to 
<https://static.lwn.net/kerneldoc/doc-guide/kernel-doc.html#structure-union-and-enumeration-documentation>
Jeff Johnson Sept. 11, 2023, 7:09 p.m. UTC | #11
On 9/11/2023 4:07 AM, Krzysztof Kozlowski wrote:
> On 09/09/2023 22:16, Mukesh Ojha wrote:
>> +/**
>> + * qcom_minidump_region_register() - Register region in APSS Minidump table.
>> + * @region: minidump region.
>> + *
>> + * Return: On success, it returns 0 and negative error value on failure.
>> + */
>> +int qcom_minidump_region_register(const struct qcom_minidump_region *region)
>> +{
>> +	struct minidump *md;
>> +	int ret;
>> +
>> +	md = qcom_smem_minidump_ready();
>> +	if (!md)
>> +		return -EPROBE_DEFER;
>> +
>> +	if (!qcom_minidump_valid_region(region))
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&md->md_lock);
>> +	ret = qcom_md_region_register(md, region);
>> +	if (ret)
>> +		goto unlock;
>> +
>> +	qcom_md_update_elfheader(md, region);
>> +unlock:
>> +	mutex_unlock(&md->md_lock);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(qcom_minidump_region_register);
> 
> NAK, there is no user for this.
> 
> Drop all exports from minidump drivers. Your upstream driver *must not*
> expose stuff to your vendor drivers.

Do we not expect that upstream drivers would want to register?
Mind you, in the downstream code the following was a bad limitation:
#define MAX_NUM_OF_SS           10
Konrad Dybcio Sept. 11, 2023, 7:33 p.m. UTC | #12
On 11.09.2023 21:09, Jeff Johnson wrote:
> On 9/11/2023 4:07 AM, Krzysztof Kozlowski wrote:
>> On 09/09/2023 22:16, Mukesh Ojha wrote:
>>> +/**
>>> + * qcom_minidump_region_register() - Register region in APSS Minidump table.
>>> + * @region: minidump region.
>>> + *
>>> + * Return: On success, it returns 0 and negative error value on failure.
>>> + */
>>> +int qcom_minidump_region_register(const struct qcom_minidump_region *region)
>>> +{
>>> +    struct minidump *md;
>>> +    int ret;
>>> +
>>> +    md = qcom_smem_minidump_ready();
>>> +    if (!md)
>>> +        return -EPROBE_DEFER;
>>> +
>>> +    if (!qcom_minidump_valid_region(region))
>>> +        return -EINVAL;
>>> +
>>> +    mutex_lock(&md->md_lock);
>>> +    ret = qcom_md_region_register(md, region);
>>> +    if (ret)
>>> +        goto unlock;
>>> +
>>> +    qcom_md_update_elfheader(md, region);
>>> +unlock:
>>> +    mutex_unlock(&md->md_lock);
>>> +    return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(qcom_minidump_region_register);
>>
>> NAK, there is no user for this.
>>
>> Drop all exports from minidump drivers. Your upstream driver *must not*
>> expose stuff to your vendor drivers.
> 
> Do we not expect that upstream drivers would want to register?
> Mind you, in the downstream code the following was a bad limitation:
> #define MAX_NUM_OF_SS           10
No, Krzysztof meant that you are not allowed to export symbols
without immediately providing a user for them - meaning if the
functions are not going to be used upstream, this change will
not be accepted.

Konrad
Mukesh Ojha Sept. 12, 2023, 9:26 a.m. UTC | #13
Thanks for your time in reviewing this.

On 9/11/2023 4:31 PM, Krzysztof Kozlowski wrote:
> On 09/09/2023 22:16, Mukesh Ojha wrote:
>> Minidump is a best effort mechanism to collect useful and predefined
>> data for first level of debugging on end user devices running on
>> Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
>> or subsystem part of SoC crashes, due to a range of hardware and
>> software bugs. Hence, the ability to collect accurate data is only
>> a best-effort. The data collected could be invalid or corrupted,
>> data collection itself could fail, and so on.
> 
> ...
> 
>> +static int qcom_apss_md_table_init(struct minidump *md,
>> +				   struct minidump_subsystem *mdss_toc)
>> +{
>> +	struct minidump_ss_data *mdss_data;
>> +
>> +	mdss_data = devm_kzalloc(md->dev, sizeof(*mdss_data), GFP_KERNEL);
>> +	if (!mdss_data)
>> +		return -ENOMEM;
>> +
>> +	mdss_data->md_ss_toc = mdss_toc;
>> +	mdss_data->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES,
>> +					     sizeof(struct minidump_region),
>> +					     GFP_KERNEL);
>> +	if (!mdss_data->md_regions)
>> +		return -ENOMEM;
>> +
>> +	mdss_toc = mdss_data->md_ss_toc;
>> +	mdss_toc->regions_baseptr = cpu_to_le64(virt_to_phys(mdss_data->md_regions));
>> +	mdss_toc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED);
>> +	mdss_toc->status = cpu_to_le32(1);
>> +	mdss_toc->region_count = cpu_to_le32(0);
>> +
>> +	/* Tell bootloader not to encrypt the regions of this subsystem */
>> +	mdss_toc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE);
>> +	mdss_toc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ);
>> +
>> +	md->apss_data = mdss_data;
>> +
>> +	return 0;
>> +}
>> +
>> +static int qcom_apss_minidump_probe(struct platform_device *pdev)
>> +{
>> +	struct minidump_global_toc *mdgtoc;
>> +	struct minidump *md;
>> +	size_t size;
>> +	int ret;
>> +
>> +	md = devm_kzalloc(&pdev->dev, sizeof(struct minidump), GFP_KERNEL);
> 
> sizeof(*)
> 
> Didn't you get such comments already?

Ok, will fix this, no i have not got such comments as of yet.
Any reason of using this way?

> 
> 
>> +	if (!md)
>> +		return -ENOMEM;
>> +
>> +	md->dev = &pdev->dev;
>> +	mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size);
>> +	if (IS_ERR(mdgtoc)) {
>> +		ret = PTR_ERR(mdgtoc);
>> +		dev_err(md->dev, "Couldn't find minidump smem item: %d\n", ret);
>> +		return ret;
> 
> The syntax is:
> return dev_err_probe

ACK.

> 
>> +	}
>> +
>> +	if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
>> +		dev_err(md->dev, "minidump table is not initialized: %d\n", ret);
> 
> ret is uninitialized here. Please use automated tools for checking your
> code:
> coccinelle, smatch and sparse

Thanks.

> 
>> +		return -EINVAL;
>> +	}
>> +
>> +	mutex_init(&md->md_lock);
>> +	ret = qcom_apss_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_APSS_DESC]);
>> +	if (ret) {
>> +		dev_err(md->dev, "apss minidump initialization failed: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	/* First entry would be ELF header */
>> +	ret = qcom_md_add_elfheader(md);
>> +	if (ret) {
>> +		dev_err(md->dev, "Failed to add elf header: %d\n", ret);
>> +		memset(md->apss_data->md_ss_toc, 0, sizeof(struct minidump_subsystem));
> 
> Why do you need it?

Earlier, i got comment about clearing the SS TOC(subsystem table of 
content) which is shared with other SS and it will have stale values.

> 
>> +		return ret;
>> +	}
>> +
>> +	platform_set_drvdata(pdev, md);
>> +
>> +	return ret;
>> +}
>> +
>> +static int qcom_apss_minidump_remove(struct platform_device *pdev)
>> +{
>> +	struct minidump *md = platform_get_drvdata(pdev);
>> +	struct minidump_ss_data *mdss_data;
>> +
>> +	mdss_data = md->apss_data;
>> +	memset(mdss_data->md_ss_toc, cpu_to_le32(0), sizeof(struct minidump_subsystem));
> 
> Why do you need it?

Same as above.

> 
>> +	md = NULL;
> 
> That's useless assignment.

Ok.

> 
>> +
>> +	return 0;
>> +}
>> +
>> +static struct platform_driver qcom_minidump_driver = {
>> +	.probe = qcom_apss_minidump_probe,
>> +	.remove = qcom_apss_minidump_remove,
>> +	.driver  = {
>> +		.name = "qcom-minidump-smem",
>> +	},
>> +};
>> +
>> +module_platform_driver(qcom_minidump_driver);
>> +
>> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver");
>> +MODULE_LICENSE("GPL");
>> +MODULE_ALIAS("platform:qcom-minidump-smem");
> 
> Add a proper ID table instead of re-inventing it with module aliases.

Ok.

-Mukesh

> 
> Best regards,
> Krzysztof
>
Krzysztof Kozlowski Sept. 12, 2023, 9:54 a.m. UTC | #14
On 12/09/2023 11:26, Mukesh Ojha wrote:
>>
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	mutex_init(&md->md_lock);
>>> +	ret = qcom_apss_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_APSS_DESC]);
>>> +	if (ret) {
>>> +		dev_err(md->dev, "apss minidump initialization failed: %d\n", ret);
>>> +		return ret;
>>> +	}
>>> +
>>> +	/* First entry would be ELF header */
>>> +	ret = qcom_md_add_elfheader(md);
>>> +	if (ret) {
>>> +		dev_err(md->dev, "Failed to add elf header: %d\n", ret);
>>> +		memset(md->apss_data->md_ss_toc, 0, sizeof(struct minidump_subsystem));
>>
>> Why do you need it?
> 
> Earlier, i got comment about clearing the SS TOC(subsystem table of 
> content) which is shared with other SS and it will have stale values.

OK, but then the entire code is poorly readable. First, any cleanup of
qcom_apss_md_table_init() should be named similarly, e.g.
qcom_apss_md_table_clean() or qcom_apss_md_table_exit() or whatever
seems feasible.

Second, shouldn't writing to shared memory be the last step? Step which
cannot fail and there is no cleanup afterwards (like
platform_set_drvdata)? I don't enjoy looking at this interface...



Best regards,
Krzysztof
Mukesh Ojha Sept. 12, 2023, 9:58 a.m. UTC | #15
On 9/12/2023 2:56 PM, Mukesh Ojha wrote:
> Thanks for your time in reviewing this.
> 
> On 9/11/2023 4:31 PM, Krzysztof Kozlowski wrote:
>> On 09/09/2023 22:16, Mukesh Ojha wrote:
>>> Minidump is a best effort mechanism to collect useful and predefined
>>> data for first level of debugging on end user devices running on
>>> Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
>>> or subsystem part of SoC crashes, due to a range of hardware and
>>> software bugs. Hence, the ability to collect accurate data is only
>>> a best-effort. The data collected could be invalid or corrupted,
>>> data collection itself could fail, and so on.
>>
>> ...
>>
>>> +static int qcom_apss_md_table_init(struct minidump *md,
>>> +                   struct minidump_subsystem *mdss_toc)
>>> +{
>>> +    struct minidump_ss_data *mdss_data;
>>> +
>>> +    mdss_data = devm_kzalloc(md->dev, sizeof(*mdss_data), GFP_KERNEL);
>>> +    if (!mdss_data)
>>> +        return -ENOMEM;
>>> +
>>> +    mdss_data->md_ss_toc = mdss_toc;
>>> +    mdss_data->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES,
>>> +                         sizeof(struct minidump_region),
>>> +                         GFP_KERNEL);
>>> +    if (!mdss_data->md_regions)
>>> +        return -ENOMEM;
>>> +
>>> +    mdss_toc = mdss_data->md_ss_toc;
>>> +    mdss_toc->regions_baseptr = 
>>> cpu_to_le64(virt_to_phys(mdss_data->md_regions));
>>> +    mdss_toc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED);
>>> +    mdss_toc->status = cpu_to_le32(1);
>>> +    mdss_toc->region_count = cpu_to_le32(0);
>>> +
>>> +    /* Tell bootloader not to encrypt the regions of this subsystem */
>>> +    mdss_toc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE);
>>> +    mdss_toc->encryption_required = 
>>> cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ);
>>> +
>>> +    md->apss_data = mdss_data;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int qcom_apss_minidump_probe(struct platform_device *pdev)
>>> +{
>>> +    struct minidump_global_toc *mdgtoc;
>>> +    struct minidump *md;
>>> +    size_t size;
>>> +    int ret;
>>> +
>>> +    md = devm_kzalloc(&pdev->dev, sizeof(struct minidump), GFP_KERNEL);
>>
>> sizeof(*)
>>
>> Didn't you get such comments already?
> 
> Ok, will fix this, no i have not got such comments as of yet.
> Any reason of using this way?

Got the reason, thanks.

-Mukesh

> 
>>
>>
>>> +    if (!md)
>>> +        return -ENOMEM;
>>> +
>>> +    md->dev = &pdev->dev;
>>> +    mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, 
>>> &size);
>>> +    if (IS_ERR(mdgtoc)) {
>>> +        ret = PTR_ERR(mdgtoc);
>>> +        dev_err(md->dev, "Couldn't find minidump smem item: %d\n", 
>>> ret);
>>> +        return ret;
>>
>> The syntax is:
>> return dev_err_probe
> 
> ACK.
> 
>>
>>> +    }
>>> +
>>> +    if (size < sizeof(*mdgtoc) || !mdgtoc->status) {
>>> +        dev_err(md->dev, "minidump table is not initialized: %d\n", 
>>> ret);
>>
>> ret is uninitialized here. Please use automated tools for checking your
>> code:
>> coccinelle, smatch and sparse
> 
> Thanks.
> 
>>
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    mutex_init(&md->md_lock);
>>> +    ret = qcom_apss_md_table_init(md, 
>>> &mdgtoc->subsystems[MINIDUMP_APSS_DESC]);
>>> +    if (ret) {
>>> +        dev_err(md->dev, "apss minidump initialization failed: 
>>> %d\n", ret);
>>> +        return ret;
>>> +    }
>>> +
>>> +    /* First entry would be ELF header */
>>> +    ret = qcom_md_add_elfheader(md);
>>> +    if (ret) {
>>> +        dev_err(md->dev, "Failed to add elf header: %d\n", ret);
>>> +        memset(md->apss_data->md_ss_toc, 0, sizeof(struct 
>>> minidump_subsystem));
>>
>> Why do you need it?
> 
> Earlier, i got comment about clearing the SS TOC(subsystem table of 
> content) which is shared with other SS and it will have stale values.
> 
>>
>>> +        return ret;
>>> +    }
>>> +
>>> +    platform_set_drvdata(pdev, md);
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static int qcom_apss_minidump_remove(struct platform_device *pdev)
>>> +{
>>> +    struct minidump *md = platform_get_drvdata(pdev);
>>> +    struct minidump_ss_data *mdss_data;
>>> +
>>> +    mdss_data = md->apss_data;
>>> +    memset(mdss_data->md_ss_toc, cpu_to_le32(0), sizeof(struct 
>>> minidump_subsystem));
>>
>> Why do you need it?
> 
> Same as above.
> 
>>
>>> +    md = NULL;
>>
>> That's useless assignment.
> 
> Ok.
> 
>>
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static struct platform_driver qcom_minidump_driver = {
>>> +    .probe = qcom_apss_minidump_probe,
>>> +    .remove = qcom_apss_minidump_remove,
>>> +    .driver  = {
>>> +        .name = "qcom-minidump-smem",
>>> +    },
>>> +};
>>> +
>>> +module_platform_driver(qcom_minidump_driver);
>>> +
>>> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver");
>>> +MODULE_LICENSE("GPL");
>>> +MODULE_ALIAS("platform:qcom-minidump-smem");
>>
>> Add a proper ID table instead of re-inventing it with module aliases.
> 
> Ok.
> 
> -Mukesh
> 
>>
>> Best regards,
>> Krzysztof
>>
Mukesh Ojha Sept. 13, 2023, 7:09 a.m. UTC | #16
On 9/12/2023 3:24 PM, Krzysztof Kozlowski wrote:
> On 12/09/2023 11:26, Mukesh Ojha wrote:
>>>
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	mutex_init(&md->md_lock);
>>>> +	ret = qcom_apss_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_APSS_DESC]);
>>>> +	if (ret) {
>>>> +		dev_err(md->dev, "apss minidump initialization failed: %d\n", ret);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	/* First entry would be ELF header */
>>>> +	ret = qcom_md_add_elfheader(md);
>>>> +	if (ret) {
>>>> +		dev_err(md->dev, "Failed to add elf header: %d\n", ret);
>>>> +		memset(md->apss_data->md_ss_toc, 0, sizeof(struct minidump_subsystem));
>>>
>>> Why do you need it?
>>
>> Earlier, i got comment about clearing the SS TOC(subsystem table of
>> content) which is shared with other SS and it will have stale values.
> 
> OK, but then the entire code is poorly readable. First, any cleanup of
> qcom_apss_md_table_init() should be named similarly, e.g.
> qcom_apss_md_table_clean() or qcom_apss_md_table_exit() or whatever
> seems feasible.

ACK on this.

> 
> Second, shouldn't writing to shared memory be the last step? Step which
> cannot fail and there is no cleanup afterwards (like
> platform_set_drvdata)? I don't enjoy looking at this interface...

It can be done, if i shift adding elf header as first thing to first
caller of qcom_minidump_region_register() but then i would have to 
remove qcom_ramoops_minidump() from this probe in 11/17 patch.

-Mukesh
> 
> 
> 
> Best regards,
> Krzysztof
>
Mukesh Ojha Sept. 13, 2023, 3:18 p.m. UTC | #17
On 9/12/2023 12:39 AM, Jeff Johnson wrote:
> On 9/11/2023 4:07 AM, Krzysztof Kozlowski wrote:
>> On 09/09/2023 22:16, Mukesh Ojha wrote:
>>> +/**
>>> + * qcom_minidump_region_register() - Register region in APSS 
>>> Minidump table.
>>> + * @region: minidump region.
>>> + *
>>> + * Return: On success, it returns 0 and negative error value on 
>>> failure.
>>> + */
>>> +int qcom_minidump_region_register(const struct qcom_minidump_region 
>>> *region)
>>> +{
>>> +    struct minidump *md;
>>> +    int ret;
>>> +
>>> +    md = qcom_smem_minidump_ready();
>>> +    if (!md)
>>> +        return -EPROBE_DEFER;
>>> +
>>> +    if (!qcom_minidump_valid_region(region))
>>> +        return -EINVAL;
>>> +
>>> +    mutex_lock(&md->md_lock);
>>> +    ret = qcom_md_region_register(md, region);
>>> +    if (ret)
>>> +        goto unlock;
>>> +
>>> +    qcom_md_update_elfheader(md, region);
>>> +unlock:
>>> +    mutex_unlock(&md->md_lock);
>>> +    return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(qcom_minidump_region_register);
>>
>> NAK, there is no user for this.
>>
>> Drop all exports from minidump drivers. Your upstream driver *must not*
>> expose stuff to your vendor drivers.
> 
> Do we not expect that upstream drivers would want to register?

As of current version of patch, there is no user. Let's avoid till
any upstream QCOM driver uses it .


> Mind you, in the downstream code the following was a bad limitation:
> #define MAX_NUM_OF_SS           10

I don't think, there is any problem with above macro, instead there is
restriction on total number of APSS region can be registered.

#define MAX_NUM_ENTRIES	  201

-Mukesh
> 
>