Message ID | 1694290578-17733-1-git-send-email-quic_mojha@quicinc.com |
---|---|
Headers | show |
Series | Add Qualcomm Minidump kernel driver related support | expand |
On 9/10/2023 1:46 AM, Mukesh Ojha wrote: > Hi All, > > This is to continuation from the conversation happened at v4 > > https://lore.kernel.org/lkml/632c5b97-4a91-c3e8-1e6c-33d6c4f6454f@quicinc.com/ > > https://lore.kernel.org/lkml/695133e6-105f-de2a-5559-555cea0a0462@quicinc.com/ > > We have put abstract on LPC on this topic as well as initiated a mail thread > with other SoC vendors but did not get much traction on it. > > https://lore.kernel.org/lkml/0199db00-1b1d-0c63-58ff-03efae02cb21@quicinc.com/ > > We explored most of possiblity present in kernel to address this issue[1] but > solution like kdump/fadump does not seems safe/secure/performant from our > perspective. > > Hence, with this series we tried to make the minidump kernel driver, simple > and tied with pstore frontends, so that it collects the present available > frontends data like dmesg, ftrace, pmsg, ftrace., Also, we will be working > towards enhancing generic pstore to capture more debug data which will be > helpful for first hand of debugging that can benefit both other pstore users > as well as us as minidump users. > > One of the proposal made here, > https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/ > > Looking forward for your comments. > > Thanks, > Mukesh > > [1] > Minidump is a best effort mechanism to collect useful and predefined data > for first level of debugging on end user devices running on Qualcomm SoCs. > It is built on the premise that System on Chip (SoC) or subsystem part of > SoC crashes, due to a range of hardware and software bugs. Hence, the > ability to collect accurate data is only a best-effort. The data collected > could be invalid or corrupted, data collection itself could fail, and so on. > > Qualcomm devices in engineering mode provides a mechanism for generating > full system ramdumps for post mortem debugging. But in some cases it's > however not feasible to capture the entire content of RAM. The minidump > mechanism provides the means for selecting which snippets should be > included in the ramdump. > > The core of SMEM based minidump feature is part of Qualcomm's boot > firmware code. It initializes shared memory (SMEM), which is a part of > DDR and allocates a small section of SMEM to minidump table i.e also > called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) > has their own table of segments to be included in the minidump and all > get their reference from G-ToC. Each segment/region has some details > like name, physical address and it's size etc. and it could be anywhere > scattered in the DDR. > > Existing upstream Qualcomm remoteproc driver[1] already supports SMEM > based minidump feature for remoteproc instances like ADSP, MODEM, ... > where predefined selective segments of subsystem region can be dumped > as part of coredump collection which generates smaller size artifacts > compared to complete coredump of subsystem on crash. > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142 > > In addition to managing and querying the APSS minidump description, > the Linux driver maintains a ELF header in a segment. This segment > gets updated with section/program header whenever a new entry gets > registered. > > Changes in v5: > - On suggestion from Pavan.k, to have single function call for minidump collection > from remoteproc driver, separated the logic to have separate minidump file called > qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to > qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion > during region unregister in this series, will pursue it in next series. > > - To simplify the minidump driver, removed the complication for frontend and different > backend from Greg suggestion, will pursue this once main driver gets mainlined. > > - Move the dynamic ramoops region allocation from Device tree approach to command line > approch with the introduction command line parsing and memblock reservation during > early boot up; Not added documentation about it yet, will add if it gets positive > response. > > - Exporting linux banner from kernel to make minidump build also as module, however, > minidump is a debug module and should be kernel built to get most debug information > from kernel. > > - Tried to address comments given on dload patch series. > > Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/ > - Redesigned the driver and divided the driver into front end and backend (smem) so > that any new backend can be attached easily to avoid code duplication. > - Patch reordering as per the driver and subsystem to easier review of the code. > - Removed minidump specific code from remoteproc to minidump smem based driver. > - Enabled the all the driver as modules. > - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad] > - Address comments made qcom_pstore_minidump driver and given its Device tree > same set of properties as ramoops. [Luca/Kees] > - Added patch for MAINTAINER file. > - Include defconfig change as one patch as per [Krzysztof] suggestion. > - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion. > - Addressed comments made on dload mode patch v6 version > https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ > > Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/ > - Addressed most of the comments by Srini on v2 and refactored the minidump driver. > - Added platform device support > - Unregister region support. > - Added update region for clients. > - Added pending region support. > - Modified the documentation guide accordingly. > - Added qcom_pstore_ramdump client driver which happen to add ramoops platform > device and also registers ramoops region with minidump. > - Added download mode patch series with this minidump series. > https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ > > Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/ > - Addressed review comment made by [quic_tsoni/bmasney] to add documentation. > - Addressed comments made by [srinivas.kandagatla] > - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore > region in minidump. > - Fixed issue reported by kernel test robot. > > Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/ > > Testing of the patches has been done on sm8450 target after enabling config like > CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up. > > echo mini > /sys/module/qcom_scm/parameters/download_mode > > Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and > and put the device in download mode) on command prompt. > > Default storage type is set to via USB, so minidump would be downloaded with the > help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has > backed minidump boot firmware support. > > This will make the device go to download mode and collect the minidump on to the > attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm > package manager kit). > > After that we will see a bunch of predefined registered region as binary blobs files > starts with md_* downloaded on the x86 machine on given location in PCAT tool from > the target device, more about this can be found in qualcomm minidump guide patch. > > Mukesh Ojha (17): > docs: qcom: Add qualcomm minidump guide > soc: qcom: Add qcom_rproc_minidump module > remoteproc: qcom_q6v5_pas: Use qcom_rproc_minidump() > remoteproc: qcom: Remove minidump related data from qcom_common.c > init: export linux_banner data variable > soc: qcom: Add Qualcomm APSS minidump kernel driver > soc: qcom: minidump: Add pending region registration > arm64: mm: Add dynamic ramoops region support through command line > pstore/ram: Use dynamic ramoops reserve resource > pstore: Add pstore_region_defined() helper and export it > qcom_minidump: Register ramoops region with minidump > MAINTAINERS: Add entry for minidump related files Based on last discussion[1], I posted the V5 of the below three patches[2] separately. I don't mind clubbing these again back to Minidump series as long as three patches can be picked up separately. > firmware: qcom_scm: provide a read-modify-write function > pinctrl: qcom: Use qcom_scm_io_update_field() > firmware: scm: Modify only the download bits in TCSR register [1] https://lore.kernel.org/linux-arm-msm/d77f5601-2b08-a7c7-1400-7ab68b8add3a@quicinc.com/ [2] https://lore.kernel.org/linux-arm-msm/20230720070408.1093698-1-quic_kathirav@quicinc.com/ > firmware: qcom_scm: Refactor code to support multiple download mode > firmware: qcom_scm: Add multiple download mode support > > Documentation/admin-guide/index.rst | 1 + > Documentation/admin-guide/qcom_minidump.rst | 272 +++++++++++ > MAINTAINERS | 10 + > arch/arm64/mm/init.c | 94 ++++ > drivers/firmware/Kconfig | 11 - > drivers/firmware/qcom_scm.c | 90 +++- > drivers/pinctrl/qcom/pinctrl-msm.c | 10 +- > drivers/remoteproc/Kconfig | 1 + > drivers/remoteproc/qcom_common.c | 150 ------ > drivers/remoteproc/qcom_q6v5_pas.c | 3 +- > drivers/soc/qcom/Kconfig | 24 + > drivers/soc/qcom/Makefile | 3 + > drivers/soc/qcom/qcom_minidump.c | 727 ++++++++++++++++++++++++++++ > drivers/soc/qcom/qcom_minidump_internal.h | 74 +++ > drivers/soc/qcom/qcom_ramoops_minidump.c | 88 ++++ > drivers/soc/qcom/qcom_ramoops_minidump.h | 10 + > drivers/soc/qcom/qcom_rproc_minidump.c | 111 +++++ > drivers/soc/qcom/smem.c | 18 + > fs/pstore/platform.c | 15 + > fs/pstore/ram.c | 52 +- > include/linux/firmware/qcom/qcom_scm.h | 2 + > include/linux/init.h | 3 + > include/linux/pstore.h | 6 + > include/linux/pstore_ram.h | 2 + > include/linux/soc/qcom/smem.h | 2 + > include/soc/qcom/qcom_minidump.h | 56 +++ > init/version-timestamp.c | 3 + > 27 files changed, 1659 insertions(+), 179 deletions(-) > create mode 100644 Documentation/admin-guide/qcom_minidump.rst > create mode 100644 drivers/soc/qcom/qcom_minidump.c > create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h > create mode 100644 drivers/soc/qcom/qcom_ramoops_minidump.c > create mode 100644 drivers/soc/qcom/qcom_ramoops_minidump.h > create mode 100644 drivers/soc/qcom/qcom_rproc_minidump.c > create mode 100644 include/soc/qcom/qcom_minidump.h >
On 9/10/2023 1:46 AM, Mukesh Ojha wrote: > Crashdump collection is based on the DLOAD bit of TCSR register. > To retain other bits, we read the register and modify only the > DLOAD bit as the other bits have their own significance. > > Co-developed-by: Poovendhan Selvaraj <quic_poovendh@quicinc.com> > Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> > --- This change doesn't cleanly apply on next-20230911. Please rebase it. Validated this change on IPQ9574 and IPQ5332 and system is entering into the download mode. Tested-by: Kathiravan Thirumoorthy <quic_kathirav@quicinc.com> # IPQ9574 and IPQ5332 > drivers/firmware/qcom_scm.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c > index 5ea8fc4fd4e8..eda92f713019 100644 > --- a/drivers/firmware/qcom_scm.c > +++ b/drivers/firmware/qcom_scm.c > @@ -5,6 +5,8 @@ > #include <linux/platform_device.h> > #include <linux/init.h> > #include <linux/interrupt.h> > +#include <linux/bitfield.h> > +#include <linux/bits.h> > #include <linux/completion.h> > #include <linux/cpumask.h> > #include <linux/export.h> > @@ -30,6 +32,10 @@ module_param(download_mode, bool, 0); > #define SCM_HAS_IFACE_CLK BIT(1) > #define SCM_HAS_BUS_CLK BIT(2) > > +#define QCOM_DLOAD_MASK GENMASK(5, 4) > +#define QCOM_DLOAD_FULLDUMP 0x1 > +#define QCOM_DLOAD_NODUMP 0x0 > + > struct qcom_scm { > struct device *dev; > struct clk *core_clk; > @@ -444,6 +450,7 @@ static int __qcom_scm_set_dload_mode(struct device *dev, bool enable) > > static void qcom_scm_set_download_mode(bool enable) > { > + u32 val = enable ? QCOM_DLOAD_FULLDUMP : QCOM_DLOAD_NODUMP; > bool avail; > int ret = 0; > > @@ -453,8 +460,9 @@ static void qcom_scm_set_download_mode(bool enable) > if (avail) { > ret = __qcom_scm_set_dload_mode(__scm->dev, enable); > } else if (__scm->dload_mode_addr) { > - ret = qcom_scm_io_writel(__scm->dload_mode_addr, > - enable ? QCOM_SCM_BOOT_SET_DLOAD_MODE : 0); > + ret = qcom_scm_io_update_field(__scm->dload_mode_addr, > + QCOM_DLOAD_MASK, > + FIELD_PREP(QCOM_DLOAD_MASK, val)); > } else { > dev_err(__scm->dev, > "No available mechanism for setting download mode\n");
On 9/10/2023 1:46 AM, Mukesh Ojha wrote: > Crashdump collection is based on the DLOAD bit of TCSR register. > To retain other bits, we read the register and modify only the > DLOAD bit as the other bits have their own significance. > > Co-developed-by: Poovendhan Selvaraj <quic_poovendh@quicinc.com> Poovendhan's signed-off is also required. Please refer [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Documentation/process/submitting-patches.rst#n489 > Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> > --- > drivers/firmware/qcom_scm.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c > index 5ea8fc4fd4e8..eda92f713019 100644 > --- a/drivers/firmware/qcom_scm.c > +++ b/drivers/firmware/qcom_scm.c > @@ -5,6 +5,8 @@ > #include <linux/platform_device.h> > #include <linux/init.h> > #include <linux/interrupt.h> > +#include <linux/bitfield.h> > +#include <linux/bits.h> > #include <linux/completion.h> > #include <linux/cpumask.h> > #include <linux/export.h> > @@ -30,6 +32,10 @@ module_param(download_mode, bool, 0); > #define SCM_HAS_IFACE_CLK BIT(1) > #define SCM_HAS_BUS_CLK BIT(2) > > +#define QCOM_DLOAD_MASK GENMASK(5, 4) > +#define QCOM_DLOAD_FULLDUMP 0x1 > +#define QCOM_DLOAD_NODUMP 0x0 > + > struct qcom_scm { > struct device *dev; > struct clk *core_clk; > @@ -444,6 +450,7 @@ static int __qcom_scm_set_dload_mode(struct device *dev, bool enable) > > static void qcom_scm_set_download_mode(bool enable) > { > + u32 val = enable ? QCOM_DLOAD_FULLDUMP : QCOM_DLOAD_NODUMP; > bool avail; > int ret = 0; > > @@ -453,8 +460,9 @@ static void qcom_scm_set_download_mode(bool enable) > if (avail) { > ret = __qcom_scm_set_dload_mode(__scm->dev, enable); > } else if (__scm->dload_mode_addr) { > - ret = qcom_scm_io_writel(__scm->dload_mode_addr, > - enable ? QCOM_SCM_BOOT_SET_DLOAD_MODE : 0); > + ret = qcom_scm_io_update_field(__scm->dload_mode_addr, > + QCOM_DLOAD_MASK, > + FIELD_PREP(QCOM_DLOAD_MASK, val)); > } else { > dev_err(__scm->dev, > "No available mechanism for setting download mode\n");
On Sun, Sep 10, 2023 at 01:46:01AM +0530, Mukesh Ojha wrote: > Hi All, > > This is to continuation from the conversation happened at v4 > > https://lore.kernel.org/lkml/632c5b97-4a91-c3e8-1e6c-33d6c4f6454f@quicinc.com/ > > https://lore.kernel.org/lkml/695133e6-105f-de2a-5559-555cea0a0462@quicinc.com/ > > We have put abstract on LPC on this topic as well as initiated a mail thread > with other SoC vendors but did not get much traction on it. > > https://lore.kernel.org/lkml/0199db00-1b1d-0c63-58ff-03efae02cb21@quicinc.com/ > > We explored most of possiblity present in kernel to address this issue[1] but > solution like kdump/fadump does not seems safe/secure/performant from our > perspective. > > Hence, with this series we tried to make the minidump kernel driver, simple > and tied with pstore frontends, so that it collects the present available > frontends data like dmesg, ftrace, pmsg, ftrace., Also, we will be working > towards enhancing generic pstore to capture more debug data which will be > helpful for first hand of debugging that can benefit both other pstore users > as well as us as minidump users. > > One of the proposal made here, > https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/ > > Looking forward for your comments. > > Thanks, > Mukesh > > [1] > Minidump is a best effort mechanism to collect useful and predefined data > for first level of debugging on end user devices running on Qualcomm SoCs. > It is built on the premise that System on Chip (SoC) or subsystem part of > SoC crashes, due to a range of hardware and software bugs. Hence, the > ability to collect accurate data is only a best-effort. The data collected > could be invalid or corrupted, data collection itself could fail, and so on. > > Qualcomm devices in engineering mode provides a mechanism for generating > full system ramdumps for post mortem debugging. But in some cases it's > however not feasible to capture the entire content of RAM. The minidump > mechanism provides the means for selecting which snippets should be > included in the ramdump. > > The core of SMEM based minidump feature is part of Qualcomm's boot > firmware code. It initializes shared memory (SMEM), which is a part of > DDR and allocates a small section of SMEM to minidump table i.e also > called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) > has their own table of segments to be included in the minidump and all > get their reference from G-ToC. Each segment/region has some details > like name, physical address and it's size etc. and it could be anywhere > scattered in the DDR. > > Existing upstream Qualcomm remoteproc driver[1] already supports SMEM > based minidump feature for remoteproc instances like ADSP, MODEM, ... > where predefined selective segments of subsystem region can be dumped > as part of coredump collection which generates smaller size artifacts > compared to complete coredump of subsystem on crash. > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142 > > In addition to managing and querying the APSS minidump description, > the Linux driver maintains a ELF header in a segment. This segment > gets updated with section/program header whenever a new entry gets > registered. > > Changes in v5: > - On suggestion from Pavan.k, to have single function call for minidump collection > from remoteproc driver, separated the logic to have separate minidump file called > qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to > qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion > during region unregister in this series, will pursue it in next series. > > - To simplify the minidump driver, removed the complication for frontend and different > backend from Greg suggestion, will pursue this once main driver gets mainlined. > > - Move the dynamic ramoops region allocation from Device tree approach to command line > approch with the introduction command line parsing and memblock reservation during > early boot up; Not added documentation about it yet, will add if it gets positive > response. > > - Exporting linux banner from kernel to make minidump build also as module, however, > minidump is a debug module and should be kernel built to get most debug information > from kernel. > > - Tried to address comments given on dload patch series. > > Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/ > - Redesigned the driver and divided the driver into front end and backend (smem) so > that any new backend can be attached easily to avoid code duplication. > - Patch reordering as per the driver and subsystem to easier review of the code. > - Removed minidump specific code from remoteproc to minidump smem based driver. > - Enabled the all the driver as modules. > - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad] > - Address comments made qcom_pstore_minidump driver and given its Device tree > same set of properties as ramoops. [Luca/Kees] > - Added patch for MAINTAINER file. > - Include defconfig change as one patch as per [Krzysztof] suggestion. > - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion. > - Addressed comments made on dload mode patch v6 version > https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ > > Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/ > - Addressed most of the comments by Srini on v2 and refactored the minidump driver. > - Added platform device support > - Unregister region support. > - Added update region for clients. > - Added pending region support. > - Modified the documentation guide accordingly. > - Added qcom_pstore_ramdump client driver which happen to add ramoops platform > device and also registers ramoops region with minidump. > - Added download mode patch series with this minidump series. > https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ > > Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/ > - Addressed review comment made by [quic_tsoni/bmasney] to add documentation. > - Addressed comments made by [srinivas.kandagatla] > - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore > region in minidump. > - Fixed issue reported by kernel test robot. > > Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/ > > Testing of the patches has been done on sm8450 target after enabling config like > CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up. > > echo mini > /sys/module/qcom_scm/parameters/download_mode > > Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and > and put the device in download mode) on command prompt. > > Default storage type is set to via USB, so minidump would be downloaded with the > help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has > backed minidump boot firmware support. > > This will make the device go to download mode and collect the minidump on to the > attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm > package manager kit). > > After that we will see a bunch of predefined registered region as binary blobs files > starts with md_* downloaded on the x86 machine on given location in PCAT tool from > the target device, more about this can be found in qualcomm minidump guide patch. > I tried to apply this series on top of 535a265d7f0dd50 (as suggested by `b4 am -l -g`), but it conflicts on patch [04/17]. Please specify the exact base commit or another series for which this series is based on. Thanks.
On 9/11/2023 2:22 PM, Bagas Sanjaya wrote: > On Sun, Sep 10, 2023 at 01:46:01AM +0530, Mukesh Ojha wrote: >> Hi All, >> >> This is to continuation from the conversation happened at v4 >> >> https://lore.kernel.org/lkml/632c5b97-4a91-c3e8-1e6c-33d6c4f6454f@quicinc.com/ >> >> https://lore.kernel.org/lkml/695133e6-105f-de2a-5559-555cea0a0462@quicinc.com/ >> >> We have put abstract on LPC on this topic as well as initiated a mail thread >> with other SoC vendors but did not get much traction on it. >> >> https://lore.kernel.org/lkml/0199db00-1b1d-0c63-58ff-03efae02cb21@quicinc.com/ >> >> We explored most of possiblity present in kernel to address this issue[1] but >> solution like kdump/fadump does not seems safe/secure/performant from our >> perspective. >> >> Hence, with this series we tried to make the minidump kernel driver, simple >> and tied with pstore frontends, so that it collects the present available >> frontends data like dmesg, ftrace, pmsg, ftrace., Also, we will be working >> towards enhancing generic pstore to capture more debug data which will be >> helpful for first hand of debugging that can benefit both other pstore users >> as well as us as minidump users. >> >> One of the proposal made here, >> https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/ >> >> Looking forward for your comments. >> >> Thanks, >> Mukesh >> >> [1] >> Minidump is a best effort mechanism to collect useful and predefined data >> for first level of debugging on end user devices running on Qualcomm SoCs. >> It is built on the premise that System on Chip (SoC) or subsystem part of >> SoC crashes, due to a range of hardware and software bugs. Hence, the >> ability to collect accurate data is only a best-effort. The data collected >> could be invalid or corrupted, data collection itself could fail, and so on. >> >> Qualcomm devices in engineering mode provides a mechanism for generating >> full system ramdumps for post mortem debugging. But in some cases it's >> however not feasible to capture the entire content of RAM. The minidump >> mechanism provides the means for selecting which snippets should be >> included in the ramdump. >> >> The core of SMEM based minidump feature is part of Qualcomm's boot >> firmware code. It initializes shared memory (SMEM), which is a part of >> DDR and allocates a small section of SMEM to minidump table i.e also >> called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) >> has their own table of segments to be included in the minidump and all >> get their reference from G-ToC. Each segment/region has some details >> like name, physical address and it's size etc. and it could be anywhere >> scattered in the DDR. >> >> Existing upstream Qualcomm remoteproc driver[1] already supports SMEM >> based minidump feature for remoteproc instances like ADSP, MODEM, ... >> where predefined selective segments of subsystem region can be dumped >> as part of coredump collection which generates smaller size artifacts >> compared to complete coredump of subsystem on crash. >> >> [1] >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142 >> >> In addition to managing and querying the APSS minidump description, >> the Linux driver maintains a ELF header in a segment. This segment >> gets updated with section/program header whenever a new entry gets >> registered. >> >> Changes in v5: >> - On suggestion from Pavan.k, to have single function call for minidump collection >> from remoteproc driver, separated the logic to have separate minidump file called >> qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to >> qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion >> during region unregister in this series, will pursue it in next series. >> >> - To simplify the minidump driver, removed the complication for frontend and different >> backend from Greg suggestion, will pursue this once main driver gets mainlined. >> >> - Move the dynamic ramoops region allocation from Device tree approach to command line >> approch with the introduction command line parsing and memblock reservation during >> early boot up; Not added documentation about it yet, will add if it gets positive >> response. >> >> - Exporting linux banner from kernel to make minidump build also as module, however, >> minidump is a debug module and should be kernel built to get most debug information >> from kernel. >> >> - Tried to address comments given on dload patch series. >> >> Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/ >> - Redesigned the driver and divided the driver into front end and backend (smem) so >> that any new backend can be attached easily to avoid code duplication. >> - Patch reordering as per the driver and subsystem to easier review of the code. >> - Removed minidump specific code from remoteproc to minidump smem based driver. >> - Enabled the all the driver as modules. >> - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad] >> - Address comments made qcom_pstore_minidump driver and given its Device tree >> same set of properties as ramoops. [Luca/Kees] >> - Added patch for MAINTAINER file. >> - Include defconfig change as one patch as per [Krzysztof] suggestion. >> - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion. >> - Addressed comments made on dload mode patch v6 version >> https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ >> >> Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/ >> - Addressed most of the comments by Srini on v2 and refactored the minidump driver. >> - Added platform device support >> - Unregister region support. >> - Added update region for clients. >> - Added pending region support. >> - Modified the documentation guide accordingly. >> - Added qcom_pstore_ramdump client driver which happen to add ramoops platform >> device and also registers ramoops region with minidump. >> - Added download mode patch series with this minidump series. >> https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ >> >> Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/ >> - Addressed review comment made by [quic_tsoni/bmasney] to add documentation. >> - Addressed comments made by [srinivas.kandagatla] >> - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore >> region in minidump. >> - Fixed issue reported by kernel test robot. >> >> Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/ >> >> Testing of the patches has been done on sm8450 target after enabling config like >> CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up. >> >> echo mini > /sys/module/qcom_scm/parameters/download_mode >> >> Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and >> and put the device in download mode) on command prompt. >> >> Default storage type is set to via USB, so minidump would be downloaded with the >> help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has >> backed minidump boot firmware support. >> >> This will make the device go to download mode and collect the minidump on to the >> attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm >> package manager kit). >> >> After that we will see a bunch of predefined registered region as binary blobs files >> starts with md_* downloaded on the x86 machine on given location in PCAT tool from >> the target device, more about this can be found in qualcomm minidump guide patch. >> > > I tried to apply this series on top of 535a265d7f0dd50 (as suggested by > `b4 am -l -g`), but it conflicts on patch [04/17]. Please specify the > exact base commit or another series for which this series is based on. Apologies ! I just realized, it was 6.5-rc7, but let me rebase version of the series; Sorry, for all the reviewed done so far, i will definitely take care of them or reply. -Mukesh > > Thanks. >
On 09/09/2023 22:16, Mukesh Ojha wrote: > Minidump is a best effort mechanism to collect useful and predefined > data for first level of debugging on end user devices running on > Qualcomm SoCs. It is built on the premise that System on Chip (SoC) > or subsystem part of SoC crashes, due to a range of hardware and > software bugs. Hence, the ability to collect accurate data is only > a best-effort. The data collected could be invalid or corrupted, > data collection itself could fail, and so on. ... > +static int qcom_apss_md_table_init(struct minidump *md, > + struct minidump_subsystem *mdss_toc) > +{ > + struct minidump_ss_data *mdss_data; > + > + mdss_data = devm_kzalloc(md->dev, sizeof(*mdss_data), GFP_KERNEL); > + if (!mdss_data) > + return -ENOMEM; > + > + mdss_data->md_ss_toc = mdss_toc; > + mdss_data->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES, > + sizeof(struct minidump_region), > + GFP_KERNEL); > + if (!mdss_data->md_regions) > + return -ENOMEM; > + > + mdss_toc = mdss_data->md_ss_toc; > + mdss_toc->regions_baseptr = cpu_to_le64(virt_to_phys(mdss_data->md_regions)); > + mdss_toc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED); > + mdss_toc->status = cpu_to_le32(1); > + mdss_toc->region_count = cpu_to_le32(0); > + > + /* Tell bootloader not to encrypt the regions of this subsystem */ > + mdss_toc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE); > + mdss_toc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ); > + > + md->apss_data = mdss_data; > + > + return 0; > +} > + > +static int qcom_apss_minidump_probe(struct platform_device *pdev) > +{ > + struct minidump_global_toc *mdgtoc; > + struct minidump *md; > + size_t size; > + int ret; > + > + md = devm_kzalloc(&pdev->dev, sizeof(struct minidump), GFP_KERNEL); sizeof(*) Didn't you get such comments already? > + if (!md) > + return -ENOMEM; > + > + md->dev = &pdev->dev; > + mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size); > + if (IS_ERR(mdgtoc)) { > + ret = PTR_ERR(mdgtoc); > + dev_err(md->dev, "Couldn't find minidump smem item: %d\n", ret); > + return ret; The syntax is: return dev_err_probe > + } > + > + if (size < sizeof(*mdgtoc) || !mdgtoc->status) { > + dev_err(md->dev, "minidump table is not initialized: %d\n", ret); ret is uninitialized here. Please use automated tools for checking your code: coccinelle, smatch and sparse > + return -EINVAL; > + } > + > + mutex_init(&md->md_lock); > + ret = qcom_apss_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_APSS_DESC]); > + if (ret) { > + dev_err(md->dev, "apss minidump initialization failed: %d\n", ret); > + return ret; > + } > + > + /* First entry would be ELF header */ > + ret = qcom_md_add_elfheader(md); > + if (ret) { > + dev_err(md->dev, "Failed to add elf header: %d\n", ret); > + memset(md->apss_data->md_ss_toc, 0, sizeof(struct minidump_subsystem)); Why do you need it? > + return ret; > + } > + > + platform_set_drvdata(pdev, md); > + > + return ret; > +} > + > +static int qcom_apss_minidump_remove(struct platform_device *pdev) > +{ > + struct minidump *md = platform_get_drvdata(pdev); > + struct minidump_ss_data *mdss_data; > + > + mdss_data = md->apss_data; > + memset(mdss_data->md_ss_toc, cpu_to_le32(0), sizeof(struct minidump_subsystem)); Why do you need it? > + md = NULL; That's useless assignment. > + > + return 0; > +} > + > +static struct platform_driver qcom_minidump_driver = { > + .probe = qcom_apss_minidump_probe, > + .remove = qcom_apss_minidump_remove, > + .driver = { > + .name = "qcom-minidump-smem", > + }, > +}; > + > +module_platform_driver(qcom_minidump_driver); > + > +MODULE_DESCRIPTION("Qualcomm APSS minidump driver"); > +MODULE_LICENSE("GPL"); > +MODULE_ALIAS("platform:qcom-minidump-smem"); Add a proper ID table instead of re-inventing it with module aliases. Best regards, Krzysztof
On 09/09/2023 22:16, Mukesh Ojha wrote: > +/** > + * qcom_minidump_region_register() - Register region in APSS Minidump table. > + * @region: minidump region. > + * > + * Return: On success, it returns 0 and negative error value on failure. > + */ > +int qcom_minidump_region_register(const struct qcom_minidump_region *region) > +{ > + struct minidump *md; > + int ret; > + > + md = qcom_smem_minidump_ready(); > + if (!md) > + return -EPROBE_DEFER; > + > + if (!qcom_minidump_valid_region(region)) > + return -EINVAL; > + > + mutex_lock(&md->md_lock); > + ret = qcom_md_region_register(md, region); > + if (ret) > + goto unlock; > + > + qcom_md_update_elfheader(md, region); > +unlock: > + mutex_unlock(&md->md_lock); > + return ret; > +} > +EXPORT_SYMBOL_GPL(qcom_minidump_region_register); NAK, there is no user for this. Drop all exports from minidump drivers. Your upstream driver *must not* expose stuff to your vendor drivers. Best regards, Krzysztof
On 09/09/2023 22:16, Mukesh Ojha wrote: > static int qcom_apss_minidump_probe(struct platform_device *pdev) > { > struct minidump_global_toc *mdgtoc; > @@ -571,7 +688,10 @@ static int qcom_apss_minidump_probe(struct platform_device *pdev) > return ret; > } > > + mutex_lock(&md_plist.plock); > platform_set_drvdata(pdev, md); Why this is locked? > + qcom_apss_register_pending_regions(md); Why this one is locked? It seems ordering of your operations is not correct if you need to lock the providers probe(). > + mutex_unlock(&md_plist.plock); Best regards, Krzysztof
On 11/09/2023 17:39, Mukesh Ojha wrote: > > > On 9/11/2023 2:22 PM, Bagas Sanjaya wrote: >> On Sun, Sep 10, 2023 at 01:46:01AM +0530, Mukesh Ojha wrote: >>> Hi All, >>> >>> This is to continuation from the conversation happened at v4 >>> >>> https://lore.kernel.org/lkml/632c5b97-4a91-c3e8-1e6c-33d6c4f6454f@quicinc.com/ >>> >>> https://lore.kernel.org/lkml/695133e6-105f-de2a-5559-555cea0a0462@quicinc.com/ >>> >>> We have put abstract on LPC on this topic as well as initiated a mail thread >>> with other SoC vendors but did not get much traction on it. >>> >>> https://lore.kernel.org/lkml/0199db00-1b1d-0c63-58ff-03efae02cb21@quicinc.com/ >>> >>> We explored most of possiblity present in kernel to address this issue[1] but >>> solution like kdump/fadump does not seems safe/secure/performant from our >>> perspective. >>> >>> Hence, with this series we tried to make the minidump kernel driver, simple >>> and tied with pstore frontends, so that it collects the present available >>> frontends data like dmesg, ftrace, pmsg, ftrace., Also, we will be working >>> towards enhancing generic pstore to capture more debug data which will be >>> helpful for first hand of debugging that can benefit both other pstore users >>> as well as us as minidump users. >>> >>> One of the proposal made here, >>> https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/ >>> >>> Looking forward for your comments. >>> >>> Thanks, >>> Mukesh >>> >>> [1] >>> Minidump is a best effort mechanism to collect useful and predefined data >>> for first level of debugging on end user devices running on Qualcomm SoCs. >>> It is built on the premise that System on Chip (SoC) or subsystem part of >>> SoC crashes, due to a range of hardware and software bugs. Hence, the >>> ability to collect accurate data is only a best-effort. The data collected >>> could be invalid or corrupted, data collection itself could fail, and so on. >>> >>> Qualcomm devices in engineering mode provides a mechanism for generating >>> full system ramdumps for post mortem debugging. But in some cases it's >>> however not feasible to capture the entire content of RAM. The minidump >>> mechanism provides the means for selecting which snippets should be >>> included in the ramdump. >>> >>> The core of SMEM based minidump feature is part of Qualcomm's boot >>> firmware code. It initializes shared memory (SMEM), which is a part of >>> DDR and allocates a small section of SMEM to minidump table i.e also >>> called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) >>> has their own table of segments to be included in the minidump and all >>> get their reference from G-ToC. Each segment/region has some details >>> like name, physical address and it's size etc. and it could be anywhere >>> scattered in the DDR. >>> >>> Existing upstream Qualcomm remoteproc driver[1] already supports SMEM >>> based minidump feature for remoteproc instances like ADSP, MODEM, ... >>> where predefined selective segments of subsystem region can be dumped >>> as part of coredump collection which generates smaller size artifacts >>> compared to complete coredump of subsystem on crash. >>> >>> [1] >>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142 >>> >>> In addition to managing and querying the APSS minidump description, >>> the Linux driver maintains a ELF header in a segment. This segment >>> gets updated with section/program header whenever a new entry gets >>> registered. >>> >>> Changes in v5: >>> - On suggestion from Pavan.k, to have single function call for minidump collection >>> from remoteproc driver, separated the logic to have separate minidump file called >>> qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to >>> qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion >>> during region unregister in this series, will pursue it in next series. >>> >>> - To simplify the minidump driver, removed the complication for frontend and different >>> backend from Greg suggestion, will pursue this once main driver gets mainlined. >>> >>> - Move the dynamic ramoops region allocation from Device tree approach to command line >>> approch with the introduction command line parsing and memblock reservation during >>> early boot up; Not added documentation about it yet, will add if it gets positive >>> response. >>> >>> - Exporting linux banner from kernel to make minidump build also as module, however, >>> minidump is a debug module and should be kernel built to get most debug information >>> from kernel. >>> >>> - Tried to address comments given on dload patch series. >>> >>> Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/ >>> - Redesigned the driver and divided the driver into front end and backend (smem) so >>> that any new backend can be attached easily to avoid code duplication. >>> - Patch reordering as per the driver and subsystem to easier review of the code. >>> - Removed minidump specific code from remoteproc to minidump smem based driver. >>> - Enabled the all the driver as modules. >>> - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad] >>> - Address comments made qcom_pstore_minidump driver and given its Device tree >>> same set of properties as ramoops. [Luca/Kees] >>> - Added patch for MAINTAINER file. >>> - Include defconfig change as one patch as per [Krzysztof] suggestion. >>> - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion. >>> - Addressed comments made on dload mode patch v6 version >>> https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ >>> >>> Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/ >>> - Addressed most of the comments by Srini on v2 and refactored the minidump driver. >>> - Added platform device support >>> - Unregister region support. >>> - Added update region for clients. >>> - Added pending region support. >>> - Modified the documentation guide accordingly. >>> - Added qcom_pstore_ramdump client driver which happen to add ramoops platform >>> device and also registers ramoops region with minidump. >>> - Added download mode patch series with this minidump series. >>> https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ >>> >>> Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/ >>> - Addressed review comment made by [quic_tsoni/bmasney] to add documentation. >>> - Addressed comments made by [srinivas.kandagatla] >>> - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore >>> region in minidump. >>> - Fixed issue reported by kernel test robot. >>> >>> Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/ >>> >>> Testing of the patches has been done on sm8450 target after enabling config like >>> CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up. >>> >>> echo mini > /sys/module/qcom_scm/parameters/download_mode >>> >>> Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and >>> and put the device in download mode) on command prompt. >>> >>> Default storage type is set to via USB, so minidump would be downloaded with the >>> help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has >>> backed minidump boot firmware support. >>> >>> This will make the device go to download mode and collect the minidump on to the >>> attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm >>> package manager kit). >>> >>> After that we will see a bunch of predefined registered region as binary blobs files >>> starts with md_* downloaded on the x86 machine on given location in PCAT tool from >>> the target device, more about this can be found in qualcomm minidump guide patch. >>> >> >> I tried to apply this series on top of 535a265d7f0dd50 (as suggested by >> `b4 am -l -g`), but it conflicts on patch [04/17]. Please specify the >> exact base commit or another series for which this series is based on. > > Apologies ! > I just realized, it was 6.5-rc7, but let me rebase version of the series; > > Sorry, for all the reviewed done so far, i will definitely take care of them or reply. > OK, see you in v6!
On 9/9/2023 1:16 PM, Mukesh Ojha wrote: > +/** > + * struct minidump_pregion - Minidump pending region > + * @list : Pending region list pointer > + * @region : APSS minidump client region does kernel-doc parse this correctly? should not be whitespace between @ID and ":" refer to <https://static.lwn.net/kerneldoc/doc-guide/kernel-doc.html#structure-union-and-enumeration-documentation>
On 9/11/2023 4:07 AM, Krzysztof Kozlowski wrote: > On 09/09/2023 22:16, Mukesh Ojha wrote: >> +/** >> + * qcom_minidump_region_register() - Register region in APSS Minidump table. >> + * @region: minidump region. >> + * >> + * Return: On success, it returns 0 and negative error value on failure. >> + */ >> +int qcom_minidump_region_register(const struct qcom_minidump_region *region) >> +{ >> + struct minidump *md; >> + int ret; >> + >> + md = qcom_smem_minidump_ready(); >> + if (!md) >> + return -EPROBE_DEFER; >> + >> + if (!qcom_minidump_valid_region(region)) >> + return -EINVAL; >> + >> + mutex_lock(&md->md_lock); >> + ret = qcom_md_region_register(md, region); >> + if (ret) >> + goto unlock; >> + >> + qcom_md_update_elfheader(md, region); >> +unlock: >> + mutex_unlock(&md->md_lock); >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(qcom_minidump_region_register); > > NAK, there is no user for this. > > Drop all exports from minidump drivers. Your upstream driver *must not* > expose stuff to your vendor drivers. Do we not expect that upstream drivers would want to register? Mind you, in the downstream code the following was a bad limitation: #define MAX_NUM_OF_SS 10
On 11.09.2023 21:09, Jeff Johnson wrote: > On 9/11/2023 4:07 AM, Krzysztof Kozlowski wrote: >> On 09/09/2023 22:16, Mukesh Ojha wrote: >>> +/** >>> + * qcom_minidump_region_register() - Register region in APSS Minidump table. >>> + * @region: minidump region. >>> + * >>> + * Return: On success, it returns 0 and negative error value on failure. >>> + */ >>> +int qcom_minidump_region_register(const struct qcom_minidump_region *region) >>> +{ >>> + struct minidump *md; >>> + int ret; >>> + >>> + md = qcom_smem_minidump_ready(); >>> + if (!md) >>> + return -EPROBE_DEFER; >>> + >>> + if (!qcom_minidump_valid_region(region)) >>> + return -EINVAL; >>> + >>> + mutex_lock(&md->md_lock); >>> + ret = qcom_md_region_register(md, region); >>> + if (ret) >>> + goto unlock; >>> + >>> + qcom_md_update_elfheader(md, region); >>> +unlock: >>> + mutex_unlock(&md->md_lock); >>> + return ret; >>> +} >>> +EXPORT_SYMBOL_GPL(qcom_minidump_region_register); >> >> NAK, there is no user for this. >> >> Drop all exports from minidump drivers. Your upstream driver *must not* >> expose stuff to your vendor drivers. > > Do we not expect that upstream drivers would want to register? > Mind you, in the downstream code the following was a bad limitation: > #define MAX_NUM_OF_SS 10 No, Krzysztof meant that you are not allowed to export symbols without immediately providing a user for them - meaning if the functions are not going to be used upstream, this change will not be accepted. Konrad
Thanks for your time in reviewing this. On 9/11/2023 4:31 PM, Krzysztof Kozlowski wrote: > On 09/09/2023 22:16, Mukesh Ojha wrote: >> Minidump is a best effort mechanism to collect useful and predefined >> data for first level of debugging on end user devices running on >> Qualcomm SoCs. It is built on the premise that System on Chip (SoC) >> or subsystem part of SoC crashes, due to a range of hardware and >> software bugs. Hence, the ability to collect accurate data is only >> a best-effort. The data collected could be invalid or corrupted, >> data collection itself could fail, and so on. > > ... > >> +static int qcom_apss_md_table_init(struct minidump *md, >> + struct minidump_subsystem *mdss_toc) >> +{ >> + struct minidump_ss_data *mdss_data; >> + >> + mdss_data = devm_kzalloc(md->dev, sizeof(*mdss_data), GFP_KERNEL); >> + if (!mdss_data) >> + return -ENOMEM; >> + >> + mdss_data->md_ss_toc = mdss_toc; >> + mdss_data->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES, >> + sizeof(struct minidump_region), >> + GFP_KERNEL); >> + if (!mdss_data->md_regions) >> + return -ENOMEM; >> + >> + mdss_toc = mdss_data->md_ss_toc; >> + mdss_toc->regions_baseptr = cpu_to_le64(virt_to_phys(mdss_data->md_regions)); >> + mdss_toc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED); >> + mdss_toc->status = cpu_to_le32(1); >> + mdss_toc->region_count = cpu_to_le32(0); >> + >> + /* Tell bootloader not to encrypt the regions of this subsystem */ >> + mdss_toc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE); >> + mdss_toc->encryption_required = cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ); >> + >> + md->apss_data = mdss_data; >> + >> + return 0; >> +} >> + >> +static int qcom_apss_minidump_probe(struct platform_device *pdev) >> +{ >> + struct minidump_global_toc *mdgtoc; >> + struct minidump *md; >> + size_t size; >> + int ret; >> + >> + md = devm_kzalloc(&pdev->dev, sizeof(struct minidump), GFP_KERNEL); > > sizeof(*) > > Didn't you get such comments already? Ok, will fix this, no i have not got such comments as of yet. Any reason of using this way? > > >> + if (!md) >> + return -ENOMEM; >> + >> + md->dev = &pdev->dev; >> + mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, &size); >> + if (IS_ERR(mdgtoc)) { >> + ret = PTR_ERR(mdgtoc); >> + dev_err(md->dev, "Couldn't find minidump smem item: %d\n", ret); >> + return ret; > > The syntax is: > return dev_err_probe ACK. > >> + } >> + >> + if (size < sizeof(*mdgtoc) || !mdgtoc->status) { >> + dev_err(md->dev, "minidump table is not initialized: %d\n", ret); > > ret is uninitialized here. Please use automated tools for checking your > code: > coccinelle, smatch and sparse Thanks. > >> + return -EINVAL; >> + } >> + >> + mutex_init(&md->md_lock); >> + ret = qcom_apss_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_APSS_DESC]); >> + if (ret) { >> + dev_err(md->dev, "apss minidump initialization failed: %d\n", ret); >> + return ret; >> + } >> + >> + /* First entry would be ELF header */ >> + ret = qcom_md_add_elfheader(md); >> + if (ret) { >> + dev_err(md->dev, "Failed to add elf header: %d\n", ret); >> + memset(md->apss_data->md_ss_toc, 0, sizeof(struct minidump_subsystem)); > > Why do you need it? Earlier, i got comment about clearing the SS TOC(subsystem table of content) which is shared with other SS and it will have stale values. > >> + return ret; >> + } >> + >> + platform_set_drvdata(pdev, md); >> + >> + return ret; >> +} >> + >> +static int qcom_apss_minidump_remove(struct platform_device *pdev) >> +{ >> + struct minidump *md = platform_get_drvdata(pdev); >> + struct minidump_ss_data *mdss_data; >> + >> + mdss_data = md->apss_data; >> + memset(mdss_data->md_ss_toc, cpu_to_le32(0), sizeof(struct minidump_subsystem)); > > Why do you need it? Same as above. > >> + md = NULL; > > That's useless assignment. Ok. > >> + >> + return 0; >> +} >> + >> +static struct platform_driver qcom_minidump_driver = { >> + .probe = qcom_apss_minidump_probe, >> + .remove = qcom_apss_minidump_remove, >> + .driver = { >> + .name = "qcom-minidump-smem", >> + }, >> +}; >> + >> +module_platform_driver(qcom_minidump_driver); >> + >> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver"); >> +MODULE_LICENSE("GPL"); >> +MODULE_ALIAS("platform:qcom-minidump-smem"); > > Add a proper ID table instead of re-inventing it with module aliases. Ok. -Mukesh > > Best regards, > Krzysztof >
On 12/09/2023 11:26, Mukesh Ojha wrote: >> >>> + return -EINVAL; >>> + } >>> + >>> + mutex_init(&md->md_lock); >>> + ret = qcom_apss_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_APSS_DESC]); >>> + if (ret) { >>> + dev_err(md->dev, "apss minidump initialization failed: %d\n", ret); >>> + return ret; >>> + } >>> + >>> + /* First entry would be ELF header */ >>> + ret = qcom_md_add_elfheader(md); >>> + if (ret) { >>> + dev_err(md->dev, "Failed to add elf header: %d\n", ret); >>> + memset(md->apss_data->md_ss_toc, 0, sizeof(struct minidump_subsystem)); >> >> Why do you need it? > > Earlier, i got comment about clearing the SS TOC(subsystem table of > content) which is shared with other SS and it will have stale values. OK, but then the entire code is poorly readable. First, any cleanup of qcom_apss_md_table_init() should be named similarly, e.g. qcom_apss_md_table_clean() or qcom_apss_md_table_exit() or whatever seems feasible. Second, shouldn't writing to shared memory be the last step? Step which cannot fail and there is no cleanup afterwards (like platform_set_drvdata)? I don't enjoy looking at this interface... Best regards, Krzysztof
On 9/12/2023 2:56 PM, Mukesh Ojha wrote: > Thanks for your time in reviewing this. > > On 9/11/2023 4:31 PM, Krzysztof Kozlowski wrote: >> On 09/09/2023 22:16, Mukesh Ojha wrote: >>> Minidump is a best effort mechanism to collect useful and predefined >>> data for first level of debugging on end user devices running on >>> Qualcomm SoCs. It is built on the premise that System on Chip (SoC) >>> or subsystem part of SoC crashes, due to a range of hardware and >>> software bugs. Hence, the ability to collect accurate data is only >>> a best-effort. The data collected could be invalid or corrupted, >>> data collection itself could fail, and so on. >> >> ... >> >>> +static int qcom_apss_md_table_init(struct minidump *md, >>> + struct minidump_subsystem *mdss_toc) >>> +{ >>> + struct minidump_ss_data *mdss_data; >>> + >>> + mdss_data = devm_kzalloc(md->dev, sizeof(*mdss_data), GFP_KERNEL); >>> + if (!mdss_data) >>> + return -ENOMEM; >>> + >>> + mdss_data->md_ss_toc = mdss_toc; >>> + mdss_data->md_regions = devm_kcalloc(md->dev, MAX_NUM_ENTRIES, >>> + sizeof(struct minidump_region), >>> + GFP_KERNEL); >>> + if (!mdss_data->md_regions) >>> + return -ENOMEM; >>> + >>> + mdss_toc = mdss_data->md_ss_toc; >>> + mdss_toc->regions_baseptr = >>> cpu_to_le64(virt_to_phys(mdss_data->md_regions)); >>> + mdss_toc->enabled = cpu_to_le32(MINIDUMP_SS_ENABLED); >>> + mdss_toc->status = cpu_to_le32(1); >>> + mdss_toc->region_count = cpu_to_le32(0); >>> + >>> + /* Tell bootloader not to encrypt the regions of this subsystem */ >>> + mdss_toc->encryption_status = cpu_to_le32(MINIDUMP_SS_ENCR_DONE); >>> + mdss_toc->encryption_required = >>> cpu_to_le32(MINIDUMP_SS_ENCR_NOTREQ); >>> + >>> + md->apss_data = mdss_data; >>> + >>> + return 0; >>> +} >>> + >>> +static int qcom_apss_minidump_probe(struct platform_device *pdev) >>> +{ >>> + struct minidump_global_toc *mdgtoc; >>> + struct minidump *md; >>> + size_t size; >>> + int ret; >>> + >>> + md = devm_kzalloc(&pdev->dev, sizeof(struct minidump), GFP_KERNEL); >> >> sizeof(*) >> >> Didn't you get such comments already? > > Ok, will fix this, no i have not got such comments as of yet. > Any reason of using this way? Got the reason, thanks. -Mukesh > >> >> >>> + if (!md) >>> + return -ENOMEM; >>> + >>> + md->dev = &pdev->dev; >>> + mdgtoc = qcom_smem_get(QCOM_SMEM_HOST_ANY, SBL_MINIDUMP_SMEM_ID, >>> &size); >>> + if (IS_ERR(mdgtoc)) { >>> + ret = PTR_ERR(mdgtoc); >>> + dev_err(md->dev, "Couldn't find minidump smem item: %d\n", >>> ret); >>> + return ret; >> >> The syntax is: >> return dev_err_probe > > ACK. > >> >>> + } >>> + >>> + if (size < sizeof(*mdgtoc) || !mdgtoc->status) { >>> + dev_err(md->dev, "minidump table is not initialized: %d\n", >>> ret); >> >> ret is uninitialized here. Please use automated tools for checking your >> code: >> coccinelle, smatch and sparse > > Thanks. > >> >>> + return -EINVAL; >>> + } >>> + >>> + mutex_init(&md->md_lock); >>> + ret = qcom_apss_md_table_init(md, >>> &mdgtoc->subsystems[MINIDUMP_APSS_DESC]); >>> + if (ret) { >>> + dev_err(md->dev, "apss minidump initialization failed: >>> %d\n", ret); >>> + return ret; >>> + } >>> + >>> + /* First entry would be ELF header */ >>> + ret = qcom_md_add_elfheader(md); >>> + if (ret) { >>> + dev_err(md->dev, "Failed to add elf header: %d\n", ret); >>> + memset(md->apss_data->md_ss_toc, 0, sizeof(struct >>> minidump_subsystem)); >> >> Why do you need it? > > Earlier, i got comment about clearing the SS TOC(subsystem table of > content) which is shared with other SS and it will have stale values. > >> >>> + return ret; >>> + } >>> + >>> + platform_set_drvdata(pdev, md); >>> + >>> + return ret; >>> +} >>> + >>> +static int qcom_apss_minidump_remove(struct platform_device *pdev) >>> +{ >>> + struct minidump *md = platform_get_drvdata(pdev); >>> + struct minidump_ss_data *mdss_data; >>> + >>> + mdss_data = md->apss_data; >>> + memset(mdss_data->md_ss_toc, cpu_to_le32(0), sizeof(struct >>> minidump_subsystem)); >> >> Why do you need it? > > Same as above. > >> >>> + md = NULL; >> >> That's useless assignment. > > Ok. > >> >>> + >>> + return 0; >>> +} >>> + >>> +static struct platform_driver qcom_minidump_driver = { >>> + .probe = qcom_apss_minidump_probe, >>> + .remove = qcom_apss_minidump_remove, >>> + .driver = { >>> + .name = "qcom-minidump-smem", >>> + }, >>> +}; >>> + >>> +module_platform_driver(qcom_minidump_driver); >>> + >>> +MODULE_DESCRIPTION("Qualcomm APSS minidump driver"); >>> +MODULE_LICENSE("GPL"); >>> +MODULE_ALIAS("platform:qcom-minidump-smem"); >> >> Add a proper ID table instead of re-inventing it with module aliases. > > Ok. > > -Mukesh > >> >> Best regards, >> Krzysztof >>
On 9/12/2023 3:24 PM, Krzysztof Kozlowski wrote: > On 12/09/2023 11:26, Mukesh Ojha wrote: >>> >>>> + return -EINVAL; >>>> + } >>>> + >>>> + mutex_init(&md->md_lock); >>>> + ret = qcom_apss_md_table_init(md, &mdgtoc->subsystems[MINIDUMP_APSS_DESC]); >>>> + if (ret) { >>>> + dev_err(md->dev, "apss minidump initialization failed: %d\n", ret); >>>> + return ret; >>>> + } >>>> + >>>> + /* First entry would be ELF header */ >>>> + ret = qcom_md_add_elfheader(md); >>>> + if (ret) { >>>> + dev_err(md->dev, "Failed to add elf header: %d\n", ret); >>>> + memset(md->apss_data->md_ss_toc, 0, sizeof(struct minidump_subsystem)); >>> >>> Why do you need it? >> >> Earlier, i got comment about clearing the SS TOC(subsystem table of >> content) which is shared with other SS and it will have stale values. > > OK, but then the entire code is poorly readable. First, any cleanup of > qcom_apss_md_table_init() should be named similarly, e.g. > qcom_apss_md_table_clean() or qcom_apss_md_table_exit() or whatever > seems feasible. ACK on this. > > Second, shouldn't writing to shared memory be the last step? Step which > cannot fail and there is no cleanup afterwards (like > platform_set_drvdata)? I don't enjoy looking at this interface... It can be done, if i shift adding elf header as first thing to first caller of qcom_minidump_region_register() but then i would have to remove qcom_ramoops_minidump() from this probe in 11/17 patch. -Mukesh > > > > Best regards, > Krzysztof >
On 9/12/2023 12:39 AM, Jeff Johnson wrote: > On 9/11/2023 4:07 AM, Krzysztof Kozlowski wrote: >> On 09/09/2023 22:16, Mukesh Ojha wrote: >>> +/** >>> + * qcom_minidump_region_register() - Register region in APSS >>> Minidump table. >>> + * @region: minidump region. >>> + * >>> + * Return: On success, it returns 0 and negative error value on >>> failure. >>> + */ >>> +int qcom_minidump_region_register(const struct qcom_minidump_region >>> *region) >>> +{ >>> + struct minidump *md; >>> + int ret; >>> + >>> + md = qcom_smem_minidump_ready(); >>> + if (!md) >>> + return -EPROBE_DEFER; >>> + >>> + if (!qcom_minidump_valid_region(region)) >>> + return -EINVAL; >>> + >>> + mutex_lock(&md->md_lock); >>> + ret = qcom_md_region_register(md, region); >>> + if (ret) >>> + goto unlock; >>> + >>> + qcom_md_update_elfheader(md, region); >>> +unlock: >>> + mutex_unlock(&md->md_lock); >>> + return ret; >>> +} >>> +EXPORT_SYMBOL_GPL(qcom_minidump_region_register); >> >> NAK, there is no user for this. >> >> Drop all exports from minidump drivers. Your upstream driver *must not* >> expose stuff to your vendor drivers. > > Do we not expect that upstream drivers would want to register? As of current version of patch, there is no user. Let's avoid till any upstream QCOM driver uses it . > Mind you, in the downstream code the following was a bad limitation: > #define MAX_NUM_OF_SS 10 I don't think, there is any problem with above macro, instead there is restriction on total number of APSS region can be registered. #define MAX_NUM_ENTRIES 201 -Mukesh > >