Message ID | 20210708120656.663851-1-thara.gopinath@linaro.org |
---|---|
Headers | show |
Series | Introduce LMh driver for Qualcomm SoCs | expand |
On Thu 08 Jul 07:06 CDT 2021, Thara Gopinath wrote: > Driver enabling various pieces of Limits Management Hardware(LMh) for cpu > cluster0 and cpu cluster1 namely kick starting monitoring of temperature, > current, battery current violations, enabling reliability algorithm and > setting up various temperature limits. > > The following has been explained in the cover letter. I am including this > here so that this remains in the commit message as well. > > LMh is a hardware infrastructure on some Qualcomm SoCs that can enforce > temperature and current limits as programmed by software for certain IPs > like CPU. On many newer LMh is configured by firmware/TZ and no programming > is needed from the kernel side. But on certain SoCs like sdm845 the > firmware does not do a complete programming of the h/w. On such soc's > kernel software has to explicitly set up the temperature limits and turn on > various monitoring and enforcing algorithms on the hardware. > > Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> > --- > > v2->v3: > - Rearranged enabling of various LMh subfunction and removed returning > on error in enabling any one subfunction as the different pieces can > operate and thus be enabled independently. > - Other minor cosmetic fixes. > > v1->v2: > - Cosmetic and spelling fixes from review comments from Randy Dunlap > - Added irq_disable to lmh_irq_ops and removed disabling of irq from > lmh_handle_irq. Now cpufreq explicitly disables irq prior to > handling it as per Bjorn's suggestion. > - Rebased to new version of qcom_scm_lmh_dcvsh as changed in patch 1. > - Removed generic dt compatible string and introduced platform specific one > as per Bjorn's suggestion. > - Take arm, low and high temp thresholds for LMh from dt properties instead of > #defines in the driver as per Daniel's suggestion. > - Other minor fixes. > drivers/thermal/qcom/Kconfig | 10 ++ > drivers/thermal/qcom/Makefile | 1 + > drivers/thermal/qcom/lmh.c | 239 ++++++++++++++++++++++++++++++++++ > 3 files changed, 250 insertions(+) > create mode 100644 drivers/thermal/qcom/lmh.c > > diff --git a/drivers/thermal/qcom/Kconfig b/drivers/thermal/qcom/Kconfig > index 8d5ac2df26dc..7d942f71e532 100644 > --- a/drivers/thermal/qcom/Kconfig > +++ b/drivers/thermal/qcom/Kconfig > @@ -31,3 +31,13 @@ config QCOM_SPMI_TEMP_ALARM > trip points. The temperature reported by the thermal sensor reflects the > real time die temperature if an ADC is present or an estimate of the > temperature based upon the over temperature stage value. > + > +config QCOM_LMH > + tristate "Qualcomm Limits Management Hardware" > + depends on ARCH_QCOM > + help > + This enables initialization of Qualcomm limits management > + hardware(LMh). LMh allows for hardware-enforced mitigation for cpus based on > + input from temperature and current sensors. On many newer Qualcomm SoCs > + LMh is configured in the firmware and this feature need not be enabled. > + However, on certain SoCs like sdm845 LMh has to be configured from kernel. > diff --git a/drivers/thermal/qcom/Makefile b/drivers/thermal/qcom/Makefile > index 252ea7d9da0b..0fa2512042e7 100644 > --- a/drivers/thermal/qcom/Makefile > +++ b/drivers/thermal/qcom/Makefile > @@ -5,3 +5,4 @@ qcom_tsens-y += tsens.o tsens-v2.o tsens-v1.o tsens-v0_1.o \ > tsens-8960.o > obj-$(CONFIG_QCOM_SPMI_ADC_TM5) += qcom-spmi-adc-tm5.o > obj-$(CONFIG_QCOM_SPMI_TEMP_ALARM) += qcom-spmi-temp-alarm.o > +obj-$(CONFIG_QCOM_LMH) += lmh.o > diff --git a/drivers/thermal/qcom/lmh.c b/drivers/thermal/qcom/lmh.c > new file mode 100644 > index 000000000000..a7b1eb308642 > --- /dev/null > +++ b/drivers/thermal/qcom/lmh.c > @@ -0,0 +1,239 @@ > +// SPDX-License-Identifier: GPL-2.0-only > + > +/* > + * Copyright (C) 2021, Linaro Limited. All rights reserved. > + */ > +#include <linux/module.h> > +#include <linux/interrupt.h> > +#include <linux/irqdomain.h> > +#include <linux/err.h> > +#include <linux/platform_device.h> > +#include <linux/of_platform.h> > +#include <linux/slab.h> > +#include <linux/qcom_scm.h> > + > +#define LMH_NODE_DCVS 0x44435653 > +#define LMH_CLUSTER0_NODE_ID 0x6370302D > +#define LMH_CLUSTER1_NODE_ID 0x6370312D > + > +#define LMH_SUB_FN_THERMAL 0x54484D4C > +#define LMH_SUB_FN_CRNT 0x43524E54 > +#define LMH_SUB_FN_REL 0x52454C00 > +#define LMH_SUB_FN_BCL 0x42434C00 > + > +#define LMH_ALGO_MODE_ENABLE 0x454E424C > +#define LMH_TH_HI_THRESHOLD 0x48494748 > +#define LMH_TH_LOW_THRESHOLD 0x4C4F5700 > +#define LMH_TH_ARM_THRESHOLD 0x41524D00 > + > +#define LMH_REG_DCVS_INTR_CLR 0x8 > + > +struct lmh_hw_data { > + void __iomem *base; > + struct irq_domain *domain; > + int irq; > + u32 cpu_id; cpu_id seems to only be used in lmh_probe(), how about making it a local variable? > +}; > + > +static irqreturn_t lmh_handle_irq(int hw_irq, void *data) > +{ > + struct lmh_hw_data *lmh_data = data; > + int irq = irq_find_mapping(lmh_data->domain, 0); > + > + /* Call the cpufreq driver to handle the interrupt */ > + if (irq) > + generic_handle_irq(irq); > + > + return 0; > +} > + > +static void lmh_enable_interrupt(struct irq_data *d) > +{ > + struct lmh_hw_data *lmh_data = irq_data_get_irq_chip_data(d); > + > + /* Clear the existing interrupt */ > + writel(0xff, lmh_data->base + LMH_REG_DCVS_INTR_CLR); > + enable_irq(lmh_data->irq); > +} > + > +static void lmh_disable_interrupt(struct irq_data *d) > +{ > + struct lmh_hw_data *lmh_data = irq_data_get_irq_chip_data(d); > + > + disable_irq_nosync(lmh_data->irq); > +} > + > +static struct irq_chip lmh_irq_chip = { > + .name = "lmh", > + .irq_enable = lmh_enable_interrupt, > + .irq_disable = lmh_disable_interrupt > +}; > + > +static int lmh_irq_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hw) > +{ > + struct lmh_hw_data *lmh_data = d->host_data; > + > + irq_set_chip_and_handler(irq, &lmh_irq_chip, handle_simple_irq); > + irq_set_chip_data(irq, lmh_data); > + > + return 0; > +} > + > +static const struct irq_domain_ops lmh_irq_ops = { > + .map = lmh_irq_map, > + .xlate = irq_domain_xlate_onecell, > +}; > + > +static int lmh_probe(struct platform_device *pdev) > +{ > + struct device *dev; > + struct device_node *np; > + struct lmh_hw_data *lmh_data; > + u32 node_id; > + int temp_low, temp_high, temp_arm, ret; > + > + dev = &pdev->dev; > + np = dev->of_node; How about initialize these as you declare you variables? > + if (!np) There's no reasonable way to probe this driver with !dev->of_node, so you can skip this check. > + return -EINVAL; > + > + lmh_data = devm_kzalloc(dev, sizeof(*lmh_data), GFP_KERNEL); > + if (!lmh_data) > + return -ENOMEM; > + > + lmh_data->base = devm_platform_ioremap_resource(pdev, 0); > + if (IS_ERR(lmh_data->base)) > + return PTR_ERR(lmh_data->base); > + > + ret = of_property_read_u32(np, "qcom,lmh-cpu-id", &lmh_data->cpu_id); > + if (ret) { > + dev_err(dev, "missing qcom,lmh-cpu-id property\n"); > + return ret; > + } > + > + ret = of_property_read_u32(np, "qcom,lmh-temperature-high", &temp_high); > + if (ret) { > + dev_err(dev, "missing qcom,lmh-temperature-high property\n"); > + return ret; > + } > + > + ret = of_property_read_u32(np, "qcom,lmh-temperature-low", &temp_low); > + if (ret) { > + dev_err(dev, "missing qcom,lmh-temperature-low property\n"); > + return ret; > + } > + > + ret = of_property_read_u32(np, "qcom,lmh-temperature-arm", &temp_arm); > + if (ret) { > + dev_err(dev, "missing qcom,lmh-temperature-arm property\n"); > + return ret; > + } > + > + /* > + * Only sdm845 has lmh hardware currently enabled from hlos. If this is needed > + * for other platforms, revisit this to check if the <cpu-id, node-id> should be part > + * of a dt match table. > + */ > + if (lmh_data->cpu_id == 0) { > + node_id = LMH_CLUSTER0_NODE_ID; > + } else if (lmh_data->cpu_id == 4) { > + node_id = LMH_CLUSTER1_NODE_ID; > + } else { > + dev_err(dev, "Wrong CPU id associated with LMh node\n"); > + return -EINVAL; > + } > + > + platform_set_drvdata(pdev, lmh_data); I don't see any get_drvdat(), so you can probably skip this? > + > + if (!qcom_scm_lmh_dcvsh_available()) > + return -EINVAL; > + > + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_CRNT, LMH_ALGO_MODE_ENABLE, 1, > + LMH_NODE_DCVS, node_id, 0); > + if (ret) > + dev_err(dev, "Error %d enabling current subfunction\n", ret); > + > + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_REL, LMH_ALGO_MODE_ENABLE, 1, > + LMH_NODE_DCVS, node_id, 0); > + if (ret) > + dev_err(dev, "Error %d enabling reliability subfunction\n", ret); > + > + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_BCL, LMH_ALGO_MODE_ENABLE, 1, > + LMH_NODE_DCVS, node_id, 0); > + if (ret) > + dev_err(dev, "Error %d enabling BCL subfunction\n", ret); > + > + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_THERMAL, LMH_ALGO_MODE_ENABLE, 1, > + LMH_NODE_DCVS, node_id, 0); > + if (ret) { > + dev_err(dev, "Error %d enabling thermal subfunction\n", ret); > + return ret; > + } > + > + ret = qcom_scm_lmh_profile_change(0x1); > + if (ret) { > + dev_err(dev, "Error %d changing profile\n", ret); > + return ret; > + } > + > + /* Set default thermal trips */ > + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_THERMAL, LMH_TH_ARM_THRESHOLD, temp_arm, > + LMH_NODE_DCVS, node_id, 0); > + if (ret) { > + dev_err(dev, "Error setting thermal ARM threshold%d\n", ret); > + return ret; > + } > + > + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_THERMAL, LMH_TH_HI_THRESHOLD, temp_high, > + LMH_NODE_DCVS, node_id, 0); > + if (ret) { > + dev_err(dev, "Error setting thermal HI threshold%d\n", ret); > + return ret; > + } > + > + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_THERMAL, LMH_TH_LOW_THRESHOLD, temp_low, > + LMH_NODE_DCVS, node_id, 0); > + if (ret) { > + dev_err(dev, "Error setting thermal ARM threshold%d\n", ret); > + return ret; > + } > + > + lmh_data->irq = platform_get_irq(pdev, 0); > + lmh_data->domain = irq_domain_add_linear(np, 1, &lmh_irq_ops, lmh_data); > + if (!lmh_data->domain) { > + dev_err(dev, "Error adding irq_domain\n"); > + return -EINVAL; > + } > + As written now, you might get interrupts before you get to disable_irq() below. Instead of the disable_irq() you can add this before request_irq: irq_set_status_flags(lmh_dat->irq, IRQ_NOAUTOEN); > + ret = devm_request_irq(dev, lmh_data->irq, lmh_handle_irq, > + IRQF_TRIGGER_HIGH | IRQF_ONESHOT | IRQF_NO_SUSPEND, Skip IRQF_TRIGGER_HIGH, as the flags will be merged with the properties from DT. > + "lmh-irq", lmh_data); > + if (ret) { > + dev_err(dev, "Error %d registering irq %x\n", ret, lmh_data->irq); > + irq_domain_remove(lmh_data->domain); > + return ret; > + } > + > + /* Disable the irq and let cpufreq enable it when ready to handle the interrupt */ > + disable_irq(lmh_data->irq); > + > + return 0; > +} > + > +static const struct of_device_id lmh_table[] = { > + { .compatible = "qcom,sdm845-lmh", }, > + {} > +}; > +MODULE_DEVICE_TABLE(of, lmh_table); > + > +static struct platform_driver lmh_driver = { > + .probe = lmh_probe, I think you at least need to irq_domain_remove() during .remove, but unless we have a clear understanding about how to stop the algorithm (without causing harmful side effects) it might be better to add .suppress_bind_attrs = true in .driver... Regards, Bjorn > + .driver = { > + .name = "qcom-lmh", > + .of_match_table = lmh_table, > + }, > +}; > +module_platform_driver(lmh_driver); > + > +MODULE_LICENSE("GPL v2"); > +MODULE_DESCRIPTION("QCOM LMh driver"); > -- > 2.25.1 >
On Thu 08 Jul 07:06 CDT 2021, Thara Gopinath wrote: > Add dt binding documentation to describe Qualcomm > Limits Management Hardware node. > > Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> > --- > .../devicetree/bindings/thermal/qcom-lmh.yaml | 100 ++++++++++++++++++ > 1 file changed, 100 insertions(+) > create mode 100644 Documentation/devicetree/bindings/thermal/qcom-lmh.yaml > > diff --git a/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml b/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml > new file mode 100644 > index 000000000000..7f62bd3d543d > --- /dev/null > +++ b/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml > @@ -0,0 +1,100 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +# Copyright 2021 Linaro Ltd. > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/thermal/qcom-lmh.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Qualcomm Limits Management Hardware(LMh) > + > +maintainers: > + - Thara Gopinath <thara.gopinath@linaro.org> > + > +description: > + Limits Management Hardware(LMh) is a hardware infrastructure on some > + Qualcomm SoCs that can enforce temperature and current limits as > + programmed by software for certain IPs like CPU. > + > +properties: > + compatible: > + enum: > + - qcom,sdm845-lmh > + > + reg: > + items: > + - description: core registers > + > + interrupts: > + maxItems: 1 > + > + '#interrupt-cells': > + const: 1 > + > + interrupt-controller: true > + > + qcom,lmh-cpu-id: > + description: > + CPU id of the first cpu in the LMh cluster > + $ref: /schemas/types.yaml#/definitions/uint32 > + > + qcom,lmh-temperature-arm: > + description: > + An integer expressing temperature threshold in millicelsius at which > + the LMh thermal FSM is engaged. Do we know (by any public source) what "arm", "low" and "high" means beyond that they somehow pokes the state machine? > + $ref: /schemas/types.yaml#/definitions/int32 > + > + qcom,lmh-temperature-low: > + description: > + An integer expressing temperature threshold in millicelsius at which > + the LMh thermal FSM is engaged. > + $ref: /schemas/types.yaml#/definitions/int32 > + > + qcom,lmh-temperature-high: > + description: > + An integer expressing temperature threshold in millicelsius at which > + the LMh thermal FSM is engaged. > + $ref: /schemas/types.yaml#/definitions/int32 > + > +required: > + - compatible > + - reg > + - interrupts > + - #interrupt-cells > + - interrupt-controller > + - qcom,lmh-cpu-id > + - qcom,lmh-temperature-arm > + - qcom,lmh-temperature-low > + - qcom,lmh-temperature-high > + > +additionalProperties: false > + > +examples: > + - | > + #include <dt-bindings/interrupt-controller/arm-gic.h> > + #include <dt-bindings/clock/qcom,rpmh.h> > + #include <dt-bindings/interconnect/qcom,sdm845.h> I don't see why you need qcom,rpmh.h or the interconnect include in this example. > + > + lmh_cluster1: lmh@17d70800 { > + compatible = "qcom,sdm845-lmh"; > + reg = <0 0x17d70800 0 0x401>; #address- and #size-cells are 1 in the wrapper that validates the examples, so drop the two zeros. > + interrupts = <GIC_SPI 33 IRQ_TYPE_LEVEL_HIGH>; > + qcom,lmh-cpu-id = <0x4>; > + qcom,lmh-temperature-arm = <65000>; > + qcom,lmh-temperature-low = <94500>; > + qcom,lmh-temperature-high = <95000>; > + interrupt-controller; > + #interrupt-cells = <1>; > + }; > + - | This is a different example from the one above, if you intended that, don't you need the #include of arm-gic.h here as well? Regards, Bjorn > + lmh_cluster0: lmh@17d78800 { > + compatible = "qcom,sdm845-lmh"; > + reg = <0 0x17d78800 0 0x401>; > + interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>; > + qcom,lmh-cpu-id = <0x0>; > + qcom,lmh-temperature-arm = <65000>; > + qcom,lmh-temperature-low = <94500>; > + qcom,lmh-temperature-high = <95000>; > + interrupt-controller; > + #interrupt-cells = <1>; > + }; > + - | > -- > 2.25.1 >
On Thu, Jul 08, 2021 at 08:06:56AM -0400, Thara Gopinath wrote: > Add dt binding documentation to describe Qualcomm > Limits Management Hardware node. > > Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> > --- > .../devicetree/bindings/thermal/qcom-lmh.yaml | 100 ++++++++++++++++++ > 1 file changed, 100 insertions(+) > create mode 100644 Documentation/devicetree/bindings/thermal/qcom-lmh.yaml > > diff --git a/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml b/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml > new file mode 100644 > index 000000000000..7f62bd3d543d > --- /dev/null > +++ b/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml > @@ -0,0 +1,100 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +# Copyright 2021 Linaro Ltd. > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/thermal/qcom-lmh.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Qualcomm Limits Management Hardware(LMh) > + > +maintainers: > + - Thara Gopinath <thara.gopinath@linaro.org> > + > +description: > + Limits Management Hardware(LMh) is a hardware infrastructure on some > + Qualcomm SoCs that can enforce temperature and current limits as > + programmed by software for certain IPs like CPU. > + > +properties: > + compatible: > + enum: > + - qcom,sdm845-lmh > + > + reg: > + items: > + - description: core registers > + > + interrupts: > + maxItems: 1 > + > + '#interrupt-cells': > + const: 1 > + > + interrupt-controller: true > + > + qcom,lmh-cpu-id: > + description: > + CPU id of the first cpu in the LMh cluster > + $ref: /schemas/types.yaml#/definitions/uint32 The way we reference other nodes in DT is phandles. 'cpus' is already somewhat established for this case. > + > + qcom,lmh-temperature-arm: > + description: > + An integer expressing temperature threshold in millicelsius at which Use unit suffix when you have units. > + the LMh thermal FSM is engaged. > + $ref: /schemas/types.yaml#/definitions/int32 > + > + qcom,lmh-temperature-low: > + description: > + An integer expressing temperature threshold in millicelsius at which > + the LMh thermal FSM is engaged. > + $ref: /schemas/types.yaml#/definitions/int32 > + > + qcom,lmh-temperature-high: > + description: > + An integer expressing temperature threshold in millicelsius at which > + the LMh thermal FSM is engaged. What's the difference in the 3 properties because the description is the same. > + $ref: /schemas/types.yaml#/definitions/int32 > + > +required: > + - compatible > + - reg > + - interrupts > + - #interrupt-cells > + - interrupt-controller > + - qcom,lmh-cpu-id > + - qcom,lmh-temperature-arm > + - qcom,lmh-temperature-low > + - qcom,lmh-temperature-high > + > +additionalProperties: false > + > +examples: > + - | > + #include <dt-bindings/interrupt-controller/arm-gic.h> > + #include <dt-bindings/clock/qcom,rpmh.h> > + #include <dt-bindings/interconnect/qcom,sdm845.h> > + > + lmh_cluster1: lmh@17d70800 { > + compatible = "qcom,sdm845-lmh"; > + reg = <0 0x17d70800 0 0x401>; > + interrupts = <GIC_SPI 33 IRQ_TYPE_LEVEL_HIGH>; > + qcom,lmh-cpu-id = <0x4>; > + qcom,lmh-temperature-arm = <65000>; > + qcom,lmh-temperature-low = <94500>; > + qcom,lmh-temperature-high = <95000>; > + interrupt-controller; What devices is this an interrupt controller for? > + #interrupt-cells = <1>; > + }; > + - | > + lmh_cluster0: lmh@17d78800 { > + compatible = "qcom,sdm845-lmh"; > + reg = <0 0x17d78800 0 0x401>; > + interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>; > + qcom,lmh-cpu-id = <0x0>; > + qcom,lmh-temperature-arm = <65000>; > + qcom,lmh-temperature-low = <94500>; > + qcom,lmh-temperature-high = <95000>; > + interrupt-controller; > + #interrupt-cells = <1>; > + }; > + - | > -- > 2.25.1 > >
Hi Bjorn, Thanks for the review On 7/10/21 12:15 AM, Bjorn Andersson wrote: > On Thu 08 Jul 07:06 CDT 2021, Thara Gopinath wrote: > >> Driver enabling various pieces of Limits Management Hardware(LMh) for cpu >> cluster0 and cpu cluster1 namely kick starting monitoring of temperature, >> current, battery current violations, enabling reliability algorithm and >> setting up various temperature limits. >> >> The following has been explained in the cover letter. I am including this >> here so that this remains in the commit message as well. >> >> LMh is a hardware infrastructure on some Qualcomm SoCs that can enforce >> temperature and current limits as programmed by software for certain IPs >> like CPU. On many newer LMh is configured by firmware/TZ and no programming >> is needed from the kernel side. But on certain SoCs like sdm845 the >> firmware does not do a complete programming of the h/w. On such soc's >> kernel software has to explicitly set up the temperature limits and turn on >> various monitoring and enforcing algorithms on the hardware. >> >> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> >> --- >> >> v2->v3: >> - Rearranged enabling of various LMh subfunction and removed returning >> on error in enabling any one subfunction as the different pieces can >> operate and thus be enabled independently. >> - Other minor cosmetic fixes. >> >> v1->v2: >> - Cosmetic and spelling fixes from review comments from Randy Dunlap >> - Added irq_disable to lmh_irq_ops and removed disabling of irq from >> lmh_handle_irq. Now cpufreq explicitly disables irq prior to >> handling it as per Bjorn's suggestion. >> - Rebased to new version of qcom_scm_lmh_dcvsh as changed in patch 1. >> - Removed generic dt compatible string and introduced platform specific one >> as per Bjorn's suggestion. >> - Take arm, low and high temp thresholds for LMh from dt properties instead of >> #defines in the driver as per Daniel's suggestion. >> - Other minor fixes. >> drivers/thermal/qcom/Kconfig | 10 ++ >> drivers/thermal/qcom/Makefile | 1 + >> drivers/thermal/qcom/lmh.c | 239 ++++++++++++++++++++++++++++++++++ >> 3 files changed, 250 insertions(+) >> create mode 100644 drivers/thermal/qcom/lmh.c >> >> diff --git a/drivers/thermal/qcom/Kconfig b/drivers/thermal/qcom/Kconfig >> index 8d5ac2df26dc..7d942f71e532 100644 >> --- a/drivers/thermal/qcom/Kconfig >> +++ b/drivers/thermal/qcom/Kconfig >> @@ -31,3 +31,13 @@ config QCOM_SPMI_TEMP_ALARM >> trip points. The temperature reported by the thermal sensor reflects the >> real time die temperature if an ADC is present or an estimate of the >> temperature based upon the over temperature stage value. >> + >> +config QCOM_LMH >> + tristate "Qualcomm Limits Management Hardware" >> + depends on ARCH_QCOM >> + help >> + This enables initialization of Qualcomm limits management >> + hardware(LMh). LMh allows for hardware-enforced mitigation for cpus based on >> + input from temperature and current sensors. On many newer Qualcomm SoCs >> + LMh is configured in the firmware and this feature need not be enabled. >> + However, on certain SoCs like sdm845 LMh has to be configured from kernel. >> diff --git a/drivers/thermal/qcom/Makefile b/drivers/thermal/qcom/Makefile >> index 252ea7d9da0b..0fa2512042e7 100644 >> --- a/drivers/thermal/qcom/Makefile >> +++ b/drivers/thermal/qcom/Makefile >> @@ -5,3 +5,4 @@ qcom_tsens-y += tsens.o tsens-v2.o tsens-v1.o tsens-v0_1.o \ >> tsens-8960.o >> obj-$(CONFIG_QCOM_SPMI_ADC_TM5) += qcom-spmi-adc-tm5.o >> obj-$(CONFIG_QCOM_SPMI_TEMP_ALARM) += qcom-spmi-temp-alarm.o >> +obj-$(CONFIG_QCOM_LMH) += lmh.o >> diff --git a/drivers/thermal/qcom/lmh.c b/drivers/thermal/qcom/lmh.c >> new file mode 100644 >> index 000000000000..a7b1eb308642 >> --- /dev/null >> +++ b/drivers/thermal/qcom/lmh.c >> @@ -0,0 +1,239 @@ >> +// SPDX-License-Identifier: GPL-2.0-only >> + >> +/* >> + * Copyright (C) 2021, Linaro Limited. All rights reserved. >> + */ >> +#include <linux/module.h> >> +#include <linux/interrupt.h> >> +#include <linux/irqdomain.h> >> +#include <linux/err.h> >> +#include <linux/platform_device.h> >> +#include <linux/of_platform.h> >> +#include <linux/slab.h> >> +#include <linux/qcom_scm.h> >> + >> +#define LMH_NODE_DCVS 0x44435653 >> +#define LMH_CLUSTER0_NODE_ID 0x6370302D >> +#define LMH_CLUSTER1_NODE_ID 0x6370312D >> + >> +#define LMH_SUB_FN_THERMAL 0x54484D4C >> +#define LMH_SUB_FN_CRNT 0x43524E54 >> +#define LMH_SUB_FN_REL 0x52454C00 >> +#define LMH_SUB_FN_BCL 0x42434C00 >> + >> +#define LMH_ALGO_MODE_ENABLE 0x454E424C >> +#define LMH_TH_HI_THRESHOLD 0x48494748 >> +#define LMH_TH_LOW_THRESHOLD 0x4C4F5700 >> +#define LMH_TH_ARM_THRESHOLD 0x41524D00 >> + >> +#define LMH_REG_DCVS_INTR_CLR 0x8 >> + >> +struct lmh_hw_data { >> + void __iomem *base; >> + struct irq_domain *domain; >> + int irq; >> + u32 cpu_id; > > cpu_id seems to only be used in lmh_probe(), how about making it a local > variable? yes, it makes sense. I will make it local. > >> +}; >> + >> +static irqreturn_t lmh_handle_irq(int hw_irq, void *data) >> +{ >> + struct lmh_hw_data *lmh_data = data; >> + int irq = irq_find_mapping(lmh_data->domain, 0); >> + >> + /* Call the cpufreq driver to handle the interrupt */ >> + if (irq) >> + generic_handle_irq(irq); >> + >> + return 0; >> +} >> + >> +static void lmh_enable_interrupt(struct irq_data *d) >> +{ >> + struct lmh_hw_data *lmh_data = irq_data_get_irq_chip_data(d); >> + >> + /* Clear the existing interrupt */ >> + writel(0xff, lmh_data->base + LMH_REG_DCVS_INTR_CLR); >> + enable_irq(lmh_data->irq); >> +} >> + >> +static void lmh_disable_interrupt(struct irq_data *d) >> +{ >> + struct lmh_hw_data *lmh_data = irq_data_get_irq_chip_data(d); >> + >> + disable_irq_nosync(lmh_data->irq); >> +} >> + >> +static struct irq_chip lmh_irq_chip = { >> + .name = "lmh", >> + .irq_enable = lmh_enable_interrupt, >> + .irq_disable = lmh_disable_interrupt >> +}; >> + >> +static int lmh_irq_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hw) >> +{ >> + struct lmh_hw_data *lmh_data = d->host_data; >> + >> + irq_set_chip_and_handler(irq, &lmh_irq_chip, handle_simple_irq); >> + irq_set_chip_data(irq, lmh_data); >> + >> + return 0; >> +} >> + >> +static const struct irq_domain_ops lmh_irq_ops = { >> + .map = lmh_irq_map, >> + .xlate = irq_domain_xlate_onecell, >> +}; >> + >> +static int lmh_probe(struct platform_device *pdev) >> +{ >> + struct device *dev; >> + struct device_node *np; >> + struct lmh_hw_data *lmh_data; >> + u32 node_id; >> + int temp_low, temp_high, temp_arm, ret; >> + >> + dev = &pdev->dev; >> + np = dev->of_node; > > How about initialize these as you declare you variables? ok. > >> + if (!np) > > There's no reasonable way to probe this driver with !dev->of_node, so > you can skip this check. ok. > >> + return -EINVAL; >> + >> + lmh_data = devm_kzalloc(dev, sizeof(*lmh_data), GFP_KERNEL); >> + if (!lmh_data) >> + return -ENOMEM; >> + >> + lmh_data->base = devm_platform_ioremap_resource(pdev, 0); >> + if (IS_ERR(lmh_data->base)) >> + return PTR_ERR(lmh_data->base); >> + >> + ret = of_property_read_u32(np, "qcom,lmh-cpu-id", &lmh_data->cpu_id); >> + if (ret) { >> + dev_err(dev, "missing qcom,lmh-cpu-id property\n"); >> + return ret; >> + } >> + >> + ret = of_property_read_u32(np, "qcom,lmh-temperature-high", &temp_high); >> + if (ret) { >> + dev_err(dev, "missing qcom,lmh-temperature-high property\n"); >> + return ret; >> + } >> + >> + ret = of_property_read_u32(np, "qcom,lmh-temperature-low", &temp_low); >> + if (ret) { >> + dev_err(dev, "missing qcom,lmh-temperature-low property\n"); >> + return ret; >> + } >> + >> + ret = of_property_read_u32(np, "qcom,lmh-temperature-arm", &temp_arm); >> + if (ret) { >> + dev_err(dev, "missing qcom,lmh-temperature-arm property\n"); >> + return ret; >> + } >> + >> + /* >> + * Only sdm845 has lmh hardware currently enabled from hlos. If this is needed >> + * for other platforms, revisit this to check if the <cpu-id, node-id> should be part >> + * of a dt match table. >> + */ >> + if (lmh_data->cpu_id == 0) { >> + node_id = LMH_CLUSTER0_NODE_ID; >> + } else if (lmh_data->cpu_id == 4) { >> + node_id = LMH_CLUSTER1_NODE_ID; >> + } else { >> + dev_err(dev, "Wrong CPU id associated with LMh node\n"); >> + return -EINVAL; >> + } >> + >> + platform_set_drvdata(pdev, lmh_data); > > I don't see any get_drvdat(), so you can probably skip this? Yes. I will remove it. I think it is stray remaining from one of the earlier revisions. > >> + >> + if (!qcom_scm_lmh_dcvsh_available()) >> + return -EINVAL; >> + >> + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_CRNT, LMH_ALGO_MODE_ENABLE, 1, >> + LMH_NODE_DCVS, node_id, 0); >> + if (ret) >> + dev_err(dev, "Error %d enabling current subfunction\n", ret); >> + >> + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_REL, LMH_ALGO_MODE_ENABLE, 1, >> + LMH_NODE_DCVS, node_id, 0); >> + if (ret) >> + dev_err(dev, "Error %d enabling reliability subfunction\n", ret); >> + >> + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_BCL, LMH_ALGO_MODE_ENABLE, 1, >> + LMH_NODE_DCVS, node_id, 0); >> + if (ret) >> + dev_err(dev, "Error %d enabling BCL subfunction\n", ret); >> + >> + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_THERMAL, LMH_ALGO_MODE_ENABLE, 1, >> + LMH_NODE_DCVS, node_id, 0); >> + if (ret) { >> + dev_err(dev, "Error %d enabling thermal subfunction\n", ret); >> + return ret; >> + } >> + >> + ret = qcom_scm_lmh_profile_change(0x1); >> + if (ret) { >> + dev_err(dev, "Error %d changing profile\n", ret); >> + return ret; >> + } >> + >> + /* Set default thermal trips */ >> + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_THERMAL, LMH_TH_ARM_THRESHOLD, temp_arm, >> + LMH_NODE_DCVS, node_id, 0); >> + if (ret) { >> + dev_err(dev, "Error setting thermal ARM threshold%d\n", ret); >> + return ret; >> + } >> + >> + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_THERMAL, LMH_TH_HI_THRESHOLD, temp_high, >> + LMH_NODE_DCVS, node_id, 0); >> + if (ret) { >> + dev_err(dev, "Error setting thermal HI threshold%d\n", ret); >> + return ret; >> + } >> + >> + ret = qcom_scm_lmh_dcvsh(LMH_SUB_FN_THERMAL, LMH_TH_LOW_THRESHOLD, temp_low, >> + LMH_NODE_DCVS, node_id, 0); >> + if (ret) { >> + dev_err(dev, "Error setting thermal ARM threshold%d\n", ret); >> + return ret; >> + } >> + >> + lmh_data->irq = platform_get_irq(pdev, 0); >> + lmh_data->domain = irq_domain_add_linear(np, 1, &lmh_irq_ops, lmh_data); >> + if (!lmh_data->domain) { >> + dev_err(dev, "Error adding irq_domain\n"); >> + return -EINVAL; >> + } >> + > > As written now, you might get interrupts before you get to disable_irq() > below. Instead of the disable_irq() you can add this before request_irq: > > irq_set_status_flags(lmh_dat->irq, IRQ_NOAUTOEN); > >> + ret = devm_request_irq(dev, lmh_data->irq, lmh_handle_irq, >> + IRQF_TRIGGER_HIGH | IRQF_ONESHOT | IRQF_NO_SUSPEND, > > Skip IRQF_TRIGGER_HIGH, as the flags will be merged with the properties > from DT. > >> + "lmh-irq", lmh_data); >> + if (ret) { >> + dev_err(dev, "Error %d registering irq %x\n", ret, lmh_data->irq); >> + irq_domain_remove(lmh_data->domain); >> + return ret; >> + } >> + >> + /* Disable the irq and let cpufreq enable it when ready to handle the interrupt */ >> + disable_irq(lmh_data->irq); >> + >> + return 0; >> +} >> + >> +static const struct of_device_id lmh_table[] = { >> + { .compatible = "qcom,sdm845-lmh", }, >> + {} >> +}; >> +MODULE_DEVICE_TABLE(of, lmh_table); >> + >> +static struct platform_driver lmh_driver = { >> + .probe = lmh_probe, > > I think you at least need to irq_domain_remove() during .remove, but > unless we have a clear understanding about how to stop the algorithm > (without causing harmful side effects) it might be better to add > .suppress_bind_attrs = true in .driver... sounds good. Like you said, I am not sure what is the right way to disable the algorithm. So I will add suppress_bind_attrs = true to prevent user space from doing something silly.
On 7/10/21 12:21 AM, Bjorn Andersson wrote: > On Thu 08 Jul 07:06 CDT 2021, Thara Gopinath wrote: > >> Add dt binding documentation to describe Qualcomm >> Limits Management Hardware node. >> >> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> >> --- >> .../devicetree/bindings/thermal/qcom-lmh.yaml | 100 ++++++++++++++++++ >> 1 file changed, 100 insertions(+) >> create mode 100644 Documentation/devicetree/bindings/thermal/qcom-lmh.yaml >> >> diff --git a/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml b/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml >> new file mode 100644 >> index 000000000000..7f62bd3d543d >> --- /dev/null >> +++ b/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml >> @@ -0,0 +1,100 @@ >> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) >> +# Copyright 2021 Linaro Ltd. >> +%YAML 1.2 >> +--- >> +$id: http://devicetree.org/schemas/thermal/qcom-lmh.yaml# >> +$schema: http://devicetree.org/meta-schemas/core.yaml# >> + >> +title: Qualcomm Limits Management Hardware(LMh) >> + >> +maintainers: >> + - Thara Gopinath <thara.gopinath@linaro.org> >> + >> +description: >> + Limits Management Hardware(LMh) is a hardware infrastructure on some >> + Qualcomm SoCs that can enforce temperature and current limits as >> + programmed by software for certain IPs like CPU. >> + >> +properties: >> + compatible: >> + enum: >> + - qcom,sdm845-lmh >> + >> + reg: >> + items: >> + - description: core registers >> + >> + interrupts: >> + maxItems: 1 >> + >> + '#interrupt-cells': >> + const: 1 >> + >> + interrupt-controller: true >> + >> + qcom,lmh-cpu-id: >> + description: >> + CPU id of the first cpu in the LMh cluster >> + $ref: /schemas/types.yaml#/definitions/uint32 >> + >> + qcom,lmh-temperature-arm: >> + description: >> + An integer expressing temperature threshold in millicelsius at which >> + the LMh thermal FSM is engaged. > > Do we know (by any public source) what "arm", "low" and "high" means > beyond that they somehow pokes the state machine? Not from public documentation. I know what these thresholds means, atleast to some extent. Though I will never claim to be an expert in this! There is an error in description of qcom,lmh-temperature-low and qcom,lmh-temperature-high below. I copied and forgot to change the description. I will fix it. > >> + $ref: /schemas/types.yaml#/definitions/int32 >> + >> + qcom,lmh-temperature-low: >> + description: >> + An integer expressing temperature threshold in millicelsius at which >> + the LMh thermal FSM is engaged. >> + $ref: /schemas/types.yaml#/definitions/int32 >> + >> + qcom,lmh-temperature-high: >> + description: >> + An integer expressing temperature threshold in millicelsius at which >> + the LMh thermal FSM is engaged. >> + $ref: /schemas/types.yaml#/definitions/int32 >> + >> +required: >> + - compatible >> + - reg >> + - interrupts >> + - #interrupt-cells >> + - interrupt-controller >> + - qcom,lmh-cpu-id >> + - qcom,lmh-temperature-arm >> + - qcom,lmh-temperature-low >> + - qcom,lmh-temperature-high >> + >> +additionalProperties: false >> + >> +examples: >> + - | >> + #include <dt-bindings/interrupt-controller/arm-gic.h> >> + #include <dt-bindings/clock/qcom,rpmh.h> >> + #include <dt-bindings/interconnect/qcom,sdm845.h> > > I don't see why you need qcom,rpmh.h or the interconnect include in this > example. I could have sworn make dt-bindings check failed. But maybe only The first include is needed. I will remove the other two. > >> + >> + lmh_cluster1: lmh@17d70800 { >> + compatible = "qcom,sdm845-lmh"; >> + reg = <0 0x17d70800 0 0x401>; > > #address- and #size-cells are 1 in the wrapper that validates the > examples, so drop the two zeros. Ok. > >> + interrupts = <GIC_SPI 33 IRQ_TYPE_LEVEL_HIGH>; >> + qcom,lmh-cpu-id = <0x4>; >> + qcom,lmh-temperature-arm = <65000>; >> + qcom,lmh-temperature-low = <94500>; >> + qcom,lmh-temperature-high = <95000>; >> + interrupt-controller; >> + #interrupt-cells = <1>; >> + }; >> + - | > > This is a different example from the one above, if you intended that, > don't you need the #include of arm-gic.h here as well? Again make dt-bindings check did not fail. It is a different example. So I am not sure of the norm here. Is one example good enough ? > > Regards, > Bjorn > >> + lmh_cluster0: lmh@17d78800 { >> + compatible = "qcom,sdm845-lmh"; >> + reg = <0 0x17d78800 0 0x401>; >> + interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>; >> + qcom,lmh-cpu-id = <0x0>; >> + qcom,lmh-temperature-arm = <65000>; >> + qcom,lmh-temperature-low = <94500>; >> + qcom,lmh-temperature-high = <95000>; >> + interrupt-controller; >> + #interrupt-cells = <1>; >> + }; >> + - | >> -- >> 2.25.1 >>
Hi Thara! On 7/8/21 7:06 AM, Thara Gopinath wrote: > Limits Management Hardware(LMh) is a hardware infrastructure on some > Qualcomm SoCs that can enforce temperature and current limits as programmed > by software for certain IPs like CPU. On many newer SoCs LMh is configured > by firmware/TZ and no programming is needed from the kernel side. But on > certain SoCs like sdm845 the firmware does not do a complete programming of > the h/w block. On such SoCs kernel software has to explicitly set up the > temperature limits and turn on various monitoring and enforcing algorithms > on the hardware. > > Introduce support for enabling and programming various limit settings and > monitoring capabilities of Limits Management Hardware(LMh) associated with > cpu clusters. Also introduce support in cpufreq hardware driver to monitor > the interrupt associated with cpu frequency throttling so that this > information can be conveyed to the schdeuler via thermal pressure > interface. > > With this patch series following cpu performance improvement(30-70%) is > observed on sdm845. The reasoning here is that without LMh being programmed > properly from the kernel, the default settings were enabling thermal > mitigation for CPUs at too low a temperature (around 70-75 degree C). This > in turn meant that many a time CPUs were never actually allowed to hit the > maximum possible/required frequencies. > > UnixBench whets and dhry (./Run whets dhry) > System Benchmarks Index Score > > Without LMh Support With LMh Support > 1 copy test 1353.7 1773.2 > > 8 copy tests 4473.6 7402.3 > > Sysbench cpu > sysbench cpu --threads=8 --time=60 --cpu-max-prime=100000 run > > Without LMh Support With LMh Support > Events per > second 355 614 > > Avg Latency(ms) 21.84 13.02 > > v2->v3: > - Included patch adding dt binding documentation for LMh nodes. > - Rebased to v5.13 > > Thara Gopinath (6): > firmware: qcom_scm: Introduce SCM calls to access LMh > thermal: qcom: Add support for LMh driver > cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support > arm64: boot: dts: qcom: sdm45: Add support for LMh node > arm64: boot: dts: qcom: sdm845: Remove cpufreq cooling devices for CPU > thermal zones > dt-bindings: thermal: Add dt binding for QCOM LMh > > .../devicetree/bindings/thermal/qcom-lmh.yaml | 100 ++++++++ > arch/arm64/boot/dts/qcom/sdm845.dtsi | 162 ++---------- > drivers/cpufreq/qcom-cpufreq-hw.c | 118 +++++++++ > drivers/firmware/qcom_scm.c | 58 +++++ > drivers/firmware/qcom_scm.h | 4 + > drivers/thermal/qcom/Kconfig | 10 + > drivers/thermal/qcom/Makefile | 1 + > drivers/thermal/qcom/lmh.c | 239 ++++++++++++++++++ > include/linux/qcom_scm.h | 14 + > 9 files changed, 570 insertions(+), 136 deletions(-) > create mode 100644 Documentation/devicetree/bindings/thermal/qcom-lmh.yaml > create mode 100644 drivers/thermal/qcom/lmh.c > I've been using these patches on a 5.13 kernel (https://github.com/steev/linux/tree/linux-5.13.y - while trying to track down a different issue, while playing a video on youtube, as well as compressing a 9.2GB file with xz, I got the following |Jul 21 21:44:21 limitless kernel: [ 5438.914591] EXT4-fs (loop0p1): mounting ext3 file system using the ext4 subsystem | |Jul 21 21:44:21 limitless kernel: [ 5438.920817] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none. | |Jul 21 21:45:56 limitless kernel: [ 5533.893290] BUG: scheduling while atomic: swapper/0/0/0x00010000 | |Jul 21 21:45:56 limitless kernel: [ 5533.893333] Modules linked in: dm_mod loop tcp_diag inet_diag aes_ce_ccm rfcomm algif_hash algif_skcipher af_alg bnep lz4 lz4_compress q6asm_dai zram q6routing zsmalloc q6afe_dai q6adm q6asm q6afe q6dsp_common q6core snd_soc_wsa881x regmap_sdw snd_soc_wcd934x soundwire_qcom gpio_wcd934x binfmt_misc venus_enc venus_dec videobuf2_dma_contig nls_ascii nls_cp437 vfat fat wcd934x regmap_slimbus qrtr_smd fastrpc apr aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif_ce ghash_ce gf128mul sha2_ce uvcvideo videobuf2_vmalloc videobuf2_memops hci_uart ath10k_snoc sha256_arm64 sha1_ce btqca ath10k_core ath btrtl mac80211 btbcm btintel snd_soc_sdm845 snd_soc_rt5663 snd_soc_qcom_common snd_soc_rl6231 soundwire_bus pm8941_pwrkey qcom_spmi_adc5 bluetooth qcom_vadc_common snd_soc_core venus_core snd_compress snd_pcm_dmaengine snd_pcm v4l2_mem2mem snd_timer snd qcom_spmi_temp_alarm videobuf2_v4l2 soundcore industrialio videobuf2_common videodev joydev mc hid_multitouch ecdh_generic ecc cfg80211 | |Jul 21 21:45:56 limitless kernel: [ 5533.893841] qcom_rng sg rfkill qrtr ns libarc4 qcom_q6v5_mss slim_qcom_ngd_ctrl qcom_q6v5_pas evdev pdr_interface qcom_pil_info qcom_q6v5 slimbus qcom_sysmon rmtfs_mem bam_dma qcom_wdt fuse configfs ip_tables x_tables autofs4 ext4 mbcache jbd2 msm rtc_pm8xxx llcc_qcom ti_sn65dsi86 ocmem i2c_hid_of drm_kms_helper ipa qcom_common qmi_helpers mdt_loader camcc_sdm845 panel_simple drm gpio_keys | |Jul 21 21:45:56 limitless kernel: [ 5533.894072] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.13.4 #1 | |Jul 21 21:45:56 limitless kernel: [ 5533.894087] Hardware name: LENOVO 81JL/LNVNB161216, BIOS 9UCN33WW(V2.06) 06/ 4/2019 | |Jul 21 21:45:56 limitless kernel: [ 5533.894098] Call trace: | |Jul 21 21:45:56 limitless kernel: [ 5533.894103] dump_backtrace+0x0/0x1e4 | |Jul 21 21:45:56 limitless kernel: [ 5533.894133] show_stack+0x24/0x30 | |Jul 21 21:45:56 limitless kernel: [ 5533.894150] dump_stack+0xd0/0x12c | |Jul 21 21:45:56 limitless kernel: [ 5533.894168] __schedule_bug+0x68/0x80 | |Jul 21 21:45:56 limitless kernel: [ 5533.894186] __schedule+0x74c/0x8ac | |Jul 21 21:45:56 limitless kernel: [ 5533.894202] schedule+0x54/0xe0 | |Jul 21 21:45:56 limitless kernel: [ 5533.894217] schedule_preempt_disabled+0x1c/0x2c | |Jul 21 21:45:56 limitless kernel: [ 5533.894234] __mutex_lock.constprop.0+0x55c/0x590 | |Jul 21 21:45:56 limitless kernel: [ 5533.894247] __mutex_lock_slowpath+0x1c/0x30 | |Jul 21 21:45:56 limitless kernel: [ 5533.894260] mutex_lock+0x6c/0x80 | |Jul 21 21:45:56 limitless kernel: [ 5533.894271] dev_pm_opp_find_freq_floor+0x4c/0x200 | |Jul 21 21:45:56 limitless kernel: [ 5533.894287] qcom_lmh_dcvs_notify+0x74/0x170 | |Jul 21 21:45:56 limitless kernel: [ 5533.894301] qcom_lmh_dcvs_handle_irq+0x30/0x44 | |Jul 21 21:45:56 limitless kernel: [ 5533.894313] __handle_irq_event_percpu+0x68/0x210 | |Jul 21 21:45:56 limitless kernel: [ 5533.894328] handle_irq_event+0x6c/0x1b0 | |Jul 21 21:45:56 limitless kernel: [ 5533.894340] handle_simple_irq+0xc8/0x170 | |Jul 21 21:45:56 limitless kernel: [ 5533.894355] generic_handle_irq+0x3c/0x5c | |Jul 21 21:45:56 limitless kernel: [ 5533.894368] lmh_handle_irq+0x40/0x50 | |Jul 21 21:45:56 limitless kernel: [ 5533.894383] __handle_irq_event_percpu+0x68/0x210 | |Jul 21 21:45:56 limitless kernel: [ 5533.894395] handle_irq_event+0x6c/0x1b0 | |Jul 21 21:45:56 limitless kernel: [ 5533.894407] handle_fasteoi_irq+0xcc/0x1fc | |Jul 21 21:45:56 limitless kernel: [ 5533.894421] __handle_domain_irq+0x88/0xec | |Jul 21 21:45:56 limitless kernel: [ 5533.894433] gic_handle_irq+0xc8/0x148 | |Jul 21 21:45:56 limitless kernel: [ 5533.894446] el1_irq+0xbc/0x140 | |Jul 21 21:45:56 limitless kernel: [ 5533.894458] arch_local_irq_enable+0xc/0x14 | |Jul 21 21:45:56 limitless kernel: [ 5533.894473] __schedule+0x2fc/0x8ac | |Jul 21 21:45:56 limitless kernel: [ 5533.894489] schedule_idle+0x34/0x54 | |Jul 21 21:45:56 limitless kernel: [ 5533.894504] do_idle+0x1a4/0x2b0 | |Jul 21 21:45:56 limitless kernel: [ 5533.894520] cpu_startup_entry+0x34/0xa0 | |Jul 21 21:45:56 limitless kernel: [ 5533.894536] rest_init+0xcc/0xdc | |Jul 21 21:45:56 limitless kernel: [ 5533.894550] arch_call_rest_init+0x1c/0x28 | |Jul 21 21:45:56 limitless kernel: [ 5533.894567] start_kernel+0x5b4/0x5ec | |Jul 21 21:45:56 limitless kernel: [ 5533.894954] ------------[ cut here ]------------ | |Jul 21 21:45:56 limitless kernel: [ 5533.894966] irq 156 handler qcom_lmh_dcvs_handle_irq+0x0/0x44 enabled interrupts | |Jul 21 21:45:56 limitless kernel: [ 5533.895005] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159 __handle_irq_event_percpu+0x1d8/0x210 | |Jul 21 21:45:56 limitless kernel: [ 5533.895028] Modules linked in: dm_mod loop tcp_diag inet_diag aes_ce_ccm rfcomm algif_hash algif_skcipher af_alg bnep lz4 lz4_compress q6asm_dai zram q6routing zsmalloc q6afe_dai q6adm q6asm q6afe q6dsp_common q6core snd_soc_wsa881x regmap_sdw snd_soc_wcd934x soundwire_qcom gpio_wcd934x binfmt_misc venus_enc venus_dec videobuf2_dma_contig nls_ascii nls_cp437 vfat fat wcd934x regmap_slimbus qrtr_smd fastrpc apr aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif_ce ghash_ce gf128mul sha2_ce uvcvideo videobuf2_vmalloc videobuf2_memops hci_uart ath10k_snoc sha256_arm64 sha1_ce btqca ath10k_core ath btrtl mac80211 btbcm btintel snd_soc_sdm845 snd_soc_rt5663 snd_soc_qcom_common snd_soc_rl6231 soundwire_bus pm8941_pwrkey qcom_spmi_adc5 bluetooth qcom_vadc_common snd_soc_core venus_core snd_compress snd_pcm_dmaengine snd_pcm v4l2_mem2mem snd_timer snd qcom_spmi_temp_alarm videobuf2_v4l2 soundcore industrialio videobuf2_common videodev joydev mc hid_multitouch ecdh_generic ecc cfg80211 | |Jul 21 21:45:56 limitless kernel: [ 5533.895508] qcom_rng sg rfkill qrtr ns libarc4 qcom_q6v5_mss slim_qcom_ngd_ctrl qcom_q6v5_pas evdev pdr_interface qcom_pil_info qcom_q6v5 slimbus qcom_sysmon rmtfs_mem bam_dma qcom_wdt fuse configfs ip_tables x_tables autofs4 ext4 mbcache jbd2 msm rtc_pm8xxx llcc_qcom ti_sn65dsi86 ocmem i2c_hid_of drm_kms_helper ipa qcom_common qmi_helpers mdt_loader camcc_sdm845 panel_simple drm gpio_keys | |Jul 21 21:45:56 limitless kernel: [ 5533.895732] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.13.4 #1 | |Jul 21 21:45:56 limitless kernel: [ 5533.895747] Hardware name: LENOVO 81JL/LNVNB161216, BIOS 9UCN33WW(V2.06) 06/ 4/2019 | |Jul 21 21:45:56 limitless kernel: [ 5533.895756] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) | |Jul 21 21:45:56 limitless kernel: [ 5533.895771] pc : __handle_irq_event_percpu+0x1d8/0x210 | |Jul 21 21:45:56 limitless kernel: [ 5533.895785] lr : __handle_irq_event_percpu+0x1d8/0x210 | |Jul 21 21:45:56 limitless kernel: [ 5533.895798] sp : ffff800010003de0 | |Jul 21 21:45:56 limitless kernel: [ 5533.895805] x29: ffff800010003de0 x28: ffffc75e330a3b00 x27: ffffc75e32c02000 | |Jul 21 21:45:56 limitless kernel: [ 5533.895831] x26: ffff4900496e0600 x25: ffffc75e32c02000 x24: ffffc75e33099a20 | |Jul 21 21:45:56 limitless kernel: [ 5533.895856] x23: 000000000000009c x22: ffff800010003e74 x21: 0000000000000000 | |Jul 21 21:45:56 limitless kernel: [ 5533.895881] x20: 0000000000000000 x19: ffff490049c02900 x18: 00000000fffffffb | |Jul 21 21:45:56 limitless kernel: [ 5533.895905] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000020 | |Jul 21 21:45:56 limitless kernel: [ 5533.895929] x14: 727265746e692064 x13: 656c62616e652034 x12: 3478302f3078302b | |Jul 21 21:45:56 limitless kernel: [ 5533.895954] x11: ffffc75e3311a9b0 x10: 00000000fffff000 x9 : ffffc75e31abe5a0 | |Jul 21 21:45:56 limitless kernel: [ 5533.895978] x8 : ffffc75e330c29b0 x7 : ffffc75e3311a9b0 x6 : 0000000000000000 | |Jul 21 21:45:56 limitless kernel: [ 5533.896002] x5 : ffff4901b36fa9c8 x4 : ffff800010003bd0 x3 : 0000000000000001 | |Jul 21 21:45:56 limitless kernel: [ 5533.896026] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffc75e330a3b00 | |Jul 21 21:45:56 limitless kernel: [ 5533.896050] Call trace: | |Jul 21 21:45:56 limitless kernel: [ 5533.896057] __handle_irq_event_percpu+0x1d8/0x210 | |Jul 21 21:45:56 limitless kernel: [ 5533.896072] handle_irq_event+0x6c/0x1b0 | |Jul 21 21:45:56 limitless kernel: [ 5533.896085] handle_simple_irq+0xc8/0x170 | |Jul 21 21:45:56 limitless kernel: [ 5533.896101] generic_handle_irq+0x3c/0x5c | |Jul 21 21:45:56 limitless kernel: [ 5533.896113] lmh_handle_irq+0x40/0x50 | |Jul 21 21:45:56 limitless kernel: [ 5533.896128] __handle_irq_event_percpu+0x68/0x210 | |Jul 21 21:45:56 limitless kernel: [ 5533.896141] handle_irq_event+0x6c/0x1b0 | |Jul 21 21:45:56 limitless kernel: [ 5533.896153] handle_fasteoi_irq+0xcc/0x1fc | |Jul 21 21:45:56 limitless kernel: [ 5533.896168] __handle_domain_irq+0x88/0xec | |Jul 21 21:45:56 limitless kernel: [ 5533.896180] gic_handle_irq+0xc8/0x148 | |Jul 21 21:45:56 limitless kernel: [ 5533.896193] el1_irq+0xbc/0x140 | |Jul 21 21:45:56 limitless kernel: [ 5533.896206] arch_local_irq_enable+0xc/0x14 | |Jul 21 21:45:56 limitless kernel: [ 5533.896221] __schedule+0x2fc/0x8ac | |Jul 21 21:45:56 limitless kernel: [ 5533.896238] schedule_idle+0x34/0x54 | |Jul 21 21:45:56 limitless kernel: [ 5533.896254] do_idle+0x1a4/0x2b0 | |Jul 21 21:45:56 limitless kernel: [ 5533.896269] cpu_startup_entry+0x34/0xa0 | |Jul 21 21:45:56 limitless kernel: [ 5533.896285] rest_init+0xcc/0xdc | |Jul 21 21:45:56 limitless kernel: [ 5533.896299] arch_call_rest_init+0x1c/0x28 | |Jul 21 21:45:56 limitless kernel: [ 5533.896316] start_kernel+0x5b4/0x5ec | |Jul 21 21:45:56 limitless kernel: [ 5533.896332] ---[ end trace 9b7875032d5e8e07 ]--- | |Jul 21 21:45:56 limitless kernel: [ 5533.896573] BUG: scheduling while atomic: swapper/0/0/0xffff0000 | |Jul 21 21:45:56 limitless kernel: [ 5533.896583] Modules linked in: dm_mod loop tcp_diag inet_diag aes_ce_ccm rfcomm algif_hash algif_skcipher af_alg bnep lz4 lz4_compress q6asm_dai zram q6routing zsmalloc q6afe_dai q6adm q6asm q6afe q6dsp_common q6core snd_soc_wsa881x regmap_sdw snd_soc_wcd934x soundwire_qcom gpio_wcd934x binfmt_misc venus_enc venus_dec videobuf2_dma_contig nls_ascii nls_cp437 vfat fat wcd934x regmap_slimbus qrtr_smd fastrpc apr aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif_ce ghash_ce gf128mul sha2_ce uvcvideo videobuf2_vmalloc videobuf2_memops hci_uart ath10k_snoc sha256_arm64 sha1_ce btqca ath10k_core ath btrtl mac80211 btbcm btintel snd_soc_sdm845 snd_soc_rt5663 snd_soc_qcom_common snd_soc_rl6231 soundwire_bus pm8941_pwrkey qcom_spmi_adc5 bluetooth qcom_vadc_common snd_soc_core venus_core snd_compress snd_pcm_dmaengine snd_pcm v4l2_mem2mem snd_timer snd qcom_spmi_temp_alarm videobuf2_v4l2 soundcore industrialio videobuf2_common videodev joydev mc hid_multitouch ecdh_generic ecc cfg80211 | |Jul 21 21:45:56 limitless kernel: [ 5533.896746] qcom_rng sg rfkill qrtr ns libarc4 qcom_q6v5_mss slim_qcom_ngd_ctrl qcom_q6v5_pas evdev pdr_interface qcom_pil_info qcom_q6v5 slimbus qcom_sysmon rmtfs_mem bam_dma qcom_wdt fuse configfs ip_tables x_tables autofs4 ext4 mbcache jbd2 msm rtc_pm8xxx llcc_qcom ti_sn65dsi86 ocmem i2c_hid_of drm_kms_helper ipa qcom_common qmi_helpers mdt_loader camcc_sdm845 panel_simple drm gpio_keys | |Jul 21 21:45:56 limitless kernel: [ 5533.896819] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.13.4 #1 | |Jul 21 21:45:56 limitless kernel: [ 5533.896824] Hardware name: LENOVO 81JL/LNVNB161216, BIOS 9UCN33WW(V2.06) 06/ 4/2019 | |Jul 21 21:45:56 limitless kernel: [ 5533.896827] Call trace: | |Jul 21 21:45:56 limitless kernel: [ 5533.896829] dump_backtrace+0x0/0x1e4 | |Jul 21 21:45:56 limitless kernel: [ 5533.896838] show_stack+0x24/0x30 | |Jul 21 21:45:56 limitless kernel: [ 5533.896843] dump_stack+0xd0/0x12c | |Jul 21 21:45:56 limitless kernel: [ 5533.896849] __schedule_bug+0x68/0x80 | |Jul 21 21:45:56 limitless kernel: [ 5533.896855] __schedule+0x74c/0x8ac | |Jul 21 21:45:56 limitless kernel: [ 5533.896860] schedule_idle+0x34/0x54 | |Jul 21 21:45:56 limitless kernel: [ 5533.896866] do_idle+0x1a4/0x2b0 | |Jul 21 21:45:56 limitless kernel: [ 5533.896871] cpu_startup_entry+0x34/0xa0 | |Jul 21 21:45:56 limitless kernel: [ 5533.896876] rest_init+0xcc/0xdc | |Jul 21 21:45:56 limitless kernel: [ 5533.896881] arch_call_rest_init+0x1c/0x28 | |Jul 21 21:45:56 limitless kernel: [ 5533.896887] start_kernel+0x5b4/0x5ec|
On 7/21/21 11:14 PM, Steev Klimaszewski wrote: > Hi Thara! > > On 7/8/21 7:06 AM, Thara Gopinath wrote: >> Limits Management Hardware(LMh) is a hardware infrastructure on some >> Qualcomm SoCs that can enforce temperature and current limits as programmed >> by software for certain IPs like CPU. On many newer SoCs LMh is configured >> by firmware/TZ and no programming is needed from the kernel side. But on >> certain SoCs like sdm845 the firmware does not do a complete programming of >> the h/w block. On such SoCs kernel software has to explicitly set up the >> temperature limits and turn on various monitoring and enforcing algorithms >> on the hardware. >> >> Introduce support for enabling and programming various limit settings and >> monitoring capabilities of Limits Management Hardware(LMh) associated with >> cpu clusters. Also introduce support in cpufreq hardware driver to monitor >> the interrupt associated with cpu frequency throttling so that this >> information can be conveyed to the schdeuler via thermal pressure >> interface. >> >> With this patch series following cpu performance improvement(30-70%) is >> observed on sdm845. The reasoning here is that without LMh being programmed >> properly from the kernel, the default settings were enabling thermal >> mitigation for CPUs at too low a temperature (around 70-75 degree C). This >> in turn meant that many a time CPUs were never actually allowed to hit the >> maximum possible/required frequencies. >> >> UnixBench whets and dhry (./Run whets dhry) >> System Benchmarks Index Score >> >> Without LMh Support With LMh Support >> 1 copy test 1353.7 1773.2 >> >> 8 copy tests 4473.6 7402.3 >> >> Sysbench cpu >> sysbench cpu --threads=8 --time=60 --cpu-max-prime=100000 run >> >> Without LMh Support With LMh Support >> Events per >> second 355 614 >> >> Avg Latency(ms) 21.84 13.02 >> >> v2->v3: >> - Included patch adding dt binding documentation for LMh nodes. >> - Rebased to v5.13 >> >> Thara Gopinath (6): >> firmware: qcom_scm: Introduce SCM calls to access LMh >> thermal: qcom: Add support for LMh driver >> cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support >> arm64: boot: dts: qcom: sdm45: Add support for LMh node >> arm64: boot: dts: qcom: sdm845: Remove cpufreq cooling devices for CPU >> thermal zones >> dt-bindings: thermal: Add dt binding for QCOM LMh >> >> .../devicetree/bindings/thermal/qcom-lmh.yaml | 100 ++++++++ >> arch/arm64/boot/dts/qcom/sdm845.dtsi | 162 ++---------- >> drivers/cpufreq/qcom-cpufreq-hw.c | 118 +++++++++ >> drivers/firmware/qcom_scm.c | 58 +++++ >> drivers/firmware/qcom_scm.h | 4 + >> drivers/thermal/qcom/Kconfig | 10 + >> drivers/thermal/qcom/Makefile | 1 + >> drivers/thermal/qcom/lmh.c | 239 ++++++++++++++++++ >> include/linux/qcom_scm.h | 14 + >> 9 files changed, 570 insertions(+), 136 deletions(-) >> create mode 100644 Documentation/devicetree/bindings/thermal/qcom-lmh.yaml >> create mode 100644 drivers/thermal/qcom/lmh.c >> > I've been using these patches on a 5.13 kernel > (https://github.com/steev/linux/tree/linux-5.13.y - while trying to > track down a different issue, while playing a video on youtube, as well > as compressing a 9.2GB file with xz, I got the following Hi Steev, Thanks for testing this. I was unable to reproduce this. I have posted v4 moving the interrupt handling in qcom-cpufreq-hw to threaded interrupt handler and hopefully this should fix the issue. It will be great if you can test and let me know.
On 7/27/21 10:29 AM, Thara Gopinath wrote: > > > On 7/21/21 11:14 PM, Steev Klimaszewski wrote: >> Hi Thara! >> >> On 7/8/21 7:06 AM, Thara Gopinath wrote: >>> Limits Management Hardware(LMh) is a hardware infrastructure on some >>> Qualcomm SoCs that can enforce temperature and current limits as >>> programmed >>> by software for certain IPs like CPU. On many newer SoCs LMh is >>> configured >>> by firmware/TZ and no programming is needed from the kernel side. >>> But on >>> certain SoCs like sdm845 the firmware does not do a complete >>> programming of >>> the h/w block. On such SoCs kernel software has to explicitly set up >>> the >>> temperature limits and turn on various monitoring and enforcing >>> algorithms >>> on the hardware. >>> >>> Introduce support for enabling and programming various limit >>> settings and >>> monitoring capabilities of Limits Management Hardware(LMh) >>> associated with >>> cpu clusters. Also introduce support in cpufreq hardware driver to >>> monitor >>> the interrupt associated with cpu frequency throttling so that this >>> information can be conveyed to the schdeuler via thermal pressure >>> interface. >>> >>> With this patch series following cpu performance improvement(30-70%) is >>> observed on sdm845. The reasoning here is that without LMh being >>> programmed >>> properly from the kernel, the default settings were enabling thermal >>> mitigation for CPUs at too low a temperature (around 70-75 degree >>> C). This >>> in turn meant that many a time CPUs were never actually allowed to >>> hit the >>> maximum possible/required frequencies. >>> >>> UnixBench whets and dhry (./Run whets dhry) >>> System Benchmarks Index Score >>> >>> Without LMh Support With LMh Support >>> 1 copy test 1353.7 1773.2 >>> >>> 8 copy tests 4473.6 7402.3 >>> >>> Sysbench cpu >>> sysbench cpu --threads=8 --time=60 --cpu-max-prime=100000 run >>> >>> Without LMh Support With LMh Support >>> Events per >>> second 355 614 >>> >>> Avg Latency(ms) 21.84 13.02 >>> >>> v2->v3: >>> - Included patch adding dt binding documentation for LMh nodes. >>> - Rebased to v5.13 >>> >>> Thara Gopinath (6): >>> firmware: qcom_scm: Introduce SCM calls to access LMh >>> thermal: qcom: Add support for LMh driver >>> cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support >>> arm64: boot: dts: qcom: sdm45: Add support for LMh node >>> arm64: boot: dts: qcom: sdm845: Remove cpufreq cooling devices >>> for CPU >>> thermal zones >>> dt-bindings: thermal: Add dt binding for QCOM LMh >>> >>> .../devicetree/bindings/thermal/qcom-lmh.yaml | 100 ++++++++ >>> arch/arm64/boot/dts/qcom/sdm845.dtsi | 162 ++---------- >>> drivers/cpufreq/qcom-cpufreq-hw.c | 118 +++++++++ >>> drivers/firmware/qcom_scm.c | 58 +++++ >>> drivers/firmware/qcom_scm.h | 4 + >>> drivers/thermal/qcom/Kconfig | 10 + >>> drivers/thermal/qcom/Makefile | 1 + >>> drivers/thermal/qcom/lmh.c | 239 >>> ++++++++++++++++++ >>> include/linux/qcom_scm.h | 14 + >>> 9 files changed, 570 insertions(+), 136 deletions(-) >>> create mode 100644 >>> Documentation/devicetree/bindings/thermal/qcom-lmh.yaml >>> create mode 100644 drivers/thermal/qcom/lmh.c >>> >> I've been using these patches on a 5.13 kernel >> (https://github.com/steev/linux/tree/linux-5.13.y - while trying to >> track down a different issue, while playing a video on youtube, as well >> as compressing a 9.2GB file with xz, I got the following > > Hi Steev, > > Thanks for testing this. I was unable to reproduce this. I have posted > v4 moving the interrupt handling in qcom-cpufreq-hw to threaded > interrupt handler and hopefully this should fix the issue. It will be > great if you can test and let me know. > Hi Thara, I've been testing v4 for a little bit here, and so far I can't seem to get it to reproduce anymore. I will keep trying but fingers crossed that that did the trick. For setup, I'm using https://github.com/steev/linux/tree/linux-5.13.y with the "distro_defconfig" configuration here on my c630s. I'm also running https://github.com/steev/scheduler as a systemd service. So far I've been able to sleep/suspend without issue while running "make -j$(nproc) deb-pkg" in those kernel sources as well as `xz --memlimit-compress=50 -T 4 imagefile.img" on a 9.2GB file at the same time. One system is running the Budgie desktop on top of Xorg, and the other is running Gnome 3.38 on top of Wayland. -- steev