Message ID | 20241101121201.2464091-1-chin-ting_kuo@aspeedtech.com |
---|---|
Headers | show |
Series | Update ASPEED WDT bootstatus | expand |
On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote: > The boot status mapping rule follows the latest design guide from > the OpenBMC shown as below. > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-update.md#proposed-design > - WDIOF_EXTERN1 => system is reset by Software > - WDIOF_CARDRESET => system is reset by WDT SoC reset > - Others => other reset events, e.g., power on reset. I'm quite surprised that the above is relevant for a kernel driver at all. Isn't "EXTERN1" a name of a real watchdog signal from your hardware (my recollection is that there are 2 external watchdogs). I think the point of this referenced design document was that most users of BMCs have "EXTERN1" used a for software reset conditions. `CARDRESET` should be representing resets by the watchdog itself. The purpose of this design proposal was not to require very specific changes to individual watchdog drivers, but to align the userspace use with the best practices already from other watchdog drivers. I don't think the kernel driver should be bending to match a particular userspace implementation; you should be exposing the information available to your hardware. Having said that, it was known that there would need to be changes to the driver because some of these conditions were not adequately exposed at all. I'm just still surprised that we're needing to reference that document as part of these changes. > > On ASPEED platform, the boot status is recorded in the SCU registers. > - AST2400: Only a bit represents for any WDT reset. > - AST2500: The reset triggered by different WDT controllers can be > distinguished by different SCU bits. But, WDIOF_EXTERN1 or > WDIOF_CARDRESET still cannot be identified due to > HW limitation. > - AST2600: Different from AST2500, additional HW bits are added for > distinguishing WDIOF_EXTERN1 and WDIOF_CARDRESET.
On 11/1/24 05:11, Chin-Ting Kuo wrote: > The boot status in the watchdog device struct is updated during > controller probe stage. Application layer can get the boot status > through the command, cat /sys/class/watchdog/watchdogX/bootstatus. > > The boot status mapping rule follows the latest design guide from > the OpenBMC shown as below. > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-update.md#proposed-design > - WDIOF_EXTERN1 => system is reset by Software > - WDIOF_CARDRESET => system is reset by WDT SoC reset > - Others => other reset events, e.g., power on reset. > > On ASPEED platform, the boot status is recorded in the SCU registers. > - AST2400: Only a bit represents for any WDT reset. > - AST2500: The reset triggered by different WDT controllers can be > distinguished by different SCU bits. But, WDIOF_EXTERN1 or > WDIOF_CARDRESET still cannot be identified due to > HW limitation. > - AST2600: Different from AST2500, additional HW bits are added for > distinguishing WDIOF_EXTERN1 and WDIOF_CARDRESET. > > Besides, since alternating boot event is triggered by WDT SoC reset, > it is classified as WDIOF_CARDRESET. > > Signed-off-by: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com> > --- > drivers/watchdog/aspeed_wdt.c | 83 ++++++++++++++++++++++++++++++++++- > 1 file changed, 81 insertions(+), 2 deletions(-) > > diff --git a/drivers/watchdog/aspeed_wdt.c b/drivers/watchdog/aspeed_wdt.c > index b4773a6aaf8c..4ad6335ff25b 100644 > --- a/drivers/watchdog/aspeed_wdt.c > +++ b/drivers/watchdog/aspeed_wdt.c > @@ -11,21 +11,31 @@ > #include <linux/io.h> > #include <linux/kernel.h> > #include <linux/kstrtox.h> > +#include <linux/mfd/syscon.h> > #include <linux/module.h> > #include <linux/of.h> > #include <linux/of_irq.h> > #include <linux/platform_device.h> > +#include <linux/regmap.h> > #include <linux/watchdog.h> > > static bool nowayout = WATCHDOG_NOWAYOUT; > module_param(nowayout, bool, 0); > MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default=" > __MODULE_STRING(WATCHDOG_NOWAYOUT) ")"); > +struct aspeed_wdt_scu { > + const char *compatible; > + u32 reset_status_reg; > + u32 wdt_reset_mask; > + u32 wdt_sw_reset_mask; > + u32 wdt_reset_mask_shift; > +}; > > struct aspeed_wdt_config { > u32 ext_pulse_width_mask; > u32 irq_shift; > u32 irq_mask; > + struct aspeed_wdt_scu scu; > }; > > struct aspeed_wdt { > @@ -39,18 +49,39 @@ static const struct aspeed_wdt_config ast2400_config = { > .ext_pulse_width_mask = 0xff, > .irq_shift = 0, > .irq_mask = 0, > + .scu = { > + .compatible = "aspeed,ast2400-scu", > + .reset_status_reg = 0x3c, > + .wdt_reset_mask = 0x1, > + .wdt_sw_reset_mask = 0, > + .wdt_reset_mask_shift = 1, > + }, > }; > > static const struct aspeed_wdt_config ast2500_config = { > .ext_pulse_width_mask = 0xfffff, > .irq_shift = 12, > .irq_mask = GENMASK(31, 12), > + .scu = { > + .compatible = "aspeed,ast2500-scu", > + .reset_status_reg = 0x3c, > + .wdt_reset_mask = 0x1, > + .wdt_sw_reset_mask = 0, > + .wdt_reset_mask_shift = 2, > + }, > }; > > static const struct aspeed_wdt_config ast2600_config = { > .ext_pulse_width_mask = 0xfffff, > .irq_shift = 0, > .irq_mask = GENMASK(31, 10), > + .scu = { > + .compatible = "aspeed,ast2600-scu", > + .reset_status_reg = 0x74, > + .wdt_reset_mask = 0xf, > + .wdt_sw_reset_mask = 0x8, > + .wdt_reset_mask_shift = 16, > + }, > }; > > static const struct of_device_id aspeed_wdt_of_table[] = { > @@ -213,6 +244,52 @@ static int aspeed_wdt_restart(struct watchdog_device *wdd, > return 0; > } > > +static int aspeed_wdt_update_bootstatus(struct platform_device *pdev, > + struct aspeed_wdt *wdt) > +{ > + struct resource *res; > + struct aspeed_wdt_scu scu = wdt->cfg->scu; > + struct regmap *scu_base; > + u32 reset_mask_width; > + u32 reset_mask_shift; > + u32 reg_size = 0; Please no unnecesary initializations. > + u32 idx = 0; > + u32 status; > + int ret; > + > + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > + reg_size = res->end - res->start; > + > + if (reg_size != 0) > + idx = ((intptr_t)wdt->base & 0x00000fff) / reg_size; > + > + /* On ast2400, only a bit is used to represent WDT reset */ > + if (of_device_is_compatible(pdev->dev.of_node, "aspeed,ast2400-wdt")) > + idx = 0; > + There is some redundancy in the above code, and platform_get_resource() can return NULL. If idx==0 for aspeed,ast2400-wdt anyway, the code can be rewritten as if (!of_device_is_compatible(pdev->dev.of_node, "aspeed,ast2400-wdt")) { res = platform_get_resource(pdev, IORESOURCE_MEM, 0); if (res) { reg_size = res->end - res->start; if (reg_size) idx = ((intptr_t)wdt->base & 0x00000fff) / reg_size; } } > + scu_base = syscon_regmap_lookup_by_compatible(scu.compatible); > + if (IS_ERR(scu_base)) > + return PTR_ERR(scu_base); > + > + ret = regmap_read(scu_base, scu.reset_status_reg, &status); > + if (ret) > + return ret; The above only affects bootstatus. Why fail to load the driver just because bootstatus can not be read ? > + > + reset_mask_width = hweight32(scu.wdt_reset_mask); > + reset_mask_shift = scu.wdt_reset_mask_shift + > + reset_mask_width * idx; > + > + if (status & (scu.wdt_sw_reset_mask << reset_mask_shift)) > + wdt->wdd.bootstatus = WDIOF_EXTERN1; > + else if (status & (scu.wdt_reset_mask << reset_mask_shift)) > + wdt->wdd.bootstatus = WDIOF_CARDRESET; > + else > + wdt->wdd.bootstatus = 0; That is already 0. > + > + return regmap_write(scu_base, scu.reset_status_reg, > + scu.wdt_reset_mask << reset_mask_shift); > +} > + > /* access_cs0 shows if cs0 is accessible, hence the reverted bit */ > static ssize_t access_cs0_show(struct device *dev, > struct device_attribute *attr, char *buf) > @@ -458,10 +535,12 @@ static int aspeed_wdt_probe(struct platform_device *pdev) > writel(duration - 1, wdt->base + WDT_RESET_WIDTH); > } > > + ret = aspeed_wdt_update_bootstatus(pdev, wdt); > + if (ret) > + return ret; > + > status = readl(wdt->base + WDT_TIMEOUT_STATUS); > if (status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY) { > - wdt->wdd.bootstatus = WDIOF_CARDRESET; > - > if (of_device_is_compatible(np, "aspeed,ast2400-wdt") || > of_device_is_compatible(np, "aspeed,ast2500-wdt")) > wdt->wdd.groups = bswitch_groups;
On Fri, 2024-11-01 at 14:21 -0400, Patrick Williams wrote: > On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote: > > The boot status mapping rule follows the latest design guide from > > the OpenBMC shown as below. > > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-update.md#proposed-design > > - WDIOF_EXTERN1 => system is reset by Software > > - WDIOF_CARDRESET => system is reset by WDT SoC reset > > - Others => other reset events, e.g., power on reset. > > I'm quite surprised that the above is relevant for a kernel driver at > all. Isn't "EXTERN1" a name of a real watchdog signal from your > hardware (my recollection is that there are 2 external watchdogs). I think you may be referring to WDTRST1 (and WDTRST2) here. WDIOF_EXTERN1 and WDIOF_EXTERN2 have an unrelated history: https://github.com/torvalds/linux/blame/master/include/uapi/linux/watchdog.h > I > think the point of this referenced design document was that most > users > of BMCs have "EXTERN1" used a for software reset conditions. > `CARDRESET` should be representing resets by the watchdog itself. I think this is what Chin-Ting is proposing for the Aspeed driver? > > The purpose of this design proposal was not to require very specific > changes to individual watchdog drivers, but to align the userspace > use > with the best practices already from other watchdog drivers. I don't > think the kernel driver should be bending to match a particular > userspace implementation; you should be exposing the information > available to your hardware. I agree with this in principle, but I'm not sure what else is meant to be done here. > > Having said that, it was known that there would need to be changes to > the driver because some of these conditions were not adequately > exposed > at all. I'm just still surprised that we're needing to reference > that > document as part of these changes. I think the main question is whether an internal, graceful (userspace- requested) reset is a reasonable use of WDIOF_EXTERN[12]. My feeling no. I wonder whether defining a new flag (WDIOF_REBOOT? WDIOF_GRACEFUL?) in the UAPI would be acceptable? Andrew
Hi Patrick, Thanks for the check and reply. > -----Original Message----- > From: Patrick Williams <patrick@stwcx.xyz> > Sent: Saturday, November 2, 2024 2:21 AM > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling > > On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote: > > The boot status mapping rule follows the latest design guide from the > > OpenBMC shown as below. > > > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-up > date.md#proposed-design > > - WDIOF_EXTERN1 => system is reset by Software > > - WDIOF_CARDRESET => system is reset by WDT SoC reset > > - Others => other reset events, e.g., power on reset. > > I'm quite surprised that the above is relevant for a kernel driver at all. Isn't > "EXTERN1" a name of a real watchdog signal from your hardware (my > recollection is that there are 2 external watchdogs). I think the point of this > referenced design document was that most users of BMCs have "EXTERN1" > used a for software reset conditions. > `CARDRESET` should be representing resets by the watchdog itself. > ASPEED WDT controller is able to generate an external signal, wdt_ext, when the WDT timeout occurs. However, after system is rebooted, WDT controller doesn't know whether wdt_ext signal is generated previously. Moreover, whether BMC is reset due to this wdt_ext signal depends on the HW board design. On some boards, wdt_ext can affect EXTRST#, BMC external reset signal, indirectly. For this scenario, EXTERN1 can be classified to wdt_ext. On the other boards, wdt_ext just represents WDT timeout event for specific tasks. Thus, EXTERN1 boot status can be ignored in ASPEED WDT driver and just implement "CARDRESET" and "others" since EXTERN1 is not always affected/controlled by WDT controller directly. Or, an additional property in dts can be added to distinguish whether the EXTRST# reset event is triggered by wdt_ext signal. > The purpose of this design proposal was not to require very specific changes to > individual watchdog drivers, but to align the userspace use with the best > practices already from other watchdog drivers. I don't think the kernel driver > should be bending to match a particular userspace implementation; you should > be exposing the information available to your hardware. > Agree. > Having said that, it was known that there would need to be changes to the > driver because some of these conditions were not adequately exposed at all. > I'm just still surprised that we're needing to reference that document as part of > these changes. > Yes, if only "CARDRESET" and "others" types are taken into consideration, some patches are still needed to complete the boot status mechanism in ASPEED WDT driver. > > > > On ASPEED platform, the boot status is recorded in the SCU registers. > > - AST2400: Only a bit represents for any WDT reset. > > - AST2500: The reset triggered by different WDT controllers can be > > distinguished by different SCU bits. But, WDIOF_EXTERN1 or > > WDIOF_CARDRESET still cannot be identified due to > > HW limitation. > > - AST2600: Different from AST2500, additional HW bits are added for > > distinguishing WDIOF_EXTERN1 and WDIOF_CARDRESET. > > -- > Patrick Williams Chin-Ting
Hi Guenter, Thanks for the review. > -----Original Message----- > From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck > Sent: Saturday, November 2, 2024 6:17 AM > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling > > On 11/1/24 05:11, Chin-Ting Kuo wrote: > > The boot status in the watchdog device struct is updated during > > controller probe stage. Application layer can get the boot status > > through the command, cat /sys/class/watchdog/watchdogX/bootstatus. > > > > The boot status mapping rule follows the latest design guide from the > > OpenBMC shown as below. > > > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-up > date.md#proposed-design > > - WDIOF_EXTERN1 => system is reset by Software > > - WDIOF_CARDRESET => system is reset by WDT SoC reset > > - Others => other reset events, e.g., power on reset. > > > > On ASPEED platform, the boot status is recorded in the SCU registers. > > - AST2400: Only a bit represents for any WDT reset. > > - AST2500: The reset triggered by different WDT controllers can be > > distinguished by different SCU bits. But, WDIOF_EXTERN1 or > > WDIOF_CARDRESET still cannot be identified due to > > HW limitation. > > - AST2600: Different from AST2500, additional HW bits are added for > > distinguishing WDIOF_EXTERN1 and WDIOF_CARDRESET. > > > > Besides, since alternating boot event is triggered by WDT SoC reset, > > it is classified as WDIOF_CARDRESET. > > > > Signed-off-by: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com> > > --- > > drivers/watchdog/aspeed_wdt.c | 83 > ++++++++++++++++++++++++++++++++++- > > 1 file changed, 81 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/watchdog/aspeed_wdt.c > > b/drivers/watchdog/aspeed_wdt.c index b4773a6aaf8c..4ad6335ff25b > > 100644 > > --- a/drivers/watchdog/aspeed_wdt.c > > +++ b/drivers/watchdog/aspeed_wdt.c > > @@ -11,21 +11,31 @@ > > #include <linux/io.h> > > #include <linux/kernel.h> > > #include <linux/kstrtox.h> > > +#include <linux/mfd/syscon.h> > > #include <linux/module.h> > > #include <linux/of.h> > > #include <linux/of_irq.h> > > #include <linux/platform_device.h> > > +#include <linux/regmap.h> > > #include <linux/watchdog.h> > > > > static bool nowayout = WATCHDOG_NOWAYOUT; > > module_param(nowayout, bool, 0); > > MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once > started (default=" > > __MODULE_STRING(WATCHDOG_NOWAYOUT) ")"); > > +struct aspeed_wdt_scu { > > + const char *compatible; > > + u32 reset_status_reg; > > + u32 wdt_reset_mask; > > + u32 wdt_sw_reset_mask; > > + u32 wdt_reset_mask_shift; > > +}; > > > > struct aspeed_wdt_config { > > u32 ext_pulse_width_mask; > > u32 irq_shift; > > u32 irq_mask; > > + struct aspeed_wdt_scu scu; > > }; > > > > struct aspeed_wdt { > > @@ -39,18 +49,39 @@ static const struct aspeed_wdt_config > ast2400_config = { > > .ext_pulse_width_mask = 0xff, > > .irq_shift = 0, > > .irq_mask = 0, > > + .scu = { > > + .compatible = "aspeed,ast2400-scu", > > + .reset_status_reg = 0x3c, > > + .wdt_reset_mask = 0x1, > > + .wdt_sw_reset_mask = 0, > > + .wdt_reset_mask_shift = 1, > > + }, > > }; > > > > static const struct aspeed_wdt_config ast2500_config = { > > .ext_pulse_width_mask = 0xfffff, > > .irq_shift = 12, > > .irq_mask = GENMASK(31, 12), > > + .scu = { > > + .compatible = "aspeed,ast2500-scu", > > + .reset_status_reg = 0x3c, > > + .wdt_reset_mask = 0x1, > > + .wdt_sw_reset_mask = 0, > > + .wdt_reset_mask_shift = 2, > > + }, > > }; > > > > static const struct aspeed_wdt_config ast2600_config = { > > .ext_pulse_width_mask = 0xfffff, > > .irq_shift = 0, > > .irq_mask = GENMASK(31, 10), > > + .scu = { > > + .compatible = "aspeed,ast2600-scu", > > + .reset_status_reg = 0x74, > > + .wdt_reset_mask = 0xf, > > + .wdt_sw_reset_mask = 0x8, > > + .wdt_reset_mask_shift = 16, > > + }, > > }; > > > > static const struct of_device_id aspeed_wdt_of_table[] = { @@ -213,6 > > +244,52 @@ static int aspeed_wdt_restart(struct watchdog_device *wdd, > > return 0; > > } > > > > +static int aspeed_wdt_update_bootstatus(struct platform_device *pdev, > > + struct aspeed_wdt *wdt) > > +{ > > + struct resource *res; > > + struct aspeed_wdt_scu scu = wdt->cfg->scu; > > + struct regmap *scu_base; > > + u32 reset_mask_width; > > + u32 reset_mask_shift; > > + u32 reg_size = 0; > > Please no unnecesary initializations. > Okay, it will be updated in the next patch version. > > + u32 idx = 0; > > + u32 status; > > + int ret; > > + > > + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > > + reg_size = res->end - res->start; > > + > > + if (reg_size != 0) > > + idx = ((intptr_t)wdt->base & 0x00000fff) / reg_size; > > + > > + /* On ast2400, only a bit is used to represent WDT reset */ > > + if (of_device_is_compatible(pdev->dev.of_node, "aspeed,ast2400-wdt")) > > + idx = 0; > > + > > There is some redundancy in the above code, and platform_get_resource() can > return NULL. If idx==0 for aspeed,ast2400-wdt anyway, the code can be > rewritten as > > if (!of_device_is_compatible(pdev->dev.of_node, "aspeed,ast2400-wdt")) > { > res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > if (res) { > reg_size = res->end - res->start; > if (reg_size) > idx = ((intptr_t)wdt->base & 0x00000fff) / reg_size; > } > } > Okay, it looks more concise. Thanks for the reminder. > > + scu_base = syscon_regmap_lookup_by_compatible(scu.compatible); > > + if (IS_ERR(scu_base)) > > + return PTR_ERR(scu_base); > > + > > + ret = regmap_read(scu_base, scu.reset_status_reg, &status); > > + if (ret) > > + return ret; > > The above only affects bootstatus. Why fail to load the driver just because > bootstatus can not be read ? > It will be updated in the next patch. > > + > > + reset_mask_width = hweight32(scu.wdt_reset_mask); > > + reset_mask_shift = scu.wdt_reset_mask_shift + > > + reset_mask_width * idx; > > + > > + if (status & (scu.wdt_sw_reset_mask << reset_mask_shift)) > > + wdt->wdd.bootstatus = WDIOF_EXTERN1; > > + else if (status & (scu.wdt_reset_mask << reset_mask_shift)) > > + wdt->wdd.bootstatus = WDIOF_CARDRESET; > > + else > > + wdt->wdd.bootstatus = 0; > > That is already 0. > Okay. > > + > > + return regmap_write(scu_base, scu.reset_status_reg, > > + scu.wdt_reset_mask << reset_mask_shift); } > > + > > /* access_cs0 shows if cs0 is accessible, hence the reverted bit */ > > static ssize_t access_cs0_show(struct device *dev, > > struct device_attribute *attr, char *buf) @@ > -458,10 > > +535,12 @@ static int aspeed_wdt_probe(struct platform_device *pdev) > > writel(duration - 1, wdt->base + WDT_RESET_WIDTH); > > } > > > > + ret = aspeed_wdt_update_bootstatus(pdev, wdt); > > + if (ret) > > + return ret; > > + > > status = readl(wdt->base + WDT_TIMEOUT_STATUS); > > if (status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY) { > > - wdt->wdd.bootstatus = WDIOF_CARDRESET; > > - > > if (of_device_is_compatible(np, "aspeed,ast2400-wdt") || > > of_device_is_compatible(np, "aspeed,ast2500-wdt")) > > wdt->wdd.groups = bswitch_groups; Chin-Ting
Hi Andrew, Thanks for the check. > -----Original Message----- > From: Andrew Jeffery <andrew@codeconstruct.com.au> > Sent: Monday, November 4, 2024 8:02 AM > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling > > On Fri, 2024-11-01 at 14:21 -0400, Patrick Williams wrote: > > On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote: > > > The boot status mapping rule follows the latest design guide from > > > the OpenBMC shown as below. > > > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause > > > -update.md#proposed-design > > > - WDIOF_EXTERN1 => system is reset by Software > > > - WDIOF_CARDRESET => system is reset by WDT SoC reset > > > - Others => other reset events, e.g., power on reset. > > > > I'm quite surprised that the above is relevant for a kernel driver at > > all. Isn't "EXTERN1" a name of a real watchdog signal from your > > hardware (my recollection is that there are 2 external watchdogs). > > I think you may be referring to WDTRST1 (and WDTRST2) here. > WDTRST1, wdt_ext, is a pulse signal generated when WDT timeout occurs. However, depending on the HW board design, wdt_ext doesn’t always affect the system reset. Thus, EXTERN1 boot status can be ignored in ASPEED WDT driver and just implement "CARDRESET" and "others" types since EXTERN1 is not always affected/controlled by WDT controller directly. Or, an additional property in dts can be added to distinguish whether the current EXTRST# reset event is triggered by wdt_ext signal. > WDIOF_EXTERN1 and WDIOF_EXTERN2 have an unrelated history: > > https://github.com/torvalds/linux/blame/master/include/uapi/linux/watchdog. > h > > > I > > think the point of this referenced design document was that most users > > of BMCs have "EXTERN1" used a for software reset conditions. > > `CARDRESET` should be representing resets by the watchdog itself. > > I think this is what Chin-Ting is proposing for the Aspeed driver? > Yes. > > > > The purpose of this design proposal was not to require very specific > > changes to individual watchdog drivers, but to align the userspace use > > with the best practices already from other watchdog drivers. I don't > > think the kernel driver should be bending to match a particular > > userspace implementation; you should be exposing the information > > available to your hardware. > > I agree with this in principle, but I'm not sure what else is meant to be done > here. > > > > > Having said that, it was known that there would need to be changes to > > the driver because some of these conditions were not adequately > > exposed at all. I'm just still surprised that we're needing to > > reference that document as part of these changes. > > I think the main question is whether an internal, graceful (userspace- > requested) reset is a reasonable use of WDIOF_EXTERN[12]. My feeling no. I > wonder whether defining a new flag (WDIOF_REBOOT? > WDIOF_GRACEFUL?) in the UAPI would be acceptable? > Agree, but this is out of the scope of this patch series and can be discussed and implemented in the other future patches. The purpose of this patch series is to complete the boot status mechanism based on the current reset reason. > Andrew Chin-Ting
Hi Chin-Ting, On Thu, 2024-11-07 at 05:35 +0000, Chin-Ting Kuo wrote: > Hi Andrew, > > Thanks for the check. > > > -----Original Message----- > > From: Andrew Jeffery <andrew@codeconstruct.com.au> > > Sent: Monday, November 4, 2024 8:02 AM > > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus > > handling > > > > On Fri, 2024-11-01 at 14:21 -0400, Patrick Williams wrote: > > > On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote: > > > > The boot status mapping rule follows the latest design guide > > > > from > > > > the OpenBMC shown as below. > > > > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause > > > > -update.md#proposed-design > > > > - WDIOF_EXTERN1 => system is reset by Software > > > > - WDIOF_CARDRESET => system is reset by WDT SoC reset > > > > - Others => other reset events, e.g., power on reset. > > > > > > I'm quite surprised that the above is relevant for a kernel > > > driver at > > > all. Isn't "EXTERN1" a name of a real watchdog signal from your > > > hardware (my recollection is that there are 2 external > > > watchdogs). > > > > I think you may be referring to WDTRST1 (and WDTRST2) here. > > > > WDTRST1, wdt_ext, is a pulse signal generated when WDT timeout > occurs. However, depending on the HW board design, wdt_ext doesn’t > always affect the system reset. Thus, EXTERN1 boot status can be > ignored in ASPEED WDT driver and just implement "CARDRESET" and > "others" types since EXTERN1 is not always affected/controlled by WDT > controller directly. Or, an additional property in dts can be added > to > distinguish whether the current EXTRST# reset event is triggered by > wdt_ext signal. Yep, I understand how it works. I was responding to Patrick's query to clear up some confusion around the watchdog signal names. > > > > > > > > Having said that, it was known that there would need to be > > > changes to > > > the driver because some of these conditions were not adequately > > > exposed at all. I'm just still surprised that we're needing to > > > reference that document as part of these changes. > > > > I think the main question is whether an internal, graceful > > (userspace- > > requested) reset is a reasonable use of WDIOF_EXTERN[12]. My > > feeling no. I > > wonder whether defining a new flag (WDIOF_REBOOT? > > WDIOF_GRACEFUL?) in the UAPI would be acceptable? > > > > Agree, but this is out of the scope of this patch series and can be > discussed and > implemented in the other future patches. I disagree, because then you're changing the userspace-visible behaviour of the driver yet again. I don't prefer the proposed patch as the way forward because I think it is abusing the meaning of WDIOF_EXTERN1. I think the concept needs input from the watchdog maintainers. Andrew
Hi Andrew, > -----Original Message----- > From: Andrew Jeffery <andrew@codeconstruct.com.au> > Sent: Friday, November 8, 2024 7:50 AM > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling > > Hi Chin-Ting, > > On Thu, 2024-11-07 at 05:35 +0000, Chin-Ting Kuo wrote: > > Hi Andrew, > > > > Thanks for the check. > > > > > -----Original Message----- > > > From: Andrew Jeffery <andrew@codeconstruct.com.au> > > > Sent: Monday, November 4, 2024 8:02 AM > > > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus > > > handling > > > > > > On Fri, 2024-11-01 at 14:21 -0400, Patrick Williams wrote: > > > > On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote: > > > > > The boot status mapping rule follows the latest design guide > > > > > from the OpenBMC shown as below. > > > > > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-c > > > > > ause > > > > > -update.md#proposed-design > > > > > - WDIOF_EXTERN1 => system is reset by Software > > > > > - WDIOF_CARDRESET => system is reset by WDT SoC reset > > > > > - Others => other reset events, e.g., power on reset. > > > > > > > > I'm quite surprised that the above is relevant for a kernel driver > > > > at all. Isn't "EXTERN1" a name of a real watchdog signal from > > > > your hardware (my recollection is that there are 2 external > > > > watchdogs). > > > > > > I think you may be referring to WDTRST1 (and WDTRST2) here. > > > > > > > WDTRST1, wdt_ext, is a pulse signal generated when WDT timeout occurs. > > However, depending on the HW board design, wdt_ext doesn’t always > > affect the system reset. Thus, EXTERN1 boot status can be ignored in > > ASPEED WDT driver and just implement "CARDRESET" and "others" types > > since EXTERN1 is not always affected/controlled by WDT controller > > directly. Or, an additional property in dts can be added to > > distinguish whether the current EXTRST# reset event is triggered by > > wdt_ext signal. > > Yep, I understand how it works. I was responding to Patrick's query to clear up > some confusion around the watchdog signal names. > Okay. > > > > > > > > > > > Having said that, it was known that there would need to be changes > > > > to the driver because some of these conditions were not adequately > > > > exposed at all. I'm just still surprised that we're needing to > > > > reference that document as part of these changes. > > > > > > I think the main question is whether an internal, graceful > > > (userspace- > > > requested) reset is a reasonable use of WDIOF_EXTERN[12]. My feeling > > > no. I wonder whether defining a new flag (WDIOF_REBOOT? > > > WDIOF_GRACEFUL?) in the UAPI would be acceptable? > > > > > > > Agree, but this is out of the scope of this patch series and can be > > discussed and > > implemented in the other future patches. > > I disagree, because then you're changing the userspace-visible > behaviour of the driver yet again. I don't prefer the proposed patch as > the way forward because I think it is abusing the meaning of > WDIOF_EXTERN1. I think the concept needs input from the watchdog > maintainers. > Previously, I meant that only implement "CARDRESET " and "others" reset type to complete the current driver for "CARDRESET ". The implementation of SW restart mechanism can be postponed to the next new patch series. But now, I think it will be better to add a patch for creating a new reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in watchdog.h of uapi. Can I include this change, creating a new reset reason, in this patch series? Or, should I create an extra new patch series for this purpose? > Andrew Chin-Ting
On 11/7/24 21:42, Chin-Ting Kuo wrote: > But now, I think it will be better to add a patch for creating a new > reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in watchdog.h > of uapi. Can I include this change, creating a new reset reason, in > this patch series? Or, should I create an extra new patch series for > this purpose? > This is a UAPI change. That is a major change. It needs to be discussed separately, on its own, and can not be sneaked in like this. Guenter
Hi Guenter, Thanks for the reply. > -----Original Message----- > From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck > Sent: Friday, November 8, 2024 10:08 PM > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling > > On 11/7/24 21:42, Chin-Ting Kuo wrote: > > > But now, I think it will be better to add a patch for creating a new > > reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in watchdog.h of > > uapi. Can I include this change, creating a new reset reason, in this > > patch series? Or, should I create an extra new patch series for this > > purpose? > > > > This is a UAPI change. That is a major change. It needs to be discussed > separately, on its own, and can not be sneaked in like this. > Agree. However, how to trigger this discussion? Can I just send a new patch separately with only this UAPI modification? This is the first time I change such common source code. > Guenter Chin-Ting
On 11/18/24 04:46, Chin-Ting Kuo wrote: > Hi Guenter, > > Thanks for the reply. > >> -----Original Message----- >> From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck >> Sent: Friday, November 8, 2024 10:08 PM >> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling >> >> On 11/7/24 21:42, Chin-Ting Kuo wrote: >> >>> But now, I think it will be better to add a patch for creating a new >>> reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in watchdog.h of >>> uapi. Can I include this change, creating a new reset reason, in this >>> patch series? Or, should I create an extra new patch series for this >>> purpose? >>> >> >> This is a UAPI change. That is a major change. It needs to be discussed >> separately, on its own, and can not be sneaked in like this. >> > > Agree. However, how to trigger this discussion? Can I just send a new > patch separately with only this UAPI modification? This is the first time > I change such common source code. > Yes. That needs to include arguments explaining why this specific new flag even adds value. I for my part don't immediately see that value. Guenter
On Mon, 2024-11-18 at 12:50 -0800, Guenter Roeck wrote: > On 11/18/24 04:46, Chin-Ting Kuo wrote: > > Hi Guenter, > > > > Thanks for the reply. > > > > > -----Original Message----- > > > From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter > > > Roeck > > > Sent: Friday, November 8, 2024 10:08 PM > > > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus > > > handling > > > > > > On 11/7/24 21:42, Chin-Ting Kuo wrote: > > > > > > > But now, I think it will be better to add a patch for creating > > > > a new > > > > reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in > > > > watchdog.h of > > > > uapi. Can I include this change, creating a new reset reason, > > > > in this > > > > patch series? Or, should I create an extra new patch series for > > > > this > > > > purpose? > > > > > > > > > > This is a UAPI change. That is a major change. It needs to be > > > discussed > > > separately, on its own, and can not be sneaked in like this. > > > > > > > Agree. However, how to trigger this discussion? Can I just send a > > new > > patch separately with only this UAPI modification? This is the > > first time > > I change such common source code. > > > > Yes. That needs to include arguments explaining why this specific new > flag > even adds value. I for my part don't immediately see that value. So maybe I was derailed with my WDIOF_REBOOT suggestion by the proposal to repurpose WDIOF_EXTERN1 to indicate a regular reboot. I still don't think repurposing WDIOF_EXTERN1 is the right direction. But, perhaps the thing to do for a regular reboot is to not set any reason flags at all? It just depends on whether we're wanting to separate a cold boot from a reboot (as they _may_ behave differently on Aspeed hardware), as on a cold boot we wouldn't set any reason flags either. Andrew
On 11/18/24 15:00, Andrew Jeffery wrote: > On Mon, 2024-11-18 at 12:50 -0800, Guenter Roeck wrote: >> On 11/18/24 04:46, Chin-Ting Kuo wrote: >>> Hi Guenter, >>> >>> Thanks for the reply. >>> >>>> -----Original Message----- >>>> From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter >>>> Roeck >>>> Sent: Friday, November 8, 2024 10:08 PM >>>> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus >>>> handling >>>> >>>> On 11/7/24 21:42, Chin-Ting Kuo wrote: >>>> >>>>> But now, I think it will be better to add a patch for creating >>>>> a new >>>>> reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in >>>>> watchdog.h of >>>>> uapi. Can I include this change, creating a new reset reason, >>>>> in this >>>>> patch series? Or, should I create an extra new patch series for >>>>> this >>>>> purpose? >>>>> >>>> >>>> This is a UAPI change. That is a major change. It needs to be >>>> discussed >>>> separately, on its own, and can not be sneaked in like this. >>>> >>> >>> Agree. However, how to trigger this discussion? Can I just send a >>> new >>> patch separately with only this UAPI modification? This is the >>> first time >>> I change such common source code. >>> >> >> Yes. That needs to include arguments explaining why this specific new >> flag >> even adds value. I for my part don't immediately see that value. > > So maybe I was derailed with my WDIOF_REBOOT suggestion by the proposal > to repurpose WDIOF_EXTERN1 to indicate a regular reboot. I still don't > think repurposing WDIOF_EXTERN1 is the right direction. But, perhaps > the thing to do for a regular reboot is to not set any reason flags at > all? It just depends on whether we're wanting to separate a cold boot > from a reboot (as they _may_ behave differently on Aspeed hardware), as > on a cold boot we wouldn't set any reason flags either. > Thew point here is that this is just a warm reboot which happens to use the watchdog as vehicle trigger a reset. A UAPI change along that line would tell the user just that, and I don't see the value in it. FWIW, the only really valuable flag is WDIOF_CARDRESET. All others are at best misleading since the watchdog isn't typically involved in making policy decisions such as WDIOF_OVERHEAT or WDIOF_FANFAULT and thus can not report such reasons to userspace. Unfortunately we can at best deprecate all those existing flags. At the same time I really don't want to introduce new ones unless they provide real value. While it might be valuable to know if a reboot was a cold or a warm reboot, the watchdog subsystem is not involved in this either on almost all platforms, meaning userspace can not really rely on it. Yes, Aspeed hardware may behave differently, but typically all those differences would need to be handled in the kernel when (re-)initializing the hardware, not in userspace. So, again, what exactly would userspace do with the information that this was a watchdog triggered warm reboot ? Why would it need that information ? Thanks, Guenter
On Mon, 2024-11-18 at 17:27 -0800, Guenter Roeck wrote: > So, again, what exactly would userspace do with the information that > this > was a watchdog triggered warm reboot ? Why would it need that > information ? I'll defer to the others on To/Cc to answer that. My only position is I don't think changing behaviour of existing drivers to exploit WDIOF_EXTERN1 as a graceful-reboot indicator is a good idea either. Obviously I don't have much skin in the game with watchdog maintenance, so my thoughts shouldn't have much influence beyond the Aspeed-specifics, but I just didn't want to see some fun new confusion or incompatibility arise as a result. Andrew
Hi Andrew, > -----Original Message----- > From: Andrew Jeffery <andrew@codeconstruct.com.au> > Sent: Wednesday, November 20, 2024 12:54 PM > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling > > On Mon, 2024-11-18 at 17:27 -0800, Guenter Roeck wrote: > > So, again, what exactly would userspace do with the information that > > this was a watchdog triggered warm reboot ? Why would it need that > > information ? > > I'll defer to the others on To/Cc to answer that. > > My only position is I don't think changing behaviour of existing drivers to > exploit WDIOF_EXTERN1 as a graceful-reboot indicator is a good idea either. > Obviously I don't have much skin in the game with watchdog maintenance, so > my thoughts shouldn't have much influence beyond the Aspeed-specifics, but I > just didn't want to see some fun new confusion or incompatibility arise as a > result. > Agree to your misgiving, in the next patches, only two categories, "Power on reset" and "WDT reset" (Card reset), will be taken into consideration. The graceful-reboot scenario will be postponed to the patches in the future. Thanks, Chin-Ting