mbox series

[v4,0/3] Update ASPEED WDT bootstatus

Message ID 20241101121201.2464091-1-chin-ting_kuo@aspeedtech.com
Headers show
Series Update ASPEED WDT bootstatus | expand

Message

Chin-Ting Kuo Nov. 1, 2024, 12:11 p.m. UTC
This patch series inherits the patch submitted by Peter.
https://patchwork.kernel.org/project/linux-watchdog/patch/20240430143114.1323686-2-peteryin.openbmc@gmail.com/
Besides, the boot status updated in the WDT driver
obeys the rules proposed in the OpenBMC.
https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-update.md#proposed-design
Moreover, WDT SW restart mechanism is supported by HW
since AST2600 platform and is also included in this
patch series.

Changes in v2:
  - Support SW restart on AST2600 by default without
    adding any dts property.

Changes in v3:
  - Get watchdog controller index by dividing register
    base offset by register size.

Changes in v4:
  - Update the commit message for updating bootstatus
    handling patch.
  - Rename aspeed_wdt_config struct to aspeed_wdt_data.
  - Create restart callback function.

Chin-Ting Kuo (3):
  watchdog: aspeed: Update bootstatus handling
  watchdog: aspeed: Change aspeed_wdt_config struct name to
    aspeed_wdt_data
  watchdog: aspeed: Update restart implementations

 drivers/watchdog/aspeed_wdt.c | 170 ++++++++++++++++++++++++++++------
 1 file changed, 144 insertions(+), 26 deletions(-)

Comments

Patrick Williams Nov. 1, 2024, 6:21 p.m. UTC | #1
On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote:
> The boot status mapping rule follows the latest design guide from
> the OpenBMC shown as below.
> https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-update.md#proposed-design
> - WDIOF_EXTERN1   => system is reset by Software
> - WDIOF_CARDRESET => system is reset by WDT SoC reset
> - Others          => other reset events, e.g., power on reset.

I'm quite surprised that the above is relevant for a kernel driver at
all.  Isn't "EXTERN1" a name of a real watchdog signal from your
hardware (my recollection is that there are 2 external watchdogs).  I
think the point of this referenced design document was that most users
of BMCs have "EXTERN1" used a for software reset conditions.
`CARDRESET` should be representing resets by the watchdog itself.

The purpose of this design proposal was not to require very specific
changes to individual watchdog drivers, but to align the userspace use
with the best practices already from other watchdog drivers.  I don't
think the kernel driver should be bending to match a particular
userspace implementation; you should be exposing the information
available to your hardware.

Having said that, it was known that there would need to be changes to
the driver because some of these conditions were not adequately exposed
at all.  I'm just still surprised that we're needing to reference that
document as part of these changes.

> 
> On ASPEED platform, the boot status is recorded in the SCU registers.
> - AST2400: Only a bit represents for any WDT reset.
> - AST2500: The reset triggered by different WDT controllers can be
>            distinguished by different SCU bits. But, WDIOF_EXTERN1 or
>            WDIOF_CARDRESET still cannot be identified due to
>            HW limitation.
> - AST2600: Different from AST2500, additional HW bits are added for
>            distinguishing WDIOF_EXTERN1 and WDIOF_CARDRESET.
Guenter Roeck Nov. 1, 2024, 10:16 p.m. UTC | #2
On 11/1/24 05:11, Chin-Ting Kuo wrote:
> The boot status in the watchdog device struct is updated during
> controller probe stage. Application layer can get the boot status
> through the command, cat /sys/class/watchdog/watchdogX/bootstatus.
> 
> The boot status mapping rule follows the latest design guide from
> the OpenBMC shown as below.
> https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-update.md#proposed-design
> - WDIOF_EXTERN1   => system is reset by Software
> - WDIOF_CARDRESET => system is reset by WDT SoC reset
> - Others          => other reset events, e.g., power on reset.
> 
> On ASPEED platform, the boot status is recorded in the SCU registers.
> - AST2400: Only a bit represents for any WDT reset.
> - AST2500: The reset triggered by different WDT controllers can be
>             distinguished by different SCU bits. But, WDIOF_EXTERN1 or
>             WDIOF_CARDRESET still cannot be identified due to
>             HW limitation.
> - AST2600: Different from AST2500, additional HW bits are added for
>             distinguishing WDIOF_EXTERN1 and WDIOF_CARDRESET.
> 
> Besides, since alternating boot event is triggered by WDT SoC reset,
> it is classified as WDIOF_CARDRESET.
> 
> Signed-off-by: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com>
> ---
>   drivers/watchdog/aspeed_wdt.c | 83 ++++++++++++++++++++++++++++++++++-
>   1 file changed, 81 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/watchdog/aspeed_wdt.c b/drivers/watchdog/aspeed_wdt.c
> index b4773a6aaf8c..4ad6335ff25b 100644
> --- a/drivers/watchdog/aspeed_wdt.c
> +++ b/drivers/watchdog/aspeed_wdt.c
> @@ -11,21 +11,31 @@
>   #include <linux/io.h>
>   #include <linux/kernel.h>
>   #include <linux/kstrtox.h>
> +#include <linux/mfd/syscon.h>
>   #include <linux/module.h>
>   #include <linux/of.h>
>   #include <linux/of_irq.h>
>   #include <linux/platform_device.h>
> +#include <linux/regmap.h>
>   #include <linux/watchdog.h>
>   
>   static bool nowayout = WATCHDOG_NOWAYOUT;
>   module_param(nowayout, bool, 0);
>   MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
>   				__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
> +struct aspeed_wdt_scu {
> +	const char *compatible;
> +	u32 reset_status_reg;
> +	u32 wdt_reset_mask;
> +	u32 wdt_sw_reset_mask;
> +	u32 wdt_reset_mask_shift;
> +};
>   
>   struct aspeed_wdt_config {
>   	u32 ext_pulse_width_mask;
>   	u32 irq_shift;
>   	u32 irq_mask;
> +	struct aspeed_wdt_scu scu;
>   };
>   
>   struct aspeed_wdt {
> @@ -39,18 +49,39 @@ static const struct aspeed_wdt_config ast2400_config = {
>   	.ext_pulse_width_mask = 0xff,
>   	.irq_shift = 0,
>   	.irq_mask = 0,
> +	.scu = {
> +		.compatible = "aspeed,ast2400-scu",
> +		.reset_status_reg = 0x3c,
> +		.wdt_reset_mask = 0x1,
> +		.wdt_sw_reset_mask = 0,
> +		.wdt_reset_mask_shift = 1,
> +	},
>   };
>   
>   static const struct aspeed_wdt_config ast2500_config = {
>   	.ext_pulse_width_mask = 0xfffff,
>   	.irq_shift = 12,
>   	.irq_mask = GENMASK(31, 12),
> +	.scu = {
> +		.compatible = "aspeed,ast2500-scu",
> +		.reset_status_reg = 0x3c,
> +		.wdt_reset_mask = 0x1,
> +		.wdt_sw_reset_mask = 0,
> +		.wdt_reset_mask_shift = 2,
> +	},
>   };
>   
>   static const struct aspeed_wdt_config ast2600_config = {
>   	.ext_pulse_width_mask = 0xfffff,
>   	.irq_shift = 0,
>   	.irq_mask = GENMASK(31, 10),
> +	.scu = {
> +		.compatible = "aspeed,ast2600-scu",
> +		.reset_status_reg = 0x74,
> +		.wdt_reset_mask = 0xf,
> +		.wdt_sw_reset_mask = 0x8,
> +		.wdt_reset_mask_shift = 16,
> +	},
>   };
>   
>   static const struct of_device_id aspeed_wdt_of_table[] = {
> @@ -213,6 +244,52 @@ static int aspeed_wdt_restart(struct watchdog_device *wdd,
>   	return 0;
>   }
>   
> +static int aspeed_wdt_update_bootstatus(struct platform_device *pdev,
> +					struct aspeed_wdt *wdt)
> +{
> +	struct resource *res;
> +	struct aspeed_wdt_scu scu = wdt->cfg->scu;
> +	struct regmap *scu_base;
> +	u32 reset_mask_width;
> +	u32 reset_mask_shift;
> +	u32 reg_size = 0;

Please no unnecesary initializations.

> +	u32 idx = 0;
> +	u32 status;
> +	int ret;
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	reg_size = res->end - res->start;
> +
> +	if (reg_size != 0)
> +		idx = ((intptr_t)wdt->base & 0x00000fff) / reg_size;
> +
> +	/* On ast2400, only a bit is used to represent WDT reset */
> +	if (of_device_is_compatible(pdev->dev.of_node, "aspeed,ast2400-wdt"))
> +		idx = 0;
> +

There is some redundancy in the above code, and platform_get_resource()
can return NULL. If idx==0 for aspeed,ast2400-wdt anyway, the code can be
rewritten as

	if (!of_device_is_compatible(pdev->dev.of_node, "aspeed,ast2400-wdt")) {
		res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
		if (res) {
			reg_size = res->end - res->start;
			if (reg_size)
				idx = ((intptr_t)wdt->base & 0x00000fff) / reg_size;
		}
	}

> +	scu_base = syscon_regmap_lookup_by_compatible(scu.compatible);
> +	if (IS_ERR(scu_base))
> +		return PTR_ERR(scu_base);
> +
> +	ret = regmap_read(scu_base, scu.reset_status_reg, &status);
> +	if (ret)
> +		return ret;

The above only affects bootstatus. Why fail to load the driver just because
bootstatus can not be read ?

> +
> +	reset_mask_width = hweight32(scu.wdt_reset_mask);
> +	reset_mask_shift = scu.wdt_reset_mask_shift +
> +			   reset_mask_width * idx;
> +
> +	if (status & (scu.wdt_sw_reset_mask << reset_mask_shift))
> +		wdt->wdd.bootstatus = WDIOF_EXTERN1;
> +	else if (status & (scu.wdt_reset_mask << reset_mask_shift))
> +		wdt->wdd.bootstatus = WDIOF_CARDRESET;
> +	else
> +		wdt->wdd.bootstatus = 0;

That is already 0.

> +
> +	return regmap_write(scu_base, scu.reset_status_reg,
> +			    scu.wdt_reset_mask << reset_mask_shift);
> +}
> +
>   /* access_cs0 shows if cs0 is accessible, hence the reverted bit */
>   static ssize_t access_cs0_show(struct device *dev,
>   			       struct device_attribute *attr, char *buf)
> @@ -458,10 +535,12 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
>   		writel(duration - 1, wdt->base + WDT_RESET_WIDTH);
>   	}
>   
> +	ret = aspeed_wdt_update_bootstatus(pdev, wdt);
> +	if (ret)
> +		return ret;
> +
>   	status = readl(wdt->base + WDT_TIMEOUT_STATUS);
>   	if (status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY) {
> -		wdt->wdd.bootstatus = WDIOF_CARDRESET;
> -
>   		if (of_device_is_compatible(np, "aspeed,ast2400-wdt") ||
>   		    of_device_is_compatible(np, "aspeed,ast2500-wdt"))
>   			wdt->wdd.groups = bswitch_groups;
Andrew Jeffery Nov. 4, 2024, 12:01 a.m. UTC | #3
On Fri, 2024-11-01 at 14:21 -0400, Patrick Williams wrote:
> On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote:
> > The boot status mapping rule follows the latest design guide from
> > the OpenBMC shown as below.
> > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-update.md#proposed-design
> > - WDIOF_EXTERN1   => system is reset by Software
> > - WDIOF_CARDRESET => system is reset by WDT SoC reset
> > - Others          => other reset events, e.g., power on reset.
> 
> I'm quite surprised that the above is relevant for a kernel driver at
> all.  Isn't "EXTERN1" a name of a real watchdog signal from your
> hardware (my recollection is that there are 2 external watchdogs).

I think you may be referring to WDTRST1 (and WDTRST2) here.

WDIOF_EXTERN1 and WDIOF_EXTERN2 have an unrelated history:

https://github.com/torvalds/linux/blame/master/include/uapi/linux/watchdog.h

>   I
> think the point of this referenced design document was that most
> users
> of BMCs have "EXTERN1" used a for software reset conditions.
> `CARDRESET` should be representing resets by the watchdog itself.

I think this is what Chin-Ting is proposing for the Aspeed driver?

> 
> The purpose of this design proposal was not to require very specific
> changes to individual watchdog drivers, but to align the userspace
> use
> with the best practices already from other watchdog drivers.  I don't
> think the kernel driver should be bending to match a particular
> userspace implementation; you should be exposing the information
> available to your hardware.

I agree with this in principle, but I'm not sure what else is meant to
be done here.

> 
> Having said that, it was known that there would need to be changes to
> the driver because some of these conditions were not adequately
> exposed
> at all.  I'm just still surprised that we're needing to reference
> that
> document as part of these changes.

I think the main question is whether an internal, graceful (userspace-
requested) reset is a reasonable use of WDIOF_EXTERN[12]. My feeling
no. I wonder whether defining a new flag (WDIOF_REBOOT?
WDIOF_GRACEFUL?) in the UAPI would be acceptable?

Andrew
Chin-Ting Kuo Nov. 7, 2024, 5:34 a.m. UTC | #4
Hi Patrick,

Thanks for the check and reply.
	
> -----Original Message-----
> From: Patrick Williams <patrick@stwcx.xyz>
> Sent: Saturday, November 2, 2024 2:21 AM
> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling
> 
> On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote:
> > The boot status mapping rule follows the latest design guide from the
> > OpenBMC shown as below.
> >
> https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-up
> date.md#proposed-design
> > - WDIOF_EXTERN1   => system is reset by Software
> > - WDIOF_CARDRESET => system is reset by WDT SoC reset
> > - Others          => other reset events, e.g., power on reset.
> 
> I'm quite surprised that the above is relevant for a kernel driver at all.  Isn't
> "EXTERN1" a name of a real watchdog signal from your hardware (my
> recollection is that there are 2 external watchdogs).  I think the point of this
> referenced design document was that most users of BMCs have "EXTERN1"
> used a for software reset conditions.
> `CARDRESET` should be representing resets by the watchdog itself.
> 

ASPEED WDT controller is able to generate an external signal, wdt_ext, when the
WDT timeout occurs. However, after system is rebooted, WDT controller doesn't
know whether wdt_ext signal is generated previously. Moreover, whether BMC is
reset due to this wdt_ext signal depends on the HW board design. On some boards,
wdt_ext can affect EXTRST#, BMC external reset signal, indirectly. For this scenario,
EXTERN1 can be classified to wdt_ext. On the other boards, wdt_ext just represents
WDT timeout event for specific tasks. Thus, EXTERN1 boot status can be ignored in
ASPEED WDT driver and just implement "CARDRESET" and "others" since EXTERN1
is not always affected/controlled by WDT controller directly. Or, an additional
property in dts can be added to distinguish whether the EXTRST# reset event is
triggered by wdt_ext signal.

> The purpose of this design proposal was not to require very specific changes to
> individual watchdog drivers, but to align the userspace use with the best
> practices already from other watchdog drivers.  I don't think the kernel driver
> should be bending to match a particular userspace implementation; you should
> be exposing the information available to your hardware.
> 

Agree.

> Having said that, it was known that there would need to be changes to the
> driver because some of these conditions were not adequately exposed at all.
> I'm just still surprised that we're needing to reference that document as part of
> these changes.
> 

Yes, if only "CARDRESET" and "others" types are taken into consideration, some
patches are still needed to complete the boot status mechanism in ASPEED
WDT driver.

> >
> > On ASPEED platform, the boot status is recorded in the SCU registers.
> > - AST2400: Only a bit represents for any WDT reset.
> > - AST2500: The reset triggered by different WDT controllers can be
> >            distinguished by different SCU bits. But, WDIOF_EXTERN1 or
> >            WDIOF_CARDRESET still cannot be identified due to
> >            HW limitation.
> > - AST2600: Different from AST2500, additional HW bits are added for
> >            distinguishing WDIOF_EXTERN1 and WDIOF_CARDRESET.
> 
> --
> Patrick Williams


Chin-Ting
Chin-Ting Kuo Nov. 7, 2024, 5:35 a.m. UTC | #5
Hi Guenter,

Thanks for the review.

> -----Original Message-----
> From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck
> Sent: Saturday, November 2, 2024 6:17 AM
> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling
> 
> On 11/1/24 05:11, Chin-Ting Kuo wrote:
> > The boot status in the watchdog device struct is updated during
> > controller probe stage. Application layer can get the boot status
> > through the command, cat /sys/class/watchdog/watchdogX/bootstatus.
> >
> > The boot status mapping rule follows the latest design guide from the
> > OpenBMC shown as below.
> >
> https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause-up
> date.md#proposed-design
> > - WDIOF_EXTERN1   => system is reset by Software
> > - WDIOF_CARDRESET => system is reset by WDT SoC reset
> > - Others          => other reset events, e.g., power on reset.
> >
> > On ASPEED platform, the boot status is recorded in the SCU registers.
> > - AST2400: Only a bit represents for any WDT reset.
> > - AST2500: The reset triggered by different WDT controllers can be
> >             distinguished by different SCU bits. But, WDIOF_EXTERN1 or
> >             WDIOF_CARDRESET still cannot be identified due to
> >             HW limitation.
> > - AST2600: Different from AST2500, additional HW bits are added for
> >             distinguishing WDIOF_EXTERN1 and WDIOF_CARDRESET.
> >
> > Besides, since alternating boot event is triggered by WDT SoC reset,
> > it is classified as WDIOF_CARDRESET.
> >
> > Signed-off-by: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com>
> > ---
> >   drivers/watchdog/aspeed_wdt.c | 83
> ++++++++++++++++++++++++++++++++++-
> >   1 file changed, 81 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/watchdog/aspeed_wdt.c
> > b/drivers/watchdog/aspeed_wdt.c index b4773a6aaf8c..4ad6335ff25b
> > 100644
> > --- a/drivers/watchdog/aspeed_wdt.c
> > +++ b/drivers/watchdog/aspeed_wdt.c
> > @@ -11,21 +11,31 @@
> >   #include <linux/io.h>
> >   #include <linux/kernel.h>
> >   #include <linux/kstrtox.h>
> > +#include <linux/mfd/syscon.h>
> >   #include <linux/module.h>
> >   #include <linux/of.h>
> >   #include <linux/of_irq.h>
> >   #include <linux/platform_device.h>
> > +#include <linux/regmap.h>
> >   #include <linux/watchdog.h>
> >
> >   static bool nowayout = WATCHDOG_NOWAYOUT;
> >   module_param(nowayout, bool, 0);
> >   MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once
> started (default="
> >   				__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
> > +struct aspeed_wdt_scu {
> > +	const char *compatible;
> > +	u32 reset_status_reg;
> > +	u32 wdt_reset_mask;
> > +	u32 wdt_sw_reset_mask;
> > +	u32 wdt_reset_mask_shift;
> > +};
> >
> >   struct aspeed_wdt_config {
> >   	u32 ext_pulse_width_mask;
> >   	u32 irq_shift;
> >   	u32 irq_mask;
> > +	struct aspeed_wdt_scu scu;
> >   };
> >
> >   struct aspeed_wdt {
> > @@ -39,18 +49,39 @@ static const struct aspeed_wdt_config
> ast2400_config = {
> >   	.ext_pulse_width_mask = 0xff,
> >   	.irq_shift = 0,
> >   	.irq_mask = 0,
> > +	.scu = {
> > +		.compatible = "aspeed,ast2400-scu",
> > +		.reset_status_reg = 0x3c,
> > +		.wdt_reset_mask = 0x1,
> > +		.wdt_sw_reset_mask = 0,
> > +		.wdt_reset_mask_shift = 1,
> > +	},
> >   };
> >
> >   static const struct aspeed_wdt_config ast2500_config = {
> >   	.ext_pulse_width_mask = 0xfffff,
> >   	.irq_shift = 12,
> >   	.irq_mask = GENMASK(31, 12),
> > +	.scu = {
> > +		.compatible = "aspeed,ast2500-scu",
> > +		.reset_status_reg = 0x3c,
> > +		.wdt_reset_mask = 0x1,
> > +		.wdt_sw_reset_mask = 0,
> > +		.wdt_reset_mask_shift = 2,
> > +	},
> >   };
> >
> >   static const struct aspeed_wdt_config ast2600_config = {
> >   	.ext_pulse_width_mask = 0xfffff,
> >   	.irq_shift = 0,
> >   	.irq_mask = GENMASK(31, 10),
> > +	.scu = {
> > +		.compatible = "aspeed,ast2600-scu",
> > +		.reset_status_reg = 0x74,
> > +		.wdt_reset_mask = 0xf,
> > +		.wdt_sw_reset_mask = 0x8,
> > +		.wdt_reset_mask_shift = 16,
> > +	},
> >   };
> >
> >   static const struct of_device_id aspeed_wdt_of_table[] = { @@ -213,6
> > +244,52 @@ static int aspeed_wdt_restart(struct watchdog_device *wdd,
> >   	return 0;
> >   }
> >
> > +static int aspeed_wdt_update_bootstatus(struct platform_device *pdev,
> > +					struct aspeed_wdt *wdt)
> > +{
> > +	struct resource *res;
> > +	struct aspeed_wdt_scu scu = wdt->cfg->scu;
> > +	struct regmap *scu_base;
> > +	u32 reset_mask_width;
> > +	u32 reset_mask_shift;
> > +	u32 reg_size = 0;
> 
> Please no unnecesary initializations.
> 

Okay, it will be updated in the next patch version.

> > +	u32 idx = 0;
> > +	u32 status;
> > +	int ret;
> > +
> > +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +	reg_size = res->end - res->start;
> > +
> > +	if (reg_size != 0)
> > +		idx = ((intptr_t)wdt->base & 0x00000fff) / reg_size;
> > +
> > +	/* On ast2400, only a bit is used to represent WDT reset */
> > +	if (of_device_is_compatible(pdev->dev.of_node, "aspeed,ast2400-wdt"))
> > +		idx = 0;
> > +
> 
> There is some redundancy in the above code, and platform_get_resource() can
> return NULL. If idx==0 for aspeed,ast2400-wdt anyway, the code can be
> rewritten as
> 
> 	if (!of_device_is_compatible(pdev->dev.of_node, "aspeed,ast2400-wdt"))
> {
> 		res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> 		if (res) {
> 			reg_size = res->end - res->start;
> 			if (reg_size)
> 				idx = ((intptr_t)wdt->base & 0x00000fff) / reg_size;
> 		}
> 	}
> 

Okay, it looks more concise. Thanks for the reminder.

> > +	scu_base = syscon_regmap_lookup_by_compatible(scu.compatible);
> > +	if (IS_ERR(scu_base))
> > +		return PTR_ERR(scu_base);
> > +
> > +	ret = regmap_read(scu_base, scu.reset_status_reg, &status);
> > +	if (ret)
> > +		return ret;
> 
> The above only affects bootstatus. Why fail to load the driver just because
> bootstatus can not be read ?
> 

It will be updated in the next patch.

> > +
> > +	reset_mask_width = hweight32(scu.wdt_reset_mask);
> > +	reset_mask_shift = scu.wdt_reset_mask_shift +
> > +			   reset_mask_width * idx;
> > +
> > +	if (status & (scu.wdt_sw_reset_mask << reset_mask_shift))
> > +		wdt->wdd.bootstatus = WDIOF_EXTERN1;
> > +	else if (status & (scu.wdt_reset_mask << reset_mask_shift))
> > +		wdt->wdd.bootstatus = WDIOF_CARDRESET;
> > +	else
> > +		wdt->wdd.bootstatus = 0;
> 
> That is already 0.
> 

Okay.

> > +
> > +	return regmap_write(scu_base, scu.reset_status_reg,
> > +			    scu.wdt_reset_mask << reset_mask_shift); }
> > +
> >   /* access_cs0 shows if cs0 is accessible, hence the reverted bit */
> >   static ssize_t access_cs0_show(struct device *dev,
> >   			       struct device_attribute *attr, char *buf) @@
> -458,10
> > +535,12 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
> >   		writel(duration - 1, wdt->base + WDT_RESET_WIDTH);
> >   	}
> >
> > +	ret = aspeed_wdt_update_bootstatus(pdev, wdt);
> > +	if (ret)
> > +		return ret;
> > +
> >   	status = readl(wdt->base + WDT_TIMEOUT_STATUS);
> >   	if (status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY) {
> > -		wdt->wdd.bootstatus = WDIOF_CARDRESET;
> > -
> >   		if (of_device_is_compatible(np, "aspeed,ast2400-wdt") ||
> >   		    of_device_is_compatible(np, "aspeed,ast2500-wdt"))
> >   			wdt->wdd.groups = bswitch_groups;

Chin-Ting
Chin-Ting Kuo Nov. 7, 2024, 5:35 a.m. UTC | #6
Hi Andrew,

Thanks for the check.

> -----Original Message-----
> From: Andrew Jeffery <andrew@codeconstruct.com.au>
> Sent: Monday, November 4, 2024 8:02 AM
> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling
> 
> On Fri, 2024-11-01 at 14:21 -0400, Patrick Williams wrote:
> > On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote:
> > > The boot status mapping rule follows the latest design guide from
> > > the OpenBMC shown as below.
> > > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause
> > > -update.md#proposed-design
> > > - WDIOF_EXTERN1   => system is reset by Software
> > > - WDIOF_CARDRESET => system is reset by WDT SoC reset
> > > - Others          => other reset events, e.g., power on reset.
> >
> > I'm quite surprised that the above is relevant for a kernel driver at
> > all.  Isn't "EXTERN1" a name of a real watchdog signal from your
> > hardware (my recollection is that there are 2 external watchdogs).
> 
> I think you may be referring to WDTRST1 (and WDTRST2) here.
> 

WDTRST1, wdt_ext, is a pulse signal generated when WDT timeout
occurs. However, depending on the HW board design, wdt_ext doesn’t
always affect the system reset. Thus, EXTERN1 boot status can be
ignored in ASPEED WDT driver and just implement "CARDRESET" and
"others" types since EXTERN1 is not always affected/controlled by WDT
controller directly. Or, an additional property in dts can be added to
distinguish whether the current EXTRST# reset event is triggered by
wdt_ext signal.

> WDIOF_EXTERN1 and WDIOF_EXTERN2 have an unrelated history:
> 
> https://github.com/torvalds/linux/blame/master/include/uapi/linux/watchdog.
> h
> 
> >   I
> > think the point of this referenced design document was that most users
> > of BMCs have "EXTERN1" used a for software reset conditions.
> > `CARDRESET` should be representing resets by the watchdog itself.
> 
> I think this is what Chin-Ting is proposing for the Aspeed driver?
> 

Yes.

> >
> > The purpose of this design proposal was not to require very specific
> > changes to individual watchdog drivers, but to align the userspace use
> > with the best practices already from other watchdog drivers.  I don't
> > think the kernel driver should be bending to match a particular
> > userspace implementation; you should be exposing the information
> > available to your hardware.
> 
> I agree with this in principle, but I'm not sure what else is meant to be done
> here.
> 
> >
> > Having said that, it was known that there would need to be changes to
> > the driver because some of these conditions were not adequately
> > exposed at all.  I'm just still surprised that we're needing to
> > reference that document as part of these changes.
> 
> I think the main question is whether an internal, graceful (userspace-
> requested) reset is a reasonable use of WDIOF_EXTERN[12]. My feeling no. I
> wonder whether defining a new flag (WDIOF_REBOOT?
> WDIOF_GRACEFUL?) in the UAPI would be acceptable?
> 

Agree, but this is out of the scope of this patch series and can be discussed and
implemented in the other future patches. The purpose of this patch series is to
complete the boot status mechanism based on the current reset reason.

> Andrew

Chin-Ting
Andrew Jeffery Nov. 7, 2024, 11:49 p.m. UTC | #7
Hi Chin-Ting,

On Thu, 2024-11-07 at 05:35 +0000, Chin-Ting Kuo wrote:
> Hi Andrew,
> 
> Thanks for the check.
> 
> > -----Original Message-----
> > From: Andrew Jeffery <andrew@codeconstruct.com.au>
> > Sent: Monday, November 4, 2024 8:02 AM
> > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus
> > handling
> > 
> > On Fri, 2024-11-01 at 14:21 -0400, Patrick Williams wrote:
> > > On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote:
> > > > The boot status mapping rule follows the latest design guide
> > > > from
> > > > the OpenBMC shown as below.
> > > > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-cause
> > > > -update.md#proposed-design
> > > > - WDIOF_EXTERN1   => system is reset by Software
> > > > - WDIOF_CARDRESET => system is reset by WDT SoC reset
> > > > - Others          => other reset events, e.g., power on reset.
> > > 
> > > I'm quite surprised that the above is relevant for a kernel
> > > driver at
> > > all.  Isn't "EXTERN1" a name of a real watchdog signal from your
> > > hardware (my recollection is that there are 2 external
> > > watchdogs).
> > 
> > I think you may be referring to WDTRST1 (and WDTRST2) here.
> > 
> 
> WDTRST1, wdt_ext, is a pulse signal generated when WDT timeout
> occurs. However, depending on the HW board design, wdt_ext doesn’t
> always affect the system reset. Thus, EXTERN1 boot status can be
> ignored in ASPEED WDT driver and just implement "CARDRESET" and
> "others" types since EXTERN1 is not always affected/controlled by WDT
> controller directly. Or, an additional property in dts can be added
> to
> distinguish whether the current EXTRST# reset event is triggered by
> wdt_ext signal.

Yep, I understand how it works. I was responding to Patrick's query to
clear up some confusion around the watchdog signal names.

> > 
> > > 
> > > Having said that, it was known that there would need to be
> > > changes to
> > > the driver because some of these conditions were not adequately
> > > exposed at all.  I'm just still surprised that we're needing to
> > > reference that document as part of these changes.
> > 
> > I think the main question is whether an internal, graceful
> > (userspace-
> > requested) reset is a reasonable use of WDIOF_EXTERN[12]. My
> > feeling no. I
> > wonder whether defining a new flag (WDIOF_REBOOT?
> > WDIOF_GRACEFUL?) in the UAPI would be acceptable?
> > 
> 
> Agree, but this is out of the scope of this patch series and can be
> discussed and
> implemented in the other future patches.

I disagree, because then you're changing the userspace-visible
behaviour of the driver yet again. I don't prefer the proposed patch as
the way forward because I think it is abusing the meaning of
WDIOF_EXTERN1. I think the concept needs input from the watchdog
maintainers.

Andrew
Chin-Ting Kuo Nov. 8, 2024, 5:42 a.m. UTC | #8
Hi Andrew,

> -----Original Message-----
> From: Andrew Jeffery <andrew@codeconstruct.com.au>
> Sent: Friday, November 8, 2024 7:50 AM
> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling
> 
> Hi Chin-Ting,
> 
> On Thu, 2024-11-07 at 05:35 +0000, Chin-Ting Kuo wrote:
> > Hi Andrew,
> >
> > Thanks for the check.
> >
> > > -----Original Message-----
> > > From: Andrew Jeffery <andrew@codeconstruct.com.au>
> > > Sent: Monday, November 4, 2024 8:02 AM
> > > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus
> > > handling
> > >
> > > On Fri, 2024-11-01 at 14:21 -0400, Patrick Williams wrote:
> > > > On Fri, Nov 01, 2024 at 08:11:59PM +0800, Chin-Ting Kuo wrote:
> > > > > The boot status mapping rule follows the latest design guide
> > > > > from the OpenBMC shown as below.
> > > > > https://github.com/openbmc/docs/blob/master/designs/bmc-reboot-c
> > > > > ause
> > > > > -update.md#proposed-design
> > > > > - WDIOF_EXTERN1   => system is reset by Software
> > > > > - WDIOF_CARDRESET => system is reset by WDT SoC reset
> > > > > - Others          => other reset events, e.g., power on reset.
> > > >
> > > > I'm quite surprised that the above is relevant for a kernel driver
> > > > at all.  Isn't "EXTERN1" a name of a real watchdog signal from
> > > > your hardware (my recollection is that there are 2 external
> > > > watchdogs).
> > >
> > > I think you may be referring to WDTRST1 (and WDTRST2) here.
> > >
> >
> > WDTRST1, wdt_ext, is a pulse signal generated when WDT timeout occurs.
> > However, depending on the HW board design, wdt_ext doesn’t always
> > affect the system reset. Thus, EXTERN1 boot status can be ignored in
> > ASPEED WDT driver and just implement "CARDRESET" and "others" types
> > since EXTERN1 is not always affected/controlled by WDT controller
> > directly. Or, an additional property in dts can be added to
> > distinguish whether the current EXTRST# reset event is triggered by
> > wdt_ext signal.
> 
> Yep, I understand how it works. I was responding to Patrick's query to clear up
> some confusion around the watchdog signal names.
> 

Okay.

> > >
> > > >
> > > > Having said that, it was known that there would need to be changes
> > > > to the driver because some of these conditions were not adequately
> > > > exposed at all.  I'm just still surprised that we're needing to
> > > > reference that document as part of these changes.
> > >
> > > I think the main question is whether an internal, graceful
> > > (userspace-
> > > requested) reset is a reasonable use of WDIOF_EXTERN[12]. My feeling
> > > no. I wonder whether defining a new flag (WDIOF_REBOOT?
> > > WDIOF_GRACEFUL?) in the UAPI would be acceptable?
> > >
> >
> > Agree, but this is out of the scope of this patch series and can be
> > discussed and
> > implemented in the other future patches.
> 
> I disagree, because then you're changing the userspace-visible
> behaviour of the driver yet again. I don't prefer the proposed patch as
> the way forward because I think it is abusing the meaning of
> WDIOF_EXTERN1. I think the concept needs input from the watchdog
> maintainers.
> 

Previously, I meant that only implement "CARDRESET " and "others"
reset type to complete the current driver for "CARDRESET ".
The implementation of SW restart mechanism can be postponed to the
next new patch series.

But now, I think it will be better to add a patch for creating a new
reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in watchdog.h
of uapi. Can I include this change, creating a new reset reason, in
this patch series? Or, should I create an extra new patch series for
this purpose?

> Andrew

Chin-Ting
Guenter Roeck Nov. 8, 2024, 2:07 p.m. UTC | #9
On 11/7/24 21:42, Chin-Ting Kuo wrote:

> But now, I think it will be better to add a patch for creating a new
> reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in watchdog.h
> of uapi. Can I include this change, creating a new reset reason, in
> this patch series? Or, should I create an extra new patch series for
> this purpose?
> 

This is a UAPI change. That is a major change. It needs to be discussed
separately, on its own, and can not be sneaked in like this.

Guenter
Chin-Ting Kuo Nov. 18, 2024, 12:46 p.m. UTC | #10
Hi Guenter,

Thanks for the reply.

> -----Original Message-----
> From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck
> Sent: Friday, November 8, 2024 10:08 PM
> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling
> 
> On 11/7/24 21:42, Chin-Ting Kuo wrote:
> 
> > But now, I think it will be better to add a patch for creating a new
> > reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in watchdog.h of
> > uapi. Can I include this change, creating a new reset reason, in this
> > patch series? Or, should I create an extra new patch series for this
> > purpose?
> >
> 
> This is a UAPI change. That is a major change. It needs to be discussed
> separately, on its own, and can not be sneaked in like this.
> 

Agree. However, how to trigger this discussion? Can I just send a new
patch separately with only this UAPI modification? This is the first time
I change such common source code.

> Guenter

Chin-Ting
Guenter Roeck Nov. 18, 2024, 8:50 p.m. UTC | #11
On 11/18/24 04:46, Chin-Ting Kuo wrote:
> Hi Guenter,
> 
> Thanks for the reply.
> 
>> -----Original Message-----
>> From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter Roeck
>> Sent: Friday, November 8, 2024 10:08 PM
>> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling
>>
>> On 11/7/24 21:42, Chin-Ting Kuo wrote:
>>
>>> But now, I think it will be better to add a patch for creating a new
>>> reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in watchdog.h of
>>> uapi. Can I include this change, creating a new reset reason, in this
>>> patch series? Or, should I create an extra new patch series for this
>>> purpose?
>>>
>>
>> This is a UAPI change. That is a major change. It needs to be discussed
>> separately, on its own, and can not be sneaked in like this.
>>
> 
> Agree. However, how to trigger this discussion? Can I just send a new
> patch separately with only this UAPI modification? This is the first time
> I change such common source code.
> 

Yes. That needs to include arguments explaining why this specific new flag
even adds value. I for my part don't immediately see that value.

Guenter
Andrew Jeffery Nov. 18, 2024, 11 p.m. UTC | #12
On Mon, 2024-11-18 at 12:50 -0800, Guenter Roeck wrote:
> On 11/18/24 04:46, Chin-Ting Kuo wrote:
> > Hi Guenter,
> > 
> > Thanks for the reply.
> > 
> > > -----Original Message-----
> > > From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter
> > > Roeck
> > > Sent: Friday, November 8, 2024 10:08 PM
> > > Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus
> > > handling
> > > 
> > > On 11/7/24 21:42, Chin-Ting Kuo wrote:
> > > 
> > > > But now, I think it will be better to add a patch for creating
> > > > a new
> > > > reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in
> > > > watchdog.h of
> > > > uapi. Can I include this change, creating a new reset reason,
> > > > in this
> > > > patch series? Or, should I create an extra new patch series for
> > > > this
> > > > purpose?
> > > > 
> > > 
> > > This is a UAPI change. That is a major change. It needs to be
> > > discussed
> > > separately, on its own, and can not be sneaked in like this.
> > > 
> > 
> > Agree. However, how to trigger this discussion? Can I just send a
> > new
> > patch separately with only this UAPI modification? This is the
> > first time
> > I change such common source code.
> > 
> 
> Yes. That needs to include arguments explaining why this specific new
> flag
> even adds value. I for my part don't immediately see that value.

So maybe I was derailed with my WDIOF_REBOOT suggestion by the proposal
to repurpose WDIOF_EXTERN1 to indicate a regular reboot. I still don't
think repurposing WDIOF_EXTERN1 is the right direction. But, perhaps
the thing to do for a regular reboot is to not set any reason flags at
all? It just depends on whether we're wanting to separate a cold boot
from a reboot (as they _may_ behave differently on Aspeed hardware), as
on a cold boot we wouldn't set any reason flags either.

Andrew
Guenter Roeck Nov. 19, 2024, 1:27 a.m. UTC | #13
On 11/18/24 15:00, Andrew Jeffery wrote:
> On Mon, 2024-11-18 at 12:50 -0800, Guenter Roeck wrote:
>> On 11/18/24 04:46, Chin-Ting Kuo wrote:
>>> Hi Guenter,
>>>
>>> Thanks for the reply.
>>>
>>>> -----Original Message-----
>>>> From: Guenter Roeck <groeck7@gmail.com> On Behalf Of Guenter
>>>> Roeck
>>>> Sent: Friday, November 8, 2024 10:08 PM
>>>> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus
>>>> handling
>>>>
>>>> On 11/7/24 21:42, Chin-Ting Kuo wrote:
>>>>
>>>>> But now, I think it will be better to add a patch for creating
>>>>> a new
>>>>> reset reason, e.g., WDIOF_REBOOT or WDIOF_RESTART, in
>>>>> watchdog.h of
>>>>> uapi. Can I include this change, creating a new reset reason,
>>>>> in this
>>>>> patch series? Or, should I create an extra new patch series for
>>>>> this
>>>>> purpose?
>>>>>
>>>>
>>>> This is a UAPI change. That is a major change. It needs to be
>>>> discussed
>>>> separately, on its own, and can not be sneaked in like this.
>>>>
>>>
>>> Agree. However, how to trigger this discussion? Can I just send a
>>> new
>>> patch separately with only this UAPI modification? This is the
>>> first time
>>> I change such common source code.
>>>
>>
>> Yes. That needs to include arguments explaining why this specific new
>> flag
>> even adds value. I for my part don't immediately see that value.
> 
> So maybe I was derailed with my WDIOF_REBOOT suggestion by the proposal
> to repurpose WDIOF_EXTERN1 to indicate a regular reboot. I still don't
> think repurposing WDIOF_EXTERN1 is the right direction. But, perhaps
> the thing to do for a regular reboot is to not set any reason flags at
> all? It just depends on whether we're wanting to separate a cold boot
> from a reboot (as they _may_ behave differently on Aspeed hardware), as
> on a cold boot we wouldn't set any reason flags either.
> 


Thew point here is that this is just a warm reboot which happens to use
the watchdog as vehicle trigger a reset. A UAPI change along that line
would tell the user just that, and I don't see the value in it.

FWIW, the only really valuable flag is WDIOF_CARDRESET. All others are
at best misleading since the watchdog isn't typically involved in
making policy decisions such as WDIOF_OVERHEAT or WDIOF_FANFAULT and thus
can not report such reasons to userspace.

Unfortunately we can at best deprecate all those existing flags. At the same
time I really don't want to introduce new ones unless they provide real value.
While it might be valuable to know if a reboot was a cold or a warm reboot,
the watchdog subsystem is not involved in this either on almost all
platforms, meaning userspace can not really rely on it. Yes, Aspeed
hardware may behave differently, but typically all those differences
would need to be handled in the kernel when (re-)initializing the hardware,
not in userspace.

So, again, what exactly would userspace do with the information that this
was a watchdog triggered warm reboot ? Why would it need that information ?

Thanks,
Guenter
Andrew Jeffery Nov. 20, 2024, 4:54 a.m. UTC | #14
On Mon, 2024-11-18 at 17:27 -0800, Guenter Roeck wrote:
> So, again, what exactly would userspace do with the information that
> this
> was a watchdog triggered warm reboot ? Why would it need that
> information ?

I'll defer to the others on To/Cc to answer that.

My only position is I don't think changing behaviour of existing
drivers to exploit WDIOF_EXTERN1 as a graceful-reboot indicator is a
good idea either. Obviously I don't have much skin in the game with
watchdog maintenance, so my thoughts shouldn't have much influence
beyond the Aspeed-specifics, but I just didn't want to see some fun new
confusion or incompatibility arise as a result.

Andrew
Chin-Ting Kuo Dec. 17, 2024, 2:15 a.m. UTC | #15
Hi Andrew,

> -----Original Message-----
> From: Andrew Jeffery <andrew@codeconstruct.com.au>
> Sent: Wednesday, November 20, 2024 12:54 PM
> Subject: Re: [PATCH v4 1/3] watchdog: aspeed: Update bootstatus handling
> 
> On Mon, 2024-11-18 at 17:27 -0800, Guenter Roeck wrote:
> > So, again, what exactly would userspace do with the information that
> > this was a watchdog triggered warm reboot ? Why would it need that
> > information ?
> 
> I'll defer to the others on To/Cc to answer that.
> 
> My only position is I don't think changing behaviour of existing drivers to
> exploit WDIOF_EXTERN1 as a graceful-reboot indicator is a good idea either.
> Obviously I don't have much skin in the game with watchdog maintenance, so
> my thoughts shouldn't have much influence beyond the Aspeed-specifics, but I
> just didn't want to see some fun new confusion or incompatibility arise as a
> result.
> 
Agree to your misgiving, in the next patches, only two categories, "Power on reset"
and "WDT reset" (Card reset), will be taken into consideration. The graceful-reboot
scenario will be postponed to the patches in the future.


Thanks,
Chin-Ting