Message ID | 1641320780-81620-1-git-send-email-john.garry@huawei.com |
---|---|
State | New |
Headers | show |
Series | [RFT] scsi: pm8001: Fix FW crash for maxcpus=1 | expand |
On 05/01/2022 04:03, Damien Le Moal wrote: > On 1/5/22 03:26, John Garry wrote: >> According to the comment in check_fw_ready() we should not check the >> IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009 controllers. >> >> However we check this very field in process_oq() for processing the highest >> index interrupt vector. Change that function to not check IOP1_READY for >> those mentioned controllers, but do check ILA_READY in both cases. >> >> The reason I assume that this was not hit earlier was because we always >> allocated 64 MSI(X), and just did not pass the vector index check in >> process_oq(), i.e. the handler never ran for vector index 63. >> >> Signed-off-by: John Garry<john.garry@huawei.com> >> >> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c >> index 2101fc5761c3..77b8bb30615b 100644 >> --- a/drivers/scsi/pm8001/pm80xx_hwi.c >> +++ b/drivers/scsi/pm8001/pm80xx_hwi.c >> @@ -4162,9 +4162,16 @@ static int process_oq(struct pm8001_hba_info *pm8001_ha, u8 vec) >> u32 regval; >> >> if (vec == (pm8001_ha->max_q_num - 1)) { >> + u32 mipsall_ready; >> + >> + if ((pm8001_ha->chip_id == chip_8008) || >> + (pm8001_ha->chip_id == chip_8009)) > nit: no need for the inner brackets here. ok, I can fix that. But I would also like opinion from microchip guys/maintainer on why this code is here at all. Seems strange in the way we check in this register in the interrupt handler for only a specific vector and, also, why we check at all in an interrupt handler. > >> + mipsall_ready = SCRATCH_PAD_MIPSALL_READY_8PORT; >> + else >> + mipsall_ready = SCRATCH_PAD_MIPSALL_READY_16PORT; >> + >> regval = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_1); >> - if ((regval & SCRATCH_PAD_MIPSALL_READY) != >> - SCRATCH_PAD_MIPSALL_READY) { >> + if ((regval & mipsall_ready) != mipsall_ready) { >> pm8001_ha->controller_fatal_error = true; >> pm8001_dbg(pm8001_ha, FAIL, >> "Firmware Fatal error! Regval:0x%x\n", >> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.h b/drivers/scsi/pm8001/pm80xx_hwi.h >> index c7e5d93bea92..c41ed039c92a 100644 >> --- a/drivers/scsi/pm8001/pm80xx_hwi.h >> +++ b/drivers/scsi/pm8001/pm80xx_hwi.h >> @@ -1405,8 +1405,12 @@ typedef struct SASProtocolTimerConfig SASProtocolTimerConfig_t; >> #define SCRATCH_PAD_BOOT_LOAD_SUCCESS 0x0 >> #define SCRATCH_PAD_IOP0_READY 0xC00 >> #define SCRATCH_PAD_IOP1_READY 0x3000 >> -#define SCRATCH_PAD_MIPSALL_READY (SCRATCH_PAD_IOP1_READY | \ >> +#define SCRATCH_PAD_MIPSALL_READY_16PORT (SCRATCH_PAD_IOP1_READY | \ >> SCRATCH_PAD_IOP0_READY | \ >> + SCRATCH_PAD_ILA_READY | \ >> + SCRATCH_PAD_RAAE_READY) >> +#define SCRATCH_PAD_MIPSALL_READY_8PORT (SCRATCH_PAD_IOP0_READY | \ >> + SCRATCH_PAD_ILA_READY | \ >> SCRATCH_PAD_RAAE_READY) >> >> /* boot loader state */ > Otherwise, looks OK to me. > I tested with and without max_cpus=1 with a ATTO Technology, Inc. > ExpressSAS 12Gb/s SAS/SATA HBA (rev 06) adapter and everything is OK. > That adapter uses chip_8072 though, not 8008 or 8009. > > Feel free to add: > > Reviewed-by: Damien Le Moal<damien.lemoal@opensource.wdc.com> > Tested-by: Damien Le Moal<damien.lemoal@opensource.wdc.com> Thanks! john
On 2022/01/07 0:32, John Garry wrote: > On 06/01/2022 13:03, Ajish.Koshy@microchip.com wrote: >>> only a specific vector and, also, why we check at all in >>> an interrupt handler. >> Here is my initial understanding so far based on the code >> and data sheet >> >> 1. Controller has the capability to communicate >> to the host about fatal error condition via configured >> interrupt vector MSI/MSI-X. >> 2. This capability is achieved by setting two fields >> a. Enable Controller Fatal error notification >> Dowrd 0x1C, Bit[0]. >> 1 - Enable; 0 - Disable >> Code: pm8001_ha->main_cfg_tbl.pm80xx_tbl. >> fatal_err_interrupt = 0x01; >> b. Fatal Error Interrupt Vector Dword 0x1C, bit[15:8] >> This parameter configures which interrupt vector >> is used to notify the host of the fatal error. >> Code: /* Update Fatal error interrupt vector */ >> pm8001_ha->main_cfg_tbl.pm80xx_tbl. >> fatal_err_interrupt |= >> ((pm8001_ha->max_q_num - 1) << 8); >> >> Probably this will be the reason why we check >> the vector in process_oq() for processing >> controller fatal error >> >> if (vec == (pm8001_ha->max_q_num - 1)) { >> >> Please do let me know if it helped in clarification. >> > > Sounds reasonable. And we only discover the issue for 8008/8009 now as > we have that (pm8001_ha->max_q_num - 1) vector being used for standard IO. > > So let me know of any other issue, otherwise I'll send a v2 with the > coding style fixup. And maybe add comments about the above so that the information does not get lost ? > > Thanks, > John
Hi John, > On 06/01/2022 13:03, Ajish.Koshy@microchip.com wrote: > >> only a specific vector and, also, why we check at all in an > >> interrupt handler. > > Here is my initial understanding so far based on the code and data > > sheet > > > > 1. Controller has the capability to communicate to the host about > > fatal error condition via configured interrupt vector MSI/MSI-X. > > 2. This capability is achieved by setting two fields > > a. Enable Controller Fatal error notification > > Dowrd 0x1C, Bit[0]. > > 1 - Enable; 0 - Disable > > Code: pm8001_ha->main_cfg_tbl.pm80xx_tbl. > > fatal_err_interrupt = 0x01; > > b. Fatal Error Interrupt Vector Dword 0x1C, bit[15:8] > > This parameter configures which interrupt vector > > is used to notify the host of the fatal error. > > Code: /* Update Fatal error interrupt vector */ > > pm8001_ha->main_cfg_tbl.pm80xx_tbl. > > fatal_err_interrupt |= > > ((pm8001_ha->max_q_num - 1) << 8); > > > > Probably this will be the reason why we check the vector in > > process_oq() for processing controller fatal error > > > > if (vec == (pm8001_ha->max_q_num - 1)) { > > > > Please do let me know if it helped in clarification. > > > > Sounds reasonable. And we only discover the issue for 8008/8009 now as we > have that (pm8001_ha->max_q_num - 1) vector being used for standard IO. > > So let me know of any other issue, otherwise I'll send a v2 with the coding > style fixup. After going through check_fw_ready(), the change here looks fine. > > Thanks, > John
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c index 2101fc5761c3..77b8bb30615b 100644 --- a/drivers/scsi/pm8001/pm80xx_hwi.c +++ b/drivers/scsi/pm8001/pm80xx_hwi.c @@ -4162,9 +4162,16 @@ static int process_oq(struct pm8001_hba_info *pm8001_ha, u8 vec) u32 regval; if (vec == (pm8001_ha->max_q_num - 1)) { + u32 mipsall_ready; + + if ((pm8001_ha->chip_id == chip_8008) || + (pm8001_ha->chip_id == chip_8009)) + mipsall_ready = SCRATCH_PAD_MIPSALL_READY_8PORT; + else + mipsall_ready = SCRATCH_PAD_MIPSALL_READY_16PORT; + regval = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_1); - if ((regval & SCRATCH_PAD_MIPSALL_READY) != - SCRATCH_PAD_MIPSALL_READY) { + if ((regval & mipsall_ready) != mipsall_ready) { pm8001_ha->controller_fatal_error = true; pm8001_dbg(pm8001_ha, FAIL, "Firmware Fatal error! Regval:0x%x\n", diff --git a/drivers/scsi/pm8001/pm80xx_hwi.h b/drivers/scsi/pm8001/pm80xx_hwi.h index c7e5d93bea92..c41ed039c92a 100644 --- a/drivers/scsi/pm8001/pm80xx_hwi.h +++ b/drivers/scsi/pm8001/pm80xx_hwi.h @@ -1405,8 +1405,12 @@ typedef struct SASProtocolTimerConfig SASProtocolTimerConfig_t; #define SCRATCH_PAD_BOOT_LOAD_SUCCESS 0x0 #define SCRATCH_PAD_IOP0_READY 0xC00 #define SCRATCH_PAD_IOP1_READY 0x3000 -#define SCRATCH_PAD_MIPSALL_READY (SCRATCH_PAD_IOP1_READY | \ +#define SCRATCH_PAD_MIPSALL_READY_16PORT (SCRATCH_PAD_IOP1_READY | \ SCRATCH_PAD_IOP0_READY | \ + SCRATCH_PAD_ILA_READY | \ + SCRATCH_PAD_RAAE_READY) +#define SCRATCH_PAD_MIPSALL_READY_8PORT (SCRATCH_PAD_IOP0_READY | \ + SCRATCH_PAD_ILA_READY | \ SCRATCH_PAD_RAAE_READY) /* boot loader state */
According to the comment in check_fw_ready() we should not check the IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009 controllers. However we check this very field in process_oq() for processing the highest index interrupt vector. Change that function to not check IOP1_READY for those mentioned controllers, but do check ILA_READY in both cases. The reason I assume that this was not hit earlier was because we always allocated 64 MSI(X), and just did not pass the vector index check in process_oq(), i.e. the handler never ran for vector index 63. Signed-off-by: John Garry <john.garry@huawei.com>