[0/2] Improve ATA NCQ command error in mpt3sas and mpi3mr

Message ID 20250606052747.742998-1-dlemoal@kernel.org

Damien Le Moal June 6, 2025, 5:27 a.m. UTC
Martin,

Two similar patches for the mpt3sas and mpi3mr drivers to improve the
handling of NCQ commands terminated due to the failure of another NCQ
command. These so-called collateral aborts must be retried immediately,
but without incrementing the command retry counter. Otherwise, a
collaterally aborted command may end up being failed because of errors
in other NCQ commands.
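
To illustrate the retry accounting described above, here is a minimal
userspace sketch. It is not code from either driver: every name in it
(struct cmd, complete_cmd(), run_after_aborts(), the enum values) is
hypothetical, and it only models the counter logic, assuming a command
is failed once its charged retries exceed its allowed retries.

```c
#include <stdbool.h>

/* Hypothetical model: none of these names exist in mpt3sas or mpi3mr. */
enum completion {
	CMD_OK,               /* command completed successfully */
	CMD_DEVICE_ERROR,     /* generic retryable error, always charged */
	CMD_COLLATERAL_ABORT, /* aborted only because another NCQ command failed */
};

enum disposition { DISP_DONE, DISP_RETRY, DISP_FAIL };

struct cmd {
	int retries; /* retries charged to this command so far */
	int allowed; /* maximum number of charged retries */
};

/*
 * Decide what to do with a completed command. When charge_collateral is
 * false, a collateral abort is retried without touching the counter.
 */
static enum disposition complete_cmd(struct cmd *c, enum completion st,
				     bool charge_collateral)
{
	if (st == CMD_OK)
		return DISP_DONE;
	if (st == CMD_COLLATERAL_ABORT && !charge_collateral)
		return DISP_RETRY; /* immediate retry, counter untouched */
	if (++c->retries > c->allowed)
		return DISP_FAIL;  /* retry budget exhausted */
	return DISP_RETRY;
}

/* Replay 'aborts' collateral aborts followed by one successful completion. */
static enum disposition run_after_aborts(int aborts, int allowed, bool charge)
{
	struct cmd c = { .retries = 0, .allowed = allowed };

	for (int i = 0; i < aborts; i++)
		if (complete_cmd(&c, CMD_COLLATERAL_ABORT, charge) == DISP_FAIL)
			return DISP_FAIL;
	return complete_cmd(&c, CMD_OK, charge);
}
```

In this model, with 5 allowed retries and 10 collateral aborts, the
command is failed when collateral aborts are charged to the counter and
completes normally when they are not.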

This issue is especially easy to trigger with the mpi3mr driver with a
drive subject to a mixed workload of commands with a short CDL limit and
commands without limits. The failures due to the limit being exceeded,
which are normal, end up also failing commands without a limit, which is
incorrect.

Broadcom people,

I am working in the dark here, with zero information on how your HBAs
handle ATA NCQ collateral aborts. I am patching against what I am
seeing, which may be only a partial picture of the problem. So please
check this!

Damien Le Moal (2):
  scsi: mpi3mr: Correctly handle ATA device errors
  scsi: mpt3sas: Correctly handle ATA device errors

 drivers/scsi/mpi3mr/mpi3mr_os.c      | 20 +++++++++++++++++++-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 19 +++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)