Message ID | 20200908213715.3553098-2-arnd@arndb.de |
---|---|
State | Superseded |
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.7, v5.4.63, v4.19.143, v4.14.196, v4.9.235, v4.4.235.

v5.8.7: Build OK!
v5.4.63: Build OK!
v4.19.143: Build OK!
v4.14.196: Failed to apply! Possible dependencies:
    107a60dd71b5 ("scsi: megaraid_sas: Add support for 64bit consistent DMA")
    1b4bed206159 ("scsi: megaraid_sas: Create separate functions for allocating and freeing controller DMA buffers")
    201a810cc188 ("scsi: megaraid_sas: Re-Define enum DCMD_RETURN_STATUS")
    2ce435087902 ("scsi: megaraid_sas: Enhance internal DCMD timeout prints")
    7535f27d1f14 ("scsi: megaraid_sas: Move initialization of instance parameters inside newly created function megasas_init_ctrl_params")
    82add4e1b354 ("scsi: megaraid_sas: Incorrect processing of IOCTL frames for SMP/STP commands")
    e5d65b4b81af ("scsi: megaraid_sas: Move controller memory allocations and DMA mask settings from probe to megasas_init_fw")
    e97e673ca63b ("scsi: megaraid_sas: Retry with reduced queue depth when alloc fails for higher QD")

v4.9.235: Failed to apply! Possible dependencies:
    201a810cc188 ("scsi: megaraid_sas: Re-Define enum DCMD_RETURN_STATUS")
    2493c67e518c ("scsi: megaraid_sas: 128 MSIX Support")
    3e5eadb1a881 ("scsi: megaraid_sas: Enable or Disable Fast path based on the PCI Threshold Bandwidth")
    45b8a35eed7b ("scsi: megaraid_sas: 32 bit descriptor fire cmd optimization")
    45f4f2eb3da3 ("scsi: megaraid_sas: Add new pci device Ids for SAS3.5 Generic Megaraid Controllers")
    82add4e1b354 ("scsi: megaraid_sas: Incorrect processing of IOCTL frames for SMP/STP commands")
    8823abeddbbc ("scsi: megaraid_sas: Fix endianness issues in DCMD handling")
    95c060869e68 ("scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD")
    d0fc91d67c59 ("scsi: megaraid_sas: Send SYNCHRONIZE_CACHE for VD to firmware")
    f4fc209326c7 ("scsi: megaraid_sas: change issue_dcmd to return void from int")
    fad119b707f8 ("scsi: megaraid_sas: switch to pci_alloc_irq_vectors")

v4.4.235: Failed to apply! Possible dependencies:
    201a810cc188 ("scsi: megaraid_sas: Re-Define enum DCMD_RETURN_STATUS")
    6d40afbc7d13 ("megaraid_sas: MFI IO timeout handling")
    82add4e1b354 ("scsi: megaraid_sas: Incorrect processing of IOCTL frames for SMP/STP commands")
    8823abeddbbc ("scsi: megaraid_sas: Fix endianness issues in DCMD handling")
    8a01a41d8647 ("megaraid_sas: Make adprecovery variable atomic")
    95c060869e68 ("scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD")
    f4fc209326c7 ("scsi: megaraid_sas: change issue_dcmd to return void from int")

NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks
Sasha

On Tue, Sep 08, 2020 at 11:36:22PM +0200, Arnd Bergmann wrote:
> It sounds unwise to let user space pass an unchecked 32-bit
> offset into a kernel structure in an ioctl. This is an unsigned
> variable, so checking the upper bound for the size of the structure
> it points into is sufficient to avoid data corruption, but as
> the pointer might also be unaligned, it has to be written carefully
> as well.
>
> While I stumbled over this problem by reading the code, I did not
> continue checking the function for further problems like it.

Oh, yikes!

>
> Cc: stable@vger.kernel.org

What about a Fixes tag instead?

> 	if (ioc->sense_len) {
> +		/* make sure the pointer is part of the frame */
> +		if (ioc->sense_off > (sizeof(union megasas_frame) - sizeof(__le64))) {

No need for the inner braces and please avoid over 80 char lines.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

On Sat, Sep 12, 2020 at 9:20 AM Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Sep 08, 2020 at 11:36:22PM +0200, Arnd Bergmann wrote:
> > Cc: stable@vger.kernel.org
>
> What about a Fixes tag instead?

Sure, I can add that. It's been broken since 2.6.15 though, when the driver was initially merged.

Arnd

On Tue, Sep 08, 2020 at 11:36:22PM +0200, Arnd Bergmann wrote:
> It sounds unwise to let user space pass an unchecked 32-bit
> offset into a kernel structure in an ioctl. This is an unsigned
> variable, so checking the upper bound for the size of the structure
> it points into is sufficient to avoid data corruption, but as
> the pointer might also be unaligned, it has to be written carefully
> as well.
>
> While I stumbled over this problem by reading the code, I did not
> continue checking the function for further problems like it.

Sorry for replying to an ancient thread, but this patch just recently made it into 5.10.3 and has caused unintended consequences. On Dell servers with PERC RAID controllers, booting 5.10.3+ with this patch causes a PCI parity error. Specifically:

Event Message: A PCI parity error was detected on a component at bus 0 device 5 function 0.
Severity: Critical
Message ID: PCI1308

I reverted this single patch and the errors went away.

Thoughts?

Phil Oester

On Thu, Dec 31, 2020 at 1:15 AM Phil Oester <kernel@linuxace.com> wrote:
>
> On Tue, Sep 08, 2020 at 11:36:22PM +0200, Arnd Bergmann wrote:
> > It sounds unwise to let user space pass an unchecked 32-bit
> > offset into a kernel structure in an ioctl. This is an unsigned
> > variable, so checking the upper bound for the size of the structure
> > it points into is sufficient to avoid data corruption, but as
> > the pointer might also be unaligned, it has to be written carefully
> > as well.
> >
> > While I stumbled over this problem by reading the code, I did not
> > continue checking the function for further problems like it.
>
> Sorry for replying to an ancient thread, but this patch just recently
> made it into 5.10.3 and has caused unintended consequences. On Dell
> servers with PERC RAID controllers, booting 5.10.3+ with this patch
> causes a PCI parity error. Specifically:
>
> Event Message: A PCI parity error was detected on a component at bus 0 device 5 function 0.
> Severity: Critical
> Message ID: PCI1308
>
> I reverted this single patch and the errors went away.
>
> Thoughts?

Thank you for the report and bisecting the issue, and sorry this broke your system!

Fortunately, the patch is fairly small, so there are only a limited number of things that could go wrong. I haven't tried to analyze that message, but I have two ideas:

a) The added ioc->sense_off check gets triggered and the code relies
   on the data being written outside of the structure

b) the address actually needs to always be written as a 64-bit value
   regardless of the instance->consistent_mask_64bit flag, as the
   driver did before. This looked like it was done in error.

Can you try the patch below instead of the revert and see if that resolves the regression, and if it triggers the warning message I add?

Arnd

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 6e4bf05c6d77..248063a4148b 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -8194,8 +8194,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 		/* make sure the pointer is part of the frame */
 		if (ioc->sense_off >
 		    (sizeof(union megasas_frame) - sizeof(__le64))) {
-			error = -EINVAL;
-			goto out;
+			pr_warn("possible out of bounds access offset %d\n", ioc->sense_off);
 		}

 		sense = dma_alloc_coherent(&instance->pdev->dev, ioc->sense_len,
@@ -8209,7 +8208,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 		if (instance->consistent_mask_64bit)
 			put_unaligned_le64(sense_handle, sense_ptr);
 		else
-			put_unaligned_le32(sense_handle, sense_ptr);
+			put_unaligned_le64(sense_handle, sense_ptr);
 	}

 	/*

On Sun, 2021-01-03 at 17:26 +0100, Arnd Bergmann wrote:
[...]
> @@ -8209,7 +8208,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance
> *instance,
>                 if (instance->consistent_mask_64bit)
>                         put_unaligned_le64(sense_handle, sense_ptr);
>                 else
> -                       put_unaligned_le32(sense_handle, sense_ptr);
> +                       put_unaligned_le64(sense_handle, sense_ptr);
>         }

This hunk can't be right. It effectively means removing the if.

However, the if is needed because sense_handle is a dma_addr_t which can be either 32 or 64 bit. What about changing the if to

if (sizeof(dma_addr_t) == 8)

instead?

James

On Sun, Jan 3, 2021 at 6:00 PM James Bottomley <jejb@linux.ibm.com> wrote:
> On Sun, 2021-01-03 at 17:26 +0100, Arnd Bergmann wrote:
> [...]
> > @@ -8209,7 +8208,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance
> > *instance,
> >                 if (instance->consistent_mask_64bit)
> >                         put_unaligned_le64(sense_handle, sense_ptr);
> >                 else
> > -                       put_unaligned_le32(sense_handle, sense_ptr);
> > +                       put_unaligned_le64(sense_handle, sense_ptr);
> >         }
>
> This hunk can't be right. It effectively means removing the if.

I'm just trying to restore the state before the regression introduced in my 381d34e376e3 ("scsi: megaraid_sas: Check user-provided offsets").

The old code always stored 'sizeof(long)' bytes into sense_ptr, regardless of instance->consistent_mask_64bit, but it would truncate the address to 32 bit if that was cleared. This was clearly bogus and I tried to make it do something more meaningful, only storing 8 bytes into the structure if it was configured for 64-bit DMA, regardless of the capabilities of the kernel.

> However, the if is needed because sense_handle is a dma_addr_t which
> can be either 32 or 64 bit. What about changing the if to
>
> if (sizeof(dma_addr_t) == 8)
>
> instead?

That would not be useful either, the device surely does not care if the kernel supports 64-bit DMA. What we'd really need here is someone with access to the interface specifications to see how many bytes should be stored in the structure. I suspect always storing 64 bits (as my patch does) is correct, and would send a proper patch to remove the if() if Phil confirms that my test patch fixes the regression.

Arnd

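The difference between the old and new stores that Arnd describes can be illustrated with a minimal userspace sketch. It assumes a little-endian 64-bit kernel (so sizeof(long) is 8) and the case where instance->consistent_mask_64bit is clear. The helpers put_le32, put_le64, old_store and new_store are hypothetical names made up for this illustration; they are not driver or kernel functions.

```c
#include <stdint.h>

/* Write v at p, least significant byte first, at any alignment. */
static void put_le32(uint32_t v, uint8_t *p)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint64_t v, uint8_t *p)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Assumed model of the old code path with the 64-bit flag clear:
 * the address is truncated to 32 bits, but a full 8-byte slot is
 * written, so the upper four bytes end up as zero.
 */
static void old_store(uint64_t handle, uint8_t *slot)
{
	put_le64((uint32_t)handle, slot);
}

/*
 * Assumed model of the code after 381d34e376e3 with the flag clear:
 * only four bytes are written, so whatever was in the upper four
 * bytes of the slot stays there.
 */
static void new_store(uint64_t handle, uint8_t *slot)
{
	put_le32((uint32_t)handle, slot);
}
```

If the slot held stale data before the store, old_store leaves zeros in bytes 4 to 7 while new_store leaves the stale bytes in place. Whether the firmware actually reads those upper bytes is exactly the open question in this thread.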
On Sun, 2021-01-03 at 19:49 +0100, Arnd Bergmann wrote:
> On Sun, Jan 3, 2021 at 6:00 PM James Bottomley <jejb@linux.ibm.com>
> wrote:
> > On Sun, 2021-01-03 at 17:26 +0100, Arnd Bergmann wrote:
> > [...]
> > > @@ -8209,7 +8208,7 @@ megasas_mgmt_fw_ioctl(struct
> > > megasas_instance
> > > *instance,
> > >                 if (instance->consistent_mask_64bit)
> > >                         put_unaligned_le64(sense_handle,
> > > sense_ptr);
> > >                 else
> > > -                       put_unaligned_le32(sense_handle,
> > > sense_ptr);
> > > +                       put_unaligned_le64(sense_handle,
> > > sense_ptr);
> > >         }
> >
> > This hunk can't be right. It effectively means removing the if.
>
> I'm just trying to restore the state before the regression introduced
> in my 381d34e376e3 ("scsi: megaraid_sas: Check user-provided
> offsets").
>
> The old code always stored 'sizeof(long)' bytes into sense_ptr,
> regardless of instance->consistent_mask_64bit, but it would truncate
> the address to 32 bit if that was cleared. This was clearly bogus
> and I tried to make it do something more meaningful, only storing
> 8 bytes into the structure if it was configured for 64-bit DMA,
> regardless of the capabilities of the kernel.

Heh, well, all this depends on how the firmware interprets the pointer, for which we don't seem to have a manual. Instinct tells me the flag MFI_FRAME_SENSE64 is what does this and that's conditioned on the same if clause 100 lines above this, so the fix you're proposing would still seem to be wrong, because I think when that flag is not set, the device expects the sense pointer to be 32 bit.

> > However, the if is needed because sense_handle is a dma_addr_t
> > which can be either 32 or 64 bit. What about changing the if to
> >
> > if (sizeof(dma_addr_t) == 8)
> >
> > instead?
>
> That would not be useful either, the device surely does not care
> if the kernel supports 64-bit DMA. What we'd really need here is
> someone with access to the interface specifications to see how
> many bytes should be stored in the structure. I suspect always
> storing 64 bits (as my patch does) is correct, and would send a
> proper patch to remove the if() if Phil confirms that my test
> patch fixes the regression.

Well, as I said above, I'm speculating the device does what we tell it, and whether to use 32 or 64 bits for the sense pointer definitely seems to be a flag the driver controls ... we really need someone with access to the programming manual to tell us if this speculation is accurate, though.

James

On Sun, Jan 03, 2021 at 05:26:29PM +0100, Arnd Bergmann wrote:
> Thank you for the report and bisecting the issue, and sorry this broke
> your system!
>
> Fortunately, the patch is fairly small, so there are only a limited number
> of things that could go wrong. I haven't tried to analyze that message,
> but I have two ideas:
>
> a) The added ioc->sense_off check gets triggered and the code relies
> on the data being written outside of the structure
>
> b) the address actually needs to always be written as a 64-bit value
> regardless of the instance->consistent_mask_64bit flag, as the
> driver did before. This looked like it was done in error.
>
> Can you try the patch below instead of the revert and see if that
> resolves the regression, and if it triggers the warning message I
> add?

Thanks Arnd, I tried your patch and it resolves the regression. It does not trigger the warning message you added.

Phil

On Mon, Jan 4, 2021 at 6:48 PM Phil Oester <kernel@linuxace.com> wrote:
>
> On Sun, Jan 03, 2021 at 05:26:29PM +0100, Arnd Bergmann wrote:
> > Thank you for the report and bisecting the issue, and sorry this broke
> > your system!
> >
> > Fortunately, the patch is fairly small, so there are only a limited number
> > of things that could go wrong. I haven't tried to analyze that message,
> > but I have two ideas:
> >
> > a) The added ioc->sense_off check gets triggered and the code relies
> > on the data being written outside of the structure
> >
> > b) the address actually needs to always be written as a 64-bit value
> > regardless of the instance->consistent_mask_64bit flag, as the
> > driver did before. This looked like it was done in error.
> >
> > Can you try the patch below instead of the revert and see if that
> > resolves the regression, and if it triggers the warning message I
> > add?
>
> Thanks Arnd, I tried your patch and it resolves the regression. It does not
> trigger the warning message you added.

Ok, thanks for testing!

That would mean the range check is correct, but the sense pointer must indeed be treated as a 64-bit entity regardless of instance->consistent_mask_64bit, or at least the upper 32 bit must be zero when the flag is unset, rather than the recycled previous value.

I'll send a proper fix shortly, it would be nice if you could give it another spin, but the behavior should be the same as this patch.

Arnd

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 861f7140f52e..c3de69f3bee8 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -8095,7 +8095,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 	int error = 0, i;
 	void *sense = NULL;
 	dma_addr_t sense_handle;
-	unsigned long *sense_ptr;
+	void *sense_ptr;
 	u32 opcode = 0;
 	int ret = DCMD_SUCCESS;

@@ -8218,6 +8218,12 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 	}

 	if (ioc->sense_len) {
+		/* make sure the pointer is part of the frame */
+		if (ioc->sense_off > (sizeof(union megasas_frame) - sizeof(__le64))) {
+			error = -EINVAL;
+			goto out;
+		}
+
 		sense = dma_alloc_coherent(&instance->pdev->dev, ioc->sense_len,
 					     &sense_handle, GFP_KERNEL);
 		if (!sense) {
@@ -8225,12 +8231,11 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 			goto out;
 		}

-		sense_ptr =
-		(unsigned long *) ((unsigned long)cmd->frame + ioc->sense_off);
+		sense_ptr = (void *)cmd->frame + ioc->sense_off;
 		if (instance->consistent_mask_64bit)
-			*sense_ptr = cpu_to_le64(sense_handle);
+			put_unaligned_le64(sense_handle, sense_ptr);
 		else
-			*sense_ptr = cpu_to_le32(sense_handle);
+			put_unaligned_le32(sense_handle, sense_ptr);
 	}

 	/*

It sounds unwise to let user space pass an unchecked 32-bit offset into a kernel structure in an ioctl. This is an unsigned variable, so checking the upper bound for the size of the structure it points into is sufficient to avoid data corruption, but as the pointer might also be unaligned, it has to be written carefully as well.

While I stumbled over this problem by reading the code, I did not continue checking the function for further problems like it.

Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--
2.27.0
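
The two points in the commit message (an upper-bound check on an unsigned offset, and a store that tolerates unaligned addresses) can be sketched outside the kernel roughly as follows. This is only an illustration under stated assumptions: FRAME_SIZE is an assumed stand-in for sizeof(union megasas_frame), and store_le64 and set_sense_ptr are hypothetical helpers, with store_le64 playing the role that put_unaligned_le64() plays in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed frame size for this sketch only, not taken from the driver. */
#define FRAME_SIZE 64u

/* Store val in little-endian byte order; safe at any alignment. */
static void store_le64(uint64_t val, void *p)
{
	uint8_t *b = p;

	for (int i = 0; i < 8; i++)
		b[i] = (uint8_t)(val >> (8 * i));
}

/*
 * Write an 8-byte handle at a caller-supplied offset into the frame.
 * The offset is unsigned, so a single upper-bound comparison is enough
 * to keep the whole 8-byte slot inside the frame.
 * Returns 0 on success or -EINVAL if the slot would not fit.
 */
static int set_sense_ptr(uint8_t *frame, uint32_t sense_off, uint64_t handle)
{
	assert(frame != NULL);

	if (sense_off > FRAME_SIZE - sizeof(uint64_t))
		return -EINVAL;

	store_le64(handle, frame + sense_off);
	return 0;
}
```

With a 64-byte frame, offset 56 is the last one accepted, and an odd offset such as 3 is handled byte by byte rather than through a possibly misaligned pointer dereference.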