Message ID | 20240530082556.2960148-1-quic_kriskura@quicinc.com |
---|---|
Series | Disable SS instances in park mode for SC7180/ SC7280 |
On 30.05.2024 3:34 PM, Doug Anderson wrote:
> Hi,
>
> On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
> <quic_kriskura@quicinc.com> wrote:
>>
>> When working in host mode, in certain conditions, when the USB
>> host controller is stressed, there is a HC died warning that comes up.
>> Fix this up by disabling SS instances in park mode for SC7280 and SC7180.
>>
>> Krishna Kurapati (2):
>>   arm64: dts: qcom: sc7180: Disable SS instances in park mode
>>   arm64: dts: qcom: sc7280: Disable SS instances in park mode
>>
>>  arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
>>  arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
>>  2 files changed, 2 insertions(+)
>
> FWIW, the test case I used to reproduce this:
>
> 1. Plug in a USB dock w/ Ethernet
> 2. Plug a USB 3 SD card reader into the dock.
> 3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
> 4. From a shell, run for i in $(seq 5); do dd if=/dev/sdb of=/dev/null
> bs=4M; done to read from the card reader.
> 5. At the same time, stress the Internet. If you've got a very fast
> Internet connection then running Google's "Internet speed test" did
> it, but I could also reproduce by just running this from a PC
> connected to the same network as my DUT: ssh ${DUT} "dd of=/dev/null"
> < /dev/zero
>
> I would also note that, though I personally reproduced this on sc7180
> and sc7280 boards and thus Krishna posted the patch for those boards,
> there's no reason to believe that this problem doesn't affect all of
> Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
> followup patch fixing this everywhere.

Right, this sounds like a more widespread issue.

That said, I couldn't reproduce it on SC8280XP / X13s (which does NOT mean
8280 isn't affected). My setup was:

- USB3 5 Gb/s hub plugged into one of the side USBs
- on-hub 1 Gb/s network hub connected straight to my router with a
  600 / 60 Mbps link, spamming speedtest-cli and dd-over-ssh
- M.2 SSD connected over a USB adapter, nearing 280 MB/s speeds (the
  adapter isn't particularly speedy)

So it stands to reason that it might not have been enough to trigger it.

Konrad
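The two load generators from the quoted test case (steps 4 and 5) can be wrapped in a small script. This is only a sketch of those steps, not a tool from the thread: the `BLOCK_DEV`, `DUT` and `RUN` variables and the function names are assumptions made for illustration, and `RUN` defaults to `echo` so that nothing touches a device until it is explicitly cleared.

```shell
#!/bin/sh
# Sketch of the storage and network load from the quoted test case.
# BLOCK_DEV, DUT and RUN are illustrative names, not part of the thread.

BLOCK_DEV="${BLOCK_DEV:-/dev/sdb}"  # card reader block device (step 4)
DUT="${DUT:-dut.example}"           # hypothetical hostname of the device under test
RUN="${RUN-echo}"                   # prints commands by default; set RUN= to execute

# Run on the DUT: read the card five times in a row (step 4).
storage_load() {
    for i in $(seq 5); do
        $RUN dd if="$BLOCK_DEV" of=/dev/null bs=4M
    done
}

# Run on a PC on the same network: stream zeros to the DUT over ssh (step 5).
network_load() {
    $RUN ssh "$DUT" "dd of=/dev/null" < /dev/zero
}
```

With the defaults, calling `storage_load` only prints the five `dd` invocations. Running `RUN= storage_load` on the DUT and `RUN= DUT=<host> network_load` on a second machine at the same time corresponds to the procedure described above.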
Hi,

On Fri, May 31, 2024 at 5:33 AM Konrad Dybcio <konrad.dybcio@linaro.org> wrote:
>
> On 30.05.2024 3:34 PM, Doug Anderson wrote:
> > Hi,
> >
> > On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
> > <quic_kriskura@quicinc.com> wrote:
> >>
> >> When working in host mode, in certain conditions, when the USB
> >> host controller is stressed, there is a HC died warning that comes up.
> >> Fix this up by disabling SS instances in park mode for SC7280 and SC7180.
> >>
> >> Krishna Kurapati (2):
> >>   arm64: dts: qcom: sc7180: Disable SS instances in park mode
> >>   arm64: dts: qcom: sc7280: Disable SS instances in park mode
> >>
> >>  arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
> >>  arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
> >>  2 files changed, 2 insertions(+)
> >
> > FWIW, the test case I used to reproduce this:
> >
> > 1. Plug in a USB dock w/ Ethernet
> > 2. Plug a USB 3 SD card reader into the dock.
> > 3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
> > 4. From a shell, run for i in $(seq 5); do dd if=/dev/sdb of=/dev/null
> > bs=4M; done to read from the card reader.
> > 5. At the same time, stress the Internet. If you've got a very fast
> > Internet connection then running Google's "Internet speed test" did
> > it, but I could also reproduce by just running this from a PC
> > connected to the same network as my DUT: ssh ${DUT} "dd of=/dev/null"
> > < /dev/zero
> >
> > I would also note that, though I personally reproduced this on sc7180
> > and sc7280 boards and thus Krishna posted the patch for those boards,
> > there's no reason to believe that this problem doesn't affect all of
> > Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
> > followup patch fixing this everywhere.
>
> Right, this sounds like a more widespread issue
>
> That said, I couldn't reproduce it on SC8280XP / X13s (which does NOT mean
> 8280 isn't affected). My setup was:
>
> - USB3 5 Gb/s hub plugged into one of the side USBs
> - on-hub 1 Gb/s network hub connected straight to my router with a
>   600 / 60 Mbps link, spamming speedtest-cli and dd-over-ssh
> - M.2 SSD connected over a USB adapter, nearing 280 MB/s speeds (the
>   adapter isn't particularly speedy)
>
> So it stands to reason that it might not have been enough to trigger it.

In my case I wasn't using anything nearly as fast as an M.2 SSD. I was
just using a normal USB3 SD card reader. That being said, multiple
people at Qualcomm were able to replicate the issue without lots of
back and forth, so I'd guess that the problem isn't that sensitive to
the exact storage device. I will also note that it's not sensitive to
the exact network device as I replicated it with two Ethernet adapters
with very different chipsets.

My only guess is that somehow SC8280XP is faster and that changes the
timing of how it handles interrupts. I guess you could try capping
your cpufreq in sysfs and see if that makes a difference in
reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
the IP where they've fixed this?

It would be interesting if someone with an SDM845 dragonboard could try
replicating since that seems highly likely to reproduce, at least.

-Doug
On 31.05.2024 4:17 PM, Doug Anderson wrote:
> Hi,
>
> On Fri, May 31, 2024 at 5:33 AM Konrad Dybcio <konrad.dybcio@linaro.org> wrote:
>>
>> On 30.05.2024 3:34 PM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
>>> <quic_kriskura@quicinc.com> wrote:
>>>>
>>>> When working in host mode, in certain conditions, when the USB
>>>> host controller is stressed, there is a HC died warning that comes up.
>>>> Fix this up by disabling SS instances in park mode for SC7280 and SC7180.
>>>>
>>>> Krishna Kurapati (2):
>>>>   arm64: dts: qcom: sc7180: Disable SS instances in park mode
>>>>   arm64: dts: qcom: sc7280: Disable SS instances in park mode
>>>>
>>>>  arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
>>>>  arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
>>>>  2 files changed, 2 insertions(+)
>>>
>>> FWIW, the test case I used to reproduce this:
>>>
>>> 1. Plug in a USB dock w/ Ethernet
>>> 2. Plug a USB 3 SD card reader into the dock.
>>> 3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
>>> 4. From a shell, run for i in $(seq 5); do dd if=/dev/sdb of=/dev/null
>>> bs=4M; done to read from the card reader.
>>> 5. At the same time, stress the Internet. If you've got a very fast
>>> Internet connection then running Google's "Internet speed test" did
>>> it, but I could also reproduce by just running this from a PC
>>> connected to the same network as my DUT: ssh ${DUT} "dd of=/dev/null"
>>> < /dev/zero
>>>
>>> I would also note that, though I personally reproduced this on sc7180
>>> and sc7280 boards and thus Krishna posted the patch for those boards,
>>> there's no reason to believe that this problem doesn't affect all of
>>> Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
>>> followup patch fixing this everywhere.
>>
>> Right, this sounds like a more widespread issue
>>
>> That said, I couldn't reproduce it on SC8280XP / X13s (which does NOT mean
>> 8280 isn't affected). My setup was:
>>
>> - USB3 5 Gb/s hub plugged into one of the side USBs
>> - on-hub 1 Gb/s network hub connected straight to my router with a
>>   600 / 60 Mbps link, spamming speedtest-cli and dd-over-ssh
>> - M.2 SSD connected over a USB adapter, nearing 280 MB/s speeds (the
>>   adapter isn't particularly speedy)
>>
>> So it stands to reason that it might not have been enough to trigger it.
>
> In my case I wasn't using anything nearly as fast as an M.2 SSD. I was
> just using a normal USB3 SD card reader. That being said, multiple
> people at Qualcomm were able to replicate the issue without lots of
> back and forth, so I'd guess that the problem isn't that sensitive to
> the exact storage device. I will also note that it's not sensitive to
> the exact network device as I replicated it with two Ethernet adapters
> with very different chipsets.
>
> My only guess is that somehow SC8280XP is faster and that changes the
> timing of how it handles interrupts. I guess you could try capping
> your cpufreq in sysfs and see if that makes a difference in
> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
> the IP where they've fixed this?

Well, great minds think alike :P

I did cap it to f_min on all cores, but that didn't change the situation.
It might have been worth trying to power off all cores except 0. I might
do that at some point.

My guess is that with a process node change, they might have used some
newer/better IP revision though. Remains to be seen.

Konrad

>
> It would be interesting if someone with an SDM845 dragonboard could try
> replicating since that seems highly likely to reproduce, at least.
>
> -Doug
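The two experiments mentioned here, capping every core to f_min and taking all cores but CPU 0 offline, can be expressed against the standard cpufreq and CPU hotplug sysfs files. The following is a rough sketch rather than what was actually run in the thread: the function names and the overridable `CPU_ROOT` variable are assumptions for illustration, and on a real system both functions need root.

```shell
#!/bin/sh
# Sketch: slow the system down to see whether timing affects reproduction.
# CPU_ROOT is overridable so the functions can be tried on a scratch directory.

CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Cap each CPU's maximum frequency to its hardware minimum (f_min).
cap_to_fmin() {
    for policy in "$CPU_ROOT"/cpu[0-9]*/cpufreq; do
        if [ -r "$policy/cpuinfo_min_freq" ]; then
            cat "$policy/cpuinfo_min_freq" > "$policy/scaling_max_freq"
        fi
    done
}

# Take every core except cpu0 offline via the hotplug "online" file.
offline_all_but_cpu0() {
    for cpu in "$CPU_ROOT"/cpu[0-9]*; do
        if [ "${cpu##*/}" != "cpu0" ] && [ -w "$cpu/online" ]; then
            echo 0 > "$cpu/online"
        fi
    done
}
```

Neither change persists across a reboot. Writing the previous value back to `scaling_max_freq` and `1` to each `online` file undoes them at runtime.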
Hi,

On Fri, May 31, 2024 at 7:27 AM Krishna Kurapati PSSNV
<quic_kriskura@quicinc.com> wrote:
>
> > My only guess is that somehow SC8280XP is faster and that changes the
> > timing of how it handles interrupts. I guess you could try capping
> > your cpufreq in sysfs and see if that makes a difference in
> > reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
> > the IP where they've fixed this?
> >
> > It would be interesting if someone with an SDM845 dragonboard could try
> > replicating since that seems highly likely to reproduce, at least.
> >
>
> Hi Konrad, Doug,
>
> Usually on downstream we set this quirk only for all Gen-1 targets
> (not particularly for this testcase) but to avoid these kinds of
> controller going dead issues. I can filter out the gen-1 targets (other
> than sc7280/sc7180) and send a separate series to add this quirk in all
> of them.

Sounds like a plan to me!

-Doug
On 31.05.2024 4:31 PM, Doug Anderson wrote:
> Hi,
>
> On Fri, May 31, 2024 at 7:27 AM Krishna Kurapati PSSNV
> <quic_kriskura@quicinc.com> wrote:
>>
>>> My only guess is that somehow SC8280XP is faster and that changes the
>>> timing of how it handles interrupts. I guess you could try capping
>>> your cpufreq in sysfs and see if that makes a difference in
>>> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
>>> the IP where they've fixed this?
>>>
>>> It would be interesting if someone with an SDM845 dragonboard could try
>>> replicating since that seems highly likely to reproduce, at least.
>>>
>>
>> Hi Konrad, Doug,
>>
>> Usually on downstream we set this quirk only for all Gen-1 targets
>> (not particularly for this testcase) but to avoid these kinds of
>> controller going dead issues. I can filter out the gen-1 targets (other
>> than sc7280/sc7180) and send a separate series to add this quirk in all
>> of them.
>
> Sounds like a plan to me!

Yep!

Krishna, in case there are more gen1 platforms than what we have upstream,
it would be very useful if you could list them all, so that we have a
reference for future additions.

Konrad
On Thu, May 30, 2024 at 01:55:55PM GMT, Krishna Kurapati wrote:
> On SC7180, in host mode, it is observed that stressing out controller
> in host mode results in HC died error and only restarting the host

Could you please include a copy of that error message, so that others
searching for that error message will be able to find this commit?

Also, there's three "in host mode"s in this sentence.

> mode fixes it. Disable SS instances in park mode for these targets to

Please spell SS SuperSpeed.

Regards,
Bjorn

> avoid host controller being dead.
>
> Reported-by: Doug Anderson <dianders@google.com>
> Cc: <stable@vger.kernel.org>
> Fixes: 0b766e7fe5a2 ("arm64: dts: qcom: sc7180: Add USB related nodes")
> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
> ---
>  arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
> index 2b481e20ae38..cc93b5675d5d 100644
> --- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
> @@ -3063,6 +3063,7 @@ usb_1_dwc3: usb@a600000 {
>  				iommus = <&apps_smmu 0x540 0>;
>  				snps,dis_u2_susphy_quirk;
>  				snps,dis_enblslpm_quirk;
> +				snps,parkmode-disable-ss-quirk;
>  				phys = <&usb_1_hsphy>, <&usb_1_qmpphy QMP_USB43DP_USB3_PHY>;
>  				phy-names = "usb2-phy", "usb3-phy";
>  				maximum-speed = "super-speed";
> --
> 2.34.1
>
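When testing a patch like the one quoted above, it can help to confirm that the booted device tree really carries the new property. On device-tree systems the live tree is exposed under `/sys/firmware/devicetree/base`, where a boolean property shows up as an empty file in its node's directory. The sketch below relies only on that; the function names and the overridable `DT_ROOT` variable are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: find nodes in the live device tree that have the park mode quirk.
# DT_ROOT is overridable so the functions can be tried on a scratch directory.

DT_ROOT="${DT_ROOT:-/sys/firmware/devicetree/base}"
QUIRK="snps,parkmode-disable-ss-quirk"

# Print the directory of every node that has the boolean property.
nodes_with_quirk() {
    find "$DT_ROOT" -name "$QUIRK" -exec dirname {} \;
}

# Succeed if at least one node carries the quirk, fail otherwise.
quirk_present() {
    [ -n "$(nodes_with_quirk)" ]
}
```

For example, `quirk_present && nodes_with_quirk` prints the matching node paths on a kernel built with the patched dtsi and prints nothing otherwise.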