Message ID | 20210419115921.19219-1-qiangqing.zhang@nxp.com |
---|---|
State | New |
Headers | show |
Series | [RFC,net-next] net: stmmac: should not modify RX descriptor when STMMAC resume | expand |
On 20/04/2021 02:49, Joakim Zhang wrote: ... >> I have tested this patch, but unfortunately the board still fails to resume >> correctly. So it appears to suffer with the same issue we saw on the previous >> implementation. > > Could I double check with you? Have you reverted Commit 9c63faaa931e ("net: stmmac: re-init rx buffers when mac resume back") and then apply above patch to do the test? > > If yes, you still saw the same issue with Commit 9c63faaa931e? Let's recall the problem, system suspended, but system hang when STMMAC resume back, right? I tested your patch on top of next-20210419 which has Thierry's revert of 9c63faaa931e. So yes this is reverted. Unfortunately, with this change resuming from suspend still does not work. Jon
> -----Original Message----- > From: Jon Hunter <jonathanh@nvidia.com> > Sent: 2021年4月20日 21:34 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; peppe.cavallaro@st.com; > alexandre.torgue@foss.st.com; joabreu@synopsys.com; > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com; > andrew@lunn.ch; f.fainelli@gmail.com > Cc: dl-linux-imx <linux-imx@nxp.com>; treding@nvidia.com; > netdev@vger.kernel.org > Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when > STMMAC resume > > > > On 20/04/2021 02:49, Joakim Zhang wrote: > > ... > > >> I have tested this patch, but unfortunately the board still fails to > >> resume correctly. So it appears to suffer with the same issue we saw > >> on the previous implementation. > > > > Could I double check with you? Have you reverted Commit 9c63faaa931e > ("net: stmmac: re-init rx buffers when mac resume back") and then apply above > patch to do the test? > > > > If yes, you still saw the same issue with Commit 9c63faaa931e? Let's recall > the problem, system suspended, but system hang when STMMAC resume back, > right? > > > I tested your patch on top of next-20210419 which has Thierry's revert of > 9c63faaa931e. So yes this is reverted. Unfortunately, with this change > resuming from suspend still does not work. Hi Jakub, Andrew, Could you please help review this patch? It's really beyond my comprehension, why this patch would affect Tegra186 Jetson TX2 board? Thanks a lot! Best Regards, Joakim Zhang > Jon > > -- > nvpublic
On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote: > Could you please help review this patch? It's really beyond my > comprehension, why this patch would affect Tegra186 Jetson TX2 board? Looks okay, please repost as non-RFC.
On 4/21/2021 9:53 PM, Joakim Zhang wrote: > >> -----Original Message----- >> From: Jon Hunter <jonathanh@nvidia.com> >> Sent: 2021年4月20日 21:34 >> To: Joakim Zhang <qiangqing.zhang@nxp.com>; peppe.cavallaro@st.com; >> alexandre.torgue@foss.st.com; joabreu@synopsys.com; >> davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com; >> andrew@lunn.ch; f.fainelli@gmail.com >> Cc: dl-linux-imx <linux-imx@nxp.com>; treding@nvidia.com; >> netdev@vger.kernel.org >> Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when >> STMMAC resume >> >> >> >> On 20/04/2021 02:49, Joakim Zhang wrote: >> >> ... >> >>>> I have tested this patch, but unfortunately the board still fails to >>>> resume correctly. So it appears to suffer with the same issue we saw >>>> on the previous implementation. >>> >>> Could I double check with you? Have you reverted Commit 9c63faaa931e >> ("net: stmmac: re-init rx buffers when mac resume back") and then apply above >> patch to do the test? >>> >>> If yes, you still saw the same issue with Commit 9c63faaa931e? Let's recall >> the problem, system suspended, but system hang when STMMAC resume back, >> right? >> >> >> I tested your patch on top of next-20210419 which has Thierry's revert of >> 9c63faaa931e. So yes this is reverted. Unfortunately, with this change >> resuming from suspend still does not work. > > > Hi Jakub, Andrew, > > Could you please help review this patch? It's really beyond my comprehension, why this patch would affect Tegra186 Jetson TX2 board? > Thanks a lot! What does the resumption failure looks like? Does the stmmac driver successfully resume from your suspend state, but there is no network traffic? Do you have a log by any chance? Is power to the Ethernet MAC turned off in this suspend state, in which case could we be missing an essential register programming stage? -- Florian
On 4/19/2021 4:59 AM, Joakim Zhang wrote: > When system resume back, STMMAC will clear RX descriptors: > stmmac_resume() > ->stmmac_clear_descriptors() > ->stmmac_clear_rx_descriptors() > ->stmmac_init_rx_desc() > ->dwmac4_set_rx_owner() > //p->des3 |= cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR); > It only assets OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2 fields. > > Let's take a case into account, when system suspend, it is possible that > there are packets have not received yet, so the RX descriptors are wrote > back by DMA, e.g. > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040 > > When system resume back, after above process, it became a broken > descriptor: > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040 > > The issue is that it only changes the owner of this descriptor, but do nothing > about desc0/1/2 fields. The descriptor of STMMAC a bit special, applicaton > prepares RX descriptors for DMA, after DMA recevie the packets, it will write > back the descriptors, so the same field of a descriptor have different > meanings to application and DMA. It should be a software bug there, and may > not easy to reproduce, but there is a certain probability that it will > occur. > > Commit 9c63faaa931e ("net: stmmac: re-init rx buffers when mac resume back") tried > to re-init desc0/desc1 (buffer address fields) to fix this issue, but it > is not a proper solution, and made regression on Jetson TX2 boards. > > It is unreasonable to modify RX descriptors outside of stmmac_rx_refill() function, > where it will clear all desc0/desc1/desc2/desc3 fields together. > > This patch removes RX descriptors modification when STMMAC resume. Your patch makes sense to me, however the explanation seems to highlight that you may have a few cases to consider while you suspend. Usually you will turn off the RX DMA such that DMA into DRAM stops there, this may not an entirely atomic operation as the MAC may have to wait for a certain packet boundary to be crossed, that could leave you with descriptors in 3 states I believe: - descriptor is ready for RX DMA to process and is owned by RX DMA, no need to do anything - descriptor has been fully consumed by the CPU and is owned by the CPU, CPU should be putting the descriptor back on the ring and relinquish ownership - descriptor has been written to DRAM but not processed by CPU, and it should be put back on the ring for RX DMA to use it Out of suspend, don't you need to deal with descriptors in cases 2 and 3 somehow? Does the DMA skip over descriptors that are still marked as owned by the CPU or does it stop/stall? -- Florian
On 22/04/2021 17:12, Florian Fainelli wrote: ... > What does the resumption failure looks like? Does the stmmac driver > successfully resume from your suspend state, but there is no network > traffic? Do you have a log by any chance? The board fails to resume and appears to hang. With regard to the original patch I did find that moving the code to re-init the RX buffers to before the PHY is enabled did work [0]. > Is power to the Ethernet MAC turned off in this suspend state, in which > case could we be missing an essential register programming stage? It seems to be more of a sequencing issue rather than a power issue. I have also ran 2000 suspend cycles on our Tegra platform without Joakim's patch to see how stable suspend is on this platform. I did not see any failures in 2000 cycles and so it is not evident to me that the problem that Joakim is trying to fix is seen on devices such as Tegra. Admittedly, if it is hard to reproduce, then it is possible we have not seen it yet. Jon [0] https://lore.kernel.org/netdev/e4864046-e52f-63b6-a490-74c3cd8045f4@nvidia.com/ -- nvpublic
On 4/22/2021 10:00 AM, Jon Hunter wrote: > > On 22/04/2021 17:12, Florian Fainelli wrote: > > ... > >> What does the resumption failure looks like? Does the stmmac driver >> successfully resume from your suspend state, but there is no network >> traffic? Do you have a log by any chance? > > The board fails to resume and appears to hang. With regard to the > original patch I did find that moving the code to re-init the RX buffers > to before the PHY is enabled did work [0]. You indicated that you are using a Broadcom PHY, which specific PHY are you using? I suspect that the stmmac is somehow relying on the PHY to provide its 125MHz RXC clock back to you in order to have its RX logic work correctly. One difference between using the Broadcom PHY and the Generic PHY drivers could be whether your Broadcom PHY driver entry has a .suspend/.resume callback implemented or not. > >> Is power to the Ethernet MAC turned off in this suspend state, in which >> case could we be missing an essential register programming stage? > > It seems to be more of a sequencing issue rather than a power issue. > > I have also ran 2000 suspend cycles on our Tegra platform without > Joakim's patch to see how stable suspend is on this platform. I did not > see any failures in 2000 cycles and so it is not evident to me that the > problem that Joakim is trying to fix is seen on devices such as Tegra. > Admittedly, if it is hard to reproduce, then it is possible we have not > seen it yet. > > Jon > > [0] > https://lore.kernel.org/netdev/e4864046-e52f-63b6-a490-74c3cd8045f4@nvidia.com/ > -- Florian
On 22/04/2021 18:32, Florian Fainelli wrote: > > > On 4/22/2021 10:00 AM, Jon Hunter wrote: >> >> On 22/04/2021 17:12, Florian Fainelli wrote: >> >> ... >> >>> What does the resumption failure looks like? Does the stmmac driver >>> successfully resume from your suspend state, but there is no network >>> traffic? Do you have a log by any chance? >> >> The board fails to resume and appears to hang. With regard to the >> original patch I did find that moving the code to re-init the RX buffers >> to before the PHY is enabled did work [0]. > > You indicated that you are using a Broadcom PHY, which specific PHY are > you using? > > I suspect that the stmmac is somehow relying on the PHY to provide its > 125MHz RXC clock back to you in order to have its RX logic work correctly. > > One difference between using the Broadcom PHY and the Generic PHY > drivers could be whether your Broadcom PHY driver entry has a > .suspend/.resume callback implemented or not. This board has a BCM89610 and uses the drivers/net/phy/broadcom.c driver. Interestingly I don't see any suspend/resume handlers for this phy. Cheers Jon -- nvpublic
On 4/22/2021 11:18 AM, Jon Hunter wrote: > > On 22/04/2021 18:32, Florian Fainelli wrote: >> >> >> On 4/22/2021 10:00 AM, Jon Hunter wrote: >>> >>> On 22/04/2021 17:12, Florian Fainelli wrote: >>> >>> ... >>> >>>> What does the resumption failure looks like? Does the stmmac driver >>>> successfully resume from your suspend state, but there is no network >>>> traffic? Do you have a log by any chance? >>> >>> The board fails to resume and appears to hang. With regard to the >>> original patch I did find that moving the code to re-init the RX buffers >>> to before the PHY is enabled did work [0]. >> >> You indicated that you are using a Broadcom PHY, which specific PHY are >> you using? >> >> I suspect that the stmmac is somehow relying on the PHY to provide its >> 125MHz RXC clock back to you in order to have its RX logic work correctly. >> >> One difference between using the Broadcom PHY and the Generic PHY >> drivers could be whether your Broadcom PHY driver entry has a >> .suspend/.resume callback implemented or not. > > > This board has a BCM89610 and uses the drivers/net/phy/broadcom.c > driver. Interestingly I don't see any suspend/resume handlers for this phy. Now if you do add .suspend = genphy_suspend and .resume = bcm54xx_resume for this PHY entry does it work better? -- Florian
Hi Florian, Thanks for your comments. > -----Original Message----- > From: Florian Fainelli <f.fainelli@gmail.com> > Sent: 2021年4月23日 0:32 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; peppe.cavallaro@st.com; > alexandre.torgue@foss.st.com; joabreu@synopsys.com; > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com; > andrew@lunn.ch > Cc: dl-linux-imx <linux-imx@nxp.com>; jonathanh@nvidia.com; > treding@nvidia.com; netdev@vger.kernel.org > Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when > STMMAC resume > > > > On 4/19/2021 4:59 AM, Joakim Zhang wrote: > > When system resume back, STMMAC will clear RX descriptors: > > stmmac_resume() > > ->stmmac_clear_descriptors() > > ->stmmac_clear_rx_descriptors() > > ->stmmac_init_rx_desc() > > ->dwmac4_set_rx_owner() > > //p->des3 |= cpu_to_le32(RDES3_OWN | > RDES3_BUFFER1_VALID_ADDR); It > > only assets OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2 fields. > > > > Let's take a case into account, when system suspend, it is possible > > that there are packets have not received yet, so the RX descriptors > > are wrote back by DMA, e.g. > > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040 > > > > When system resume back, after above process, it became a broken > > descriptor: > > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040 > > > > The issue is that it only changes the owner of this descriptor, but do > > nothing about desc0/1/2 fields. The descriptor of STMMAC a bit > > special, applicaton prepares RX descriptors for DMA, after DMA recevie > > the packets, it will write back the descriptors, so the same field of > > a descriptor have different meanings to application and DMA. It should > > be a software bug there, and may not easy to reproduce, but there is a > > certain probability that it will occur. > > > > Commit 9c63faaa931e ("net: stmmac: re-init rx buffers when mac resume > > back") tried to re-init desc0/desc1 (buffer address fields) to fix > > this issue, but it is not a proper solution, and made regression on Jetson TX2 > boards. > > > > It is unreasonable to modify RX descriptors outside of > > stmmac_rx_refill() function, where it will clear all desc0/desc1/desc2/desc3 > fields together. > > > > This patch removes RX descriptors modification when STMMAC resume. > > Your patch makes sense to me, however the explanation seems to highlight > that you may have a few cases to consider while you suspend. > > Usually you will turn off the RX DMA such that DMA into DRAM stops there, > this may not an entirely atomic operation as the MAC may have to wait for a > certain packet boundary to be crossed, that could leave you with descriptors in > 3 states I believe: > > - descriptor is ready for RX DMA to process and is owned by RX DMA, no need > to do anything Agree. > - descriptor has been fully consumed by the CPU and is owned by the CPU, CPU > should be putting the descriptor back on the ring and relinquish ownership Yes, at this case, after stmmac resume, this descriptor would have chance to be refilled in stmmac_rx_refill() function. > - descriptor has been written to DRAM but not processed by CPU, and it should > be put back on the ring for RX DMA to use it This case is descriptor is owned by CPU and rx buffer have not been consumed yet, I think it still has chance to be received from stmmac_rx() after stmmac resume. > > Out of suspend, don't you need to deal with descriptors in cases 2 and 3 > somehow? Does the DMA skip over descriptors that are still marked as owned > by the CPU or does it stop/stall? Yes, AFAIK, DMA would skip over descriptors that are owned by CPU, only use the descriptors that owned by DMA. This is my understanding. Best Regards, Joakim Zhang > -- > Florian
Hi Florian, John, > -----Original Message----- > From: Florian Fainelli <f.fainelli@gmail.com> > Sent: 2021年4月23日 1:33 > To: Jon Hunter <jonathanh@nvidia.com>; Joakim Zhang > <qiangqing.zhang@nxp.com>; peppe.cavallaro@st.com; > alexandre.torgue@foss.st.com; joabreu@synopsys.com; > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com; > andrew@lunn.ch > Cc: dl-linux-imx <linux-imx@nxp.com>; treding@nvidia.com; > netdev@vger.kernel.org > Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when > STMMAC resume > > > > On 4/22/2021 10:00 AM, Jon Hunter wrote: > > > > On 22/04/2021 17:12, Florian Fainelli wrote: > > > > ... > > > >> What does the resumption failure looks like? Does the stmmac driver > >> successfully resume from your suspend state, but there is no network > >> traffic? Do you have a log by any chance? > > > > The board fails to resume and appears to hang. With regard to the > > original patch I did find that moving the code to re-init the RX > > buffers to before the PHY is enabled did work [0]. > > You indicated that you are using a Broadcom PHY, which specific PHY are you > using? > > I suspect that the stmmac is somehow relying on the PHY to provide its > 125MHz RXC clock back to you in order to have its RX logic work correctly. Yes, for i.MX platforms, we need PHY feeds its RXC clock back to STMMAC for RX logic, not sure if it is need for NVIDIA platforms. And now stmmac resume PHY(phylink_start) before initialize MAC. > One difference between using the Broadcom PHY and the Generic PHY drivers > could be whether your Broadcom PHY driver entry has a .suspend/.resume > callback implemented or not. If there is no .suspend/.resume callback, the PHY should always work. I have a concern, does Boardcom always can feed continuous RX clock? Does it support EEE? Best Regards, Joakim Zhang > > > >> Is power to the Ethernet MAC turned off in this suspend state, in > >> which case could we be missing an essential register programming stage? > > > > It seems to be more of a sequencing issue rather than a power issue. > > > > I have also ran 2000 suspend cycles on our Tegra platform without > > Joakim's patch to see how stable suspend is on this platform. I did > > not see any failures in 2000 cycles and so it is not evident to me > > that the problem that Joakim is trying to fix is seen on devices such as Tegra. > > Admittedly, if it is hard to reproduce, then it is possible we have > > not seen it yet. > > > > Jon > > > > [0] > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore > > .kernel.org%2Fnetdev%2Fe4864046-e52f-63b6-a490-74c3cd8045f4%40nvidia > .c > > > om%2F&data=04%7C01%7Cqiangqing.zhang%40nxp.com%7Ca4ce9a89b > 92c4f600 > > > 84608d905b4a285%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63 > 7547095 > > > 639789970%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoi > V2luMzIi > > > LCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=xLgYkYit0EfOXbbM > lz78ws > > b4w19W36A0FvPvkaPzl7o%3D&reserved=0 > > > > -- > Florian
On 22/04/2021 19:20, Florian Fainelli wrote: > > > On 4/22/2021 11:18 AM, Jon Hunter wrote: >> >> On 22/04/2021 18:32, Florian Fainelli wrote: >>> >>> >>> On 4/22/2021 10:00 AM, Jon Hunter wrote: >>>> >>>> On 22/04/2021 17:12, Florian Fainelli wrote: >>>> >>>> ... >>>> >>>>> What does the resumption failure looks like? Does the stmmac driver >>>>> successfully resume from your suspend state, but there is no network >>>>> traffic? Do you have a log by any chance? >>>> >>>> The board fails to resume and appears to hang. With regard to the >>>> original patch I did find that moving the code to re-init the RX buffers >>>> to before the PHY is enabled did work [0]. >>> >>> You indicated that you are using a Broadcom PHY, which specific PHY are >>> you using? >>> >>> I suspect that the stmmac is somehow relying on the PHY to provide its >>> 125MHz RXC clock back to you in order to have its RX logic work correctly. >>> >>> One difference between using the Broadcom PHY and the Generic PHY >>> drivers could be whether your Broadcom PHY driver entry has a >>> .suspend/.resume callback implemented or not. >> >> >> This board has a BCM89610 and uses the drivers/net/phy/broadcom.c >> driver. Interestingly I don't see any suspend/resume handlers for this phy. > > Now if you do add .suspend = genphy_suspend and .resume = bcm54xx_resume > for this PHY entry does it work better? Thanks for the suggestion. I tried this and this appears to breaking networking altogether. In other words, the board was not longer able to request an IP address. I also tried the genphy_resume and the board was able to get an IP address, but it is still unable to resume from suspend. Thanks Jon -- nvpublic
On 22/04/2021 16:56, Jakub Kicinski wrote: > On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote: >> Could you please help review this patch? It's really beyond my >> comprehension, why this patch would affect Tegra186 Jetson TX2 board? > > Looks okay, please repost as non-RFC. I still have an issue with a board not being able to resume from suspend with this patch. Shouldn't we try to resolve that first? Jon
On 23.04.2021 15:46, Jon Hunter wrote: > > On 22/04/2021 19:20, Florian Fainelli wrote: >> >> >> On 4/22/2021 11:18 AM, Jon Hunter wrote: >>> >>> On 22/04/2021 18:32, Florian Fainelli wrote: >>>> >>>> >>>> On 4/22/2021 10:00 AM, Jon Hunter wrote: >>>>> >>>>> On 22/04/2021 17:12, Florian Fainelli wrote: >>>>> >>>>> ... >>>>> >>>>>> What does the resumption failure looks like? Does the stmmac driver >>>>>> successfully resume from your suspend state, but there is no network >>>>>> traffic? Do you have a log by any chance? >>>>> >>>>> The board fails to resume and appears to hang. With regard to the >>>>> original patch I did find that moving the code to re-init the RX buffers >>>>> to before the PHY is enabled did work [0]. >>>> >>>> You indicated that you are using a Broadcom PHY, which specific PHY are >>>> you using? >>>> >>>> I suspect that the stmmac is somehow relying on the PHY to provide its >>>> 125MHz RXC clock back to you in order to have its RX logic work correctly. >>>> >>>> One difference between using the Broadcom PHY and the Generic PHY >>>> drivers could be whether your Broadcom PHY driver entry has a >>>> .suspend/.resume callback implemented or not. >>> >>> >>> This board has a BCM89610 and uses the drivers/net/phy/broadcom.c >>> driver. Interestingly I don't see any suspend/resume handlers for this phy. >> >> Now if you do add .suspend = genphy_suspend and .resume = bcm54xx_resume >> for this PHY entry does it work better? > > Thanks for the suggestion. I tried this and this appears to breaking > networking altogether. In other words, the board was not longer able to > request an IP address. I also tried the genphy_resume and the board was > able to get an IP address, but it is still unable to resume from suspend. > Maybe MDIO bus PM and MAC driver PM interfere. Following changes dealt with such an issue with the fec driver and a specific PHY model. fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") 557d5dc83f68 ("net: fec: use mac-managed PHY PM") If stmmac suspend/resume take care also of PHY pm, you may try a similar change in stmmac. > Thanks > Jon > Heiner
> -----Original Message----- > From: Jon Hunter <jonathanh@nvidia.com> > Sent: 2021年4月23日 21:48 > To: Jakub Kicinski <kuba@kernel.org>; Joakim Zhang > <qiangqing.zhang@nxp.com> > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > joabreu@synopsys.com; davem@davemloft.net; > mcoquelin.stm32@gmail.com; andrew@lunn.ch; f.fainelli@gmail.com; > dl-linux-imx <linux-imx@nxp.com>; treding@nvidia.com; > netdev@vger.kernel.org > Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when > STMMAC resume > > > On 22/04/2021 16:56, Jakub Kicinski wrote: > > On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote: > >> Could you please help review this patch? It's really beyond my > >> comprehension, why this patch would affect Tegra186 Jetson TX2 board? > > > > Looks okay, please repost as non-RFC. > > > I still have an issue with a board not being able to resume from suspend with > this patch. Shouldn't we try to resolve that first? Hi Jon, Any updates about this? Could I repost as non-RFC? Best Regards, Joakim Zhang > Jon
Hi Joakim, On 06/05/2021 07:33, Joakim Zhang wrote: > >> -----Original Message----- >> From: Jon Hunter <jonathanh@nvidia.com> >> Sent: 2021年4月23日 21:48 >> To: Jakub Kicinski <kuba@kernel.org>; Joakim Zhang >> <qiangqing.zhang@nxp.com> >> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; >> joabreu@synopsys.com; davem@davemloft.net; >> mcoquelin.stm32@gmail.com; andrew@lunn.ch; f.fainelli@gmail.com; >> dl-linux-imx <linux-imx@nxp.com>; treding@nvidia.com; >> netdev@vger.kernel.org >> Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when >> STMMAC resume >> >> >> On 22/04/2021 16:56, Jakub Kicinski wrote: >>> On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote: >>>> Could you please help review this patch? It's really beyond my >>>> comprehension, why this patch would affect Tegra186 Jetson TX2 board? >>> >>> Looks okay, please repost as non-RFC. >> >> >> I still have an issue with a board not being able to resume from suspend with >> this patch. Shouldn't we try to resolve that first? > > Hi Jon, > > Any updates about this? Could I repost as non-RFC? Sorry no updates from my end. Again, I don't see how we can post this as it introduces a regression for us. I am sorry that I am not able to help more here, but we have done some extensive testing on the current mainline without your change and I don't see any issues with regard to suspend/resume. Hence, this does not appear to fix any pre-existing issues. It is possible that we are not seeing them. At this point I think that we really need someone from Synopsys to help us understand that exact problem that you are experiencing so that we can ensure we have the necessary fix in place and if this is something that is applicable to all devices or not. Jon -- nvpublic
Hi Jakub, > -----Original Message----- > From: Jon Hunter <jonathanh@nvidia.com> > Sent: 2021年5月7日 22:22 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; Jakub Kicinski > <kuba@kernel.org> > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > joabreu@synopsys.com; davem@davemloft.net; > mcoquelin.stm32@gmail.com; andrew@lunn.ch; f.fainelli@gmail.com; > dl-linux-imx <linux-imx@nxp.com>; treding@nvidia.com; > netdev@vger.kernel.org > Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when > STMMAC resume > > Hi Joakim, > > On 06/05/2021 07:33, Joakim Zhang wrote: > > > >> -----Original Message----- > >> From: Jon Hunter <jonathanh@nvidia.com> > >> Sent: 2021年4月23日 21:48 > >> To: Jakub Kicinski <kuba@kernel.org>; Joakim Zhang > >> <qiangqing.zhang@nxp.com> > >> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > >> joabreu@synopsys.com; davem@davemloft.net; > mcoquelin.stm32@gmail.com; > >> andrew@lunn.ch; f.fainelli@gmail.com; dl-linux-imx > >> <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > >> Subject: Re: [RFC net-next] net: stmmac: should not modify RX > >> descriptor when STMMAC resume > >> > >> > >> On 22/04/2021 16:56, Jakub Kicinski wrote: > >>> On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote: > >>>> Could you please help review this patch? It's really beyond my > >>>> comprehension, why this patch would affect Tegra186 Jetson TX2 board? > >>> > >>> Looks okay, please repost as non-RFC. > >> > >> > >> I still have an issue with a board not being able to resume from > >> suspend with this patch. Shouldn't we try to resolve that first? > > > > Hi Jon, > > > > Any updates about this? Could I repost as non-RFC? > > > Sorry no updates from my end. Again, I don't see how we can post this as it > introduces a regression for us. I am sorry that I am not able to help more here, > but we have done some extensive testing on the current mainline without your > change and I don't see any issues with regard to suspend/resume. Hence, this > does not appear to fix any pre-existing issues. It is possible that we are not > seeing them. > > At this point I think that we really need someone from Synopsys to help us > understand that exact problem that you are experiencing so that we can > ensure we have the necessary fix in place and if this is something that is > applicable to all devices or not. This patch only removes modification of Rx descriptors when STMMAC resume back, IMHO, it should not affect system suspend/resume function. Do you have any idea about Joh's issue or any acceptable solution to fix the issue I met? Thanks a lot! Best Regards, Joakim Zhang > Jon > > -- > nvpublic
On 5/8/2021 4:20 AM, Joakim Zhang wrote: > > Hi Jakub, > >> -----Original Message----- >> From: Jon Hunter <jonathanh@nvidia.com> >> Sent: 2021年5月7日 22:22 >> To: Joakim Zhang <qiangqing.zhang@nxp.com>; Jakub Kicinski >> <kuba@kernel.org> >> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; >> joabreu@synopsys.com; davem@davemloft.net; >> mcoquelin.stm32@gmail.com; andrew@lunn.ch; f.fainelli@gmail.com; >> dl-linux-imx <linux-imx@nxp.com>; treding@nvidia.com; >> netdev@vger.kernel.org >> Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when >> STMMAC resume >> >> Hi Joakim, >> >> On 06/05/2021 07:33, Joakim Zhang wrote: >>> >>>> -----Original Message----- >>>> From: Jon Hunter <jonathanh@nvidia.com> >>>> Sent: 2021年4月23日 21:48 >>>> To: Jakub Kicinski <kuba@kernel.org>; Joakim Zhang >>>> <qiangqing.zhang@nxp.com> >>>> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; >>>> joabreu@synopsys.com; davem@davemloft.net; >> mcoquelin.stm32@gmail.com; >>>> andrew@lunn.ch; f.fainelli@gmail.com; dl-linux-imx >>>> <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org >>>> Subject: Re: [RFC net-next] net: stmmac: should not modify RX >>>> descriptor when STMMAC resume >>>> >>>> >>>> On 22/04/2021 16:56, Jakub Kicinski wrote: >>>>> On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote: >>>>>> Could you please help review this patch? It's really beyond my >>>>>> comprehension, why this patch would affect Tegra186 Jetson TX2 board? >>>>> >>>>> Looks okay, please repost as non-RFC. >>>> >>>> >>>> I still have an issue with a board not being able to resume from >>>> suspend with this patch. Shouldn't we try to resolve that first? >>> >>> Hi Jon, >>> >>> Any updates about this? Could I repost as non-RFC? >> >> >> Sorry no updates from my end. Again, I don't see how we can post this as it >> introduces a regression for us. I am sorry that I am not able to help more here, >> but we have done some extensive testing on the current mainline without your >> change and I don't see any issues with regard to suspend/resume. Hence, this >> does not appear to fix any pre-existing issues. It is possible that we are not >> seeing them. >> >> At this point I think that we really need someone from Synopsys to help us >> understand that exact problem that you are experiencing so that we can >> ensure we have the necessary fix in place and if this is something that is >> applicable to all devices or not. > > This patch only removes modification of Rx descriptors when STMMAC resume back, IMHO, it should not affect system suspend/resume function. > Do you have any idea about Joh's issue or any acceptable solution to fix the issue I met? Thanks a lot! Joakim, don't you have a support contact at Synopsys who would be able to help or someone at NXP who was responsible for the MAC integration? We also have Synopsys engineers copied so presumably they could shed some light. -- Florian
Hi Florian, > -----Original Message----- > From: Florian Fainelli <f.fainelli@gmail.com> > Sent: 2021年5月8日 23:42 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; Jon Hunter > <jonathanh@nvidia.com>; Jakub Kicinski <kuba@kernel.org> > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > joabreu@synopsys.com; davem@davemloft.net; > mcoquelin.stm32@gmail.com; andrew@lunn.ch; dl-linux-imx > <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when > STMMAC resume > > > > On 5/8/2021 4:20 AM, Joakim Zhang wrote: > > > > Hi Jakub, > > > >> -----Original Message----- > >> From: Jon Hunter <jonathanh@nvidia.com> > >> Sent: 2021年5月7日 22:22 > >> To: Joakim Zhang <qiangqing.zhang@nxp.com>; Jakub Kicinski > >> <kuba@kernel.org> > >> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > >> joabreu@synopsys.com; davem@davemloft.net; > mcoquelin.stm32@gmail.com; > >> andrew@lunn.ch; f.fainelli@gmail.com; dl-linux-imx > >> <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > >> Subject: Re: [RFC net-next] net: stmmac: should not modify RX > >> descriptor when STMMAC resume > >> > >> Hi Joakim, > >> > >> On 06/05/2021 07:33, Joakim Zhang wrote: > >>> > >>>> -----Original Message----- > >>>> From: Jon Hunter <jonathanh@nvidia.com> > >>>> Sent: 2021年4月23日 21:48 > >>>> To: Jakub Kicinski <kuba@kernel.org>; Joakim Zhang > >>>> <qiangqing.zhang@nxp.com> > >>>> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > >>>> joabreu@synopsys.com; davem@davemloft.net; > >> mcoquelin.stm32@gmail.com; > >>>> andrew@lunn.ch; f.fainelli@gmail.com; dl-linux-imx > >>>> <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > >>>> Subject: Re: [RFC net-next] net: stmmac: should not modify RX > >>>> descriptor when STMMAC resume > >>>> > >>>> > >>>> On 22/04/2021 16:56, Jakub Kicinski wrote: > >>>>> On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote: > >>>>>> Could you please help review this patch? It's really beyond my > >>>>>> comprehension, why this patch would affect Tegra186 Jetson TX2 > board? > >>>>> > >>>>> Looks okay, please repost as non-RFC. > >>>> > >>>> > >>>> I still have an issue with a board not being able to resume from > >>>> suspend with this patch. Shouldn't we try to resolve that first? > >>> > >>> Hi Jon, > >>> > >>> Any updates about this? Could I repost as non-RFC? > >> > >> > >> Sorry no updates from my end. Again, I don't see how we can post this > >> as it introduces a regression for us. I am sorry that I am not able > >> to help more here, but we have done some extensive testing on the > >> current mainline without your change and I don't see any issues with > >> regard to suspend/resume. Hence, this does not appear to fix any > >> pre-existing issues. It is possible that we are not seeing them. > >> > >> At this point I think that we really need someone from Synopsys to > >> help us understand that exact problem that you are experiencing so > >> that we can ensure we have the necessary fix in place and if this is > >> something that is applicable to all devices or not. > > > > This patch only removes modification of Rx descriptors when STMMAC > resume back, IMHO, it should not affect system suspend/resume function. > > Do you have any idea about Joh's issue or any acceptable solution to fix the > issue I met? Thanks a lot! > > Joakim, don't you have a support contact at Synopsys who would be able to > help or someone at NXP who was responsible for the MAC integration? > We also have Synopsys engineers copied so presumably they could shed some > light. I contacted Synopsys no substantive help was received, and integration guys from NXP is unavailable now. But, some hints has came out, seems a bit help. I found that the DMA width is 34 bits on i.MX8MP, this may different from many existing SoCs which integrated STMMAC. As I described in the commit message: When system suspend: the rx descriptor is 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040 When system resume: the rx descriptor modified to 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040 Since the DMA is 34 bits width, so desc0/desc1 indicates the buffer address, after system resume, the buffer address changed to 0x4000000000. And the correct rx descriptor is 008 [0x00000000c4310080]: 0x6511000 0x1 0x0 0x81000000, the valid buffer address is 0x16511000. So when DMA tried to access 0x4000000000, this valid address, would generate fatal bus error. But for other 32 bits width DMA, DMA seems still can work when this issue happened, only desc0 indicates buffer address, so the buffer address is 0x0 when system resume. And there is a NOTE in the guide: In the Receive Descriptor (Read Format), if the Buffer Address field is all 0s, the module does not transfer data to that buffer and skips to the next buffer or next descriptor. For this note, I don't know what could IP actually do, when detect all zeros buffer address, it will change the descriptor to application own? If not, STMMAC driver seems can't handle this case. I will contact Synopsys guys for more details. It now appears that this issue seems only can be reproduced on DMA width more than 32 bits, this may be why other SoCs(e.g. i.MX8DXL) which integrated the same STMMAC IP can't reproduce it. Best Regards, Joakim Zhang > -- > Florian
On Mon, May 10, 2021 at 02:10:21AM +0000, Joakim Zhang wrote: > > Hi Florian, > > > -----Original Message----- > > From: Florian Fainelli <f.fainelli@gmail.com> > > Sent: 2021年5月8日 23:42 > > To: Joakim Zhang <qiangqing.zhang@nxp.com>; Jon Hunter > > <jonathanh@nvidia.com>; Jakub Kicinski <kuba@kernel.org> > > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > > joabreu@synopsys.com; davem@davemloft.net; > > mcoquelin.stm32@gmail.com; andrew@lunn.ch; dl-linux-imx > > <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > > Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when > > STMMAC resume > > > > > > > > On 5/8/2021 4:20 AM, Joakim Zhang wrote: > > > > > > Hi Jakub, > > > > > >> -----Original Message----- > > >> From: Jon Hunter <jonathanh@nvidia.com> > > >> Sent: 2021年5月7日 22:22 > > >> To: Joakim Zhang <qiangqing.zhang@nxp.com>; Jakub Kicinski > > >> <kuba@kernel.org> > > >> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > > >> joabreu@synopsys.com; davem@davemloft.net; > > mcoquelin.stm32@gmail.com; > > >> andrew@lunn.ch; f.fainelli@gmail.com; dl-linux-imx > > >> <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > > >> Subject: Re: [RFC net-next] net: stmmac: should not modify RX > > >> descriptor when STMMAC resume > > >> > > >> Hi Joakim, > > >> > > >> On 06/05/2021 07:33, Joakim Zhang wrote: > > >>> > > >>>> -----Original Message----- > > >>>> From: Jon Hunter <jonathanh@nvidia.com> > > >>>> Sent: 2021年4月23日 21:48 > > >>>> To: Jakub Kicinski <kuba@kernel.org>; Joakim Zhang > > >>>> <qiangqing.zhang@nxp.com> > > >>>> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > > >>>> joabreu@synopsys.com; davem@davemloft.net; > > >> mcoquelin.stm32@gmail.com; > > >>>> andrew@lunn.ch; f.fainelli@gmail.com; dl-linux-imx > > >>>> <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > > >>>> Subject: Re: [RFC net-next] net: stmmac: should not modify RX > > >>>> descriptor when STMMAC resume > > >>>> > > >>>> > > >>>> On 22/04/2021 16:56, Jakub Kicinski wrote: > > >>>>> On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote: > > >>>>>> Could you please help review this patch? It's really beyond my > > >>>>>> comprehension, why this patch would affect Tegra186 Jetson TX2 > > board? > > >>>>> > > >>>>> Looks okay, please repost as non-RFC. > > >>>> > > >>>> > > >>>> I still have an issue with a board not being able to resume from > > >>>> suspend with this patch. Shouldn't we try to resolve that first? > > >>> > > >>> Hi Jon, > > >>> > > >>> Any updates about this? Could I repost as non-RFC? > > >> > > >> > > >> Sorry no updates from my end. Again, I don't see how we can post this > > >> as it introduces a regression for us. I am sorry that I am not able > > >> to help more here, but we have done some extensive testing on the > > >> current mainline without your change and I don't see any issues with > > >> regard to suspend/resume. Hence, this does not appear to fix any > > >> pre-existing issues. It is possible that we are not seeing them. > > >> > > >> At this point I think that we really need someone from Synopsys to > > >> help us understand that exact problem that you are experiencing so > > >> that we can ensure we have the necessary fix in place and if this is > > >> something that is applicable to all devices or not. > > > > > > This patch only removes modification of Rx descriptors when STMMAC > > resume back, IMHO, it should not affect system suspend/resume function. > > > Do you have any idea about Joh's issue or any acceptable solution to fix the > > issue I met? Thanks a lot! > > > > Joakim, don't you have a support contact at Synopsys who would be able to > > help or someone at NXP who was responsible for the MAC integration? > > We also have Synopsys engineers copied so presumably they could shed some > > light. > > I contacted Synopsys no substantive help was received, and integration guys from NXP is unavailable now. > > But, some hints has came out, seems a bit help. I found that the DMA width is 34 bits on i.MX8MP, this may different from many existing SoCs which integrated STMMAC. > > As I described in the commit message: > When system suspend: the rx descriptor is 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040 > When system resume: the rx descriptor modified to 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040 > Since the DMA is 34 bits width, so desc0/desc1 indicates the buffer address, after system resume, the buffer address changed to 0x4000000000. > And the correct rx descriptor is 008 [0x00000000c4310080]: 0x6511000 0x1 0x0 0x81000000, the valid buffer address is 0x16511000. > So when DMA tried to access 0x4000000000, this valid address, would generate fatal bus error. Okay, that's interesting. If i.MX8MP supports only 34 address bits but the driver tries to set a DMA address of 0x4000000000, that's way out of the valid range. I suspect what might be happening is that the DMA mask isn't properly set for your device. There's in fact some code in the driver that deals with this. If you look at the implementation of stmmac_dvr_probe() in drivers/net/ethernet/stmicro/stmmac/stmmac_main.c around line 4980, there's a comment that actually mentions i.MX8MP and the 34 address bit limitation. Can you find out what that priv->plat->addr64 is set to on your system? Or alternatively find out what priv->dma_cap.addr64 ends up being set a few lines further down? That value is effectively used to set the DMA mask and if that's wrong it might explain why the driver is setting a bad DMA address. In fact, maybe that information is already in the kernel log. There's a dev_info() there that should print out something like: Using 34 bits DMA width in your case. If that says something other than 34 in there, it's very likely that this needs to be correctly set somewhere. Looking at the code in dwmac-imx.c, I see that that's already set to 34, so this looks like it should be setting things correctly, but better make sure. > But for other 32 bits width DMA, DMA seems still can work when this issue happened, only desc0 indicates buffer address, so the buffer address is 0x0 when system resume. > And there is a NOTE in the guide: > In the Receive Descriptor (Read Format), if the Buffer Address > field is all 0s, the module does not transfer data to that buffer > and skips to the next buffer or next descriptor. > For this note, I don't know what could IP actually do, when detect all zeros buffer address, it will change the descriptor to application own? If not, STMMAC driver seems can't handle this case. > I will contact Synopsys guys for more details. > > It now appears that this issue seems only can be reproduced on DMA width more than 32 bits, this may be why other SoCs(e.g. i.MX8DXL) which integrated the same STMMAC IP can't reproduce it. On Tegra186 and later we support up to 40 address bits. The newer Tegra194 has a special quirk where bit 39 has special meaning, so we have to override the DMA mask as well. I recall that this was causing issues at some point, which is why I suspect something like this could be happening in your case as well. Thierry
Hi Thierry, > -----Original Message----- > From: Thierry Reding <treding@nvidia.com> > Sent: 2021年5月17日 15:53 > To: Joakim Zhang <qiangqing.zhang@nxp.com> > Cc: Florian Fainelli <f.fainelli@gmail.com>; Jon Hunter > <jonathanh@nvidia.com>; Jakub Kicinski <kuba@kernel.org>; > peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > joabreu@synopsys.com; davem@davemloft.net; > mcoquelin.stm32@gmail.com; andrew@lunn.ch; dl-linux-imx > <linux-imx@nxp.com>; netdev@vger.kernel.org > Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when > STMMAC resume > > On Mon, May 10, 2021 at 02:10:21AM +0000, Joakim Zhang wrote: > > > > Hi Florian, > > > > > -----Original Message----- > > > From: Florian Fainelli <f.fainelli@gmail.com> > > > Sent: 2021年5月8日 23:42 > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>; Jon Hunter > > > <jonathanh@nvidia.com>; Jakub Kicinski <kuba@kernel.org> > > > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > > > joabreu@synopsys.com; davem@davemloft.net; > > > mcoquelin.stm32@gmail.com; andrew@lunn.ch; dl-linux-imx > > > <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > > > Subject: Re: [RFC net-next] net: stmmac: should not modify RX > > > descriptor when STMMAC resume > > > > > > > > > > > > On 5/8/2021 4:20 AM, Joakim Zhang wrote: > > > > > > > > Hi Jakub, > > > > > > > >> -----Original Message----- > > > >> From: Jon Hunter <jonathanh@nvidia.com> > > > >> Sent: 2021年5月7日 22:22 > > > >> To: Joakim Zhang <qiangqing.zhang@nxp.com>; Jakub Kicinski > > > >> <kuba@kernel.org> > > > >> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > > > >> joabreu@synopsys.com; davem@davemloft.net; > > > mcoquelin.stm32@gmail.com; > > > >> andrew@lunn.ch; f.fainelli@gmail.com; dl-linux-imx > > > >> <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > > > >> Subject: Re: [RFC net-next] net: stmmac: should not modify RX > > > >> descriptor when STMMAC resume > > > >> > > > >> Hi Joakim, > > > >> > > > >> On 06/05/2021 07:33, Joakim Zhang wrote: > > > >>> > > > >>>> -----Original Message----- > > > >>>> From: Jon Hunter <jonathanh@nvidia.com> > > > >>>> Sent: 2021年4月23日 21:48 > > > >>>> To: Jakub Kicinski <kuba@kernel.org>; Joakim Zhang > > > >>>> <qiangqing.zhang@nxp.com> > > > >>>> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com; > > > >>>> joabreu@synopsys.com; davem@davemloft.net; > > > >> mcoquelin.stm32@gmail.com; > > > >>>> andrew@lunn.ch; f.fainelli@gmail.com; dl-linux-imx > > > >>>> <linux-imx@nxp.com>; treding@nvidia.com; netdev@vger.kernel.org > > > >>>> Subject: Re: [RFC net-next] net: stmmac: should not modify RX > > > >>>> descriptor when STMMAC resume > > > >>>> > > > >>>> > > > >>>> On 22/04/2021 16:56, Jakub Kicinski wrote: > > > >>>>> On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote: > > > >>>>>> Could you please help review this patch? It's really beyond > > > >>>>>> my comprehension, why this patch would affect Tegra186 Jetson > > > >>>>>> TX2 > > > board? > > > >>>>> > > > >>>>> Looks okay, please repost as non-RFC. > > > >>>> > > > >>>> > > > >>>> I still have an issue with a board not being able to resume > > > >>>> from suspend with this patch. Shouldn't we try to resolve that first? > > > >>> > > > >>> Hi Jon, > > > >>> > > > >>> Any updates about this? Could I repost as non-RFC? > > > >> > > > >> > > > >> Sorry no updates from my end. Again, I don't see how we can post > > > >> this as it introduces a regression for us. I am sorry that I am > > > >> not able to help more here, but we have done some extensive > > > >> testing on the current mainline without your change and I don't > > > >> see any issues with regard to suspend/resume. Hence, this does > > > >> not appear to fix any pre-existing issues. It is possible that we are not > seeing them. > > > >> > > > >> At this point I think that we really need someone from Synopsys > > > >> to help us understand that exact problem that you are > > > >> experiencing so that we can ensure we have the necessary fix in > > > >> place and if this is something that is applicable to all devices or not. > > > > > > > > This patch only removes modification of Rx descriptors when STMMAC > > > resume back, IMHO, it should not affect system suspend/resume function. > > > > Do you have any idea about Joh's issue or any acceptable solution > > > > to fix the > > > issue I met? Thanks a lot! > > > > > > Joakim, don't you have a support contact at Synopsys who would be > > > able to help or someone at NXP who was responsible for the MAC > integration? > > > We also have Synopsys engineers copied so presumably they could shed > > > some light. > > > > I contacted Synopsys no substantive help was received, and integration guys > from NXP is unavailable now. > > > > But, some hints has came out, seems a bit help. I found that the DMA width > is 34 bits on i.MX8MP, this may different from many existing SoCs which > integrated STMMAC. > > > > As I described in the commit message: > > When system suspend: the rx descriptor is 008 [0x00000000c4310080]: > > 0x0 0x40 0x0 0x34010040 When system resume: the rx descriptor modified > > to 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040 Since the DMA is 34 > bits width, so desc0/desc1 indicates the buffer address, after system resume, > the buffer address changed to 0x4000000000. > > And the correct rx descriptor is 008 [0x00000000c4310080]: 0x6511000 0x1 > 0x0 0x81000000, the valid buffer address is 0x16511000. > > So when DMA tried to access 0x4000000000, this valid address, would > generate fatal bus error. > > Okay, that's interesting. If i.MX8MP supports only 34 address bits but the > driver tries to set a DMA address of 0x4000000000, that's way out of the valid > range. > > I suspect what might be happening is that the DMA mask isn't properly set for > your device. There's in fact some code in the driver that deals with this. If you > look at the implementation of stmmac_dvr_probe() in > drivers/net/ethernet/stmicro/stmmac/stmmac_main.c around line 4980, > there's a comment that actually mentions i.MX8MP and the 34 address bit > limitation. Can you find out what that priv->plat->addr64 is set to on your > system? Yes, I know STMMAC driver has taken DMA width into account, and you can see that DMA width is set to 34 bits in dwmac-imx.c (.addr_width = 34,). You also can see that the valid DMA address is 0x106511000, which is within 34bits. But the valid DMA address is 0x4000000000 after system resume, which is out of 34 bits range. The reason here is that we modify the rx descriptor when system resume. > Or alternatively find out what priv->dma_cap.addr64 ends up being set a few > lines further down? That value is effectively used to set the DMA mask and if > that's wrong it might explain why the driver is setting a bad DMA address. As I described above, the DMA address allocated when initialized is large than 32bits, MAC can play well in normal case. So this should be impossible. > In fact, maybe that information is already in the kernel log. There's a > dev_info() there that should print out something like: > > Using 34 bits DMA width Yes, we can this log: [ 2.376903] imx-dwmac 30bf0000.ethernet: Using 34 bits DMA width > in your case. If that says something other than 34 in there, it's very likely that > this needs to be correctly set somewhere. Looking at the code in dwmac-imx.c, > I see that that's already set to 34, so this looks like it should be setting things > correctly, but better make sure. Yes, now we make sure the DMA mask is correct. > > But for other 32 bits width DMA, DMA seems still can work when this issue > happened, only desc0 indicates buffer address, so the buffer address is 0x0 > when system resume. > > And there is a NOTE in the guide: > > In the Receive Descriptor (Read Format), if the Buffer Address field > > is all 0s, the module does not transfer data to that buffer and skips > > to the next buffer or next descriptor. > > For this note, I don't know what could IP actually do, when detect all zeros > buffer address, it will change the descriptor to application own? If not, > STMMAC driver seems can't handle this case. > > I will contact Synopsys guys for more details. > > > > It now appears that this issue seems only can be reproduced on DMA width > more than 32 bits, this may be why other SoCs(e.g. i.MX8DXL) which integrated > the same STMMAC IP can't reproduce it. > > On Tegra186 and later we support up to 40 address bits. The newer > Tegra194 has a special quirk where bit 39 has special meaning, so we have to > override the DMA mask as well. I recall that this was causing issues at some > point, which is why I suspect something like this could be happening in your > case as well. I am not quite understand what you means? Do you mean that our 34bits DMA width also has a special meaning? Thanks much Thierry for helping analyzing this issue. As I described in the commit message, we should not _only_ change the rx descriptors to DMA own and let other parts of rx descriptors not updated. So could you please help check why this RFC would make regression at you side? Why system can't resume back? Best Regards, Joakim Zhang > Thierry
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 9f396648d76f..b784304a22e8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7186,6 +7186,8 @@ static void stmmac_reset_queues_param(struct stmmac_priv *priv) tx_q->mss = 0; netdev_tx_reset_queue(netdev_get_tx_queue(priv->dev, queue)); + + stmmac_clear_tx_descriptors(priv, queue); } } @@ -7250,7 +7252,6 @@ int stmmac_resume(struct device *dev) stmmac_reset_queues_param(priv); stmmac_free_tx_skbufs(priv); - stmmac_clear_descriptors(priv); stmmac_hw_setup(ndev, false); stmmac_init_coalesce(priv);
When system resume back, STMMAC will clear RX descriptors: stmmac_resume() ->stmmac_clear_descriptors() ->stmmac_clear_rx_descriptors() ->stmmac_init_rx_desc() ->dwmac4_set_rx_owner() //p->des3 |= cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR); It only assets OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2 fields. Let's take a case into account, when system suspend, it is possible that there are packets have not received yet, so the RX descriptors are wrote back by DMA, e.g. 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040 When system resume back, after above process, it became a broken descriptor: 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040 The issue is that it only changes the owner of this descriptor, but do nothing about desc0/1/2 fields. The descriptor of STMMAC a bit special, applicaton prepares RX descriptors for DMA, after DMA recevie the packets, it will write back the descriptors, so the same field of a descriptor have different meanings to application and DMA. It should be a software bug there, and may not easy to reproduce, but there is a certain probability that it will occur. Commit 9c63faaa931e ("net: stmmac: re-init rx buffers when mac resume back") tried to re-init desc0/desc1 (buffer address fields) to fix this issue, but it is not a proper solution, and made regression on Jetson TX2 boards. It is unreasonable to modify RX descriptors outside of stmmac_rx_refill() function, where it will clear all desc0/desc1/desc2/desc3 fields together. This patch removes RX descriptors modification when STMMAC resume. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)