[1/2] mwifiex: Use non-posted PCI register writes

Message ID	20210830123704.221494-2-verdre@v0yd.nl
State	New
Headers	show Return-Path: <linux-wireless-owner@kernel.org> From: =?utf-8?q?Jonas_Dre=C3=9Fler?= <verdre@v0yd.nl> To: Amitkumar Karwar <amitkarwar@gmail.com>, Ganapathi Bhat <ganapathi017@gmail.com>, Xinming Hu <huxinming820@gmail.com>, Kalle Valo <kvalo@codeaurora.org>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org> Cc: =?utf-8?q?Jonas_Dre=C3=9Fler?= <verdre@v0yd.nl>, Tsuchiya Yuto <kitakar@gmail.com>, linux-wireless@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, Maximilian Luz <luzmaximilian@gmail.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Bjorn Helgaas <bhelgaas@google.com>, =?utf-8?q?Pali_Roh=C3=A1r?= <pali@kernel.org> Subject: [PATCH 1/2] mwifiex: Use non-posted PCI register writes Date: Mon, 30 Aug 2021 14:37:03 +0200 Message-Id: <20210830123704.221494-2-verdre@v0yd.nl> In-Reply-To: <20210830123704.221494-1-verdre@v0yd.nl> References: <20210830123704.221494-1-verdre@v0yd.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	mwifiex: Work around firmware bugs on 88W8897 chip \| expand [0/2] mwifiex: Work around firmware bugs on 88W8897 chip [1/2] mwifiex: Use non-posted PCI register writes [2/2] mwifiex: Try waking the firmware until we get an interrupt

Jonas Dreßler Aug. 30, 2021, 12:37 p.m. UTC

On the 88W8897 card it's very important the TX ring write pointer is
updated correctly to its new value before setting the TX ready
interrupt, otherwise the firmware appears to crash (probably because
it's trying to DMA-read from the wrong place).

Since PCI uses "posted writes" when writing to a register, it's not
guaranteed that a write will happen immediately. That means the pointer
might be outdated when setting the TX ready interrupt, leading to
firmware crashes especially when ASPM L1 and L1 substates are enabled
(because of the higher link latency, the write will probably take
longer).

So fix those firmware crashes by always forcing non-posted writes. We do
that by simply reading back the register after writing it, just as a lot
of other drivers do.

There are two reproducers that are fixed with this patch:

1) During rx/tx traffic and with ASPM L1 substates enabled (the enabled
substates are platform dependent), the firmware crashes and eventually a
command timeout appears in the logs. That crash is fixed by using a
non-posted write in mwifiex_pcie_send_data().

2) When sending lots of commands to the card, waking it up from sleep in
very quick intervals, the firmware eventually crashes. That crash
appears to be fixed by some other non-posted write included here.

Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
---
 drivers/net/wireless/marvell/mwifiex/pcie.c | 6 ++++++
 1 file changed, 6 insertions(+)

Andy Shevchenko Sept. 1, 2021, 3:47 p.m. UTC | #1

On Wed, Sep 1, 2021 at 5:02 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
> On 8/30/21 2:49 PM, Andy Shevchenko wrote:

>  > On Mon, Aug 30, 2021 at 3:38 PM Jonas Dreßler <verdre@v0yd.nl> wrote:


...

>  > Thanks for all this work!

>  >

>  > Nevertheless, do we have any commits that may be a good candidate to

>  > be in the Fixes tag here?


> I don't think there's any commit we could point to, given that the bug

> is probably somewhere in the firmware code.


Here (in the commit message)...

>  >> +       /* Do a read-back, which makes the write non-posted,

> ensuring the

>  >> +        * completion before returning.

>  >

>  >> +        * The firmware of the 88W8897 card is buggy and this avoids

> crashes.

>  >

>  > Any firmware version reference? Would be nice to have just for the

>  > sake of record.

>

> Pretty sure the crash is present in every firmware that has been

> released, I've tried most of them. FTR, the current firmware version is

> 15.68.19.p21.


...and here it would be nice to state this version, so in the future
we will have a right landmark.

>  >> +        */


-- 
With Best Regards,
Andy Shevchenko

Pali Rohár Sept. 1, 2021, 3:51 p.m. UTC | #2

On Wednesday 01 September 2021 16:01:54 Jonas Dreßler wrote:
> On 8/30/21 2:49 PM, Andy Shevchenko wrote:

> > On Mon, Aug 30, 2021 at 3:38 PM Jonas Dreßler <verdre@v0yd.nl> wrote:

> >>

> >> On the 88W8897 card it's very important the TX ring write pointer is

> >> updated correctly to its new value before setting the TX ready

> >> interrupt, otherwise the firmware appears to crash (probably because

> >> it's trying to DMA-read from the wrong place).

> >>

> >> Since PCI uses "posted writes" when writing to a register, it's not

> >> guaranteed that a write will happen immediately. That means the pointer

> >> might be outdated when setting the TX ready interrupt, leading to

> >> firmware crashes especially when ASPM L1 and L1 substates are enabled

> >> (because of the higher link latency, the write will probably take

> >> longer).

> >>

> >> So fix those firmware crashes by always forcing non-posted writes. We do

> >> that by simply reading back the register after writing it, just as a lot

> >> of other drivers do.

> >>

> >> There are two reproducers that are fixed with this patch:

> >>

> >> 1) During rx/tx traffic and with ASPM L1 substates enabled (the enabled

> >> substates are platform dependent), the firmware crashes and eventually a

> >> command timeout appears in the logs. That crash is fixed by using a

> >> non-posted write in mwifiex_pcie_send_data().

> >>

> >> 2) When sending lots of commands to the card, waking it up from sleep in

> >> very quick intervals, the firmware eventually crashes. That crash

> >> appears to be fixed by some other non-posted write included here.

> >

> > Thanks for all this work!

> >

> > Nevertheless, do we have any commits that may be a good candidate to

> > be in the Fixes tag here?

> >

> 

> I don't think there's any commit we could point to, given that the bug is

> probably somewhere in the firmware code.


Then please add Cc: stable@vger.kernel.org tag into commit message. Such
bugfix is a good candidate for backporting into stable releases.

> >> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>

Heiner Kallweit Sept. 1, 2021, 4:51 p.m. UTC | #3

On 01.09.2021 17:51, Pali Rohár wrote:
> On Wednesday 01 September 2021 16:01:54 Jonas Dreßler wrote:

>> On 8/30/21 2:49 PM, Andy Shevchenko wrote:

>>> On Mon, Aug 30, 2021 at 3:38 PM Jonas Dreßler <verdre@v0yd.nl> wrote:

>>>>

>>>> On the 88W8897 card it's very important the TX ring write pointer is

>>>> updated correctly to its new value before setting the TX ready

>>>> interrupt, otherwise the firmware appears to crash (probably because

>>>> it's trying to DMA-read from the wrong place).

>>>>


This sounds somehow like the typical case where you write DMA descriptors
and then ring the doorbell. This normally requires a dma_wmb().
Maybe something like that is missing here?

Reading back all register writes may cause a certain performance impact,
and if it can be avoided we should try to avoid it.

>>>> Since PCI uses "posted writes" when writing to a register, it's not

>>>> guaranteed that a write will happen immediately. That means the pointer

>>>> might be outdated when setting the TX ready interrupt, leading to

>>>> firmware crashes especially when ASPM L1 and L1 substates are enabled

>>>> (because of the higher link latency, the write will probably take

>>>> longer).

>>>>

>>>> So fix those firmware crashes by always forcing non-posted writes. We do

>>>> that by simply reading back the register after writing it, just as a lot

>>>> of other drivers do.

>>>>

>>>> There are two reproducers that are fixed with this patch:

>>>>

>>>> 1) During rx/tx traffic and with ASPM L1 substates enabled (the enabled

>>>> substates are platform dependent), the firmware crashes and eventually a

>>>> command timeout appears in the logs. That crash is fixed by using a

>>>> non-posted write in mwifiex_pcie_send_data().

>>>>

>>>> 2) When sending lots of commands to the card, waking it up from sleep in

>>>> very quick intervals, the firmware eventually crashes. That crash

>>>> appears to be fixed by some other non-posted write included here.

>>>

>>> Thanks for all this work!

>>>

>>> Nevertheless, do we have any commits that may be a good candidate to

>>> be in the Fixes tag here?

>>>

>>

>> I don't think there's any commit we could point to, given that the bug is

>> probably somewhere in the firmware code.

> 

> Then please add Cc: stable@vger.kernel.org tag into commit message. Such

> bugfix is a good candidate for backporting into stable releases.

> 

>>>> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>

Johannes Berg Sept. 1, 2021, 5:07 p.m. UTC | #4

On Wed, 2021-09-01 at 18:51 +0200, Heiner Kallweit wrote:
> On 01.09.2021 17:51, Pali Rohár wrote:

> > On Wednesday 01 September 2021 16:01:54 Jonas Dreßler wrote:

> > > On 8/30/21 2:49 PM, Andy Shevchenko wrote:

> > > > On Mon, Aug 30, 2021 at 3:38 PM Jonas Dreßler <verdre@v0yd.nl> wrote:

> > > > > 

> > > > > On the 88W8897 card it's very important the TX ring write pointer is

> > > > > updated correctly to its new value before setting the TX ready

> > > > > interrupt, otherwise the firmware appears to crash (probably because

> > > > > it's trying to DMA-read from the wrong place).

> > > > > 

> 

> This sounds somehow like the typical case where you write DMA descriptors

> and then ring the doorbell. This normally requires a dma_wmb().

> Maybe something like that is missing here?


But it looks like this "TX ring write pointer" is actually the register?

However, I would agree that doing it in mwifiex_write_reg() is possibly
too big a hammer - could be done only for reg->tx_wrptr, not all the
registers?

Actually, can two writes actually cross on PCI?

johannes

Brian Norris Sept. 1, 2021, 7:40 p.m. UTC | #5

On Mon, Aug 30, 2021 at 5:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
>

> On the 88W8897 card it's very important the TX ring write pointer is

> updated correctly to its new value before setting the TX ready

> interrupt, otherwise the firmware appears to crash (probably because

> it's trying to DMA-read from the wrong place).

>

> Since PCI uses "posted writes" when writing to a register, it's not

> guaranteed that a write will happen immediately. That means the pointer

> might be outdated when setting the TX ready interrupt, leading to

> firmware crashes especially when ASPM L1 and L1 substates are enabled

> (because of the higher link latency, the write will probably take

> longer).

>

> So fix those firmware crashes by always forcing non-posted writes. We do

> that by simply reading back the register after writing it, just as a lot

> of other drivers do.

>

> There are two reproducers that are fixed with this patch:

>

> 1) During rx/tx traffic and with ASPM L1 substates enabled (the enabled

> substates are platform dependent), the firmware crashes and eventually a

> command timeout appears in the logs. That crash is fixed by using a

> non-posted write in mwifiex_pcie_send_data().

>

> 2) When sending lots of commands to the card, waking it up from sleep in

> very quick intervals, the firmware eventually crashes. That crash

> appears to be fixed by some other non-posted write included here.

>

> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>


This might be good for many cases, but please read this commit:

https://git.kernel.org/linus/062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3
mwifiex: pcie: use posted write to wake up firmware

It's very much intentional that this is a posted write in some cases.

Without ensuring this doesn't regress, NAK from me.

Brian

Andy Shevchenko Sept. 1, 2021, 8:40 p.m. UTC | #6

On Wed, Sep 1, 2021 at 11:25 PM Brian Norris <briannorris@chromium.org> wrote:
> On Mon, Aug 30, 2021 at 5:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:

...

> This might be good for many cases, but please read this commit:
>
> https://git.kernel.org/linus/062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3
> mwifiex: pcie: use posted write to wake up firmware
>
> It's very much intentional that this is a posted write in some cases.
>
> Without ensuring this doesn't regress, NAK from me.

Can you ensure that from Chrome / Google perspective, please?

Brian Norris Sept. 1, 2021, 9:04 p.m. UTC | #7

On Wed, Sep 1, 2021 at 1:41 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> On Wed, Sep 1, 2021 at 11:25 PM Brian Norris <briannorris@chromium.org> wrote:
> > On Mon, Aug 30, 2021 at 5:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
>
> ...
>
> > This might be good for many cases, but please read this commit:
> >
> > https://git.kernel.org/linus/062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3
> > mwifiex: pcie: use posted write to wake up firmware
> >
> > It's very much intentional that this is a posted write in some cases.
> >
> > Without ensuring this doesn't regress, NAK from me.
>
> Can you ensure that from Chrome / Google perspective, please?

I mean, the context link should be pretty obvious. But sure: with this patch:

(a) all my least-favorite audio issues to debug return (audio is
especially latency-sensitive on the Rockchip RK3399 systems I have)
(b) latency spikes to ~6ms:
# trace-cmd record -p function_graph -O funcgraph-abstime -l
mwifiex_pm_wakeup_card
# trace-cmd report
   kworker/u13:0-199   [003]   348.987306: funcgraph_entry:      #
6219.500 us |  mwifiex_pm_wakeup_card();
   kworker/u13:0-199   [003]   349.316312: funcgraph_entry:      #
6267.625 us |  mwifiex_pm_wakeup_card();
   kworker/u13:3-4057  [001]   352.238530: funcgraph_entry:      #
6184.250 us |  mwifiex_pm_wakeup_card();
   kworker/u13:0-199   [002]   356.626366: funcgraph_entry:      #
6553.166 us |  mwifiex_pm_wakeup_card();
   kworker/u13:3-4057  [002]   356.709389: funcgraph_entry:      #
6212.500 us |  mwifiex_pm_wakeup_card();
   kworker/u13:3-4057  [002]   356.847215: funcgraph_entry:      #
6230.292 us |  mwifiex_pm_wakeup_card();
   kworker/u13:3-4057  [000]   356.897576: funcgraph_entry:      #
6451.667 us |  mwifiex_pm_wakeup_card();
   kworker/u13:0-199   [004]   357.175025: funcgraph_entry:      #
6204.042 us |  mwifiex_pm_wakeup_card();

whereas it used to look more like:

   kworker/u13:1-173   [005]   212.230542: funcgraph_entry:
7.000 us   |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [005]   213.886063: funcgraph_entry:
9.334 us   |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [002]   214.473273: funcgraph_entry:      +
11.375 us  |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [005]   214.530705: funcgraph_entry:
5.542 us   |  mwifiex_pm_wakeup_card();
   kworker/u13:1-173   [002]   215.050168: funcgraph_entry:      +
13.125 us  |  mwifiex_pm_wakeup_card();
   kworker/u13:1-173   [002]   215.106492: funcgraph_entry:      +
11.959 us  |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [005]   215.484807: funcgraph_entry:
8.459 us   |  mwifiex_pm_wakeup_card();
   kworker/u13:1-173   [003]   215.515238: funcgraph_entry:      +
15.166 us  |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [001]   217.175691: funcgraph_entry:      +
11.083 us  |  mwifiex_pm_wakeup_card();

I think it would be enough to only modify mwifiex_pm_wakeup_card() to
retain the posted version, but I'm not sure how that fits in with the
rest of the author's work. I suppose I can give a later version a run
if needed.

Brian

Brian Norris Sept. 1, 2021, 9:07 p.m. UTC | #8

Apologies for the brain-dead mailer. I forget that I should only reply
via web when I _want_ text wrapping:

On Wed, Sep 01, 2021 at 02:04:04PM -0700, Brian Norris wrote:
> (b) latency spikes to ~6ms:

> # trace-cmd record -p function_graph -O funcgraph-abstime -l

> mwifiex_pm_wakeup_card

> # trace-cmd report

>    kworker/u13:0-199   [003]   348.987306: funcgraph_entry:      #

> 6219.500 us |  mwifiex_pm_wakeup_card();

>    kworker/u13:0-199   [003]   349.316312: funcgraph_entry:      #

> 6267.625 us |  mwifiex_pm_wakeup_card();

>    kworker/u13:3-4057  [001]   352.238530: funcgraph_entry:      #

> 6184.250 us |  mwifiex_pm_wakeup_card();

>    kworker/u13:0-199   [002]   356.626366: funcgraph_entry:      #

> 6553.166 us |  mwifiex_pm_wakeup_card();

>    kworker/u13:3-4057  [002]   356.709389: funcgraph_entry:      #

> 6212.500 us |  mwifiex_pm_wakeup_card();

>    kworker/u13:3-4057  [002]   356.847215: funcgraph_entry:      #

> 6230.292 us |  mwifiex_pm_wakeup_card();

>    kworker/u13:3-4057  [000]   356.897576: funcgraph_entry:      #

> 6451.667 us |  mwifiex_pm_wakeup_card();

>    kworker/u13:0-199   [004]   357.175025: funcgraph_entry:      #

> 6204.042 us |  mwifiex_pm_wakeup_card();

> 

> whereas it used to look more like:

> 

>    kworker/u13:1-173   [005]   212.230542: funcgraph_entry:

> 7.000 us   |  mwifiex_pm_wakeup_card();

>    kworker/u13:3-1768  [005]   213.886063: funcgraph_entry:

> 9.334 us   |  mwifiex_pm_wakeup_card();

>    kworker/u13:3-1768  [002]   214.473273: funcgraph_entry:      +

> 11.375 us  |  mwifiex_pm_wakeup_card();

>    kworker/u13:3-1768  [005]   214.530705: funcgraph_entry:

> 5.542 us   |  mwifiex_pm_wakeup_card();

>    kworker/u13:1-173   [002]   215.050168: funcgraph_entry:      +

> 13.125 us  |  mwifiex_pm_wakeup_card();

>    kworker/u13:1-173   [002]   215.106492: funcgraph_entry:      +

> 11.959 us  |  mwifiex_pm_wakeup_card();

>    kworker/u13:3-1768  [005]   215.484807: funcgraph_entry:

> 8.459 us   |  mwifiex_pm_wakeup_card();

>    kworker/u13:1-173   [003]   215.515238: funcgraph_entry:      +

> 15.166 us  |  mwifiex_pm_wakeup_card();

>    kworker/u13:3-1768  [001]   217.175691: funcgraph_entry:      +

> 11.083 us  |  mwifiex_pm_wakeup_card();


That should read:

# trace-cmd record -p function_graph -O funcgraph-abstime -l mwifiex_pm_wakeup_card
# trace-cmd report
   kworker/u13:0-199   [003]   348.987306: funcgraph_entry:      # 6219.500 us |  mwifiex_pm_wakeup_card();
   kworker/u13:0-199   [003]   349.316312: funcgraph_entry:      # 6267.625 us |  mwifiex_pm_wakeup_card();
   kworker/u13:3-4057  [001]   352.238530: funcgraph_entry:      # 6184.250 us |  mwifiex_pm_wakeup_card();
   kworker/u13:0-199   [002]   356.626366: funcgraph_entry:      # 6553.166 us |  mwifiex_pm_wakeup_card();
   kworker/u13:3-4057  [002]   356.709389: funcgraph_entry:      # 6212.500 us |  mwifiex_pm_wakeup_card();
   kworker/u13:3-4057  [002]   356.847215: funcgraph_entry:      # 6230.292 us |  mwifiex_pm_wakeup_card();
   kworker/u13:3-4057  [000]   356.897576: funcgraph_entry:      # 6451.667 us |  mwifiex_pm_wakeup_card();
   kworker/u13:0-199   [004]   357.175025: funcgraph_entry:      # 6204.042 us |  mwifiex_pm_wakeup_card();

vs.

   kworker/u13:1-173   [005]   212.230542: funcgraph_entry:        7.000 us   |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [005]   213.886063: funcgraph_entry:        9.334 us   |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [002]   214.473273: funcgraph_entry:      + 11.375 us  |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [005]   214.530705: funcgraph_entry:        5.542 us   |  mwifiex_pm_wakeup_card();
   kworker/u13:1-173   [002]   215.050168: funcgraph_entry:      + 13.125 us  |  mwifiex_pm_wakeup_card();
   kworker/u13:1-173   [002]   215.106492: funcgraph_entry:      + 11.959 us  |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [005]   215.484807: funcgraph_entry:        8.459 us   |  mwifiex_pm_wakeup_card();
   kworker/u13:1-173   [003]   215.515238: funcgraph_entry:      + 15.166 us  |  mwifiex_pm_wakeup_card();
   kworker/u13:3-1768  [001]   217.175691: funcgraph_entry:      + 11.083 us  |  mwifiex_pm_wakeup_card();

Brian

Bjorn Helgaas Sept. 1, 2021, 10:41 p.m. UTC | #9

On Wed, Sep 01, 2021 at 07:07:58PM +0200, Johannes Berg wrote:
> On Wed, 2021-09-01 at 18:51 +0200, Heiner Kallweit wrote:
> > On 01.09.2021 17:51, Pali Rohár wrote:
> > > On Wednesday 01 September 2021 16:01:54 Jonas Dreßler wrote:
> > > > On 8/30/21 2:49 PM, Andy Shevchenko wrote:
> > > > > On Mon, Aug 30, 2021 at 3:38 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
> > > > > > 
> > > > > > On the 88W8897 card it's very important the TX ring write pointer is
> > > > > > updated correctly to its new value before setting the TX ready
> > > > > > interrupt, otherwise the firmware appears to crash (probably because
> > > > > > it's trying to DMA-read from the wrong place).
> > > > > > 
> > 
> > This sounds somehow like the typical case where you write DMA descriptors
> > and then ring the doorbell. This normally requires a dma_wmb().
> > Maybe something like that is missing here?
> 
> But it looks like this "TX ring write pointer" is actually the register?
> 
> However, I would agree that doing it in mwifiex_write_reg() is possibly
> too big a hammer - could be done only for reg->tx_wrptr, not all the
> registers?
> 
> Actually, can two writes actually cross on PCI?

Per PCIe r5.0, sec 2.4.1,

  A2a  A Posted Request must not pass another Posted Request unless A2b
       applies.

  A2b  A Posted Request with RO Set is permitted to pass another
       Posted Request.  A Posted Request with IDO Set is permitted to
       pass another Posted Request if the two Requester IDs are
       different or if both Requests contain a PASID TLP Prefix and
       the two PASID values are different.

A few drivers enable RO (Relaxed Ordering) for their devices, which
means the *device* is permitted to set the RO bit in transactions it
initiates.

BUt IIUC we're talking about MMIO writes initiated by a CPU, and they
won't have the RO bit set unless the Root Port has Relaxed Ordering
enabled, and Linux generally does not enable that.  So A2a should
apply, and writes should be ordered on PCI.

There are a few wrinkles that I worry about:

  d1e714db8129 ("mtip32xx: Fix ERO and NoSnoop values in PCIe upstream
  on AMD systems") [1] turns off RO for some AMD Root Ports, which
  makes me think BIOS might be enabling RO in these Root Ports.

  c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5
  Completion erratum") [2] turns off RO for all Root Ports leading to
  Chelsio T5 devices, which again makes me think there's firmware that
  enables RO in Root Ports.  Follow-up [3].

  77ffc1465cec ("tegra: add PCI Express support") [4] (see
  tegra_pcie_relax_enable()) enables RO for Tegra Root Ports due to
  some hardware issue.  I don't whether these Root Ports every
  actually *set* RO in the PCIe transactions they generate.  Follow-up
  [5].

These concern me because I don't think we have a way for drivers to
specify whether their writes should use strong ordering or relaxed
ordering, and I think they depend on strong ordering.  If Root Ports
have RO enabled, I think we are at risk, so I suspect Linux should
actively *disable* RO for Root Ports.

[1] https://git.kernel.org/linus/d1e714db8129
[2] https://git.kernel.org/linus/c56d4450eb68
[3] https://lore.kernel.org/r/20210901222353.GA251391@bjorn-Precision-5520
[4] https://git.kernel.org/linus/77ffc1465cec
[5] https://lore.kernel.org/r/20210901204045.GA236987@bjorn-Precision-5520

Bjorn Helgaas Sept. 2, 2021, 2:05 p.m. UTC | #10

[+cc Keith]

On Wed, Sep 01, 2021 at 05:41:56PM -0500, Bjorn Helgaas wrote:
> On Wed, Sep 01, 2021 at 07:07:58PM +0200, Johannes Berg wrote:

> > On Wed, 2021-09-01 at 18:51 +0200, Heiner Kallweit wrote:

> > > On 01.09.2021 17:51, Pali Rohár wrote:

> > > > On Wednesday 01 September 2021 16:01:54 Jonas Dreßler wrote:

> > > > > On 8/30/21 2:49 PM, Andy Shevchenko wrote:

> > > > > > On Mon, Aug 30, 2021 at 3:38 PM Jonas Dreßler <verdre@v0yd.nl> wrote:

> > > > > > > 

> > > > > > > On the 88W8897 card it's very important the TX ring write pointer is

> > > > > > > updated correctly to its new value before setting the TX ready

> > > > > > > interrupt, otherwise the firmware appears to crash (probably because

> > > > > > > it's trying to DMA-read from the wrong place).

> > > > > > > 

> > > 

> > > This sounds somehow like the typical case where you write DMA descriptors

> > > and then ring the doorbell. This normally requires a dma_wmb().

> > > Maybe something like that is missing here?

> > 

> > But it looks like this "TX ring write pointer" is actually the register?

> > 

> > However, I would agree that doing it in mwifiex_write_reg() is possibly

> > too big a hammer - could be done only for reg->tx_wrptr, not all the

> > registers?

> > 

> > Actually, can two writes actually cross on PCI?

> 

> Per PCIe r5.0, sec 2.4.1,

> 

>   A2a  A Posted Request must not pass another Posted Request unless A2b

>        applies.

> 

>   A2b  A Posted Request with RO Set is permitted to pass another

>        Posted Request.  A Posted Request with IDO Set is permitted to

>        pass another Posted Request if the two Requester IDs are

>        different or if both Requests contain a PASID TLP Prefix and

>        the two PASID values are different.

> 

> A few drivers enable RO (Relaxed Ordering) for their devices, which

> means the *device* is permitted to set the RO bit in transactions it

> initiates.

> 

> BUt IIUC we're talking about MMIO writes initiated by a CPU, and they

> won't have the RO bit set unless the Root Port has Relaxed Ordering

> enabled, and Linux generally does not enable that.  So A2a should

> apply, and writes should be ordered on PCI.

> 

> There are a few wrinkles that I worry about:

> 

>   d1e714db8129 ("mtip32xx: Fix ERO and NoSnoop values in PCIe upstream

>   on AMD systems") [1] turns off RO for some AMD Root Ports, which

>   makes me think BIOS might be enabling RO in these Root Ports.

> 

>   c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5

>   Completion erratum") [2] turns off RO for all Root Ports leading to

>   Chelsio T5 devices, which again makes me think there's firmware that

>   enables RO in Root Ports.  Follow-up [3].

> 

>   77ffc1465cec ("tegra: add PCI Express support") [4] (see

>   tegra_pcie_relax_enable()) enables RO for Tegra Root Ports due to

>   some hardware issue.  I don't whether these Root Ports every

>   actually *set* RO in the PCIe transactions they generate.  Follow-up

>   [5].

> 

> These concern me because I don't think we have a way for drivers to

> specify whether their writes should use strong ordering or relaxed

> ordering, and I think they depend on strong ordering.


Sorry, I was completely wrong about this part.  Keith Busch reminded
me that writel_relaxed() is an obvious way for drivers to specify that
they don't need strong ordering.

I apologize for the distraction.

> If Root Ports have RO enabled, I think we are at risk, so I suspect

> Linux should actively *disable* RO for Root Ports.

> 

> [1] https://git.kernel.org/linus/d1e714db8129

> [2] https://git.kernel.org/linus/c56d4450eb68

> [3] https://lore.kernel.org/r/20210901222353.GA251391@bjorn-Precision-5520

> [4] https://git.kernel.org/linus/77ffc1465cec

> [5] https://lore.kernel.org/r/20210901204045.GA236987@bjorn-Precision-5520

Jonas Dreßler Sept. 18, 2021, 7:37 a.m. UTC | #11

On 9/1/21 11:07 PM, Brian Norris wrote:
> Apologies for the brain-dead mailer. I forget that I should only reply

> via web when I _want_ text wrapping:

> 

> On Wed, Sep 01, 2021 at 02:04:04PM -0700, Brian Norris wrote:

>> (b) latency spikes to ~6ms:

>> # trace-cmd record -p function_graph -O funcgraph-abstime -l

>> mwifiex_pm_wakeup_card

>> # trace-cmd report

>>     kworker/u13:0-199   [003]   348.987306: funcgraph_entry:      #

>> 6219.500 us |  mwifiex_pm_wakeup_card();

>>     kworker/u13:0-199   [003]   349.316312: funcgraph_entry:      #

>> 6267.625 us |  mwifiex_pm_wakeup_card();

>>     kworker/u13:3-4057  [001]   352.238530: funcgraph_entry:      #

>> 6184.250 us |  mwifiex_pm_wakeup_card();

>>     kworker/u13:0-199   [002]   356.626366: funcgraph_entry:      #

>> 6553.166 us |  mwifiex_pm_wakeup_card();

>>     kworker/u13:3-4057  [002]   356.709389: funcgraph_entry:      #

>> 6212.500 us |  mwifiex_pm_wakeup_card();

>>     kworker/u13:3-4057  [002]   356.847215: funcgraph_entry:      #

>> 6230.292 us |  mwifiex_pm_wakeup_card();

>>     kworker/u13:3-4057  [000]   356.897576: funcgraph_entry:      #

>> 6451.667 us |  mwifiex_pm_wakeup_card();

>>     kworker/u13:0-199   [004]   357.175025: funcgraph_entry:      #

>> 6204.042 us |  mwifiex_pm_wakeup_card();

>>

>> whereas it used to look more like:

>>

>>     kworker/u13:1-173   [005]   212.230542: funcgraph_entry:

>> 7.000 us   |  mwifiex_pm_wakeup_card();

>>     kworker/u13:3-1768  [005]   213.886063: funcgraph_entry:

>> 9.334 us   |  mwifiex_pm_wakeup_card();

>>     kworker/u13:3-1768  [002]   214.473273: funcgraph_entry:      +

>> 11.375 us  |  mwifiex_pm_wakeup_card();

>>     kworker/u13:3-1768  [005]   214.530705: funcgraph_entry:

>> 5.542 us   |  mwifiex_pm_wakeup_card();

>>     kworker/u13:1-173   [002]   215.050168: funcgraph_entry:      +

>> 13.125 us  |  mwifiex_pm_wakeup_card();

>>     kworker/u13:1-173   [002]   215.106492: funcgraph_entry:      +

>> 11.959 us  |  mwifiex_pm_wakeup_card();

>>     kworker/u13:3-1768  [005]   215.484807: funcgraph_entry:

>> 8.459 us   |  mwifiex_pm_wakeup_card();

>>     kworker/u13:1-173   [003]   215.515238: funcgraph_entry:      +

>> 15.166 us  |  mwifiex_pm_wakeup_card();

>>     kworker/u13:3-1768  [001]   217.175691: funcgraph_entry:      +

>> 11.083 us  |  mwifiex_pm_wakeup_card();

> 

> That should read:

> 

> # trace-cmd record -p function_graph -O funcgraph-abstime -l mwifiex_pm_wakeup_card

> # trace-cmd report

>     kworker/u13:0-199   [003]   348.987306: funcgraph_entry:      # 6219.500 us |  mwifiex_pm_wakeup_card();

>     kworker/u13:0-199   [003]   349.316312: funcgraph_entry:      # 6267.625 us |  mwifiex_pm_wakeup_card();

>     kworker/u13:3-4057  [001]   352.238530: funcgraph_entry:      # 6184.250 us |  mwifiex_pm_wakeup_card();

>     kworker/u13:0-199   [002]   356.626366: funcgraph_entry:      # 6553.166 us |  mwifiex_pm_wakeup_card();

>     kworker/u13:3-4057  [002]   356.709389: funcgraph_entry:      # 6212.500 us |  mwifiex_pm_wakeup_card();

>     kworker/u13:3-4057  [002]   356.847215: funcgraph_entry:      # 6230.292 us |  mwifiex_pm_wakeup_card();

>     kworker/u13:3-4057  [000]   356.897576: funcgraph_entry:      # 6451.667 us |  mwifiex_pm_wakeup_card();

>     kworker/u13:0-199   [004]   357.175025: funcgraph_entry:      # 6204.042 us |  mwifiex_pm_wakeup_card();

> 

> vs.

> 

>     kworker/u13:1-173   [005]   212.230542: funcgraph_entry:        7.000 us   |  mwifiex_pm_wakeup_card();

>     kworker/u13:3-1768  [005]   213.886063: funcgraph_entry:        9.334 us   |  mwifiex_pm_wakeup_card();

>     kworker/u13:3-1768  [002]   214.473273: funcgraph_entry:      + 11.375 us  |  mwifiex_pm_wakeup_card();

>     kworker/u13:3-1768  [005]   214.530705: funcgraph_entry:        5.542 us   |  mwifiex_pm_wakeup_card();

>     kworker/u13:1-173   [002]   215.050168: funcgraph_entry:      + 13.125 us  |  mwifiex_pm_wakeup_card();

>     kworker/u13:1-173   [002]   215.106492: funcgraph_entry:      + 11.959 us  |  mwifiex_pm_wakeup_card();

>     kworker/u13:3-1768  [005]   215.484807: funcgraph_entry:        8.459 us   |  mwifiex_pm_wakeup_card();

>     kworker/u13:1-173   [003]   215.515238: funcgraph_entry:      + 15.166 us  |  mwifiex_pm_wakeup_card();

>     kworker/u13:3-1768  [001]   217.175691: funcgraph_entry:      + 11.083 us  |  mwifiex_pm_wakeup_card();

> 

> Brian

> 


Thanks for the pointer to that commit Brian, it turns out this is 
actually the change that causes the "Firmware wakeup failed" issues that 
I'm trying to fix with the second patch here.

Also my approach is a lot messier than just reverting 
062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3 and also appears to be blocking 
even longer...

Does anyone have an idea what could be the reason for the posted write 
not going through, or could that also be a potential firmware bug in the 
chip?

Jonas

Brian Norris Sept. 20, 2021, 5:48 p.m. UTC | #12

On Sat, Sep 18, 2021 at 12:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
> Thanks for the pointer to that commit Brian, it turns out this is

> actually the change that causes the "Firmware wakeup failed" issues that

> I'm trying to fix with the second patch here.

Huh. That's interesting, although I guess it makes some sense given
your theory of "dropped writes". FWIW, this strategy (post a single
write, then wait for wakeup) is the same used by some other
chips/drivers too (e.g., ath10k/pci), although in those cases card
wakeup is much much faster. But if the bus was dropping writes
somehow, those strategies would fail too.

> Also my approach is a lot messier than just reverting

> 062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3 and also appears to be blocking

> even longer...

For the record, in case you're talking about my data ("blocking even
longer"): I was only testing patch 1. Patch 2 isn't really relevant to
my particular systems (Rockchip RK3399 + Marvell 8997/PCIe), because
(a) I'm pretty sure my system isn't "dropping" any reads or writes
(b) all my delay is in the read-back; the Rockchip PCIe bus is waiting
indefinitely for the card to wake up, instead of timing out and
reporting all-1's like many x86 systems appear to do (I've tested
this).

So, the 6ms delay is entirely sitting in the ioread32(), not a delay loop.

I haven't yet tried your version 2 (which avoids the blocking read to
wake up; good!), but it sounds like in theory it could solve your
problem while avoiding 6ms delays for me. I intend to test your v2
this week.

> Does anyone have an idea what could be the reason for the posted write

> not going through, or could that also be a potential firmware bug in the

> chip?

I have no clue about that. That does sound downright horrible, but so
are many things when dealing with this family of hardware/firmware.
I'm not sure how to prove out whether this is a host bus problem, or
an endpoint/firmware problem, other than perhaps trying the same
module/firmware on another system, if that's possible.

Anyway, to reiterate: I'm not fundamentally opposed to v2 (pending a
test run here), even if it is a bit ugly and perhaps not 100%
understood.

Brian

Jonas Dreßler Sept. 23, 2021, 3:28 p.m. UTC | #13

On 9/22/21 2:50 PM, Jonas Dreßler wrote:
> On 9/20/21 7:48 PM, Brian Norris wrote:

>> On Sat, Sep 18, 2021 at 12:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:

>>> Thanks for the pointer to that commit Brian, it turns out this is

>>> actually the change that causes the "Firmware wakeup failed" issues that

>>> I'm trying to fix with the second patch here.

>>

>> Huh. That's interesting, although I guess it makes some sense given

>> your theory of "dropped writes". FWIW, this strategy (post a single

>> write, then wait for wakeup) is the same used by some other

>> chips/drivers too (e.g., ath10k/pci), although in those cases card

>> wakeup is much much faster. But if the bus was dropping writes

>> somehow, those strategies would fail too.

>>

>>> Also my approach is a lot messier than just reverting

>>> 062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3 and also appears to be blocking

>>> even longer...

>>

>> For the record, in case you're talking about my data ("blocking even

>> longer"): I was only testing patch 1. Patch 2 isn't really relevant to

>> my particular systems (Rockchip RK3399 + Marvell 8997/PCIe), because

>> (a) I'm pretty sure my system isn't "dropping" any reads or writes

>> (b) all my delay is in the read-back; the Rockchip PCIe bus is waiting

>> indefinitely for the card to wake up, instead of timing out and

>> reporting all-1's like many x86 systems appear to do (I've tested

>> this).

>>

>> So, the 6ms delay is entirely sitting in the ioread32(), not a delay 

>> loop.

>>

>> I haven't yet tried your version 2 (which avoids the blocking read to

>> wake up; good!), but it sounds like in theory it could solve your

>> problem while avoiding 6ms delays for me. I intend to test your v2

>> this week.

>>

> 

> With "blocking even longer" I meant that (on my system) the delay-loop 

> blocks even longer than waking up the card via mwifiex_read_reg() (both 

> are in the orders of milliseconds). And given that in certain cases the 

> card wakeup (or a write getting through to the card, I have no idea) can 

> take extremely long, I'd feel more confident going with the 

> mwifiex_read_reg() method to wake up the card.

> 

> Anyway, you know what's even weirder with all this: I've been testing 

> the first commit of patch v2 (so just the single read-back instead of 

> the big hammer) together with 062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3 

> reverted for a good week now and haven't seen any wakeup failure yet. 

> Otoh I'm fairly sure the big hammer with reading back every write wasn't 

> enough to fix the wakeup failures, otherwise I wouldn't even have 

> started working on the second commit.

> 

> So that would mean there's a difference between writing and then reading 

> back vs only reading to wake up the card: Only the latter fixes the 

> wakeup failures.

> 

>>> Does anyone have an idea what could be the reason for the posted write

>>> not going through, or could that also be a potential firmware bug in the

>>> chip?

>>

>> I have no clue about that. That does sound downright horrible, but so

>> are many things when dealing with this family of hardware/firmware.

>> I'm not sure how to prove out whether this is a host bus problem, or

>> an endpoint/firmware problem, other than perhaps trying the same

>> module/firmware on another system, if that's possible.

>>

>> Anyway, to reiterate: I'm not fundamentally opposed to v2 (pending a

>> test run here), even if it is a bit ugly and perhaps not 100%

>> understood.

>>

> 

> I'm not 100% sure about all this yet, I think I'm gonna try to confirm 

> my older findings once again now and then we'll see. FTR, would you be 

> fine with using the mwifiex_read_reg() method to wake up the card and 

> somehow quirking your system to use write_reg()?

> 

>> Brian

>>

> 


Okay, so I finally managed to find my exact reproducer for the bug again:

1) Make sure wifi powersaving is enabled (iw dev wlp1s0 set power_save on)
2) Connect to any wifi network (makes firmware go into wifi powersaving 
mode, not deep sleep)
3) Make sure bluetooth is turned off (to ensure the firmware actually 
enters powersave mode and doesn't keep the radio active doing bluetooth 
stuff)
4) To confirm that wifi powersaving is entered ping a device on the LAN, 
pings should be a few ms higher than without powersaving
5) Run "while true; do iwconfig; sleep 0.0001; done", this wakes and 
suspends the firmware extremely often
6) Wait until things explode, for me it consistently takes <5 minutes

Using this reproducer I was able to clear things up a bit:

- There still are wakeup failures when using (only) mwifiex_read_reg() 
to wake the card, so there's no weird difference between waking up using 
read vs write+read-back

- Just calling mwifiex_write_reg() once and then blocking until the card 
wakes up using my delay-loop doesn't fix the issue, it's actually 
writing multiple times that fixes the issue

These observations sound a lot like writes (and even reads) are actually 
being dropped, don't they?

Andy Shevchenko Sept. 23, 2021, 7:37 p.m. UTC | #14

On Thu, Sep 23, 2021 at 6:28 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
>

> On 9/22/21 2:50 PM, Jonas Dreßler wrote:

> > On 9/20/21 7:48 PM, Brian Norris wrote:

> >> On Sat, Sep 18, 2021 at 12:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:

> >>> Thanks for the pointer to that commit Brian, it turns out this is

> >>> actually the change that causes the "Firmware wakeup failed" issues that

> >>> I'm trying to fix with the second patch here.

> >>

> >> Huh. That's interesting, although I guess it makes some sense given

> >> your theory of "dropped writes". FWIW, this strategy (post a single

> >> write, then wait for wakeup) is the same used by some other

> >> chips/drivers too (e.g., ath10k/pci), although in those cases card

> >> wakeup is much much faster. But if the bus was dropping writes

> >> somehow, those strategies would fail too.

> >>

> >>> Also my approach is a lot messier than just reverting

> >>> 062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3 and also appears to be blocking

> >>> even longer...

> >>

> >> For the record, in case you're talking about my data ("blocking even

> >> longer"): I was only testing patch 1. Patch 2 isn't really relevant to

> >> my particular systems (Rockchip RK3399 + Marvell 8997/PCIe), because

> >> (a) I'm pretty sure my system isn't "dropping" any reads or writes

> >> (b) all my delay is in the read-back; the Rockchip PCIe bus is waiting

> >> indefinitely for the card to wake up, instead of timing out and

> >> reporting all-1's like many x86 systems appear to do (I've tested

> >> this).

> >>

> >> So, the 6ms delay is entirely sitting in the ioread32(), not a delay

> >> loop.

> >>

> >> I haven't yet tried your version 2 (which avoids the blocking read to

> >> wake up; good!), but it sounds like in theory it could solve your

> >> problem while avoiding 6ms delays for me. I intend to test your v2

> >> this week.

> >>

> >

> > With "blocking even longer" I meant that (on my system) the delay-loop

> > blocks even longer than waking up the card via mwifiex_read_reg() (both

> > are in the orders of milliseconds). And given that in certain cases the

> > card wakeup (or a write getting through to the card, I have no idea) can

> > take extremely long, I'd feel more confident going with the

> > mwifiex_read_reg() method to wake up the card.

> >

> > Anyway, you know what's even weirder with all this: I've been testing

> > the first commit of patch v2 (so just the single read-back instead of

> > the big hammer) together with 062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3

> > reverted for a good week now and haven't seen any wakeup failure yet.

> > Otoh I'm fairly sure the big hammer with reading back every write wasn't

> > enough to fix the wakeup failures, otherwise I wouldn't even have

> > started working on the second commit.

> >

> > So that would mean there's a difference between writing and then reading

> > back vs only reading to wake up the card: Only the latter fixes the

> > wakeup failures.

> >

> >>> Does anyone have an idea what could be the reason for the posted write

> >>> not going through, or could that also be a potential firmware bug in the

> >>> chip?

> >>

> >> I have no clue about that. That does sound downright horrible, but so

> >> are many things when dealing with this family of hardware/firmware.

> >> I'm not sure how to prove out whether this is a host bus problem, or

> >> an endpoint/firmware problem, other than perhaps trying the same

> >> module/firmware on another system, if that's possible.

> >>

> >> Anyway, to reiterate: I'm not fundamentally opposed to v2 (pending a

> >> test run here), even if it is a bit ugly and perhaps not 100%

> >> understood.

> >>

> >

> > I'm not 100% sure about all this yet, I think I'm gonna try to confirm

> > my older findings once again now and then we'll see. FTR, would you be

> > fine with using the mwifiex_read_reg() method to wake up the card and

> > somehow quirking your system to use write_reg()?

> >

> >> Brian

> >>

> >

>

> Okay, so I finally managed to find my exact reproducer for the bug again:

>

> 1) Make sure wifi powersaving is enabled (iw dev wlp1s0 set power_save on)

> 2) Connect to any wifi network (makes firmware go into wifi powersaving

> mode, not deep sleep)

> 3) Make sure bluetooth is turned off (to ensure the firmware actually

> enters powersave mode and doesn't keep the radio active doing bluetooth

> stuff)

> 4) To confirm that wifi powersaving is entered ping a device on the LAN,

> pings should be a few ms higher than without powersaving

> 5) Run "while true; do iwconfig; sleep 0.0001; done", this wakes and

> suspends the firmware extremely often

> 6) Wait until things explode, for me it consistently takes <5 minutes

>

> Using this reproducer I was able to clear things up a bit:

>

> - There still are wakeup failures when using (only) mwifiex_read_reg()

> to wake the card, so there's no weird difference between waking up using

> read vs write+read-back

>

> - Just calling mwifiex_write_reg() once and then blocking until the card

> wakes up using my delay-loop doesn't fix the issue, it's actually

> writing multiple times that fixes the issue

>

> These observations sound a lot like writes (and even reads) are actually

> being dropped, don't they?




-- 
With Best Regards,
Andy Shevchenko

Andy Shevchenko Sept. 23, 2021, 7:41 p.m. UTC | #15

On Thu, Sep 23, 2021 at 6:28 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
> On 9/22/21 2:50 PM, Jonas Dreßler wrote:


...

> - Just calling mwifiex_write_reg() once and then blocking until the card

> wakes up using my delay-loop doesn't fix the issue, it's actually

> writing multiple times that fixes the issue

>

> These observations sound a lot like writes (and even reads) are actually

> being dropped, don't they?


It sounds like you're writing into a not ready (fully powered on) device.

To check this, try to put a busy loop for reading and check the value
till it gets 0.

Something like

  unsigned int count = 1000;

  do {
    if (mwifiex_read_reg(...) == 0)
      break;
  } while (--count);


-- 
With Best Regards,
Andy Shevchenko

Pali Rohár Sept. 23, 2021, 8:22 p.m. UTC | #16

On Thursday 23 September 2021 22:41:30 Andy Shevchenko wrote:
> On Thu, Sep 23, 2021 at 6:28 PM Jonas Dreßler <verdre@v0yd.nl> wrote:

> > On 9/22/21 2:50 PM, Jonas Dreßler wrote:

> 

> ...

> 

> > - Just calling mwifiex_write_reg() once and then blocking until the card

> > wakes up using my delay-loop doesn't fix the issue, it's actually

> > writing multiple times that fixes the issue

> >

> > These observations sound a lot like writes (and even reads) are actually

> > being dropped, don't they?

> 

> It sounds like you're writing into a not ready (fully powered on) device.


This reminds me a discussion with Bjorn about CRS response returned
after firmware crash / reset when device is not ready yet:
https://lore.kernel.org/linux-pci/20210922164803.GA203171@bhelgaas/

Could not be this similar issue? You could check it via reading
PCI_VENDOR_ID register from config space. And if it is not valid value
then card is not really ready yet.

> To check this, try to put a busy loop for reading and check the value

> till it gets 0.

> 

> Something like

> 

>   unsigned int count = 1000;

> 

>   do {

>     if (mwifiex_read_reg(...) == 0)

>       break;

>   } while (--count);

> 

> 

> -- 

> With Best Regards,

> Andy Shevchenko

Pali Rohár Sept. 30, 2021, 3:42 p.m. UTC | #17

On Thursday 30 September 2021 17:38:43 Jonas Dreßler wrote:
> On 9/23/21 10:22 PM, Pali Rohár wrote:

> > On Thursday 23 September 2021 22:41:30 Andy Shevchenko wrote:

> > > On Thu, Sep 23, 2021 at 6:28 PM Jonas Dreßler <verdre@v0yd.nl> wrote:

> > > > On 9/22/21 2:50 PM, Jonas Dreßler wrote:

> > > 

> > > ...

> > > 

> > > > - Just calling mwifiex_write_reg() once and then blocking until the card

> > > > wakes up using my delay-loop doesn't fix the issue, it's actually

> > > > writing multiple times that fixes the issue

> > > > 

> > > > These observations sound a lot like writes (and even reads) are actually

> > > > being dropped, don't they?

> > > 

> > > It sounds like you're writing into a not ready (fully powered on) device.

> > 

> > This reminds me a discussion with Bjorn about CRS response returned

> > after firmware crash / reset when device is not ready yet:

> > https://lore.kernel.org/linux-pci/20210922164803.GA203171@bhelgaas/

> > 

> > Could not be this similar issue? You could check it via reading

> > PCI_VENDOR_ID register from config space. And if it is not valid value

> > then card is not really ready yet.

> > 

> > > To check this, try to put a busy loop for reading and check the value

> > > till it gets 0.

> > > 

> > > Something like

> > > 

> > >    unsigned int count = 1000;

> > > 

> > >    do {

> > >      if (mwifiex_read_reg(...) == 0)

> > >        break;

> > >    } while (--count);

> > > 

> > > 

> > > -- 

> > > With Best Regards,

> > > Andy Shevchenko

> 

> I've tried both reading PCI_VENDOR_ID and the firmware status using a busy

> loop now, but sadly none of them worked. It looks like the card always

> replies with the correct values even though it sometimes won't wake up after

> that.

> 

> I do have one new observation though, although I've no clue what could be

> happening here: When reading PCI_VENDOR_ID 1000 times to wakeup we can

> "predict" the wakeup failure because exactly one (usually around the 20th)

> of those 1000 reads will fail.


What does "fail" means here?

> Maybe the firmware actually tries to wake up,

> encounters an error somewhere in its wakeup routines and then goes down a

> special failure code path. That code path keeps the cards CPU so busy that

> at some point a PCI_VENDOR_ID request times out?

> 

> Or well, maybe the card actually wakes up fine, but we don't receive the

> interrupt on our end, so many possibilities...

Jonas Dreßler Sept. 30, 2021, 4:14 p.m. UTC | #18

On 9/30/21 5:42 PM, Pali Rohár wrote:
> On Thursday 30 September 2021 17:38:43 Jonas Dreßler wrote:

>> On 9/23/21 10:22 PM, Pali Rohár wrote:

>>> On Thursday 23 September 2021 22:41:30 Andy Shevchenko wrote:

>>>> On Thu, Sep 23, 2021 at 6:28 PM Jonas Dreßler <verdre@v0yd.nl> wrote:

>>>>> On 9/22/21 2:50 PM, Jonas Dreßler wrote:

>>>>

>>>> ...

>>>>

>>>>> - Just calling mwifiex_write_reg() once and then blocking until the card

>>>>> wakes up using my delay-loop doesn't fix the issue, it's actually

>>>>> writing multiple times that fixes the issue

>>>>>

>>>>> These observations sound a lot like writes (and even reads) are actually

>>>>> being dropped, don't they?

>>>>

>>>> It sounds like you're writing into a not ready (fully powered on) device.

>>>

>>> This reminds me a discussion with Bjorn about CRS response returned

>>> after firmware crash / reset when device is not ready yet:

>>> https://lore.kernel.org/linux-pci/20210922164803.GA203171@bhelgaas/

>>>

>>> Could not be this similar issue? You could check it via reading

>>> PCI_VENDOR_ID register from config space. And if it is not valid value

>>> then card is not really ready yet.

>>>

>>>> To check this, try to put a busy loop for reading and check the value

>>>> till it gets 0.

>>>>

>>>> Something like

>>>>

>>>>     unsigned int count = 1000;

>>>>

>>>>     do {

>>>>       if (mwifiex_read_reg(...) == 0)

>>>>         break;

>>>>     } while (--count);

>>>>

>>>>

>>>> -- 

>>>> With Best Regards,

>>>> Andy Shevchenko

>>

>> I've tried both reading PCI_VENDOR_ID and the firmware status using a busy

>> loop now, but sadly none of them worked. It looks like the card always

>> replies with the correct values even though it sometimes won't wake up after

>> that.

>>

>> I do have one new observation though, although I've no clue what could be

>> happening here: When reading PCI_VENDOR_ID 1000 times to wakeup we can

>> "predict" the wakeup failure because exactly one (usually around the 20th)

>> of those 1000 reads will fail.

> 

> What does "fail" means here?


ioread32() returns all ones, that's interpreted as failure by 
mwifiex_read_reg().

> 

>> Maybe the firmware actually tries to wake up,

>> encounters an error somewhere in its wakeup routines and then goes down a

>> special failure code path. That code path keeps the cards CPU so busy that

>> at some point a PCI_VENDOR_ID request times out?

>>

>> Or well, maybe the card actually wakes up fine, but we don't receive the

>> interrupt on our end, so many possibilities...

Jonas Dreßler Sept. 30, 2021, 4:22 p.m. UTC | #19

On 9/30/21 6:19 PM, Pali Rohár wrote:
> On Thursday 30 September 2021 18:14:04 Jonas Dreßler wrote:

>> On 9/30/21 5:42 PM, Pali Rohár wrote:

>>> On Thursday 30 September 2021 17:38:43 Jonas Dreßler wrote:

>>>> On 9/23/21 10:22 PM, Pali Rohár wrote:

>>>>> On Thursday 23 September 2021 22:41:30 Andy Shevchenko wrote:

>>>>>> On Thu, Sep 23, 2021 at 6:28 PM Jonas Dreßler <verdre@v0yd.nl> wrote:

>>>>>>> On 9/22/21 2:50 PM, Jonas Dreßler wrote:

>>>>>>

>>>>>> ...

>>>>>>

>>>>>>> - Just calling mwifiex_write_reg() once and then blocking until the card

>>>>>>> wakes up using my delay-loop doesn't fix the issue, it's actually

>>>>>>> writing multiple times that fixes the issue

>>>>>>>

>>>>>>> These observations sound a lot like writes (and even reads) are actually

>>>>>>> being dropped, don't they?

>>>>>>

>>>>>> It sounds like you're writing into a not ready (fully powered on) device.

>>>>>

>>>>> This reminds me a discussion with Bjorn about CRS response returned

>>>>> after firmware crash / reset when device is not ready yet:

>>>>> https://lore.kernel.org/linux-pci/20210922164803.GA203171@bhelgaas/

>>>>>

>>>>> Could not be this similar issue? You could check it via reading

>>>>> PCI_VENDOR_ID register from config space. And if it is not valid value

>>>>> then card is not really ready yet.

>>>>>

>>>>>> To check this, try to put a busy loop for reading and check the value

>>>>>> till it gets 0.

>>>>>>

>>>>>> Something like

>>>>>>

>>>>>>      unsigned int count = 1000;

>>>>>>

>>>>>>      do {

>>>>>>        if (mwifiex_read_reg(...) == 0)

>>>>>>          break;

>>>>>>      } while (--count);

>>>>>>

>>>>>>

>>>>>> -- 

>>>>>> With Best Regards,

>>>>>> Andy Shevchenko

>>>>

>>>> I've tried both reading PCI_VENDOR_ID and the firmware status using a busy

>>>> loop now, but sadly none of them worked. It looks like the card always

>>>> replies with the correct values even though it sometimes won't wake up after

>>>> that.

>>>>

>>>> I do have one new observation though, although I've no clue what could be

>>>> happening here: When reading PCI_VENDOR_ID 1000 times to wakeup we can

>>>> "predict" the wakeup failure because exactly one (usually around the 20th)

>>>> of those 1000 reads will fail.

>>>

>>> What does "fail" means here?

>>

>> ioread32() returns all ones, that's interpreted as failure by

>> mwifiex_read_reg().

> 

> Ok. And can you check if PCI Bridge above this card has enabled CRSSVE

> bit (CRSVisible in RootCtl+RootCap in lspci output)? To determinate if

> Bridge could convert CRS response to all-ones as failed transaction.

> 


Seems like that bit is disabled:
 > RootCap: CRSVisible-

 > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-



>>>

>>>> Maybe the firmware actually tries to wake up,

>>>> encounters an error somewhere in its wakeup routines and then goes down a

>>>> special failure code path. That code path keeps the cards CPU so busy that

>>>> at some point a PCI_VENDOR_ID request times out?

>>>>

>>>> Or well, maybe the card actually wakes up fine, but we don't receive the

>>>> interrupt on our end, so many possibilities...

Pali Rohár Sept. 30, 2021, 4:39 p.m. UTC | #20

On Thursday 30 September 2021 18:22:42 Jonas Dreßler wrote:
> On 9/30/21 6:19 PM, Pali Rohár wrote:

> > On Thursday 30 September 2021 18:14:04 Jonas Dreßler wrote:

> > > On 9/30/21 5:42 PM, Pali Rohár wrote:

> > > > On Thursday 30 September 2021 17:38:43 Jonas Dreßler wrote:

> > > > > On 9/23/21 10:22 PM, Pali Rohár wrote:

> > > > > > On Thursday 23 September 2021 22:41:30 Andy Shevchenko wrote:

> > > > > > > On Thu, Sep 23, 2021 at 6:28 PM Jonas Dreßler <verdre@v0yd.nl> wrote:

> > > > > > > > On 9/22/21 2:50 PM, Jonas Dreßler wrote:

> > > > > > > 

> > > > > > > ...

> > > > > > > 

> > > > > > > > - Just calling mwifiex_write_reg() once and then blocking until the card

> > > > > > > > wakes up using my delay-loop doesn't fix the issue, it's actually

> > > > > > > > writing multiple times that fixes the issue

> > > > > > > > 

> > > > > > > > These observations sound a lot like writes (and even reads) are actually

> > > > > > > > being dropped, don't they?

> > > > > > > 

> > > > > > > It sounds like you're writing into a not ready (fully powered on) device.

> > > > > > 

> > > > > > This reminds me a discussion with Bjorn about CRS response returned

> > > > > > after firmware crash / reset when device is not ready yet:

> > > > > > https://lore.kernel.org/linux-pci/20210922164803.GA203171@bhelgaas/

> > > > > > 

> > > > > > Could not be this similar issue? You could check it via reading

> > > > > > PCI_VENDOR_ID register from config space. And if it is not valid value

> > > > > > then card is not really ready yet.

> > > > > > 

> > > > > > > To check this, try to put a busy loop for reading and check the value

> > > > > > > till it gets 0.

> > > > > > > 

> > > > > > > Something like

> > > > > > > 

> > > > > > >      unsigned int count = 1000;

> > > > > > > 

> > > > > > >      do {

> > > > > > >        if (mwifiex_read_reg(...) == 0)

> > > > > > >          break;

> > > > > > >      } while (--count);

> > > > > > > 

> > > > > > > 

> > > > > > > -- 

> > > > > > > With Best Regards,

> > > > > > > Andy Shevchenko

> > > > > 

> > > > > I've tried both reading PCI_VENDOR_ID and the firmware status using a busy

> > > > > loop now, but sadly none of them worked. It looks like the card always

> > > > > replies with the correct values even though it sometimes won't wake up after

> > > > > that.

> > > > > 

> > > > > I do have one new observation though, although I've no clue what could be

> > > > > happening here: When reading PCI_VENDOR_ID 1000 times to wakeup we can

> > > > > "predict" the wakeup failure because exactly one (usually around the 20th)

> > > > > of those 1000 reads will fail.

> > > > 

> > > > What does "fail" means here?

> > > 

> > > ioread32() returns all ones, that's interpreted as failure by

> > > mwifiex_read_reg().

> > 

> > Ok. And can you check if PCI Bridge above this card has enabled CRSSVE

> > bit (CRSVisible in RootCtl+RootCap in lspci output)? To determinate if

> > Bridge could convert CRS response to all-ones as failed transaction.

> > 

> 

> Seems like that bit is disabled:

> > RootCap: CRSVisible-

> > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-


So it means that CRSSVE is unsupported by upper bridge. In case card
returns CRS response to system (via bridge) that is not ready yet,
bridge re-issue read request, and after some failures it returns to
system all-ones to indicate failed transaction. But all-ones can be
returned also by bridge when card does not send any response.

So from this test we do not know what happened. It would be nice to know
it, but such test requires to connect this card into system which
supports CRSSVE, in which case CRS response it passed directly to OS as
value 0xffff0001. Look at the link above where I discussed with Bjorn
about buggy wifi cards which resets internally, for more details.

But in this setup when CRSSVE is not supported, I think there is no
other option than just adding sleep prior accessing card...

For debugging such issues I got the only advice to use PCIe analyzer and
look at what is really happening on the bus. But required equipment for
this debugging is not cheap...

> > > > 

> > > > > Maybe the firmware actually tries to wake up,

> > > > > encounters an error somewhere in its wakeup routines and then goes down a

> > > > > special failure code path. That code path keeps the cards CPU so busy that

> > > > > at some point a PCI_VENDOR_ID request times out?

> > > > > 

> > > > > Or well, maybe the card actually wakes up fine, but we don't receive the

> > > > > interrupt on our end, so many possibilities...

[1/2] mwifiex: Use non-posted PCI register writes

Commit Message

Comments

Patch