Message ID | 20190109210748.29074-5-paulmck@linux.ibm.com |
---|---|
State | New |
Hi PeterA,

The cover letter has this:

> 5. Update memory-barriers.txt on enforcing heavy ordering for
>    port-I/O accesses, courtesy of Will Deacon. This one needs
>    an ack, preferably by someone from Intel. Matthew Wilcox
>    posted some feedback from an Intel manual here, which might
>    be considered to be a close substitute, but... ;-)
>
>    http://lkml.kernel.org/r/20181127192234.GF10377@bombadil.infradead.org

which in turn has:

> Here's a quote from Section 18.6 of volume 1 of the Software Developer
> Manual, November 2018 edition:
>
> When the I/O address space is used instead of memory-mapped I/O, the
> situation is different in two respects:
> • The processor never buffers I/O writes. Therefore, strict ordering of
>   I/O operations is enforced by the processor. (As with memory-mapped I/O,
>   it is possible for a chip set to post writes in certain I/O ranges.)
> • The processor synchronizes I/O instruction execution with external
>   bus activity (see Table 18-1).
>
> Table 18-1 says that in* delays execution of the current instruction until
> completion of pending stores, and out* delays execution of the _next_
> instruction until completion of both pending stores and the current store.

Can we give an Intel ACK on the below patch?

On Wed, Jan 09, 2019 at 01:07:46PM -0800, Paul E. McKenney wrote:
> From: Will Deacon <will.deacon@arm.com>
>
> David Laight explains:
>
>   | A long time ago there was a document from Intel that said that
>   | inb/outb weren't necessarily synchronised wrt memory accesses.
>   | (Might be P-pro era). However no processors actually behaved that
>   | way and more recent docs say that inb/outb are fully ordered.
>
> This also reflects the situation on other architectures, where the port
> accessor macros tend to be implemented in terms of readX/writeX.
>
> Update Documentation/memory-barriers.txt to reflect reality.
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: David Laight <David.Laight@ACULAB.COM>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: <linux-arch@vger.kernel.org>
> Cc: <linux-doc@vger.kernel.org>
> Cc: <linux-kernel@vger.kernel.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> ---
>  Documentation/memory-barriers.txt | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1c22b21ae922..a70104e2a087 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -2619,10 +2619,8 @@ functions:
>       intermediary bridges (such as the PCI host bridge) may not fully honour
>       that.
>
> -     They are guaranteed to be fully ordered with respect to each other.
> -
> -     They are not guaranteed to be fully ordered with respect to other types of
> -     memory and I/O operation.
> +     They are guaranteed to be fully ordered with respect to each other and
> +     also with respect to other types of memory and I/O operation.
>
>   (*) readX(), writeX():
>
> --
> 2.17.1
>
Hi Paul,

On Wed, Jan 09, 2019 at 01:07:46PM -0800, Paul E. McKenney wrote:
> From: Will Deacon <will.deacon@arm.com>
>
> David Laight explains:
>
>   | A long time ago there was a document from Intel that said that
>   | inb/outb weren't necessarily synchronised wrt memory accesses.
>   | (Might be P-pro era). However no processors actually behaved that
>   | way and more recent docs say that inb/outb are fully ordered.
>
> This also reflects the situation on other architectures, where the port
> accessor macros tend to be implemented in terms of readX/writeX.
>
> Update Documentation/memory-barriers.txt to reflect reality.
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: David Laight <David.Laight@ACULAB.COM>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: <linux-arch@vger.kernel.org>
> Cc: <linux-doc@vger.kernel.org>
> Cc: <linux-kernel@vger.kernel.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> ---
>  Documentation/memory-barriers.txt | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1c22b21ae922..a70104e2a087 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -2619,10 +2619,8 @@ functions:
>       intermediary bridges (such as the PCI host bridge) may not fully honour
>       that.
>
> -     They are guaranteed to be fully ordered with respect to each other.
> -
> -     They are not guaranteed to be fully ordered with respect to other types of
> -     memory and I/O operation.
> +     They are guaranteed to be fully ordered with respect to each other and
> +     also with respect to other types of memory and I/O operation.

Given the lack of Intel response here, I went away to do some digging.
As evidenced by the commit message, there is certainly an understanding
amongst some developers that inX/outX() are strongly ordered on x86 and
this was reinforced by Linus in March last year:

https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg131212.html

It was this information on which I based my patch. The Intel SDM is not
quite as assertive in its claims.

However, it has also occurred to me that this patch is actually missing
the point. memory-barriers.txt should be documenting the *Linux* memory
model, not the x86 one, and so the port accessors should be defined to
have the same ordering semantics as the MMIO accessors. If this wasn't
the case, then macros such as ioreadX() and iowriteX() would be unusable
in portable driver code. The inX/outX implementation in asm-generic would
also be bogus, despite being widely used.

Unfortunately, the whole "KERNEL I/O BARRIER EFFECTS" section in
memory-barriers.txt is vague, x86-centric and out of date. I think the
best way forward is for me to propose a rewrite of that section, based on
the work I did putting together my I/O ordering talk at ELCE last year.
That, at least, will allow us to start off with a portable semantics
rather than trying to infer the details from CPU manuals.

So please drop this for now, and I'll send out a more involved RFC patch
shortly with the usual suspects on cc.

Cheers,

Will
On Mon, Feb 11, 2019 at 4:30 PM Will Deacon <will.deacon@arm.com> wrote:

> Given the lack of Intel response here, I went away to do some digging.
> As evidenced by the commit message, there is certainly an understanding
> amongst some developers that inX/outX() are strongly ordered on x86 and
> this was reinforced by Linus in March last year:
>
> https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg131212.html
>
> It was this information on which I based my patch. The Intel SDM is not
> quite as assertive in its claims.
>
> However, it has also occurred to me that this patch is actually missing
> the point. memory-barriers.txt should be documenting the *Linux* memory
> model, not the x86 one, and so the port accessors should be defined to
> have the same ordering semantics as the MMIO accessors. If this wasn't
> the case, then macros such as ioreadX() and iowriteX() would be unusable
> in portable driver code.

My interpretation of the ioreadX() and iowriteX() semantics is that they
only guarantee readl()/writel() barrier semantics, even though they
may in fact provide stronger barriers for PIO on architectures that use
CONFIG_GENERIC_IOMAP (which falls back to inX()/outX()).

> The inX/outX implementation in asm-generic would
> also be bogus, despite being widely used.

They likely are. The asm-generic files tend to provide a generic
abstraction as much as that is possible, but without having access
to the architecture-specific semantics, they traditionally don't know
what should be done here. We now have __io_pbw()/__io_paw()/
__io_pbr()/__io_par() to let architectures get it right, but that is
a fairly recent addition, so nothing other than riscv defines them
today.

To make things worse, a lot of machines are unable to provide
__io_paw(), e.g. when all bus writes are posted.

       Arnd
Hi Arnd,

On Mon, Feb 11, 2019 at 06:11:48PM +0100, Arnd Bergmann wrote:
> On Mon, Feb 11, 2019 at 4:30 PM Will Deacon <will.deacon@arm.com> wrote:
>
> > Given the lack of Intel response here, I went away to do some digging.
> > As evidenced by the commit message, there is certainly an understanding
> > amongst some developers that inX/outX() are strongly ordered on x86 and
> > this was reinforced by Linus in March last year:
> >
> > https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg131212.html
> >
> > It was this information on which I based my patch. The Intel SDM is not
> > quite as assertive in its claims.
> >
> > However, it has also occurred to me that this patch is actually missing
> > the point. memory-barriers.txt should be documenting the *Linux* memory
> > model, not the x86 one, and so the port accessors should be defined to
> > have the same ordering semantics as the MMIO accessors. If this wasn't
> > the case, then macros such as ioreadX() and iowriteX() would be unusable
> > in portable driver code.
>
> My interpretation of the ioreadX() and iowriteX() semantics is that they
> only guarantee readl()/writel() barrier semantics, even though they
> may in fact provide stronger barriers for PIO on architectures that use
> CONFIG_GENERIC_IOMAP (which falls back to inX()/outX()).
>
> > The inX/outX implementation in asm-generic would
> > also be bogus, despite being widely used.
>
> They likely are. The asm-generic files tend to provide a generic
> abstraction as much as that is possible, but without having access
> to the architecture-specific semantics, they traditionally don't know
> what should be done here. We now have __io_pbw()/__io_paw()/
> __io_pbr()/__io_par() to let architectures get it right, but that is
> a fairly recent addition, so nothing other than riscv defines them
> today.
>
> To make things worse, a lot of machines are unable to provide
> __io_paw(), e.g. when all bus writes are posted.

So I've just sent an RFC (you're on cc) that attempts to rewrite this part
of memory-barriers.txt to reflect reality. Hopefully that can act as a
starting point for discussion if we decide we want to change the
documented behaviour and/or implementation.

Will
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1c22b21ae922..a70104e2a087 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2619,10 +2619,8 @@ functions:
      intermediary bridges (such as the PCI host bridge) may not fully honour
      that.
 
-     They are guaranteed to be fully ordered with respect to each other.
-
-     They are not guaranteed to be fully ordered with respect to other types of
-     memory and I/O operation.
+     They are guaranteed to be fully ordered with respect to each other and
+     also with respect to other types of memory and I/O operation.
 
  (*) readX(), writeX():