mbox series

[v9,0/6] lib/ring: APIs to support custom element size

Message ID 20200116052511.8557-1-honnappa.nagarahalli@arm.com
Headers show
Series lib/ring: APIs to support custom element size | expand

Message

Honnappa Nagarahalli Jan. 16, 2020, 5:25 a.m. UTC
The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, it results in couple
of issues:

1) If an application requires to store an element which is not 64b, it
   needs to write its own ring APIs similar to rte_event_ring APIs. This
   creates additional burden on the programmers, who end up making
   work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.

This patch adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing to define the ring
element to be a multiple of 32b.

The aim is to achieve same performance as the existing ring
implementation.

v9
 - Split 'test_ring_burst_bulk_tests' test case into 4 smaller
   functions to address clang compilation time issue.
 - Addressed compilation failure in Intel CI in the hash changes.

v8
 - Changed the 128b copy elements inline function to use 'memcpy'
   to generate unaligned load/store instructions for x86. Generic
   copy function results in performance drop. (Konstantin)
 - Changed the API type #defines to be more clear (Konstantin)
 - Removed the code duplication in performance tests (Konstantin)
 - Fixed memory leak, changed test macros to inline functions (Konstantin)
 - Changed functional tests to test for 20B ring element. Fixed
   a bug in 32b element copy code for enqueue/dequeue(ring size
   needs to be normalized for 32b).
 - Squashed the functional and performance tests in their
   respective single commits.

v7
 - Merged the test cases to test both legacy APIs and
   rte_ring_xxx_elem APIs without code duplication (Konstantin, Olivier)
 - Performance test cases are merged as well (Konstantin, Olivier)
 - Macros to copy elements are converted into inline functions (Olivier)
 - Added back the changes to hash and event libraries

v6
 - Labelled as RFC to indicate the better status
 - Added unit tests to test the rte_ring_xxx_elem APIs
 - Corrected 'macro based partial memcpy' (5/6) patch
 - Added Konstantin's method after correction (6/6)
 - Check Patch shows significant warnings and errors mainly due
   copying code from existing test cases. None of them are harmful.
   I will fix them once we have an agreement.

v5
 - Use memcpy for chunks of 32B (Konstantin).
 - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
   to compare the results easily.
 - Copying without memcpy is also available in 1/3, if anyone wants to
   experiment on their platform.
 - Added other platform owners to test on their respective platforms.

v4
 - Few fixes after more performance testing

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (6):
  test/ring: use division for cycle count calculation
  lib/ring: apis to support configurable element size
  test/ring: add functional tests for rte_ring_xxx_elem APIs
  test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
  lib/hash: use ring with 32b element size to save memory
  lib/eventdev: use custom element size ring for event rings

 app/test/test_ring.c                 | 1342 +++++++++++++-------------
 app/test/test_ring.h                 |  187 ++++
 app/test/test_ring_perf.c            |  452 +++++----
 lib/librte_eventdev/rte_event_ring.c |  147 +--
 lib/librte_eventdev/rte_event_ring.h |   45 +-
 lib/librte_hash/rte_cuckoo_hash.c    |   94 +-
 lib/librte_hash/rte_cuckoo_hash.h    |    2 +-
 lib/librte_ring/Makefile             |    3 +-
 lib/librte_ring/meson.build          |    4 +
 lib/librte_ring/rte_ring.c           |   41 +-
 lib/librte_ring/rte_ring.h           |    1 +
 lib/librte_ring/rte_ring_elem.h      | 1003 +++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |    2 +
 13 files changed, 2242 insertions(+), 1081 deletions(-)
 create mode 100644 app/test/test_ring.h
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1

Comments

Honnappa Nagarahalli Jan. 16, 2020, 4:36 p.m. UTC | #1
I see that the none of the CIs (except Travis) have run on this patch. Intel CI has reported a compilation error and I fixed it in this version. Does anyone know if/when the CI will run on the patches?

Thanks,
Honnappa

> -----Original Message-----

> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> Sent: Wednesday, January 15, 2020 11:25 PM

> To: olivier.matz@6wind.com; sthemmin@microsoft.com; jerinj@marvell.com;

> bruce.richardson@intel.com; david.marchand@redhat.com;

> pbhagavatula@marvell.com; konstantin.ananyev@intel.com;

> yipeng1.wang@intel.com; Honnappa Nagarahalli

> <Honnappa.Nagarahalli@arm.com>

> Cc: dev@dpdk.org; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; Ruifeng

> Wang <Ruifeng.Wang@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; nd

> <nd@arm.com>

> Subject: [PATCH v9 0/6] lib/ring: APIs to support custom element size

> 

> The current rte_ring hard-codes the type of the ring element to 'void *', hence

> the size of the element is hard-coded to 32b/64b. Since the ring element type

> is not an input to rte_ring APIs, it results in couple of issues:

> 

> 1) If an application requires to store an element which is not 64b, it

>    needs to write its own ring APIs similar to rte_event_ring APIs. This

>    creates additional burden on the programmers, who end up making

>    work-arounds and often waste memory.

> 2) If there are multiple libraries that store elements of the same

>    type, currently they would have to write their own rte_ring APIs. This

>    results in code duplication.

> 

> This patch adds new APIs to support configurable ring element size.

> The APIs support custom element sizes by allowing to define the ring element

> to be a multiple of 32b.

> 

> The aim is to achieve same performance as the existing ring implementation.

> 

> v9

>  - Split 'test_ring_burst_bulk_tests' test case into 4 smaller

>    functions to address clang compilation time issue.

>  - Addressed compilation failure in Intel CI in the hash changes.

> 

> v8

>  - Changed the 128b copy elements inline function to use 'memcpy'

>    to generate unaligned load/store instructions for x86. Generic

>    copy function results in performance drop. (Konstantin)

>  - Changed the API type #defines to be more clear (Konstantin)

>  - Removed the code duplication in performance tests (Konstantin)

>  - Fixed memory leak, changed test macros to inline functions (Konstantin)

>  - Changed functional tests to test for 20B ring element. Fixed

>    a bug in 32b element copy code for enqueue/dequeue(ring size

>    needs to be normalized for 32b).

>  - Squashed the functional and performance tests in their

>    respective single commits.

> 

> v7

>  - Merged the test cases to test both legacy APIs and

>    rte_ring_xxx_elem APIs without code duplication (Konstantin, Olivier)

>  - Performance test cases are merged as well (Konstantin, Olivier)

>  - Macros to copy elements are converted into inline functions (Olivier)

>  - Added back the changes to hash and event libraries

> 

> v6

>  - Labelled as RFC to indicate the better status

>  - Added unit tests to test the rte_ring_xxx_elem APIs

>  - Corrected 'macro based partial memcpy' (5/6) patch

>  - Added Konstantin's method after correction (6/6)

>  - Check Patch shows significant warnings and errors mainly due

>    copying code from existing test cases. None of them are harmful.

>    I will fix them once we have an agreement.

> 

> v5

>  - Use memcpy for chunks of 32B (Konstantin).

>  - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available

>    to compare the results easily.

>  - Copying without memcpy is also available in 1/3, if anyone wants to

>    experiment on their platform.

>  - Added other platform owners to test on their respective platforms.

> 

> v4

>  - Few fixes after more performance testing

> 

> v3

>  - Removed macro-fest and used inline functions

>    (Stephen, Bruce)

> 

> v2

>  - Change Event Ring implementation to use ring templates

>    (Jerin, Pavan)

> 

> Honnappa Nagarahalli (6):

>   test/ring: use division for cycle count calculation

>   lib/ring: apis to support configurable element size

>   test/ring: add functional tests for rte_ring_xxx_elem APIs

>   test/ring: modify perf test cases to use rte_ring_xxx_elem APIs

>   lib/hash: use ring with 32b element size to save memory

>   lib/eventdev: use custom element size ring for event rings

> 

>  app/test/test_ring.c                 | 1342 +++++++++++++-------------

>  app/test/test_ring.h                 |  187 ++++

>  app/test/test_ring_perf.c            |  452 +++++----

>  lib/librte_eventdev/rte_event_ring.c |  147 +--

>  lib/librte_eventdev/rte_event_ring.h |   45 +-

>  lib/librte_hash/rte_cuckoo_hash.c    |   94 +-

>  lib/librte_hash/rte_cuckoo_hash.h    |    2 +-

>  lib/librte_ring/Makefile             |    3 +-

>  lib/librte_ring/meson.build          |    4 +

>  lib/librte_ring/rte_ring.c           |   41 +-

>  lib/librte_ring/rte_ring.h           |    1 +

>  lib/librte_ring/rte_ring_elem.h      | 1003 +++++++++++++++++++

>  lib/librte_ring/rte_ring_version.map |    2 +

>  13 files changed, 2242 insertions(+), 1081 deletions(-)  create mode 100644

> app/test/test_ring.h  create mode 100644 lib/librte_ring/rte_ring_elem.h

> 

> --

> 2.17.1
David Marchand Jan. 17, 2020, 12:14 p.m. UTC | #2
On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>

> I see that the none of the CIs (except Travis) have run on this patch. Intel CI has reported a compilation error and I fixed it in this version. Does anyone know if/when the CI will run on the patches?


- Pushed the series
https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch
of mine for checks.
Travis reports:
"
[2155/2156] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.

No output has been received in the last 10m0s, this potentially
indicates a stalled build or something wrong with the build itself.

Check the details on how to adjust your build configuration on:
https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received

The build has been terminated
"

I see you discussed this already with Aaron, did I miss something?


- Besides, I have no ack/review from the hash and eventdev maintainers.


-- 
David Marchand
Jerin Jacob Jan. 17, 2020, 1:34 p.m. UTC | #3
On Fri, Jan 17, 2020 at 5:44 PM David Marchand
<david.marchand@redhat.com> wrote:
>

> On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli

> <Honnappa.Nagarahalli@arm.com> wrote:

> >

> > I see that the none of the CIs (except Travis) have run on this patch. Intel CI has reported a compilation error and I fixed it in this version. Does anyone know if/when the CI will run on the patches?

>

> - Pushed the series

> https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch

> of mine for checks.

> Travis reports:

> "

> [2155/2156] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.

>

> No output has been received in the last 10m0s, this potentially

> indicates a stalled build or something wrong with the build itself.

>

> Check the details on how to adjust your build configuration on:

> https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received

>

> The build has been terminated

> "

>

> I see you discussed this already with Aaron, did I miss something?

>

>

> - Besides, I have no ack/review from the hash and eventdev maintainers.


+ Bruce, Harry, Mattias

Even though event ring added in the eventdev common code, it's been
used only by SW eventdev drivers.
So adding evendev ring author and SW driver maintainers for review.

>

>

> --

> David Marchand

>
Honnappa Nagarahalli Jan. 17, 2020, 2:28 p.m. UTC | #4
<snip>

> 

> On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli

> <Honnappa.Nagarahalli@arm.com> wrote:

> >

> > I see that the none of the CIs (except Travis) have run on this patch. Intel CI

> has reported a compilation error and I fixed it in this version. Does anyone

> know if/when the CI will run on the patches?

> 

> - Pushed the series

> https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch of

> mine for checks.

> Travis reports:

> "

> [2155/2156] Compiling C object 'app/te...st@@dpdk-

> test@exe/test_ring_perf.c.o'.

> 

> No output has been received in the last 10m0s, this potentially indicates a

> stalled build or something wrong with the build itself.

> 

> Check the details on how to adjust your build configuration on:

> https://docs.travis-ci.com/user/common-build-problems/#build-times-out-

> because-no-output-was-received

> 

> The build has been terminated

> "

> 

> I see you discussed this already with Aaron, did I miss something?

Aaron has tested it with clang-8 and said it does not show the issue. I am not sure if he tested this on Travis CI.
I have tested it on clang-9 and it does not show any issues. I have modified the test cases to take much lesser time for Clang-7.

Aaron, please let me know if you want to upgrade Travis CI?
> 

> 

> - Besides, I have no ack/review from the hash and eventdev maintainers.

Yipeng, Jerin can you please review your parts?

> 

> 

> --

> David Marchand
Honnappa Nagarahalli Jan. 17, 2020, 2:36 p.m. UTC | #5
<snip>
> 

> >

> > On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli

> > <Honnappa.Nagarahalli@arm.com> wrote:

> > >

> > > I see that the none of the CIs (except Travis) have run on this

> > > patch. Intel CI

> > has reported a compilation error and I fixed it in this version. Does

> > anyone know if/when the CI will run on the patches?

> >

> > - Pushed the series

> > https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch

> > of mine for checks.

> > Travis reports:

> > "

> > [2155/2156] Compiling C object 'app/te...st@@dpdk-

> > test@exe/test_ring_perf.c.o'.

> >

> > No output has been received in the last 10m0s, this potentially

> > indicates a stalled build or something wrong with the build itself.

> >

> > Check the details on how to adjust your build configuration on:

> > https://docs.travis-ci.com/user/common-build-problems/#build-times-out

> > -

> > because-no-output-was-received

> >

> > The build has been terminated

> > "

> >

> > I see you discussed this already with Aaron, did I miss something?

> Aaron has tested it with clang-8 and said it does not show the issue. I am not

> sure if he tested this on Travis CI.

> I have tested it on clang-9 and it does not show any issues. I have modified

> the test cases to take much lesser time for Clang-7.

> 

> Aaron, please let me know if you want to upgrade Travis CI?

BTW, intel compilation CI is passing now with v9 including clang.

> >

> >

> > - Besides, I have no ack/review from the hash and eventdev maintainers.

> Yipeng, Jerin can you please review your parts?

> 

> >

> >

> > --

> > David Marchand

>
David Marchand Jan. 17, 2020, 4:15 p.m. UTC | #6
On Fri, Jan 17, 2020 at 3:29 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > - Pushed the series

> > https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch of

> > mine for checks.

> > Travis reports:

> > "

> > [2155/2156] Compiling C object 'app/te...st@@dpdk-

> > test@exe/test_ring_perf.c.o'.

> >

> > No output has been received in the last 10m0s, this potentially indicates a

> > stalled build or something wrong with the build itself.

> >

> > Check the details on how to adjust your build configuration on:

> > https://docs.travis-ci.com/user/common-build-problems/#build-times-out-

> > because-no-output-was-received

> >

> > The build has been terminated

> > "

> >

> > I see you discussed this already with Aaron, did I miss something?

> Aaron has tested it with clang-8 and said it does not show the issue. I am not sure if he tested this on Travis CI.

> I have tested it on clang-9 and it does not show any issues. I have modified the test cases to take much lesser time for Clang-7.


The problem is seen with clang 7 in Ubuntu 16.04.
https://travis-ci.com/david-marchand/dpdk/jobs/276790838

>

> Aaron, please let me know if you want to upgrade Travis CI?


Do you mean upgrading Travis to hide this issue?


-- 
David Marchand
Honnappa Nagarahalli Jan. 17, 2020, 4:32 p.m. UTC | #7
<snip>

> On Fri, Jan 17, 2020 at 3:29 PM Honnappa Nagarahalli

> <Honnappa.Nagarahalli@arm.com> wrote:

> > > - Pushed the series

> > > https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a

> > > branch of mine for checks.

> > > Travis reports:

> > > "

> > > [2155/2156] Compiling C object 'app/te...st@@dpdk-

> > > test@exe/test_ring_perf.c.o'.

> > >

> > > No output has been received in the last 10m0s, this potentially

> > > indicates a stalled build or something wrong with the build itself.

> > >

> > > Check the details on how to adjust your build configuration on:

> > > https://docs.travis-ci.com/user/common-build-problems/#build-times-o

> > > ut-

> > > because-no-output-was-received

> > >

> > > The build has been terminated

> > > "

> > >

> > > I see you discussed this already with Aaron, did I miss something?

> > Aaron has tested it with clang-8 and said it does not show the issue. I am not

> sure if he tested this on Travis CI.

> > I have tested it on clang-9 and it does not show any issues. I have modified

> the test cases to take much lesser time for Clang-7.

> 

> The problem is seen with clang 7 in Ubuntu 16.04.

> https://travis-ci.com/david-marchand/dpdk/jobs/276790838

Agree. I have split the test cases into multiple functions to address clang 7 compile time. But it does not suffice for Travis CI.

> 

> >

> > Aaron, please let me know if you want to upgrade Travis CI?

> 

> Do you mean upgrading Travis to hide this issue?

Yes, upgrading to use clang-8 or clang-9.

> 

> 

> --

> David Marchand
Mattias Rönnblom Jan. 17, 2020, 4:37 p.m. UTC | #8
On 2020-01-17 14:34, Jerin Jacob wrote:
> On Fri, Jan 17, 2020 at 5:44 PM David Marchand

> <david.marchand@redhat.com> wrote:

>> On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli

>> <Honnappa.Nagarahalli@arm.com> wrote:

>>> I see that the none of the CIs (except Travis) have run on this patch. Intel CI has reported a compilation error and I fixed it in this version. Does anyone know if/when the CI will run on the patches?

>> - Pushed the series

>> https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch

>> of mine for checks.

>> Travis reports:

>> "

>> [2155/2156] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.

>>

>> No output has been received in the last 10m0s, this potentially

>> indicates a stalled build or something wrong with the build itself.

>>

>> Check the details on how to adjust your build configuration on:

>> https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received

>>

>> The build has been terminated

>> "

>>

>> I see you discussed this already with Aaron, did I miss something?

>>

>>

>> - Besides, I have no ack/review from the hash and eventdev maintainers.

> + Bruce, Harry, Mattias

>

> Even though event ring added in the eventdev common code, it's been

> used only by SW eventdev drivers.

> So adding evendev ring author and SW driver maintainers for review.


I endorse this change, although I hadn't have time to review the code.

DSW throughput increases ~5% on x86_64 after applying this patchset, so 
the goal of maintaining legacy performance seems to have been met (and 
exceeded).

I will look into using these new custom-sized rings for the DSW control 
rings (which uses regular DPDK void-pointer rings today).
Olivier Matz Jan. 17, 2020, 5:15 p.m. UTC | #9
On Wed, Jan 15, 2020 at 11:25:05PM -0600, Honnappa Nagarahalli wrote:
> The current rte_ring hard-codes the type of the ring element to 'void *',

> hence the size of the element is hard-coded to 32b/64b. Since the ring

> element type is not an input to rte_ring APIs, it results in couple

> of issues:

> 

> 1) If an application requires to store an element which is not 64b, it

>    needs to write its own ring APIs similar to rte_event_ring APIs. This

>    creates additional burden on the programmers, who end up making

>    work-arounds and often waste memory.

> 2) If there are multiple libraries that store elements of the same

>    type, currently they would have to write their own rte_ring APIs. This

>    results in code duplication.

> 

> This patch adds new APIs to support configurable ring element size.

> The APIs support custom element sizes by allowing to define the ring

> element to be a multiple of 32b.

> 

> The aim is to achieve same performance as the existing ring

> implementation.

> 


This is a nice improvement to the ring library, thanks!

It looks globally good to me, few comments as reply to individual
patches.


Olivier