[v8,0/2] selftests/resctrl: SNC kernel support discovery

Message ID cover.1734361935.git.maciej.wieczor-retman@intel.com

Message

Maciej Wieczor-Retman Dec. 16, 2024, 3:18 p.m. UTC
Changes v8:
- Fix Makefile changes.
- Update cover letter SNC status information.
- Add Reinette's Reviewed-by tag to patch 2/2.

Changes v7:
- Include fallthrough in resctrlfs.c.
- Check fp after opening empty cpus file.
- Correct a comment and merge strings in snprintf().

Changes v6:
- Rebase onto latest kselftest-next.
- After looking at the two patches with a fresh eye, split them along
  the following lines:
	- Patch 1/2 contains all of the code that relates to SNC mode
	  detection and checking that detection's reliability.
	- Patch 2/2 contains checking kernel support for SNC and
	  modifying the messages at the end of affected tests.

Changes v5:
- Tests are skipped if snc_unreliable was set.
- Moved resctrlfs.c changes from patch 2/2 to 1/2.
- Removed CAT changes since it's not impacted by SNC in the selftest.
- Updated various comments.
- Fixed a bunch of minor issues pointed out in the review.

Changes v4:
- Printing SNC warnings at the start of every test.
- Printing SNC warnings at the end of every relevant test.
- Remove global snc_mode variable, consolidate snc detection functions
  into one.
- Correct minor mistakes.

Changes v3:
- Reworked patch 2.
- Changed minor things in patch 1 like function name and made
  corrections to the patch message.

Changes v2:
- Removed patches 2 and 3 since now this part will be supported by the
  kernel.

Sub-NUMA Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.

SNC support was merged into the kernel [1]. With SNC enabled and kernel
support in place, all the tests function normally (aside from the
effective cache size). A problem can arise when SNC is enabled but the
system is still running an older kernel without SNC support. Currently
the only message displayed in that situation is a guess that SNC might
be enabled and causing issues. That message is also displayed whenever
the test fails on an Intel platform.

Add a mechanism to discover kernel support for SNC which will add more
meaning and certainty to the error message.

Add runtime SNC mode detection and verify how reliable that information
is.
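
For illustration only, here is a rough sketch of how such detection
could work. It is not the code from the patches. It follows the idea
from the patch 1/2 description: compare the number of CPUs sharing a
cache with CPU0 against the number of CPUs on node0, and distrust the
result if /sys/devices/system/cpu/offline is not empty. All function
names are hypothetical, the use of cache index3 as the L3 is an
assumption, and the round-to-nearest ratio is a simplification.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count CPUs in a sysfs cpulist string such as "0-3,8-11\n". */
int count_cpulist(const char *list)
{
	const char *p = list;
	int count = 0;

	while (*p && *p != '\n') {
		char *end;
		long first = strtol(p, &end, 10);
		long last = first;

		if (end == p)
			break;		/* not a number: stop parsing */
		if (*end == '-') {	/* range "first-last" */
			p = end + 1;
			last = strtol(p, &end, 10);
			if (end == p)
				break;
		}
		if (last >= first)
			count += (int)(last - first + 1);
		p = (*end == ',') ? end + 1 : end;
	}
	return count;
}

/* An empty (or whitespace-only) cpulist means no CPUs are listed. */
int cpulist_is_empty(const char *list)
{
	return list[strspn(list, " \t\n")] == '\0';
}

/*
 * Nodes per L3 (hypothetical helper): CPUs sharing CPU0's cache divided
 * by CPUs on node0, rounded to nearest. Returns 1 (SNC off) on bad input.
 */
int snc_nodes_per_l3(int cache_cpus, int node_cpus)
{
	int ratio;

	if (cache_cpus <= 0 || node_cpus <= 0)
		return 1;
	ratio = (cache_cpus + node_cpus / 2) / node_cpus;
	return ratio < 1 ? 1 : ratio;
}

/* Read the first line of a sysfs file; 0 on success, -1 on error. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
	FILE *fp = fopen(path, "r");

	if (!fp)
		return -1;
	if (!fgets(buf, (int)len, fp))
		buf[0] = '\0';	/* an empty file is valid, e.g. cpu/offline */
	fclose(fp);
	return 0;
}

/*
 * Returns the detected nodes-per-L3 count, or -1 on a read error.
 * *unreliable is set when any CPU is offline.
 * Assumption: cache index3 is the L3 on the system under test.
 */
int snc_detect(int *unreliable)
{
	char cache[4096], node[4096], offline[4096];

	if (read_sysfs_line("/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list",
			    cache, sizeof(cache)) ||
	    read_sysfs_line("/sys/devices/system/node/node0/cpulist",
			    node, sizeof(node)) ||
	    read_sysfs_line("/sys/devices/system/cpu/offline",
			    offline, sizeof(offline)))
		return -1;

	*unreliable = !cpulist_is_empty(offline);
	return snc_nodes_per_l3(count_cpulist(cache), count_cpulist(node));
}
```

A caller would run snc_detect() once, skip the affected tests when
*unreliable is set, and otherwise divide the L3 size by the returned
value to get the effective cache size.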

The series was tested on Ice Lake server platforms with SNC disabled,
SNC-2 and SNC-4. The tests were also run with and without kernel support
for SNC.

Series applies cleanly on kselftest/next.

[1] https://lore.kernel.org/all/20240716065458.GAZpYZQhh0PBItpD1k@fat_crate.local/

Previous versions of this series:
[v1] https://lore.kernel.org/all/cover.1709721159.git.maciej.wieczor-retman@intel.com/
[v2] https://lore.kernel.org/all/cover.1715769576.git.maciej.wieczor-retman@intel.com/
[v3] https://lore.kernel.org/all/cover.1719842207.git.maciej.wieczor-retman@intel.com/
[v4] https://lore.kernel.org/all/cover.1720774981.git.maciej.wieczor-retman@intel.com/
[v5] https://lore.kernel.org/all/cover.1730206468.git.maciej.wieczor-retman@intel.com/
[v6] https://lore.kernel.org/all/cover.1733136454.git.maciej.wieczor-retman@intel.com/
[v7] https://lore.kernel.org/all/cover.1733741950.git.maciej.wieczor-retman@intel.com/

Maciej Wieczor-Retman (2):
  selftests/resctrl: Adjust effective L3 cache size with SNC enabled
  selftests/resctrl: Discover SNC kernel support and adjust messages

 tools/testing/selftests/resctrl/Makefile      |   1 +
 tools/testing/selftests/resctrl/cmt_test.c    |   4 +-
 tools/testing/selftests/resctrl/mba_test.c    |   2 +
 tools/testing/selftests/resctrl/mbm_test.c    |   4 +-
 tools/testing/selftests/resctrl/resctrl.h     |   6 +
 .../testing/selftests/resctrl/resctrl_tests.c |   9 +-
 tools/testing/selftests/resctrl/resctrlfs.c   | 137 ++++++++++++++++++
 7 files changed, 158 insertions(+), 5 deletions(-)

Comments

Reinette Chatre Dec. 16, 2024, 9:43 p.m. UTC | #1
Hi Maciej,

On 12/16/24 7:18 AM, Maciej Wieczor-Retman wrote:
> Sub-NUMA Cluster divides CPUs sharing an L3 cache into separate NUMA
> nodes. Systems may support splitting into either two, three, four or six
> nodes. When SNC mode is enabled the effective amount of L3 cache
> available for allocation is divided by the number of nodes per L3.
> 
> It's possible to detect which SNC mode is active by comparing the number
> of CPUs that share a cache with CPU0, with the number of CPUs on node0.
> 
> Detect SNC mode once and let other tests inherit that information.
> 
> Update CFLAGS after including lib.mk in the Makefile so that fallthrough
> macro can be used.
> 
> To check if SNC detection is reliable one can check the
> /sys/devices/system/cpu/offline file. If it's empty, it means all cores
> are operational and the ratio should be calculated correctly. If it has
> any contents, it means the detected SNC mode can't be trusted and should
> be disabled.
> 
> Check if detection was not reliable due to offline cpus. If it was skip
> running tests since the results couldn't be trusted.
> 
> Co-developed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
> ---

Thank you very much.

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

Reinette Chatre Dec. 19, 2024, 6:35 p.m. UTC | #2
Hi Shuah,

On 12/16/24 7:18 AM, Maciej Wieczor-Retman wrote:
> 
> Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory
> into multiple NUMA nodes. When enabled, NUMA-aware applications can
> achieve better performance on bigger server platforms.
> 
> SNC support was merged into the kernel [1]. With SNC enabled
> and kernel support in place all the tests will function normally (aside
> from effective cache size). There might be a problem when SNC is enabled
> but the system is still using an older kernel version without SNC
> support. Currently the only message displayed in that situation is a
> guess that SNC might be enabled and is causing issues. That message also
> is displayed whenever the test fails on an Intel platform.
> 
> Add a mechanism to discover kernel support for SNC which will add more
> meaning and certainty to the error message.
> 
> Add runtime SNC mode detection and verify how reliable that information
> is.
> 
> Series was tested on Ice Lake server platforms with SNC disabled, SNC-2
> and SNC-4. The tests were also ran with and without kernel support for
> SNC.
> 
> Series applies cleanly on kselftest/next.
> 

Could you please consider this series for inclusion?

Thank you very much.

Reinette