Message ID | 20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org |
---|---|
State | New |
Headers | show |
Series | mptcp: more misc. fixes for v6.8 | expand |
On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote: > From: Paolo Abeni <pabeni@redhat.com> > > The mptcp diag interface already experienced a few locking bugs > that lockdep and appropriate coverage have detected in advance. > > Let's add a test-case triggering the relevant code path, to prevent > similar issues in the future. > > Be careful to cope with very slow environments. > > Note that we don't need an explicit timeout on the mptcp_connect > subprocess to cope with eventual bug/hang-up as the final cleanup > terminating the child processes will take care of that. Hi! There's a failure in CI under debug after merging net and net-next in diag.sh. Maybe because of the patch which lowered timeout? https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/
Hi Jakub, On 01/03/2024 15:37, Jakub Kicinski wrote: > On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote: >> From: Paolo Abeni <pabeni@redhat.com> >> >> The mptcp diag interface already experienced a few locking bugs >> that lockdep and appropriate coverage have detected in advance. >> >> Let's add a test-case triggering the relevant code path, to prevent >> similar issues in the future. >> >> Be careful to cope with very slow environments. >> >> Note that we don't need an explicit timeout on the mptcp_connect >> subprocess to cope with eventual bug/hang-up as the final cleanup >> terminating the child processes will take care of that. > > Hi! > > There's a failure in CI under debug after merging net and net-next > in diag.sh. Maybe because of the patch which lowered timeout? > https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/ Thank you for this message! I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we do on top of the debug kconfig, but I see I can reproduce it on slower environments. Indeed, it looks like it can be caused by that modification. I will send a fix ASAP! Cheers, Matt
Hi Jakub, On 01/03/2024 16:10, Matthieu Baerts wrote: > On 01/03/2024 15:37, Jakub Kicinski wrote: >> On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote: >>> From: Paolo Abeni <pabeni@redhat.com> >>> >>> The mptcp diag interface already experienced a few locking bugs >>> that lockdep and appropriate coverage have detected in advance. >>> >>> Let's add a test-case triggering the relevant code path, to prevent >>> similar issues in the future. >>> >>> Be careful to cope with very slow environments. >>> >>> Note that we don't need an explicit timeout on the mptcp_connect >>> subprocess to cope with eventual bug/hang-up as the final cleanup >>> terminating the child processes will take care of that. >> >> Hi! >> >> There's a failure in CI under debug after merging net and net-next >> in diag.sh. Maybe because of the patch which lowered timeout? >> https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/ > > Thank you for this message! > > I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we > do on top of the debug kconfig, but I see I can reproduce it on slower > environments. Indeed, it looks like it can be caused by that > modification. I will send a fix ASAP! The following patch fixes the issue on my side (when using 'renice 20' and stress-ng in parallel :) ): https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/ I queued it for -net, but the issue should only be visible in net-next due to the reduction of the timeout, as you suggested. This patch can also be applied in net-next if preferred, it will not cause any conflicts between net and net-next. Cheers, Matt
On Fri, 1 Mar 2024 19:24:53 +0100 Matthieu Baerts wrote: > > Thank you for this message! > > > > I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we > > do on top of the debug kconfig, but I see I can reproduce it on slower > > environments. Indeed, it looks like it can be caused by that > > modification. I will send a fix ASAP! > > The following patch fixes the issue on my side (when using 'renice 20' > and stress-ng in parallel :) ): > > https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/ Nice! Note only a fix but also lowers runtime, if I'm reading right! > I queued it for -net, but the issue should only be visible in net-next > due to the reduction of the timeout, as you suggested. This patch can > also be applied in net-next if preferred, it will not cause any > conflicts between net and net-next. No strong preference, net sounds good. Thanks for a quick turnaround!
diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh index 0a58ebb8b04c..f300f4e1eb59 100755 --- a/tools/testing/selftests/net/mptcp/diag.sh +++ b/tools/testing/selftests/net/mptcp/diag.sh @@ -20,7 +20,7 @@ flush_pids() ip netns pids "${ns}" | xargs --no-run-if-empty kill -SIGUSR1 &>/dev/null - for _ in $(seq 10); do + for _ in $(seq $((timeout_poll * 10))); do [ -z "$(ip netns pids "${ns}")" ] && break sleep 0.1 done @@ -91,6 +91,15 @@ chk_msk_nr() __chk_msk_nr "grep -c token:" "$@" } +chk_listener_nr() +{ + local expected=$1 + local msg="$2" + + __chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp" 0 + __chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows" +} + wait_msk_nr() { local condition="grep -c token:" @@ -289,5 +298,24 @@ flush_pids chk_msk_inuse 0 "many->0" chk_msk_cestab 0 "many->0" +chk_listener_nr 0 "no listener sockets" +NR_SERVERS=100 +for I in $(seq 1 $NR_SERVERS); do + ip netns exec $ns ./mptcp_connect -p $((I + 20001)) \ + -t ${timeout_poll} -l 0.0.0.0 >/dev/null 2>&1 & +done + +for I in $(seq 1 $NR_SERVERS); do + mptcp_lib_wait_local_port_listen $ns $((I + 20001)) +done + +chk_listener_nr $NR_SERVERS "many listener sockets" + +# graceful termination +for I in $(seq 1 $NR_SERVERS); do + echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 >/dev/null 2>&1 & +done +flush_pids + mptcp_lib_result_print_all_tap exit $ret