diff mbox series

tools/testing/selftests/bpf/test_tc_tunnel.sh: Prevent client connect before server bind

Message ID 20240229140000.175274-1-alessandro.carminati@gmail.com
State New
Headers show
Series tools/testing/selftests/bpf/test_tc_tunnel.sh: Prevent client connect before server bind | expand

Commit Message

Alessandro Carminati (Red Hat) Feb. 29, 2024, 2 p.m. UTC
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
   # ip gre none gso
   # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
   # test basic connectivity
   # Ncat: Connection refused.

Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@gmail.com>
---
 tools/testing/selftests/bpf/test_tc_tunnel.sh | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

Comments

Martin KaFai Lau March 8, 2024, 1:03 a.m. UTC | #1
On 2/29/24 6:00 AM, Alessandro Carminati (Red Hat) wrote:
> In some systems, the netcat server can incur in delay to start listening.
> When this happens, the test can randomly fail in various points.
> This is an example error message:
>     # ip gre none gso
>     # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
>     # test basic connectivity
>     # Ncat: Connection refused.

This explained what is the issue. Please also explain how the patch solves it.

> 
> Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@gmail.com>
> ---
>   tools/testing/selftests/bpf/test_tc_tunnel.sh | 19 ++++++++++++++++++-
>   1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
> index 910044f08908..01c0f4b1a8c2 100755
> --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
> +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
> @@ -72,7 +72,6 @@ cleanup() {
>   server_listen() {
>   	ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
>   	server_pid=$!
> -	sleep 0.2
>   }
>   
>   client_connect() {
> @@ -93,6 +92,22 @@ verify_data() {
>   	fi
>   }
>   
> +wait_for_port() {
> +	local digits=8
> +	local port2check=$(printf ":%04X" $1)
> +	local prot=$([ "$2" == "-6" ] && echo 6 && digits=32)
> +
> +	for i in $(seq 20); do
> +		if ip netns exec "${ns2}" cat /proc/net/tcp${prot} | \
> +			sed -r 's/^[ \t]+[0-9]+: ([0-9A-F]{'${digits}'}:[0-9A-F]{4}) .*$/\1/' | \
> +			grep -q "${port2check}"; then

The idea is to check if there is socket listening on port 8888?

May be something simpler like "ss -OHtl src :$1" instead?

--
pw-bot: cr

The check-and-wait fix in this patch is fine to get your test environment going.

Eventually, it will be good to see the test_tc_tunnel.sh test moved to 
test_progs. The test_tc_tunnel.sh is not run by bpf CI and issue like this got 
unnoticed. Some other "*.sh" tests have already been moved to test_progs.
Alessandro Carminati (Red Hat) March 10, 2024, 9:45 a.m. UTC | #2
Hi Martin,
Thanks for the review.

Il giorno ven 8 mar 2024 alle ore 02:03 Martin KaFai Lau
<martin.lau@linux.dev> ha scritto:
>
> On 2/29/24 6:00 AM, Alessandro Carminati (Red Hat) wrote:
> > In some systems, the netcat server can incur in delay to start listening.
> > When this happens, the test can randomly fail in various points.
> > This is an example error message:
> >     # ip gre none gso
> >     # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
> >     # test basic connectivity
> >     # Ncat: Connection refused.
>
> This explained what is the issue. Please also explain how the patch solves it.
>
The issue, as stated, depends on a race condition between the netcat client
and server. The test author addressed this problem using a sleep, which I
removed in this patch. To easily solve the issue, one could simply increase
the sleep duration. However, this patch opts to tackle the problem by
querying the /proc directory and verifying TCP binds at the specified port
before letting the client connect.
> >
> > Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@gmail.com>
> > ---
> >   tools/testing/selftests/bpf/test_tc_tunnel.sh | 19 ++++++++++++++++++-
> >   1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
> > index 910044f08908..01c0f4b1a8c2 100755
> > --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
> > +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
> > @@ -72,7 +72,6 @@ cleanup() {
> >   server_listen() {
> >       ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
> >       server_pid=$!
> > -     sleep 0.2
> >   }
> >
> >   client_connect() {
> > @@ -93,6 +92,22 @@ verify_data() {
> >       fi
> >   }
> >
> > +wait_for_port() {
> > +     local digits=8
> > +     local port2check=$(printf ":%04X" $1)
> > +     local prot=$([ "$2" == "-6" ] && echo 6 && digits=32)
> > +
> > +     for i in $(seq 20); do
> > +             if ip netns exec "${ns2}" cat /proc/net/tcp${prot} | \
> > +                     sed -r 's/^[ \t]+[0-9]+: ([0-9A-F]{'${digits}'}:[0-9A-F]{4}) .*$/\1/' | \
> > +                     grep -q "${port2check}"; then
>
> The idea is to check if there is socket listening on port 8888?
>
> May be something simpler like "ss -OHtl src :$1" instead?
Indeed, the aim is to ensure that the server is bound before the
client attempts to
connect by checking if socket is listening, and yes using 'ss' would be shorter.
However, I chose not to use 'ss' or 'netstat' to avoid adding new dependencies,
considering they are already many.
Nonetheless, 'ss' is preferred, I have no objections.
>
> --
> pw-bot: cr
>
> The check-and-wait fix in this patch is fine to get your test environment going.
>
> Eventually, it will be good to see the test_tc_tunnel.sh test moved to
> test_progs. The test_tc_tunnel.sh is not run by bpf CI and issue like this got
> unnoticed. Some other "*.sh" tests have already been moved to test_progs.
>
>
Regards
Alessandro
Martin KaFai Lau March 13, 2024, 11:44 p.m. UTC | #3
On 3/10/24 1:45 AM, Alessandro Carminati wrote:
> Hi Martin,
> Thanks for the review.
> 
> Il giorno ven 8 mar 2024 alle ore 02:03 Martin KaFai Lau
> <martin.lau@linux.dev> ha scritto:
>>
>> On 2/29/24 6:00 AM, Alessandro Carminati (Red Hat) wrote:
>>> In some systems, the netcat server can incur in delay to start listening.
>>> When this happens, the test can randomly fail in various points.
>>> This is an example error message:
>>>      # ip gre none gso
>>>      # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
>>>      # test basic connectivity
>>>      # Ncat: Connection refused.
>>
>> This explained what is the issue. Please also explain how the patch solves it.
>>
> The issue, as stated, depends on a race condition between the netcat client
> and server. The test author addressed this problem using a sleep, which I
> removed in this patch. To easily solve the issue, one could simply increase
> the sleep duration. However, this patch opts to tackle the problem by
> querying the /proc directory and verifying TCP binds at the specified port
> before letting the client connect.

Please include this in the commit message.

>>>
>>> Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@gmail.com>
>>> ---
>>>    tools/testing/selftests/bpf/test_tc_tunnel.sh | 19 ++++++++++++++++++-
>>>    1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
>>> index 910044f08908..01c0f4b1a8c2 100755
>>> --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
>>> +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
>>> @@ -72,7 +72,6 @@ cleanup() {
>>>    server_listen() {
>>>        ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
>>>        server_pid=$!
>>> -     sleep 0.2
>>>    }
>>>
>>>    client_connect() {
>>> @@ -93,6 +92,22 @@ verify_data() {
>>>        fi
>>>    }
>>>
>>> +wait_for_port() {
>>> +     local digits=8
>>> +     local port2check=$(printf ":%04X" $1)
>>> +     local prot=$([ "$2" == "-6" ] && echo 6 && digits=32)
>>> +
>>> +     for i in $(seq 20); do
>>> +             if ip netns exec "${ns2}" cat /proc/net/tcp${prot} | \
>>> +                     sed -r 's/^[ \t]+[0-9]+: ([0-9A-F]{'${digits}'}:[0-9A-F]{4}) .*$/\1/' | \
>>> +                     grep -q "${port2check}"; then
>>
>> The idea is to check if there is socket listening on port 8888?
>>
>> May be something simpler like "ss -OHtl src :$1" instead?
> Indeed, the aim is to ensure that the server is bound before the
> client attempts to
> connect by checking if socket is listening, and yes using 'ss' would be shorter.
> However, I chose not to use 'ss' or 'netstat' to avoid adding new dependencies,
> considering they are already many.

ss should be in the same iproute package that "(ip) netns..." lives in also. The 
above changes added sed and grep.

Regardless, this external dependency will be all gone once moved to the 
selftests/test_progs. The check-and-wait will be gone by creating a listen fd 
instead of using "nc -l...". A simpler change such that people doing the future 
test_progs migration will have an easier time.

>> The check-and-wait fix in this patch is fine to get your test environment going.
>>
>> Eventually, it will be good to see the test_tc_tunnel.sh test moved to
>> test_progs. The test_tc_tunnel.sh is not run by bpf CI and issue like this got
>> unnoticed. Some other "*.sh" tests have already been moved to test_progs.
.>>
>>
> Regards
> Alessandro
diff mbox series

Patch

diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 910044f08908..01c0f4b1a8c2 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@  cleanup() {
 server_listen() {
 	ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
 	server_pid=$!
-	sleep 0.2
 }
 
 client_connect() {
@@ -93,6 +92,22 @@  verify_data() {
 	fi
 }
 
+wait_for_port() {
+	local digits=8
+	local port2check=$(printf ":%04X" $1)
+	local prot=$([ "$2" == "-6" ] && echo 6 && digits=32)
+
+	for i in $(seq 20); do
+		if ip netns exec "${ns2}" cat /proc/net/tcp${prot} | \
+			sed -r 's/^[ \t]+[0-9]+: ([0-9A-F]{'${digits}'}:[0-9A-F]{4}) .*$/\1/' | \
+			grep -q "${port2check}"; then
+			return 0
+		fi
+		sleep 0.1
+	done
+	return 1
+}
+
 set -e
 
 # no arguments: automated test, run all
@@ -193,6 +208,7 @@  setup
 # basic communication works
 echo "test basic connectivity"
 server_listen
+wait_for_port ${port} ${netcat_opt}
 client_connect
 verify_data
 
@@ -204,6 +220,7 @@  ip netns exec "${ns1}" tc filter add dev veth1 egress \
 	section "encap_${tuntype}_${mac}"
 echo "test bpf encap without decap (expect failure)"
 server_listen
+wait_for_port ${port} ${netcat_opt}
 ! client_connect
 
 if [[ "$tuntype" =~ "udp" ]]; then