Message ID | 20210202183051.21022-1-phil@nwl.cc |
---|---|
State | Superseded |
Headers | show |
Series | [iproute] tc: u32: Fix key folding in sample option | expand |
Hi Phil, I couldnt tell by inspection if what used to work before continues to. In particular the kernel version does consider the divisor when folding. Two examples that currently work, if you can try them: Most used scheme: --- tc filter add dev $DEV parent 999:0 protocol ip prio 10 u32 \ ht 2:: \ sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \ action ok ---- and this i also found in one of my scripts: ---- tc filter add dev $DEV parent 999:0 protocol ip prio 10 u32 \ ht 2:: \ sample u32 0x00000806 0x0000ffff at 12 \ match u32 0x00000800 0x0000ff00 at 12 flowid 1:10 \ action ok ---- Probably a simple meaning of "working" is: the values before and after (your changes) are consistent. If also you will do us a kindness and add maybe a testcase in tdc? This way next person wanting to fix it can run the tests first before posting a patch. cheers, jamal On 2021-02-02 1:30 p.m., Phil Sutter wrote: > In between Linux kernel 2.4 and 2.6, key folding for hash tables changed > in kernel space. When iproute2 dropped support for the older algorithm, > the wrong code was removed and kernel 2.4 folding method remained in > place. To get things functional for recent kernels again, restoring the > old code alone was not sufficient - additional byteorder fixes were > needed. > > While being at it, make use of ffs() and thereby align the code with how > kernel determines the shift width. > > Fixes: 267480f55383c ("Backout the 2.4 utsname hash patch.") > Signed-off-by: Phil Sutter <phil@nwl.cc> > --- > Initially I considered changing the kernel's key folding instead as the > old method didn't just ignore key bits beyond the first byte. Yet I am > not sure if this would cause problems with hardware offloading. And > given the fact that this simplified key folding is in place since the > dawn of 2.6, it is probably not such a big problem anyway. > --- > tc/f_u32.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/tc/f_u32.c b/tc/f_u32.c > index 2ed5254a40d5f..a5747f671e1ea 100644 > --- a/tc/f_u32.c > +++ b/tc/f_u32.c > @@ -978,6 +978,13 @@ show_k: > goto show_k; > } > > +static __u32 u32_hash_fold(struct tc_u32_key *key) > +{ > + __u8 fshift = key->mask ? ffs(ntohl(key->mask)) - 1 : 0; > + > + return ntohl(key->val & key->mask) >> fshift; > +} > + > static int u32_parse_opt(struct filter_util *qu, char *handle, > int argc, char **argv, struct nlmsghdr *n) > { > @@ -1110,9 +1117,7 @@ static int u32_parse_opt(struct filter_util *qu, char *handle, > } > NEXT_ARG(); > } > - hash = sel2.keys[0].val & sel2.keys[0].mask; > - hash ^= hash >> 16; > - hash ^= hash >> 8; > + hash = u32_hash_fold(&sel2.keys[0]); > htid = ((hash % divisor) << 12) | (htid & 0xFFF00000); > sample_ok = 1; > continue; >
Jamal, On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote: > I couldnt tell by inspection if what used to work before continues to. > In particular the kernel version does consider the divisor when folding. That's correct. And so does tc. What's the matter? > Two examples that currently work, if you can try them: Both lack information about the used hashkey and divisor. > Most used scheme: > --- > tc filter add dev $DEV parent 999:0 protocol ip prio 10 u32 \ > ht 2:: \ > sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \ > action ok > ---- htid before: 0x201000 htid after: 0x201000 > > and this i also found in one of my scripts: > ---- > tc filter add dev $DEV parent 999:0 protocol ip prio 10 u32 \ > ht 2:: \ > sample u32 0x00000806 0x0000ffff at 12 \ > match u32 0x00000800 0x0000ff00 at 12 flowid 1:10 \ > action ok > ---- htid before: 0x20e000 (0x8 ^ 0x6 = 0xe) htid after: 0x206000 Are you sure this still works with current kernel and iproute2 (excluding my patch)? What divisor and hashkey is used? > Probably a simple meaning of "working" is: > the values before and after (your changes) are consistent. > > If also you will do us a kindness and add maybe a testcase in tdc? > This way next person wanting to fix it can run the tests first before > posting a patch. What is "tdc"? Cheers, Phil
On 2021-02-04 9:04 a.m., Phil Sutter wrote: > Jamal, > > On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote: >> I couldnt tell by inspection if what used to work before continues to. >> In particular the kernel version does consider the divisor when folding. > > That's correct. And so does tc. What's the matter? > tc assumes 256 when undefined. Maybe man page needs to be updated to state we need divisor specified otherwise default is 256. >> Two examples that currently work, if you can try them: > > Both lack information about the used hashkey and divisor. > >> Most used scheme: >> --- >> tc filter add dev $DEV parent 999:0 protocol ip prio 10 u32 \ >> ht 2:: \ >> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \ >> action ok >> ---- > > htid before: 0x201000 > htid after: 0x201000 > Ok, this is the most common use-case. So we are good. Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> cheers, jamal
sorry - meant to say, tdc is in kernel tree: tools/testing/selftests/tc-testing cheers, jamal
On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote: > On 2021-02-04 9:04 a.m., Phil Sutter wrote: > > Jamal, > > > > On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote: > >> I couldnt tell by inspection if what used to work before continues to. > >> In particular the kernel version does consider the divisor when folding. > > > > That's correct. And so does tc. What's the matter? > > > > tc assumes 256 when undefined. Maybe man page needs to be > updated to state we need divisor specified otherwise default > is 256. tc-u32.8 mentions the default in 'sample' option description. Specifying divisor is mandatory when creating a hash table, so that path is covered, too. I still don't get how this is related to my patch, though. > >> Two examples that currently work, if you can try them: > > > > Both lack information about the used hashkey and divisor. > > > >> Most used scheme: > >> --- > >> tc filter add dev $DEV parent 999:0 protocol ip prio 10 u32 \ > >> ht 2:: \ > >> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \ > >> action ok > >> ---- > > > > htid before: 0x201000 > > htid after: 0x201000 > > > > Ok, this is the most common use-case. So we are good. Whatever. Thanks, Phil
On 2021-02-04 9:50 a.m., Phil Sutter wrote: > On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote: >> On 2021-02-04 9:04 a.m., Phil Sutter wrote: >>> Jamal, >>> >>> On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote: >>>> I couldnt tell by inspection if what used to work before continues to. >>>> In particular the kernel version does consider the divisor when folding. >>> >>> That's correct. And so does tc. What's the matter? >>> >> >> tc assumes 256 when undefined. Maybe man page needs to be >> updated to state we need divisor specified otherwise default >> is 256. > > tc-u32.8 mentions the default in 'sample' option description. Specifying > divisor is mandatory when creating a hash table, so that path is > covered, too. I still don't get how this is related to my patch, though. > It is a general comment related to this code (that you are modifying). You mentioned divisor in your earlier email as part of the syntax for sample. So it made me wonder: Does the bucket placement assume a specific number of buckets in a table? Example if i had a hash table with 4 buckets, would the sample then pick the correct bucket? Would it be also correct for 32 buckets, etc. Or it didnt matter before and it doesnt matter now. >>>> Two examples that currently work, if you can try them: >>> >>> Both lack information about the used hashkey and divisor. >>> >>>> Most used scheme: >>>> --- >>>> tc filter add dev $DEV parent 999:0 protocol ip prio 10 u32 \ >>>> ht 2:: \ >>>> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \ >>>> action ok >>>> ---- >>> >>> htid before: 0x201000 >>> htid after: 0x201000 >>> >> >> Ok, this is the most common use-case. So we are good. > > Whatever. Meaning? cheers, jamal
On Thu, Feb 04, 2021 at 10:28:26AM -0500, Jamal Hadi Salim wrote: > On 2021-02-04 9:50 a.m., Phil Sutter wrote: > > On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote: > >> On 2021-02-04 9:04 a.m., Phil Sutter wrote: > >>> Jamal, > >>> > >>> On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote: > >>>> I couldnt tell by inspection if what used to work before continues to. > >>>> In particular the kernel version does consider the divisor when folding. > >>> > >>> That's correct. And so does tc. What's the matter? > >>> > >> > >> tc assumes 256 when undefined. Maybe man page needs to be > >> updated to state we need divisor specified otherwise default > >> is 256. > > > > tc-u32.8 mentions the default in 'sample' option description. Specifying > > divisor is mandatory when creating a hash table, so that path is > > covered, too. I still don't get how this is related to my patch, though. > > > > It is a general comment related to this code (that you are modifying). > You mentioned divisor in your earlier email as part of the syntax for > sample. So it made me wonder: > Does the bucket placement assume a specific number of buckets in a > table? Example if i had a hash table with 4 buckets, would the sample > then pick the correct bucket? Would it be also correct for 32 buckets, > etc. Or it didnt matter before and it doesnt matter now. My patch doesn't change how divisor is applied. And yes, with a smaller than 256 buckets hash table, specifying the divisor along with sample is necessary. > >>>> Two examples that currently work, if you can try them: > >>> > >>> Both lack information about the used hashkey and divisor. > >>> > >>>> Most used scheme: > >>>> --- > >>>> tc filter add dev $DEV parent 999:0 protocol ip prio 10 u32 \ > >>>> ht 2:: \ > >>>> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \ > >>>> action ok > >>>> ---- > >>> > >>> htid before: 0x201000 > >>> htid after: 0x201000 > >>> > >> > >> Ok, this is the most common use-case. So we are good. > > > > Whatever. > > Meaning? Things are always fine if we only consider the positive results. Cheers, Phil
On 2021-02-04 11:50 a.m., Phil Sutter wrote: > On Thu, Feb 04, 2021 at 10:28:26AM -0500, Jamal Hadi Salim wrote: >> On 2021-02-04 9:50 a.m., Phil Sutter wrote: >>> On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote: >>>> On 2021-02-04 9:04 a.m., Phil Sutter wrote: >>>>> Jamal, >>>>> >>>>> On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote: >>>>>> I couldnt tell by inspection if what used to work before continues to. >>>>>> In particular the kernel version does consider the divisor when folding. >>>>> >>>>> That's correct. And so does tc. What's the matter? >>>>> >>>> >>>> tc assumes 256 when undefined. Maybe man page needs to be >>>> updated to state we need divisor specified otherwise default >>>> is 256. >>> >>> tc-u32.8 mentions the default in 'sample' option description. Specifying >>> divisor is mandatory when creating a hash table, so that path is >>> covered, too. I still don't get how this is related to my patch, though. >>> >> >> It is a general comment related to this code (that you are modifying). >> You mentioned divisor in your earlier email as part of the syntax for >> sample. So it made me wonder: >> Does the bucket placement assume a specific number of buckets in a >> table? Example if i had a hash table with 4 buckets, would the sample >> then pick the correct bucket? Would it be also correct for 32 buckets, >> etc. Or it didnt matter before and it doesnt matter now. > > My patch doesn't change how divisor is applied. And yes, with a smaller > than 256 buckets hash table, specifying the divisor along with sample is > necessary. > Sorry - hadnt looked at the man page earlier. I think the text added is sufficient. I am wondering why we made the divisor optional to begin with. cheers, jamal
Hi, On Tue, Feb 02, 2021 at 07:30:51PM +0100, Phil Sutter wrote: > In between Linux kernel 2.4 and 2.6, key folding for hash tables changed > in kernel space. When iproute2 dropped support for the older algorithm, > the wrong code was removed and kernel 2.4 folding method remained in > place. To get things functional for recent kernels again, restoring the > old code alone was not sufficient - additional byteorder fixes were > needed. > > While being at it, make use of ffs() and thereby align the code with how > kernel determines the shift width. > > Fixes: 267480f55383c ("Backout the 2.4 utsname hash patch.") > Signed-off-by: Phil Sutter <phil@nwl.cc> Seems this patch fell off the table? Or was there an objection I missed? FWIW, the related kernel selftests patch[1] which asserts this patch's change is upstream already. Cheers, Phil [1] 373e13bc63639 ("selftests: tc-testing: u32: Add tests covering sample option")
On 2021-07-05 10:17 a.m., Phil Sutter wrote: > Hi, > > On Tue, Feb 02, 2021 at 07:30:51PM +0100, Phil Sutter wrote: >> In between Linux kernel 2.4 and 2.6, key folding for hash tables changed >> in kernel space. When iproute2 dropped support for the older algorithm, >> the wrong code was removed and kernel 2.4 folding method remained in >> place. To get things functional for recent kernels again, restoring the >> old code alone was not sufficient - additional byteorder fixes were >> needed. >> >> While being at it, make use of ffs() and thereby align the code with how >> kernel determines the shift width. >> >> Fixes: 267480f55383c ("Backout the 2.4 utsname hash patch.") >> Signed-off-by: Phil Sutter <phil@nwl.cc> > > Seems this patch fell off the table? Or was there an objection I missed? > None i am aware of. I did ACK the patch. cheers, jamal
David, On Mon, Jul 05, 2021 at 01:25:21PM -0400, Jamal Hadi Salim wrote: > On 2021-07-05 10:17 a.m., Phil Sutter wrote: [...] > > Seems this patch fell off the table? Or was there an objection I missed? > > > > None i am aware of. I did ACK the patch. Could you please consider applying this patch? I am aware this is a fix and therefore shouldn't have to go via iproute2-next but it's almost a month since I pinged Stephen and didn't receive a reply. Thanks, Phil
diff --git a/tc/f_u32.c b/tc/f_u32.c index 2ed5254a40d5f..a5747f671e1ea 100644 --- a/tc/f_u32.c +++ b/tc/f_u32.c @@ -978,6 +978,13 @@ show_k: goto show_k; } +static __u32 u32_hash_fold(struct tc_u32_key *key) +{ + __u8 fshift = key->mask ? ffs(ntohl(key->mask)) - 1 : 0; + + return ntohl(key->val & key->mask) >> fshift; +} + static int u32_parse_opt(struct filter_util *qu, char *handle, int argc, char **argv, struct nlmsghdr *n) { @@ -1110,9 +1117,7 @@ static int u32_parse_opt(struct filter_util *qu, char *handle, } NEXT_ARG(); } - hash = sel2.keys[0].val & sel2.keys[0].mask; - hash ^= hash >> 16; - hash ^= hash >> 8; + hash = u32_hash_fold(&sel2.keys[0]); htid = ((hash % divisor) << 12) | (htid & 0xFFF00000); sample_ok = 1; continue;
In between Linux kernel 2.4 and 2.6, key folding for hash tables changed in kernel space. When iproute2 dropped support for the older algorithm, the wrong code was removed and kernel 2.4 folding method remained in place. To get things functional for recent kernels again, restoring the old code alone was not sufficient - additional byteorder fixes were needed. While being at it, make use of ffs() and thereby align the code with how kernel determines the shift width. Fixes: 267480f55383c ("Backout the 2.4 utsname hash patch.") Signed-off-by: Phil Sutter <phil@nwl.cc> --- Initially I considered changing the kernel's key folding instead as the old method didn't just ignore key bits beyond the first byte. Yet I am not sure if this would cause problems with hardware offloading. And given the fact that this simplified key folding is in place since the dawn of 2.6, it is probably not such a big problem anyway. --- tc/f_u32.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)