Message ID | 20210512144748.600206118@linuxfoundation.org |
---|---|
State | Superseded |
Headers | show |
Series | None | expand |
On Wed, Jun 16, 2021 at 10:02:44AM +0300, Amit Klein wrote: > Hi Greg et al. > > I see that you backported this patch (increasing the IP ID hash table size) > to the newer LTS branches more than a month ago. But I don't see that it > was backported to older LTS branches (4.19, 4.14, 4.9, 4.4). Is this > intentional? It applies cleanly to 4.19, but not the older ones. If you think it is needed there for those kernels, please provide a working backport that we can apply. thanks, greg k-h
Here is the patch (minus headers, description, etc. - I believe these can be copied as is from the 5.x patch, but not sure about the rules...). It can be applied to 4.14.236. If this is OK, I can move on to 4.9 and 4.4. net/ipv4/route.c | 41 ++++++++++++++++++++++++++++------------- 1 file changed, 28 insertions(+), 13 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 78d6bc61a1d8..022a2b748da3 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -70,6 +70,7 @@ #include <linux/types.h> #include <linux/kernel.h> #include <linux/mm.h> +#include <linux/memblock.h> #include <linux/string.h> #include <linux/socket.h> #include <linux/sockios.h> @@ -485,8 +486,10 @@ static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr) __ipv4_confirm_neigh(dev, *(__force u32 *)pkey); } -#define IP_IDENTS_SZ 2048u - +/* Hash tables of size 2048..262144 depending on RAM size. + * Each bucket uses 8 bytes. + */ +static u32 ip_idents_mask __read_mostly; static atomic_t *ip_idents __read_mostly; static u32 *ip_tstamps __read_mostly; @@ -496,12 +499,16 @@ static u32 *ip_tstamps __read_mostly; */ u32 ip_idents_reserve(u32 hash, int segs) { - u32 *p_tstamp = ip_tstamps + hash % IP_IDENTS_SZ; - atomic_t *p_id = ip_idents + hash % IP_IDENTS_SZ; - u32 old = ACCESS_ONCE(*p_tstamp); - u32 now = (u32)jiffies; + u32 bucket, old, now = (u32)jiffies; + atomic_t *p_id; + u32 *p_tstamp; u32 delta = 0; + bucket = hash & ip_idents_mask; + p_tstamp = ip_tstamps + bucket; + p_id = ip_idents + bucket; + old = READ_ONCE(*p_tstamp); + if (old != now && cmpxchg(p_tstamp, old, now) == old) delta = prandom_u32_max(now - old); @@ -3099,17 +3106,25 @@ struct ip_rt_acct __percpu *ip_rt_acct __read_mostly; int __init ip_rt_init(void) { int rc = 0; + void *idents_hash; int cpu; - ip_idents = kmalloc(IP_IDENTS_SZ * sizeof(*ip_idents), GFP_KERNEL); - if (!ip_idents) - panic("IP: failed to allocate ip_idents\n"); + /* For modern hosts, this will use 2 MB of memory */ + idents_hash = alloc_large_system_hash("IP idents", + sizeof(*ip_idents) + sizeof(*ip_tstamps), + 0, + 16, /* one bucket per 64 KB */ + HASH_ZERO, + NULL, + &ip_idents_mask, + 2048, + 256*1024); + + ip_idents = idents_hash; - prandom_bytes(ip_idents, IP_IDENTS_SZ * sizeof(*ip_idents)); + prandom_bytes(ip_idents, (ip_idents_mask + 1) * sizeof(*ip_idents)); - ip_tstamps = kcalloc(IP_IDENTS_SZ, sizeof(*ip_tstamps), GFP_KERNEL); - if (!ip_tstamps) - panic("IP: failed to allocate ip_tstamps\n"); + ip_tstamps = idents_hash + (ip_idents_mask + 1) * sizeof(*ip_idents); for_each_possible_cpu(cpu) { struct uncached_list *ul = &per_cpu(rt_uncached_list, cpu); On Wed, Jun 16, 2021 at 10:16 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Wed, Jun 16, 2021 at 10:02:44AM +0300, Amit Klein wrote: > > Hi Greg et al. > > > > I see that you backported this patch (increasing the IP ID hash table size) > > to the newer LTS branches more than a month ago. But I don't see that it > > was backported to older LTS branches (4.19, 4.14, 4.9, 4.4). Is this > > intentional? > > It applies cleanly to 4.19, but not the older ones. If you think it is > needed there for those kernels, please provide a working backport that > we can apply. > > thanks, > > greg k-h
On Wed, Jun 16, 2021 at 12:16:52PM +0300, Amit Klein wrote: > Here is the patch (minus headers, description, etc. - I believe these > can be copied as is from the 5.x patch, but not sure about the > rules...). It can be applied to 4.14.236. If this is OK, I can move on > to 4.9 and 4.4. I do not know "the patch" is here. Please resend it in a format that I can apply it in. Look at the stable list archives for lots of examples of backported patches, they need to have the original changelog description as well as the git commit id of the patch you are backporting, which I do not see here at all. thanks, greg k-h
On Wed, Jun 16, 2021 at 11:32:41AM +0200, Greg Kroah-Hartman wrote: > On Wed, Jun 16, 2021 at 12:16:52PM +0300, Amit Klein wrote: > > Here is the patch (minus headers, description, etc. - I believe these > > can be copied as is from the 5.x patch, but not sure about the > > rules...). It can be applied to 4.14.236. If this is OK, I can move on > > to 4.9 and 4.4. > > I do not know "the patch" is here. > > Please resend it in a format that I can apply it in. Look at the stable > list archives for lots of examples of backported patches, they need to > have the original changelog description as well as the git commit id of > the patch you are backporting, which I do not see here at all. Also, did you test this on 4.14.y and prove that it is both needed, and works properly? thanks, greg k-h
From: Amit Klein > Sent: 16 June 2021 10:17 ... > -#define IP_IDENTS_SZ 2048u > - > +/* Hash tables of size 2048..262144 depending on RAM size. > + * Each bucket uses 8 bytes. > + */ > +static u32 ip_idents_mask __read_mostly; ... > + /* For modern hosts, this will use 2 MB of memory */ > + idents_hash = alloc_large_system_hash("IP idents", > + sizeof(*ip_idents) + sizeof(*ip_tstamps), > + 0, > + 16, /* one bucket per 64 KB */ > + HASH_ZERO, > + NULL, > + &ip_idents_mask, > + 2048, > + 256*1024); > + Can someone explain why this is a good idea for a 'normal' system? Why should my desktop system 'waste' 2MB of memory on a massive hash table that I don't need. It might be needed by systems than handle massive numbers of concurrent connections - but that isn't 'most systems'. Surely it would be better to detect when the number of entries is comparable to the table size and then resize the table. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, Jun 16, 2021 at 10:19:15AM +0000, David Laight wrote: > From: Amit Klein > > Sent: 16 June 2021 10:17 > ... > > -#define IP_IDENTS_SZ 2048u > > - > > +/* Hash tables of size 2048..262144 depending on RAM size. > > + * Each bucket uses 8 bytes. > > + */ > > +static u32 ip_idents_mask __read_mostly; > ... > > + /* For modern hosts, this will use 2 MB of memory */ > > + idents_hash = alloc_large_system_hash("IP idents", > > + sizeof(*ip_idents) + sizeof(*ip_tstamps), > > + 0, > > + 16, /* one bucket per 64 KB */ > > + HASH_ZERO, > > + NULL, > > + &ip_idents_mask, > > + 2048, > > + 256*1024); > > + > > Can someone explain why this is a good idea for a 'normal' system? > > Why should my desktop system 'waste' 2MB of memory on a massive > hash table that I don't need. > It might be needed by systems than handle massive numbers > of concurrent connections - but that isn't 'most systems'. > > Surely it would be better to detect when the number of entries > is comparable to the table size and then resize the table. Patches always gladly accepted.
On Wed, Jun 16, 2021 at 12:19 PM David Laight <David.Laight@aculab.com> wrote: > > From: Amit Klein > > Sent: 16 June 2021 10:17 > ... > > -#define IP_IDENTS_SZ 2048u > > - > > +/* Hash tables of size 2048..262144 depending on RAM size. > > + * Each bucket uses 8 bytes. > > + */ > > +static u32 ip_idents_mask __read_mostly; > ... > > + /* For modern hosts, this will use 2 MB of memory */ > > + idents_hash = alloc_large_system_hash("IP idents", > > + sizeof(*ip_idents) + sizeof(*ip_tstamps), > > + 0, > > + 16, /* one bucket per 64 KB */ > > + HASH_ZERO, > > + NULL, > > + &ip_idents_mask, > > + 2048, > > + 256*1024); > > + > > Can someone explain why this is a good idea for a 'normal' system? > > Why should my desktop system 'waste' 2MB of memory on a massive > hash table that I don't need. Only if your desktop has a lot of RAM. Otherwise the table will be smaller (like it was before this patch) > It might be needed by systems than handle massive numbers > of concurrent connections - but that isn't 'most systems'. > > Surely it would be better to detect when the number of entries > is comparable to the table size and then resize the table. Please send a patch, instead of always complaining about what others do. Security comes first. Then eventually we can ' optimize' . > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales)
On Wed, Jun 16, 2021 at 1:19 PM David Laight <David.Laight@aculab.com> wrote: > > Can someone explain why this is a good idea for a 'normal' system? > This patch mitigates some techniques that leak internal state due to table hash collisions. > Why should my desktop system 'waste' 2MB of memory on a massive > hash table that I don't need. In the patch's defense, it only consumes 2MB when the physical RAM is >= 16GB. > It might be needed by systems than handle massive numbers > of concurrent connections - but that isn't 'most systems'. > > Surely it would be better to detect when the number of entries > is comparable to the table size and then resize the table. Security-wise, this approach is not effective. The table size was increased to reduce the likelihood of hash collisions. These start happening when you have ~N^(1/2) elements (for table size N), so you'll need to resize pretty quickly anyway.
From: Amit Klein > Sent: 16 June 2021 14:20 > > On Wed, Jun 16, 2021 at 1:19 PM David Laight <David.Laight@aculab.com> wrote: > > > Can someone explain why this is a good idea for a 'normal' system? > > This patch mitigates some techniques that leak internal state due to > table hash collisions. As you say below it isn't really effective... > > Why should my desktop system 'waste' 2MB of memory on a massive > > hash table that I don't need. > > In the patch's defense, it only consumes 2MB when the physical RAM is >= 16GB. Right, but I've 16GB of memory because I run some very large applications that need all the memory they can get (and more). In any case, for many architectures 8GB is 'entry level'. That includes some or the 'small' ARM cpus. They are unlikely to have tens of connections, never mind thousands. > > It might be needed by systems than handle massive numbers > > of concurrent connections - but that isn't 'most systems'. > > > > Surely it would be better to detect when the number of entries > > is comparable to the table size and then resize the table. > > Security-wise, this approach is not effective. The table size was > increased to reduce the likelihood of hash collisions. These start > happening when you have ~N^(1/2) elements (for table size N), so > you'll need to resize pretty quickly anyway. You really have to live with some hash collisions. Resizing at N^(1/2) just doesn't scale. Not only that the hash function will stop generating non-colliding values once the table starts getting big. (A very expensive hash function might help - but i wouldn't count on it.) In any case, the chance of hitting a hash collision is slightly less than (table_size / active_entries). What you want to avoid is the ELF symbol hash when the average chain length is about 1.5 and you have to separately check the hash table of every shared library. That quickly balloons to a lot of string compares. (I looked at that for mozilla may years ago - was horrid.) David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, Jun 16, 2021 at 01:42:17PM +0000, David Laight wrote: > > This patch mitigates some techniques that leak internal state due to > > table hash collisions. > > As you say below it isn't really effective... Actually it's Amit's attack which is really effective against anything that is not 100% random. And as you mentioned yourself in another thread, full randomness is not desirable as it can result in big trouble with ID collisions, especially when limited to 16 bits only. The only remaining factor to act on is thus the number of buckets to reduce the effectiveness of the attack. > > > Why should my desktop system 'waste' 2MB of memory on a massive > > > hash table that I don't need. > > > > In the patch's defense, it only consumes 2MB when the physical RAM is >= 16GB. > > Right, but I've 16GB of memory because I run some very large > applications that need all the memory they can get (and more). Then by all means if your system is so tightly sized, buy this extra 2 MB DRAM stick you're missing, and install it next to your existing 16GB one. > > Security-wise, this approach is not effective. The table size was > > increased to reduce the likelihood of hash collisions. These start > > happening when you have ~N^(1/2) elements (for table size N), so > > you'll need to resize pretty quickly anyway. > > You really have to live with some hash collisions. Absolutely, and that's the principle, that at some point it costs enough to the attacker for the attack to not be affordable or at least be easily detectable. All this for 1/8192 of your total RAM in the worst case, you will not notice more than when some fields are added to certain important structures and probably much less than when you simply upgrade whatever lib or binary. And probably that you forgot to reduce the amount of RAM dedicated to the video in your BIOS and can salvage no less than 32MB there! Willy
On Wed, Jun 16, 2021 at 12:16:52PM +0300, Amit Klein wrote: > Here is the patch (minus headers, description, etc. - I believe these > can be copied as is from the 5.x patch, but not sure about the > rules...). It can be applied to 4.14.236. If this is OK, I can move on > to 4.9 and 4.4. > > > net/ipv4/route.c | 41 ++++++++++++++++++++++++++++------------- > 1 file changed, 28 insertions(+), 13 deletions(-) > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > index 78d6bc61a1d8..022a2b748da3 100644 > --- a/net/ipv4/route.c > +++ b/net/ipv4/route.c > @@ -70,6 +70,7 @@ > #include <linux/types.h> > #include <linux/kernel.h> > #include <linux/mm.h> > +#include <linux/memblock.h> > #include <linux/string.h> > #include <linux/socket.h> > #include <linux/sockios.h> > @@ -485,8 +486,10 @@ static void ipv4_confirm_neigh(const struct > dst_entry *dst, const void *daddr) > __ipv4_confirm_neigh(dev, *(__force u32 *)pkey); > } > > -#define IP_IDENTS_SZ 2048u > - > +/* Hash tables of size 2048..262144 depending on RAM size. > + * Each bucket uses 8 bytes. > + */ > +static u32 ip_idents_mask __read_mostly; > static atomic_t *ip_idents __read_mostly; > static u32 *ip_tstamps __read_mostly; > > @@ -496,12 +499,16 @@ static u32 *ip_tstamps __read_mostly; > */ > u32 ip_idents_reserve(u32 hash, int segs) > { > - u32 *p_tstamp = ip_tstamps + hash % IP_IDENTS_SZ; > - atomic_t *p_id = ip_idents + hash % IP_IDENTS_SZ; > - u32 old = ACCESS_ONCE(*p_tstamp); > - u32 now = (u32)jiffies; > + u32 bucket, old, now = (u32)jiffies; > + atomic_t *p_id; > + u32 *p_tstamp; Your patch is corrupted and couldn't be applied if I wanted to :(
On Wed, Jun 16, 2021 at 09:16:12AM +0200, Greg Kroah-Hartman wrote: > On Wed, Jun 16, 2021 at 10:02:44AM +0300, Amit Klein wrote: > > Hi Greg et al. > > > > I see that you backported this patch (increasing the IP ID hash table size) > > to the newer LTS branches more than a month ago. But I don't see that it > > was backported to older LTS branches (4.19, 4.14, 4.9, 4.4). Is this > > intentional? > > It applies cleanly to 4.19, but not the older ones. If you think it is > needed there for those kernels, please provide a working backport that > we can apply. It breaks the build on 4.19, which is why I didn't apply it there, so I would need a working version for that tree as well. thanks, greg k-h
I just sent a revised patch for 4.19. On Wed, Jun 16, 2021 at 5:57 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Wed, Jun 16, 2021 at 09:16:12AM +0200, Greg Kroah-Hartman wrote: > > On Wed, Jun 16, 2021 at 10:02:44AM +0300, Amit Klein wrote: > > > Hi Greg et al. > > > > > > I see that you backported this patch (increasing the IP ID hash table size) > > > to the newer LTS branches more than a month ago. But I don't see that it > > > was backported to older LTS branches (4.19, 4.14, 4.9, 4.4). Is this > > > intentional? > > > > It applies cleanly to 4.19, but not the older ones. If you think it is > > needed there for those kernels, please provide a working backport that > > we can apply. > > It breaks the build on 4.19, which is why I didn't apply it there, so I > would need a working version for that tree as well. > > thanks, > > greg k-h
I just submitted a revised patch for 4.14. Sorry for the earlier blunder. On Wed, Jun 16, 2021 at 5:56 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Wed, Jun 16, 2021 at 12:16:52PM +0300, Amit Klein wrote: > > Here is the patch (minus headers, description, etc. - I believe these > > can be copied as is from the 5.x patch, but not sure about the > > rules...). It can be applied to 4.14.236. If this is OK, I can move on > > to 4.9 and 4.4. > > > > > > net/ipv4/route.c | 41 ++++++++++++++++++++++++++++------------- > > 1 file changed, 28 insertions(+), 13 deletions(-) > > > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > > index 78d6bc61a1d8..022a2b748da3 100644 > > --- a/net/ipv4/route.c > > +++ b/net/ipv4/route.c > > @@ -70,6 +70,7 @@ > > #include <linux/types.h> > > #include <linux/kernel.h> > > #include <linux/mm.h> > > +#include <linux/memblock.h> > > #include <linux/string.h> > > #include <linux/socket.h> > > #include <linux/sockios.h> > > @@ -485,8 +486,10 @@ static void ipv4_confirm_neigh(const struct > > dst_entry *dst, const void *daddr) > > __ipv4_confirm_neigh(dev, *(__force u32 *)pkey); > > } > > > > -#define IP_IDENTS_SZ 2048u > > - > > +/* Hash tables of size 2048..262144 depending on RAM size. > > + * Each bucket uses 8 bytes. > > + */ > > +static u32 ip_idents_mask __read_mostly; > > static atomic_t *ip_idents __read_mostly; > > static u32 *ip_tstamps __read_mostly; > > > > @@ -496,12 +499,16 @@ static u32 *ip_tstamps __read_mostly; > > */ > > u32 ip_idents_reserve(u32 hash, int segs) > > { > > - u32 *p_tstamp = ip_tstamps + hash % IP_IDENTS_SZ; > > - atomic_t *p_id = ip_idents + hash % IP_IDENTS_SZ; > > - u32 old = ACCESS_ONCE(*p_tstamp); > > - u32 now = (u32)jiffies; > > + u32 bucket, old, now = (u32)jiffies; > > + atomic_t *p_id; > > + u32 *p_tstamp; > > Your patch is corrupted and couldn't be applied if I wanted to :( >
diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 53c5cf5723aa..3ff702380b62 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -66,6 +66,7 @@ #include <linux/types.h> #include <linux/kernel.h> #include <linux/mm.h> +#include <linux/memblock.h> #include <linux/string.h> #include <linux/socket.h> #include <linux/sockios.h> @@ -476,8 +477,10 @@ static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr) __ipv4_confirm_neigh(dev, *(__force u32 *)pkey); } -#define IP_IDENTS_SZ 2048u - +/* Hash tables of size 2048..262144 depending on RAM size. + * Each bucket uses 8 bytes. + */ +static u32 ip_idents_mask __read_mostly; static atomic_t *ip_idents __read_mostly; static u32 *ip_tstamps __read_mostly; @@ -487,12 +490,16 @@ static u32 *ip_tstamps __read_mostly; */ u32 ip_idents_reserve(u32 hash, int segs) { - u32 *p_tstamp = ip_tstamps + hash % IP_IDENTS_SZ; - atomic_t *p_id = ip_idents + hash % IP_IDENTS_SZ; - u32 old = READ_ONCE(*p_tstamp); - u32 now = (u32)jiffies; + u32 bucket, old, now = (u32)jiffies; + atomic_t *p_id; + u32 *p_tstamp; u32 delta = 0; + bucket = hash & ip_idents_mask; + p_tstamp = ip_tstamps + bucket; + p_id = ip_idents + bucket; + old = READ_ONCE(*p_tstamp); + if (old != now && cmpxchg(p_tstamp, old, now) == old) delta = prandom_u32_max(now - old); @@ -3459,18 +3466,25 @@ struct ip_rt_acct __percpu *ip_rt_acct __read_mostly; int __init ip_rt_init(void) { + void *idents_hash; int cpu; - ip_idents = kmalloc_array(IP_IDENTS_SZ, sizeof(*ip_idents), - GFP_KERNEL); - if (!ip_idents) - panic("IP: failed to allocate ip_idents\n"); + /* For modern hosts, this will use 2 MB of memory */ + idents_hash = alloc_large_system_hash("IP idents", + sizeof(*ip_idents) + sizeof(*ip_tstamps), + 0, + 16, /* one bucket per 64 KB */ + HASH_ZERO, + NULL, + &ip_idents_mask, + 2048, + 256*1024); + + ip_idents = idents_hash; - prandom_bytes(ip_idents, IP_IDENTS_SZ * sizeof(*ip_idents)); + prandom_bytes(ip_idents, (ip_idents_mask + 1) * sizeof(*ip_idents)); - ip_tstamps = kcalloc(IP_IDENTS_SZ, sizeof(*ip_tstamps), GFP_KERNEL); - if (!ip_tstamps) - panic("IP: failed to allocate ip_tstamps\n"); + ip_tstamps = idents_hash + (ip_idents_mask + 1) * sizeof(*ip_idents); for_each_possible_cpu(cpu) { struct uncached_list *ul = &per_cpu(rt_uncached_list, cpu);