Message ID: 20221111040027.621646-1-yury.norov@gmail.com
Series: cpumask: improve on cpumask_local_spread() locality
On Fri, Nov 11, 2022 at 01:42:29PM +0200, Andy Shevchenko wrote:
> On Thu, Nov 10, 2022 at 08:00:26PM -0800, Yury Norov wrote:
> > The function finds the Nth set CPU in a given cpumask, starting from a
> > given node.
> >
> > Leveraging the fact that each hop in sched_domains_numa_masks includes the
> > same or greater number of CPUs than the previous one, we can use binary
> > search on hops instead of a linear walk, which makes the overall complexity
> > O(log n) in terms of the number of cpumask_weight() calls.
>
> ...
>
> > +int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
> > +{
> > +	unsigned int first = 0, mid, last = sched_domains_numa_levels;
> > +	struct cpumask ***masks;
>
> *** ?
> Hmm... Do we really need such deep indirection?

It's a 2d array of pointers, so - yes.

> > +	int w, ret = nr_cpu_ids;
> > +
> > +	rcu_read_lock();
> > +	masks = rcu_dereference(sched_domains_numa_masks);
> > +	if (!masks)
> > +		goto out;
> > +
> > +	while (last >= first) {
> > +		mid = (last + first) / 2;
> > +
> > +		if (cpumask_weight_and(cpus, masks[mid][node]) <= cpu) {
> > +			first = mid + 1;
> > +			continue;
> > +		}
> > +
> > +		w = (mid == 0) ? 0 : cpumask_weight_and(cpus, masks[mid - 1][node]);
>
> See below.
>
> > +		if (w <= cpu)
> > +			break;
> > +
> > +		last = mid - 1;
> > +	}
>
> We have lib/bsearch.h. I haven't really looked deeply into the above, but my
> gut feeling is that it might be useful here. Can you check that?

Yes we do. I tried it, and it didn't work because the per-node arrays are
allocated dynamically, and the distance between different pairs of hops for
a given node is not constant, which is a requirement for bsearch.

However, the distance between hop pointers in the 1st-level array should be
constant, and we can try feeding bsearch with it. I'll experiment with
bsearch some more.

> > +	ret = (mid == 0) ?
> > +		cpumask_nth_and(cpu - w, cpus, masks[mid][node]) :
> > +		cpumask_nth_and_andnot(cpu - w, cpus, masks[mid][node], masks[mid - 1][node]);
>
> You can also shorten this by inverting the conditional:
>
> 	ret = mid ? ...not 0... : ...for 0...;

Yep, why not.

> > +out:
>
> out_unlock: ?

Do you think it's better?

> > +	rcu_read_unlock();
> > +	return ret;
> > +}
>
> --
> With Best Regards,
> Andy Shevchenko
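For illustration, the constraint Yury describes -- bsearch() needs elements
of constant size, which the dynamically allocated per-node mask rows don't
provide, but the first-level array of hops does -- can be sketched in a few
lines of userspace C. The weights and names below are toy stand-ins, not
kernel code:

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for "weight of CPUs covered up to each hop" on one node.
 * By construction in sched_domains_numa_masks this is non-decreasing. */
static int hop_weight[] = { 2, 4, 8, 16 };

struct key {
	int cpu;	/* find the first hop whose cumulative weight exceeds this */
};

static int hop_cmp(const void *a, const void *b)
{
	const struct key *k = a;
	const int *hop = b;	/* element of the constant-stride array */

	if (*hop <= k->cpu)
		return 1;	/* whole hop below the target: search right */
	if (hop == hop_weight || *(hop - 1) <= k->cpu)
		return 0;	/* first hop, or previous hop below target: found */
	return -1;		/* search left */
}

int main(void)
{
	struct key k = { .cpu = 5 };
	int *hop = bsearch(&k, hop_weight, 4, sizeof(hop_weight[0]), hop_cmp);

	if (hop)	/* recover the hop index with pointer arithmetic */
		printf("cpu %d first appears at hop %td\n", k.cpu, hop - hop_weight);
	return 0;
}

The kernel's lib/bsearch.h takes the same (key, base, num, size, cmp)
arguments, so the comparator shape carries over; peeking at the previous
element is what turns a plain membership search into a range search.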
On Fri, Nov 11, 2022 at 09:07:17AM -0800, Yury Norov wrote:
> On Fri, Nov 11, 2022 at 01:42:29PM +0200, Andy Shevchenko wrote:
> > On Thu, Nov 10, 2022 at 08:00:26PM -0800, Yury Norov wrote:
> > > +	int w, ret = nr_cpu_ids;
> > > +
> > > +	rcu_read_lock();
> > > +	masks = rcu_dereference(sched_domains_numa_masks);
> > > +	if (!masks)
> > > +		goto out;
> > > +
> > > +	while (last >= first) {
> > > +		mid = (last + first) / 2;
> > > +
> > > +		if (cpumask_weight_and(cpus, masks[mid][node]) <= cpu) {
> > > +			first = mid + 1;
> > > +			continue;
> > > +		}
> > > +
> > > +		w = (mid == 0) ? 0 : cpumask_weight_and(cpus, masks[mid - 1][node]);
> >
> > See below.
> >
> > > +		if (w <= cpu)
> > > +			break;
> > > +
> > > +		last = mid - 1;
> > > +	}
> >
> > We have lib/bsearch.h. I haven't really looked deeply into the above, but my
> > gut feeling is that it might be useful here. Can you check that?
>
> Yes we do. I tried it, and it didn't work because the per-node arrays are
> allocated dynamically, and the distance between different pairs of hops for
> a given node is not constant, which is a requirement for bsearch.
>
> However, the distance between hop pointers in the 1st-level array should be
> constant, and we can try feeding bsearch with it. I'll experiment with
> bsearch some more.

OK, I tried bsearch on the array of hops, and it works. But it requires
some black pointer magic. I'll send a v2 based on bsearch soon.

Thanks,
Yury
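To give an idea of the "black pointer magic" Yury mentions, here is a
sketch -- an assumption about the shape, not necessarily what v2 will
actually look like -- of feeding the first-level array to bsearch(): the
comparator dereferences one level into the per-node rows, and the hop index
is recovered from the returned pointer by subtraction. The struct and field
names are illustrative:

struct hop_key {			/* illustrative name */
	const struct cpumask *cpus;
	struct cpumask ***masks;	/* 1st-level array: constant stride */
	int node;
	int cpu;
	int w;				/* out: weight of the previous hop */
};

static int hop_cmp(const void *a, const void *b)
{
	struct hop_key *k = (struct hop_key *)a;
	struct cpumask **cur_hop = *(struct cpumask ***)b;
	struct cpumask **prev_hop;

	if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
		return 1;	/* whole hop below the target: search right */

	if (b == k->masks)
		return 0;	/* first hop: nothing below it to check */

	prev_hop = *((struct cpumask ***)b - 1);
	k->w = cpumask_weight_and(k->cpus, prev_hop[k->node]);
	return k->w <= k->cpu ? 0 : -1;
}

	/* caller side: recover the hop index by pointer subtraction */
	hop_masks = bsearch(&k, k.masks, sched_domains_numa_levels,
			    sizeof(k.masks[0]), hop_cmp);
	hop = hop_masks - k.masks;

Stashing the previous hop's weight in the key (k->w) saves recomputing one
cpumask_weight_and() after the search; a NULL result (cpu beyond the last
hop) would still need handling by the caller.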
On 11/11/2022 6:47 PM, Yury Norov wrote:
> On Fri, Nov 11, 2022, 10:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 10 Nov 2022 20:00:23 -0800 Yury Norov wrote:
> > > cpumask_local_spread() currently checks the local node for presence of
> > > the i'th CPU, and then if it finds nothing makes a flat search among
> > > all non-local CPUs. We can do better by checking CPUs per NUMA hop.
> >
> > Nice.

Thanks for your series. This improves them all, with no changes required
to the network device drivers.

> > > This series is inspired by Valentin Schneider's "net/mlx5e: Improve
> > > remote NUMA preferences used for the IRQ affinity hints"
> > >
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20220728191203.4055-3-tariqt@nvidia.com/

Find my very first version here, including the perf testing results:
https://patchwork.kernel.org/project/netdevbpf/list/?series=660413&state=*

> > > According to Valentin's measurements, for mlx5e:
> > >
> > > 	Bottleneck in RX side is released, reached linerate (~1.8x speedup).
> > > 	~30% less cpu util on TX.
> > >
> > > This patch makes cpumask_local_spread() traverse CPUs based on NUMA
> > > distance, just as well, and I expect comparable improvement for its
> > > users, as in Valentin's case.

Right.

> > > I tested it on my VM with the following NUMA configuration:
> >
> > nit: the authorship is a bit more complicated, it'd be good to mention
> > Tariq. Both for the code and attribution of the testing / measurements.
>
> Sure. Tariq and Valentin please send your tags as appropriate.

I wonder what fits best here?

As the contribution is based upon previous work that I developed, then
probably:

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Thanks,
Tariq
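As a side note, the behavioral change the cover letter claims -- visiting
CPUs in increasing NUMA distance rather than "local node first, then flat
order" -- can be shown with a self-contained toy model (made-up 4-node
distance matrix, illustrative values only):

#include <stdio.h>

#define NNODES 4

/* made-up symmetric NUMA distance matrix; 10 = local */
static const int dist[NNODES][NNODES] = {
	{ 10, 32, 20, 32 },
	{ 32, 10, 32, 20 },
	{ 20, 32, 10, 32 },
	{ 32, 20, 32, 10 },
};

int main(void)
{
	int node = 0, visited[NNODES] = { 0 };

	/* hop order: repeatedly pick the nearest unvisited node */
	for (int hop = 0; hop < NNODES; hop++) {
		int best = -1;

		for (int n = 0; n < NNODES; n++)
			if (!visited[n] &&
			    (best < 0 || dist[node][n] < dist[node][best]))
				best = n;
		visited[best] = 1;
		printf("hop %d: node %d (distance %d)\n",
		       hop, best, dist[node][best]);
	}
	return 0;
}

From node 0 this prints the node order 0, 2, 1, 3, whereas a flat fallback
would visit 1, 2, 3 regardless of distance -- that gap is the locality the
series recovers.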
On Sun, Nov 13, 2022 at 09:37:59AM +0200, Tariq Toukan wrote:
> On 11/11/2022 6:47 PM, Yury Norov wrote:
> > On Fri, Nov 11, 2022, 10:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Thu, 10 Nov 2022 20:00:23 -0800 Yury Norov wrote:

...

> > Sure. Tariq and Valentin please send your tags as appropriate.
>
> I wonder what fits best here?
>
> As the contribution is based upon previous work that I developed, then
> probably:
>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Then it probably means that one of you (Yury or you) should also have a
Co-developed-by tag.
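For reference, Documentation/process/submitting-patches.rst requires that
every Co-developed-by: be immediately followed by the Signed-off-by: of the
associated co-author, with the submitter's Signed-off-by: last in the
chain. Applied here, the tag block would presumably read:

	Co-developed-by: Tariq Toukan <tariqt@nvidia.com>
	Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
	Signed-off-by: Yury Norov <yury.norov@gmail.com>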