Message ID | 20201217162521.1134496-2-atenart@kernel.org |
---|---|
State | Superseded |
Headers | show |
Series | net-sysfs: fix race conditions in the xps code | expand |
On Thu, 17 Dec 2020 17:25:18 +0100 Antoine Tenart wrote: > Callers to netif_set_xps_queue should take the rtnl lock. Failing to do > so can lead to race conditions between netdev_set_num_tc and > netif_set_xps_queue, triggering various oops: > > - netif_set_xps_queue uses dev->tc_num as one of the parameters to > compute the size of new_dev_maps when allocating it. dev->tc_num is > also used to access the map, and the compiler may generate code to > retrieve this field multiple times in the function. > > - netdev_set_num_tc sets dev->tc_num. > > If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is > set to a higher value through netdev_set_num_tc, later accesses to > new_dev_maps in netif_set_xps_queue could lead to accessing memory > outside of new_dev_maps; triggering an oops. > > One way of triggering this is to set an iface up (for which the driver > uses netdev_set_num_tc in the open path, such as bnx2x) and writing to > xps_cpus in a concurrent thread. With the right timing an oops is > triggered. > > Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") Let's CC Alex > Signed-off-by: Antoine Tenart <atenart@kernel.org> Two things: (a) is the datapath not exposed to a similar problem? __get_xps_queue_idx() uses dev->tc_num in a very similar fashion. Should we perhaps make the "num_tcs" part of the XPS maps which is under RCU protection rather than accessing the netdev copy? (b) if we always take rtnl_lock, why have xps_map_mutex? Can we rearrange things so that xps_map_mutex is sufficient? > diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c > index 999b70c59761..7cc15dec1717 100644 > --- a/net/core/net-sysfs.c > +++ b/net/core/net-sysfs.c > @@ -1396,7 +1396,13 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue, > return err; > } > > + if (!rtnl_trylock()) { > + free_cpumask_var(mask); > + return restart_syscall(); > + } > + > err = netif_set_xps_queue(dev, mask, index); > + rtnl_unlock(); > > free_cpumask_var(mask); >
On Fri, Dec 18, 2020 at 4:30 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 17 Dec 2020 17:25:18 +0100 Antoine Tenart wrote: > > Callers to netif_set_xps_queue should take the rtnl lock. Failing to do > > so can lead to race conditions between netdev_set_num_tc and > > netif_set_xps_queue, triggering various oops: > > > > - netif_set_xps_queue uses dev->tc_num as one of the parameters to > > compute the size of new_dev_maps when allocating it. dev->tc_num is > > also used to access the map, and the compiler may generate code to > > retrieve this field multiple times in the function. > > > > - netdev_set_num_tc sets dev->tc_num. > > > > If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is > > set to a higher value through netdev_set_num_tc, later accesses to > > new_dev_maps in netif_set_xps_queue could lead to accessing memory > > outside of new_dev_maps; triggering an oops. > > > > One way of triggering this is to set an iface up (for which the driver > > uses netdev_set_num_tc in the open path, such as bnx2x) and writing to > > xps_cpus in a concurrent thread. With the right timing an oops is > > triggered. > > > > Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") > > Let's CC Alex > > > Signed-off-by: Antoine Tenart <atenart@kernel.org> > > Two things: (a) is the datapath not exposed to a similar problem? > __get_xps_queue_idx() uses dev->tc_num in a very similar fashion. I think we are shielded from this by the fact that if you change the number of tc the Tx path has to be torn down and rebuilt since you are normally changing the qdisc configuration anyway. > Should we perhaps make the "num_tcs" part of the XPS maps which is > under RCU protection rather than accessing the netdev copy? So it looks like the issue is the fact that we really need to synchronize netdev_reset_tc, netdev_set_tc_queue, and netdev_set_num_tc with __netif_set_xps_queue. > (b) if we always take rtnl_lock, why have xps_map_mutex? Can we > rearrange things so that xps_map_mutex is sufficient? It seems like the quick and dirty way would be to look at updating the 3 functions I called out so that they were holding the xps_map_mutex while they were updating things, and for __netif_set_xps_queue to expand out the mutex to include the code starting at "if (dev->num_tc) {".
Hi Jakub, Alexander, Quoting Alexander Duyck (2020-12-19 02:41:08) > On Fri, Dec 18, 2020 at 4:30 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > Two things: (a) is the datapath not exposed to a similar problem? > > __get_xps_queue_idx() uses dev->tc_num in a very similar fashion. > > I think we are shielded from this by the fact that if you change the > number of tc the Tx path has to be torn down and rebuilt since you are > normally changing the qdisc configuration anyway. That's right. But there's nothing preventing users to call functions using the xps maps in between. There are a few functions being exposed. One (similar) example of that is another bug I reproduced, were the old and the new map in __netif_set_xps_queue do not have the same size, because num_tc was updated in between two calls to this function. The root cause is the same: the size of the map is not embedded in it and whenever we access it we can make an out of bound access. > > Should we perhaps make the "num_tcs" part of the XPS maps which is > > under RCU protection rather than accessing the netdev copy? Yes, I have a local patch (untested, still WIP) doing exactly that. The idea is we can't make sure a num_tc update will trigger an xps reallocation / reconfiguration of the map; but at least we can make sure the map won't be accessed out of bounds. It's a different issue though: not being able to access a map out of bound once it has been allocated whereas this patch wants to prevent an update of num_tc while the xps map allocation/setup is in progress. > So it looks like the issue is the fact that we really need to > synchronize netdev_reset_tc, netdev_set_tc_queue, and > netdev_set_num_tc with __netif_set_xps_queue. > > > (b) if we always take rtnl_lock, why have xps_map_mutex? Can we > > rearrange things so that xps_map_mutex is sufficient? > > It seems like the quick and dirty way would be to look at updating the > 3 functions I called out so that they were holding the xps_map_mutex > while they were updating things, and for __netif_set_xps_queue to > expand out the mutex to include the code starting at "if (dev->num_tc) > {". That should do the trick. The only downside is xps_map_mutex is only defined with CONFIG_XPS while netdev_set_num_tc is not, adding more ifdef to it. But that's probably a better compromise than taking the rtnl lock. Thanks for the review and suggestions! Antoine
On Fri, Dec 18, 2020 at 05:11:28PM +0100, Antoine Tenart wrote: > That build issue seems unrelated to the patch. The series as a whole > builds fine according to the same report, and this code is not modified > by later patches. Hi Antoine, this is a false positive report, kindly ignore this. Sorry for inconvenience. > > It looks a lot like this report from yesterday: > https://www.spinics.net/lists/netdev/msg709132.html > > Which also seemed unrelated to the changes: > https://www.spinics.net/lists/netdev/msg709264.html > > Thanks! > Antoine > > Quoting kernel test robot (2020-12-18 16:27:46) > > Hi Antoine, > > > > I love your patch! Yet something to improve: > > > > [auto build test ERROR on net/master] > > > > url: https://github.com/0day-ci/linux/commits/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852 > > base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 3ae32c07815a24ae12de2e7838d9d429ba31e5e0 > > config: riscv-randconfig-r014-20201217 (attached as .config) > > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45) > > reproduce (this is a W=1 build): > > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > > chmod +x ~/bin/make.cross > > # install riscv cross compiling tool for clang build > > # apt-get install binutils-riscv64-linux-gnu > > # https://github.com/0day-ci/linux/commit/f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3 > > git remote add linux-review https://github.com/0day-ci/linux > > git fetch --no-tags linux-review Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852 > > git checkout f989c3dcbe4d9abd1c6c48b34f08c6c0cd9d44b3 > > # save the attached .config to linux build tree > > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv > > > > If you fix the issue, kindly add following tag as appropriate > > Reported-by: kernel test robot <lkp@intel.com> > > > > Note: the linux-review/Antoine-Tenart/net-sysfs-fix-race-conditions-in-the-xps-code/20201218-002852 HEAD 563d144b47845dea594b409ecf22914b9797cd1e builds fine. > > It only hurts bisectibility. > > > > All errors (new ones prefixed by >>): > > > > /tmp/ics932s401-422897.s: Assembler messages: > > >> /tmp/ics932s401-422897.s:260: Error: unrecognized opcode `zext.b a1,s11' > > /tmp/ics932s401-422897.s:362: Error: unrecognized opcode `zext.b a1,s11' > > /tmp/ics932s401-422897.s:518: Error: unrecognized opcode `zext.b a1,s11' > > /tmp/ics932s401-422897.s:637: Error: unrecognized opcode `zext.b a1,s11' > > /tmp/ics932s401-422897.s:774: Error: unrecognized opcode `zext.b a1,s11' > > /tmp/ics932s401-422897.s:893: Error: unrecognized opcode `zext.b a1,s11' > > /tmp/ics932s401-422897.s:1021: Error: unrecognized opcode `zext.b a1,s11' > > >> /tmp/ics932s401-422897.s:1180: Error: unrecognized opcode `zext.b a1,s2' > > clang-12: error: assembler command failed with exit code 1 (use -v to see invocation) > > > > --- > > 0-DAY CI Kernel Test Service, Intel Corporation > > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org > _______________________________________________ > kbuild-all mailing list -- kbuild-all@lists.01.org > To unsubscribe send an email to kbuild-all-leave@lists.01.org
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 999b70c59761..7cc15dec1717 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -1396,7 +1396,13 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue, return err; } + if (!rtnl_trylock()) { + free_cpumask_var(mask); + return restart_syscall(); + } + err = netif_set_xps_queue(dev, mask, index); + rtnl_unlock(); free_cpumask_var(mask);
Callers to netif_set_xps_queue should take the rtnl lock. Failing to do so can lead to race conditions between netdev_set_num_tc and netif_set_xps_queue, triggering various oops: - netif_set_xps_queue uses dev->tc_num as one of the parameters to compute the size of new_dev_maps when allocating it. dev->tc_num is also used to access the map, and the compiler may generate code to retrieve this field multiple times in the function. - netdev_set_num_tc sets dev->tc_num. If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is set to a higher value through netdev_set_num_tc, later accesses to new_dev_maps in netif_set_xps_queue could lead to accessing memory outside of new_dev_maps; triggering an oops. One way of triggering this is to set an iface up (for which the driver uses netdev_set_num_tc in the open path, such as bnx2x) and writing to xps_cpus in a concurrent thread. With the right timing an oops is triggered. Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: Antoine Tenart <atenart@kernel.org> --- net/core/net-sysfs.c | 6 ++++++ 1 file changed, 6 insertions(+)