Message ID | 20201021194743.781583-1-harshitha.ramamurthy@intel.com
---|---
Series | sock: Fix sock queue mapping to include device
On Wed, Oct 21, 2020 at 3:51 PM Harshitha Ramamurthy
<harshitha.ramamurthy@intel.com> wrote:
>
> In XPS, the transmit queue selected for a packet is saved in the
> associated sock for the packet and is then used to avoid recalculating
> the queue on subsequent sends. The problem is that the corresponding
> device is not also recorded, so when the queue mapping is referenced it
> may correspond to a different device than the sending one, resulting in
> an incorrect queue being used for transmit. Particularly with xps_rxqs,
> this can lead to non-deterministic behaviour, as illustrated below.
>
> Consider a case where xps_rxqs is configured and there is a difference
> in the number of Tx and Rx queues. Suppose we have two devices, A and B.
> Device A has queues 0-7 and device B has queues 0-15. Packets are
> transmitted from device A but received on device B. For packets received
> on queues 0-7 of device B, xps_rxqs will be applied and reply packets
> will be transmitted on device A's queues 0-7. However, when packets are
> received on queues 8-15 of device B, normal XPS is used for the reply
> packets transmitted from device A. This leads to non-deterministic
> behaviour. The case where there are fewer receive queues is even more
> insidious. Consider device A, the transmitting device, with queues 0-15
> and device B, the receiver, with queues 0-7. With xps_rxqs enabled,
> packets will be received only on queues 0-7 of device B, but sent only
> on queues 0-7 of device A, thereby causing a load imbalance.

So the issue is limited to xps_rxqs with multiple nics.

When do we need sk_tx_dev_and_queue_mapping (patch 3/3)? It is used in
netdev_pick_tx, but associations are reset on route change and recomputed
if queue_index would exceed the current device queue count.

> This patch set fixes the issue by recording both the device (via
> ifindex) and the queue in the sock mapping. The pair is set and
> retrieved atomically.

I guess this is the reason for the somewhat convoluted cast-to-u64 logic
in patch 1/3. Is the assumption that 64-bit loads and stores are atomic
on all platforms? That is not correct.

Is atomicity even needed? For the purpose of load balancing it isn't.
Just adding a sk->rx_ifindex would be a lot simpler.

sk->sk_napi_id already uniquely identifies the device. Unfortunately,
dev_get_by_napi_id is not cheap (it traverses a hashtable bucket), though
purely for the purpose of load balancing this validation could be
sample based.

The rx ifindex is also already recorded for inet sockets in
rx_dst_ifindex, and the sk_rx_queue_get functions are limited to those,
so they could conceivably use that. But it is derived from skb_iif, which
is overwritten on every reentry of __netif_receive_skb_core.
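To make the atomicity question concrete, here is a rough userspace-style
sketch of the kind of packed (ifindex, queue) word I understand patch 1/3
to be building; the struct and helper names below are made up for
illustration and are not the ones used in the series:

```c
/* Illustrative only: pack ifindex and queue into one 64-bit word so the
 * pair can be read and written with a single access. */
#include <stdint.h>
#include <stdio.h>

struct sock_txq_map {
	uint64_t val;	/* ifindex in the high 32 bits, queue in the low 32 */
};

static inline void sk_txq_map_set(struct sock_txq_map *m,
				  uint32_t ifindex, uint32_t queue)
{
	/* A single 64-bit store keeps the pair consistent, but it is only
	 * single-copy atomic on 64-bit architectures; on 32-bit it may be
	 * split into two stores, which is the concern raised above. */
	m->val = ((uint64_t)ifindex << 32) | queue;
}

static inline void sk_txq_map_get(const struct sock_txq_map *m,
				  uint32_t *ifindex, uint32_t *queue)
{
	uint64_t v = m->val;	/* one 64-bit load, same caveat as the store */

	*ifindex = (uint32_t)(v >> 32);
	*queue = (uint32_t)v;
}

int main(void)
{
	struct sock_txq_map m;
	uint32_t ifindex, queue;

	sk_txq_map_set(&m, 3, 5);
	sk_txq_map_get(&m, &ifindex, &queue);
	printf("ifindex=%u queue=%u\n", ifindex, queue);
	return 0;
}
```

On 64-bit platforms a single aligned 64-bit load or store does keep the
pair consistent, but on 32-bit it can tear, so the cast to u64 does not
buy atomicity everywhere. And for pure load balancing an occasionally
stale read is harmless, which is why a plain sk->rx_ifindex field would
be a lot simpler.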