[5.5,169/189] IB/mlx5: Fix implicit ODP race

Message ID: 20200310123656.987256451@linuxfoundation.org
State: New

Headers:

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Artemy Kovalyov <artemyko@mellanox.com>,
	Jason Gunthorpe <jgg@mellanox.com>, Leon Romanovsky <leonro@mellanox.com>
Subject: [PATCH 5.5 169/189] IB/mlx5: Fix implicit ODP race
Date: Tue, 10 Mar 2020 13:40:06 +0100
Message-Id: <20200310123656.987256451@linuxfoundation.org>
In-Reply-To: <20200310123639.608886314@linuxfoundation.org>
References: <20200310123639.608886314@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: stable-owner@vger.kernel.org
Precedence: bulk

Commit Message

Greg KH March 10, 2020, 12:40 p.m. UTC

From: Artemy Kovalyov <artemyko@mellanox.com>

commit de5ed007a03d71daaa505f5daa4d3666530c7090 upstream.

The following race may occur because of the call_srcu() and the placement
of synchronize_srcu() relative to xa_erase():

CPU0				   CPU1

mlx5_ib_free_implicit_mr:	   destroy_unused_implicit_child_mr:
 xa_erase(odp_mkeys)
 synchronize_srcu()
				    xa_lock(implicit_children)
				    if (still in xarray)
				       atomic_inc()
				       call_srcu()
				    xa_unlock(implicit_children)
 xa_erase(implicit_children):
   xa_lock(implicit_children)
   __xa_erase()
   xa_unlock(implicit_children)

 flush_workqueue()
				   [..]
				    free_implicit_child_mr_rcu:
				     (via call_srcu)
				      queue_work()

 WARN_ON(atomic_read())
				   [..]
				    free_implicit_child_mr_work:
				     (via wq)
				      free_implicit_child_mr()
 mlx5_mr_cache_invalidate()
				     mlx5_ib_update_xlt() <-- UMR QP fail
				     atomic_dec()

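flush_workqueue() can run before the call_srcu() callback has queued the
work, so it finds nothing to flush and the WARN_ON() fires while
num_deferred_work is still elevated. The deferred work's
mlx5_ib_update_xlt() can then run after mlx5_mr_cache_invalidate() and
fail on the UMR QP.

Replacing the flush with wait_event() solves the race because it blocks
until num_deferred_work reaches zero, i.e. until
free_implicit_child_mr_work() completes.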
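
The counter-plus-wait-queue pattern can be sketched in userspace. This is
only an analogy, not the driver code: a pthread mutex and condition
variable stand in for wait_queue_head_t, a delayed thread stands in for
work that is queued late via call_srcu(), and the names struct deferred,
deferred_put(), deferred_wait(), late_child_free() and run_demo() are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical userspace stand-in for the imr fields the patch touches. */
struct deferred {
	atomic_int num_deferred_work;	/* mirrors atomic_t num_deferred_work */
	pthread_mutex_t lock;		/* lock + cond emulate wait_queue_head_t */
	pthread_cond_t q_deferred_work;
	atomic_int child_freed;		/* set once the deferred "free" has run */
};

static void deferred_init(struct deferred *d)
{
	atomic_init(&d->num_deferred_work, 0);
	atomic_init(&d->child_freed, 0);
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->q_deferred_work, NULL);
}

/* Analog of: if (atomic_dec_and_test(&n)) wake_up(&q); */
static void deferred_put(struct deferred *d)
{
	if (atomic_fetch_sub(&d->num_deferred_work, 1) == 1) {
		pthread_mutex_lock(&d->lock);
		pthread_cond_broadcast(&d->q_deferred_work);
		pthread_mutex_unlock(&d->lock);
	}
}

/* Analog of: wait_event(q, !atomic_read(&n)); */
static void deferred_wait(struct deferred *d)
{
	pthread_mutex_lock(&d->lock);
	while (atomic_load(&d->num_deferred_work) != 0)
		pthread_cond_wait(&d->q_deferred_work, &d->lock);
	pthread_mutex_unlock(&d->lock);
}

/* Stand-in for the SRCU callback plus the work item: it starts late. */
static void *late_child_free(void *arg)
{
	struct deferred *d = arg;
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

	nanosleep(&delay, NULL);		/* work is not "queued" yet */
	atomic_store(&d->child_freed, 1);	/* models free_implicit_child_mr() */
	deferred_put(d);
	return NULL;
}

/* Returns 1 if the child was freed before the waiter proceeded, -1 on error. */
static int run_demo(void)
{
	struct deferred d;
	pthread_t t;
	int freed;

	deferred_init(&d);
	/* The reference is taken before the work exists anywhere. */
	atomic_fetch_add(&d.num_deferred_work, 1);
	if (pthread_create(&t, NULL, late_child_free, &d) != 0)
		return -1;

	deferred_wait(&d);	/* blocks on the counter, not on a queue flush */
	assert(atomic_load(&d.num_deferred_work) == 0);
	freed = atomic_load(&d.child_freed);

	pthread_join(t, NULL);
	pthread_mutex_destroy(&d.lock);
	pthread_cond_destroy(&d.q_deferred_work);
	return freed;
}
```

Calling run_demo() from a main() should return 1: the waiter keys off the
counter itself, so it does not matter that the work had not been queued
when the wait began, which is the case a one-shot flush misses.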

Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200227113918.94432-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |    1 +
 drivers/infiniband/hw/mlx5/odp.c     |   17 +++++++----------
 2 files changed, 8 insertions(+), 10 deletions(-)

--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -629,6 +629,7 @@  struct mlx5_ib_mr {
 
 	/* For ODP and implicit */
 	atomic_t		num_deferred_work;
+	wait_queue_head_t       q_deferred_work;
 	struct xarray		implicit_children;
 	union {
 		struct rcu_head rcu;
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -197,7 +197,8 @@  static void free_implicit_child_mr(struc
 	mr->parent = NULL;
 	mlx5_mr_cache_free(mr->dev, mr);
 	ib_umem_odp_release(odp);
-	atomic_dec(&imr->num_deferred_work);
+	if (atomic_dec_and_test(&imr->num_deferred_work))
+		wake_up(&imr->q_deferred_work);
 }
 
 static void free_implicit_child_mr_work(struct work_struct *work)
@@ -516,6 +517,7 @@  struct mlx5_ib_mr *mlx5_ib_alloc_implici
 	imr->umem = &umem_odp->umem;
 	imr->is_odp_implicit = true;
 	atomic_set(&imr->num_deferred_work, 0);
+	init_waitqueue_head(&imr->q_deferred_work);
 	xa_init(&imr->implicit_children);
 
 	err = mlx5_ib_update_xlt(imr, 0,
@@ -573,10 +575,7 @@  void mlx5_ib_free_implicit_mr(struct mlx
 	 * under xa_lock while the child is in the xarray. Thus at this point
 	 * it is only decreasing, and all work holding it is now on the wq.
 	 */
-	if (atomic_read(&imr->num_deferred_work)) {
-		flush_workqueue(system_unbound_wq);
-		WARN_ON(atomic_read(&imr->num_deferred_work));
-	}
+	wait_event(imr->q_deferred_work, !atomic_read(&imr->num_deferred_work));
 
 	/*
 	 * Fence the imr before we destroy the children. This allows us to
@@ -607,10 +606,7 @@  void mlx5_ib_fence_odp_mr(struct mlx5_ib
 	/* Wait for all running page-fault handlers to finish. */
 	synchronize_srcu(&mr->dev->odp_srcu);
 
-	if (atomic_read(&mr->num_deferred_work)) {
-		flush_workqueue(system_unbound_wq);
-		WARN_ON(atomic_read(&mr->num_deferred_work));
-	}
+	wait_event(mr->q_deferred_work, !atomic_read(&mr->num_deferred_work));
 
 	dma_fence_odp_mr(mr);
 }
@@ -1682,7 +1678,8 @@  static void destroy_prefetch_work(struct
 	u32 i;
 
 	for (i = 0; i < work->num_sge; ++i)
-		atomic_dec(&work->frags[i].mr->num_deferred_work);
+		if (atomic_dec_and_test(&work->frags[i].mr->num_deferred_work))
+			wake_up(&work->frags[i].mr->q_deferred_work);
 	kvfree(work);
 }
