[5.4,312/340] printk: fix deadlock when kernel panic


Message ID

20210301161103.641895250@linuxfoundation.org

State

Superseded

Headers

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Muchun Song <songmuchun@bytedance.com>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: [PATCH 5.4 312/340] printk: fix deadlock when kernel panic
Date: Mon,  1 Mar 2021 17:14:16 +0100
Message-Id: <20210301161103.641895250@linuxfoundation.org>
In-Reply-To: <20210301161048.294656001@linuxfoundation.org>
References: <20210301161048.294656001@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk

Commit Message

Greg KH March 1, 2021, 4:14 p.m. UTC

From: Muchun Song <songmuchun@bytedance.com>

commit 8a8109f303e25a27f92c1d8edd67d7cbbc60a4eb upstream.

printk_safe_flush_on_panic() caused the following deadlock on our
server:

CPU0:                                         CPU1:
panic                                         rcu_dump_cpu_stacks
  kdump_nmi_shootdown_cpus                      nmi_trigger_cpumask_backtrace
    register_nmi_handler(crash_nmi_callback)      printk_safe_flush
                                                    __printk_safe_flush
                                                      raw_spin_lock_irqsave(&read_lock)
    // send NMI to other processors
    apic_send_IPI_allbutself(NMI_VECTOR)
                                                        // NMI interrupt, dead loop
                                                        crash_nmi_callback
  printk_safe_flush_on_panic
    printk_safe_flush
      __printk_safe_flush
        // deadlock
        raw_spin_lock_irqsave(&read_lock)

DEADLOCK: read_lock is taken on CPU1 and will never get released.

It happens when panic() stops a CPU by NMI while it has been in
the middle of printk_safe_flush().

Handle the lock the same way as logbuf_lock. The printk_safe buffers
are flushed only when both locks can be safely taken. This avoids
the deadlock _in this particular case_ at the expense of losing the
contents of the printk_safe buffers.

Note: It would actually be safe to re-init the locks when all CPUs were
      stopped by NMI. But it would require passing this information
      from arch-specific code. It is not worth the complexity.
      Especially because logbuf_lock and printk_safe buffers have been
      obsoleted by the lockless ring buffer.

Fixes: cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: <stable@vger.kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210210034823.64867-1-songmuchun@bytedance.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/printk/printk_safe.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -43,6 +43,8 @@  struct printk_safe_seq_buf {
 static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
 static DEFINE_PER_CPU(int, printk_context);
 
+static DEFINE_RAW_SPINLOCK(safe_read_lock);
+
 #ifdef CONFIG_PRINTK_NMI
 static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
 #endif
@@ -178,8 +180,6 @@  static void report_message_lost(struct p
  */
 static void __printk_safe_flush(struct irq_work *work)
 {
-	static raw_spinlock_t read_lock =
-		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
 	struct printk_safe_seq_buf *s =
 		container_of(work, struct printk_safe_seq_buf, work);
 	unsigned long flags;
@@ -193,7 +193,7 @@  static void __printk_safe_flush(struct i
 	 * different CPUs. This is especially important when printing
 	 * a backtrace.
 	 */
-	raw_spin_lock_irqsave(&read_lock, flags);
+	raw_spin_lock_irqsave(&safe_read_lock, flags);
 
 	i = 0;
 more:
@@ -230,7 +230,7 @@  more:
 
 out:
 	report_message_lost(s);
-	raw_spin_unlock_irqrestore(&read_lock, flags);
+	raw_spin_unlock_irqrestore(&safe_read_lock, flags);
 }
 
 /**
@@ -276,6 +276,14 @@  void printk_safe_flush_on_panic(void)
 		raw_spin_lock_init(&logbuf_lock);
 	}
 
+	if (raw_spin_is_locked(&safe_read_lock)) {
+		if (num_online_cpus() > 1)
+			return;
+
+		debug_locks_off();
+		raw_spin_lock_init(&safe_read_lock);
+	}
+
 	printk_safe_flush();
 }
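
The check added to printk_safe_flush_on_panic() can be illustrated outside
the kernel. The following is only a userspace sketch under stated
assumptions, not the kernel implementation: demo_lock, demo_flush_on_panic()
and the online_cpus parameter are hypothetical stand-ins for
safe_read_lock, printk_safe_flush_on_panic() and num_online_cpus(), and the
lock is modelled with a C11 atomic flag instead of a raw spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a raw spinlock; not the kernel type. */
struct demo_lock {
	atomic_bool locked;
};

static void demo_lock_init(struct demo_lock *l)
{
	atomic_store(&l->locked, false);
}

static void demo_lock_acquire(struct demo_lock *l)
{
	bool expected = false;

	/* Spin until the flag flips from false to true. */
	while (!atomic_compare_exchange_weak(&l->locked, &expected, true))
		expected = false;
}

static void demo_lock_release(struct demo_lock *l)
{
	atomic_store(&l->locked, false);
}

static bool demo_lock_is_locked(struct demo_lock *l)
{
	return atomic_load(&l->locked);
}

/*
 * Sketch of the decision made on the panic path. Returns true when the
 * flush ran and false when it was skipped. *flushed counts flushes and
 * stands in for the real buffer flush.
 */
static bool demo_flush_on_panic(struct demo_lock *l, int online_cpus,
				int *flushed)
{
	assert(online_cpus >= 1);

	if (demo_lock_is_locked(l)) {
		/* Other CPUs may still be online: do not touch the lock. */
		if (online_cpus > 1)
			return false;

		/* Only this CPU is left, so the owner can never release it. */
		demo_lock_init(l);
	}

	demo_lock_acquire(l);
	(*flushed)++;
	demo_lock_release(l);
	return true;
}
```

With the lock held and more than one CPU reported online, the sketch skips
the flush instead of spinning forever, which corresponds to the trade-off
described in the commit message: the buffer contents are lost, but the panic
path does not deadlock.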
