From patchwork Tue Nov 19 15:34:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 844377 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 900AC1CF7C7 for ; Tue, 19 Nov 2024 15:36:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030582; cv=none; b=PvORG7NVhDRa979DeDYZELnvMz2oZXAXLBvd0U+gjk5mwb23MfL19jffYAKWXukDBmGA0gYPzjYSbnPG/VNHGA9IPf4jBw2TyyyYR6F6+u2Lu2H1xBOFn50GcZje3hTqRtecsDomAU7/UOVXQ4jmgSqal7IOSrkRp5rplZxz4Ck= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030582; c=relaxed/simple; bh=//juTrReR5RY2EXL4SOSTJ2NFApUk9waFhwjwwTnX5s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UiCFs40V5oV1gga0pcJeAcfLevah7RaYOBtUEs0kUi3Sx8i0U5cYKvY7BeEff6HCdCinR9osVFg7a/3Ibk2mvEg/IExX9zi08ExKghqWe8hc6XSq7ICaZB6hbwtBPy1XBQI7YtcQEdy9ZyOLdKJnNkbMn8c2Y4VW/a2+3TVLYOQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=B9NxGyMv; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="B9NxGyMv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030579; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0gaKqKMKlmwETWNmgCjaiOWIlMUIKbL1+CgbCwZ0tnA=; b=B9NxGyMvPu5laG8kmc8nos1ko2+aOBw54wa1PDN+1NFaMku1IBdJ19hGgziMZnWPrjbtut 6w6JFsX/tlWEuJcZU+6O9ufmRiB626TljiqZmsZicbTLwn5Vt/QqgaByiT5xNYP8/JwAUj vrl64a6+pO79TMBzWhBuGyYe+6zj9Mw= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-533-9WS2Ju2dPyC3J89ZBnjLFA-1; Tue, 19 Nov 2024 10:36:15 -0500 X-MC-Unique: 9WS2Ju2dPyC3J89ZBnjLFA-1 X-Mimecast-MFC-AGG-ID: 9WS2Ju2dPyC3J89ZBnjLFA Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8A8FD1955F2B; Tue, 19 Nov 2024 15:36:07 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 70AC63003C80; Tue, 19 Nov 2024 15:35:51 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 01/15] objtool: Make validate_call() recognize indirect calls to pv_ops[] Date: Tue, 19 Nov 2024 16:34:48 +0100 Message-ID: <20241119153502.41361-2-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 call_dest_name() does not get passed the file pointer of validate_call(), which means its invocation of insn_reloc() will always return NULL. Make it take a file pointer. While at it, make sure call_dest_name() uses arch_dest_reloc_offset(), otherwise it gets the pv_ops[] offset wrong. Fabricating an intentional warning shows the change; previously: vmlinux.o: warning: objtool: __flush_tlb_all_noinstr+0x4: call to {dynamic}() leaves .noinstr.text section now: vmlinux.o: warning: objtool: __flush_tlb_all_noinstr+0x4: call to pv_ops[1]() leaves .noinstr.text section Signed-off-by: Valentin Schneider --- tools/objtool/check.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 6604f5d038aad..5f1d0f95fc04b 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -3448,7 +3448,7 @@ static inline bool func_uaccess_safe(struct symbol *func) return false; } -static inline const char *call_dest_name(struct instruction *insn) +static inline const char *call_dest_name(struct objtool_file *file, struct instruction *insn) { static char pvname[19]; struct reloc *reloc; @@ -3457,9 +3457,9 @@ static inline const char *call_dest_name(struct instruction *insn) if (insn_call_dest(insn)) return insn_call_dest(insn)->name; - reloc = insn_reloc(NULL, insn); + reloc = insn_reloc(file, insn); if (reloc && !strcmp(reloc->sym->name, "pv_ops")) { - idx = (reloc_addend(reloc) / sizeof(void *)); + idx = (arch_dest_reloc_offset(reloc_addend(reloc)) / sizeof(void *)); snprintf(pvname, sizeof(pvname), "pv_ops[%d]", idx); return pvname; } @@ -3538,17 +3538,20 @@ static int validate_call(struct objtool_file *file, { if (state->noinstr && state->instr <= 0 && !noinstr_call_dest(file, insn, insn_call_dest(insn))) { - WARN_INSN(insn, "call to %s() leaves .noinstr.text section", call_dest_name(insn)); + WARN_INSN(insn, "call to %s() leaves .noinstr.text section", + call_dest_name(file, insn)); return 1; } if (state->uaccess && !func_uaccess_safe(insn_call_dest(insn))) { - WARN_INSN(insn, "call to %s() with UACCESS enabled", call_dest_name(insn)); + WARN_INSN(insn, "call to %s() with UACCESS enabled", + call_dest_name(file, insn)); return 1; } if (state->df) { - WARN_INSN(insn, "call to %s() with DF set", call_dest_name(insn)); + WARN_INSN(insn, "call to %s() with DF set", + call_dest_name(file, insn)); return 1; } From patchwork Tue Nov 19 15:34:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 844376 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D8D5C1CF7C7 for ; Tue, 19 Nov 2024 15:36:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030612; cv=none; b=EDN7s3ziSvhOo6ZMcuhNA/cjAom0IVOcnO8Nql+kGpi69bC1pdfCh3UOof9pdqAnnmbuxX+IXnvjLTgrvWZeXF6aIAwD9OVYoN/UQHkKyPg3nuBqfVZLOlxLWeUSjvclQQ3hDa8QEe0w+vgAjs4WslzcrvX90aWn5viRFLUPji4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030612; c=relaxed/simple; bh=f+M7JVmuzBY3THk0fyjhKmyngjuTaSG33xgNtD22Ej8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XarpWfvgSfMTqvH6hudAIn6W7QOtoBK//wD3aSvFxKmj/JP2hhYrd0Ljh8EzQFGUxnbP49TNmWHhSRHqPxW+nMn/9GD6o0poYvFL+yOYhptywegjSHvasz/PI8XAesJxUfkQf8Vkwf3vBzTcEtI8HGMeLny0h/GjP9IPYFloMvE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=btli5qXG; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="btli5qXG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030609; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vzSk9ZzswXx0ijJl+1BL1pcjLYKU11Wq/W2Kyfj2Pb8=; b=btli5qXG4PXeC+mww4LGIOu6uTi0Mj6ERF8XZCmsd0/4i47VVyR0axKCsTeSRn9nMl23vD 2A0FEgJR09+n2XW5AqfOas5anLG8EMF+xIHySPICmd7N6GzK4xgqAQz27HeHM5RlK/cLTv MryLvltNr1g45TQrsqfdswEoIN1Tobg= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-301-mg-v22g6NKyICZb_as9TEg-1; Tue, 19 Nov 2024 10:36:46 -0500 X-MC-Unique: mg-v22g6NKyICZb_as9TEg-1 X-Mimecast-MFC-AGG-ID: mg-v22g6NKyICZb_as9TEg Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7B5951955F34; Tue, 19 Nov 2024 15:36:40 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 77F8D3003B71; Tue, 19 Nov 2024 15:36:24 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 03/15] sched/clock: Make sched_clock_running __ro_after_init Date: Tue, 19 Nov 2024 16:34:50 +0100 Message-ID: <20241119153502.41361-4-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 sched_clock_running is only ever enabled in the __init functions sched_clock_init() and sched_clock_init_late(), and is never disabled. Mark it __ro_after_init. Signed-off-by: Valentin Schneider --- kernel/sched/clock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c index a09655b481402..200e5568b9894 100644 --- a/kernel/sched/clock.c +++ b/kernel/sched/clock.c @@ -66,7 +66,7 @@ notrace unsigned long long __weak sched_clock(void) } EXPORT_SYMBOL_GPL(sched_clock); -static DEFINE_STATIC_KEY_FALSE(sched_clock_running); +static DEFINE_STATIC_KEY_FALSE_RO(sched_clock_running); #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK /* From patchwork Tue Nov 19 15:34:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 844375 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 980881D1E71 for ; Tue, 19 Nov 2024 15:37:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030631; cv=none; b=rqQ/pEPqDuEdYK/TIE0u3A+2fSvzJtWe+Rbl2BqWxJsAklNC73T9Dytmytilzx/BJJnD+nRHz13WIG0R5RxS5NxJk7Y1KhiwtEps6e1ieWsrn2KnKfAYhhh5OJwQbB292ViuuDasH+WjCjebfwTepSXlom916F5rq//pJra4VaA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030631; c=relaxed/simple; bh=bTnrUBgUrau8YQv+XQ05EAUivqZb8FcVO7sSJMvjJKI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ayCW4IvKEN5FQbuox8HFBh+lpRGG0P0SXAxyO3VPfaCSYGUGaHknVYLutDKv4xqbPLhO/88MhZYIVjYexZlVZFMPYpIBnS6MLp8LnyJxcPAEO+DFKCB47AiBzFeFnMpJTDt4FWnIDS31LS1fbdf3R8iuD75qrClUCVZ+sSf700I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=hw1vnOoG; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="hw1vnOoG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030627; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9/zjMAAzjZ63CvtK65oto4Glzya8ix3n4q04/5M16qM=; b=hw1vnOoGvihMnz5J0IjLUXQNS+DglJmSwpysWTGyYdip7IYJGFoqRJkQQ3H+Tppm9HYil4 tJU/uFtitqo03rbyfWv2vzIt6V4PF8kVXp650EfpvanJnRpNOlMq4eXtA/3gXMl9qHWdH3 5kGjaLlCj3NNZ5RAGo8uWxQzoN3ncxs= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-668-nbrc9rTzMP-Bp1uCFPcarg-1; Tue, 19 Nov 2024 10:37:04 -0500 X-MC-Unique: nbrc9rTzMP-Bp1uCFPcarg-1 X-Mimecast-MFC-AGG-ID: nbrc9rTzMP-Bp1uCFPcarg Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C4EB41952D0D; Tue, 19 Nov 2024 15:36:56 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id EC7F330001A2; Tue, 19 Nov 2024 15:36:40 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: "Paul E . McKenney" , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 04/15] rcu: Add a small-width RCU watching counter debug option Date: Tue, 19 Nov 2024 16:34:51 +0100 Message-ID: <20241119153502.41361-5-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 A later commit will reduce the size of the RCU watching counter to free up some bits for another purpose. Paul suggested adding a config option to test the extreme case where the counter is reduced to its minimum usable width for rcutorture to poke at, so do that. Make it only configurable under RCU_EXPERT. While at it, add a comment to explain the layout of context_tracking->state. Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop Suggested-by: Paul E. McKenney Signed-off-by: Valentin Schneider Reviewed-by: Paul E. McKenney --- include/linux/context_tracking_state.h | 44 ++++++++++++++++++++++---- kernel/rcu/Kconfig.debug | 14 ++++++++ 2 files changed, 51 insertions(+), 7 deletions(-) diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 7b8433d5a8efe..0b81248aa03e2 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -18,12 +18,6 @@ enum ctx_state { CT_STATE_MAX = 4, }; -/* Odd value for watching, else even. */ -#define CT_RCU_WATCHING CT_STATE_MAX - -#define CT_STATE_MASK (CT_STATE_MAX - 1) -#define CT_RCU_WATCHING_MASK (~CT_STATE_MASK) - struct context_tracking { #ifdef CONFIG_CONTEXT_TRACKING_USER /* @@ -44,9 +38,45 @@ struct context_tracking { #endif }; +/* + * We cram two different things within the same atomic variable: + * + * CT_RCU_WATCHING_START CT_STATE_START + * | | + * v v + * MSB [ RCU watching counter ][ context_state ] LSB + * ^ ^ + * | | + * CT_RCU_WATCHING_END CT_STATE_END + * + * Bits are used from the LSB upwards, so unused bits (if any) will always be in + * upper bits of the variable. + */ #ifdef CONFIG_CONTEXT_TRACKING +#define CT_SIZE (sizeof(((struct context_tracking *)0)->state) * BITS_PER_BYTE) + +#define CT_STATE_WIDTH bits_per(CT_STATE_MAX - 1) +#define CT_STATE_START 0 +#define CT_STATE_END (CT_STATE_START + CT_STATE_WIDTH - 1) + +#define CT_RCU_WATCHING_MAX_WIDTH (CT_SIZE - CT_STATE_WIDTH) +#define CT_RCU_WATCHING_WIDTH (IS_ENABLED(CONFIG_RCU_DYNTICKS_TORTURE) ? 2 : CT_RCU_WATCHING_MAX_WIDTH) +#define CT_RCU_WATCHING_START (CT_STATE_END + 1) +#define CT_RCU_WATCHING_END (CT_RCU_WATCHING_START + CT_RCU_WATCHING_WIDTH - 1) +#define CT_RCU_WATCHING BIT(CT_RCU_WATCHING_START) + +#define CT_STATE_MASK GENMASK(CT_STATE_END, CT_STATE_START) +#define CT_RCU_WATCHING_MASK GENMASK(CT_RCU_WATCHING_END, CT_RCU_WATCHING_START) + +#define CT_UNUSED_WIDTH (CT_RCU_WATCHING_MAX_WIDTH - CT_RCU_WATCHING_WIDTH) + +static_assert(CT_STATE_WIDTH + + CT_RCU_WATCHING_WIDTH + + CT_UNUSED_WIDTH == + CT_SIZE); + DECLARE_PER_CPU(struct context_tracking, context_tracking); -#endif +#endif /* CONFIG_CONTEXT_TRACKING */ #ifdef CONFIG_CONTEXT_TRACKING_USER static __always_inline int __ct_state(void) diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug index 9b0b52e1836fa..8dc505d841f8d 100644 --- a/kernel/rcu/Kconfig.debug +++ b/kernel/rcu/Kconfig.debug @@ -168,4 +168,18 @@ config RCU_STRICT_GRACE_PERIOD when looking for certain types of RCU usage bugs, for example, too-short RCU read-side critical sections. + +config RCU_DYNTICKS_TORTURE + bool "Minimize RCU dynticks counter size" + depends on RCU_EXPERT + default n + help + This option controls the width of the dynticks counter. + + Lower values will make overflows more frequent, which will increase + the likelihood of extending grace-periods. This option sets the width + to its minimum usable value. + + This has no value for production and is only for testing. + endmenu # "RCU Debugging" From patchwork Tue Nov 19 15:34:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 844374 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29EE41D0E10 for ; Tue, 19 Nov 2024 15:37:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030657; cv=none; b=dz4pNUjzXfMZdDsX6UrY3LDAAFE22ElQGHEII+uyQqPC/OfHngurMdNn5NhTclxNSJ/XjjngUVa8IepFpiaPDpch24Y9Nsis2wrFRz7HuvRSzXEqLioc1L8G/6iNLH+1szkiuvbgWhNjvc8NFkwSFAfbFXWXWlsBMVfcF16NICw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030657; c=relaxed/simple; bh=g4RPCjUX2beCsOhmA+cf1EgW2WlK56h+myfKo4lgSbM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TKCvlyY/79Z5BOsX6B707hAwbWj8FMOr3EHjAOwNT97Yo5/FvoihZuKCX/aSnemO8W42MypD3nOEmS6bGfuzsBBa+ilVoPeiXl3r6HfHc/U26KTsKqF8ylWhRjKE1HVAOFg/B9AyK2PfyWCEsm3TcLdvikmXkjtdobb5O1ctRgk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=YcW6xxsd; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="YcW6xxsd" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030655; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wuJGo/SWDYxjpRhP1lHMjQsEt0PrcPcPyU5QylcJ498=; b=YcW6xxsdzDfWLYOzoZGGCwbUDf9UYpmgJMe9dvif+3qY60OAqlrjoAFQbPLPhLtsuxWDFv V3ClElQbm0+A7xaUKR2ejAZKy26BqV2peT1z0HMQlsi5xUgvqXVDWY95s8iMFcS5xnpqgk 7eU410cUYUMfK+aYr8b5MatAymSlxTQ= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-427-4V10RU0zMVukkEJohwz2Cw-1; Tue, 19 Nov 2024 10:37:32 -0500 X-MC-Unique: 4V10RU0zMVukkEJohwz2Cw-1 X-Mimecast-MFC-AGG-ID: 4V10RU0zMVukkEJohwz2Cw Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4177119773FC; Tue, 19 Nov 2024 15:37:27 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 859B830001A0; Tue, 19 Nov 2024 15:37:12 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 06/15] jump_label: Add forceful jump label type Date: Tue, 19 Nov 2024 16:34:53 +0100 Message-ID: <20241119153502.41361-7-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Later commits will cause objtool to warn about non __ro_after_init static keys being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs. Two such keys currently exist: mds_idle_clear and __sched_clock_stable, which can both be modified at runtime. As discussed at LPC 2024 during the realtime micro-conference, modifying these specific keys incurs additional interference (SMT hotplug) or can even be downright incompatible with NOHZ_FULL (unstable TSC). Suppressing the IPI associated with modifying such keys is thus a minor concern wrt NOHZ_FULL interference, so add a jump type that will be leveraged by both the kernel (to know not to defer the IPI) and objtool (to know not to generate the aforementioned warning). Signed-off-by: Valentin Schneider --- include/linux/jump_label.h | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index f5a2727ca4a9a..93e729545b941 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -200,7 +200,8 @@ struct module; #define JUMP_TYPE_FALSE 0UL #define JUMP_TYPE_TRUE 1UL #define JUMP_TYPE_LINKED 2UL -#define JUMP_TYPE_MASK 3UL +#define JUMP_TYPE_FORCEFUL 4UL +#define JUMP_TYPE_MASK 7UL static __always_inline bool static_key_false(struct static_key *key) { @@ -244,12 +245,15 @@ extern enum jump_label_type jump_label_init_type(struct jump_entry *entry); * raw value, but have added a BUILD_BUG_ON() to catch any issues in * jump_label_init() see: kernel/jump_label.c. */ -#define STATIC_KEY_INIT_TRUE \ - { .enabled = { 1 }, \ - { .type = JUMP_TYPE_TRUE } } -#define STATIC_KEY_INIT_FALSE \ - { .enabled = { 0 }, \ - { .type = JUMP_TYPE_FALSE } } +#define __STATIC_KEY_INIT(_true, force) \ + { .enabled = { _true }, \ + { .type = (_true ? JUMP_TYPE_TRUE : JUMP_TYPE_FALSE) | \ + (force ? JUMP_TYPE_FORCEFUL : 0UL)} } + +#define STATIC_KEY_INIT_TRUE __STATIC_KEY_INIT(true, false) +#define STATIC_KEY_INIT_FALSE __STATIC_KEY_INIT(false, false) +#define STATIC_KEY_INIT_TRUE_FORCE __STATIC_KEY_INIT(true, true) +#define STATIC_KEY_INIT_FALSE_FORCE __STATIC_KEY_INIT(false, true) #else /* !CONFIG_JUMP_LABEL */ @@ -369,6 +373,8 @@ struct static_key_false { #define STATIC_KEY_TRUE_INIT (struct static_key_true) { .key = STATIC_KEY_INIT_TRUE, } #define STATIC_KEY_FALSE_INIT (struct static_key_false){ .key = STATIC_KEY_INIT_FALSE, } +#define STATIC_KEY_TRUE_FORCE_INIT (struct static_key_true) { .key = STATIC_KEY_INIT_TRUE_FORCE, } +#define STATIC_KEY_FALSE_FORCE_INIT (struct static_key_false){ .key = STATIC_KEY_INIT_FALSE_FORCE, } #define DEFINE_STATIC_KEY_TRUE(name) \ struct static_key_true name = STATIC_KEY_TRUE_INIT @@ -376,6 +382,9 @@ struct static_key_false { #define DEFINE_STATIC_KEY_TRUE_RO(name) \ struct static_key_true name __ro_after_init = STATIC_KEY_TRUE_INIT +#define DEFINE_STATIC_KEY_TRUE_FORCE(name) \ + struct static_key_true name = STATIC_KEY_TRUE_FORCE_INIT + #define DECLARE_STATIC_KEY_TRUE(name) \ extern struct static_key_true name @@ -385,6 +394,9 @@ struct static_key_false { #define DEFINE_STATIC_KEY_FALSE_RO(name) \ struct static_key_false name __ro_after_init = STATIC_KEY_FALSE_INIT +#define DEFINE_STATIC_KEY_FALSE_FORCE(name) \ + struct static_key_false name = STATIC_KEY_FALSE_FORCE_INIT + #define DECLARE_STATIC_KEY_FALSE(name) \ extern struct static_key_false name From patchwork Tue Nov 19 15:34:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 844373 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F19C51D0174 for ; Tue, 19 Nov 2024 15:38:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030691; cv=none; b=eN4V+rQeocRSUbaByn6dxJb93uSJx6mP5CZswz3u8yPfrr1rYUTaCs086CUTdN4AhSdH1wczzaNFuugUXfhSKsWzJSjUA024M2IjGFClbsAOlArNUcf/xUiWBzewf50I2Xuh5nr+sGZcaM9Bv+h9D9El/jMc8HqF09/3vnygu50= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030691; c=relaxed/simple; bh=WcoRXmH8EHSSUkrYncG38ywpgNOrSaCBNAPretltWvc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=P9YFCzNA4daxWKWB7OJu08mV9DdQupEST0fDquMv0h2y2qbkeRqXIYMbrpBHlz5wLyg6rdHHsXE0tU4Ra1Q7xIW5j23KH/DzOPS3he9633msLd3AockNNMw6snAOZMYnKQ1Cxy8zPKkUdAXzv0bBTtaULzoKlcGx/EaFtJIiCzI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=WDWS0Mmr; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WDWS0Mmr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030687; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lpSGcS2OHkivSptNEqV1yoUV7L+1Tpt77Idg0XadxAw=; b=WDWS0MmrljDrCbAucRmvh9uXGEmnQh2nUk1MsRqi26/u7AqJJ3kZM28eSeF+lUKwEUDIRA ykWQSX7f3nGKCMyhst5dlVrBO6rmFKPD8V/E1dXblrsWwHIcjN0DfweZM+Z95gs05fvNj7 5L4OhAXHuerqw9YOcQm4MiCeHwMCgl8= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-209-w8WZReDNPhuTuNl53MRXLw-1; Tue, 19 Nov 2024 10:38:05 -0500 X-MC-Unique: w8WZReDNPhuTuNl53MRXLw-1 X-Mimecast-MFC-AGG-ID: w8WZReDNPhuTuNl53MRXLw Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8F25E1955F41; Tue, 19 Nov 2024 15:37:59 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 59F3B30001A2; Tue, 19 Nov 2024 15:37:44 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 08/15] sched/clock, x86: Make __sched_clock_stable forceful Date: Tue, 19 Nov 2024 16:34:55 +0100 Message-ID: <20241119153502.41361-9-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Later commits will cause objtool to warn about non __ro_after_init static keys being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs. __sched_clock_stable is used in .noinstr code, and can be modified at runtime (e.g. KVM module loading). Suppressing the text_poke_sync() IPI has little benefits for this key, as NOHZ_FULL is incompatible with an unstable TSC anyway. Mark it as forceful to let the kernel know to always send the text_poke_sync() IPI for it, and to let objtool know not to warn about it. Signed-off-by: Valentin Schneider --- kernel/sched/clock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c index 200e5568b9894..dc94b3717f5ce 100644 --- a/kernel/sched/clock.c +++ b/kernel/sched/clock.c @@ -76,7 +76,7 @@ static DEFINE_STATIC_KEY_FALSE_RO(sched_clock_running); * Similarly we start with __sched_clock_stable_early, thereby assuming we * will become stable, such that there's only a single 1 -> 0 transition. */ -static DEFINE_STATIC_KEY_FALSE(__sched_clock_stable); +static DEFINE_STATIC_KEY_FALSE_FORCE(__sched_clock_stable); static int __sched_clock_stable_early = 1; /* From patchwork Tue Nov 19 15:34:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 844372 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18BE21CEAA6 for ; Tue, 19 Nov 2024 15:38:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030729; cv=none; b=O1zmie0wVluAKsNZ4vmaXkLWcAmNX3GptQTyDE7Ng4TLuEplspoQWkp+T0BS29OaK4M1nHbA+WeOJFjVIAPtks3J2mZHrEWyWrV+wIysY93krQKf4qrwESQwkNzNUfQhIKwnzu0jKItS8GfdYgfGPKjrQFPb2NP3GXlB8hVsihw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030729; c=relaxed/simple; bh=soZGZa5eZvVq13Yv3+buGnph0yEGo0cZoyN46yk11Tc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=u2PN5ZWKt5+GcOX5uKuivUF58EJwSwmf+aMhfbRAVsqbHQgnBqn7Oy3zIrcKWK8OAGqmXnOKaaP8A95VIK3wJUeSMaoaEOW+simCZcRGSG/m3C3+n07XR05ral7grQ5vZx96pKi9cWiKsWd8z2UV164HzDqQlTIYoSNXXnF/12o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=YhBkDsO3; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="YhBkDsO3" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030727; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=URgg3URVd8Hv8WB8p504Oe6mTtzB0H+jxnfnYfZO9Vc=; b=YhBkDsO3/SN+8BlGXacmr+Y4ALqQvU2OQwxh+bUPL4x+yhl+HBvrIXZ1yB7LsDwGKPkhOP gGPtWCy++4AHTBrV5edTA+HUiz8OSoVuHQkrzjCn7RI9/47ZUUUseCzlkOwu90Qj8B9p2l gYMvvJsi26xFzjeH3r/1FOUM9t974fs= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-99-3C0haj_PPdCTmZEKdqoqww-1; Tue, 19 Nov 2024 10:38:43 -0500 X-MC-Unique: 3C0haj_PPdCTmZEKdqoqww-1 X-Mimecast-MFC-AGG-ID: 3C0haj_PPdCTmZEKdqoqww Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DC0AB193EF53; Tue, 19 Nov 2024 15:38:32 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 613EB30001A2; Tue, 19 Nov 2024 15:38:15 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 10/15] x86/alternatives: Record text_poke's of JUMP_TYPE_FORCEFUL labels Date: Tue, 19 Nov 2024 16:34:57 +0100 Message-ID: <20241119153502.41361-11-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Forceful static keys are used in early entry code where it is unsafe to defer the sync_core() IPIs, and flagged as such via their ->type field. Record that information when creating a text_poke_loc. The text_poke_loc.old field is written to when first iterating a text_poke() entry, and as such can be (ab)used to store this information at the start of text_poke_bp_batch(). Signed-off-by: Valentin Schneider --- arch/x86/include/asm/text-patching.h | 12 ++++++++++-- arch/x86/kernel/alternative.c | 16 ++++++++++------ arch/x86/kernel/jump_label.c | 7 ++++--- 3 files changed, 24 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index 6259f1937fe77..e34de36cab61e 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -38,9 +38,17 @@ extern void *text_poke_copy(void *addr, const void *opcode, size_t len); extern void *text_poke_copy_locked(void *addr, const void *opcode, size_t len, bool core_ok); extern void *text_poke_set(void *addr, int c, size_t len); extern int poke_int3_handler(struct pt_regs *regs); -extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate); +extern void __text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate, bool force_ipi); +static inline void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate) +{ + __text_poke_bp(addr, opcode, len, emulate, false); +} -extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate); +extern void __text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate, bool force_ipi); +static inline void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate) +{ + __text_poke_queue(addr, opcode, len, emulate, false); +} extern void text_poke_finish(void); #define INT3_INSN_SIZE 1 diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index d17518ca19b8b..954c4c0f7fc58 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -2098,7 +2098,10 @@ struct text_poke_loc { u8 opcode; const u8 text[POKE_MAX_OPCODE_SIZE]; /* see text_poke_bp_batch() */ - u8 old; + union { + u8 old; + u8 force_ipi; + }; }; struct bp_patching_desc { @@ -2385,7 +2388,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries } static void text_poke_loc_init(struct text_poke_loc *tp, void *addr, - const void *opcode, size_t len, const void *emulate) + const void *opcode, size_t len, const void *emulate, bool force_ipi) { struct insn insn; int ret, i = 0; @@ -2402,6 +2405,7 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr, tp->rel_addr = addr - (void *)_stext; tp->len = len; tp->opcode = insn.opcode.bytes[0]; + tp->force_ipi = force_ipi; if (is_jcc32(&insn)) { /* @@ -2493,14 +2497,14 @@ void text_poke_finish(void) text_poke_flush(NULL); } -void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate) +void __ref __text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate, bool force_ipi) { struct text_poke_loc *tp; text_poke_flush(addr); tp = &tp_vec[tp_vec_nr++]; - text_poke_loc_init(tp, addr, opcode, len, emulate); + text_poke_loc_init(tp, addr, opcode, len, emulate, force_ipi); } /** @@ -2514,10 +2518,10 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi * dynamically allocated memory. This function should be used when it is * not possible to allocate memory. */ -void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate) +void __ref __text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate, bool force_ipi) { struct text_poke_loc tp; - text_poke_loc_init(&tp, addr, opcode, len, emulate); + text_poke_loc_init(&tp, addr, opcode, len, emulate, force_ipi); text_poke_bp_batch(&tp, 1); } diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c index f5b8ef02d172c..e03a4f56b30fd 100644 --- a/arch/x86/kernel/jump_label.c +++ b/arch/x86/kernel/jump_label.c @@ -101,8 +101,8 @@ __jump_label_transform(struct jump_entry *entry, text_poke_early((void *)jump_entry_code(entry), jlp.code, jlp.size); return; } - - text_poke_bp((void *)jump_entry_code(entry), jlp.code, jlp.size, NULL); + __text_poke_bp((void *)jump_entry_code(entry), jlp.code, jlp.size, NULL, + jump_entry_key(entry)->type & JUMP_TYPE_FORCEFUL); } static void __ref jump_label_transform(struct jump_entry *entry, @@ -135,7 +135,8 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry, mutex_lock(&text_mutex); jlp = __jump_label_patch(entry, type); - text_poke_queue((void *)jump_entry_code(entry), jlp.code, jlp.size, NULL); + __text_poke_queue((void *)jump_entry_code(entry), jlp.code, jlp.size, NULL, + jump_entry_key(entry)->type & JUMP_TYPE_FORCEFUL); mutex_unlock(&text_mutex); return true; } From patchwork Tue Nov 19 15:34:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 844371 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B63BB1D0B9C for ; Tue, 19 Nov 2024 15:39:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030758; cv=none; b=s6QYdMsav6zhKZoq6LvYNUFG7SMJAbj6DLvlqXgiqkThb5GUVi/T7T+Lwpc6T7SIBNLZm7EdHbnD8dDzzgbBG2eSyX2XWguNy0uBSTwNeXktE8vd29pKG8CQrmRCbT8K41ma/l5hrFqyP87IT/wvpUYS/gpeBLcvP8YxVzQfF+g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030758; c=relaxed/simple; bh=cXoNA/ZSNgL3xQCJGLUfOCoVbT9uZZp4m09FWAiotH8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=O+bBgKxEhm9kv8pyLC6iAeA5C2Ah7w0mFv0NN7UJuX++MwgBy1cFsdCvSUBAiz3pN9V5dmjLlzfEIEr6EsVkvKLJ/L2PfSbobqpSgXkC+T+62WDu8XSr29EZPMD1GaxUz3/dWVvz3VXmIamKjuBMiVGFL2HY+9QEz//HiyfK9zY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ZJovQOz/; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ZJovQOz/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030755; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gHLzoO2lla+YiGjlL3mjjdCwCDcrDiVLGjLeux29GIU=; b=ZJovQOz/9dzf4VIxh9Q3noIPFwY43WnEdImzZxZ4VpQgkFEAjOUjbfGViUXRhdW/ExoX1Z Gi5WW+j1sV60GaWRKO70BwURFzRmZbPJe1nOiN64kyr9enykqzHLhXZM0EL0Hw8IZ6V/Gm 8o8mD7Mwoa2H8BSyYBfW7rFdsTDbJxI= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-467-Mc4eVWE3PlutRNpeiXlJfw-1; Tue, 19 Nov 2024 10:39:11 -0500 X-MC-Unique: Mc4eVWE3PlutRNpeiXlJfw-1 X-Mimecast-MFC-AGG-ID: Mc4eVWE3PlutRNpeiXlJfw Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0E26E1955EA3; Tue, 19 Nov 2024 15:39:06 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B578D3003B7E; Tue, 19 Nov 2024 15:38:49 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Peter Zijlstra , Nicolas Saenz Julienne , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 12/15] context_tracking, x86: Defer kernel text patching IPIs Date: Tue, 19 Nov 2024 16:34:59 +0100 Message-ID: <20241119153502.41361-13-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 text_poke_bp_batch() sends IPIs to all online CPUs to synchronize them vs the newly patched instruction. CPUs that are executing in userspace do not need this synchronization to happen immediately, and this is actually harmful interference for NOHZ_FULL CPUs. As the synchronization IPIs are sent using a blocking call, returning from text_poke_bp_batch() implies all CPUs will observe the patched instruction(s), and this should be preserved even if the IPI is deferred. In other words, to safely defer this synchronization, any kernel instruction leading to the execution of the deferred instruction sync (ct_work_flush()) must *not* be mutable (patchable) at runtime. This means we must pay attention to mutable instructions in the early entry code: - alternatives - static keys - all sorts of probes (kprobes/ftrace/bpf/???) The early entry code leading to ct_work_flush() is noinstr, which gets rid of the probes. Alternatives are safe, because it's boot-time patching (before SMP is even brought up) which is before any IPI deferral can happen. This leaves us with static keys. Any static key used in early entry code should be only forever-enabled at boot time, IOW __ro_after_init (pretty much like alternatives). Exceptions are marked as forceful and will always generate an IPI when flipped. Objtool is now able to point at static keys that don't respect this, and all static keys used in early entry code have now been verified as behaving like so. Leverage the new context_tracking infrastructure to defer sync_core() IPIs to a target CPU's next kernel entry. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Nicolas Saenz Julienne Signed-off-by: Valentin Schneider --- arch/x86/include/asm/context_tracking_work.h | 6 ++-- arch/x86/include/asm/text-patching.h | 1 + arch/x86/kernel/alternative.c | 33 +++++++++++++++++--- arch/x86/kernel/kprobes/core.c | 4 +-- arch/x86/kernel/kprobes/opt.c | 4 +-- arch/x86/kernel/module.c | 2 +- include/linux/context_tracking_work.h | 4 +-- 7 files changed, 41 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/context_tracking_work.h b/arch/x86/include/asm/context_tracking_work.h index 5bc29e6b2ed38..2c66687ce00e2 100644 --- a/arch/x86/include/asm/context_tracking_work.h +++ b/arch/x86/include/asm/context_tracking_work.h @@ -2,11 +2,13 @@ #ifndef _ASM_X86_CONTEXT_TRACKING_WORK_H #define _ASM_X86_CONTEXT_TRACKING_WORK_H +#include + static __always_inline void arch_context_tracking_work(int work) { switch (work) { - case CONTEXT_WORK_n: - // Do work... + case CONTEXT_WORK_SYNC: + sync_core(); break; } } diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index e34de36cab61e..37344e95f738f 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -33,6 +33,7 @@ extern void apply_relocation(u8 *buf, const u8 * const instr, size_t instrlen, u */ extern void *text_poke(void *addr, const void *opcode, size_t len); extern void text_poke_sync(void); +extern void text_poke_sync_deferrable(void); extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len); extern void *text_poke_copy(void *addr, const void *opcode, size_t len); extern void *text_poke_copy_locked(void *addr, const void *opcode, size_t len, bool core_ok); diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 954c4c0f7fc58..4ce224d927b03 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -2080,9 +2081,24 @@ static void do_sync_core(void *info) sync_core(); } +static bool do_sync_core_defer_cond(int cpu, void *info) +{ + return !ct_set_cpu_work(cpu, CONTEXT_WORK_SYNC); +} + +static void __text_poke_sync(smp_cond_func_t cond_func) +{ + on_each_cpu_cond(cond_func, do_sync_core, NULL, 1); +} + void text_poke_sync(void) { - on_each_cpu(do_sync_core, NULL, 1); + __text_poke_sync(NULL); +} + +void text_poke_sync_deferrable(void) +{ + __text_poke_sync(do_sync_core_defer_cond); } /* @@ -2257,6 +2273,8 @@ static int tp_vec_nr; static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries) { unsigned char int3 = INT3_INSN_OPCODE; + bool force_ipi = false; + void (*sync_fn)(void); unsigned int i; int do_sync; @@ -2291,11 +2309,18 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries * First step: add a int3 trap to the address that will be patched. */ for (i = 0; i < nr_entries; i++) { + /* + * Record that we need to send the IPI if at least one location + * in the batch requires it. + */ + force_ipi |= tp[i].force_ipi; tp[i].old = *(u8 *)text_poke_addr(&tp[i]); text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE); } - text_poke_sync(); + sync_fn = force_ipi ? text_poke_sync : text_poke_sync_deferrable; + + sync_fn(); /* * Second step: update all but the first byte of the patched range. @@ -2357,7 +2382,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries * not necessary and we'd be safe even without it. But * better safe than sorry (plus there's not only Intel). */ - text_poke_sync(); + sync_fn(); } /* @@ -2378,7 +2403,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries } if (do_sync) - text_poke_sync(); + sync_fn(); /* * Remove and wait for refs to be zero. diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 72e6a45e7ec24..c2fd2578fd5fc 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -817,7 +817,7 @@ void arch_arm_kprobe(struct kprobe *p) u8 int3 = INT3_INSN_OPCODE; text_poke(p->addr, &int3, 1); - text_poke_sync(); + text_poke_sync_deferrable(); perf_event_text_poke(p->addr, &p->opcode, 1, &int3, 1); } @@ -827,7 +827,7 @@ void arch_disarm_kprobe(struct kprobe *p) perf_event_text_poke(p->addr, &int3, 1, &p->opcode, 1); text_poke(p->addr, &p->opcode, 1); - text_poke_sync(); + text_poke_sync_deferrable(); } void arch_remove_kprobe(struct kprobe *p) diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c index 36d6809c6c9e1..b2ce4d9c3ba56 100644 --- a/arch/x86/kernel/kprobes/opt.c +++ b/arch/x86/kernel/kprobes/opt.c @@ -513,11 +513,11 @@ void arch_unoptimize_kprobe(struct optimized_kprobe *op) JMP32_INSN_SIZE - INT3_INSN_SIZE); text_poke(addr, new, INT3_INSN_SIZE); - text_poke_sync(); + text_poke_sync_deferrable(); text_poke(addr + INT3_INSN_SIZE, new + INT3_INSN_SIZE, JMP32_INSN_SIZE - INT3_INSN_SIZE); - text_poke_sync(); + text_poke_sync_deferrable(); perf_event_text_poke(op->kp.addr, old, JMP32_INSN_SIZE, new, JMP32_INSN_SIZE); } diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index 837450b6e882f..00e71ad30c01d 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -191,7 +191,7 @@ static int write_relocate_add(Elf64_Shdr *sechdrs, write, apply); if (!early) { - text_poke_sync(); + text_poke_sync_deferrable(); mutex_unlock(&text_mutex); } diff --git a/include/linux/context_tracking_work.h b/include/linux/context_tracking_work.h index fb74db8876dd2..13fc97b395030 100644 --- a/include/linux/context_tracking_work.h +++ b/include/linux/context_tracking_work.h @@ -5,12 +5,12 @@ #include enum { - CONTEXT_WORK_n_OFFSET, + CONTEXT_WORK_SYNC_OFFSET, CONTEXT_WORK_MAX_OFFSET }; enum ct_work { - CONTEXT_WORK_n = BIT(CONTEXT_WORK_n_OFFSET), + CONTEXT_WORK_SYNC = BIT(CONTEXT_WORK_SYNC_OFFSET), CONTEXT_WORK_MAX = BIT(CONTEXT_WORK_MAX_OFFSET) }; From patchwork Tue Nov 19 15:35:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 844370 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 57F0C1D1514 for ; Tue, 19 Nov 2024 15:39:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030786; cv=none; b=atc9bOJhwHdCyz0VMlewq5eC4sJxFdk3newmgqbhrei2s936Jgu8m3RIZT7CdQqynovphqXaxJmVIF6EF+zb5IrNc7TA+9B09r/fU8Gj/APCKDk2pUHTji06FhLOC0lpnxUAZVdnYU25KOgzOTeKBNlmH88s7kUsMeaBQAYv5s4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030786; c=relaxed/simple; bh=xDTiVCke+dcOcHJv8qISnvLe6s6Z2yfLt5JyeXbmVNY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=S84GCw/Mq4QptXywDAqZIr+TCmQs43KfitmlG/iUjtuzRChzYwGba216BfmubUCYsxlMqahBhvZ1VPGK1Mw6PeYofdxRsT/ododsGMvaDDBqNww7xKMhm4T5dg/6dvSLCb1QpO6kApy+huH26uuWwHJbLP/PErrMMfkbGbOWNWM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=JJeT2Tyl; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="JJeT2Tyl" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030783; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xImyrBZyvIgQkmg3c7kz+ISL+g7A1zTmv4nEmku3rgI=; b=JJeT2Tyl0AB0TWw6cslgZoBG+I/YkoALTGILYVjDpfm1MKG2XHNktYy+U8S1ecXs2huUhU Ty9YZZHrFPpxEpTNqz67bOVNf3ZIaN7drJheODbTJvNQQIq3egdmB6ZNBCHN1tJp6kiOpM u/BzbgaDmlOt0sqQZYQsTWYZtxuQ5Xk= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-615-Nxl_Cjx6PWO_riG5D1782Q-1; Tue, 19 Nov 2024 10:39:41 -0500 X-MC-Unique: Nxl_Cjx6PWO_riG5D1782Q-1 X-Mimecast-MFC-AGG-ID: Nxl_Cjx6PWO_riG5D1782Q Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E9AB019560A2; Tue, 19 Nov 2024 15:39:36 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 901D830001A0; Tue, 19 Nov 2024 15:39:22 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 14/15] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs Date: Tue, 19 Nov 2024 16:35:01 +0100 Message-ID: <20241119153502.41361-15-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 vunmap()'s issued from housekeeping CPUs are a relatively common source of interference for isolated NOHZ_FULL CPUs, as they are hit by the flush_tlb_kernel_range() IPIs. Given that CPUs executing in userspace do not access data in the vmalloc range, these IPIs could be deferred until their next kernel entry. Deferral vs early entry danger zone =================================== This requires a guarantee that nothing in the vmalloc range can be vunmap'd and then accessed in early entry code. Vmalloc uses are, as reported by vmallocinfo: $ cat /proc/vmallocinfo | awk '{ print $3 }' | sort | uniq __pci_enable_msix_range+0x32b/0x560 acpi_os_map_iomem+0x22d/0x250 bpf_prog_alloc_no_stats+0x34/0x140 fork_idle+0x79/0x120 gen_pool_add_owner+0x3e/0xb0 ? hpet_enable+0xbf/0x470 irq_init_percpu_irqstack+0x129/0x160 kernel_clone+0xab/0x3b0 memremap+0x164/0x330 n_tty_open+0x18/0xa0 pcpu_create_chunk+0x4e/0x1b0 pcpu_create_chunk+0x75/0x1b0 pcpu_get_vm_areas+0x0/0x1100 unpurged vp_modern_map_capability+0x140/0x270 zisofs_init+0x16/0x30 I've categorized these as: a) Device or percpu mappings For these to be unmapped, the device (or CPU) has to be removed and an eventual IRQ freed. Even if the IRQ isn't freed, context tracking entry happens before handling the IRQ itself, per irqentry_enter() usage. __pci_enable_msix_range() acpi_os_map_iomem() irq_init_percpu_irqstack() (not even unmapped when CPU is hot-unplugged!) memremap() n_tty_open() pcpu_create_chunk() pcpu_get_vm_areas() vp_modern_map_capability() b) CONFIG_VMAP_STACK fork_idle() & kernel_clone() vmalloc'd kernel stacks are AFAICT a safe example, as a task running in userspace needs to enter kernelspace to execute do_exit() before its stack can be vfree'd. c) Non-issues bpf_prog_alloc_no_stats() - early entry is noinstr, no BPF! hpet_enable() - hpet_clear_mapping() is only called if __init function fails, no runtime worries zisofs_init () - used for zisofs block decompression, that's way past context tracking entry d) I'm not sure, have a look? gen_pool_add_owner() - AIUI this is mainly for PCI / DMA stuff, which again I wouldn't expect to be accessed before context tracking entry. Changes ====== Blindly deferring any and all flush of the kernel mappings is a risky move, so introduce a variant of flush_tlb_kernel_range() that explicitly allows deferral. Use it for vunmap flushes. Note that while flush_tlb_kernel_range() may end up issuing a full flush (including user mappings), this only happens when reaching a invalidation range threshold where it is cheaper to do a full flush than to individually invalidate each page in the range via INVLPG. IOW, it doesn't *require* invalidating user mappings, and thus remains safe to defer until a later kernel entry. Signed-off-by: Valentin Schneider --- arch/x86/include/asm/tlbflush.h | 1 + arch/x86/mm/tlb.c | 23 +++++++++++++++++++--- mm/vmalloc.c | 35 ++++++++++++++++++++++++++++----- 3 files changed, 51 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index a653b5f47f0e6..d89345c85fa9c 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -249,6 +249,7 @@ extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned int stride_shift, bool freed_tables); extern void flush_tlb_kernel_range(unsigned long start, unsigned long end); +extern void flush_tlb_kernel_range_deferrable(unsigned long start, unsigned long end); static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a) { diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 973a4ab3f53b3..bf6ff16a1a523 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -1041,6 +1042,11 @@ static void do_flush_tlb_all(void *info) __flush_tlb_all(); } +static bool do_kernel_flush_defer_cond(int cpu, void *info) +{ + return !ct_set_cpu_work(cpu, CONTEXT_WORK_TLBI); +} + void flush_tlb_all(void) { count_vm_tlb_event(NR_TLB_REMOTE_FLUSH); @@ -1057,12 +1063,13 @@ static void do_kernel_range_flush(void *info) flush_tlb_one_kernel(addr); } -void flush_tlb_kernel_range(unsigned long start, unsigned long end) +static inline void +__flush_tlb_kernel_range(smp_cond_func_t cond_func, unsigned long start, unsigned long end) { /* Balance as user space task's flush, a bit conservative */ if (end == TLB_FLUSH_ALL || (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) { - on_each_cpu(do_flush_tlb_all, NULL, 1); + on_each_cpu_cond(cond_func, do_flush_tlb_all, NULL, 1); } else { struct flush_tlb_info *info; @@ -1070,13 +1077,23 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end) info = get_flush_tlb_info(NULL, start, end, 0, false, TLB_GENERATION_INVALID); - on_each_cpu(do_kernel_range_flush, info, 1); + on_each_cpu_cond(cond_func, do_kernel_range_flush, info, 1); put_flush_tlb_info(); preempt_enable(); } } +void flush_tlb_kernel_range(unsigned long start, unsigned long end) +{ + __flush_tlb_kernel_range(NULL, start, end); +} + +void flush_tlb_kernel_range_deferrable(unsigned long start, unsigned long end) +{ + __flush_tlb_kernel_range(do_kernel_flush_defer_cond, start, end); +} + /* * This can be used from process context to figure out what the value of * CR3 is without needing to do a (slow) __read_cr3(). diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 634162271c004..02838c515ce2c 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -467,6 +467,31 @@ void vunmap_range_noflush(unsigned long start, unsigned long end) __vunmap_range_noflush(start, end); } +#ifdef CONFIG_CONTEXT_TRACKING_WORK +/* + * !!! BIG FAT WARNING !!! + * + * The CPU is free to cache any part of the paging hierarchy it wants at any + * time. It's also free to set accessed and dirty bits at any time, even for + * instructions that may never execute architecturally. + * + * This means that deferring a TLB flush affecting freed page-table-pages (IOW, + * keeping them in a CPU's paging hierarchy cache) is akin to dancing in a + * minefield. + * + * This isn't a problem for deferral of TLB flushes in vmalloc, because + * page-table-pages used for vmap() mappings are never freed - see how + * __vunmap_range_noflush() walks the whole mapping but only clears the leaf PTEs. + * If this ever changes, TLB flush deferral will cause misery. + */ +void __weak flush_tlb_kernel_range_deferrable(unsigned long start, unsigned long end) +{ + flush_tlb_kernel_range(start, end); +} +#else +#define flush_tlb_kernel_range_deferrable(start, end) flush_tlb_kernel_range(start, end) +#endif + /** * vunmap_range - unmap kernel virtual addresses * @addr: start of the VM area to unmap @@ -480,7 +505,7 @@ void vunmap_range(unsigned long addr, unsigned long end) { flush_cache_vunmap(addr, end); vunmap_range_noflush(addr, end); - flush_tlb_kernel_range(addr, end); + flush_tlb_kernel_range_deferrable(addr, end); } static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, @@ -2265,7 +2290,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end, nr_purge_nodes = cpumask_weight(&purge_nodes); if (nr_purge_nodes > 0) { - flush_tlb_kernel_range(start, end); + flush_tlb_kernel_range_deferrable(start, end); /* One extra worker is per a lazy_max_pages() full set minus one. */ nr_purge_helpers = atomic_long_read(&vmap_lazy_nr) / lazy_max_pages(); @@ -2368,7 +2393,7 @@ static void free_unmap_vmap_area(struct vmap_area *va) flush_cache_vunmap(va->va_start, va->va_end); vunmap_range_noflush(va->va_start, va->va_end); if (debug_pagealloc_enabled_static()) - flush_tlb_kernel_range(va->va_start, va->va_end); + flush_tlb_kernel_range_deferrable(va->va_start, va->va_end); free_vmap_area_noflush(va); } @@ -2816,7 +2841,7 @@ static void vb_free(unsigned long addr, unsigned long size) vunmap_range_noflush(addr, addr + size); if (debug_pagealloc_enabled_static()) - flush_tlb_kernel_range(addr, addr + size); + flush_tlb_kernel_range_deferrable(addr, addr + size); spin_lock(&vb->lock); @@ -2881,7 +2906,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush) free_purged_blocks(&purge_list); if (!__purge_vmap_area_lazy(start, end, false) && flush) - flush_tlb_kernel_range(start, end); + flush_tlb_kernel_range_deferrable(start, end); mutex_unlock(&vmap_purge_lock); }