From patchwork Tue Aug 20 09:51:07 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 19351
In-Reply-To: <1376897504-24672-1-git-send-email-ard.biesheuvel@linaro.org>
References: <1376897504-24672-1-git-send-email-ard.biesheuvel@linaro.org>
Date: Tue, 20 Aug 2013 11:51:07 +0200
Subject: Fwd: [PATCH v2] ARM: document the use of NEON in kernel mode
From: Ard Biesheuvel
To: Patch Tracking

---------- Forwarded message ----------
From: Ard Biesheuvel
Date: 19 August 2013 09:31
Subject: [PATCH v2] ARM: document the use of NEON in kernel mode
To: linux@arm.linux.org.uk
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel

Reviewed-by: Nicolas Pitre
Signed-off-by: Ard Biesheuvel
---
v2: updated the NEON intrinsics section to reflect that the type ambiguity issue
has been addressed by patch 'ARM: add workaround for ambiguous C99 stdint.h
types'

 Documentation/arm/kernel_mode_neon.txt | 121 +++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)
 create mode 100644 Documentation/arm/kernel_mode_neon.txt

diff --git a/Documentation/arm/kernel_mode_neon.txt b/Documentation/arm/kernel_mode_neon.txt
new file mode 100644
index 0000000..5254527
--- /dev/null
+++ b/Documentation/arm/kernel_mode_neon.txt
@@ -0,0 +1,121 @@
+Kernel mode NEON
+================
+
+TL;DR summary
+-------------
+* Use only NEON instructions, or VFP instructions that don't rely on support
+  code
+* Isolate your NEON code in a separate compilation unit, and compile it with
+  '-mfpu=neon -mfloat-abi=softfp'
+* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
+  NEON code
+* Don't sleep in your NEON code, and be aware that it will be executed with
+  preemption disabled
+
+
+Introduction
+------------
+It is possible to use NEON instructions (and in some cases, VFP instructions) in
+code that runs in kernel mode. However, for performance reasons, the NEON/VFP
+register file is not preserved and restored at every context switch or taken
+exception like the normal register file is, so some manual intervention is
+required. Furthermore, special care is required for code that may sleep [i.e.,
+may call schedule()], as NEON or VFP instructions will be executed in a
+non-preemptible section for reasons outlined below.
+
+
+Lazy preserve and restore
+-------------------------
+The NEON/VFP register file is managed using lazy preserve (on UP systems) and
+lazy restore (on both SMP and UP systems). This means that the register file is
+kept 'live', and is only preserved and restored when multiple tasks are
+contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
+another core). Lazy restore is implemented by disabling the NEON/VFP unit after
+every context switch, resulting in a trap when a NEON/VFP instruction is
+subsequently issued, allowing the kernel to step in and perform the restore if
+necessary.
+
+Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
+it is required to do an 'eager' preserve of the NEON/VFP register file, and
+enable the NEON/VFP unit explicitly so no exceptions are generated on first
+subsequent use. This is handled by the function kernel_neon_begin(), which
+should be called before any kernel mode NEON or VFP instructions are issued.
+Likewise, the NEON/VFP unit should be disabled again after use to make sure user
+mode will hit the lazy restore trap upon next use. This is handled by the
+function kernel_neon_end().
+
+
+Interruptions in kernel mode
+----------------------------
+For reasons of performance and simplicity, it was decided that there shall be no
+preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
+implies that interruptions of a kernel mode NEON section can only be allowed if
+they are guaranteed not to touch the NEON/VFP registers. For this reason, the
+following rules and restrictions apply in the kernel:
+* NEON/VFP code is not allowed in interrupt context;
+* NEON/VFP code is not allowed to sleep;
+* NEON/VFP code is executed with preemption disabled.
+
+If latency is a concern, it is possible to put back-to-back calls to
+kernel_neon_end() and kernel_neon_begin() in places in your code where none of
+the NEON registers are live. (Additional calls to kernel_neon_begin() should be
+reasonably cheap if no context switch occurred in the meantime.)
+
+
+VFP and support code
+--------------------
+Earlier versions of VFP (prior to version 3) rely on software support for things
+like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
+software assistance, it signals the kernel by raising an undefined instruction
+exception. The kernel responds by inspecting the VFP control registers and the
+current instruction and arguments, and emulates the instruction in software.
+
+Such software assistance is currently not implemented for VFP instructions
+executed in kernel mode. If such a condition is encountered, the kernel will
+fail and generate an OOPS.
+
+
+Separating NEON code from ordinary code
+---------------------------------------
+The compiler is not aware of the special significance of kernel_neon_begin() and
+kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
+between calls to these respective functions. Furthermore, GCC may generate NEON
+instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
+kernel is currently compiled at -O2, future changes may result in NEON/VFP
+instructions appearing in unexpected places if no special care is taken.
+
+Therefore, the recommended and only supported way of using NEON/VFP in the
+kernel is by adhering to the following rules:
+* isolate the NEON code in a separate compilation unit and compile it with
+  '-mfpu=neon -mfloat-abi=softfp';
+* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
+  into the unit containing the NEON code from a compilation unit which is *not*
+  built with the GCC flag '-mfpu=neon' set.
+
+As the kernel is compiled with '-msoft-float', the above will guarantee that
+both NEON and VFP instructions will only ever appear in designated compilation
+units at any optimization level.
+
+
+NEON assembler
+--------------
+NEON assembler is supported with no additional caveats as long as the rules
+above are followed.
+
+
+NEON code generated by GCC
+--------------------------
+The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
+parallelism, and generates NEON code from ordinary C source code. This is fully
+supported as long as the rules above are followed.
+
+
+NEON intrinsics
+---------------
+NEON intrinsics are also supported. However, as code using NEON intrinsics
+relies on the GCC header <arm_neon.h> (which #includes <stdint.h>), you should
+observe the following in addition to the rules above:
+* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
+  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
+  does not supply);
+* Include <arm_neon.h> last, or at least after <linux/types.h>.
-- 
1.8.1.2
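
For illustration only, the calling pattern this document describes (calls to
kernel_neon_begin() and kernel_neon_end() placed around calls into a separately
compiled NEON unit, with back-to-back end/begin calls where latency matters)
can be modelled in plain C that builds in user space. This is a sketch under
stated assumptions, not kernel code: kernel_neon_begin() and kernel_neon_end()
are replaced by stubs that only track a flag (in the kernel they are declared
in <asm/neon.h>), and xor_blocks_neon(), xor_blocks(), in_neon_section and
CHUNK are hypothetical names that do not appear in the patch.

```c
/*
 * User-space model of the kernel mode NEON calling pattern. NOT kernel code:
 * kernel_neon_begin()/kernel_neon_end() are stubs so this builds anywhere.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 64	/* hypothetical amount of work between preemption points */

static int in_neon_section;	/* stub state: nonzero between begin and end */

static void kernel_neon_begin(void) { in_neon_section = 1; }
static void kernel_neon_end(void) { in_neon_section = 0; }

/*
 * Stands in for a routine in the separate compilation unit, which would be
 * built with '-mfpu=neon -mfloat-abi=softfp'. It must not sleep. Plain C is
 * used here in place of real NEON code.
 */
static void xor_blocks_neon(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t i;

	assert(in_neon_section);	/* only legal inside a NEON section */
	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/*
 * Wrapper in the ordinary compilation unit (not built with '-mfpu=neon').
 * Between chunks no NEON registers are live, so back-to-back calls to
 * kernel_neon_end() and kernel_neon_begin() are placed there to bound the
 * time spent with preemption disabled.
 */
static void xor_blocks(uint8_t *dst, const uint8_t *src, size_t len)
{
	kernel_neon_begin();
	while (len > CHUNK) {
		xor_blocks_neon(dst, src, CHUNK);
		dst += CHUNK;
		src += CHUNK;
		len -= CHUNK;
		kernel_neon_end();
		kernel_neon_begin();
	}
	xor_blocks_neon(dst, src, len);
	kernel_neon_end();
}
```

A caller would only ever use xor_blocks(); the assert in xor_blocks_neon() is
a property of this model and catches calls made outside a begin/end pair. The
chunk size is arbitrary here and would need to be chosen per use case.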