From patchwork Thu Mar 15 14:02:29 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 131788
From: Alexander Graf
To: u-boot@lists.denx.de
Date: Thu, 15 Mar 2018 15:02:29 +0100
Message-Id: <20180315140229.7737-2-agraf@suse.de>
X-Mailer: git-send-email 2.12.3
In-Reply-To: <20180315140229.7737-1-agraf@suse.de>
References: <20180315140229.7737-1-agraf@suse.de>
Cc: Heinrich Schuchardt
Subject: [U-Boot] [PATCH 2/2] efi_loader: Optimize GOP more
Errors-To: u-boot-bounces@lists.denx.de
Sender: "U-Boot"

The GOP path was optimized, but still not as fast as it should be. Let's
push it even further by trimming the hot path into simple 32bit load/store
operations for buf->vid 32bpp operations.

Signed-off-by: Alexander Graf
---
 lib/efi_loader/efi_gop.c | 176 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 114 insertions(+), 62 deletions(-)

diff --git a/lib/efi_loader/efi_gop.c b/lib/efi_loader/efi_gop.c
index bbdf34e1dd..7b76e49ab0 100644
--- a/lib/efi_loader/efi_gop.c
+++ b/lib/efi_loader/efi_gop.c
@@ -78,18 +78,20 @@ static inline u16 efi_blt_col_to_vid16(struct efi_gop_pixel *blt)
 }
 
 static __always_inline efi_status_t gop_blt_int(struct efi_gop *this,
-						struct efi_gop_pixel *buffer,
+						struct efi_gop_pixel *bufferp,
 						u32 operation, efi_uintn_t sx,
 						efi_uintn_t sy, efi_uintn_t dx,
 						efi_uintn_t dy,
 						efi_uintn_t width,
 						efi_uintn_t height,
-						efi_uintn_t delta)
+						efi_uintn_t delta,
+						efi_uintn_t vid_bpp)
 {
 	struct efi_gop_obj *gopobj = container_of(this, struct efi_gop_obj, ops);
-	efi_uintn_t i, j, linelen;
+	efi_uintn_t i, j, linelen, slineoff = 0, dlineoff, swidth, dwidth;
 	u32 *fb32 = gopobj->fb;
 	u16 *fb16 = gopobj->fb;
+	struct efi_gop_pixel *buffer = __builtin_assume_aligned(bufferp, 4);
 
 	if (delta) {
 		/* Check for 4 byte alignment */
@@ -133,6 +135,37 @@ static __always_inline efi_status_t gop_blt_int(struct efi_gop *this,
 		break;
 	}
 
+	/* Calculate line width */
+	switch (operation) {
+	case EFI_BLT_BUFFER_TO_VIDEO:
+		swidth = linelen;
+		break;
+	case EFI_BLT_VIDEO_TO_BLT_BUFFER:
+	case EFI_BLT_VIDEO_TO_VIDEO:
+		swidth = gopobj->info.width;
+		if (!vid_bpp)
+			return EFI_UNSUPPORTED;
+		break;
+	case EFI_BLT_VIDEO_FILL:
+		swidth = 0;
+		break;
+	}
+
+	switch (operation) {
+	case EFI_BLT_BUFFER_TO_VIDEO:
+	case EFI_BLT_VIDEO_FILL:
+	case EFI_BLT_VIDEO_TO_VIDEO:
+		dwidth = gopobj->info.width;
+		if (!vid_bpp)
+			return EFI_UNSUPPORTED;
+		break;
+	case EFI_BLT_VIDEO_TO_BLT_BUFFER:
+		dwidth = linelen;
+		break;
+	}
+
+	slineoff = swidth * sy;
+	dlineoff = dwidth * dy;
 	for (i = 0; i < height; i++) {
 		for (j = 0; j < width; j++) {
 			struct efi_gop_pixel pix;
@@ -143,70 +176,65 @@ static __always_inline efi_status_t gop_blt_int(struct efi_gop *this,
 				pix = *buffer;
 				break;
 			case EFI_BLT_BUFFER_TO_VIDEO:
-				pix = buffer[linelen * (i + sy) + j + sx];
+				pix = buffer[slineoff + j + sx];
 				break;
 			case EFI_BLT_VIDEO_TO_BLT_BUFFER:
 			case EFI_BLT_VIDEO_TO_VIDEO:
-				switch (gopobj->bpix) {
-#ifdef CONFIG_DM_VIDEO
-				case VIDEO_BPP32:
-#else
-				case LCD_COLOR32:
-#endif
+				if (vid_bpp == 32)
 					pix = *(struct efi_gop_pixel *)&fb32[
-						gopobj->info.width *
-						(i + sy) + j + sx];
-					break;
-#ifdef CONFIG_DM_VIDEO
-				case VIDEO_BPP16:
-#else
-				case LCD_COLOR16:
-#endif
+						slineoff + j + sx];
+				else
 					pix = efi_vid16_to_blt_col(fb16[
-						gopobj->info.width *
-						(i + sy) + j + sx]);
-					break;
-				default:
-					return EFI_UNSUPPORTED;
-				}
+						slineoff + j + sx]);
 				break;
 			}
 
 			/* Write destination pixel */
 			switch (operation) {
 			case EFI_BLT_VIDEO_TO_BLT_BUFFER:
-				buffer[linelen * (i + dy) + j + dx] = pix;
+				buffer[dlineoff + j + dx] = pix;
 				break;
 			case EFI_BLT_BUFFER_TO_VIDEO:
 			case EFI_BLT_VIDEO_FILL:
 			case EFI_BLT_VIDEO_TO_VIDEO:
-				switch (gopobj->bpix) {
+				if (vid_bpp == 32)
+					fb32[dlineoff + j + dx] = *(u32 *)&pix;
+				else
+					fb16[dlineoff + j + dx] =
+						efi_blt_col_to_vid16(&pix);
+				break;
+			}
+		}
+		slineoff += swidth;
+		dlineoff += dwidth;
+	}
+
+	return EFI_SUCCESS;
+}
+
+static efi_uintn_t gop_get_bpp(struct efi_gop *this)
+{
+	struct efi_gop_obj *gopobj = container_of(this, struct efi_gop_obj, ops);
+	efi_uintn_t vid_bpp = 0;
+
+	switch (gopobj->bpix) {
 #ifdef CONFIG_DM_VIDEO
-				case VIDEO_BPP32:
+	case VIDEO_BPP32:
 #else
-				case LCD_COLOR32:
+	case LCD_COLOR32:
 #endif
-					fb32[gopobj->info.width *
-					     (i + dy) + j + dx] = *(u32 *)&pix;
-					break;
+		vid_bpp = 32;
+		break;
 #ifdef CONFIG_DM_VIDEO
-				case VIDEO_BPP16:
+	case VIDEO_BPP16:
 #else
-				case LCD_COLOR16:
+	case LCD_COLOR16:
 #endif
-					fb16[gopobj->info.width *
-					     (i + dy) + j + dx] =
-						efi_blt_col_to_vid16(&pix);
-					break;
-				default:
-					return EFI_UNSUPPORTED;
-				}
-				break;
-			}
-		}
+		vid_bpp = 16;
+		break;
 	}
 
-	return EFI_SUCCESS;
+	return vid_bpp;
 }
 
 /*
@@ -223,21 +251,33 @@ static efi_status_t gop_blt_video_fill(struct efi_gop *this,
 				       u32 foo, efi_uintn_t sx,
 				       efi_uintn_t sy, efi_uintn_t dx,
 				       efi_uintn_t dy, efi_uintn_t width,
-				       efi_uintn_t height, efi_uintn_t delta)
+				       efi_uintn_t height, efi_uintn_t delta,
+				       efi_uintn_t vid_bpp)
 {
 	return gop_blt_int(this, buffer, EFI_BLT_VIDEO_FILL, sx, sy, dx,
-			   dy, width, height, delta);
+			   dy, width, height, delta, vid_bpp);
 }
 
-static efi_status_t gop_blt_buf_to_vid(struct efi_gop *this,
-				       struct efi_gop_pixel *buffer,
-				       u32 foo, efi_uintn_t sx,
-				       efi_uintn_t sy, efi_uintn_t dx,
-				       efi_uintn_t dy, efi_uintn_t width,
-				       efi_uintn_t height, efi_uintn_t delta)
+static efi_status_t gop_blt_buf_to_vid16(struct efi_gop *this,
+					 struct efi_gop_pixel *buffer,
+					 u32 foo, efi_uintn_t sx,
+					 efi_uintn_t sy, efi_uintn_t dx,
+					 efi_uintn_t dy, efi_uintn_t width,
+					 efi_uintn_t height, efi_uintn_t delta)
 {
 	return gop_blt_int(this, buffer, EFI_BLT_BUFFER_TO_VIDEO, sx, sy, dx,
-			   dy, width, height, delta);
+			   dy, width, height, delta, 16);
+}
+
+static efi_status_t gop_blt_buf_to_vid32(struct efi_gop *this,
+					 struct efi_gop_pixel *buffer,
+					 u32 foo, efi_uintn_t sx,
+					 efi_uintn_t sy, efi_uintn_t dx,
+					 efi_uintn_t dy, efi_uintn_t width,
+					 efi_uintn_t height, efi_uintn_t delta)
+{
+	return gop_blt_int(this, buffer, EFI_BLT_BUFFER_TO_VIDEO, sx, sy, dx,
+			   dy, width, height, delta, 32);
 }
 
 static efi_status_t gop_blt_vid_to_vid(struct efi_gop *this,
@@ -245,10 +285,11 @@ static efi_status_t gop_blt_vid_to_vid(struct efi_gop *this,
 				       u32 foo, efi_uintn_t sx,
 				       efi_uintn_t sy, efi_uintn_t dx,
 				       efi_uintn_t dy, efi_uintn_t width,
-				       efi_uintn_t height, efi_uintn_t delta)
+				       efi_uintn_t height, efi_uintn_t delta,
+				       efi_uintn_t vid_bpp)
 {
 	return gop_blt_int(this, buffer, EFI_BLT_VIDEO_TO_VIDEO, sx, sy, dx,
-			   dy, width, height, delta);
+			   dy, width, height, delta, vid_bpp);
 }
 
 static efi_status_t gop_blt_vid_to_buf(struct efi_gop *this,
@@ -256,10 +297,11 @@ static efi_status_t gop_blt_vid_to_buf(struct efi_gop *this,
 				       u32 foo, efi_uintn_t sx,
 				       efi_uintn_t sy, efi_uintn_t dx,
 				       efi_uintn_t dy, efi_uintn_t width,
-				       efi_uintn_t height, efi_uintn_t delta)
+				       efi_uintn_t height, efi_uintn_t delta,
+				       efi_uintn_t vid_bpp)
 {
 	return gop_blt_int(this, buffer, EFI_BLT_VIDEO_TO_BLT_BUFFER, sx, sy,
-			   dx, dy, width, height, delta);
+			   dx, dy, width, height, delta, vid_bpp);
 }
 
 /*
@@ -287,27 +329,37 @@ efi_status_t EFIAPI gop_blt(struct efi_gop *this, struct efi_gop_pixel *buffer,
 			    efi_uintn_t height, efi_uintn_t delta)
 {
 	efi_status_t ret = EFI_INVALID_PARAMETER;
+	efi_uintn_t vid_bpp;
 
 	EFI_ENTRY("%p, %p, %u, %zu, %zu, %zu, %zu, %zu, %zu, %zu", this,
 		  buffer, operation, sx, sy, dx, dy, width, height, delta);
 
+	vid_bpp = gop_get_bpp(this);
+
 	/* Allow for compiler optimization */
 	switch (operation) {
 	case EFI_BLT_VIDEO_FILL:
 		ret = gop_blt_video_fill(this, buffer, operation, sx, sy, dx,
-					 dy, width, height, delta);
+					 dy, width, height, delta, vid_bpp);
 		break;
 	case EFI_BLT_BUFFER_TO_VIDEO:
-		ret = gop_blt_buf_to_vid(this, buffer, operation, sx, sy, dx,
-					 dy, width, height, delta);
+		/* This needs to be super-fast, so duplicate for 16/32bpp */
+		if (vid_bpp == 32)
+			ret = gop_blt_buf_to_vid32(this, buffer, operation, sx,
+						   sy, dx, dy, width, height,
+						   delta);
+		else
+			ret = gop_blt_buf_to_vid16(this, buffer, operation, sx,
+						   sy, dx, dy, width, height,
+						   delta);
 		break;
 	case EFI_BLT_VIDEO_TO_VIDEO:
 		ret = gop_blt_vid_to_vid(this, buffer, operation, sx, sy, dx,
-					 dy, width, height, delta);
+					 dy, width, height, delta, vid_bpp);
 		break;
 	case EFI_BLT_VIDEO_TO_BLT_BUFFER:
 		ret = gop_blt_vid_to_buf(this, buffer, operation, sx, sy, dx,
-					 dy, width, height, delta);
+					 dy, width, height, delta, vid_bpp);
 		break;
 	default:
 		ret = EFI_UNSUPPORTED;
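
For illustration only, here is a standalone sketch of what the specialized
buf->vid 32bpp path is meant to reduce to once the bpp checks are constant:
per-line offsets are carried as running sums (as slineoff/dlineoff do in the
patch) and each pixel becomes one 32bit load and one 32bit store. The names
pixel32 and blit_buf_to_fb32 are hypothetical and are not part of U-Boot;
the asserts are assumptions added for the sketch, not checks the patch makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32bpp pixel; not U-Boot's struct efi_gop_pixel. */
typedef uint32_t pixel32;

/*
 * Copy a width x height rectangle from src (src_stride pixels per line),
 * starting at (sx, sy), into a 32bpp framebuffer fb (fb_stride pixels per
 * line), starting at (dx, dy).
 */
static void blit_buf_to_fb32(const pixel32 *src, size_t src_stride,
			     pixel32 *fb, size_t fb_stride,
			     size_t sx, size_t sy, size_t dx, size_t dy,
			     size_t width, size_t height)
{
	/* Offsets of the first source and destination lines, in pixels. */
	size_t slineoff = src_stride * sy;
	size_t dlineoff = fb_stride * dy;
	size_t i, j;

	/* Assumed preconditions: the rectangle fits within one line of each. */
	assert(sx + width <= src_stride);
	assert(dx + width <= fb_stride);

	for (i = 0; i < height; i++) {
		/* Hot loop: one 32bit load and one 32bit store per pixel. */
		for (j = 0; j < width; j++)
			fb[dlineoff + j + dx] = src[slineoff + j + sx];

		/* Advance by one line instead of multiplying by (i + sy). */
		slineoff += src_stride;
		dlineoff += fb_stride;
	}
}
```

In the patch itself the same effect comes from passing vid_bpp as a
compile-time constant (16 or 32) into the __always_inline gop_blt_int(),
so the compiler can drop the per-pixel branches in each specialized copy.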