From patchwork Mon Mar 22 18:30:55 2021
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 407484
Date: Mon, 22 Mar 2021 18:30:55 +0000
To: "David S. Miller", Jakub Kicinski
From: Alexander Lobakin
Cc: Jesper Dangaard Brouer, Ilias Apalodimas, Matteo Croce, Mel Gorman,
 Alexander Lobakin, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH net-next] page_pool: let the compiler optimize and inline
 core functions
Message-ID: <20210322183047.10768-1-alobakin@pm.me>
X-Mailing-List: netdev@vger.kernel.org

As per the discussion in the Page Pool bulk allocator thread [0], there
are two functions in the Page Pool core code that are marked 'noinline'.
The reason for this is not clear, and even if it was done to reduce
hotpath overhead, in practice it only makes things worse.

As both of these functions are called only once throughout the code,
they could be inlined/folded into the non-static entry point. However,
the 'noinline' marks effectively prevent the compiler from doing that
and induce totally unneeded fragmentation (baseline -> after removal):

add/remove: 0/3 grow/shrink: 1/0 up/down: 1024/-1096 (-72)
Function                         old     new   delta
page_pool_alloc_pages            100    1124   +1024
page_pool_dma_map                164       -    -164
page_pool_refill_alloc_cache     332       -    -332
__page_pool_alloc_pages_slow     600       -    -600

(taken from Mel's branch, hence the factored-out page_pool_dma_map())

1124 is a normal hotpath frame size, but the jumps between the tiny
page_pool_alloc_pages(), page_pool_refill_alloc_cache() and
__page_pool_alloc_pages_slow() are redundant and harmful for
performance. This simple removal of the 'noinline' keywords bumps the
throughput of XDP_PASS + napi_build_skb() + napi_gro_receive() by
25+ Mbps on a 1G embedded NIC.
[0] https://lore.kernel.org/netdev/20210317222506.1266004-1-alobakin@pm.me

Signed-off-by: Alexander Lobakin
---
 net/core/page_pool.c | 2 --
 1 file changed, 2 deletions(-)

-- 
2.31.0

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ad8b0707af04..589e4df6ef2b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -102,7 +102,6 @@ EXPORT_SYMBOL(page_pool_create);
 
 static void page_pool_return_page(struct page_pool *pool, struct page *page);
 
-noinline
 static struct page *page_pool_refill_alloc_cache(struct page_pool *pool)
 {
 	struct ptr_ring *r = &pool->ring;
@@ -181,7 +180,6 @@ static void page_pool_dma_sync_for_device(struct page_pool *pool,
 }
 
 /* slow path */
-noinline
 static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 						 gfp_t _gfp)
 {