From patchwork Tue Jul 13 16:19:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 477149 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.7 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, MIME_BASE64_TEXT, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9EE37C11F69 for ; Tue, 13 Jul 2021 16:19:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 843CA61175 for ; Tue, 13 Jul 2021 16:19:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231736AbhGMQWN (ORCPT ); Tue, 13 Jul 2021 12:22:13 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:57598 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230376AbhGMQWJ (ORCPT ); Tue, 13 Jul 2021 12:22:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626193159; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cPecu7ZNX7bQJ/iS17rhbYrep1HtpV5Xs2IePf8joKs=; b=dDVAjOw/WKfno6rbF/DLgCMsJaY2raE2eoRRDXSuhPQqbsSK83DoLLVzTbuiUKfSkqwGXs Zd0bGHzANs5wq4SK8hGK2KeQ7XTACObxMQxsTSiuHJPJFmJJ20LPdJiZfqVYV2CyLpoKtU kvIR4UFbTk4Do+/fpZV/CY5k8GOqQKo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-482-WVhsITTWO6iMOj6mb1Vxlg-1; Tue, 13 Jul 2021 12:19:18 -0400 X-MC-Unique: WVhsITTWO6iMOj6mb1Vxlg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9B0D5DF8A0; Tue, 13 Jul 2021 16:19:16 +0000 (UTC) Received: from localhost (ovpn-112-172.ams2.redhat.com [10.36.112.172]) by smtp.corp.redhat.com (Postfix) with ESMTP id 32C5C60C2B; Tue, 13 Jul 2021 16:19:16 +0000 (UTC) From: Stefan Hajnoczi To: linux-kernel@vger.kernel.org Cc: Daniel Lezcano , Stefano Garzarella , Ming Lei , "Michael S . Tsirkin" , Marcelo Tosatti , Jens Axboe , Jason Wang , linux-block@vger.kernel.org, "Rafael J. Wysocki" , virtualization@lists.linux-foundation.org, linux-pm@vger.kernel.org, Christoph Hellwig , Stefan Hajnoczi Subject: [RFC 1/3] cpuidle: add poll_source API Date: Tue, 13 Jul 2021 17:19:04 +0100 Message-Id: <20210713161906.457857-2-stefanha@redhat.com> In-Reply-To: <20210713161906.457857-1-stefanha@redhat.com> References: <20210713161906.457857-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Introduce an API for adding cpuidle poll callbacks: struct poll_source_ops { void (*start)(struct poll_source *src); void (*stop)(struct poll_source *src); void (*poll)(struct poll_source *src); }; int poll_source_register(struct poll_source *src); int poll_source_unregister(struct poll_source *src); When cpuidle enters the poll state it invokes ->start() and then invokes ->poll() repeatedly from the busy wait loop. Finally ->stop() is invoked when the busy wait loop finishes. The ->poll() function should check for activity and cause TIF_NEED_RESCHED to be set in order to stop the busy wait loop. This API is intended to be used by drivers that can cheaply poll for events. Participating in cpuidle polling allows them to avoid interrupt latencies during periods where the CPU is going to poll anyway. Note that each poll_source is bound to a particular CPU. The API is mainly intended to by used by drivers that have multiple queues with irq affinity. Signed-off-by: Stefan Hajnoczi --- drivers/cpuidle/Makefile | 1 + include/linux/poll_source.h | 53 +++++++++++++++++++ drivers/cpuidle/poll_source.c | 99 +++++++++++++++++++++++++++++++++++ drivers/cpuidle/poll_state.c | 6 +++ 4 files changed, 159 insertions(+) create mode 100644 include/linux/poll_source.h create mode 100644 drivers/cpuidle/poll_source.c diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile index 26bbc5e74123..994f72d6fe95 100644 --- a/drivers/cpuidle/Makefile +++ b/drivers/cpuidle/Makefile @@ -7,6 +7,7 @@ obj-y += cpuidle.o driver.o governor.o sysfs.o governors/ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o +obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_source.o obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o ################################################################################## diff --git a/include/linux/poll_source.h b/include/linux/poll_source.h new file mode 100644 index 000000000000..ccfb424e170b --- /dev/null +++ b/include/linux/poll_source.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * poll_source.h - cpuidle busy waiting API + */ +#ifndef __LINUX_POLLSOURCE_H__ +#define __LINUX_POLLSOURCE_H__ + +#include + +struct poll_source; + +struct poll_source_ops { + void (*start)(struct poll_source *src); + void (*stop)(struct poll_source *src); + void (*poll)(struct poll_source *src); +}; + +struct poll_source { + const struct poll_source_ops *ops; + struct list_head node; + int cpu; +}; + +/** + * poll_source_register - Add a poll_source for a CPU + */ +#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_CPU_RELAX) +int poll_source_register(struct poll_source *src); +#else +static inline int poll_source_register(struct poll_source *src) +{ + return 0; +} +#endif + +/** + * poll_source_unregister - Remove a previously registered poll_source + */ +#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_CPU_RELAX) +int poll_source_unregister(struct poll_source *src); +#else +static inline int poll_source_unregister(struct poll_source *src) +{ + return 0; +} +#endif + +/* Used by the cpuidle driver */ +void poll_source_start(void); +void poll_source_run_once(void); +void poll_source_stop(void); + +#endif /* __LINUX_POLLSOURCE_H__ */ diff --git a/drivers/cpuidle/poll_source.c b/drivers/cpuidle/poll_source.c new file mode 100644 index 000000000000..46100e5a71e4 --- /dev/null +++ b/drivers/cpuidle/poll_source.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * poll_source.c - cpuidle busy waiting API + */ + +#include +#include +#include + +/* The per-cpu list of registered poll sources */ +DEFINE_PER_CPU(struct list_head, poll_source_list); + +/* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */ +void poll_source_start(void) +{ + struct poll_source *src; + + list_for_each_entry(src, this_cpu_ptr(&poll_source_list), node) + src->ops->start(src); +} + +/* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */ +void poll_source_run_once(void) +{ + struct poll_source *src; + + list_for_each_entry(src, this_cpu_ptr(&poll_source_list), node) + src->ops->poll(src); +} + +/* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */ +void poll_source_stop(void) +{ + struct poll_source *src; + + list_for_each_entry(src, this_cpu_ptr(&poll_source_list), node) + src->ops->stop(src); +} + +static void poll_source_register_this_cpu(void *opaque) +{ + struct poll_source *src = opaque; + + lockdep_assert_irqs_disabled(); + + list_add_tail(&src->node, this_cpu_ptr(&poll_source_list)); +} + +int poll_source_register(struct poll_source *src) +{ + if (!list_empty(&src->node)) + return -EBUSY; + + /* + * There is no race with src->cpu iterating over poll_source_list + * because smp_call_function_single() just sets TIF_NEED_RESCHED + * instead of sending an IPI during idle. + */ + /* TODO but what happens if the flag isn't set yet when smp_call_function_single() is invoked? */ + return smp_call_function_single(src->cpu, + poll_source_register_this_cpu, + src, + 1); +} +EXPORT_SYMBOL_GPL(poll_source_register); + +static void poll_source_unregister_this_cpu(void *opaque) +{ + struct poll_source *src = opaque; + + lockdep_assert_irqs_disabled(); + + /* + * See comment in poll_source_register() about why this does not race + * with the idle CPU iterating over poll_source_list. + */ + list_del_init(&src->node); +} + +int poll_source_unregister(struct poll_source *src) +{ + return smp_call_function_single(src->cpu, + poll_source_unregister_this_cpu, + src, + 1); +} +EXPORT_SYMBOL_GPL(poll_source_unregister); + +/* TODO what happens when a CPU goes offline? */ +static int __init poll_source_init(void) +{ + int i; + + for_each_possible_cpu(i) + INIT_LIST_HEAD(&per_cpu(poll_source_list, i)); + + return 0; +} +core_initcall(poll_source_init); diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c index f7e83613ae94..aa26870034ac 100644 --- a/drivers/cpuidle/poll_state.c +++ b/drivers/cpuidle/poll_state.c @@ -7,6 +7,7 @@ #include #include #include +#include #define POLL_IDLE_RELAX_COUNT 200 @@ -22,9 +23,12 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev, unsigned int loop_count = 0; u64 limit; + poll_source_start(); + limit = cpuidle_poll_time(drv, dev); while (!need_resched()) { + poll_source_run_once(); cpu_relax(); if (loop_count++ < POLL_IDLE_RELAX_COUNT) continue; @@ -35,6 +39,8 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev, break; } } + + poll_source_stop(); } current_clr_polling(); From patchwork Tue Jul 13 16:19:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 474946 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.7 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, MIME_BASE64_TEXT, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB7E8C11F66 for ; Tue, 13 Jul 2021 16:19:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D272E61175 for ; Tue, 13 Jul 2021 16:19:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230072AbhGMQWU (ORCPT ); Tue, 13 Jul 2021 12:22:20 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:29084 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230342AbhGMQWM (ORCPT ); Tue, 13 Jul 2021 12:22:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626193161; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Hdgt87jDYVPvBMHlz3oOFZWil5KClrfBwnUpI6smyYE=; b=RxFFvjGh7HEILsBOonncHJDid5zOBa3LK7TYnezNYYyruxGH7ARRsyyPWQdVK1xFhiXW9g vJPKRDypY0h8oOeGmRllVV9R1BwEH2DUqpmkbFOhzzHXyIzny/j23ZzRF5xtEv0ihjSnpi 1MEVjBZalX26macJpNTuHXCydAmhdZU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-29-spjfYe4aPiyycmVJ66oQdA-1; Tue, 13 Jul 2021 12:19:20 -0400 X-MC-Unique: spjfYe4aPiyycmVJ66oQdA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 63A9610766D5; Tue, 13 Jul 2021 16:19:18 +0000 (UTC) Received: from localhost (ovpn-112-172.ams2.redhat.com [10.36.112.172]) by smtp.corp.redhat.com (Postfix) with ESMTP id C53EA5D9CA; Tue, 13 Jul 2021 16:19:17 +0000 (UTC) From: Stefan Hajnoczi To: linux-kernel@vger.kernel.org Cc: Daniel Lezcano , Stefano Garzarella , Ming Lei , "Michael S . Tsirkin" , Marcelo Tosatti , Jens Axboe , Jason Wang , linux-block@vger.kernel.org, "Rafael J. Wysocki" , virtualization@lists.linux-foundation.org, linux-pm@vger.kernel.org, Christoph Hellwig , Stefan Hajnoczi Subject: [RFC 2/3] virtio: add poll_source virtqueue polling Date: Tue, 13 Jul 2021 17:19:05 +0100 Message-Id: <20210713161906.457857-3-stefanha@redhat.com> In-Reply-To: <20210713161906.457857-1-stefanha@redhat.com> References: <20210713161906.457857-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org VIRTIO drivers can cheaply disable interrupts by setting RING_EVENT_FLAGS_DISABLE in the packed virtqueues's Driver Event Suppression structure. See "2.7.10 Driver and Device Event Suppression" in the VIRTIO 1.1 specification for details (https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html). Add a per-virtqueue poll_source that disables events in ->start(), processes completed virtqueue buffers in ->poll(), and re-enables events in ->stop(). This optimization avoids interrupt injection during cpuidle polling. Workloads that submit requests and then halt the vCPU until completion benefit from this. To enable this optimization: 1. Enable the cpuidle haltpoll driver: https://www.kernel.org/doc/html/latest/virt/guest-halt-polling.html 2. Enable poll_source on the virtio device: # echo -n 1 > /sys/block/vda/device/poll_source Note that this feature currently as no effect on split virtqueues when VIRTIO_F_EVENT_IDX is negotiated. It may be possible to tweak virtqueue_disable_cb_split() but I haven't attempted it here. This optimization has been benchmarked successfully with virtio-blk devices. Currently it does not improve virtio-net performance, probably because it doesn't combine with NAPI polling. Signed-off-by: Stefan Hajnoczi --- drivers/virtio/virtio_pci_common.h | 7 +++ include/linux/virtio.h | 2 + include/linux/virtio_config.h | 2 + drivers/virtio/virtio.c | 34 ++++++++++++ drivers/virtio/virtio_pci_common.c | 86 ++++++++++++++++++++++++++++++ drivers/virtio/virtio_pci_modern.c | 2 + 6 files changed, 133 insertions(+) diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index beec047a8f8d..630746043183 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -38,6 +39,9 @@ struct virtio_pci_vq_info { /* MSI-X vector (or none) */ unsigned msix_vector; + + /* the cpuidle poll_source */ + struct poll_source poll_source; }; /* Our device structure */ @@ -102,6 +106,9 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev) return container_of(vdev, struct virtio_pci_device, vdev); } +/* enable poll_source API for vq polling */ +int vp_enable_poll_source(struct virtio_device *vdev, bool enable); + /* wait for pending irq handlers */ void vp_synchronize_vectors(struct virtio_device *vdev); /* the notify function used when creating a virt queue */ diff --git a/include/linux/virtio.h b/include/linux/virtio.h index b1894e0323fa..baaa3c8fadb1 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -93,6 +93,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq); * virtio_device - representation of a device using virtio * @index: unique position on the virtio bus * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore) + * @poll_source_enabled: poll_source API enabled for vq polling * @config_enabled: configuration change reporting enabled * @config_change_pending: configuration change reported while disabled * @config_lock: protects configuration change reporting @@ -107,6 +108,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq); struct virtio_device { int index; bool failed; + bool poll_source_enabled; bool config_enabled; bool config_change_pending; spinlock_t config_lock; diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 8519b3ae5d52..5fb78d56fd44 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -72,6 +72,7 @@ struct virtio_shm_region { * @set_vq_affinity: set the affinity for a virtqueue (optional). * @get_vq_affinity: get the affinity for a virtqueue (optional). * @get_shm_region: get a shared memory region based on the index. + * @enable_poll_source: enable/disable poll_source API vq polling (optional). */ typedef void vq_callback_t(struct virtqueue *); struct virtio_config_ops { @@ -97,6 +98,7 @@ struct virtio_config_ops { int index); bool (*get_shm_region)(struct virtio_device *vdev, struct virtio_shm_region *region, u8 id); + int (*enable_poll_source)(struct virtio_device *dev, bool enable); }; /* If driver didn't advertise the feature, it will never appear. */ diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 4b15c00c0a0a..22fee71bbe34 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -59,12 +59,44 @@ static ssize_t features_show(struct device *_d, } static DEVICE_ATTR_RO(features); +static ssize_t poll_source_show(struct device *_d, + struct device_attribute *attr, char *buf) +{ + struct virtio_device *dev = dev_to_virtio(_d); + return sprintf(buf, "%d\n", dev->poll_source_enabled); +} + +static ssize_t poll_source_store(struct device *_d, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct virtio_device *dev = dev_to_virtio(_d); + bool val; + int rc; + + rc = kstrtobool(buf, &val); + if (rc) + return rc; + + if (val == dev->poll_source_enabled) + return count; + if (!dev->config->enable_poll_source) + return -ENOTSUPP; + + rc = dev->config->enable_poll_source(dev, val); + if (rc) + return rc; + return count; +} +static DEVICE_ATTR_RW(poll_source); + static struct attribute *virtio_dev_attrs[] = { &dev_attr_device.attr, &dev_attr_vendor.attr, &dev_attr_status.attr, &dev_attr_modalias.attr, &dev_attr_features.attr, + &dev_attr_poll_source.attr, NULL, }; ATTRIBUTE_GROUPS(virtio_dev); @@ -343,6 +375,8 @@ int register_virtio_device(struct virtio_device *dev) dev->index = err; dev_set_name(&dev->dev, "virtio%u", dev->index); + dev->poll_source_enabled = false; + spin_lock_init(&dev->config_lock); dev->config_enabled = false; dev->config_change_pending = false; diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 222d630c41fc..6de372e12afb 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -24,6 +24,83 @@ MODULE_PARM_DESC(force_legacy, "Force legacy mode for transitional virtio 1 devices"); #endif +static void vp_poll_source_start(struct poll_source *src) +{ + struct virtio_pci_vq_info *info = + container_of(src, struct virtio_pci_vq_info, poll_source); + + /* This API does not require a lock */ + virtqueue_disable_cb(info->vq); +} + +static void vp_poll_source_stop(struct poll_source *src) +{ + struct virtio_pci_vq_info *info = + container_of(src, struct virtio_pci_vq_info, poll_source); + + /* Poll one last time in case */ + /* TODO allow driver to provide spinlock */ + if (!virtqueue_enable_cb(info->vq)) + vring_interrupt(0 /* ignored */, info->vq); +} + +static void vp_poll_source_poll(struct poll_source *src) +{ + struct virtio_pci_vq_info *info = + container_of(src, struct virtio_pci_vq_info, poll_source); + + vring_interrupt(0 /* ignored */, info->vq); +} + +static const struct poll_source_ops vp_poll_source_ops = { + .start = vp_poll_source_start, + .stop = vp_poll_source_stop, + .poll = vp_poll_source_poll, +}; + +/* call this when irq affinity changes to update cpuidle poll_source */ +/* TODO this function waits for a smp_call_function_single() completion, is that allowed in all caller contexts? */ +/* TODO this function is not thread-safe, do all callers hold the same lock? */ +static int vp_update_poll_source(struct virtio_device *vdev, int index) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vdev); + struct poll_source *src = &vp_dev->vqs[index]->poll_source; + const struct cpumask *mask; + + if (!list_empty(&src->node)) + poll_source_unregister(src); + + if (!vdev->poll_source_enabled) + return 0; + + mask = vp_get_vq_affinity(vdev, index); + if (!mask) + return -ENOTTY; + + /* Update the poll_source cpu */ + src->cpu = cpumask_first(mask); + + /* Don't use poll_source if interrupts are handled on multiple CPUs */ + if (cpumask_next(src->cpu, mask) < nr_cpu_ids) + return 0; + + return poll_source_register(src); +} + +/* TODO add this to virtio_pci_legacy config ops? */ +int vp_enable_poll_source(struct virtio_device *vdev, bool enable) +{ + struct virtqueue *vq; + + vdev->poll_source_enabled = enable; + + /* TODO locking */ + list_for_each_entry(vq, &vdev->vqs, list) { + vp_update_poll_source(vdev, vq->index); /* TODO handle errors? */ + } + return 0; +} + /* wait for pending irq handlers */ void vp_synchronize_vectors(struct virtio_device *vdev) { @@ -186,6 +263,9 @@ static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned index, if (!info) return ERR_PTR(-ENOMEM); + info->poll_source.ops = &vp_poll_source_ops; + INIT_LIST_HEAD(&info->poll_source.node); + vq = vp_dev->setup_vq(vp_dev, info, index, callback, name, ctx, msix_vec); if (IS_ERR(vq)) @@ -237,6 +317,7 @@ void vp_del_vqs(struct virtio_device *vdev) int irq = pci_irq_vector(vp_dev->pci_dev, v); irq_set_affinity_hint(irq, NULL); + vp_update_poll_source(vdev, vq->index); free_irq(irq, vq); } } @@ -342,6 +423,9 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs, vqs[i]); if (err) goto error_find; + + if (desc) + vp_update_poll_source(vdev, i); } return 0; @@ -440,6 +524,8 @@ int vp_set_vq_affinity(struct virtqueue *vq, const struct cpumask *cpu_mask) cpumask_copy(mask, cpu_mask); irq_set_affinity_hint(irq, mask); } + + vp_update_poll_source(vdev, vq->index); } return 0; } diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 30654d3a0b41..417d568590f2 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -394,6 +394,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = { .set_vq_affinity = vp_set_vq_affinity, .get_vq_affinity = vp_get_vq_affinity, .get_shm_region = vp_get_shm_region, + .enable_poll_source = vp_enable_poll_source, }; static const struct virtio_config_ops virtio_pci_config_ops = { @@ -411,6 +412,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = { .set_vq_affinity = vp_set_vq_affinity, .get_vq_affinity = vp_get_vq_affinity, .get_shm_region = vp_get_shm_region, + .enable_poll_source = vp_enable_poll_source, }; /* the PCI probing function */ From patchwork Tue Jul 13 16:19:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 474947 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.7 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, MIME_BASE64_TEXT, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E497C11F68 for ; Tue, 13 Jul 2021 16:19:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 18CF260C40 for ; Tue, 13 Jul 2021 16:19:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231967AbhGMQWO (ORCPT ); Tue, 13 Jul 2021 12:22:14 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:24864 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231428AbhGMQWN (ORCPT ); Tue, 13 Jul 2021 12:22:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626193162; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LkVbvILKBq40WaEmHLBRSgoHvH/vzwrA0dUxLC2cHSE=; b=h4NVAU573xF6cAzB8SuI9BaVEwASb+RKGKVCOETmsdML3QHe8s8FsGt4NzHrPAskGWAbK7 p8fBOY2ENmIXGh5LXEcHR3/Q2pQMNUxblg7UTRMj7xtCFTVlBsLWoZisZtYS98kQVPOoDu vG0qD+imlrDNaM7bcGWu3/VDOcl8h7E= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-522-kWE6OaooNECjV1bN-9szzQ-1; Tue, 13 Jul 2021 12:19:21 -0400 X-MC-Unique: kWE6OaooNECjV1bN-9szzQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CF48C1009444; Tue, 13 Jul 2021 16:19:19 +0000 (UTC) Received: from localhost (ovpn-112-172.ams2.redhat.com [10.36.112.172]) by smtp.corp.redhat.com (Postfix) with ESMTP id 713906091B; Tue, 13 Jul 2021 16:19:19 +0000 (UTC) From: Stefan Hajnoczi To: linux-kernel@vger.kernel.org Cc: Daniel Lezcano , Stefano Garzarella , Ming Lei , "Michael S . Tsirkin" , Marcelo Tosatti , Jens Axboe , Jason Wang , linux-block@vger.kernel.org, "Rafael J. Wysocki" , virtualization@lists.linux-foundation.org, linux-pm@vger.kernel.org, Christoph Hellwig , Stefan Hajnoczi Subject: [RFC 3/3] softirq: participate in cpuidle polling Date: Tue, 13 Jul 2021 17:19:06 +0100 Message-Id: <20210713161906.457857-4-stefanha@redhat.com> In-Reply-To: <20210713161906.457857-1-stefanha@redhat.com> References: <20210713161906.457857-1-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Normally softirqs are invoked when exiting irqs. When polling in the cpuidle driver there may be no irq. Therefore pending softirqs go unnoticed and polling continues without invoking them. Add a softirq_poll() function to explicitly check for and invoke softirqs. Signed-off-by: Stefan Hajnoczi --- This commit is not needed for virtio-blk. I added it when I realized virtio-net's NAPI scheduling might not be detected by the cpuidle busy wait loop because it is unaware of softirqs. However, even after doing this virtio-net's NAPI polling doesn't combine with cpuidle haltpoll. Perhaps this patch is still desirable for cpuidle poll_state in case a softirq is raised? --- include/linux/interrupt.h | 2 ++ drivers/cpuidle/poll_source.c | 3 +++ kernel/softirq.c | 14 ++++++++++++++ 3 files changed, 19 insertions(+) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 4777850a6dc7..9bfdcc466ba8 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -573,6 +573,8 @@ struct softirq_action asmlinkage void do_softirq(void); asmlinkage void __do_softirq(void); +extern void softirq_poll(void); + extern void open_softirq(int nr, void (*action)(struct softirq_action *)); extern void softirq_init(void); extern void __raise_softirq_irqoff(unsigned int nr); diff --git a/drivers/cpuidle/poll_source.c b/drivers/cpuidle/poll_source.c index 46100e5a71e4..ed200feb0daa 100644 --- a/drivers/cpuidle/poll_source.c +++ b/drivers/cpuidle/poll_source.c @@ -6,6 +6,7 @@ #include #include #include +#include /* The per-cpu list of registered poll sources */ DEFINE_PER_CPU(struct list_head, poll_source_list); @@ -26,6 +27,8 @@ void poll_source_run_once(void) list_for_each_entry(src, this_cpu_ptr(&poll_source_list), node) src->ops->poll(src); + + softirq_poll(); } /* Called from idle task with TIF_POLLING_NRFLAG set and irqs enabled */ diff --git a/kernel/softirq.c b/kernel/softirq.c index 4992853ef53d..f45bf0204218 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -611,6 +611,20 @@ void irq_enter(void) irq_enter_rcu(); } +/** + * softirq_poll() - invoke pending softirqs + * + * Normally it is not necessary to explicitly poll for softirqs, but in the + * cpuidle driver a polling function may have raised a softirq with no irq exit + * to invoke it. Therefore it is necessary to poll for pending softirqs and + * invoke them explicitly. + */ +void softirq_poll(void) +{ + if (!in_interrupt() && local_softirq_pending()) + invoke_softirq(); +} + static inline void tick_irq_exit(void) { #ifdef CONFIG_NO_HZ_COMMON