From patchwork Fri Oct 2 12:36:11 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Woodhouse
X-Patchwork-Id: 303851
Message-ID: <38b94080aa2d616a0ecb98d5afcd7cbe9f69f9e8.camel@infradead.org>
Subject: [RFC][POC PATCH] Supporting >255 guest CPUs without interrupt remapping
From: David Woodhouse
To: qemu-devel@nongnu.org
Date: Fri, 02 Oct 2020 13:36:11 +0100
Cc: x86

AFAICT there's not actually any good reason why guests can't use x2apic
and have more than 255 CPUs today, even without exposing interrupt
remapping to the guest.

The only issue is that guests can't direct external IOAPIC and MSI
interrupts at the higher APIC IDs. So what? A guest might have a
workload where it makes plenty of sense to use those extra CPUs and
just refrain from targeting external interrupts at them.

In fact, if you take a close look at the hyperv-iommu driver in the
Linux guest kernel, you'll note that it doesn't actually do any
remapping at all; all it does is return -EINVAL if asked to set
affinity to a CPU which can't be targeted.

For Linux at least, it should be fairly simple to have a per-IRQ
controller affinity limit, so it doesn't attempt to target CPUs it
can't reach.

But actually, it's really simple to extend the limit of reachable APICs
even without the complexity of adding a full vIOMMU.

There are 8 bits of extended destination ID in the IOAPIC RTE, which
map to bits 11-4 of the MSI address. These were historically not used
on bare metal, but IRQ remapping now uses the lowest of them (bit 4 of
the MSI address) to indicate a remappable format interrupt.

A VMM can use the other 7 bits (bits 11-5) to allow guests to target
15 bits of APIC ID, which gives support for 32Ki vCPUs without needing
to expose IRQ remapping to the guest.
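To make the bit layout above concrete, here is a minimal sketch of how a guest could place a 15-bit APIC ID into the low 32 bits of a compatibility format MSI address. The helper names (msi_compose_addr_lo, msi_addr_lo_to_apic_id) and the MSI_ADDR_* macros are made up for illustration and are not part of the patch or of any kernel; the 0xfee00000 base and the destination ID field at bits 19-12 are the usual x86 MSI layout.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, named for this sketch only. */
#define MSI_ADDR_BASE        0xfee00000u /* x86 MSI address window */
#define MSI_ADDR_DEST_SHIFT  12          /* APIC ID bits 7-0  -> address bits 19-12 */
#define MSI_ADDR_EXT_SHIFT   5           /* APIC ID bits 14-8 -> address bits 11-5 */

/*
 * Hypothetical guest-side helper: build the low MSI address word for a
 * 15-bit APIC ID. Bit 4 (remappable format) is always left clear.
 */
static inline uint32_t msi_compose_addr_lo(uint32_t apic_id)
{
    assert(apic_id < (1u << 15)); /* only 15 bits fit in this encoding */

    return MSI_ADDR_BASE |
           ((apic_id & 0xffu) << MSI_ADDR_DEST_SHIFT) |
           ((apic_id >> 8) << MSI_ADDR_EXT_SHIFT);
}

/* Hypothetical inverse: recover the 15-bit APIC ID from the address. */
static inline uint32_t msi_addr_lo_to_apic_id(uint32_t addr_lo)
{
    uint32_t low  = (addr_lo >> MSI_ADDR_DEST_SHIFT) & 0xffu;
    uint32_t high = (addr_lo >> MSI_ADDR_EXT_SHIFT) & 0x7fu;

    return (high << 8) | low;
}
```

For example, APIC ID 0x123 would be encoded as address 0xfee23020: 0x23 lands in bits 19-12 and the extra 0x1 lands in bit 5. An APIC ID below 256 produces exactly the address a guest generates today.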
Here's a proof-of-concept hack, which I've tested with a Linux guest
that knows where to put the additional 7 bits in the IOAPIC RTE and
MSI message. At least IOAPIC and emulated AHCI (MSI) are working; I
haven't tested assigned PCI devices yet.

diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 4eb2d77b87..b0f4b1a630 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -14,6 +14,7 @@
 #include "qemu/module.h"
 #include "cpu.h"
 #include "hw/i386/apic_internal.h"
+#include "hw/i386/apic-msidef.h"
 #include "hw/pci/msi.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
@@ -183,6 +184,13 @@ static void kvm_send_msi(MSIMessage *msg)
 {
     int ret;
 
+    /*
+     * The message has already passed through interrupt remapping if enabled,
+     * but the legacy extended destination ID in low bits still needs to be
+     * handled.
+     */
+    msg->address = apic_convert_ext_dest_id(msg->address);
+
     ret = kvm_irqchip_send_msi(kvm_state, *msg);
     if (ret < 0) {
         fprintf(stderr, "KVM: injection failed, MSI lost (%s)\n",
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e87be5d29a..eb4901d6b7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -807,7 +807,7 @@ void pc_machine_done(Notifier *notifier, void *data)
         fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
     }
 
-    if (x86ms->apic_id_limit > 255 && !xen_enabled()) {
+    if (0 && x86ms->apic_id_limit > 255 && !xen_enabled()) {
         IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
 
         if (!iommu || !x86_iommu_ir_supported(X86_IOMMU_DEVICE(iommu)) ||
diff --git a/include/hw/i386/apic-msidef.h b/include/hw/i386/apic-msidef.h
index 420b41167d..b3e0da64a5 100644
--- a/include/hw/i386/apic-msidef.h
+++ b/include/hw/i386/apic-msidef.h
@@ -28,4 +28,20 @@
 #define MSI_ADDR_DEST_IDX_SHIFT 4
 #define MSI_ADDR_DEST_ID_MASK 0x000ff000
 
+static inline uint64_t apic_convert_ext_dest_id(uint64_t address)
+{
+    uint64_t ext_id = address & (0xff << MSI_ADDR_DEST_IDX_SHIFT);
+    /*
+     * If the remappable format bit is set, or the upper bits are
+     * already set in address_hi, or the low extended bits aren't
+     * there anyway, do nothing.
+     */
+    if (!ext_id || (ext_id & (1 << MSI_ADDR_DEST_IDX_SHIFT)) ||
+        (address >> 32))
+        return address;
+
+    address &= ~ext_id;
+    address |= ext_id << 35;
+    return address;
+}
 #endif /* HW_APIC_MSIDEF_H */
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index f6dae4cfb6..547a2faf72 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -4589,13 +4589,11 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
     X86IOMMUState *iommu = x86_iommu_get_default();
 
     if (iommu) {
-        int ret;
-        MSIMessage src, dst;
         X86IOMMUClass *class = X86_IOMMU_DEVICE_GET_CLASS(iommu);
 
-        if (!class->int_remap) {
-            return 0;
-        }
+        if (class->int_remap) {
+            int ret;
+            MSIMessage src, dst;
 
         src.address = route->u.msi.address_hi;
         src.address <<= VTD_MSI_ADDR_HI_SHIFT;
@@ -4610,11 +4608,21 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
             return 1;
         }
 
+        /*
+         * Handle untranslated compatibility format interrupt with
+         * extended destination ID in the low bits 11-5. */
+        dst.address = apic_convert_ext_dest_id(dst.address);
+
         route->u.msi.address_hi = dst.address >> VTD_MSI_ADDR_HI_SHIFT;
         route->u.msi.address_lo = dst.address & VTD_MSI_ADDR_LO_MASK;
         route->u.msi.data = dst.data;
+        return 0;
+        }
     }
 
+    address = apic_convert_ext_dest_id(address);
+    route->u.msi.address_hi = address >> VTD_MSI_ADDR_HI_SHIFT;
+    route->u.msi.address_lo = address & VTD_MSI_ADDR_LO_MASK;
     return 0;
 }
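
For anyone who wants to poke at the conversion outside of QEMU, here is a standalone sketch of what apic_convert_ext_dest_id() does to a few sample addresses. The helper body is copied from the patch so that it builds on its own. msi_dest_id() is a made-up name for illustration, and it assumes the KVM x2APIC MSI format in which address_hi bits 31-8 carry bits 31-8 of the destination ID.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_ADDR_DEST_IDX_SHIFT 4

/* Same logic as the helper added to apic-msidef.h in the patch. */
static inline uint64_t apic_convert_ext_dest_id(uint64_t address)
{
    uint64_t ext_id = address & (0xff << MSI_ADDR_DEST_IDX_SHIFT);

    /* Nothing to do: no ext bits, remappable format, or address_hi in use. */
    if (!ext_id || (ext_id & (1 << MSI_ADDR_DEST_IDX_SHIFT)) ||
        (address >> 32))
        return address;

    address &= ~ext_id;        /* clear bits 11-5 in the low word */
    address |= ext_id << 35;   /* bits 11-5 become bits 46-40 */
    return address;
}

/*
 * Hypothetical helper (assumption, see above): destination ID as seen
 * from a 64-bit MSI address with the upper bits held in address_hi.
 */
static inline uint32_t msi_dest_id(uint64_t address)
{
    uint32_t low  = (uint32_t)(address >> 12) & 0xffu;
    uint32_t high = (uint32_t)(address >> 32) & 0xffffff00u;

    return high | low;
}

/* Walk through the converting case and the three "do nothing" cases. */
static void ext_dest_id_examples(void)
{
    /* APIC ID 0x123: extended bit 5 moves up into address_hi. */
    assert(apic_convert_ext_dest_id(0xfee23020ULL) == 0x00000100fee23000ULL);
    assert(msi_dest_id(0x00000100fee23000ULL) == 0x123);

    /* Remappable format bit (bit 4) set: left alone. */
    assert(apic_convert_ext_dest_id(0xfee23030ULL) == 0xfee23030ULL);
    /* No extended bits: left alone. */
    assert(apic_convert_ext_dest_id(0xfee23000ULL) == 0xfee23000ULL);
    /* address_hi already in use: left alone. */
    assert(apic_convert_ext_dest_id(0x00000001fee23020ULL) ==
           0x00000001fee23020ULL);
}
```

The shift by 35 follows from the layout: bit 5 of the low word holds bit 8 of the APIC ID, which needs to end up at bit 8 of address_hi, i.e. bit 40 of the 64-bit address.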