From patchwork Fri Jul 8 11:23:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Ene
X-Patchwork-Id: 588675
Date: Fri, 8 Jul 2022 11:23:43 +0000
Message-Id: <20220708112344.1965947-1-sebastianene@google.com>
X-Mailer: git-send-email 2.37.0.rc0.161.g10f37bed90-goog
Subject: [PATCH v11 0/2] Detect stalls on guest vCPUS
From: Sebastian Ene
To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
Cc: linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
 maz@kernel.org, will@kernel.org, vdonnefort@google.com,
 Guenter Roeck, Sebastian Ene
Precedence: bulk
X-Mailing-List: devicetree@vger.kernel.org

Minor changes from v10 with some cosmetic fixes and DT values
validation.

This adds a mechanism to detect stalls on the guest vCPUs by creating a
per-CPU hrtimer which periodically 'pets' the host backend driver. On a
conventional watchdog-core driver, userspace is responsible for
delivering the 'pet' events by writing to the particular /dev/watchdogN
node. In this case we require a strong thread affinity to be able to
account for lost time on a per-vCPU basis.

This device driver acts as a soft lockup detector by relying on the host
backend driver to measure the elapsed time between subsequent 'pet'
events.
If the elapsed time doesn't match an expected value, the backend driver
decides that the guest vCPU is locked and resets the guest. The host
backend driver takes into account the time that the guest is not
running. The communication with the backend driver is done through MMIO
and the register layout of the virtual watchdog is described as part of
the backend driver changes.

The host backend driver is implemented as part of:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817

Changelog v11:
 - verify that the values from DT are in the expected range and fall
   back to default values if they are not
 - add Will's Reviewed-by tag

Changelog v10:
 - keep only the hrtimer and a flag in the per_cpu structure and move
   the other fields into a separate config structure
 - fix a potential race condition as pointed out by Will: the driver
   remove(..) can race with the hotplug cpu notifiers
 - replace alloc_percpu with devm_alloc_percpu and remove the
   free_percpu
 - unregister the hotplug notifiers
 - improve the Kconfig description and fix the license in the header
   file
 - add the Reviewed-by tag from Rob as the DT has not changed since v9

Changelog v9:
 - make the driver depend on CONFIG_OF
 - remove the platform_(set|get)_drvdata calls and keep a per-cpu
   static variable `vm_stall_detect` as suggested by Guenter on the
   (v8) series
 - improve commit description and fix styling

Sebastian Ene (2):
  dt-bindings: vcpu_stall_detector: Add qemu,vcpu-stall-detector
    compatible
  misc: Add a mechanism to detect stalls on guest vCPUs

 .../misc/qemu,vcpu-stall-detector.yaml        |  51 ++++
 drivers/misc/Kconfig                          |  14 ++
 drivers/misc/Makefile                         |   1 +
 drivers/misc/vcpu_stall_detector.c            | 223 ++++++++++++++++++
 4 files changed, 289 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/qemu,vcpu-stall-detector.yaml
 create mode 100644 drivers/misc/vcpu_stall_detector.c
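
For readers who want the accounting spelled out, the stall check and the
DT validation described above could look roughly like the userspace
sketch below. It is an illustration only, not the code from this series
or from the crosvm backend: the structure, the function names, and the
"running time exceeds the timeout" rule are all assumptions, as are the
range bounds passed to the validation helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU bookkeeping a host backend might keep. */
struct vcpu_watch {
	uint64_t last_pet_ns;    /* host time of the most recent 'pet' */
	uint64_t not_running_ns; /* time the vCPU was not running since then */
	uint64_t timeout_ns;     /* allowed running time between 'pets' */
};

/* Record a 'pet' event: restart the measurement window. */
static void vcpu_pet(struct vcpu_watch *w, uint64_t now_ns)
{
	w->last_pet_ns = now_ns;
	w->not_running_ns = 0;
}

/* Account for time during which the vCPU was descheduled by the host. */
static void vcpu_account_not_running(struct vcpu_watch *w, uint64_t delta_ns)
{
	w->not_running_ns += delta_ns;
}

/*
 * Assumed stall rule: the vCPU is considered stalled when the time it
 * actually ran since the last 'pet' exceeds the timeout. Time spent not
 * running is subtracted so a descheduled guest is not penalised.
 */
static bool vcpu_is_stalled(const struct vcpu_watch *w, uint64_t now_ns)
{
	uint64_t elapsed = now_ns - w->last_pet_ns;
	uint64_t running = elapsed > w->not_running_ns ?
			   elapsed - w->not_running_ns : 0;

	return running > w->timeout_ns;
}

/*
 * Validation in the spirit of the v11 changelog: accept a configured
 * value only if it lies in [min, max], otherwise use the default.
 */
static uint32_t in_range_or_default(uint32_t val, uint32_t min,
				    uint32_t max, uint32_t def)
{
	return (val >= min && val <= max) ? val : def;
}
```

With a 1000 ns timeout and a 'pet' at t=100, this sketch reports no
stall at t=1100 and a stall at t=1101. After 500 ns are accounted as
not-running time, the stall point moves to t=1601.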