From patchwork Mon Dec 11 09:13:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Alex Bennée
X-Patchwork-Id: 752513
From: Alex Bennée
To: qemu-devel@nongnu.org
Cc: John Snow, Eduardo Habkost, Philippe Mathieu-Daudé, Paolo Bonzini,
 Wainer dos Santos Moschetta, Cleber Rosa, Marc-André Lureau, Beraldo Leal,
 Richard Henderson, Pavel
 Dovgalyuk, Alex Bennée
Subject: [PATCH v2 00/16] record/replay fixes: attempting to get avocado green
Date: Mon, 11 Dec 2023 09:13:29 +0000
Message-Id: <20231211091346.14616-1-alex.bennee@linaro.org>

As I'm a glutton for punishment I thought I'd have a go at fixing the
slowly growing number of record/replay bugs. The two fixes are:

  replay: stop us hanging in rr_wait_io_event
  chardev: force write all when recording replay logs

I think we are beyond 8.2 material but it would be nice to get this
functionality stable again. We have a growing number of bugs under the
icount label on gitlab:

  https://gitlab.com/qemu-project/qemu/-/issues/?label_name%5B%5D=icount

Changes
-------

v2

Apart from addressing tidy ups and tags I've been investigating the
failures in replay_linux.py which are the more exhaustive tests which
boot the kernel and user-space. The "fix":

  replay: report sync error when no exception in log (!DEBUG INVESTIGATION)

triggers around the time of the hang in the logs and despite the
rather hairy EXCP->INT transitions around cpu_exec_loop() I think
points to a genuine problem.
I added the tracing to cputlb to verify the page tables are the same
and started detecting divergence between record and replay a lot
earlier than when replay_sync_error() catches things. I see
patterns like this (record on the left, replay on the right):

  1878 tlb_fill 0x4770c000/1 1 2     tlb_fill 0x4770c000/1 1 2
  1879 tlb_fill 0x4770d000/1 1 2     tlb_fill 0x4770d000/1 1 2
  1880 tlb_fill 0x59000/1 0 2        tlb_fill 0x59000/1 0 2
  1881                             > tlb_fill 0x476dd116/1 0 2
  1882 tlb_fill 0x4770e000/1 1 2     tlb_fill 0x4770e000/1 1 2
  1883 tlb_fill 0x476dd527/1 0 2   | tlb_fill 0x476dfb17/1 0 2
  1884                             > tlb_fill 0x476de0fd/1 0 2
  1885                             > tlb_fill 0x476dce2e/1 0 2
  1886 tlb_fill 0x4770f000/1 1 2     tlb_fill 0x4770f000/1 1 2
  1887 tlb_fill 0x476df939/1 0 2   <
  1888 tlb_fill 0x47710000/1 1 2     tlb_fill 0x47710000/1 1 2
  1889 tlb_fill 0x47711000/1 1 2     tlb_fill 0x47711000/1 1 2

These don't seem to affect the overall program flow but are concerning
because the memory access patterns should be the same. My
investigations with rr seem to indicate the difference is due to the
behaviour of the victim_tlb_cache, which again AFAICT should be
deterministic.

Anyway I can't spend any time debugging it this week so I thought I'd
post the current state in case anyone is curious enough to want to go
diving into record/replay.
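For anyone wanting to repeat the comparison, here is a rough sketch (not
part of this series) of how two tlb_fill trace logs could be checked for
divergence. It assumes each log has already been reduced to a list of
"tlb_fill ..." lines in the form shown in the excerpt above; the helper
names first_divergence and summarise are made up for illustration, and
the sample data is a shortened copy of that excerpt.

```python
import difflib


def first_divergence(record, replay):
    """Return the index of the first entry where the traces differ.

    Returns None when both traces are identical.
    """
    for i, (rec, rep) in enumerate(zip(record, replay)):
        if rec != rep:
            return i
    if len(record) != len(replay):
        # One trace is a strict prefix of the other.
        return min(len(record), len(replay))
    return None


def summarise(record, replay):
    """Yield (tag, record_lines, replay_lines) for each differing region."""
    matcher = difflib.SequenceMatcher(a=record, b=replay, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield tag, record[i1:i2], replay[j1:j2]


# Hypothetical sample data, shortened from the excerpt above.
record = [
    "tlb_fill 0x4770d000/1 1 2",
    "tlb_fill 0x59000/1 0 2",
    "tlb_fill 0x4770e000/1 1 2",
]
replay = [
    "tlb_fill 0x4770d000/1 1 2",
    "tlb_fill 0x59000/1 0 2",
    "tlb_fill 0x476dd116/1 0 2",
    "tlb_fill 0x4770e000/1 1 2",
]

print("first divergence at entry", first_divergence(record, replay))
for tag, rec, rep in summarise(record, replay):
    print(tag, rec, rep)
```

The opcode tags map loosely onto the markers in the side-by-side view:
"insert" corresponds to ">", "delete" to "<" and "replace" to "|".
Reporting the earliest differing entry is the useful part, since the
divergence shows up well before the sync error is reported.
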
The following need review:

  replay: report sync error when no exception in log (!DEBUG INVESTIGATION)
  accel/tcg: add trace_tlb_resize trace point
  accel/tcg: define tlb_fill as a trace point
  tests/avocado: remove skips from replay_kernel (1 acks, 1 sobs, 0 tbs)
  replay: stop us hanging in rr_wait_io_event
  replay/replay-char: use report_sync_error
  tests/avocado: modernise the drive args for replay_linux
  tests/avocado: add a simple i386 replay kernel test (2 acks, 1 sobs, 0 tbs)

Alex Bennée (16):
  tests/avocado: add a simple i386 replay kernel test
  tests/avocado: fix typo in replay_linux
  tests/avocado: modernise the drive args for replay_linux
  scripts/replay-dump: update to latest format
  scripts/replay_dump: track total number of instructions
  replay: remove host_clock_last
  replay: add proper kdoc for ReplayState
  replay: make has_unread_data a bool
  replay: introduce a central report point for sync errors
  replay/replay-char: use report_sync_error
  replay: stop us hanging in rr_wait_io_event
  chardev: force write all when recording replay logs
  tests/avocado: remove skips from replay_kernel
  accel/tcg: define tlb_fill as a trace point
  accel/tcg: add trace_tlb_resize trace point
  replay: report sync error when no exception in log (!DEBUG INVESTIGATION)

 include/sysemu/replay.h        |   5 ++
 replay/replay-internal.h       |  50 ++++++++----
 accel/tcg/cputlb.c             |   4 +
 accel/tcg/tcg-accel-ops-rr.c   |   2 +-
 chardev/char.c                 |  12 +++
 replay/replay-char.c           |   6 +-
 replay/replay-internal.c       |   5 +-
 replay/replay-snapshot.c       |   7 +-
 replay/replay.c                | 141 ++++++++++++++++++++++++++++++++-
 accel/tcg/trace-events         |   2 +
 scripts/replay-dump.py         |  95 +++++++++++++++++++---
 tests/avocado/replay_kernel.py |  27 ++++---
 tests/avocado/replay_linux.py  |   9 ++-
 13 files changed, 314 insertions(+), 51 deletions(-)