Message ID | 20240111142326.1743444-4-pierrick.bouvier@linaro.org
---|---
State | New
Series | TCG Plugin inline operation enhancement
Hi Pierrick,

On 11/1/24 15:23, Pierrick Bouvier wrote:
> For now, it simply performs instruction, bb and mem count, and ensures
> that inline vs callback versions have the same result. Later, we'll
> extend it when new inline operations are added.
>
> Using existing plugins to test that everything works is a bit
> cumbersome, as different events are treated in different plugins.
> Thus, this new one.
>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>  tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
>  tests/plugin/meson.build |   2 +-
>  2 files changed, 184 insertions(+), 1 deletion(-)
>  create mode 100644 tests/plugin/inline.c

> +#define MAX_CPUS 8

Where does this value come from? Should the plugin API provide a helper
to ask TCG how many vCPUs are created?
On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
> Hi Pierrick,
>
> On 11/1/24 15:23, Pierrick Bouvier wrote:
<snip>
>> +#define MAX_CPUS 8
>
> Where does this value come from?

The plugin tests/plugin/insn.c had this constant, so I picked it up
from there.

> Should the plugin API provide a helper to ask TCG how many
> vCPUs are created?

In user mode, we can't know in advance how many simultaneous threads
(and thus vcpus) will be created. I'm not sure whether additional cpus
can be added in system mode.

One problem, though, is that once you register an inline op against a
dynamic array, you can't change the pointer afterwards (e.g. when
resizing it on detecting a new vcpu). So you need statically sized
storage somewhere.

Your question is a good one, and maybe we should define a MAX constant
that plugins can rely on, instead of a random amount.
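To illustrate the limitation described above, here is a minimal sketch
using the same inline-op registration the patch below exercises. The
dynamically allocated counts array is hypothetical:

#include <stdint.h>
#include <qemu-plugin.h>

/* Hypothetical dynamically sized per-vcpu array. */
static uint64_t *counts;

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    /*
     * The address &counts[0] is captured here, at translation time, and
     * baked into the generated code. If counts is later realloc()ed when
     * a new vcpu appears, already-translated blocks keep writing through
     * the stale pointer.
     */
    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
                                             &counts[0], 1);
}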
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
<snip>
>> Should the plugin API provide a helper to ask TCG how many
>> vCPUs are created?
>
> In user mode, we can't know in advance how many simultaneous threads
> (and thus vcpus) will be created. I'm not sure whether additional cpus
> can be added in system mode.
>
> One problem, though, is that once you register an inline op against a
> dynamic array, you can't change the pointer afterwards (e.g. when
> resizing it on detecting a new vcpu). So you need statically sized
> storage somewhere.
>
> Your question is a good one, and maybe we should define a MAX constant
> that plugins can rely on, instead of a random amount.

For user-mode it can be infinite. The existing plugins handle this by
indexing with vcpu_index % max_vcpu. Perhaps we just ensure that for
the scoreboard as well? Of course that does introduce a trap for those
using user-mode...
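The pattern Alex refers to looks roughly like this, a simplified sketch
of what the existing test plugins do:

#include <stdint.h>

#define MAX_CPUS 8
static uint64_t counts[MAX_CPUS];

static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
{
    /*
     * Folding the index keeps every write inside the array, but two
     * vcpus whose indices differ by a multiple of MAX_CPUS now share a
     * slot, so their increments race.
     */
    counts[cpu_index % MAX_CPUS]++;
}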
On 1/12/24 21:20, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
<snip>
> For user-mode it can be infinite. The existing plugins handle this by
> indexing with vcpu_index % max_vcpu. Perhaps we just ensure that for
> the scoreboard as well? Of course that does introduce a trap for those
> using user-mode...

The problem with vcpu_index % max_vcpu is that it reintroduces a race
condition, though it's probably less frequent than with a single shared
variable. IMHO it avoids the memory error, but does not solve the
initial problem itself.

The simplest solution would be to pick a size "big enough" for most
cases, and abort when it's reached.

Another solution, much more complicated but correct, would be to move
memory management of the plugin scoreboard into the plugin runtime, and
add a level of indirection to access it. Every time a new vcpu is
added, we can grow the array dynamically. This way the array can grow,
and ultimately the plugin can poke at its content/size. I'm not sure
this complexity is what we want, though.
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 1/12/24 21:20, Alex Bennée wrote:
<snip>
> The problem with vcpu_index % max_vcpu is that it reintroduces a race
> condition, though it's probably less frequent than with a single
> shared variable. IMHO it avoids the memory error, but does not solve
> the initial problem itself.
>
> The simplest solution would be to pick a size "big enough" for most
> cases, and abort when it's reached.

Well that is simple enough for system emulation, as max_vcpus is a
bounded number.

> Another solution, much more complicated but correct, would be to move
> memory management of the plugin scoreboard into the plugin runtime,
> and add a level of indirection to access it.

That certainly gives us the most control and safety. We can then ensure
we'll never write past the bounds of the buffer. The plugin would have
to use an access function to get the pointer at the time it cared, and
of course inline checks should be pretty simple.

> Every time a new vcpu is added, we can grow the array dynamically.
> This way the array can grow, and ultimately the plugin can poke at its
> content/size. I'm not sure this complexity is what we want, though.

It doesn't seem too bad. We have a start/end_exclusive in *-user
do_fork where we could update pointers. If we are smart about growing
the size of the arrays we could avoid too much re-translation.

Do we want a limit of one scoreboard per thread? Can we store
structures in there?
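The level of indirection under discussion might look like this. All
names are hypothetical; this is a sketch, not an existing API:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime-owned descriptor; the plugin never holds the
 * buffer pointer directly. */
struct scoreboard {
    void *data;   /* may be moved by the runtime when it grows */
    size_t size;  /* current number of vcpu slots */
};

/* Access function: fetch the entry on every use and never cache the
 * result, so the runtime is free to realloc data underneath. */
static inline uint64_t *scoreboard_u64(struct scoreboard *sb,
                                       unsigned int vcpu_index)
{
    return (uint64_t *)sb->data + vcpu_index;
}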
On 1/13/24 21:16, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
<snip>
>> Every time a new vcpu is added, we can grow the array dynamically.
>> This way the array can grow, and ultimately the plugin can poke at
>> its content/size. I'm not sure this complexity is what we want,
>> though.
>
> It doesn't seem too bad. We have a start/end_exclusive in *-user
> do_fork where we could update pointers. If we are smart about growing
> the size of the arrays we could avoid too much re-translation.

I was concerned about a potential race when the scoreboard updates this
pointer while other cpus are executing tbs (using it). But this concern
is not valid, since start_exclusive ensures all other cpus are stopped.

The vcpu_init_hook function in plugins/core.c seems a good location to
add this logic. We would check if an update is needed, then
start_exclusive(), update the scoreboard, and exit the exclusive
section.

Do you think it's worth trying to inline the scoreboard pointer (and
flushing all tbs when it's updated), instead of simply adding an
indirection to it? With this, we could avoid any need to re-translate
anything.

> Do we want a limit of one scoreboard per thread? Can we store
> structures in there?

From the current plugins' use cases, it seems that several scoreboards
are needed. Allowing structure storage seems a bit more tricky to me:
since memory may be reallocated, users can't be allowed to point
directly into it. I would be in favor of avoiding this (comments are
welcome).
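A sketch of that resize-on-init logic on the runtime side.
start_exclusive()/end_exclusive() are the existing QEMU primitives; the
function and counter names here are hypothetical:

#include <stddef.h>

static size_t scoreboard_slots; /* hypothetical runtime bookkeeping */

static void plugin_vcpu_init_maybe_grow(unsigned int vcpu_index)
{
    if (vcpu_index < scoreboard_slots) {
        return; /* current allocation is already big enough */
    }
    /*
     * start_exclusive() stops all other vcpus, so no translated code
     * can be reading the old buffers while they are moved.
     */
    start_exclusive();
    scoreboard_slots = 2 * (vcpu_index + 1);
    /* ... realloc each registered scoreboard to scoreboard_slots
     * entries and update whatever caches the old pointers ... */
    end_exclusive();
}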
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 1/13/24 21:16, Alex Bennée wrote:
<snip>
> I was concerned about a potential race when the scoreboard updates
> this pointer while other cpus are executing tbs (using it). But this
> concern is not valid, since start_exclusive ensures all other cpus are
> stopped.
>
> The vcpu_init_hook function in plugins/core.c seems a good location to
> add this logic. We would check if an update is needed, then
> start_exclusive(), update the scoreboard, and exit the exclusive
> section.

It might already be in the exclusive section. We should try and re-use
an existing exclusive region rather than adding a new one. It's
expensive, so best avoided if we can.

> Do you think it's worth trying to inline the scoreboard pointer (and
> flushing all tbs when it's updated), instead of simply adding an
> indirection to it? With this, we could avoid any need to re-translate
> anything.

Depends on the cost/complexity of the indirection, I guess.
Re-translation isn't super expensive if we, say, double the size of the
scoreboard each time we need to.

>> Do we want a limit of one scoreboard per thread? Can we store
>> structures in there?
>
> From the current plugins' use cases, it seems that several scoreboards
> are needed. Allowing structure storage seems a bit more tricky to me:
> since memory may be reallocated, users can't be allowed to point
> directly into it. I would be in favor of avoiding this (comments are
> welcome).

The init function can take a sizeof(entry) field and the inline op can
have an offset field (which for the simple 0 case can be folded away by
TCG).
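From the plugin side, the API shape Alex suggests might look like this.
scoreboard_new() and register_inline_add_u64() are hypothetical
stand-ins, not the final API:

#include <stddef.h>
#include <stdint.h>
#include <qemu-plugin.h>

/* Hypothetical runtime-managed scoreboard API. */
struct scoreboard;
struct scoreboard *scoreboard_new(size_t entry_size);
void register_inline_add_u64(struct qemu_plugin_tb *tb,
                             struct scoreboard *sb,
                             size_t offset, uint64_t imm);

/* One entry per vcpu; the entry size is given to the runtime up front. */
struct cpu_stats {
    uint64_t insns;
    uint64_t mem_accesses;
};

static struct scoreboard *sb;

static void example_install(void)
{
    sb = scoreboard_new(sizeof(struct cpu_stats));
}

static void example_tb_trans(struct qemu_plugin_tb *tb)
{
    /*
     * The inline op carries an offset into each entry. For the first
     * field the offset is 0, which TCG can fold away entirely.
     */
    register_inline_add_u64(tb, sb, offsetof(struct cpu_stats, insns), 1);
}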
On 1/15/24 13:04, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
<snip>
> The init function can take a sizeof(entry) field and the inline op can
> have an offset field (which for the simple 0 case can be folded away
> by TCG).

Thanks for all your comments and guidance on this.

I implemented a new version, working with a scoreboard that gets
resized automatically, which allows usage of structs as well. The
result is pretty satisfying, as there is almost no more need to
manually keep track of how many cpus have been used, while offering
thread-safety by default.

I'll re-spin the series once I've cleaned up the commits and ported the
existing plugins to this.
diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
new file mode 100644
index 00000000000..6114ebca545
--- /dev/null
+++ b/tests/plugin/inline.c
@@ -0,0 +1,183 @@
+/*
+ * Copyright (C) 2023, Pierrick Bouvier <pierrick.bouvier@linaro.org>
+ *
+ * Demonstrates and tests usage of inline ops.
+ *
+ * License: GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <stdint.h>
+#include <stdio.h>
+
+#include <qemu-plugin.h>
+
+#define MAX_CPUS 8
+
+static uint64_t count_tb;
+static uint64_t count_tb_per_vcpu[MAX_CPUS];
+static uint64_t count_tb_inline_per_vcpu[MAX_CPUS];
+static uint64_t count_tb_inline_racy;
+static uint64_t count_insn;
+static uint64_t count_insn_per_vcpu[MAX_CPUS];
+static uint64_t count_insn_inline_per_vcpu[MAX_CPUS];
+static uint64_t count_insn_inline_racy;
+static uint64_t count_mem;
+static uint64_t count_mem_per_vcpu[MAX_CPUS];
+static uint64_t count_mem_inline_per_vcpu[MAX_CPUS];
+static uint64_t count_mem_inline_racy;
+static GMutex tb_lock;
+static GMutex insn_lock;
+static GMutex mem_lock;
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+static uint64_t collect_per_vcpu(uint64_t *values)
+{
+    uint64_t count = 0;
+    for (int i = 0; i < MAX_CPUS; ++i) {
+        count += values[i];
+    }
+    return count;
+}
+
+static void stats_insn(void)
+{
+    const uint64_t expected = count_insn;
+    const uint64_t per_vcpu = collect_per_vcpu(count_insn_per_vcpu);
+    const uint64_t inl_per_vcpu = collect_per_vcpu(count_insn_inline_per_vcpu);
+    printf("insn: %" PRIu64 "\n", expected);
+    printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
+    printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+    printf("insn: %" PRIu64 " (inline racy)\n", count_insn_inline_racy);
+    g_assert(expected > 0);
+    g_assert(per_vcpu == expected);
+    g_assert(inl_per_vcpu == expected);
+    g_assert(count_insn_inline_racy <= expected);
+}
+
+static void stats_tb(void)
+{
+    const uint64_t expected = count_tb;
+    const uint64_t per_vcpu = collect_per_vcpu(count_tb_per_vcpu);
+    const uint64_t inl_per_vcpu = collect_per_vcpu(count_tb_inline_per_vcpu);
+    printf("tb: %" PRIu64 "\n", expected);
+    printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
+    printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+    printf("tb: %" PRIu64 " (inline racy)\n", count_tb_inline_racy);
+    g_assert(expected > 0);
+    g_assert(per_vcpu == expected);
+    g_assert(inl_per_vcpu == expected);
+    g_assert(count_tb_inline_racy <= expected);
+}
+
+static void stats_mem(void)
+{
+    const uint64_t expected = count_mem;
+    const uint64_t per_vcpu = collect_per_vcpu(count_mem_per_vcpu);
+    const uint64_t inl_per_vcpu = collect_per_vcpu(count_mem_inline_per_vcpu);
+    printf("mem: %" PRIu64 "\n", expected);
+    printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
+    printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+    printf("mem: %" PRIu64 " (inline racy)\n", count_mem_inline_racy);
+    g_assert(expected > 0);
+    g_assert(per_vcpu == expected);
+    g_assert(inl_per_vcpu == expected);
+    g_assert(count_mem_inline_racy <= expected);
+}
+
+static void plugin_exit(qemu_plugin_id_t id, void *udata)
+{
+    for (int i = 0; i < MAX_CPUS; ++i) {
+        const uint64_t tb = count_tb_per_vcpu[i];
+        const uint64_t tb_inline = count_tb_inline_per_vcpu[i];
+        const uint64_t insn = count_insn_per_vcpu[i];
+        const uint64_t insn_inline = count_insn_inline_per_vcpu[i];
+        const uint64_t mem = count_mem_per_vcpu[i];
+        const uint64_t mem_inline = count_mem_inline_per_vcpu[i];
+        printf("cpu %d: tb (%" PRIu64 ", %" PRIu64 ") | "
+               "insn (%" PRIu64 ", %" PRIu64 ") | "
+               "mem (%" PRIu64 ", %" PRIu64 ")"
+               "\n",
+               i, tb, tb_inline, insn, insn_inline, mem, mem_inline);
+        g_assert(tb == tb_inline);
+        g_assert(insn == insn_inline);
+        g_assert(mem == mem_inline);
+    }
+
+    stats_tb();
+    stats_insn();
+    stats_mem();
+}
+
+static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
+{
+    count_tb_per_vcpu[cpu_index]++;
+    g_mutex_lock(&tb_lock);
+    count_tb++;
+    g_mutex_unlock(&tb_lock);
+}
+
+static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
+{
+    count_insn_per_vcpu[cpu_index]++;
+    g_mutex_lock(&insn_lock);
+    count_insn++;
+    g_mutex_unlock(&insn_lock);
+}
+
+static void vcpu_mem_access(unsigned int cpu_index,
+                            qemu_plugin_meminfo_t info,
+                            uint64_t vaddr,
+                            void *userdata)
+{
+    count_mem_per_vcpu[cpu_index]++;
+    g_mutex_lock(&mem_lock);
+    count_mem++;
+    g_mutex_unlock(&mem_lock);
+}
+
+static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
+{
+    qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
+                                         QEMU_PLUGIN_CB_NO_REGS, 0);
+    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
+                                             &count_tb_inline_racy, 1);
+    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+        tb, QEMU_PLUGIN_INLINE_ADD_U64,
+        count_tb_inline_per_vcpu, sizeof(uint64_t), 1);
+
+    for (int idx = 0; idx < qemu_plugin_tb_n_insns(tb); ++idx) {
+        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
+        qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
+                                               QEMU_PLUGIN_CB_NO_REGS, 0);
+        qemu_plugin_register_vcpu_insn_exec_inline(
+            insn, QEMU_PLUGIN_INLINE_ADD_U64,
+            &count_insn_inline_racy, 1);
+        qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+            insn, QEMU_PLUGIN_INLINE_ADD_U64,
+            count_insn_inline_per_vcpu, sizeof(uint64_t), 1);
+        qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
+                                         QEMU_PLUGIN_CB_NO_REGS,
+                                         QEMU_PLUGIN_MEM_RW, 0);
+        qemu_plugin_register_vcpu_mem_inline(insn, QEMU_PLUGIN_MEM_RW,
+                                             QEMU_PLUGIN_INLINE_ADD_U64,
+                                             &count_mem_inline_racy, 1);
+        qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+            insn, QEMU_PLUGIN_MEM_RW,
+            QEMU_PLUGIN_INLINE_ADD_U64,
+            count_mem_inline_per_vcpu, sizeof(uint64_t), 1);
+    }
+}
+
+QEMU_PLUGIN_EXPORT
+int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
+                        int argc, char **argv)
+{
+    g_assert(info->system.smp_vcpus <= MAX_CPUS);
+    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
+    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
+
+    return 0;
+}
diff --git a/tests/plugin/meson.build b/tests/plugin/meson.build
index e18183aaeda..9eece5bab51 100644
--- a/tests/plugin/meson.build
+++ b/tests/plugin/meson.build
@@ -1,6 +1,6 @@
 t = []
 if get_option('plugins')
-  foreach i : ['bb', 'empty', 'insn', 'mem', 'syscall']
+  foreach i : ['bb', 'empty', 'inline', 'insn', 'mem', 'syscall']
     if host_os == 'windows'
       t += shared_module(i, files(i + '.c') + '../../contrib/plugins/win32_linker.c',
                          include_directories: '../../include/qemu',
For now, it simply performs instruction, bb and mem count, and ensures
that inline vs callback versions have the same result. Later, we'll
extend it when new inline operations are added.

Using existing plugins to test that everything works is a bit
cumbersome, as different events are treated in different plugins.
Thus, this new one.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
 tests/plugin/meson.build |   2 +-
 2 files changed, 184 insertions(+), 1 deletion(-)
 create mode 100644 tests/plugin/inline.c