diff mbox series

[RFC,2/2] platform/x86/amd: pmf: Add manual control support

Message ID 20240926025955.1728766-3-superm1@kernel.org
State New
Headers show
Series "custom" ACPI platform profile support | expand

Commit Message

Mario Limonciello Sept. 26, 2024, 2:59 a.m. UTC
From: Mario Limonciello <mario.limonciello@amd.com>

A number of users resort to using reverse engineered software like
ryzenadj to manipulate debugging interfaces for modifying APU settings.

At a glance these tools are useful, but the problem is they break
state machines in other software such as the PMF driver or the OEM
EC.

Offer a knob for PMF to allow 'manual control' which will users can
directly change things like fPPT and sPPT. As this can be harmful for
a system to try to push limits outside of a thermal design, taint the
kernel and show a critical message when in use.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 Documentation/ABI/testing/sysfs-amd-pmf | 10 +++
 drivers/platform/x86/amd/pmf/Makefile   |  1 +
 drivers/platform/x86/amd/pmf/core.c     |  9 +++
 drivers/platform/x86/amd/pmf/manual.c   | 88 +++++++++++++++++++++++++
 drivers/platform/x86/amd/pmf/pmf.h      |  5 ++
 drivers/platform/x86/amd/pmf/sps.c      |  4 ++
 6 files changed, 117 insertions(+)
 create mode 100644 drivers/platform/x86/amd/pmf/manual.c

Comments

Antheas Kapenekakis Sept. 27, 2024, 8:44 a.m. UTC | #1
Hi Mario,

On Thu, 26 Sept 2024 at 20:09, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 9/26/2024 06:00, Antheas Kapenekakis wrote:
> > Hi Shyam,
> >
> >> I appreciate the proposal, but giving users this control seems similar
> >> to using tools like Ryzenadj or Ryzen Master, which are primarily for
> >> overclocking. Atleast Ryzen Master has a dedicated mailbox with PMFW.
> >
> > In the laptop market I agree with you. However, in the handheld
> > market, users expect to be able to lower the power envelope of the
> > device on demand in a granular fashion. As the battery drop is
> > measured in Watts, tying a slider to Watts is a natural solution.
> >
> > Most of the time, when those controls are used it is to limit the
> > thermal envelope of the device, not exceed it. We want to remove the
> > use of these tools and allow manufacturers the ability to customise
> > the power envelope they offer to users.
> >
> >> While some existing PMF mailboxes are being deprecated, and SPL has
> >> been removed starting with Strix[1] due to the APTS method.
>
> Hmm, what do you think about about offering a wrapper for this for
> people to manipulate?

Having a single call that sets everything would be my preference, so I
would support this.

Although looking at [1], seems like it will be separate calls anway.

Link: https://github.com/torvalds/linux/blob/master/drivers/platform/x86/amd/pmf/sps.c#L193
[1]

> >> It's important to use some settings together rather than individually
> >> (which the users might not be aware of). For instance, updating SPL
> >> requires corresponding updates to STT limits to avoid negative outcomes.
> >
>
> The tough part about striking the balance here is how would an end user
> know what values to set in tandem.  I think a lot of people just assume
> they can "just change SPL" and that's it and have a good experience.

Spoken like a true linux user. Users do not know what a kernel or
sysfs is and they will not be touching any of this. It just needs to
be baby-proofed enough so for the 5 users that do it is safe.

Let us focus on the problem here. There are currently around 5
manufacturers shipping products in a space where granular TDP control
is expected and where AMD has not provided them with a solution.

And for this, there are two issues. First, there is no standard for
granular TDP control tuned by the manufacturer. Second, when such a
standard is created, there is a healthy pool of devices in the market
where the manufacturer cannot be expected to provide an updated BIOS
for them.
Therefore, we need a proposal where 1) the manufacturer can provide
granular TDP controls in a fully customizable manner (e.g., with a LUT
that controls everything), and 2) for devices that will not get that
tuning, a custom profile setting that will expose important tuning
parameters to userspace so that we can retrofit it and extend the
their lifespan.

> > This suggestion was referring to a combined slider, much like the
> > suggestion below. So STT limits would be modified in tandem,
> > respecting manufacturer profiles. See comments below.
> >
> > If you find the name SPL disagreeable, it could be named {tdp,
> > tdp_min, tdp_max}. This is the solution used by Valve on the Steam
> > Deck (power1_cap{+min,max}, power2_cap{+min,max}).
>
> It's not so much that it's disagreeable term but Shyam is pointing out
> that SPL is no longer a valid argument to the platform mailbox.

I'd tend to agree since the current mailbox targets that I know of are
STAPM limit (for STAPM) and skin temp limit (for STT). Since you used
the term SPL, I carried that over to the proposal, but it would not
control SPL. Instead it would control both of the former, including
sPPT and fPPT (if that is still supported; unclear in [1]; but
disabling boost will be a requirement).

> >
> > In addition, boost is seen as detrimental to handheld devices, with
> > most users disliking and disabling it. Steam Deck does not use boost.
> > It is disabled by Steam (power1_cap == power2_cap). So STT and STAPM
> > are not very relevant. In addition, Steam Deck van gogh has a more
> > linear response so TDP limits are less required.
> >
> >> Additionally, altering these parameters can exceed thermal limits and
> >> potentially void warranties.
> >>
> >> Considering CnQF, why not let OEMs opt-in and allow the algorithm to
> >> manage power budgets, rather than providing these controls to users
> >> from the kernel when userspace tools already exist?
>
> The problem is all of the RE tools rely upon PCI config space access or
> /dev/mem access to manipulate undocumented register offsets.
>
> When the system is under kernel lockdown (such as with distro kernel
> when UEFI secure boot is turned on) then those interfaces are
> intentionally locked down.
>
> That's why I'm hoping we can strike some sort of balance at the request
> for some advanced users being able to tune values in a predictable
> fashion while also allowing OEMs to configure policies like CNQF or
> Smart PC when users for users that don't tinker.

I will have to repeat that as far as the handheld market is concerned,
we are not talking about advanced users. Instead, we are talking for
all users.

> >>
> >> Please note that on systems with Smart PC enabled, if users manually
> >> adjust the system thermals, it can lead to the thermal controls
> >> becoming unmanageable.
>
> Yeah; that's why as this RFC patch I didn't let CNQF, ITS or Smart PC
> initialize.  Basically if manual control is enabled then "SPS" and
> manual sysfs control is the only thing available.

Sounds like you have your work cut out for you if the custom profile
is supposed to dynamically load.

> >
> > Much like you, we dislike AutoTDP solutions that use e.g., RyzenAdj, as they:
> >   1) Do not respect manufacturer limits
> >   2) Cause system instability such as stutters when setting values
> >   3) Can cause crashes if they access the mailbox at the same time as
> > the AMD drm driver.
> >
>
> Yes.  Exactly why I feel that if we offer an interface instead people
> can use such an interface instead of these tools.

While (in Bazzite) we have a solution that works very reliably and is
safe (not RyzenAdj), we have to begin cleaning up loose ends so that
we can 1) enable TDP control in a stock secureboot kernel with early
lockdown enabled (e.g., Fedora), 2) provide manufacturers with certain
reliability guarantees so they can warranty units running under linux,
3) prepare our solutions for being packaged in upstream distribution
repositories (Debian, Fedora), where using an existing solution is a
blocker as they do not provide or should provide such hardware access
when secure boot is enabled.

Though, since manufacturers like Ayaneo currently use RyzenAdj in
Windows, I might be nitpicking too much.

As for why Secure Boot is important, let add [2], where Rockstar
points the finger to Valve for BattlEye not working. Much of the
anticheat issue is due to the fact that it is trivial to cheat without
having a secureboot enabled kernel with the early lockdown flag
engaged, as it allows both custom drivers and userspace to gain access
to sensitive process memory in a way that is undetectable by
anticheat. Vanguard does not work in Linux for much of the same
reason.

Steam Deck is, for those uninitiated, a device that does not carry
Secureboot keys, and SteamOS is a distribution that does not support
Secure boot. Although both can change (Steam Deck BIOS supports secure
boot). However, Bazzite is secure boot enabled and we encourage our
users to leave it enabled, although for the moment they have to enroll
our MOK key, which most of them do.

Antheas

Link: https://www.pcgamer.com/games/grand-theft-auto/gta-online-is-no-longer-compatible-with-steam-deck-thanks-to-its-new-anti-cheat-software-despite-battleye-having-an-opt-in-system-for-this-sort-of-thing/
[2]
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-amd-pmf b/Documentation/ABI/testing/sysfs-amd-pmf
index 7fc0e1c2b76b..6f3d5cbf443f 100644
--- a/Documentation/ABI/testing/sysfs-amd-pmf
+++ b/Documentation/ABI/testing/sysfs-amd-pmf
@@ -11,3 +11,13 @@  Description:	Reading this file tells if the AMD Platform Management(PMF)
 		To turn off CnQF user can write "off" to the sysfs node.
 		Note: Systems that support auto mode will not have this sysfs file
 		available.
+
+What:		/sys/devices/platform/*/{spl, fppt, sppt, sppt_apu_only, stt_min, stt_limit_apu, stt_skip_temp}
+Date:		December 2024
+Contact:	Mario Limonciello <mario.limonciello@amd.com>
+Description:	Manual control of AMD PMF APU coefficients
+		.
+		These files are used to manually control the APU coefficients.
+		In order to write to these files the module most be
+		loaded with manual_control=1 and the user must write "custom"
+		to the ACPI platform profile.
diff --git a/drivers/platform/x86/amd/pmf/Makefile b/drivers/platform/x86/amd/pmf/Makefile
index 7d6079b02589..81444d6f4428 100644
--- a/drivers/platform/x86/amd/pmf/Makefile
+++ b/drivers/platform/x86/amd/pmf/Makefile
@@ -7,4 +7,5 @@ 
 obj-$(CONFIG_AMD_PMF) += amd-pmf.o
 amd-pmf-objs := core.o acpi.o sps.o \
 		auto-mode.o cnqf.o \
+		manual.o \
 		tee-if.o spc.o pmf-quirks.o
diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c
index d6af0ca036f1..52a68ca094be 100644
--- a/drivers/platform/x86/amd/pmf/core.c
+++ b/drivers/platform/x86/amd/pmf/core.c
@@ -53,6 +53,10 @@  static bool force_load;
 module_param(force_load, bool, 0444);
 MODULE_PARM_DESC(force_load, "Force load this driver on supported older platforms (experimental)");
 
+bool pmf_manual_control;
+module_param_named(manual_control, pmf_manual_control, bool, 0444);
+MODULE_PARM_DESC(manual_control, "Expose manual control knobs (experimental)");
+
 static int amd_pmf_pwr_src_notify_call(struct notifier_block *nb, unsigned long event, void *data)
 {
 	struct amd_pmf_dev *pmf = container_of(nb, struct amd_pmf_dev, pwr_src_notifier);
@@ -349,6 +353,10 @@  static void amd_pmf_init_features(struct amd_pmf_dev *dev)
 		dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n");
 	}
 
+	if (pmf_manual_control) {
+		amd_pmf_init_manual_control(dev);
+		return;
+	}
 	amd_pmf_init_smart_pc(dev);
 	if (dev->smart_pc_enabled) {
 		dev_dbg(dev->dev, "Smart PC Solution Enabled\n");
@@ -485,6 +493,7 @@  static void amd_pmf_remove(struct platform_device *pdev)
 
 static const struct attribute_group *amd_pmf_driver_groups[] = {
 	&cnqf_feature_attribute_group,
+	&manual_attribute_group,
 	NULL,
 };
 
diff --git a/drivers/platform/x86/amd/pmf/manual.c b/drivers/platform/x86/amd/pmf/manual.c
new file mode 100644
index 000000000000..b33fc3cd8d61
--- /dev/null
+++ b/drivers/platform/x86/amd/pmf/manual.c
@@ -0,0 +1,88 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * AMD Platform Management Framework Driver
+ *
+ * Copyright (c) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Mario Limonciello <mario.limonciello@amd.com>
+ */
+
+#include "pmf.h"
+
+#define pmf_manual_attribute(_name, _set_command, _get_command)		\
+static ssize_t _name##_store(struct device *d,				\
+			     struct device_attribute *attr,		\
+			     const char *buf, size_t count)		\
+{									\
+	struct amd_pmf_dev *dev = dev_get_drvdata(d);			\
+	uint val;							\
+									\
+	if (dev->current_profile != PLATFORM_PROFILE_CUSTOM) {		\
+		dev_warn_once(dev->dev,					\
+			      "Manual control is disabled, please set "	\
+			      "platform profile to custom.\n");		\
+		return -EINVAL;						\
+	}								\
+									\
+	if (kstrtouint(buf, 10, &val) < 0)				\
+		return -EINVAL;						\
+									\
+	amd_pmf_send_cmd(dev, _set_command, false, val, NULL);		\
+									\
+	return count;							\
+}									\
+static ssize_t _name##_show(struct device *d,				\
+			   struct device_attribute *attr,		\
+			   char *buf)					\
+{									\
+	struct amd_pmf_dev *dev = dev_get_drvdata(d);			\
+	uint val;							\
+									\
+	amd_pmf_send_cmd(dev, _get_command, true, ARG_NONE, &val);	\
+									\
+	return sysfs_emit(buf, "%u\n", val);				\
+}
+
+pmf_manual_attribute(spl, SET_SPL, GET_SPL);
+static DEVICE_ATTR_RW(spl);
+pmf_manual_attribute(fppt, SET_FPPT, GET_FPPT);
+static DEVICE_ATTR_RW(fppt);
+pmf_manual_attribute(sppt, SET_SPPT, GET_SPPT);
+static DEVICE_ATTR_RW(sppt);
+pmf_manual_attribute(sppt_apu_only, SET_SPPT_APU_ONLY, GET_SPPT_APU_ONLY);
+static DEVICE_ATTR_RW(sppt_apu_only);
+pmf_manual_attribute(stt_min, SET_STT_MIN_LIMIT, GET_STT_MIN_LIMIT);
+static DEVICE_ATTR_RW(stt_min);
+pmf_manual_attribute(stt_limit_apu, SET_STT_LIMIT_APU, GET_STT_LIMIT_APU);
+static DEVICE_ATTR_RW(stt_limit_apu);
+pmf_manual_attribute(stt_skin_temp, SET_STT_LIMIT_HS2, GET_STT_LIMIT_HS2);
+static DEVICE_ATTR_RW(stt_skin_temp);
+
+static umode_t manual_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
+{
+	return pmf_manual_control ? 0660 : 0;
+}
+
+static struct attribute *manual_attrs[] = {
+	&dev_attr_spl.attr,
+	&dev_attr_fppt.attr,
+	&dev_attr_sppt.attr,
+	&dev_attr_sppt_apu_only.attr,
+	&dev_attr_stt_min.attr,
+	&dev_attr_stt_limit_apu.attr,
+	&dev_attr_stt_skin_temp.attr,
+	NULL,
+};
+
+const struct attribute_group manual_attribute_group = {
+	.attrs = manual_attrs,
+	.is_visible = manual_attr_is_visible,
+};
+
+void amd_pmf_init_manual_control(struct amd_pmf_dev *dev)
+{
+	add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+	pr_crit("Manual PMF control is enabled, please disable it before "
+		"reporting any bugs unrelated to PMF.\n");
+}
diff --git a/drivers/platform/x86/amd/pmf/pmf.h b/drivers/platform/x86/amd/pmf/pmf.h
index 8ce8816da9c1..ca3df63cf190 100644
--- a/drivers/platform/x86/amd/pmf/pmf.h
+++ b/drivers/platform/x86/amd/pmf/pmf.h
@@ -798,4 +798,9 @@  void amd_pmf_dump_ta_inputs(struct amd_pmf_dev *dev, struct ta_pmf_enact_table *
 /* Quirk infrastructure */
 void amd_pmf_quirks_init(struct amd_pmf_dev *dev);
 
+/* Manual configuration */
+extern bool pmf_manual_control;
+extern const struct attribute_group manual_attribute_group;
+void amd_pmf_init_manual_control(struct amd_pmf_dev *dev);
+
 #endif /* PMF_H */
diff --git a/drivers/platform/x86/amd/pmf/sps.c b/drivers/platform/x86/amd/pmf/sps.c
index 92f7fb22277d..6db88e523a86 100644
--- a/drivers/platform/x86/amd/pmf/sps.c
+++ b/drivers/platform/x86/amd/pmf/sps.c
@@ -305,6 +305,8 @@  int amd_pmf_get_pprof_modes(struct amd_pmf_dev *pmf)
 	case PLATFORM_PROFILE_LOW_POWER:
 		mode = POWER_MODE_POWER_SAVER;
 		break;
+	case PLATFORM_PROFILE_CUSTOM:
+		return 0;
 	default:
 		dev_err(pmf->dev, "Unknown Platform Profile.\n");
 		return -EOPNOTSUPP;
@@ -412,6 +414,8 @@  int amd_pmf_init_sps(struct amd_pmf_dev *dev)
 	set_bit(PLATFORM_PROFILE_LOW_POWER, dev->pprof.choices);
 	set_bit(PLATFORM_PROFILE_BALANCED, dev->pprof.choices);
 	set_bit(PLATFORM_PROFILE_PERFORMANCE, dev->pprof.choices);
+	if (pmf_manual_control)
+		set_bit(PLATFORM_PROFILE_CUSTOM, dev->pprof.choices);
 
 	/* Create platform_profile structure and register */
 	err = platform_profile_register(&dev->pprof);