diff mbox series

[1/4] ALSA: hda/realtek: fix mute/micmute LEDs for HP 855 G8

Message ID 20210519170357.58410-1-jeremy.szu@canonical.com
State Accepted
Commit 0e68c4b11f1e66d211ad242007e9f1076a6b7709
Headers show
Series [1/4] ALSA: hda/realtek: fix mute/micmute LEDs for HP 855 G8 | expand

Commit Message

Jeremy Szu May 19, 2021, 5:03 p.m. UTC
The HP EliteBook 855 G8 Notebook PC is using ALC285 codec which needs
ALC285_FIXUP_HP_MUTE_LED fixup to make it works. After applying the
fixup, the mute/micmute LEDs work good.

Signed-off-by: Jeremy Szu <jeremy.szu@canonical.com>
---
 sound/pci/hda/patch_realtek.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jeremy Szu May 27, 2021, 2 a.m. UTC | #1
Hi Takashi,

Would you please help to review these quirks? Many thanks.


On Thu, May 20, 2021 at 1:04 AM Jeremy Szu <jeremy.szu@canonical.com> wrote:
>
> The HP EliteBook 855 G8 Notebook PC is using ALC285 codec which needs
> ALC285_FIXUP_HP_MUTE_LED fixup to make it works. After applying the
> fixup, the mute/micmute LEDs work good.
>
> Signed-off-by: Jeremy Szu <jeremy.szu@canonical.com>
> ---
>  sound/pci/hda/patch_realtek.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
> index 552e2cb73291..9d68f591c6bf 100644
> --- a/sound/pci/hda/patch_realtek.c
> +++ b/sound/pci/hda/patch_realtek.c
> @@ -8291,6 +8291,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
>         SND_PCI_QUIRK(0x103c, 0x87f7, "HP Spectre x360 14", ALC245_FIXUP_HP_X360_AMP),
>         SND_PCI_QUIRK(0x103c, 0x8846, "HP EliteBook 850 G8 Notebook PC", ALC285_FIXUP_HP_GPIO_LED),
>         SND_PCI_QUIRK(0x103c, 0x884c, "HP EliteBook 840 G8 Notebook PC", ALC285_FIXUP_HP_GPIO_LED),
> +       SND_PCI_QUIRK(0x103c, 0x8896, "HP EliteBook 855 G8 Notebook PC", ALC285_FIXUP_HP_MUTE_LED),
>         SND_PCI_QUIRK(0x103c, 0x8898, "HP EliteBook 845 G8 Notebook PC", ALC285_FIXUP_HP_LIMIT_INT_MIC_BOOST),
>         SND_PCI_QUIRK(0x1043, 0x103e, "ASUS X540SA", ALC256_FIXUP_ASUS_MIC),
>         SND_PCI_QUIRK(0x1043, 0x103f, "ASUS TX300", ALC282_FIXUP_ASUS_TX300),
> --
> 2.31.1
>
Takashi Iwai May 27, 2021, 6:14 a.m. UTC | #2
On Thu, 27 May 2021 04:00:34 +0200,
Jeremy Szu wrote:
> 
> Hi Takashi,
> 
> Would you please help to review these quirks? Many thanks.

Sorry, it was overlooked.

Now applied all four patches.  Thanks.


Takashi

> 
> 
> On Thu, May 20, 2021 at 1:04 AM Jeremy Szu <jeremy.szu@canonical.com> wrote:
> >
> > The HP EliteBook 855 G8 Notebook PC is using ALC285 codec which needs
> > ALC285_FIXUP_HP_MUTE_LED fixup to make it works. After applying the
> > fixup, the mute/micmute LEDs work good.
> >
> > Signed-off-by: Jeremy Szu <jeremy.szu@canonical.com>
> > ---
> >  sound/pci/hda/patch_realtek.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
> > index 552e2cb73291..9d68f591c6bf 100644
> > --- a/sound/pci/hda/patch_realtek.c
> > +++ b/sound/pci/hda/patch_realtek.c
> > @@ -8291,6 +8291,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
> >         SND_PCI_QUIRK(0x103c, 0x87f7, "HP Spectre x360 14", ALC245_FIXUP_HP_X360_AMP),
> >         SND_PCI_QUIRK(0x103c, 0x8846, "HP EliteBook 850 G8 Notebook PC", ALC285_FIXUP_HP_GPIO_LED),
> >         SND_PCI_QUIRK(0x103c, 0x884c, "HP EliteBook 840 G8 Notebook PC", ALC285_FIXUP_HP_GPIO_LED),
> > +       SND_PCI_QUIRK(0x103c, 0x8896, "HP EliteBook 855 G8 Notebook PC", ALC285_FIXUP_HP_MUTE_LED),
> >         SND_PCI_QUIRK(0x103c, 0x8898, "HP EliteBook 845 G8 Notebook PC", ALC285_FIXUP_HP_LIMIT_INT_MIC_BOOST),
> >         SND_PCI_QUIRK(0x1043, 0x103e, "ASUS X540SA", ALC256_FIXUP_ASUS_MIC),
> >         SND_PCI_QUIRK(0x1043, 0x103f, "ASUS TX300", ALC282_FIXUP_ASUS_TX300),
> > --
> > 2.31.1
> >
> 
> 
> -- 
> Sincerely,
> Jeremy Su
>
Alexander Sergeyev Jan. 13, 2022, 9:19 p.m. UTC | #3
On Thu, Jan 13, 2022 at 09:31:43PM +0300, Alexander Sergeyev wrote:
>This is the only change in /proc/asound after the first arecord run. Overall, 
>seems like a small annoyance, but I'm curious -- is it how it's supposed to 
>work?

Just to clarify, this particular Digital control behavior is the same on live 
Ubuntu (which uses modules and supposedly works correctly).

Also, I've posted a patch for review. It adds the quirk for my PCI subdevice id 
which is not present in the current set of SND_PCI_QUIRK. The former micmute 
quirk was picked up by the codec SSID and not PCI id. The patch fixes the 
speakers problem for me (including the built-in drivers usecase).
Takashi Iwai Jan. 14, 2022, 4:37 p.m. UTC | #4
On Thu, 13 Jan 2022 19:31:41 +0100,
Alexander Sergeyev wrote:
> 
> On Thu, Jan 13, 2022 at 08:14:29AM +0100, Takashi Iwai wrote:
> >> First, about unbind and bind via sysfs -- attempts to unbind the
> >> HD-audio controller immediately trigger BUGs:
> >> Is it normal/expected?
> >
> >A sort of. The sysfs unbind is little tested and may be still buggy
> >if done during the stream operation.
> >
> >To be sure, could you check with my latest sound.git tree for-linus
> >branch?  There are a few fixes that harden the dynamic unbind.
> 
> I assume that the referred repository is the one at [1]. I've tried
> 081c73701ef0 "ALSA: hda: intel-dsp-config: reorder the config
> table". It crashed with nearly identical logs.

OK, then it's still something to do with the led cdev
unregisteration.

Could you try the patch below?

> >> 1) Coeff 0x0b is flapping between 0x8003 and 0x7770 and does not seem
> >> to have any effect in both non-working and working versions. Not sure
> >> about this, maybe microphone is not operational since I haven't
> >> checked it yet.
> 
> I got some time to poke the internal microphone. It works, but
> oddities are there as well. Initially I get "Mic Boost", "Capture" and
> "Internal Mic Boost" controls in alsamixer. When I run arecord,
> "Digital" control appears, but it cannot be changed until arecord is
> stopped. Subsequent arecord calls do not lock "Digital" control. This
> control affects sensitivity of the microphone and seems useful.

That's presumably an alsa-lib softvol stuff.  It's created
dynamically.  So not really a kernel problem.


Takashi

---
diff --git a/sound/pci/hda/hda_generic.c b/sound/pci/hda/hda_generic.c
index 3bf5e3410703..503cd979c908 100644
--- a/sound/pci/hda/hda_generic.c
+++ b/sound/pci/hda/hda_generic.c
@@ -84,13 +84,21 @@ static void free_kctls(struct hda_gen_spec *spec)
 	snd_array_free(&spec->kctls);
 }
 
-static void snd_hda_gen_spec_free(struct hda_gen_spec *spec)
+static void snd_hda_gen_spec_free(struct hda_codec *codec)
 {
+	struct hda_gen_spec *spec = codec->spec;
+
 	if (!spec)
 		return;
 	free_kctls(spec);
 	snd_array_free(&spec->paths);
 	snd_array_free(&spec->loopback_list);
+#ifdef CONFIG_SND_HDA_GENERIC_LEDS
+	if (spec->led_cdevs[LED_AUDIO_MUTE])
+		led_classdev_unregister(spec->led_cdevs[LED_AUDIO_MUTE]);
+	if (spec->led_cdevs[LED_AUDIO_MICMUTE])
+		led_classdev_unregister(spec->led_cdevs[LED_AUDIO_MICMUTE]);
+#endif
 }
 
 /*
@@ -3922,7 +3930,9 @@ static int create_mute_led_cdev(struct hda_codec *codec,
 						enum led_brightness),
 				bool micmute)
 {
+	struct hda_gen_spec *spec = codec->spec;
 	struct led_classdev *cdev;
+	int err;
 
 	cdev = devm_kzalloc(&codec->core.dev, sizeof(*cdev), GFP_KERNEL);
 	if (!cdev)
@@ -3935,7 +3945,11 @@ static int create_mute_led_cdev(struct hda_codec *codec,
 	cdev->brightness = ledtrig_audio_get(micmute ? LED_AUDIO_MICMUTE : LED_AUDIO_MUTE);
 	cdev->flags = LED_CORE_SUSPENDRESUME;
 
-	return devm_led_classdev_register(&codec->core.dev, cdev);
+	err = led_classdev_register(&codec->core.dev, cdev);
+	if (err < 0)
+		return err;
+	spec->led_cdevs[micmute ? LED_AUDIO_MICMUTE : LED_AUDIO_MUTE] = cdev;
+	return 0;
 }
 
 /**
@@ -5998,7 +6012,7 @@ EXPORT_SYMBOL_GPL(snd_hda_gen_init);
 void snd_hda_gen_free(struct hda_codec *codec)
 {
 	snd_hda_apply_fixup(codec, HDA_FIXUP_ACT_FREE);
-	snd_hda_gen_spec_free(codec->spec);
+	snd_hda_gen_spec_free(codec);
 	kfree(codec->spec);
 	codec->spec = NULL;
 }
diff --git a/sound/pci/hda/hda_generic.h b/sound/pci/hda/hda_generic.h
index 8e1bc8ea74fc..34eba40cc6e6 100644
--- a/sound/pci/hda/hda_generic.h
+++ b/sound/pci/hda/hda_generic.h
@@ -294,6 +294,9 @@ struct hda_gen_spec {
 				   struct hda_jack_callback *cb);
 	void (*mic_autoswitch_hook)(struct hda_codec *codec,
 				    struct hda_jack_callback *cb);
+
+	/* leds */
+	struct led_classdev *led_cdevs[NUM_AUDIO_LEDS];
 };
 
 /* values for add_stereo_mix_input flag */
Alexander Sergeyev Jan. 15, 2022, 3:22 p.m. UTC | #5
On Sat, Jan 15, 2022 at 08:55:28AM +0100, Takashi Iwai wrote:
> > This patch solved the BUG problem. But after unbind/bind micmute LED
> > stopped working. Speakers and mute LED are fine though.
> Does the corresponding sysfs entry exist in /sys/class/leds/*?

Yes.

> And can you control LED over there?

After "out of range cmd" messages the sysfs entry remains present, but writing 
ones to the brightness file stops to produce any effect.

Actually, the timing issues are present here as well. Sometimes unbind & bind 
works fine. But fails on the second round.

> This seems to be a bogus COEF.  But I have no idea from where this
> comes.  The values look completely wrong.
> I guess you'd need to put some debug prints to trace down how those
> are triggered...

Okay, I'll try. It's going to take some time though.
Alexander Sergeyev Jan. 22, 2022, 7:05 p.m. UTC | #6
On Wed, Jan 19, 2022 at 12:32:51PM +0300, Alexander Sergeyev wrote:
> No, I mean "IO_PAGE_FAULT" and "out of range cmd" don't appear on every 
> unbind & bind. Sometimes it works cleanly.

Unbind & bind generally works without error messages on the first attempt. The 
second unbind & bind tends to generate io page faults like "AMD-Vi: Event 
logged [IO_PAGE_FAULT ...]", but might work fine as well.

In some cases "snd_hda_codec_realtek: out of range cmd" errors are triggered in 
addition to io page faults.

> This seems to be a bogus COEF. But I have no idea from where this
> comes. The values look completely wrong.

In such cases unexpected COEF values are coming from COEF reads during 
read/write pairs that implement update operations.

For example (these traces are from added printk statements):

alc_update_coef_led(codec, led {.idx=0x19, .mask=0x2000, .on=0x2000, .off=0x0}, polarity=0, on=1)
alc_update_coefex_idx(codec, nid, coef_idx=0x19, mask=0x2000, bits_set=0x2000)
alc_update_coef_led(codec, led {.idx=0xb, .mask=0x8, .on=0x8, .off=0x0}, polarity=0, on=1)
alc_update_coefex_idx(codec, nid, coef_idx=0xb, mask=0x8, bits_set=0x8)
snd_hdac_codec_write(hdac, nid, flags=0, verb=0x500, parm=0x19) = 0x0
snd_hdac_codec_write(hdac, nid, flags=0, verb=0x500, parm=0xb) = 0x0
snd_hdac_codec_read(hdac, nid, flags=0, verb=0xc00, parm=0x0) = 0x90170110
alc_update_coefex_idx: alc_read_coefex_idx(codec, nid, coef_idx=0xb) returned 0x90170110
alc_update_coefex_idx: calling alc_write_coefex_idx(codec, nid, coef_idx=0xb, coef_val=0x90170118)
snd_hdac_codec_read(hdac, nid, flags=0, verb=0xc00, parm=0x0) = 0x0
alc_update_coefex_idx: alc_read_coefex_idx(codec, nid, coef_idx=0x19) returned 0x0
alc_update_coefex_idx: calling alc_write_coefex_idx(codec, nid, coef_idx=0x19, coef_val=0x2000)
snd_hdac_codec_write(hdac, nid, flags=0, verb=0x500, parm=0xb) = 0x0
snd_hda_codec_realtek hdaudioC1D0: out of range cmd 0:20:400:90170118
snd_hdac_codec_write(hdac, nid, flags=0, verb=0x400, parm=0x90170118) = 0xffffffff

Then I've specifically added printk on alc_update_coefex_idx entry and exit:

[4.036211] alc_update_coefex_idx(codec, nid, coef_idx=0x10, mask=0x200, bits_set=0x0): entering
[4.036503] alc_update_coefex_idx(codec, nid, coef_idx=0x10, mask=0x200, bits_set=0x0): exiting
[4.036543] alc_update_coefex_idx(codec, nid, coef_idx=0xb, mask=0x8, bits_set=0x0): entering
[4.036546] alc_update_coefex_idx(codec, nid, coef_idx=0x19, mask=0x2000, bits_set=0x0): entering
[4.037139] alc_update_coefex_idx(codec, nid, coef_idx=0xb, mask=0x8, bits_set=0x0): exiting
[4.037609] alc_update_coefex_idx(codec, nid, coef_idx=0x19, mask=0x2000, bits_set=0x0): exiting

I'm not sure about kernel log buffering or maybe the device support for 
pipelining, but is it okay that alc_update_coefex_idx() seem to overlap?
Alexander Sergeyev Jan. 22, 2022, 8:56 p.m. UTC | #7
On Sat, Jan 22, 2022 at 10:05:24PM +0300, Alexander Sergeyev wrote:
> I'm not sure about kernel log buffering or maybe the device support for 
> pipelining, but is it okay that alc_update_coefex_idx() seem to overlap?

Given that the CPU number is the same in alc_update_coefex_idx(), it seems 
these calls execution is interrupted and interleaved on the same core.

And, actually, there are two LEDs to set (mute and micmute). Am I onto 
something here?

[4.043235] alc_update_coefex_idx(codec, nid, coef_idx=0xb, mask=0x8, bits_set=0x0): entering
[4.043237] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G                T 5.16.0-rc1-00001-g5c38c8c84e47-dirty #18
[4.043910] Hardware name: HP HP EliteBook 855 G8 Notebook PC/8895, BIOS T82 Ver. 01.06.03 09/17/2021
[4.044559] Workqueue: events set_brightness_delayed
[4.044559] Call Trace:
[4.044559]  <TASK>
[4.046289]  dump_stack_lvl+0x34/0x4c
[4.046289]  alc_update_coefex_idx+0x34/0x7a
[4.046289]  coef_mute_led_set+0x48/0x56
[4.046289]  set_brightness_delayed+0x6f/0xb0
[4.046289]  process_one_work+0x1e1/0x380
[4.046289]  worker_thread+0x4e/0x3b0
[4.046289]  ? rescuer_thread+0x370/0x370
[4.046289]  kthread+0x145/0x170
[4.046289]  ? set_kthread_struct+0x50/0x50
[4.046289]  ret_from_fork+0x22/0x30
[4.046289]  </TASK>
[4.052793] alc_update_coefex_idx(codec, nid, coef_idx=0x19, mask=0x2000, bits_set=0x0): entering
[4.052794] CPU: 0 PID: 171 Comm: kworker/0:2 Tainted: G                T 5.16.0-rc1-00001-g5c38c8c84e47-dirty #18
[4.053363] Hardware name: HP HP EliteBook 855 G8 Notebook PC/8895, BIOS T82 Ver. 01.06.03 09/17/2021
[4.053364] Workqueue: events set_brightness_delayed
[4.053366] Call Trace:
[4.053366]  <TASK>
[4.053367]  dump_stack_lvl+0x34/0x4c
[4.053790]  alc_update_coefex_idx+0x34/0x7a
[4.053790]  coef_micmute_led_set+0x48/0x56
[4.053790]  set_brightness_delayed+0x6f/0xb0
[4.053790]  process_one_work+0x1e1/0x380
[4.053790]  worker_thread+0x4e/0x3b0
[4.053790]  ? rescuer_thread+0x370/0x370
[4.053790]  kthread+0x145/0x170
[4.053790]  ? set_kthread_struct+0x50/0x50
[4.053790]  ret_from_fork+0x22/0x30
[4.053790]  </TASK>
Takashi Iwai Jan. 26, 2022, 3:24 p.m. UTC | #8
On Sat, 22 Jan 2022 21:56:37 +0100,
Alexander Sergeyev wrote:
> 
> On Sat, Jan 22, 2022 at 10:05:24PM +0300, Alexander Sergeyev wrote:
> > I'm not sure about kernel log buffering or maybe the device support for 
> > pipelining, but is it okay that alc_update_coefex_idx() seem to overlap?
> 
> Given that the CPU number is the same in alc_update_coefex_idx(), it seems 
> these calls execution is interrupted and interleaved on the same core.
> 
> And, actually, there are two LEDs to set (mute and micmute). Am I onto 
> something here?

That's an interesting finding, and yes, such a race is quite
possible.  Below is a quick fix as an attempt to cover it.
Could you give it a try?

BTW, the fix for the unbind problem was submitted.  It's a slightly
more simplified version than what I posted here beforehand.


thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ALSA: hda: realtek: Fix race at concurrent COEF updates

The COEF access is done with two steps: setting the index then read or
write the data.  When multiple COEF accesses are performed
concurrently, the index and data might be paired unexpectedly.
In most cases, this isn't a big problem as the COEF setup is done at
the initialization, but some dynamic changes like the mute LED may hit
such a race.

For avoiding the racy COEF accesses, this patch introduces a new
mutex coef_mutex to alc_spec, and wrap the COEF accessing functions
with it.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/patch_realtek.c | 61 ++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 11 deletions(-)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 668274e52674..a5677be0a405 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -98,6 +98,7 @@ struct alc_spec {
 	unsigned int gpio_mic_led_mask;
 	struct alc_coef_led mute_led_coef;
 	struct alc_coef_led mic_led_coef;
+	struct mutex coef_mutex;
 
 	hda_nid_t headset_mic_pin;
 	hda_nid_t headphone_mic_pin;
@@ -137,8 +138,8 @@ struct alc_spec {
  * COEF access helper functions
  */
 
-static int alc_read_coefex_idx(struct hda_codec *codec, hda_nid_t nid,
-			       unsigned int coef_idx)
+static int __alc_read_coefex_idx(struct hda_codec *codec, hda_nid_t nid,
+				 unsigned int coef_idx)
 {
 	unsigned int val;
 
@@ -147,28 +148,61 @@ static int alc_read_coefex_idx(struct hda_codec *codec, hda_nid_t nid,
 	return val;
 }
 
+static int alc_read_coefex_idx(struct hda_codec *codec, hda_nid_t nid,
+			       unsigned int coef_idx)
+{
+	struct alc_spec *spec = codec->spec;
+	unsigned int val;
+
+	mutex_lock(&spec->coef_mutex);
+	val = __alc_read_coefex_idx(codec, nid, coef_idx);
+	mutex_unlock(&spec->coef_mutex);
+	return val;
+}
+
 #define alc_read_coef_idx(codec, coef_idx) \
 	alc_read_coefex_idx(codec, 0x20, coef_idx)
 
-static void alc_write_coefex_idx(struct hda_codec *codec, hda_nid_t nid,
-				 unsigned int coef_idx, unsigned int coef_val)
+static void __alc_write_coefex_idx(struct hda_codec *codec, hda_nid_t nid,
+				   unsigned int coef_idx, unsigned int coef_val)
 {
 	snd_hda_codec_write(codec, nid, 0, AC_VERB_SET_COEF_INDEX, coef_idx);
 	snd_hda_codec_write(codec, nid, 0, AC_VERB_SET_PROC_COEF, coef_val);
 }
 
+static void alc_write_coefex_idx(struct hda_codec *codec, hda_nid_t nid,
+				 unsigned int coef_idx, unsigned int coef_val)
+{
+	struct alc_spec *spec = codec->spec;
+
+	mutex_lock(&spec->coef_mutex);
+	__alc_write_coefex_idx(codec, nid, coef_idx, coef_val);
+	mutex_unlock(&spec->coef_mutex);
+}
+
 #define alc_write_coef_idx(codec, coef_idx, coef_val) \
 	alc_write_coefex_idx(codec, 0x20, coef_idx, coef_val)
 
+static void __alc_update_coefex_idx(struct hda_codec *codec, hda_nid_t nid,
+				    unsigned int coef_idx, unsigned int mask,
+				    unsigned int bits_set)
+{
+	unsigned int val = __alc_read_coefex_idx(codec, nid, coef_idx);
+
+	if (val != -1)
+		__alc_write_coefex_idx(codec, nid, coef_idx,
+				       (val & ~mask) | bits_set);
+}
+
 static void alc_update_coefex_idx(struct hda_codec *codec, hda_nid_t nid,
 				  unsigned int coef_idx, unsigned int mask,
 				  unsigned int bits_set)
 {
-	unsigned int val = alc_read_coefex_idx(codec, nid, coef_idx);
+	struct alc_spec *spec = codec->spec;
 
-	if (val != -1)
-		alc_write_coefex_idx(codec, nid, coef_idx,
-				     (val & ~mask) | bits_set);
+	mutex_lock(&spec->coef_mutex);
+	__alc_update_coefex_idx(codec, nid, coef_idx, mask, bits_set);
+	mutex_unlock(&spec->coef_mutex);
 }
 
 #define alc_update_coef_idx(codec, coef_idx, mask, bits_set)	\
@@ -201,13 +235,17 @@ struct coef_fw {
 static void alc_process_coef_fw(struct hda_codec *codec,
 				const struct coef_fw *fw)
 {
+	struct alc_spec *spec = codec->spec;
+
+	mutex_lock(&spec->coef_mutex);
 	for (; fw->nid; fw++) {
 		if (fw->mask == (unsigned short)-1)
-			alc_write_coefex_idx(codec, fw->nid, fw->idx, fw->val);
+			__alc_write_coefex_idx(codec, fw->nid, fw->idx, fw->val);
 		else
-			alc_update_coefex_idx(codec, fw->nid, fw->idx,
-					      fw->mask, fw->val);
+			__alc_update_coefex_idx(codec, fw->nid, fw->idx,
+						fw->mask, fw->val);
 	}
+	mutex_unlock(&spec->coef_mutex);
 }
 
 /*
@@ -1153,6 +1191,7 @@ static int alc_alloc_spec(struct hda_codec *codec, hda_nid_t mixer_nid)
 	codec->spdif_status_reset = 1;
 	codec->forced_resume = 1;
 	codec->patch_ops = alc_patch_ops;
+	mutex_init(&spec->coef_mutex);
 
 	err = alc_codec_rename_from_preset(codec);
 	if (err < 0) {
Alexander Sergeyev Jan. 29, 2022, 2:47 p.m. UTC | #9
On Wed, Jan 26, 2022 at 04:24:36PM +0100, Takashi Iwai wrote:
> > Given that the CPU number is the same in alc_update_coefex_idx(), it seems 
> > these calls execution is interrupted and interleaved on the same core.
> > And, actually, there are two LEDs to set (mute and micmute). Am I onto 
> > something here?
> That's an interesting finding, and yes, such a race is quite
> possible. Below is a quick fix as an attempt to cover it.
> Could you give it a try?

Well, results are somewhat mixed.

With the supplied patch (with a mutex), the original fixup 91502a9a0b0d ("ALSA: 
hda/realtek: fix speakers and micmute on HP 855 G8") is no longer needed for 
speakers to work. So, the original timing issue is identified now.

But unbind-bind problems with IO_PAGE_FAULT and "out of range cmd" are not 
eliminated. IO_PAGE_FAULT are often logged without accompanying "out of range 
cmd". And after adding debugging printk() I haven't managed to trigger "out of 
range cmd" yet. But IO_PAGE_FAULT are more easily triggered.

Are there ways to trace origins of IO_PAGE_FAULT itself?
Alexander Sergeyev Jan. 30, 2022, 11:10 a.m. UTC | #10
On Sat, Jan 29, 2022 at 05:47:07PM +0300, Alexander Sergeyev wrote:
> But unbind-bind problems with IO_PAGE_FAULT and "out of range cmd" are not 
> eliminated. IO_PAGE_FAULT are often logged without accompanying "out of range 
> cmd". And after adding debugging printk() I haven't managed to trigger "out 
> of range cmd" yet. But IO_PAGE_FAULT are more easily triggered.

IO_PAGE_FAULTs go away with CONFIG_IOMMU_DEFAULT_PASSTHROUGH enabled. As I 
understand, this leads to reduced DMA device isolation which is generally not 
desirable. I was initially thinking about races between some delayed code and 
io-memory pages unmapping, but first IO_PAGE_FAULTs (running non-passthrough 
iommu) happen during bind operations as well.

What is also interesting, unbind & bind consistently fails on 31th bind:

echo -n '0000:05:00.6' > /sys/bus/pci/drivers/snd_hda_intel/bind
-bash: echo: write error: No such device

And does not recover from there until a reboot.
Takashi Iwai Jan. 31, 2022, 2:57 p.m. UTC | #11
On Sun, 30 Jan 2022 12:10:20 +0100,
Alexander Sergeyev wrote:
> 
> On Sat, Jan 29, 2022 at 05:47:07PM +0300, Alexander Sergeyev wrote:
> > But unbind-bind problems with IO_PAGE_FAULT and "out of range cmd" are not 
> > eliminated. IO_PAGE_FAULT are often logged without accompanying "out of range 
> > cmd". And after adding debugging printk() I haven't managed to trigger "out 
> > of range cmd" yet. But IO_PAGE_FAULT are more easily triggered.
> 
> IO_PAGE_FAULTs go away with CONFIG_IOMMU_DEFAULT_PASSTHROUGH enabled. As I 
> understand, this leads to reduced DMA device isolation which is generally not 
> desirable. I was initially thinking about races between some delayed code and 
> io-memory pages unmapping, but first IO_PAGE_FAULTs (running non-passthrough 
> iommu) happen during bind operations as well.

That's logical, as IOMMU is bypassed for DMA with that option.

I still don't get what really triggers it.  This won't happen when you
build modules and load/unload the driver instead of dynamic binding?

And the out-of-range access is puzzling, too.  I guess this comes from
the update of a COEF, i.e. it reads a strange value and tries to
update the value based on it.  The problem is that it's no -1 but some
random value.  This might be tied with the IOMMU error, and it might
reading unexpected address, which would be really bad.

In anyway, we need to track down exactly which access triggers those
errors...

> What is also interesting, unbind & bind consistently fails on 31th bind:
> 
> echo -n '0000:05:00.6' > /sys/bus/pci/drivers/snd_hda_intel/bind
> -bash: echo: write error: No such device
> 
> And does not recover from there until a reboot.

This is intended behavior.  The driver has a static device index that
is incremented at each probe, so that the driver may probe multiple
instances.  It'll be tricky to reset this for dynamic binding,
though.  This problem is not only for HD-audio but for most of other
drivers.  But I leave this as is for now, since the dynamic binding is
rarely used for PCI and other buses, so far.


Takashi
Alexander Sergeyev Feb. 5, 2022, 3 p.m. UTC | #12
On Mon, Jan 31, 2022 at 03:57:04PM +0100, Takashi Iwai wrote:
> > IO_PAGE_FAULTs go away with CONFIG_IOMMU_DEFAULT_PASSTHROUGH enabled. As I 
> > understand, this leads to reduced DMA device isolation which is generally 
> > not desirable. I was initially thinking about races between some delayed 
> > code and io-memory pages unmapping, but first IO_PAGE_FAULTs (running 
> > non-passthrough iommu) happen during bind operations as well.

> I still don't get what really triggers it.  This won't happen when you
> build modules and load/unload the driver instead of dynamic binding?

I've built snd_hda_intel as a module, everything else is left built-in.
The initial modprobe and rmmod were clean, but the subsequent modprobe gave 
IO_PAGE_FAULT.

# modprobe snd-hda-intel
snd_hda_intel 0000:05:00.1: bound 0000:05:00.0 (ops amdgpu_dm_audio_component_bind_ops)
input: HD-Audio Generic HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:08.1/0000:05:00.1/sound/card0/input24
input: HD-Audio Generic HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:08.1/0000:05:00.1/sound/card0/input25
input: HD-Audio Generic HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:08.1/0000:05:00.1/sound/card0/input26
input: HD-Audio Generic HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:08.1/0000:05:00.1/sound/card0/input27
snd_hda_codec_realtek hdaudioC1D0: autoconfig for ALC285: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:speaker
snd_hda_codec_realtek hdaudioC1D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
snd_hda_codec_realtek hdaudioC1D0:    hp_outs=1 (0x21/0x0/0x0/0x0/0x0)
snd_hda_codec_realtek hdaudioC1D0:    mono: mono_out=0x0
snd_hda_codec_realtek hdaudioC1D0:    inputs:
snd_hda_codec_realtek hdaudioC1D0:      Mic=0x19
snd_hda_codec_realtek hdaudioC1D0:      Internal Mic=0x12
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff800 flags=0x0020]
Alexander Sergeyev Feb. 5, 2022, 5:51 p.m. UTC | #13
On Mon, Jan 31, 2022 at 03:57:04PM +0100, Takashi Iwai wrote:
> In anyway, we need to track down exactly which access triggers those 
> errors...

I went deeper into codec reads and writes:
- snd_hda_codec_write
- snd_hdac_codec_write
- codec_write
- snd_hdac_exec_verb
- codec_exec_verb
- snd_hdac_bus_exec_verb_unlocked
- azx_send_cmd / azx_get_response
- snd_hdac_bus_send_cmd / azx_rirb_get_response

In the last functions a circular buffer is used to write commands. The 
problem is that "bus->corb.buf[wp]" and "bus->rirb.res[addr]" are 
nowhere close to the IOMMU-reported address of the offending memory 
access. It's likely that I've missed other communication channels. But 
is it possible that IOMMU-reported address and buffers addresses are of 
different kinds (physical/virtual) or different regions mapped to the 
same physical pages?

Example:
snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x3b8000, wp=0xfb, &buf[wp]=00000000f1fd4592
snd_hdac_bus_get_response: reading result from 0000000059c4003d
snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x339000, wp=0xfc, &buf[wp]=000000007f14c128
snd_hdac_bus_get_response: reading result from 0000000059c4003d
snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x1470740, wp=0xfd, &buf[wp]=00000000a6b14901
snd_hdac_bus_get_response: reading result from 0000000059c4003d
snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x14ba000, wp=0xfe, &buf[wp]=00000000d8d1672a
snd_hdac_bus_get_response: reading result from 0000000059c4003d
snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x14b8000, wp=0xff, &buf[wp]=00000000b87b3287
snd_hdac_bus_get_response: reading result from 0000000059c4003d
snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x2ba000, wp=0x0, &buf[wp]=000000002162c728
snd_hdac_bus_get_response: reading result from 0000000059c4003d
snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x2b8000, wp=0x1, &buf[wp]=0000000095f61061
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff800 flags=0x0020]
Takashi Iwai Feb. 7, 2022, 2:21 p.m. UTC | #14
On Sat, 05 Feb 2022 18:51:32 +0100,
Alexander Sergeyev wrote:
> 
> On Mon, Jan 31, 2022 at 03:57:04PM +0100, Takashi Iwai wrote:
> > In anyway, we need to track down exactly which access triggers those 
> > errors...
> 
> I went deeper into codec reads and writes:
> - snd_hda_codec_write
> - snd_hdac_codec_write
> - codec_write
> - snd_hdac_exec_verb
> - codec_exec_verb
> - snd_hdac_bus_exec_verb_unlocked
> - azx_send_cmd / azx_get_response
> - snd_hdac_bus_send_cmd / azx_rirb_get_response
> 
> In the last functions a circular buffer is used to write commands. The 
> problem is that "bus->corb.buf[wp]" and "bus->rirb.res[addr]" are 
> nowhere close to the IOMMU-reported address of the offending memory 
> access. It's likely that I've missed other communication channels. But 
> is it possible that IOMMU-reported address and buffers addresses are of 
> different kinds (physical/virtual) or different regions mapped to the 
> same physical pages?
> 
> Example:
> snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x3b8000, wp=0xfb, &buf[wp]=00000000f1fd4592
> snd_hdac_bus_get_response: reading result from 0000000059c4003d
> snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x339000, wp=0xfc, &buf[wp]=000000007f14c128
> snd_hdac_bus_get_response: reading result from 0000000059c4003d
> snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x1470740, wp=0xfd, &buf[wp]=00000000a6b14901
> snd_hdac_bus_get_response: reading result from 0000000059c4003d
> snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x14ba000, wp=0xfe, &buf[wp]=00000000d8d1672a
> snd_hdac_bus_get_response: reading result from 0000000059c4003d
> snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x14b8000, wp=0xff, &buf[wp]=00000000b87b3287
> snd_hdac_bus_get_response: reading result from 0000000059c4003d
> snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x2ba000, wp=0x0, &buf[wp]=000000002162c728
> snd_hdac_bus_get_response: reading result from 0000000059c4003d
> snd_hdac_bus_send_cmd: bus->corb.buf[wp] = cpu_to_le32(val) // = 0x2b8000, wp=0x1, &buf[wp]=0000000095f61061
> snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff800 flags=0x0020]

Hm, I'm not sure, either.  But let's try to avoid some possible
confusion at first, e.g. a patch like below.


Takashi

-- 8< --
diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
index f7bd6e2db085..074199aa73ea 100644
--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -618,7 +618,7 @@ int snd_hdac_bus_alloc_stream_pages(struct hdac_bus *bus)
 	if (WARN_ON(!num_streams))
 		return -EINVAL;
 	/* allocate memory for the position buffer */
-	err = snd_dma_alloc_pages(dma_type, bus->dev,
+	err = snd_dma_alloc_pages(SNDRV_DMA_TYPE_DEV, bus->dev,
 				  num_streams * 8, &bus->posbuf);
 	if (err < 0)
 		return -ENOMEM;
@@ -626,7 +626,7 @@ int snd_hdac_bus_alloc_stream_pages(struct hdac_bus *bus)
 		s->posbuf = (__le32 *)(bus->posbuf.area + s->index * 8);
 
 	/* single page (at least 4096 bytes) must suffice for both ringbuffes */
-	return snd_dma_alloc_pages(dma_type, bus->dev, PAGE_SIZE, &bus->rb);
+	return snd_dma_alloc_pages(SNDRV_DMA_TYPE_DEV, bus->dev, PAGE_SIZE, &bus->rb);
 }
 EXPORT_SYMBOL_GPL(snd_hdac_bus_alloc_stream_pages);
Takashi Iwai Feb. 8, 2022, 2:36 p.m. UTC | #15
On Mon, 31 Jan 2022 15:57:04 +0100,
Takashi Iwai wrote:
> 
> On Sun, 30 Jan 2022 12:10:20 +0100,
> Alexander Sergeyev wrote:
> > What is also interesting, unbind & bind consistently fails on 31th bind:
> > 
> > echo -n '0000:05:00.6' > /sys/bus/pci/drivers/snd_hda_intel/bind
> > -bash: echo: write error: No such device
> > 
> > And does not recover from there until a reboot.
> 
> This is intended behavior.  The driver has a static device index that
> is incremented at each probe, so that the driver may probe multiple
> instances.  It'll be tricky to reset this for dynamic binding,
> though.  This problem is not only for HD-audio but for most of other
> drivers.  But I leave this as is for now, since the dynamic binding is
> rarely used for PCI and other buses, so far.

... and here is a fix patch for allowing more rebinds.
Give it a try.


Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ALSA: hda: Fix driver index handling at re-binding

HD-audio driver handles the multiple instances and keeps the static
index that is incremented at each probe.  This becomes a problem when
user tries to re-bind the device via sysfs multiple times; as the
device index isn't cleared unlike rmmod case, it points to the next
element at re-binding, and eventually later you can't probe any more
when it reaches to SNDRV_CARDS_MAX (usually 32).

This patch is an attempt to improve the handling at rebinding.
Instead of a static device index, now we keep a bitmap and assigns to
the first zero bit position.  At the driver remove, in return, the
bitmap slot is cleared again, so that it'll be available for the next
probe.

Reported-by: Alexander Sergeyev <sergeev917@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_intel.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4b0338c4c543..a2922233e85f 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2064,14 +2064,16 @@ static const struct hda_controller_ops pci_hda_ops = {
 	.position_check = azx_position_check,
 };
 
+static DECLARE_BITMAP(probed_devs, SNDRV_CARDS);
+
 static int azx_probe(struct pci_dev *pci,
 		     const struct pci_device_id *pci_id)
 {
-	static int dev;
 	struct snd_card *card;
 	struct hda_intel *hda;
 	struct azx *chip;
 	bool schedule_probe;
+	int dev;
 	int err;
 
 	if (pci_match_id(driver_denylist, pci)) {
@@ -2079,10 +2081,11 @@ static int azx_probe(struct pci_dev *pci,
 		return -ENODEV;
 	}
 
+	dev = find_first_zero_bit(probed_devs, SNDRV_CARDS);
 	if (dev >= SNDRV_CARDS)
 		return -ENODEV;
 	if (!enable[dev]) {
-		dev++;
+		set_bit(dev, probed_devs);
 		return -ENOENT;
 	}
 
@@ -2149,7 +2152,7 @@ static int azx_probe(struct pci_dev *pci,
 	if (schedule_probe)
 		schedule_delayed_work(&hda->probe_work, 0);
 
-	dev++;
+	set_bit(dev, probed_devs);
 	if (chip->disabled)
 		complete_all(&hda->probe_wait);
 	return 0;
@@ -2372,6 +2375,7 @@ static void azx_remove(struct pci_dev *pci)
 		cancel_delayed_work_sync(&hda->probe_work);
 		device_lock(&pci->dev);
 
+		clear_bit(chip->dev_index, probed_devs);
 		pci_set_drvdata(pci, NULL);
 		snd_card_free(card);
 	}
Alexander Sergeyev Feb. 8, 2022, 7:49 p.m. UTC | #16
On Mon, Feb 07, 2022 at 03:21:58PM +0100, Takashi Iwai wrote:
> > In the last functions a circular buffer is used to write commands. The 
> > problem is that "bus->corb.buf[wp]" and "bus->rirb.res[addr]" are nowhere 
> > close to the IOMMU-reported address of the offending memory access. It's 
> > likely that I've missed other communication channels. But is it possible 
> > that IOMMU-reported address and buffers addresses are of different kinds 
> > (physical/virtual) or different regions mapped to the same physical pages?
> Hm, I'm not sure, either.  But let's try to avoid some possible
> confusion at first, e.g. a patch like below.

No changes with the patch applied. Also, I added logs for dma_type used:

snd_hdac_bus_alloc_stream_pages: dma_type = 2
snd_hdac_bus_alloc_stream_pages: dma_type = 2
snd_hdac_bus_alloc_stream_pages: dma_type = 2

Which matches SNDRV_DMA_TYPE_DEV, so the same behavior is expected.

I've noticed that the IO_PAGE_FAULT regularly comes shortly after the write 
position overflows and restarts from 0, while after the driver binding the wp 
starts from 1 and not 0. Correlation does not mean causation, through. A 
similar overflow happens during the initial kernel bootup with no error 
messages. An another way of looking on it -- the fault comes on wp=0x1, which 
corresponds to the first re-used address in the buffer.

bus->corb.buf[wp] = cpu_to_le32(val) // = 0x14ba000, wp=0xfe, &buf[wp]=000000005b92167d
snd_hdac_bus_get_response: reading result from 0000000096c36d67
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x14b8000, wp=0xff, &buf[wp]=00000000a91a3679
snd_hdac_bus_get_response: reading result from 0000000096c36d67
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x2ba000, wp=0x0, &buf[wp]=000000002fda9222
snd_hdac_bus_get_response: reading result from 0000000096c36d67
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x2b8000, wp=0x1, &buf[wp]=000000009747a629
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff800 flags=0x0020]
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff800 flags=0x0020]

And I finally got "out of range cmd", but logging is limited to IO addresses.

bus->corb.buf[wp] = cpu_to_le32(val) // = 0x14ba000, wp=0xfe, &buf[wp]=0000000036a02eae
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x14b8000, wp=0xff, &buf[wp]=00000000ce140303
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x2ba000, wp=0x0, &buf[wp]=000000004c6aa283
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x2b8000, wp=0x1, &buf[wp]=000000002a825cc8
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff800 flags=0x0020]
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff800 flags=0x0020]
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x239000, wp=0x2, &buf[wp]=0000000078eca2cf
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff800 flags=0x0020]
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x1970724, wp=0x3, &buf[wp]=00000000613071da
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x1270720, wp=0x4, &buf[wp]=000000006db33d93
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff800 flags=0x0020]
snd_hdac_bus_get_response: reading result from 0000000037aa0724
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff820 flags=0x0020]
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x8b2000, wp=0x5, &buf[wp]=000000002a3c7e90
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff820 flags=0x0020]
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x836080, wp=0x6, &buf[wp]=00000000571d53bf
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x8b0000, wp=0x7, &buf[wp]=000000000a52a2af
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff820 flags=0x0020]
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff820 flags=0x0020]
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x835080, wp=0x8, &buf[wp]=00000000f139c302
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff840 flags=0x0020]
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x23f000d, wp=0x9, &buf[wp]=000000003c565021
snd_hda_intel 0000:05:00.6: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0x1fffff840 flags=0x0020]
snd_hdac_bus_get_response: reading result from 0000000037aa0724
...
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x205000b, wp=0x3e, &buf[wp]=000000002ce016ac
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x20c0000, wp=0x3f, &buf[wp]=000000003ad48d6f
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x8350a7, wp=0x40, &buf[wp]=0000000098c2fb2d
snd_hdac_bus_get_response: reading result from 0000000037aa0724
bus->corb.buf[wp] = cpu_to_le32(val) // = 0x205000b, wp=0x41, &buf[wp]=000000006e281f5b
snd_hdac_bus_get_response: reading result from 0000000037aa0724
snd_hda_codec_realtek hdaudioC1D0: out of range cmd 0:20:400:40600001
Alexander Sergeyev Feb. 8, 2022, 7:52 p.m. UTC | #17
On Tue, Feb 08, 2022 at 03:36:08PM +0100, Takashi Iwai wrote:
> ... and here is a fix patch for allowing more rebinds.
> Give it a try.

It works, no problems with large numbers of rebinds.

> -- 8< --
> From: Takashi Iwai <tiwai@suse.de>
> Subject: [PATCH] ALSA: hda: Fix driver index handling at re-binding
> 
> HD-audio driver handles the multiple instances and keeps the static
> index that is incremented at each probe.  This becomes a problem when
> user tries to re-bind the device via sysfs multiple times; as the
> device index isn't cleared unlike rmmod case, it points to the next
> element at re-binding, and eventually later you can't probe any more
> when it reaches to SNDRV_CARDS_MAX (usually 32).
> 
> This patch is an attempt to improve the handling at rebinding.
> Instead of a static device index, now we keep a bitmap and assigns to
> the first zero bit position.  At the driver remove, in return, the
> bitmap slot is cleared again, so that it'll be available for the next
> probe.
> 
> Reported-by: Alexander Sergeyev <sergeev917@gmail.com>
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  sound/pci/hda/hda_intel.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> index 4b0338c4c543..a2922233e85f 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -2064,14 +2064,16 @@ static const struct hda_controller_ops pci_hda_ops = {
>  	.position_check = azx_position_check,
>  };
>  
> +static DECLARE_BITMAP(probed_devs, SNDRV_CARDS);
> +
>  static int azx_probe(struct pci_dev *pci,
>  		     const struct pci_device_id *pci_id)
>  {
> -	static int dev;
>  	struct snd_card *card;
>  	struct hda_intel *hda;
>  	struct azx *chip;
>  	bool schedule_probe;
> +	int dev;
>  	int err;
>  
>  	if (pci_match_id(driver_denylist, pci)) {
> @@ -2079,10 +2081,11 @@ static int azx_probe(struct pci_dev *pci,
>  		return -ENODEV;
>  	}
>  
> +	dev = find_first_zero_bit(probed_devs, SNDRV_CARDS);
>  	if (dev >= SNDRV_CARDS)
>  		return -ENODEV;
>  	if (!enable[dev]) {
> -		dev++;
> +		set_bit(dev, probed_devs);
>  		return -ENOENT;
>  	}
>  
> @@ -2149,7 +2152,7 @@ static int azx_probe(struct pci_dev *pci,
>  	if (schedule_probe)
>  		schedule_delayed_work(&hda->probe_work, 0);
>  
> -	dev++;
> +	set_bit(dev, probed_devs);
>  	if (chip->disabled)
>  		complete_all(&hda->probe_wait);
>  	return 0;
> @@ -2372,6 +2375,7 @@ static void azx_remove(struct pci_dev *pci)
>  		cancel_delayed_work_sync(&hda->probe_work);
>  		device_lock(&pci->dev);
>  
> +		clear_bit(chip->dev_index, probed_devs);
>  		pci_set_drvdata(pci, NULL);
>  		snd_card_free(card);
>  	}
> -- 
> 2.34.1
>
diff mbox series

Patch

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 552e2cb73291..9d68f591c6bf 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -8291,6 +8291,7 @@  static const struct snd_pci_quirk alc269_fixup_tbl[] = {
 	SND_PCI_QUIRK(0x103c, 0x87f7, "HP Spectre x360 14", ALC245_FIXUP_HP_X360_AMP),
 	SND_PCI_QUIRK(0x103c, 0x8846, "HP EliteBook 850 G8 Notebook PC", ALC285_FIXUP_HP_GPIO_LED),
 	SND_PCI_QUIRK(0x103c, 0x884c, "HP EliteBook 840 G8 Notebook PC", ALC285_FIXUP_HP_GPIO_LED),
+	SND_PCI_QUIRK(0x103c, 0x8896, "HP EliteBook 855 G8 Notebook PC", ALC285_FIXUP_HP_MUTE_LED),
 	SND_PCI_QUIRK(0x103c, 0x8898, "HP EliteBook 845 G8 Notebook PC", ALC285_FIXUP_HP_LIMIT_INT_MIC_BOOST),
 	SND_PCI_QUIRK(0x1043, 0x103e, "ASUS X540SA", ALC256_FIXUP_ASUS_MIC),
 	SND_PCI_QUIRK(0x1043, 0x103f, "ASUS TX300", ALC282_FIXUP_ASUS_TX300),