usb: typec: tcpci: fix NULL pointer issue on shared irq case

Message ID 20241211105953.1205343-1-xu.yang_2@nxp.com
State Superseded
Series usb: typec: tcpci: fix NULL pointer issue on shared irq case

Commit Message

Xu Yang Dec. 11, 2024, 10:59 a.m. UTC
tcpci_irq() may hit the following NULL pointer dereference:

[    2.641851] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
[    2.641951] status 0x1, 0x37f
[    2.650659] Mem abort info:
[    2.656490]   ESR = 0x0000000096000004
[    2.660230]   EC = 0x25: DABT (current EL), IL = 32 bits
[    2.665532]   SET = 0, FnV = 0
[    2.668579]   EA = 0, S1PTW = 0
[    2.671715]   FSC = 0x04: level 0 translation fault
[    2.676584] Data abort info:
[    2.679459]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[    2.684936]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    2.689980]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    2.695284] [0000000000000010] user address but active_mm is swapper
[    2.701632] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[    2.707883] Modules linked in:
[    2.710936] CPU: 1 UID: 0 PID: 87 Comm: irq/111-2-0051 Not tainted 6.12.0-rc6-06316-g7f63786ad3d1-dirty #4
[    2.720570] Hardware name: NXP i.MX93 11X11 EVK board (DT)
[    2.726040] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    2.732989] pc : tcpci_irq+0x38/0x318
[    2.736647] lr : _tcpci_irq+0x14/0x20
[    2.740295] sp : ffff80008324bd30
[    2.743597] x29: ffff80008324bd70 x28: ffff800080107894 x27: ffff800082198f70
[    2.750721] x26: ffff0000050e6680 x25: ffff000004d172ac x24: ffff0000050f0000
[    2.757845] x23: ffff000004d17200 x22: 0000000000000001 x21: ffff0000050f0000
[    2.764969] x20: ffff000004d17200 x19: 0000000000000000 x18: 0000000000000001
[    2.772093] x17: 0000000000000000 x16: ffff80008183d8a0 x15: ffff00007fbab040
[    2.779217] x14: ffff00007fb918c0 x13: 0000000000000000 x12: 000000000000017a
[    2.786341] x11: 0000000000000001 x10: 0000000000000a90 x9 : ffff80008324bd00
[    2.793465] x8 : ffff0000050f0af0 x7 : ffff00007fbaa840 x6 : 0000000000000031
[    2.800589] x5 : 000000000000017a x4 : 0000000000000002 x3 : 0000000000000002
[    2.807713] x2 : ffff80008324bd3a x1 : 0000000000000010 x0 : 0000000000000000
[    2.814838] Call trace:
[    2.817273]  tcpci_irq+0x38/0x318
[    2.820583]  _tcpci_irq+0x14/0x20
[    2.823885]  irq_thread_fn+0x2c/0xa8
[    2.827456]  irq_thread+0x16c/0x2f4
[    2.830940]  kthread+0x110/0x114
[    2.834164]  ret_from_fork+0x10/0x20
[    2.837738] Code: f9426420 f9001fe0 d2800000 52800201 (f9400a60)

This can happen when the irq is shared, for example when two Type-C ports
share one irq. Once the first port has finished tcpci_register_port(), it
may trigger an interrupt. If that interrupt arrives just after the second
port has finished devm_request_threaded_irq(), the second port's interrupt
handler may run before chip->tcpci has been assigned from
tcpci_register_port(), and the NULL pointer dereference above occurs.

  devm_request_threaded_irq()
				<-- port1 irq comes
  disable_irq(client->irq);
  tcpci_register_port()
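
The ordering problem above can be tried out in a small userspace model.
This is only a sketch, not driver code: struct model_shared_irq and the
model_*() helpers are hypothetical stand-ins for the genirq core, and the
"handler" counts a would-be NULL dereference instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these types imitate the driver's, they are not the kernel's. */
struct tcpci { int events_handled; };
struct tcpci_chip { struct tcpci *tcpci; };	/* NULL until the port is registered */

#define MODEL_MAX_ACTIONS 4

struct model_shared_irq {
	struct tcpci_chip *dev_id[MODEL_MAX_ACTIONS];
	size_t nr_actions;
};

/* Stands in for devm_request_threaded_irq(..., IRQF_SHARED, ..., chip). */
static void model_request_irq(struct model_shared_irq *irq, struct tcpci_chip *chip)
{
	assert(irq->nr_actions < MODEL_MAX_ACTIONS);
	irq->dev_id[irq->nr_actions++] = chip;
}

/* Stands in for chip->tcpci = tcpci_register_port(...). */
static void model_register_port(struct tcpci_chip *chip, struct tcpci *port)
{
	chip->tcpci = port;
}

/*
 * One firing of the shared line: every installed handler runs. Instead of
 * crashing, count the handlers that would have dereferenced a NULL
 * chip->tcpci.
 */
static int model_fire(struct model_shared_irq *irq)
{
	int null_derefs = 0;

	for (size_t i = 0; i < irq->nr_actions; i++) {
		struct tcpci_chip *chip = irq->dev_id[i];

		if (!chip->tcpci)
			null_derefs++;
		else
			chip->tcpci->events_handled++;
	}
	return null_derefs;
}

/* Port 1 is fully probed; port 2 probes with the given ordering. */
static int model_probe_second_port(int register_port_first)
{
	struct model_shared_irq irq = { 0 };
	struct tcpci port1 = { 0 }, port2 = { 0 };
	struct tcpci_chip chip1 = { 0 }, chip2 = { 0 };
	int null_derefs = 0;

	model_request_irq(&irq, &chip1);
	model_register_port(&chip1, &port1);

	if (register_port_first) {
		model_register_port(&chip2, &port2);
		null_derefs += model_fire(&irq);	/* port 1 irq arrives here */
		model_request_irq(&irq, &chip2);
	} else {
		model_request_irq(&irq, &chip2);
		null_derefs += model_fire(&irq);	/* port 1 irq arrives here */
		model_register_port(&chip2, &port2);
	}
	null_derefs += model_fire(&irq);		/* a later irq, after probe */
	return null_derefs;
}
```

In this model, model_probe_second_port(0) (irq requested first) reports one
handler run with chip->tcpci still NULL, while model_probe_second_port(1)
(port registered first) reports none.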

This patch restores the probe ordering used before commit 77e85107a771
("usb: typec: tcpci: support edge irq").

Fixes: 77e85107a771 ("usb: typec: tcpci: support edge irq")
Cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
---
 drivers/usb/typec/tcpm/tcpci.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

Comments

Francesco Dolcini Dec. 11, 2024, 11:09 a.m. UTC | #1
Hello,

On Wed, Dec 11, 2024 at 06:59:53PM +0800, Xu Yang wrote:
> The tcpci_irq() may meet below NULL pointer dereference issue:
> 
> [...]
> 
> This may happen on shared irq case. Such as two Type-C ports share one
> irq. After the first port finished tcpci_register_port(), it may trigger
> interrupt. However, if the interrupt comes by chance the 2nd port finishes
> devm_request_threaded_irq(), the 2nd port interrupt handler may be run at
> first. Then the above issue may happen.
> 
>   devm_request_threaded_irq()
> 				<-- port1 irq comes
>   disable_irq(client->irq);
>   tcpci_register_port()
> 
> This will restore the logic to the state before commit (77e85107a771 "usb:
> typec: tcpci: support edge irq").
> 
> Fixes: 77e85107a771 ("usb: typec: tcpci: support edge irq")
> Cc: stable@vger.kernel.org
> Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
> ---
>  drivers/usb/typec/tcpm/tcpci.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/usb/typec/tcpm/tcpci.c b/drivers/usb/typec/tcpm/tcpci.c
> index 2f15734a5043..db42f4bf3632 100644
> --- a/drivers/usb/typec/tcpm/tcpci.c
> +++ b/drivers/usb/typec/tcpm/tcpci.c
> @@ -923,22 +923,20 @@ static int tcpci_probe(struct i2c_client *client)
>  
>  	chip->data.set_orientation = err;
>  
> +	chip->tcpci = tcpci_register_port(&client->dev, &chip->data);
> +	if (IS_ERR(chip->tcpci))
> +		return PTR_ERR(chip->tcpci);
> +
>  	err = devm_request_threaded_irq(&client->dev, client->irq, NULL,
>  					_tcpci_irq,
>  					IRQF_SHARED | IRQF_ONESHOT,
>  					dev_name(&client->dev), chip);

I do not think this is the correct fix. What about using IRQF_NO_AUTOEN?
Would it solve the issue? You would also need to adjust the disable/enable
irq calls accordingly.

Francesco

Francesco Dolcini Dec. 11, 2024, 11:51 a.m. UTC | #2
On Wed, Dec 11, 2024 at 07:40:51PM +0800, Xu Yang wrote:
> On Wed, Dec 11, 2024 at 12:09:28PM +0100, Francesco Dolcini wrote:
> > Hello,
> > 
> > On Wed, Dec 11, 2024 at 06:59:53PM +0800, Xu Yang wrote:
> > > The tcpci_irq() may meet below NULL pointer dereference issue:
> > > 
> > > [    2.641851] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> > > [    2.641951] status 0x1, 0x37f
> > > [    2.650659] Mem abort info:
> > > [    2.656490]   ESR = 0x0000000096000004
> > > [    2.660230]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > [    2.665532]   SET = 0, FnV = 0
> > > [    2.668579]   EA = 0, S1PTW = 0
> > > [    2.671715]   FSC = 0x04: level 0 translation fault
> > > [    2.676584] Data abort info:
> > > [    2.679459]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> > > [    2.684936]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > [    2.689980]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > > [    2.695284] [0000000000000010] user address but active_mm is swapper
> > > [    2.701632] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> > > [    2.707883] Modules linked in:
> > > [    2.710936] CPU: 1 UID: 0 PID: 87 Comm: irq/111-2-0051 Not tainted 6.12.0-rc6-06316-g7f63786ad3d1-dirty #4
> > > [    2.720570] Hardware name: NXP i.MX93 11X11 EVK board (DT)
> > > [    2.726040] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [    2.732989] pc : tcpci_irq+0x38/0x318
> > > [    2.736647] lr : _tcpci_irq+0x14/0x20
> > > [    2.740295] sp : ffff80008324bd30
> > > [    2.743597] x29: ffff80008324bd70 x28: ffff800080107894 x27: ffff800082198f70
> > > [    2.750721] x26: ffff0000050e6680 x25: ffff000004d172ac x24: ffff0000050f0000
> > > [    2.757845] x23: ffff000004d17200 x22: 0000000000000001 x21: ffff0000050f0000
> > > [    2.764969] x20: ffff000004d17200 x19: 0000000000000000 x18: 0000000000000001
> > > [    2.772093] x17: 0000000000000000 x16: ffff80008183d8a0 x15: ffff00007fbab040
> > > [    2.779217] x14: ffff00007fb918c0 x13: 0000000000000000 x12: 000000000000017a
> > > [    2.786341] x11: 0000000000000001 x10: 0000000000000a90 x9 : ffff80008324bd00
> > > [    2.793465] x8 : ffff0000050f0af0 x7 : ffff00007fbaa840 x6 : 0000000000000031
> > > [    2.800589] x5 : 000000000000017a x4 : 0000000000000002 x3 : 0000000000000002
> > > [    2.807713] x2 : ffff80008324bd3a x1 : 0000000000000010 x0 : 0000000000000000
> > > [    2.814838] Call trace:
> > > [    2.817273]  tcpci_irq+0x38/0x318
> > > [    2.820583]  _tcpci_irq+0x14/0x20
> > > [    2.823885]  irq_thread_fn+0x2c/0xa8
> > > [    2.827456]  irq_thread+0x16c/0x2f4
> > > [    2.830940]  kthread+0x110/0x114
> > > [    2.834164]  ret_from_fork+0x10/0x20
> > > [    2.837738] Code: f9426420 f9001fe0 d2800000 52800201 (f9400a60)
> > > 
> > > This may happen on shared irq case. Such as two Type-C ports share one
> > > irq. After the first port finished tcpci_register_port(), it may trigger
> > > interrupt. However, if the interrupt comes by chance the 2nd port finishes
> > > devm_request_threaded_irq(), the 2nd port interrupt handler may be run at
> > > first. Then the above issue may happen.
> > > 
> > >   devm_request_threaded_irq()
> > > 				<-- port1 irq comes
> > >   disable_irq(client->irq);
> > >   tcpci_register_port()
> > > 
> > > This will restore the logic to the state before commit (77e85107a771 "usb:
> > > typec: tcpci: support edge irq").
> > > 
> > > Fixes: 77e85107a771 ("usb: typec: tcpci: support edge irq")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
> > > ---
> > >  drivers/usb/typec/tcpm/tcpci.c | 18 ++++++++----------
> > >  1 file changed, 8 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/usb/typec/tcpm/tcpci.c b/drivers/usb/typec/tcpm/tcpci.c
> > > index 2f15734a5043..db42f4bf3632 100644
> > > --- a/drivers/usb/typec/tcpm/tcpci.c
> > > +++ b/drivers/usb/typec/tcpm/tcpci.c
> > > @@ -923,22 +923,20 @@ static int tcpci_probe(struct i2c_client *client)
> > >  
> > >  	chip->data.set_orientation = err;
> > >  
> > > +	chip->tcpci = tcpci_register_port(&client->dev, &chip->data);
> > > +	if (IS_ERR(chip->tcpci))
> > > +		return PTR_ERR(chip->tcpci);
> > > +
> > >  	err = devm_request_threaded_irq(&client->dev, client->irq, NULL,
> > >  					_tcpci_irq,
> > >  					IRQF_SHARED | IRQF_ONESHOT,
> > >  					dev_name(&client->dev), chip);
> > 
> > I do not think this is the correct fix, what about using
> > IRQF_NO_AUTOEN ? Would it solve the issue? You need to adjust also the
> > disable/enable irq accordingly.
> 
> Not work. Probe failed directly.
> 
> [    2.646391] tcpci 2-0050: probe with driver tcpci failed with error -22
> [    2.680086] tcpci 2-0051: probe with driver tcpci failed with error -22

Ok, clear. The reason is the shared IRQ.
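
For reference, the -22 in the log is -EINVAL. As far as I know, the irq core
refuses IRQF_NO_AUTOEN on a shared line, because another user may already
have the line enabled. The sketch below models only that flag check; the
MODEL_* bit values and model_check_irq_flags() are made up for
illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative bit values for this model; only the names mirror the kernel flags. */
#define MODEL_IRQF_SHARED	(1u << 0)
#define MODEL_IRQF_ONESHOT	(1u << 1)
#define MODEL_IRQF_NO_AUTOEN	(1u << 2)
#define MODEL_IRQF_ALL \
	(MODEL_IRQF_SHARED | MODEL_IRQF_ONESHOT | MODEL_IRQF_NO_AUTOEN)

/* Returns 0 if the flag combination is acceptable, -EINVAL otherwise. */
static int model_check_irq_flags(unsigned int flags, const void *dev_id)
{
	/* The model only understands the three bits defined above. */
	assert(!(flags & ~MODEL_IRQF_ALL));

	/* A shared handler needs a cookie so it can be identified later. */
	if ((flags & MODEL_IRQF_SHARED) && dev_id == NULL)
		return -EINVAL;

	/*
	 * A shared line may already be enabled on behalf of another
	 * device, so "do not enable on request" cannot be honoured.
	 */
	if ((flags & MODEL_IRQF_SHARED) && (flags & MODEL_IRQF_NO_AUTOEN))
		return -EINVAL;

	return 0;
}
```

With these rules, IRQF_SHARED | IRQF_ONESHOT is accepted, and adding
IRQF_NO_AUTOEN to it is rejected, which matches the probe failure quoted
above.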

I think your change will break edge IRQ support: interrupts are generated
while tcpci_register_port() runs, and they will simply be lost if the
interrupt has not been requested by then.

I think a better solution is needed.
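
To illustrate the concern, here is a rough userspace model of an edge
interrupt. It assumes, as the comment removed by the patch describes, that
an edge arriving while a requested irq is disabled is tracked and processed
once the irq is enabled, and that an edge arriving before any handler is
requested is lost. struct model_edge_irq and the model_*() helpers are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_edge_irq {
	bool requested;	/* a handler is installed */
	int depth;	/* disable depth, 0 means enabled */
	bool pending;	/* an edge was latched while disabled */
	int handled;	/* number of handler invocations */
};

static void model_request(struct model_edge_irq *irq) { irq->requested = true; }
static void model_disable(struct model_edge_irq *irq) { irq->depth++; }

static void model_enable(struct model_edge_irq *irq)
{
	assert(irq->depth > 0);
	if (--irq->depth == 0 && irq->pending) {
		irq->pending = false;
		irq->handled++;		/* replay the latched edge */
	}
}

static void model_edge(struct model_edge_irq *irq)
{
	if (!irq->requested)
		return;			/* nobody is listening: the edge is lost */
	if (irq->depth > 0) {
		irq->pending = true;	/* remember it until model_enable() */
		return;
	}
	irq->handled++;
}

/* Returns how many edges raised during port registration get handled. */
static int model_edges_handled(bool request_before_register)
{
	struct model_edge_irq irq = { 0 };

	if (request_before_register) {
		model_request(&irq);
		model_disable(&irq);
		model_edge(&irq);	/* raised inside tcpci_register_port() */
		model_enable(&irq);
	} else {
		model_edge(&irq);	/* raised inside tcpci_register_port() */
		model_request(&irq);
	}
	return irq.handled;
}
```

In this model the current ordering (request, disable, register, enable)
handles the edge, and the ordering from this patch does not.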

Francesco

Xu Yang Dec. 11, 2024, 2:07 p.m. UTC | #3
On Wed, Dec 11, 2024 at 12:51:06PM +0100, Francesco Dolcini wrote:
> On Wed, Dec 11, 2024 at 07:40:51PM +0800, Xu Yang wrote:
> > On Wed, Dec 11, 2024 at 12:09:28PM +0100, Francesco Dolcini wrote:
> > > Hello,
> > > 
> > > On Wed, Dec 11, 2024 at 06:59:53PM +0800, Xu Yang wrote:
> > > > The tcpci_irq() may meet below NULL pointer dereference issue:
> > > > 
> > > > [    2.641851] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> > > > [    2.641951] status 0x1, 0x37f
> > > > [    2.650659] Mem abort info:
> > > > [    2.656490]   ESR = 0x0000000096000004
> > > > [    2.660230]   EC = 0x25: DABT (current EL), IL = 32 bits

[...]

> > > 
> > > I do not think this is the correct fix, what about using
> > > IRQF_NO_AUTOEN ? Would it solve the issue? You need to adjust also the
> > > disable/enable irq accordingly.
> > 
> > Not work. Probe failed directly.
> > 
> > [    2.646391] tcpci 2-0050: probe with driver tcpci failed with error -22
> > [    2.680086] tcpci 2-0051: probe with driver tcpci failed with error -22
> 
> Ok, clear. The reason is the shared IRQ.
> 
> I think your change will break the support for edge IRQ, because we have
> interrupt generated calling tcpci_register_port and they will just be lost if
> the interrupt is not already requested.

Have you actually seen a case where interrupts were lost?

I can't find such a case, because ALERT_MASK is first set to 0 and is only
assigned a non-zero value in tcpci_init(), which should be the last step
of tcpm_register_port(). So the interrupts are masked until
tcpci_register_port() completes. That is to say, the Type-C chip can't
pull the ALERT line low during this stage even if you don't call
disable_irq(). This holds for both level and edge interrupts.
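
The masking argument can be written down as a tiny model. It assumes the
ALERT line is driven low only while some bit is set in both ALERT and
ALERT_MASK; struct model_tcpc, the bit value and the mask value below are
illustrative and not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define MODEL_ALERT_CC_STATUS	(1u << 0)	/* illustrative alert bit */
#define MODEL_ALERT_MASK_INIT	0x00ffu		/* illustrative mask written at init */

struct model_tcpc {
	uint16_t alert;		/* latched events */
	uint16_t alert_mask;	/* which events may drive the line */
};

/* The line is active low: asserted only for unmasked, pending events. */
static bool model_alert_line_low(const struct model_tcpc *t)
{
	assert(t != NULL);
	return (t->alert & t->alert_mask) != 0;
}

/* Returns true if writing the mask caused a high-to-low transition. */
static bool model_write_alert_mask(struct model_tcpc *t, uint16_t mask)
{
	bool was_low = model_alert_line_low(t);

	t->alert_mask = mask;
	return !was_low && model_alert_line_low(t);
}

/* An event arrives while ALERT_MASK is still 0: is the line pulled low? */
static bool model_line_low_while_masked(void)
{
	struct model_tcpc t = { 0 };

	t.alert |= MODEL_ALERT_CC_STATUS;
	return model_alert_line_low(&t);
}

/* The same event stays latched; does unmasking produce a falling edge? */
static bool model_edge_on_unmask(void)
{
	struct model_tcpc t = { 0 };

	t.alert |= MODEL_ALERT_CC_STATUS;
	return model_write_alert_mask(&t, MODEL_ALERT_MASK_INIT);
}
```

In the model the line stays high while the mask is 0, and the falling edge
appears at the moment the mask is written.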

Xu Yang Dec. 12, 2024, 4:09 a.m. UTC | #4
On Wed, Dec 11, 2024 at 10:07:55PM +0800, Xu Yang wrote:
> On Wed, Dec 11, 2024 at 12:51:06PM +0100, Francesco Dolcini wrote:
> > On Wed, Dec 11, 2024 at 07:40:51PM +0800, Xu Yang wrote:
> > > On Wed, Dec 11, 2024 at 12:09:28PM +0100, Francesco Dolcini wrote:
> > > > Hello,
> > > > 
> > > > On Wed, Dec 11, 2024 at 06:59:53PM +0800, Xu Yang wrote:
> > > > > The tcpci_irq() may meet below NULL pointer dereference issue:
> > > > > 
> > > > > [    2.641851] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> > > > > [    2.641951] status 0x1, 0x37f
> > > > > [    2.650659] Mem abort info:
> > > > > [    2.656490]   ESR = 0x0000000096000004
> > > > > [    2.660230]   EC = 0x25: DABT (current EL), IL = 32 bits
> 
> [...]
> 
> > > > 
> > > > I do not think this is the correct fix, what about using
> > > > IRQF_NO_AUTOEN ? Would it solve the issue? You need to adjust also the
> > > > disable/enable irq accordingly.
> > > 
> > > Not work. Probe failed directly.
> > > 
> > > [    2.646391] tcpci 2-0050: probe with driver tcpci failed with error -22
> > > [    2.680086] tcpci 2-0051: probe with driver tcpci failed with error -22
> > 
> > Ok, clear. The reason is the shared IRQ.
> > 
> > I think your change will break the support for edge IRQ, because we have
> > interrupt generated calling tcpci_register_port and they will just be lost if
> > the interrupt is not already requested.
> 
> Do you really meet issue where interrupts got lost?
> 
> I can't find the case because ALERT_MASK is set to 0 firstly, and
> ALERT_MASK is assigned some value in tcpci_init() which should be
> the last step of tcpm_register_port(). So the interrupts are masked
> before tcpci_register_port() completed. That's say, typec chip can't
> pull down ALERT line during this stage even though you don't call
> disable_irq(). This behavior is suit for both level and edge type
> interrupt.

Well, the above assumption is based on commit 77e85107a771 ("usb: typec:
tcpci: support edge irq"). I think the irq may be missed with this
patch. A better way may be to check ALERT and handle any pending event
before probe() finishes.
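
A rough userspace sketch of that idea follows. It is only an illustration
of the ordering, not a proposed implementation: struct model_chip,
model_handle_alert() and model_probe() are hypothetical, and the model
assumes an edge raised before the irq is requested never reaches the
handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_chip {
	uint16_t alert;		/* pending alert bits in the TCPC */
	bool irq_requested;	/* the handler has been installed */
	int handled;		/* number of times alerts were processed */
};

/* Stand-in for the irq thread body: process and clear whatever is pending. */
static bool model_handle_alert(struct model_chip *c)
{
	if (!c->alert)
		return false;
	c->alert = 0;
	c->handled++;
	return true;
}

/* The event latches in ALERT; the edge only reaches an installed handler. */
static void model_event(struct model_chip *c, uint16_t bits)
{
	assert(bits != 0);
	c->alert |= bits;
	if (c->irq_requested)
		model_handle_alert(c);
}

/* Probe ordering from this patch, optionally checking ALERT at the end. */
static int model_probe(bool check_alert_before_return)
{
	struct model_chip c = { 0 };

	model_event(&c, 0x1);		/* raised during port registration */
	c.irq_requested = true;		/* irq requested afterwards */

	if (check_alert_before_return)
		model_handle_alert(&c);	/* pick up what the edge missed */

	return c.handled;
}
```

Without the final check the event stays pending and unhandled in this
model; with it, the event is processed once before probe returns.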

Thanks,
Xu Yang

Patch

diff --git a/drivers/usb/typec/tcpm/tcpci.c b/drivers/usb/typec/tcpm/tcpci.c
index 2f15734a5043..db42f4bf3632 100644
--- a/drivers/usb/typec/tcpm/tcpci.c
+++ b/drivers/usb/typec/tcpm/tcpci.c
@@ -923,22 +923,20 @@  static int tcpci_probe(struct i2c_client *client)
 
 	chip->data.set_orientation = err;
 
+	chip->tcpci = tcpci_register_port(&client->dev, &chip->data);
+	if (IS_ERR(chip->tcpci))
+		return PTR_ERR(chip->tcpci);
+
 	err = devm_request_threaded_irq(&client->dev, client->irq, NULL,
 					_tcpci_irq,
 					IRQF_SHARED | IRQF_ONESHOT,
 					dev_name(&client->dev), chip);
-	if (err < 0)
+	if (err < 0) {
+		tcpci_unregister_port(chip->tcpci);
 		return err;
+	}
 
-	/*
-	 * Disable irq while registering port. If irq is configured as an edge
-	 * irq this allow to keep track and process the irq as soon as it is enabled.
-	 */
-	disable_irq(client->irq);
-	chip->tcpci = tcpci_register_port(&client->dev, &chip->data);
-	enable_irq(client->irq);
-
-	return PTR_ERR_OR_ZERO(chip->tcpci);
+	return 0;
 }
 
 static void tcpci_remove(struct i2c_client *client)