diff mbox series

[PATCHv6] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes

Message ID 20220310032636.7286-1-quic_saipraka@quicinc.com
State New
Headers show
Series [PATCHv6] tty: hvc: dcc: Bind driver to CPU core0 for reads and writes | expand

Commit Message

Sai Prakash Ranjan March 10, 2022, 3:26 a.m. UTC
From: Shanker Donthineni <shankerd@codeaurora.org>

Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
reads/writes from/to DCC on secondary cores. Each core has its
own DCC device registers, so when a core reads or writes from/to DCC,
it only accesses its own DCC device. Since kernel code can run on
any core, every time the kernel wants to write to the console, it
might write to a different DCC.

In SMP mode, Trace32 creates multiple windows, and each window shows
the DCC output only from that core's DCC. The result is that console
output is either lost or scattered across windows.

Selecting this option will enable code that serializes all console
input and output to core 0. The DCC driver will create input and
output FIFOs that all cores will use. Reads and writes from/to DCC
are handled by a workqueue that runs only core 0.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Acked-by: Adam Wallis <awallis@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Elliot Berman <eberman@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
---

Changes in v6:
 * Disable CPU hotplug when CONFIG_HVC_DCC_SERIALIZE_SMP=y.

Changes in v5:
 * Use get_cpu() and put_cpu() for CPU id check in preemptible context.
 * Revert back to build time Kconfig.
 * Remove unnecessary hotplug locks, they result in sleeping in atomic context bugs.
 * Add a comment for the spinlock.

Changes in v4:
 * Use module parameter for runtime choice of enabling this feature.
 * Use hotplug locks to avoid race between cpu online check and work schedule.
 * Remove ifdefs and move to common ops.
 * Remove unnecessary check for this configuration.
 * Use macros for buf size instead of magic numbers.
 * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/

Changes in v3:
 * Handle case where core0 is not online.

Changes in v2:
 * Checkpatch warning fixes.
 * Use of IS_ENABLED macros instead of ifdefs.

---
 drivers/tty/hvc/Kconfig   |  20 +++++
 drivers/tty/hvc/hvc_dcc.c | 175 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 192 insertions(+), 3 deletions(-)


base-commit: 6705cd745adbbeac6b13002c7a30060f7b2568a5

Comments

Greg Kroah-Hartman April 8, 2022, 11:36 a.m. UTC | #1
On Fri, Apr 08, 2022 at 04:52:35PM +0530, Sai Prakash Ranjan wrote:
> Hi,
> 
> On 3/10/2022 8:56 AM, Sai Prakash Ranjan wrote:
> > From: Shanker Donthineni <shankerd@codeaurora.org>
> > 
> > Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
> > reads/writes from/to DCC on secondary cores. Each core has its
> > own DCC device registers, so when a core reads or writes from/to DCC,
> > it only accesses its own DCC device. Since kernel code can run on
> > any core, every time the kernel wants to write to the console, it
> > might write to a different DCC.
> > 
> > In SMP mode, Trace32 creates multiple windows, and each window shows
> > the DCC output only from that core's DCC. The result is that console
> > output is either lost or scattered across windows.
> > 
> > Selecting this option will enable code that serializes all console
> > input and output to core 0. The DCC driver will create input and
> > output FIFOs that all cores will use. Reads and writes from/to DCC
> > are handled by a workqueue that runs only core 0.
> > 
> > Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> > Acked-by: Adam Wallis <awallis@codeaurora.org>
> > Signed-off-by: Timur Tabi <timur@codeaurora.org>
> > Signed-off-by: Elliot Berman <eberman@codeaurora.org>
> > Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
> > ---
> > 
> > Changes in v6:
> >   * Disable CPU hotplug when CONFIG_HVC_DCC_SERIALIZE_SMP=y.
> > 
> > Changes in v5:
> >   * Use get_cpu() and put_cpu() for CPU id check in preemptible context.
> >   * Revert back to build time Kconfig.
> >   * Remove unnecessary hotplug locks, they result in sleeping in atomic context bugs.
> >   * Add a comment for the spinlock.
> > 
> > Changes in v4:
> >   * Use module parameter for runtime choice of enabling this feature.
> >   * Use hotplug locks to avoid race between cpu online check and work schedule.
> >   * Remove ifdefs and move to common ops.
> >   * Remove unnecessary check for this configuration.
> >   * Use macros for buf size instead of magic numbers.
> >   * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/
> > 
> > Changes in v3:
> >   * Handle case where core0 is not online.
> > 
> > Changes in v2:
> >   * Checkpatch warning fixes.
> >   * Use of IS_ENABLED macros instead of ifdefs.
> > 
> > ---
> >   drivers/tty/hvc/Kconfig   |  20 +++++
> >   drivers/tty/hvc/hvc_dcc.c | 175 +++++++++++++++++++++++++++++++++++++-
> >   2 files changed, 192 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
> > index 8d60e0ff67b4..62560cd0c04d 100644
> > --- a/drivers/tty/hvc/Kconfig
> > +++ b/drivers/tty/hvc/Kconfig
> > @@ -87,6 +87,26 @@ config HVC_DCC
> >   	  driver. This console is used through a JTAG only on ARM. If you don't have
> >   	  a JTAG then you probably don't want this option.
> > +config HVC_DCC_SERIALIZE_SMP
> > +	bool "Use DCC only on CPU core 0"
> > +	depends on SMP && HVC_DCC
> > +	help
> > +	  Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
> > +	  reads/writes from/to DCC on more than one CPU core. Each core has its
> > +	  own DCC device registers, so when a CPU core reads or writes from/to
> > +	  DCC, it only accesses its own DCC device. Since kernel code can run on
> > +	  any CPU core, every time the kernel wants to write to the console, it
> > +	  might write to a different DCC.
> > +
> > +	  In SMP mode, Trace32 creates multiple windows, and each window shows
> > +	  the DCC output only from that core's DCC. The result is that console
> > +	  output is either lost or scattered across windows.
> > +
> > +	  Selecting this option will enable code that serializes all console
> > +	  input and output to CPU core 0. The DCC driver will create input and
> > +	  output FIFOs that all cores will use. Reads and writes from/to DCC
> > +	  are handled by a workqueue that runs only CPU core 0.
> > +
> >   config HVC_RISCV_SBI
> >   	bool "RISC-V SBI console support"
> >   	depends on RISCV_SBI_V01
> > diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
> > index 8e0edb7d93fd..e87b82e873d7 100644
> > --- a/drivers/tty/hvc/hvc_dcc.c
> > +++ b/drivers/tty/hvc/hvc_dcc.c
> > @@ -2,9 +2,14 @@
> >   /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
> >   #include <linux/console.h>
> > +#include <linux/cpu.h>
> > +#include <linux/cpumask.h>
> >   #include <linux/init.h>
> > +#include <linux/kfifo.h>
> >   #include <linux/serial.h>
> >   #include <linux/serial_core.h>
> > +#include <linux/smp.h>
> > +#include <linux/spinlock.h>
> >   #include <asm/dcc.h>
> >   #include <asm/processor.h>
> > @@ -15,6 +20,15 @@
> >   #define DCC_STATUS_RX		(1 << 30)
> >   #define DCC_STATUS_TX		(1 << 29)
> > +#define DCC_INBUF_SIZE		128
> > +#define DCC_OUTBUF_SIZE		1024
> > +
> > +/* Lock to serialize access to DCC fifo */
> > +static DEFINE_SPINLOCK(dcc_lock);
> > +
> > +static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
> > +static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
> > +
> >   static void dcc_uart_console_putchar(struct uart_port *port, int ch)
> >   {
> >   	while (__dcc_getstatus() & DCC_STATUS_TX)
> > @@ -67,24 +81,176 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
> >   	return i;
> >   }
> > +/*
> > + * Check if the DCC is enabled. If CONFIG_HVC_DCC_SERIALIZE_SMP is enabled,
> > + * then we assume then this function will be called first on core0. That way,
> > + * dcc_core0_available will be true only if it's available on core0.
> > + */
> >   static bool hvc_dcc_check(void)
> >   {
> >   	unsigned long time = jiffies + (HZ / 10);
> > +	static bool dcc_core0_available;
> > +
> > +	/*
> > +	 * If we're not on core 0, but we previously confirmed that DCC is
> > +	 * active, then just return true.
> > +	 */
> > +	int cpu = get_cpu();
> > +
> > +	if (IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP) && cpu && dcc_core0_available) {
> > +		put_cpu();
> > +		return true;
> > +	}
> > +
> > +	put_cpu();
> >   	/* Write a test character to check if it is handled */
> >   	__dcc_putchar('\n');
> >   	while (time_is_after_jiffies(time)) {
> > -		if (!(__dcc_getstatus() & DCC_STATUS_TX))
> > +		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
> > +			dcc_core0_available = true;
> >   			return true;
> > +		}
> >   	}
> >   	return false;
> >   }
> > +/*
> > + * Workqueue function that writes the output FIFO to the DCC on core 0.
> > + */
> > +static void dcc_put_work(struct work_struct *work)
> > +{
> > +	unsigned char ch;
> > +	unsigned long irqflags;
> > +
> > +	spin_lock_irqsave(&dcc_lock, irqflags);
> > +
> > +	/* While there's data in the output FIFO, write it to the DCC */
> > +	while (kfifo_get(&outbuf, &ch))
> > +		hvc_dcc_put_chars(0, &ch, 1);
> > +
> > +	/* While we're at it, check for any input characters */
> > +	while (!kfifo_is_full(&inbuf)) {
> > +		if (!hvc_dcc_get_chars(0, &ch, 1))
> > +			break;
> > +		kfifo_put(&inbuf, ch);
> > +	}
> > +
> > +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> > +}
> > +
> > +static DECLARE_WORK(dcc_pwork, dcc_put_work);
> > +
> > +/*
> > + * Workqueue function that reads characters from DCC and puts them into the
> > + * input FIFO.
> > + */
> > +static void dcc_get_work(struct work_struct *work)
> > +{
> > +	unsigned char ch;
> > +	unsigned long irqflags;
> > +
> > +	/*
> > +	 * Read characters from DCC and put them into the input FIFO, as
> > +	 * long as there is room and we have characters to read.
> > +	 */
> > +	spin_lock_irqsave(&dcc_lock, irqflags);
> > +
> > +	while (!kfifo_is_full(&inbuf)) {
> > +		if (!hvc_dcc_get_chars(0, &ch, 1))
> > +			break;
> > +		kfifo_put(&inbuf, ch);
> > +	}
> > +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> > +}
> > +
> > +static DECLARE_WORK(dcc_gwork, dcc_get_work);
> > +
> > +/*
> > + * Write characters directly to the DCC if we're on core 0 and the FIFO
> > + * is empty, or write them to the FIFO if we're not.
> > + */
> > +static int hvc_dcc0_put_chars(u32 vt, const char *buf, int count)
> > +{
> > +	int len;
> > +	unsigned long irqflags;
> > +
> > +	if (!IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP))
> > +		return hvc_dcc_put_chars(vt, buf, count);
> > +
> > +	spin_lock_irqsave(&dcc_lock, irqflags);
> > +	if (smp_processor_id() || (!kfifo_is_empty(&outbuf))) {
> > +		len = kfifo_in(&outbuf, buf, count);
> > +		spin_unlock_irqrestore(&dcc_lock, irqflags);
> > +
> > +		/*
> > +		 * We just push data to the output FIFO, so schedule the
> > +		 * workqueue that will actually write that data to DCC.
> > +		 * CPU hotplug is disabled so CPU0 cannot be offlined
> > +		 * after the cpu online check.
> > +		 */
> > +		if (cpu_online(0))
> > +			schedule_work_on(0, &dcc_pwork);
> > +
> > +		return len;
> > +	}
> > +
> > +	/*
> > +	 * If we're already on core 0, and the FIFO is empty, then just
> > +	 * write the data to DCC.
> > +	 */
> > +	len = hvc_dcc_put_chars(vt, buf, count);
> > +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> > +
> > +	return len;
> > +}
> > +
> > +/*
> > + * Read characters directly from the DCC if we're on core 0 and the FIFO
> > + * is empty, or read them from the FIFO if we're not.
> > + */
> > +static int hvc_dcc0_get_chars(u32 vt, char *buf, int count)
> > +{
> > +	int len;
> > +	unsigned long irqflags;
> > +
> > +	if (!IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP))
> > +		return hvc_dcc_get_chars(vt, buf, count);
> > +
> > +	spin_lock_irqsave(&dcc_lock, irqflags);
> > +
> > +	if (smp_processor_id() || (!kfifo_is_empty(&inbuf))) {
> > +		len = kfifo_out(&inbuf, buf, count);
> > +		spin_unlock_irqrestore(&dcc_lock, irqflags);
> > +
> > +		/*
> > +		 * If the FIFO was empty, there may be characters in the DCC
> > +		 * that we haven't read yet.  Schedule a workqueue to fill
> > +		 * the input FIFO, so that the next time this function is
> > +		 * called, we'll have data. CPU hotplug is disabled so CPU0
> > +		 * cannot be offlined after the cpu online check.
> > +		 */
> > +		if (!len && cpu_online(0))
> > +			schedule_work_on(0, &dcc_gwork);
> > +
> > +		return len;
> > +	}
> > +
> > +	/*
> > +	 * If we're already on core 0, and the FIFO is empty, then just
> > +	 * read the data from DCC.
> > +	 */
> > +	len = hvc_dcc_get_chars(vt, buf, count);
> > +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> > +
> > +	return len;
> > +}
> > +
> >   static const struct hv_ops hvc_dcc_get_put_ops = {
> > -	.get_chars = hvc_dcc_get_chars,
> > -	.put_chars = hvc_dcc_put_chars,
> > +	.get_chars = hvc_dcc0_get_chars,
> > +	.put_chars = hvc_dcc0_put_chars,
> >   };
> >   static int __init hvc_dcc_console_init(void)
> > @@ -108,6 +274,9 @@ static int __init hvc_dcc_init(void)
> >   	if (!hvc_dcc_check())
> >   		return -ENODEV;
> > +	if (IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP))
> > +		cpu_hotplug_disable();
> > +
> >   	p = hvc_alloc(0, 0, &hvc_dcc_get_put_ops, 128);
> >   	return PTR_ERR_OR_ZERO(p);
> > 
> > base-commit: 6705cd745adbbeac6b13002c7a30060f7b2568a5
> 
> Gentle ping to take a look at this version.

It has been just a few days since the merge window closed.

And here is my to-review queue:
	$ mdfrm -c ~/mail/todo/
	2211 messages in /home/gregkh/mail/todo/

Please help out and review other patches that have been submitted on the
lists to help make this go faster for everyone involved.

thanks,

greg k-h
Greg Kroah-Hartman April 15, 2022, 6:25 a.m. UTC | #2
On Thu, Mar 10, 2022 at 08:56:36AM +0530, Sai Prakash Ranjan wrote:
> From: Shanker Donthineni <shankerd@codeaurora.org>
> 
> Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
> reads/writes from/to DCC on secondary cores. Each core has its
> own DCC device registers, so when a core reads or writes from/to DCC,
> it only accesses its own DCC device. Since kernel code can run on
> any core, every time the kernel wants to write to the console, it
> might write to a different DCC.
> 
> In SMP mode, Trace32 creates multiple windows, and each window shows
> the DCC output only from that core's DCC. The result is that console
> output is either lost or scattered across windows.
> 
> Selecting this option will enable code that serializes all console
> input and output to core 0. The DCC driver will create input and
> output FIFOs that all cores will use. Reads and writes from/to DCC
> are handled by a workqueue that runs only core 0.
> 
> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> Acked-by: Adam Wallis <awallis@codeaurora.org>
> Signed-off-by: Timur Tabi <timur@codeaurora.org>
> Signed-off-by: Elliot Berman <eberman@codeaurora.org>
> Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
> ---
> 
> Changes in v6:
>  * Disable CPU hotplug when CONFIG_HVC_DCC_SERIALIZE_SMP=y.
> 
> Changes in v5:
>  * Use get_cpu() and put_cpu() for CPU id check in preemptible context.
>  * Revert back to build time Kconfig.
>  * Remove unnecessary hotplug locks, they result in sleeping in atomic context bugs.
>  * Add a comment for the spinlock.
> 
> Changes in v4:
>  * Use module parameter for runtime choice of enabling this feature.
>  * Use hotplug locks to avoid race between cpu online check and work schedule.
>  * Remove ifdefs and move to common ops.
>  * Remove unnecessary check for this configuration.
>  * Use macros for buf size instead of magic numbers.
>  * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/
> 
> Changes in v3:
>  * Handle case where core0 is not online.
> 
> Changes in v2:
>  * Checkpatch warning fixes.
>  * Use of IS_ENABLED macros instead of ifdefs.
> 
> ---
>  drivers/tty/hvc/Kconfig   |  20 +++++
>  drivers/tty/hvc/hvc_dcc.c | 175 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 192 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
> index 8d60e0ff67b4..62560cd0c04d 100644
> --- a/drivers/tty/hvc/Kconfig
> +++ b/drivers/tty/hvc/Kconfig
> @@ -87,6 +87,26 @@ config HVC_DCC
>  	  driver. This console is used through a JTAG only on ARM. If you don't have
>  	  a JTAG then you probably don't want this option.
>  
> +config HVC_DCC_SERIALIZE_SMP
> +	bool "Use DCC only on CPU core 0"
> +	depends on SMP && HVC_DCC
> +	help
> +	  Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
> +	  reads/writes from/to DCC on more than one CPU core. Each core has its
> +	  own DCC device registers, so when a CPU core reads or writes from/to
> +	  DCC, it only accesses its own DCC device. Since kernel code can run on
> +	  any CPU core, every time the kernel wants to write to the console, it
> +	  might write to a different DCC.
> +
> +	  In SMP mode, Trace32 creates multiple windows, and each window shows
> +	  the DCC output only from that core's DCC. The result is that console
> +	  output is either lost or scattered across windows.

Why are we documenting, and supporting, a closed source userspace tool
with kernel changes?  Does this advertisement deserve to be in the
kernel source tree?

And why can't they just fix their tool if this is such a big issue?  Why
does this only affect this one platform and not all other smp systems?

> +
> +	  Selecting this option will enable code that serializes all console
> +	  input and output to CPU core 0. The DCC driver will create input and
> +	  output FIFOs that all cores will use. Reads and writes from/to DCC
> +	  are handled by a workqueue that runs only CPU core 0.

No need to describe implementation details in a help text, right?  What
happens when we change this to not be a workqueue?



> +
>  config HVC_RISCV_SBI
>  	bool "RISC-V SBI console support"
>  	depends on RISCV_SBI_V01
> diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
> index 8e0edb7d93fd..e87b82e873d7 100644
> --- a/drivers/tty/hvc/hvc_dcc.c
> +++ b/drivers/tty/hvc/hvc_dcc.c
> @@ -2,9 +2,14 @@
>  /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */

No copyright update?

>  
>  #include <linux/console.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
>  #include <linux/init.h>
> +#include <linux/kfifo.h>
>  #include <linux/serial.h>
>  #include <linux/serial_core.h>
> +#include <linux/smp.h>
> +#include <linux/spinlock.h>
>  
>  #include <asm/dcc.h>
>  #include <asm/processor.h>
> @@ -15,6 +20,15 @@
>  #define DCC_STATUS_RX		(1 << 30)
>  #define DCC_STATUS_TX		(1 << 29)
>  
> +#define DCC_INBUF_SIZE		128
> +#define DCC_OUTBUF_SIZE		1024
> +
> +/* Lock to serialize access to DCC fifo */
> +static DEFINE_SPINLOCK(dcc_lock);
> +
> +static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
> +static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
> +
>  static void dcc_uart_console_putchar(struct uart_port *port, int ch)
>  {
>  	while (__dcc_getstatus() & DCC_STATUS_TX)
> @@ -67,24 +81,176 @@ static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
>  	return i;
>  }
>  
> +/*
> + * Check if the DCC is enabled. If CONFIG_HVC_DCC_SERIALIZE_SMP is enabled,
> + * then we assume then this function will be called first on core0. That way,
> + * dcc_core0_available will be true only if it's available on core0.
> + */
>  static bool hvc_dcc_check(void)
>  {
>  	unsigned long time = jiffies + (HZ / 10);
> +	static bool dcc_core0_available;
> +
> +	/*
> +	 * If we're not on core 0, but we previously confirmed that DCC is
> +	 * active, then just return true.
> +	 */
> +	int cpu = get_cpu();
> +
> +	if (IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP) && cpu && dcc_core0_available) {
> +		put_cpu();
> +		return true;
> +	}
> +
> +	put_cpu();
>  
>  	/* Write a test character to check if it is handled */
>  	__dcc_putchar('\n');
>  
>  	while (time_is_after_jiffies(time)) {
> -		if (!(__dcc_getstatus() & DCC_STATUS_TX))
> +		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
> +			dcc_core0_available = true;
>  			return true;
> +		}
>  	}
>  
>  	return false;
>  }
>  
> +/*
> + * Workqueue function that writes the output FIFO to the DCC on core 0.
> + */
> +static void dcc_put_work(struct work_struct *work)
> +{
> +	unsigned char ch;
> +	unsigned long irqflags;
> +
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +
> +	/* While there's data in the output FIFO, write it to the DCC */
> +	while (kfifo_get(&outbuf, &ch))
> +		hvc_dcc_put_chars(0, &ch, 1);
> +
> +	/* While we're at it, check for any input characters */
> +	while (!kfifo_is_full(&inbuf)) {
> +		if (!hvc_dcc_get_chars(0, &ch, 1))
> +			break;
> +		kfifo_put(&inbuf, ch);
> +	}
> +
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +}
> +
> +static DECLARE_WORK(dcc_pwork, dcc_put_work);
> +
> +/*
> + * Workqueue function that reads characters from DCC and puts them into the
> + * input FIFO.
> + */
> +static void dcc_get_work(struct work_struct *work)
> +{
> +	unsigned char ch;
> +	unsigned long irqflags;
> +
> +	/*
> +	 * Read characters from DCC and put them into the input FIFO, as
> +	 * long as there is room and we have characters to read.
> +	 */
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +
> +	while (!kfifo_is_full(&inbuf)) {
> +		if (!hvc_dcc_get_chars(0, &ch, 1))
> +			break;
> +		kfifo_put(&inbuf, ch);
> +	}
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +}
> +
> +static DECLARE_WORK(dcc_gwork, dcc_get_work);
> +
> +/*
> + * Write characters directly to the DCC if we're on core 0 and the FIFO
> + * is empty, or write them to the FIFO if we're not.
> + */
> +static int hvc_dcc0_put_chars(u32 vt, const char *buf, int count)
> +{
> +	int len;
> +	unsigned long irqflags;
> +
> +	if (!IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP))
> +		return hvc_dcc_put_chars(vt, buf, count);
> +
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +	if (smp_processor_id() || (!kfifo_is_empty(&outbuf))) {
> +		len = kfifo_in(&outbuf, buf, count);
> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +		/*
> +		 * We just push data to the output FIFO, so schedule the
> +		 * workqueue that will actually write that data to DCC.
> +		 * CPU hotplug is disabled so CPU0 cannot be offlined
> +		 * after the cpu online check.

How is cpu hotplug disabled here?

> +		 */
> +		if (cpu_online(0))

What happens if the cpu state changes right after checking it?

> +			schedule_work_on(0, &dcc_pwork);

What happens if cpu 0 never comes back?

> +
> +		return len;
> +	}
> +
> +	/*
> +	 * If we're already on core 0, and the FIFO is empty, then just
> +	 * write the data to DCC.
> +	 */
> +	len = hvc_dcc_put_chars(vt, buf, count);
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +	return len;
> +}
> +
> +/*
> + * Read characters directly from the DCC if we're on core 0 and the FIFO
> + * is empty, or read them from the FIFO if we're not.
> + */
> +static int hvc_dcc0_get_chars(u32 vt, char *buf, int count)
> +{
> +	int len;
> +	unsigned long irqflags;
> +
> +	if (!IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP))
> +		return hvc_dcc_get_chars(vt, buf, count);
> +
> +	spin_lock_irqsave(&dcc_lock, irqflags);
> +
> +	if (smp_processor_id() || (!kfifo_is_empty(&inbuf))) {
> +		len = kfifo_out(&inbuf, buf, count);
> +		spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +		/*
> +		 * If the FIFO was empty, there may be characters in the DCC
> +		 * that we haven't read yet.  Schedule a workqueue to fill
> +		 * the input FIFO, so that the next time this function is
> +		 * called, we'll have data. CPU hotplug is disabled so CPU0
> +		 * cannot be offlined after the cpu online check.

Again, how is cpu hotplug disabled?

> +		 */
> +		if (!len && cpu_online(0))
> +			schedule_work_on(0, &dcc_gwork);
> +
> +		return len;
> +	}
> +
> +	/*
> +	 * If we're already on core 0, and the FIFO is empty, then just
> +	 * read the data from DCC.
> +	 */
> +	len = hvc_dcc_get_chars(vt, buf, count);
> +	spin_unlock_irqrestore(&dcc_lock, irqflags);
> +
> +	return len;
> +}
> +
>  static const struct hv_ops hvc_dcc_get_put_ops = {
> -	.get_chars = hvc_dcc_get_chars,
> -	.put_chars = hvc_dcc_put_chars,
> +	.get_chars = hvc_dcc0_get_chars,
> +	.put_chars = hvc_dcc0_put_chars,
>  };
>  
>  static int __init hvc_dcc_console_init(void)
> @@ -108,6 +274,9 @@ static int __init hvc_dcc_init(void)
>  	if (!hvc_dcc_check())
>  		return -ENODEV;
>  
> +	if (IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP))
> +		cpu_hotplug_disable();

Ah.  And you never enable it again?  This feels very very wrong.  Did
you just ensure that you can never use this option in a real system?

For this reason alone I can't take this change, as you just increased
the power consumption of any system with this enabled :(

greg k-h
Sai Prakash Ranjan April 28, 2022, 7:05 a.m. UTC | #3
On 4/28/2022 12:29 PM, Greg Kroah-Hartman wrote:
> On Thu, Apr 28, 2022 at 10:04:34AM +0530, Sai Prakash Ranjan wrote:
>> Hi Greg,
>>
>> On 4/15/2022 11:55 AM, Greg Kroah-Hartman wrote:
>>> On Thu, Mar 10, 2022 at 08:56:36AM +0530, Sai Prakash Ranjan wrote:
>>>> From: Shanker Donthineni <shankerd@codeaurora.org>
>>>>
>>>> Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
>>>> reads/writes from/to DCC on secondary cores. Each core has its
>>>> own DCC device registers, so when a core reads or writes from/to DCC,
>>>> it only accesses its own DCC device. Since kernel code can run on
>>>> any core, every time the kernel wants to write to the console, it
>>>> might write to a different DCC.
>>>>
>>>> In SMP mode, Trace32 creates multiple windows, and each window shows
>>>> the DCC output only from that core's DCC. The result is that console
>>>> output is either lost or scattered across windows.
>>>>
>>>> Selecting this option will enable code that serializes all console
>>>> input and output to core 0. The DCC driver will create input and
>>>> output FIFOs that all cores will use. Reads and writes from/to DCC
>>>> are handled by a workqueue that runs only core 0.
>>>>
>>>> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
>>>> Acked-by: Adam Wallis <awallis@codeaurora.org>
>>>> Signed-off-by: Timur Tabi <timur@codeaurora.org>
>>>> Signed-off-by: Elliot Berman <eberman@codeaurora.org>
>>>> Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
>>>> ---
>>>>
>>>> Changes in v6:
>>>>    * Disable CPU hotplug when CONFIG_HVC_DCC_SERIALIZE_SMP=y.
>>>>
>>>> Changes in v5:
>>>>    * Use get_cpu() and put_cpu() for CPU id check in preemptible context.
>>>>    * Revert back to build time Kconfig.
>>>>    * Remove unnecessary hotplug locks, they result in sleeping in atomic context bugs.
>>>>    * Add a comment for the spinlock.
>>>>
>>>> Changes in v4:
>>>>    * Use module parameter for runtime choice of enabling this feature.
>>>>    * Use hotplug locks to avoid race between cpu online check and work schedule.
>>>>    * Remove ifdefs and move to common ops.
>>>>    * Remove unnecessary check for this configuration.
>>>>    * Use macros for buf size instead of magic numbers.
>>>>    * v3 - https://lore.kernel.org/lkml/20211213141013.21464-1-quic_saipraka@quicinc.com/
>>>>
>>>> Changes in v3:
>>>>    * Handle case where core0 is not online.
>>>>
>>>> Changes in v2:
>>>>    * Checkpatch warning fixes.
>>>>    * Use of IS_ENABLED macros instead of ifdefs.
>>>>
>>>> ---
>>>>    drivers/tty/hvc/Kconfig   |  20 +++++
>>>>    drivers/tty/hvc/hvc_dcc.c | 175 +++++++++++++++++++++++++++++++++++++-
>>>>    2 files changed, 192 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
>>>> index 8d60e0ff67b4..62560cd0c04d 100644
>>>> --- a/drivers/tty/hvc/Kconfig
>>>> +++ b/drivers/tty/hvc/Kconfig
>>>> @@ -87,6 +87,26 @@ config HVC_DCC
>>>>    	  driver. This console is used through a JTAG only on ARM. If you don't have
>>>>    	  a JTAG then you probably don't want this option.
>>>> +config HVC_DCC_SERIALIZE_SMP
>>>> +	bool "Use DCC only on CPU core 0"
>>>> +	depends on SMP && HVC_DCC
>>>> +	help
>>>> +	  Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
>>>> +	  reads/writes from/to DCC on more than one CPU core. Each core has its
>>>> +	  own DCC device registers, so when a CPU core reads or writes from/to
>>>> +	  DCC, it only accesses its own DCC device. Since kernel code can run on
>>>> +	  any CPU core, every time the kernel wants to write to the console, it
>>>> +	  might write to a different DCC.
>>>> +
>>>> +	  In SMP mode, Trace32 creates multiple windows, and each window shows
>>>> +	  the DCC output only from that core's DCC. The result is that console
>>>> +	  output is either lost or scattered across windows.
>>> Why are we documenting, and supporting, a closed source userspace tool
>>> with kernel changes?  Does this advertisement deserve to be in the
>>> kernel source tree?
>> Ok, I will remove the comment.
>>
>>> And why can't they just fix their tool if this is such a big issue?  Why
>>> does this only affect this one platform and not all other smp systems?
>> Hmm, this has been discussed in all the past versions of this series and still we
>> are at the same question :) I will write a small summary below which will cover
>> mostly relevant discussions we discussed till now and then I can point to it
>> whenever this question is asked again.
> No, it needs to go into the changelog text so that we know what we are
> reviewing and considering when you submit it.  Never refer back to some
> old conversation, how are we supposed to remember that?

True given the amount of patches you handle. I will be explicit and add more details in the next
version.

> So this all seems to be debugging-only code, and this config option
> should NEVER be turned on for a real system.  That makes much more
> sense, and is something that I don't recall anyone saying before.

Ah my bad, I thought it was known thing given the DCC driver was already
present.

> So make this very very explicit, both in the changelog, and in the
> Kconfig text, AND when the driver loads have it spit out a huge message
> in the kernel log saying that this is for debugging only and that no one
> should see this on a normal running system.  We have examples of other
> Kconfig options that do this at runtime, copy what they do so it's
> painfully obvious.  Like what is in clk_debug_init().

Sure, I will make these changes and post.

Thanks,
Sai
diff mbox series

Patch

diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
index 8d60e0ff67b4..62560cd0c04d 100644
--- a/drivers/tty/hvc/Kconfig
+++ b/drivers/tty/hvc/Kconfig
@@ -87,6 +87,26 @@  config HVC_DCC
 	  driver. This console is used through a JTAG only on ARM. If you don't have
 	  a JTAG then you probably don't want this option.
 
+config HVC_DCC_SERIALIZE_SMP
+	bool "Use DCC only on CPU core 0"
+	depends on SMP && HVC_DCC
+	help
+	  Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
+	  reads/writes from/to DCC on more than one CPU core. Each core has its
+	  own DCC device registers, so when a CPU core reads or writes from/to
+	  DCC, it only accesses its own DCC device. Since kernel code can run on
+	  any CPU core, every time the kernel wants to write to the console, it
+	  might write to a different DCC.
+
+	  In SMP mode, Trace32 creates multiple windows, and each window shows
+	  the DCC output only from that core's DCC. The result is that console
+	  output is either lost or scattered across windows.
+
+	  Selecting this option will enable code that serializes all console
+	  input and output to CPU core 0. The DCC driver will create input and
+	  output FIFOs that all cores will use. Reads and writes from/to DCC
+	  are handled by a workqueue that runs only CPU core 0.
+
 config HVC_RISCV_SBI
 	bool "RISC-V SBI console support"
 	depends on RISCV_SBI_V01
diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
index 8e0edb7d93fd..e87b82e873d7 100644
--- a/drivers/tty/hvc/hvc_dcc.c
+++ b/drivers/tty/hvc/hvc_dcc.c
@@ -2,9 +2,14 @@ 
 /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
 
 #include <linux/console.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
 #include <linux/init.h>
+#include <linux/kfifo.h>
 #include <linux/serial.h>
 #include <linux/serial_core.h>
+#include <linux/smp.h>
+#include <linux/spinlock.h>
 
 #include <asm/dcc.h>
 #include <asm/processor.h>
@@ -15,6 +20,15 @@ 
 #define DCC_STATUS_RX		(1 << 30)
 #define DCC_STATUS_TX		(1 << 29)
 
+#define DCC_INBUF_SIZE		128
+#define DCC_OUTBUF_SIZE		1024
+
+/* Lock to serialize access to DCC fifo */
+static DEFINE_SPINLOCK(dcc_lock);
+
+static DEFINE_KFIFO(inbuf, unsigned char, DCC_INBUF_SIZE);
+static DEFINE_KFIFO(outbuf, unsigned char, DCC_OUTBUF_SIZE);
+
 static void dcc_uart_console_putchar(struct uart_port *port, int ch)
 {
 	while (__dcc_getstatus() & DCC_STATUS_TX)
@@ -67,24 +81,176 @@  static int hvc_dcc_get_chars(uint32_t vt, char *buf, int count)
 	return i;
 }
 
+/*
+ * Check if the DCC is enabled. If CONFIG_HVC_DCC_SERIALIZE_SMP is enabled,
+ * then we assume then this function will be called first on core0. That way,
+ * dcc_core0_available will be true only if it's available on core0.
+ */
 static bool hvc_dcc_check(void)
 {
 	unsigned long time = jiffies + (HZ / 10);
+	static bool dcc_core0_available;
+
+	/*
+	 * If we're not on core 0, but we previously confirmed that DCC is
+	 * active, then just return true.
+	 */
+	int cpu = get_cpu();
+
+	if (IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP) && cpu && dcc_core0_available) {
+		put_cpu();
+		return true;
+	}
+
+	put_cpu();
 
 	/* Write a test character to check if it is handled */
 	__dcc_putchar('\n');
 
 	while (time_is_after_jiffies(time)) {
-		if (!(__dcc_getstatus() & DCC_STATUS_TX))
+		if (!(__dcc_getstatus() & DCC_STATUS_TX)) {
+			dcc_core0_available = true;
 			return true;
+		}
 	}
 
 	return false;
 }
 
+/*
+ * Workqueue function that writes the output FIFO to the DCC on core 0.
+ */
+static void dcc_put_work(struct work_struct *work)
+{
+	unsigned char ch;
+	unsigned long irqflags;
+
+	spin_lock_irqsave(&dcc_lock, irqflags);
+
+	/* While there's data in the output FIFO, write it to the DCC */
+	while (kfifo_get(&outbuf, &ch))
+		hvc_dcc_put_chars(0, &ch, 1);
+
+	/* While we're at it, check for any input characters */
+	while (!kfifo_is_full(&inbuf)) {
+		if (!hvc_dcc_get_chars(0, &ch, 1))
+			break;
+		kfifo_put(&inbuf, ch);
+	}
+
+	spin_unlock_irqrestore(&dcc_lock, irqflags);
+}
+
+static DECLARE_WORK(dcc_pwork, dcc_put_work);
+
+/*
+ * Workqueue function that reads characters from DCC and puts them into the
+ * input FIFO.
+ */
+static void dcc_get_work(struct work_struct *work)
+{
+	unsigned char ch;
+	unsigned long irqflags;
+
+	/*
+	 * Read characters from DCC and put them into the input FIFO, as
+	 * long as there is room and we have characters to read.
+	 */
+	spin_lock_irqsave(&dcc_lock, irqflags);
+
+	while (!kfifo_is_full(&inbuf)) {
+		if (!hvc_dcc_get_chars(0, &ch, 1))
+			break;
+		kfifo_put(&inbuf, ch);
+	}
+	spin_unlock_irqrestore(&dcc_lock, irqflags);
+}
+
+static DECLARE_WORK(dcc_gwork, dcc_get_work);
+
+/*
+ * Write characters directly to the DCC if we're on core 0 and the FIFO
+ * is empty, or write them to the FIFO if we're not.
+ */
+static int hvc_dcc0_put_chars(u32 vt, const char *buf, int count)
+{
+	int len;
+	unsigned long irqflags;
+
+	if (!IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP))
+		return hvc_dcc_put_chars(vt, buf, count);
+
+	spin_lock_irqsave(&dcc_lock, irqflags);
+	if (smp_processor_id() || (!kfifo_is_empty(&outbuf))) {
+		len = kfifo_in(&outbuf, buf, count);
+		spin_unlock_irqrestore(&dcc_lock, irqflags);
+
+		/*
+		 * We just push data to the output FIFO, so schedule the
+		 * workqueue that will actually write that data to DCC.
+		 * CPU hotplug is disabled so CPU0 cannot be offlined
+		 * after the cpu online check.
+		 */
+		if (cpu_online(0))
+			schedule_work_on(0, &dcc_pwork);
+
+		return len;
+	}
+
+	/*
+	 * If we're already on core 0, and the FIFO is empty, then just
+	 * write the data to DCC.
+	 */
+	len = hvc_dcc_put_chars(vt, buf, count);
+	spin_unlock_irqrestore(&dcc_lock, irqflags);
+
+	return len;
+}
+
+/*
+ * Read characters directly from the DCC if we're on core 0 and the FIFO
+ * is empty, or read them from the FIFO if we're not.
+ */
+static int hvc_dcc0_get_chars(u32 vt, char *buf, int count)
+{
+	int len;
+	unsigned long irqflags;
+
+	if (!IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP))
+		return hvc_dcc_get_chars(vt, buf, count);
+
+	spin_lock_irqsave(&dcc_lock, irqflags);
+
+	if (smp_processor_id() || (!kfifo_is_empty(&inbuf))) {
+		len = kfifo_out(&inbuf, buf, count);
+		spin_unlock_irqrestore(&dcc_lock, irqflags);
+
+		/*
+		 * If the FIFO was empty, there may be characters in the DCC
+		 * that we haven't read yet.  Schedule a workqueue to fill
+		 * the input FIFO, so that the next time this function is
+		 * called, we'll have data. CPU hotplug is disabled so CPU0
+		 * cannot be offlined after the cpu online check.
+		 */
+		if (!len && cpu_online(0))
+			schedule_work_on(0, &dcc_gwork);
+
+		return len;
+	}
+
+	/*
+	 * If we're already on core 0, and the FIFO is empty, then just
+	 * read the data from DCC.
+	 */
+	len = hvc_dcc_get_chars(vt, buf, count);
+	spin_unlock_irqrestore(&dcc_lock, irqflags);
+
+	return len;
+}
+
 static const struct hv_ops hvc_dcc_get_put_ops = {
-	.get_chars = hvc_dcc_get_chars,
-	.put_chars = hvc_dcc_put_chars,
+	.get_chars = hvc_dcc0_get_chars,
+	.put_chars = hvc_dcc0_put_chars,
 };
 
 static int __init hvc_dcc_console_init(void)
@@ -108,6 +274,9 @@  static int __init hvc_dcc_init(void)
 	if (!hvc_dcc_check())
 		return -ENODEV;
 
+	if (IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP))
+		cpu_hotplug_disable();
+
 	p = hvc_alloc(0, 0, &hvc_dcc_get_put_ops, 128);
 
 	return PTR_ERR_OR_ZERO(p);