Message ID: 20250305022154.3903128-1-ming.lei@redhat.com
State:      New
Series:     [V3] block: fix conversion of GPT partition name to 7-bit
On Wed, 2025-03-05 at 10:21 +0800, Ming Lei wrote:
> From: Olivier Gayot <olivier.gayot@canonical.com>
>
> The utf16_le_to_7bit function claims to, naively, convert a UTF-16
> string to a 7-bit ASCII string. By naively, we mean that it:
> * drops the first byte of every character in the original UTF-16 string
> * checks if all characters are printable, and otherwise replaces them
>   by exclamation mark "!".
>
> This means that theoretically, all characters outside the 7-bit ASCII
> range should be replaced by another character. Examples:
>
> * lower-case alpha (ɒ) 0x0252 becomes 0x52 (R)
> * ligature OE (œ) 0x0153 becomes 0x53 (S)
> * hangul letter pieup (ㅂ) 0x3142 becomes 0x42 (B)
> * upper-case gamma (Ɣ) 0x0194 becomes 0x94 (not printable) so gets
>   replaced by "!"

Also any character with low 8 bits equal to 0 terminates the string.

> The result of this conversion for the GPT partition name is passed to
> user-space as PARTNAME via udev, which is confusing and feels
> questionable.

Indeed.  But this change seems to make it worse!

[...]

> This results in many values which should be replaced by "!" to be kept
> as-is, despite not being valid 7-bit ASCII. Examples:
>
> * e with acute accent (é) 0x00E9 becomes 0xE9 - kept as-is because
>   isprint(0xE9) returns 1.
>
> * euro sign (€) 0x20AC becomes 0xAC - kept as-is because isprint(0xAC)
>   returns 1.

[...]

> --- a/block/partitions/efi.c
> +++ b/block/partitions/efi.c
> @@ -682,7 +682,7 @@ static void utf16_le_to_7bit(const __le16 *in, unsigned int size, u8 *out)
> 	out[size] = 0;
> 
> 	while (i < size) {
> -		u8 c = le16_to_cpu(in[i]) & 0xff;
> +		u8 c = le16_to_cpu(in[i]) & 0x7f;
> 
> 		if (c && !isprint(c))
> 			c = '!';

Now we map 'é' to 'i' and '€' to ','.  Didn't we want to map them to
'!'?

We shouldn't mask the input character; instead we should do a range
check before calling isprint().  Something like:

	u16 uc = le16_to_cpu(in[i]);
	u8 c;

	if (uc < 0x80 && (uc == 0 || isprint(uc)))
		c = uc;
	else
		c = '!';

Ben.
> We shouldn't mask the input character; instead we should do a range
> check before calling isprint().  Something like:
>
> 	u16 uc = le16_to_cpu(in[i]);
> 	u8 c;
>
> 	if (uc < 0x80 && (uc == 0 || isprint(uc)))
> 		c = uc;
> 	else
> 		c = '!';

Sounds like a good alternative to me.

Would a conversion from utf-16-le to utf-8 be a viable solution?
On Fri, 2025-03-28 at 10:29 +0100, Olivier Gayot wrote:
> > We shouldn't mask the input character; instead we should do a range
> > check before calling isprint().  Something like:
> >
> > 	u16 uc = le16_to_cpu(in[i]);
> > 	u8 c;
> >
> > 	if (uc < 0x80 && (uc == 0 || isprint(uc)))
> > 		c = uc;
> > 	else
> > 		c = '!';
>
> Sounds like a good alternative to me.
>
> Would a conversion from utf-16-le to utf-8 be a viable solution?

If we were adding partition name support today I think UTF-8 conversion
would be reasonable, but I would guess that there are now consumers that
depend on the output being restricted to ASCII.  Also I think we would
still want to replace non-printable code points, and we don't have
iswprint() in the kernel.

Ben.
On Thu, 27 Mar 2025 23:34:29 +0100
Ben Hutchings <ben@decadent.org.uk> wrote:
> On Wed, 2025-03-05 at 10:21 +0800, Ming Lei wrote:
[...]
> We shouldn't mask the input character; instead we should do a range
> check before calling isprint().  Something like:
>
> 	u16 uc = le16_to_cpu(in[i]);
> 	u8 c;
>
> 	if (uc < 0x80 && (uc == 0 || isprint(uc)))
> 		c = uc;
> 	else
> 		c = '!';

Given that, for 7-bit ascii, isprint(uc) is equivalent to:

	(uc >= 0x20 && uc <= 0x7e)

you can do:

	if (uc >= 0x20 && uc <= 0x7e)
		c = uc;
	else
		c = uc ? '!' : 0;

	David
diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index 5e9be13a56a8..7acba66eed48 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -682,7 +682,7 @@ static void utf16_le_to_7bit(const __le16 *in, unsigned int size, u8 *out)
 	out[size] = 0;
 
 	while (i < size) {
-		u8 c = le16_to_cpu(in[i]) & 0xff;
+		u8 c = le16_to_cpu(in[i]) & 0x7f;
 
 		if (c && !isprint(c))
 			c = '!';