From patchwork Thu Apr 10 01:13:53 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880091 Received: from fout-a6-smtp.messagingengine.com (fout-a6-smtp.messagingengine.com [103.168.172.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 808D759B71; Thu, 10 Apr 2025 01:19:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.149 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247945; cv=none; b=a6bw+1HDo8vkEaAQkn89NWTavjHXoj8dHjAOffzIo0oze4cfWY5xzC3k6HwfvZMU5nnWpy4XhZSHXopqdgkSxOTRRckzP5KGL+62nUOKaJkBK0INJir78OH+GkpGNxPF7hQ6jajZ8AHgIcNzwPNluYF2C9JrGl88gd4RS8xJ3/o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247945; c=relaxed/simple; bh=ft54p6bhTxDj+z0qJTAsmGd30EjVDhq/ynj+X5E4H/g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=C9tEA/j5SIqi0GPz9wpL5gTjN2kMgUFYuZfWcHRJRedCkn7hoMwtDb/suOLssmd/JhI+AZqrfhKnwJWhHgAhQFtM7b98nZdvU/iNSuE4kwBkBkJh5duFAZ9FufojCjBW1erF8K0yxBIUje+B07Xp2bCSL1hjQF/pOEEc2Oz4Teo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=GYT7VLYQ; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=VzdpraBe; arc=none smtp.client-ip=103.168.172.149 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="GYT7VLYQ"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="VzdpraBe" Received: from phl-compute-09.internal (phl-compute-09.phl.internal [10.202.2.49]) by mailfout.phl.internal (Postfix) with ESMTP id 9D08813801DE; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-09.internal (MEProxy); Wed, 09 Apr 2025 21:19:02 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247942; x= 1744334342; bh=zNG1YKUsmyDP/ojfVRvL6wcBsNKK7C86m9dSyKIqDh8=; b=G YT7VLYQrEkgcvd7ErWD6swkG3DzXD+IJ8RlVU/KQL4F47YXMH9mbiKBUAAzpmfsf xFv7Q/Dm90XbQg+6zz1JaEQMwsjPDQjSBEwz4RcXl7kTQdcZZY5Xp7w7G48dLx29 oiyOWsL9+0N+eAXLLXlrrHrT+WgkLDClEj64DJvfNxirCKgUt0LEi2M5xJM4THoK Q+Rw350vPe7fLHrIOlvrvaVeIg921JwPqRN/mGZiHkgWujGOZIZ+dSTbmL15lBuE F8dEs5gOc7ziX990LDct74+ut0ZNHNfNYAOIcaHyhk36rXYF4gvCcSbV9GlTaKDM 9vCFyL21r8Gy+UTKO5/dw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247942; x=1744334342; bh=z NG1YKUsmyDP/ojfVRvL6wcBsNKK7C86m9dSyKIqDh8=; b=VzdpraBeULecAOql4 JLZmmktWXvSRiH8gU4/YEoABf63A9fboOq8oqTSa5YgbJrwAYP6dPl/YaUa8fAeB YU8nZkcSWcn4161XoAyQzP81aqhG+r97lOwdNVeqrFSgTK6xF00qJMp/s7N78WoI uijf/MCGPgp4+TuAIFOuRLlZHoPEyanXlNc9MKT/U2Qfmzi+0xMZx7XGGOKckQbp Z8e9bQRs02Q3Lr24fRKmhFKBUBIhQmDtNRQVSzZ0aEywoIGTtdaQrx9I800tvo4a JrdkwzqcBpLcSZ4FrWo+iDilcm7J+bwRVwMzjKd1q9Qkq0mSMxJJy9Zzu+6Knn28 4VNJA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheehucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeeipdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtohepuggrvhgvsehmihgvlhhkvgdrtggtpdhrtghpthhtoheplhhinhhu gidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinh hugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 6A8EF10D8B6E; Wed, 9 Apr 2025 21:19:01 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 01/11] vt: minor cleanup to vc_translate_unicode() Date: Wed, 9 Apr 2025 21:13:53 -0400 Message-ID: <20250410011839.64418-2-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Make it clearer when a sequence is bad. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/vt.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index f5642b3038..b5f3c8a818 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -2817,7 +2817,7 @@ static int vc_translate_unicode(struct vc_data *vc, int c, bool *rescan) if ((c & 0xc0) == 0x80) { /* Unexpected continuation byte? */ if (!vc->vc_utf_count) - return 0xfffd; + goto bad_sequence; vc->vc_utf_char = (vc->vc_utf_char << 6) | (c & 0x3f); vc->vc_npar++; @@ -2829,17 +2829,17 @@ static int vc_translate_unicode(struct vc_data *vc, int c, bool *rescan) /* Reject overlong sequences */ if (c <= utf8_length_changes[vc->vc_npar - 1] || c > utf8_length_changes[vc->vc_npar]) - return 0xfffd; + goto bad_sequence; return vc_sanitize_unicode(c); } /* Single ASCII byte or first byte of a sequence received */ if (vc->vc_utf_count) { - /* Continuation byte expected */ + /* A continuation byte was expected */ *rescan = true; vc->vc_utf_count = 0; - return 0xfffd; + goto bad_sequence; } /* Nothing to do if an ASCII byte was received */ @@ -2858,11 +2858,14 @@ static int vc_translate_unicode(struct vc_data *vc, int c, bool *rescan) vc->vc_utf_count = 3; vc->vc_utf_char = (c & 0x07); } else { - return 0xfffd; + goto bad_sequence; } need_more_bytes: return -1; + +bad_sequence: + return 0xfffd; } static int vc_translate(struct vc_data *vc, int *c, bool *rescan) From patchwork Thu Apr 10 01:13:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880089 Received: from fhigh-a1-smtp.messagingengine.com (fhigh-a1-smtp.messagingengine.com [103.168.172.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8CD0B81ACA; Thu, 10 Apr 2025 01:19:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247946; cv=none; b=E+6M9epmns2RKvhSWogDhAvGm7PllT3lNvbq5q4QOQlz3zPxalvAGOeH0tNSW69KHEGaTviyFDaU72REy3aZRMkz2QNYJBuLrtEJ97FNDopKhdAIFJzgQ16DGsLFCSau1cERip2OYpcpm4okswZ0i1WQL/Sht8lkqzdZzysmIKA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247946; c=relaxed/simple; bh=EPQ2+T5lvQ77EhfT2FB/qC2cOFBtiEGp1xB+ycffcgU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HR0gcR+8sFKiWwNTwzIMMto9TlBf7JYi3RN98lDl8KZBiMG7nl6X49Ol1MW8IBf+xHTfGxtxYKJF5j/nC6lnp2XOUBwBetOa70MKi4py2GVhvzv6aIeT72mdE+mzMxk2+YrjTc1HGiLLOHpwSOzrZLoblSVVhoWI2xJy4R3Adck= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=JC/H7c1x; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=D1I7SxNz; arc=none smtp.client-ip=103.168.172.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="JC/H7c1x"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="D1I7SxNz" Received: from phl-compute-06.internal (phl-compute-06.phl.internal [10.202.2.46]) by mailfhigh.phl.internal (Postfix) with ESMTP id 9E7C1114028B; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-06.internal (MEProxy); Wed, 09 Apr 2025 21:19:02 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247942; x= 1744334342; bh=a3C+3gt5QH+yfaJkHSgYOXuOg6LaWFH4BONRE0arKmk=; b=J C/H7c1xFAAgOmJ6shNYN4aRSpVAXhfSiivW6TGutdzopblZGzWOWEu+BxrPyaR4h mMI3rRl7LiLDDSqUQyA7gV4jGeDbMRsAjmwGk3QhVY+bXzIfe4z6Hv8U/sjDmf/O HFRY+IXCY3h8IjTFqmdRM1XPeGRaNSDYux68csEjIPoZLy6NbLcpjwq4d586kRDq Fd75ZOFArKm6XUcMVAuCvclnwy7eZ/0angkjJ3PuoYrgYmWyC2bj5xBEpv4WUPhg 9I/a51eSfcDozVMT0+sevbTrAY/kaEMLiixBg4hCneoUwitjsZCUcTGwzNrKGH70 U35UM3cQnD9w+D1gavZ8w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247942; x=1744334342; bh=a 3C+3gt5QH+yfaJkHSgYOXuOg6LaWFH4BONRE0arKmk=; b=D1I7SxNzyb0fIq56Z SrR4FfnOz4m0FdT2S7R++JJrCvUlfGvYa7+eYHlfdt/S4qWcasl/OIuaIAWPOt5N q046Sl/U+sfVDqOM2JrzWqaM9K0cMv+C37Wer76NtlQ2BdwSfvYdBaPbTObp6IUA XMz4yVsTkmYrMp7verkk2X7j+t3TVirkJAGYulsVcpB3XrPprftx4fivdfIRw/T8 Y9oraBP0AsKJy6KDaZEOBZ72thbKHSnfZdSiqnC6NTSRFbt0IIqVqwhxleyW5YH3 MmIcdXKA6KPCYmdzc6TsVYtvqLDnValHAyPTovhRMFVBGqoPIbRZxMQFhAxn4MlE G16CA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheehucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnhepleektdffjeevfefhiedtudevudetjeejvdej uedtvdevveefuedujeejffetieeknecuffhomhgrihhnpegtrghmrdgrtgdruhhknecuve hluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepnhhitghosehf lhhugihnihgtrdhnvghtpdhnsggprhgtphhtthhopeeipdhmohguvgepshhmthhpohhuth dprhgtphhtthhopehnphhithhrvgessggrhihlihgsrhgvrdgtohhmpdhrtghpthhtohep jhhirhhishhlrggshieskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepghhrvghgkhhhse hlihhnuhigfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtohepuggrvhgvsehmihgv lhhkvgdrtggtpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrh hnvghlrdhorhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgv rhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 8858E10D8B70; Wed, 9 Apr 2025 21:19:01 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 02/11] vt: move unicode processing to a separate file Date: Wed, 9 Apr 2025 21:13:54 -0400 Message-ID: <20250410011839.64418-3-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre This will make it easier to maintain. Also make it depend on CONFIG_CONSOLE_TRANSLATIONS. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/Makefile | 3 ++- drivers/tty/vt/ucs_width.c | 45 ++++++++++++++++++++++++++++++++++++++ drivers/tty/vt/vt.c | 40 +-------------------------------- include/linux/consolemap.h | 6 +++++ 4 files changed, 54 insertions(+), 40 deletions(-) create mode 100644 drivers/tty/vt/ucs_width.c diff --git a/drivers/tty/vt/Makefile b/drivers/tty/vt/Makefile index 2c8ce8b592..bee69277bb 100644 --- a/drivers/tty/vt/Makefile +++ b/drivers/tty/vt/Makefile @@ -7,7 +7,8 @@ FONTMAPFILE = cp437.uni obj-$(CONFIG_VT) += vt_ioctl.o vc_screen.o \ selection.o keyboard.o \ vt.o defkeymap.o -obj-$(CONFIG_CONSOLE_TRANSLATIONS) += consolemap.o consolemap_deftbl.o +obj-$(CONFIG_CONSOLE_TRANSLATIONS) += consolemap.o consolemap_deftbl.o \ + ucs_width.o # Files generated that shall be removed upon make clean clean-files := consolemap_deftbl.c defkeymap.c diff --git a/drivers/tty/vt/ucs_width.c b/drivers/tty/vt/ucs_width.c new file mode 100644 index 0000000000..5f0bde30a1 --- /dev/null +++ b/drivers/tty/vt/ucs_width.c @@ -0,0 +1,45 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include + +/* ucs_is_double_width() is based on the wcwidth() implementation by + * Markus Kuhn -- 2007-05-26 (Unicode 5.0) + * Latest version: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c + */ + +struct interval { + uint32_t first; + uint32_t last; +}; + +static int ucs_cmp(const void *key, const void *elt) +{ + uint32_t cp = *(uint32_t *)key; + struct interval e = *(struct interval *) elt; + + if (cp > e.last) + return 1; + else if (cp < e.first) + return -1; + return 0; +} + +static const struct interval double_width[] = { + { 0x1100, 0x115F }, { 0x2329, 0x232A }, { 0x2E80, 0x303E }, + { 0x3040, 0xA4CF }, { 0xAC00, 0xD7A3 }, { 0xF900, 0xFAFF }, + { 0xFE10, 0xFE19 }, { 0xFE30, 0xFE6F }, { 0xFF00, 0xFF60 }, + { 0xFFE0, 0xFFE6 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } +}; + +bool ucs_is_double_width(uint32_t cp) +{ + if (cp < double_width[0].first || + cp > double_width[ARRAY_SIZE(double_width) - 1].last) + return false; + + return bsearch(&cp, double_width, ARRAY_SIZE(double_width), + sizeof(struct interval), ucs_cmp) != NULL; +} diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index b5f3c8a818..bcb508bc15 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -104,7 +104,6 @@ #include #include #include -#include #include #define MAX_NR_CON_DRIVER 16 @@ -2712,43 +2711,6 @@ static void do_con_trol(struct tty_struct *tty, struct vc_data *vc, u8 c) } } -/* is_double_width() is based on the wcwidth() implementation by - * Markus Kuhn -- 2007-05-26 (Unicode 5.0) - * Latest version: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c - */ -struct interval { - uint32_t first; - uint32_t last; -}; - -static int ucs_cmp(const void *key, const void *elt) -{ - uint32_t ucs = *(uint32_t *)key; - struct interval e = *(struct interval *) elt; - - if (ucs > e.last) - return 1; - else if (ucs < e.first) - return -1; - return 0; -} - -static int is_double_width(uint32_t ucs) -{ - static const struct interval double_width[] = { - { 0x1100, 0x115F }, { 0x2329, 0x232A }, { 0x2E80, 0x303E }, - { 0x3040, 0xA4CF }, { 0xAC00, 0xD7A3 }, { 0xF900, 0xFAFF }, - { 0xFE10, 0xFE19 }, { 0xFE30, 0xFE6F }, { 0xFF00, 0xFF60 }, - { 0xFFE0, 0xFFE6 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } - }; - if (ucs < double_width[0].first || - ucs > double_width[ARRAY_SIZE(double_width) - 1].last) - return 0; - - return bsearch(&ucs, double_width, ARRAY_SIZE(double_width), - sizeof(struct interval), ucs_cmp) != NULL; -} - struct vc_draw_region { unsigned long from, to; int x; @@ -2953,7 +2915,7 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, bool inverse = false; if (vc->vc_utf && !vc->vc_disp_ctrl) { - if (is_double_width(c)) + if (ucs_is_double_width(c)) width = 2; } diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index c35db4896c..caf079bcb8 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -28,6 +28,7 @@ int conv_uni_to_pc(struct vc_data *conp, long ucs); u32 conv_8bit_to_uni(unsigned char c); int conv_uni_to_8bit(u32 uni); void console_map_init(void); +bool ucs_is_double_width(uint32_t cp); #else static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode) @@ -57,6 +58,11 @@ static inline int conv_uni_to_8bit(u32 uni) } static inline void console_map_init(void) { } + +static inline bool ucs_is_double_width(uint32_t cp) +{ + return false; +} #endif /* CONFIG_CONSOLE_TRANSLATIONS */ #endif /* __LINUX_CONSOLEMAP_H__ */ From patchwork Thu Apr 10 01:13:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880897 Received: from fhigh-a1-smtp.messagingengine.com (fhigh-a1-smtp.messagingengine.com [103.168.172.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8CD4E13C695; Thu, 10 Apr 2025 01:19:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247945; cv=none; b=r0SpE8qO18kCDW4RrhgAI7WjwNdSx/N0hhbq38ONTZ0C5lesXfbO4JmYgUXzgQZsW85yvIdtqqT/EI3W7/8JoDIvteGnN9zZFHfvwdzt3YS2uibSxp7BlBH2NvvU9SDhI/MCPzofVk+eQc8sXHGvvHWXlYd6xVr7iem3ulsZHnY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247945; c=relaxed/simple; bh=gcqSMWGe//HpgJ0uuOoj1Z1QowqSEzpoSR6AckfBzdQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RD/Il6qOheX3gIOHbH16wNWt3lVH+zPmAV+1d0aN0ZVMNeOsX+wDjKBpqJvnnCggHrBNygohZCi4DM14Z6f3VUhtgL0rJRZ7HZa9+JJG0KxrkrWc2Uk9g0eAIz+4hq5kGdSalLYxOXiyuuQM8tx/Hv814exHCdaTIcCk3z3MJx0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=gf3ZRIZC; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=GBeUHmGY; arc=none smtp.client-ip=103.168.172.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="gf3ZRIZC"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="GBeUHmGY" Received: from phl-compute-09.internal (phl-compute-09.phl.internal [10.202.2.49]) by mailfhigh.phl.internal (Postfix) with ESMTP id 9D21A1140262; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-09.internal (MEProxy); Wed, 09 Apr 2025 21:19:02 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247942; x= 1744334342; bh=ugSjOtKUvF1Fs7tFsnjRkTf9rFKxSG6I+VHXKwAvhMo=; b=g f3ZRIZCdkckuHfg1d2wYFeP0N9sdRsLmEGzzqZvbos9YL5BySGOLp0uLNBPPXGvM Qz3cHFK4ZEtnP+Fkkr8zxdGgorr3Nj072q+iZTeABgL9z4lD4XanP3QjuA+HMwFZ agazFEU562UfsTfGOsGrGUyj5p9dHoG1eHgmow0EN1Hozwi38R99hmIEKM+KJlZ8 jW/RU+eFscOlLCEF3BOXoNudhhSEgCYBquWzY8eGVAVyND6q/apurIsEnmvpGRGP gncfBkC4PIFef9M99IuUkJDQRL2dj/VAxB+8qJktEFR5kMROfe3DKoBdIKuSW1x/ Hpi9EelNxgyCFGe2fk8UQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247942; x=1744334342; bh=u gSjOtKUvF1Fs7tFsnjRkTf9rFKxSG6I+VHXKwAvhMo=; b=GBeUHmGY/8BqjhnfN UH0B4cm5qq7kf1Aw1OwHNgvyPUzpxc0II7tesTX9Pj9ZDvu/033f7GZuu5MKmdm2 tzPdILrjOOi1UidY06QcFiTxg26jnva/T8KWM1G/BEI+VFDiDdoR5+jdfmgsS2+B M4+qgsedv5ruOeMiyaQwXV1VKziT9B4LfEODWjlppaThRuLIvU81YeF4PIeTiR9Y RR06OKzT83Rud7rhZwCThh6zWqcKjSlQwd8V8maE+i82GyrV/UGoMXCpGDhZHTmv rv+YWYGMkJHDOKinxeTJj2bc6dNrWQoJAcaLzFx1jGGANTfTL9aE/4UQptMuO1AX wDkOg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheehucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeeipdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtohepuggrvhgvsehmihgvlhhkvgdrtggtpdhrtghpthhtoheplhhinhhu gidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinh hugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id A5C0C10D8B71; Wed, 9 Apr 2025 21:19:01 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 03/11] vt: properly support zero-width Unicode code points Date: Wed, 9 Apr 2025 21:13:55 -0400 Message-ID: <20250410011839.64418-4-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Zero-width Unicode code points are causing misalignment in vertically aligned content, disrupting the visual layout. Let's handle zero-width code points more intelligently. Double-width code points are stored in the screen grid followed by a white space code point to create the expected screen layout. When a double-width code point is followed by a zero-width code point in the console incoming bytestream (e.g., an emoji with a presentation selector) then we may replace the white space padding by that zero-width code point instead of dropping it. This maximize screen content information while preserving proper layout. If a zero-width code point is preceded by a single-width code point then the above trick is not possible and such zero-width code point must be dropped. VS16 (Variation Selector 16, U+FE0F) is special as it doubles the width of the preceding single-width code point. We handle that case by giving VS16 a width of 1 when that happens. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/vt.c | 46 ++++++++++++++++++++++++++++++++++++-- include/linux/consolemap.h | 10 +++++++++ 2 files changed, 54 insertions(+), 2 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index bcb508bc15..5d53feeb5d 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -443,6 +443,15 @@ static void vc_uniscr_scroll(struct vc_data *vc, unsigned int top, } } +static u32 vc_uniscr_getc(struct vc_data *vc, int relative_pos) +{ + int pos = vc->state.x + vc->vc_need_wrap + relative_pos; + + if (vc->vc_uni_lines && pos >= 0 && pos < vc->vc_cols) + return vc->vc_uni_lines[vc->state.y][pos]; + return 0; +} + static void vc_uniscr_copy_area(u32 **dst_lines, unsigned int dst_cols, unsigned int dst_rows, @@ -2905,18 +2914,49 @@ static bool vc_is_control(struct vc_data *vc, int tc, int c) return false; } +static void vc_con_rewind(struct vc_data *vc) +{ + if (vc->state.x && !vc->vc_need_wrap) { + vc->vc_pos -= 2; + vc->state.x--; + } + vc->vc_need_wrap = 0; +} + static int vc_con_write_normal(struct vc_data *vc, int tc, int c, struct vc_draw_region *draw) { - int next_c; + int next_c, prev_c; unsigned char vc_attr = vc->vc_attr; u16 himask = vc->vc_hi_font_mask, charmask = himask ? 0x1ff : 0xff; u8 width = 1; bool inverse = false; if (vc->vc_utf && !vc->vc_disp_ctrl) { - if (ucs_is_double_width(c)) + if (ucs_is_double_width(c)) { width = 2; + } else if (ucs_is_zero_width(c)) { + prev_c = vc_uniscr_getc(vc, -1); + if (prev_c == ' ' && + ucs_is_double_width(vc_uniscr_getc(vc, -2))) { + /* + * Let's merge this zero-width code point with + * the preceding double-width code point by + * replacing the existing whitespace padding. + */ + vc_con_rewind(vc); + } else if (c == 0xfe0f && prev_c != 0) { + /* + * VS16 (U+FE0F) is special. Let it have a + * width of 1 when preceded by a single-width + * code point effectively making the later + * double-width. + */ + } else { + /* Otherwise zero-width code points are ignored */ + goto out; + } + } } /* Now try to find out how to display it */ @@ -2995,6 +3035,8 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, tc = ' '; next_c = ' '; } + +out: notify_write(vc, c); if (inverse) diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index caf079bcb8..7d778752dc 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -29,6 +29,11 @@ u32 conv_8bit_to_uni(unsigned char c); int conv_uni_to_8bit(u32 uni); void console_map_init(void); bool ucs_is_double_width(uint32_t cp); +static inline bool ucs_is_zero_width(uint32_t cp) +{ + /* coming soon */ + return false; +} #else static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode) @@ -63,6 +68,11 @@ static inline bool ucs_is_double_width(uint32_t cp) { return false; } + +static inline bool ucs_is_zero_width(uint32_t cp) +{ + return false; +} #endif /* CONFIG_CONSOLE_TRANSLATIONS */ #endif /* __LINUX_CONSOLEMAP_H__ */ From patchwork Thu Apr 10 01:13:56 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880896 Received: from fout-a6-smtp.messagingengine.com (fout-a6-smtp.messagingengine.com [103.168.172.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D488915E5D4; Thu, 10 Apr 2025 01:19:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.149 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247946; cv=none; b=bTsa9+PDc8nHpHhlO5TIRRXqdxiyKOp8Cr8t55dUfZ71a2R8gM2vM6y0+U9kto3h91oiGxwpCPPncp4pT+xNKRvrku3zwwF+M/R6eLGi+a6Cp95vx7dTWmJQMFyPlU9FfazyzPN2YTSyXcxM7cAfo/Rhyt0mBEXlZuWGEod28eY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247946; c=relaxed/simple; bh=JWVDkb37DmVyEIa4U1AGeH1RPcoUmuR/KvVZMgpL+So=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZcJiPbth9E+4Fr45Rg34MkXdXPXQa6du7k86GlwXSNgA2lb3pIkp1/CchBubY0Z5nfLLMxjmMGeodVS7bcy7MetAnYWcFE5PdGercicEBZG5GpDnzYj/kfLDFGi1sYQElVUV5NKr/n2xLWyyucXrFJ6lXxIo2UDh6mha7lOV5iM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=O4LeSxL8; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=WsRvfgEm; arc=none smtp.client-ip=103.168.172.149 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="O4LeSxL8"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="WsRvfgEm" Received: from phl-compute-09.internal (phl-compute-09.phl.internal [10.202.2.49]) by mailfout.phl.internal (Postfix) with ESMTP id 03B9213801DD; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-09.internal (MEProxy); Wed, 09 Apr 2025 21:19:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247942; x= 1744334342; bh=wOcZX8QRVBD6u9gZyTxUvB/nb1jqWPHTd6PJyhYwdEk=; b=O 4LeSxL81KnSBlmRf0DP8UpV/abZ/eTVIOC0SqzqmfbL182DCWQhYK8TBeTMBClqL VZDhU42dO4HN9c9SRejzI3sxPAqrPKSplk0Q7I1dyHWZlLfcHY+wettcVbHQHK2V Zcfn/dQOkWZdVi3A1qJLyLIDetOotwoSzHRIt8j25ue80/qDnfjtlqVpWLo7cO3o EbV8Z5yxdqDJmQClwIfmuaos13+T/HrytXilk2uO3FENjD6CGRbK01/f9Iutuawo xfLWRKWqUakaRrezHyUcoX2rdkQTHLUnHB6bTZTDUHKvKCQ8zJH7S2zTSfJ92Wci HGrxyyumiU79IrM8grqqA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247942; x=1744334342; bh=w OcZX8QRVBD6u9gZyTxUvB/nb1jqWPHTd6PJyhYwdEk=; b=WsRvfgEmTjOt9SxiF QjsAx3vVqRuETwjiTAZ0acrmi+ecUy/9tKewHLP+kTwT54buIMWPWW5P1+l8j3eH /qZInBUyKjye3Pa7v64AB6nu9gma64QZbkNUbe7eQ8Xl0agXaxhYZK8IoIfPQIG0 pI67aRA7V/5U+hHGMb+tg2gznGpY6arsRsvR1P0U5aBKsJlceMuBZlYLh76roDQN 649h9enPa7vaBNsT4C0F3qjNoTNaKIEFm4j3RxKWp+azszAapVK3wcIsQ0A4P1Xf lRHkBuGXW77sXI+NvIs1H1piQD6QRJPpWtVMMG2kVzKKU51ETr7y5w7Yel2GdzrF TO6dw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheehucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeeipdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtohepuggrvhgvsehmihgvlhhkvgdrtggtpdhrtghpthhtoheplhhinhhu gidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinh hugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id C380410D8B73; Wed, 9 Apr 2025 21:19:01 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 04/11] vt: introduce gen_ucs_width.py to create ucs_width.c Date: Wed, 9 Apr 2025 21:13:56 -0400 Message-ID: <20250410011839.64418-5-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre The table in the current ucs_width.c is terribly out of date and incomplete. We also need a second table to store zero-width code points. Properly maintaining those tables manually is impossible. So here's a script to automatically generate them. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/gen_ucs_width.py | 264 ++++++++++++++++++++++++++++++++ 1 file changed, 264 insertions(+) create mode 100755 drivers/tty/vt/gen_ucs_width.py diff --git a/drivers/tty/vt/gen_ucs_width.py b/drivers/tty/vt/gen_ucs_width.py new file mode 100755 index 0000000000..41997fe001 --- /dev/null +++ b/drivers/tty/vt/gen_ucs_width.py @@ -0,0 +1,264 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# +# This script uses Python's unicodedata module to generate ucs_width.c + +import unicodedata +import sys + +def generate_ucs_width(): + # Output file name + c_file = "ucs_width.c" + + # Width data mapping + width_map = {} # Maps code points to width (0, 1, 2) + + # Define emoji modifiers and components that should have zero width + emoji_zero_width = [ + # Skin tone modifiers + (0x1F3FB, 0x1F3FF), # Emoji modifiers (skin tones) + + # Variation selectors (note: VS16 is treated specially in vt.c) + (0xFE00, 0xFE0F), # Variation Selectors 1-16 + + # Gender and hair style modifiers + (0x2640, 0x2640), # Female sign + (0x2642, 0x2642), # Male sign + (0x26A7, 0x26A7), # Transgender symbol + (0x1F9B0, 0x1F9B3), # Hair components (red, curly, white, bald) + + # Tag characters + (0xE0020, 0xE007E), # Tags + ] + + # Mark these emoji modifiers as zero-width + for start, end in emoji_zero_width: + for cp in range(start, end + 1): + try: + width_map[cp] = 0 + except (ValueError, OverflowError): + continue + + # Mark all regional indicators as single-width as they are usually paired + # providing a combined with of 2. + regional_indicators = (0x1F1E6, 0x1F1FF) # Regional indicator symbols A-Z + start, end = regional_indicators + for cp in range(start, end + 1): + try: + width_map[cp] = 1 + except (ValueError, OverflowError): + continue + + # Process all assigned Unicode code points (Basic Multilingual Plane + Supplementary Planes) + # Range 0x0 to 0x10FFFF (the full Unicode range) + for block_start in range(0, 0x110000, 0x1000): + block_end = block_start + 0x1000 + for cp in range(block_start, block_end): + try: + char = chr(cp) + + # Skip if already processed + if cp in width_map: + continue + + # Check if the character is a combining mark + category = unicodedata.category(char) + + # Combining marks, format characters, zero-width characters + if (category.startswith('M') or # Mark (combining) + (category == 'Cf' and cp not in (0x061C, 0x06DD, 0x070F, 0x180E, 0x200F, 0x202E, 0x2066, 0x2067, 0x2068, 0x2069)) or + cp in (0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF)): # Known zero-width characters + width_map[cp] = 0 + continue + + # Use East Asian Width property + eaw = unicodedata.east_asian_width(char) + + if eaw in ('F', 'W'): # Fullwidth or Wide + width_map[cp] = 2 + elif eaw in ('Na', 'H', 'N', 'A'): # Narrow, Halfwidth, Neutral, Ambiguous + width_map[cp] = 1 + else: + # Default to single-width for unknown + width_map[cp] = 1 + + except (ValueError, OverflowError): + # Skip invalid code points + continue + + # Process Emoji - generally double-width + # Ranges according to Unicode Emoji standard + emoji_ranges = [ + (0x1F000, 0x1F02F), # Mahjong Tiles + (0x1F0A0, 0x1F0FF), # Playing Cards + (0x1F300, 0x1F5FF), # Miscellaneous Symbols and Pictographs + (0x1F600, 0x1F64F), # Emoticons + (0x1F680, 0x1F6FF), # Transport and Map Symbols + (0x1F700, 0x1F77F), # Alchemical Symbols + (0x1F780, 0x1F7FF), # Geometric Shapes Extended + (0x1F800, 0x1F8FF), # Supplemental Arrows-C + (0x1F900, 0x1F9FF), # Supplemental Symbols and Pictographs + (0x1FA00, 0x1FA6F), # Chess Symbols + (0x1FA70, 0x1FAFF), # Symbols and Pictographs Extended-A + ] + + for start, end in emoji_ranges: + for cp in range(start, end + 1): + if cp not in width_map or width_map[cp] != 0: # Don't override zero-width + try: + char = chr(cp) + width_map[cp] = 2 + except (ValueError, OverflowError): + continue + + # Optimize to create range tables + def ranges_optimize(width_data, target_width): + points = sorted([cp for cp, width in width_data.items() if width == target_width]) + if not points: + return [] + + # Group consecutive code points into ranges + ranges = [] + start = points[0] + prev = start + + for cp in points[1:]: + if cp > prev + 1: + ranges.append((start, prev)) + start = cp + prev = cp + + # Add the last range + ranges.append((start, prev)) + return ranges + + # Extract ranges for each width + zero_width_ranges = ranges_optimize(width_map, 0) + double_width_ranges = ranges_optimize(width_map, 2) + + # Get Unicode version information + unicode_version = unicodedata.unidata_version + + # Generate C implementation file + with open(c_file, 'w') as f: + f.write(f"""\ +// SPDX-License-Identifier: GPL-2.0 +/* + * ucs_width.c - Unicode character width lookup + * + * Auto-generated by gen_ucs_width.py + * + * Unicode Version: {unicode_version} + */ + +#include +#include +#include +#include + +struct interval {{ + uint32_t first; + uint32_t last; +}}; + +/* Zero-width character ranges */ +static const struct interval zero_width_ranges[] = {{ +""") + + for start, end in zero_width_ranges: + try: + start_char_desc = unicodedata.name(chr(start)) if start < 0x10000 else f"U+{start:05X}" + if start == end: + comment = f"/* {start_char_desc} */" + else: + end_char_desc = unicodedata.name(chr(end)) if end < 0x10000 else f"U+{end:05X}" + comment = f"/* {start_char_desc} - {end_char_desc} */" + except: + if start == end: + comment = f"/* U+{start:05X} */" + else: + comment = f"/* U+{start:05X} - U+{end:05X} */" + + f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n") + + f.write("""\ +}; + +/* Double-width character ranges */ +static const struct interval double_width_ranges[] = { +""") + + for start, end in double_width_ranges: + try: + start_char_desc = unicodedata.name(chr(start)) if start < 0x10000 else f"U+{start:05X}" + if start == end: + comment = f"/* {start_char_desc} */" + else: + end_char_desc = unicodedata.name(chr(end)) if end < 0x10000 else f"U+{end:05X}" + comment = f"/* {start_char_desc} - {end_char_desc} */" + except: + if start == end: + comment = f"/* U+{start:05X} */" + else: + comment = f"/* U+{start:05X} - U+{end:05X} */" + + f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n") + + f.write("""\ +}; + + +static int ucs_cmp(const void *key, const void *element) +{ + uint32_t cp = *(uint32_t *)key; + const struct interval *e = element; + + if (cp > e->last) + return 1; + if (cp < e->first) + return -1; + return 0; +} + +static bool is_in_interval(uint32_t cp, const struct interval *intervals, size_t count) +{ + if (cp < intervals[0].first || cp > intervals[count - 1].last) + return false; + + return __inline_bsearch(&cp, intervals, count, + sizeof(*intervals), ucs_cmp) != NULL; +} + +/** + * Determine if a Unicode code point is zero-width. + * + * @param ucs: Unicode code point (UCS-4) + * Return: true if the character is zero-width, false otherwise + */ +bool ucs_is_zero_width(uint32_t cp) +{ + return is_in_interval(cp, zero_width_ranges, ARRAY_SIZE(zero_width_ranges)); +} + +/** + * Determine if a Unicode code point is double-width. + * + * @param ucs: Unicode code point (UCS-4) + * Return: true if the character is double-width, false otherwise + */ +bool ucs_is_double_width(uint32_t cp) +{ + return is_in_interval(cp, double_width_ranges, ARRAY_SIZE(double_width_ranges)); +} +""") + + # Print summary + zero_width_count = sum(end - start + 1 for start, end in zero_width_ranges) + double_width_count = sum(end - start + 1 for start, end in double_width_ranges) + + print(f"Generated {c_file} with:") + print(f"- {len(zero_width_ranges)} zero-width ranges covering ~{zero_width_count} code points") + print(f"- {len(double_width_ranges)} double-width ranges covering ~{double_width_count} code points") + +if __name__ == "__main__": + generate_ucs_width() From patchwork Thu Apr 10 01:13:57 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880894 Received: from fout-a6-smtp.messagingengine.com (fout-a6-smtp.messagingengine.com [103.168.172.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 447111684B4; Thu, 10 Apr 2025 01:19:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.149 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247947; cv=none; b=kwWFmh+k34xYrnZ7xeR119KDo4vddmfzQEZEKyeoR6pCTpyh1zE4sklY0vCw3v1L77A3s+WXlLX+GtfrcIvXYhPvgjSeKusHqC0qIe6nIMM2c8D1NYI08GUzLOpbSXbo+dLJ+dI/Mps5Lop5DZtw4KrqEuhduiPjDTVa0ZbRxYs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247947; c=relaxed/simple; bh=J1U+t4G6lHglnug+NI68RmpaEIHqeRXzg8opSv1VJqc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HUU7aSd/vSYPDgwco7Y8NE5s7tc0nXH0KeSGAv6NwGbL+nmbukh3aGqNcncsc/GDK6YVXxJQCI+ayp75k6maCshcSHshH3sr2zH61bfQgkbmEGpn/G51jOZ69VM6ypA+Jt7nvHuskM7raDBCeJVNjsRpvZSluCk1tvbvhLR5AMo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=PEY06lps; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=SYJLdwWS; arc=none smtp.client-ip=103.168.172.149 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="PEY06lps"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="SYJLdwWS" Received: from phl-compute-06.internal (phl-compute-06.phl.internal [10.202.2.46]) by mailfout.phl.internal (Postfix) with ESMTP id 544D913801E9; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-06.internal (MEProxy); Wed, 09 Apr 2025 21:19:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247943; x= 1744334343; bh=s6rkQbGtA5qhvaLT3wXBVihHm47BLmvyKvXc8KriMHI=; b=P EY06lpsLd5MKODHyHFPr6p96Tt1f1wHCv5B2UzM/2wG1VjDf4HzoprN7y42ror0/ n/8D2wsLAV64dGOl8DgSMGWyczvGhC5xqhP4i/tUlBd4strEG26MibGyO4t9uHVR wMYB26fbGi4Ii3MbI0GS4h4UnIvnHCevB2Z0kpRD8PtWXSCoSQBW0/cVJYgM8vrK OSZPj4v/2DJy0gkbF+vy7kn01QC4RCp0k+MiLwCGLlf7UT73SMxpgGp5uzujeBFC QrxGIZzC0ZGNooBOReIZVqUWgv1tZdAzTGMRTjO+p01/drE1Ax9qSOSmdLmr1zsj MDnbku0rKL1EbeTDkEM0g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247943; x=1744334343; bh=s 6rkQbGtA5qhvaLT3wXBVihHm47BLmvyKvXc8KriMHI=; b=SYJLdwWSKHSKKFE3v DkvuGVmpCREcAhUq8U+jbs0Gww9pYS+9IpBWzaWQ/XDJClyIF8aT1XXkLiQrIXfz +c42kHjqXE0w7ZzrUmQifjko8Gxxt/2gHb49D5O9H6PsV8HQYWwtLrmpAAhcOMHB 8o98fzp5OPz2E/Y6HucPd2ke+L1Ee0e6JnnRMOt3n4qObkKlFGNm8YbKf2YNejEZ W9lpWy3v9W3iJkLddMMQzSCgD74ZKaLxQuqE/6JIxTEBxwc6Elh/jHcVaF3NIS1t 1SdQ5lD598NAdLoCrcd5rmDVDO3GwoDXHfTVQvSm9324thNLQuOL4UrSgZRKDRig piIpg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheehucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnhepleektdffjeevfefhiedtudevudetjeejvdej uedtvdevveefuedujeejffetieeknecuffhomhgrihhnpegtrghmrdgrtgdruhhknecuve hluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepnhhitghosehf lhhugihnihgtrdhnvghtpdhnsggprhgtphhtthhopeeipdhmohguvgepshhmthhpohhuth dprhgtphhtthhopehnphhithhrvgessggrhihlihgsrhgvrdgtohhmpdhrtghpthhtohep jhhirhhishhlrggshieskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepghhrvghgkhhhse hlihhnuhigfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtohepuggrvhgvsehmihgv lhhkvgdrtggtpdhrtghpthhtoheplhhinhhugidqkhgvrhhnvghlsehvghgvrhdrkhgvrh hnvghlrdhorhhgpdhrtghpthhtoheplhhinhhugidqshgvrhhirghlsehvghgvrhdrkhgv rhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id E9CB810D8B74; Wed, 9 Apr 2025 21:19:01 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 05/11] vt: update ucs_width.c using gen_ucs_width.py Date: Wed, 9 Apr 2025 21:13:57 -0400 Message-ID: <20250410011839.64418-6-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre This replaces ucs_width.c with the code generated by gen_ucs_width.py providing comprehensive tables for double-width and zero-width Unicode code points. Also make ucs_is_zero_width() effective. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/ucs_width.c | 495 +++++++++++++++++++++++++++++++++++-- include/linux/consolemap.h | 6 +- 2 files changed, 475 insertions(+), 26 deletions(-) diff --git a/drivers/tty/vt/ucs_width.c b/drivers/tty/vt/ucs_width.c index 5f0bde30a1..47b22583bd 100644 --- a/drivers/tty/vt/ucs_width.c +++ b/drivers/tty/vt/ucs_width.c @@ -1,45 +1,498 @@ // SPDX-License-Identifier: GPL-2.0 +/* + * ucs_width.c - Unicode character width lookup + * + * Auto-generated by gen_ucs_width.py + * + * Unicode Version: 16.0.0 + */ #include #include #include #include -/* ucs_is_double_width() is based on the wcwidth() implementation by - * Markus Kuhn -- 2007-05-26 (Unicode 5.0) - * Latest version: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c - */ - struct interval { uint32_t first; uint32_t last; }; -static int ucs_cmp(const void *key, const void *elt) +/* Zero-width character ranges */ +static const struct interval zero_width_ranges[] = { + { 0x000AD, 0x000AD }, /* SOFT HYPHEN */ + { 0x00300, 0x0036F }, /* COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X */ + { 0x00483, 0x00489 }, /* COMBINING CYRILLIC TITLO - COMBINING CYRILLIC MILLIONS SIGN */ + { 0x00591, 0x005BD }, /* HEBREW ACCENT ETNAHTA - HEBREW POINT METEG */ + { 0x005BF, 0x005BF }, /* HEBREW POINT RAFE */ + { 0x005C1, 0x005C2 }, /* HEBREW POINT SHIN DOT - HEBREW POINT SIN DOT */ + { 0x005C4, 0x005C5 }, /* HEBREW MARK UPPER DOT - HEBREW MARK LOWER DOT */ + { 0x005C7, 0x005C7 }, /* HEBREW POINT QAMATS QATAN */ + { 0x00600, 0x00605 }, /* ARABIC NUMBER SIGN - ARABIC NUMBER MARK ABOVE */ + { 0x00610, 0x0061A }, /* ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM - ARABIC SMALL KASRA */ + { 0x0064B, 0x0065F }, /* ARABIC FATHATAN - ARABIC WAVY HAMZA BELOW */ + { 0x00670, 0x00670 }, /* ARABIC LETTER SUPERSCRIPT ALEF */ + { 0x006D6, 0x006DC }, /* ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA - ARABIC SMALL HIGH SEEN */ + { 0x006DF, 0x006E4 }, /* ARABIC SMALL HIGH ROUNDED ZERO - ARABIC SMALL HIGH MADDA */ + { 0x006E7, 0x006E8 }, /* ARABIC SMALL HIGH YEH - ARABIC SMALL HIGH NOON */ + { 0x006EA, 0x006ED }, /* ARABIC EMPTY CENTRE LOW STOP - ARABIC SMALL LOW MEEM */ + { 0x00711, 0x00711 }, /* SYRIAC LETTER SUPERSCRIPT ALAPH */ + { 0x00730, 0x0074A }, /* SYRIAC PTHAHA ABOVE - SYRIAC BARREKH */ + { 0x007A6, 0x007B0 }, /* THAANA ABAFILI - THAANA SUKUN */ + { 0x007EB, 0x007F3 }, /* NKO COMBINING SHORT HIGH TONE - NKO COMBINING DOUBLE DOT ABOVE */ + { 0x007FD, 0x007FD }, /* NKO DANTAYALAN */ + { 0x00816, 0x00819 }, /* SAMARITAN MARK IN - SAMARITAN MARK DAGESH */ + { 0x0081B, 0x00823 }, /* SAMARITAN MARK EPENTHETIC YUT - SAMARITAN VOWEL SIGN A */ + { 0x00825, 0x00827 }, /* SAMARITAN VOWEL SIGN SHORT A - SAMARITAN VOWEL SIGN U */ + { 0x00829, 0x0082D }, /* SAMARITAN VOWEL SIGN LONG I - SAMARITAN MARK NEQUDAA */ + { 0x00859, 0x0085B }, /* MANDAIC AFFRICATION MARK - MANDAIC GEMINATION MARK */ + { 0x00890, 0x00891 }, /* ARABIC POUND MARK ABOVE - ARABIC PIASTRE MARK ABOVE */ + { 0x00897, 0x0089F }, /* ARABIC PEPET - ARABIC HALF MADDA OVER MADDA */ + { 0x008CA, 0x00903 }, /* ARABIC SMALL HIGH FARSI YEH - DEVANAGARI SIGN VISARGA */ + { 0x0093A, 0x0093C }, /* DEVANAGARI VOWEL SIGN OE - DEVANAGARI SIGN NUKTA */ + { 0x0093E, 0x0094F }, /* DEVANAGARI VOWEL SIGN AA - DEVANAGARI VOWEL SIGN AW */ + { 0x00951, 0x00957 }, /* DEVANAGARI STRESS SIGN UDATTA - DEVANAGARI VOWEL SIGN UUE */ + { 0x00962, 0x00963 }, /* DEVANAGARI VOWEL SIGN VOCALIC L - DEVANAGARI VOWEL SIGN VOCALIC LL */ + { 0x00981, 0x00983 }, /* BENGALI SIGN CANDRABINDU - BENGALI SIGN VISARGA */ + { 0x009BC, 0x009BC }, /* BENGALI SIGN NUKTA */ + { 0x009BE, 0x009C4 }, /* BENGALI VOWEL SIGN AA - BENGALI VOWEL SIGN VOCALIC RR */ + { 0x009C7, 0x009C8 }, /* BENGALI VOWEL SIGN E - BENGALI VOWEL SIGN AI */ + { 0x009CB, 0x009CD }, /* BENGALI VOWEL SIGN O - BENGALI SIGN VIRAMA */ + { 0x009D7, 0x009D7 }, /* BENGALI AU LENGTH MARK */ + { 0x009E2, 0x009E3 }, /* BENGALI VOWEL SIGN VOCALIC L - BENGALI VOWEL SIGN VOCALIC LL */ + { 0x009FE, 0x009FE }, /* BENGALI SANDHI MARK */ + { 0x00A01, 0x00A03 }, /* GURMUKHI SIGN ADAK BINDI - GURMUKHI SIGN VISARGA */ + { 0x00A3C, 0x00A3C }, /* GURMUKHI SIGN NUKTA */ + { 0x00A3E, 0x00A42 }, /* GURMUKHI VOWEL SIGN AA - GURMUKHI VOWEL SIGN UU */ + { 0x00A47, 0x00A48 }, /* GURMUKHI VOWEL SIGN EE - GURMUKHI VOWEL SIGN AI */ + { 0x00A4B, 0x00A4D }, /* GURMUKHI VOWEL SIGN OO - GURMUKHI SIGN VIRAMA */ + { 0x00A51, 0x00A51 }, /* GURMUKHI SIGN UDAAT */ + { 0x00A70, 0x00A71 }, /* GURMUKHI TIPPI - GURMUKHI ADDAK */ + { 0x00A75, 0x00A75 }, /* GURMUKHI SIGN YAKASH */ + { 0x00A81, 0x00A83 }, /* GUJARATI SIGN CANDRABINDU - GUJARATI SIGN VISARGA */ + { 0x00ABC, 0x00ABC }, /* GUJARATI SIGN NUKTA */ + { 0x00ABE, 0x00AC5 }, /* GUJARATI VOWEL SIGN AA - GUJARATI VOWEL SIGN CANDRA E */ + { 0x00AC7, 0x00AC9 }, /* GUJARATI VOWEL SIGN E - GUJARATI VOWEL SIGN CANDRA O */ + { 0x00ACB, 0x00ACD }, /* GUJARATI VOWEL SIGN O - GUJARATI SIGN VIRAMA */ + { 0x00AE2, 0x00AE3 }, /* GUJARATI VOWEL SIGN VOCALIC L - GUJARATI VOWEL SIGN VOCALIC LL */ + { 0x00AFA, 0x00AFF }, /* GUJARATI SIGN SUKUN - GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE */ + { 0x00B01, 0x00B03 }, /* ORIYA SIGN CANDRABINDU - ORIYA SIGN VISARGA */ + { 0x00B3C, 0x00B3C }, /* ORIYA SIGN NUKTA */ + { 0x00B3E, 0x00B44 }, /* ORIYA VOWEL SIGN AA - ORIYA VOWEL SIGN VOCALIC RR */ + { 0x00B47, 0x00B48 }, /* ORIYA VOWEL SIGN E - ORIYA VOWEL SIGN AI */ + { 0x00B4B, 0x00B4D }, /* ORIYA VOWEL SIGN O - ORIYA SIGN VIRAMA */ + { 0x00B55, 0x00B57 }, /* ORIYA SIGN OVERLINE - ORIYA AU LENGTH MARK */ + { 0x00B62, 0x00B63 }, /* ORIYA VOWEL SIGN VOCALIC L - ORIYA VOWEL SIGN VOCALIC LL */ + { 0x00B82, 0x00B82 }, /* TAMIL SIGN ANUSVARA */ + { 0x00BBE, 0x00BC2 }, /* TAMIL VOWEL SIGN AA - TAMIL VOWEL SIGN UU */ + { 0x00BC6, 0x00BC8 }, /* TAMIL VOWEL SIGN E - TAMIL VOWEL SIGN AI */ + { 0x00BCA, 0x00BCD }, /* TAMIL VOWEL SIGN O - TAMIL SIGN VIRAMA */ + { 0x00BD7, 0x00BD7 }, /* TAMIL AU LENGTH MARK */ + { 0x00C00, 0x00C04 }, /* TELUGU SIGN COMBINING CANDRABINDU ABOVE - TELUGU SIGN COMBINING ANUSVARA ABOVE */ + { 0x00C3C, 0x00C3C }, /* TELUGU SIGN NUKTA */ + { 0x00C3E, 0x00C44 }, /* TELUGU VOWEL SIGN AA - TELUGU VOWEL SIGN VOCALIC RR */ + { 0x00C46, 0x00C48 }, /* TELUGU VOWEL SIGN E - TELUGU VOWEL SIGN AI */ + { 0x00C4A, 0x00C4D }, /* TELUGU VOWEL SIGN O - TELUGU SIGN VIRAMA */ + { 0x00C55, 0x00C56 }, /* TELUGU LENGTH MARK - TELUGU AI LENGTH MARK */ + { 0x00C62, 0x00C63 }, /* TELUGU VOWEL SIGN VOCALIC L - TELUGU VOWEL SIGN VOCALIC LL */ + { 0x00C81, 0x00C83 }, /* KANNADA SIGN CANDRABINDU - KANNADA SIGN VISARGA */ + { 0x00CBC, 0x00CBC }, /* KANNADA SIGN NUKTA */ + { 0x00CBE, 0x00CC4 }, /* KANNADA VOWEL SIGN AA - KANNADA VOWEL SIGN VOCALIC RR */ + { 0x00CC6, 0x00CC8 }, /* KANNADA VOWEL SIGN E - KANNADA VOWEL SIGN AI */ + { 0x00CCA, 0x00CCD }, /* KANNADA VOWEL SIGN O - KANNADA SIGN VIRAMA */ + { 0x00CD5, 0x00CD6 }, /* KANNADA LENGTH MARK - KANNADA AI LENGTH MARK */ + { 0x00CE2, 0x00CE3 }, /* KANNADA VOWEL SIGN VOCALIC L - KANNADA VOWEL SIGN VOCALIC LL */ + { 0x00CF3, 0x00CF3 }, /* KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT */ + { 0x00D00, 0x00D03 }, /* MALAYALAM SIGN COMBINING ANUSVARA ABOVE - MALAYALAM SIGN VISARGA */ + { 0x00D3B, 0x00D3C }, /* MALAYALAM SIGN VERTICAL BAR VIRAMA - MALAYALAM SIGN CIRCULAR VIRAMA */ + { 0x00D3E, 0x00D44 }, /* MALAYALAM VOWEL SIGN AA - MALAYALAM VOWEL SIGN VOCALIC RR */ + { 0x00D46, 0x00D48 }, /* MALAYALAM VOWEL SIGN E - MALAYALAM VOWEL SIGN AI */ + { 0x00D4A, 0x00D4D }, /* MALAYALAM VOWEL SIGN O - MALAYALAM SIGN VIRAMA */ + { 0x00D57, 0x00D57 }, /* MALAYALAM AU LENGTH MARK */ + { 0x00D62, 0x00D63 }, /* MALAYALAM VOWEL SIGN VOCALIC L - MALAYALAM VOWEL SIGN VOCALIC LL */ + { 0x00D81, 0x00D83 }, /* SINHALA SIGN CANDRABINDU - SINHALA SIGN VISARGAYA */ + { 0x00DCA, 0x00DCA }, /* SINHALA SIGN AL-LAKUNA */ + { 0x00DCF, 0x00DD4 }, /* SINHALA VOWEL SIGN AELA-PILLA - SINHALA VOWEL SIGN KETTI PAA-PILLA */ + { 0x00DD6, 0x00DD6 }, /* SINHALA VOWEL SIGN DIGA PAA-PILLA */ + { 0x00DD8, 0x00DDF }, /* SINHALA VOWEL SIGN GAETTA-PILLA - SINHALA VOWEL SIGN GAYANUKITTA */ + { 0x00DF2, 0x00DF3 }, /* SINHALA VOWEL SIGN DIGA GAETTA-PILLA - SINHALA VOWEL SIGN DIGA GAYANUKITTA */ + { 0x00E31, 0x00E31 }, /* THAI CHARACTER MAI HAN-AKAT */ + { 0x00E34, 0x00E3A }, /* THAI CHARACTER SARA I - THAI CHARACTER PHINTHU */ + { 0x00E47, 0x00E4E }, /* THAI CHARACTER MAITAIKHU - THAI CHARACTER YAMAKKAN */ + { 0x00EB1, 0x00EB1 }, /* LAO VOWEL SIGN MAI KAN */ + { 0x00EB4, 0x00EBC }, /* LAO VOWEL SIGN I - LAO SEMIVOWEL SIGN LO */ + { 0x00EC8, 0x00ECE }, /* LAO TONE MAI EK - LAO YAMAKKAN */ + { 0x00F18, 0x00F19 }, /* TIBETAN ASTROLOGICAL SIGN -KHYUD PA - TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS */ + { 0x00F35, 0x00F35 }, /* TIBETAN MARK NGAS BZUNG NYI ZLA */ + { 0x00F37, 0x00F37 }, /* TIBETAN MARK NGAS BZUNG SGOR RTAGS */ + { 0x00F39, 0x00F39 }, /* TIBETAN MARK TSA -PHRU */ + { 0x00F3E, 0x00F3F }, /* TIBETAN SIGN YAR TSHES - TIBETAN SIGN MAR TSHES */ + { 0x00F71, 0x00F84 }, /* TIBETAN VOWEL SIGN AA - TIBETAN MARK HALANTA */ + { 0x00F86, 0x00F87 }, /* TIBETAN SIGN LCI RTAGS - TIBETAN SIGN YANG RTAGS */ + { 0x00F8D, 0x00F97 }, /* TIBETAN SUBJOINED SIGN LCE TSA CAN - TIBETAN SUBJOINED LETTER JA */ + { 0x00F99, 0x00FBC }, /* TIBETAN SUBJOINED LETTER NYA - TIBETAN SUBJOINED LETTER FIXED-FORM RA */ + { 0x00FC6, 0x00FC6 }, /* TIBETAN SYMBOL PADMA GDAN */ + { 0x0102B, 0x0103E }, /* MYANMAR VOWEL SIGN TALL AA - MYANMAR CONSONANT SIGN MEDIAL HA */ + { 0x01056, 0x01059 }, /* MYANMAR VOWEL SIGN VOCALIC R - MYANMAR VOWEL SIGN VOCALIC LL */ + { 0x0105E, 0x01060 }, /* MYANMAR CONSONANT SIGN MON MEDIAL NA - MYANMAR CONSONANT SIGN MON MEDIAL LA */ + { 0x01062, 0x01064 }, /* MYANMAR VOWEL SIGN SGAW KAREN EU - MYANMAR TONE MARK SGAW KAREN KE PHO */ + { 0x01067, 0x0106D }, /* MYANMAR VOWEL SIGN WESTERN PWO KAREN EU - MYANMAR SIGN WESTERN PWO KAREN TONE-5 */ + { 0x01071, 0x01074 }, /* MYANMAR VOWEL SIGN GEBA KAREN I - MYANMAR VOWEL SIGN KAYAH EE */ + { 0x01082, 0x0108D }, /* MYANMAR CONSONANT SIGN SHAN MEDIAL WA - MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE */ + { 0x0108F, 0x0108F }, /* MYANMAR SIGN RUMAI PALAUNG TONE-5 */ + { 0x0109A, 0x0109D }, /* MYANMAR SIGN KHAMTI TONE-1 - MYANMAR VOWEL SIGN AITON AI */ + { 0x0135D, 0x0135F }, /* ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK - ETHIOPIC COMBINING GEMINATION MARK */ + { 0x01712, 0x01715 }, /* TAGALOG VOWEL SIGN I - TAGALOG SIGN PAMUDPOD */ + { 0x01732, 0x01734 }, /* HANUNOO VOWEL SIGN I - HANUNOO SIGN PAMUDPOD */ + { 0x01752, 0x01753 }, /* BUHID VOWEL SIGN I - BUHID VOWEL SIGN U */ + { 0x01772, 0x01773 }, /* TAGBANWA VOWEL SIGN I - TAGBANWA VOWEL SIGN U */ + { 0x017B4, 0x017D3 }, /* KHMER VOWEL INHERENT AQ - KHMER SIGN BATHAMASAT */ + { 0x017DD, 0x017DD }, /* KHMER SIGN ATTHACAN */ + { 0x0180B, 0x0180D }, /* MONGOLIAN FREE VARIATION SELECTOR ONE - MONGOLIAN FREE VARIATION SELECTOR THREE */ + { 0x0180F, 0x0180F }, /* MONGOLIAN FREE VARIATION SELECTOR FOUR */ + { 0x01885, 0x01886 }, /* MONGOLIAN LETTER ALI GALI BALUDA - MONGOLIAN LETTER ALI GALI THREE BALUDA */ + { 0x018A9, 0x018A9 }, /* MONGOLIAN LETTER ALI GALI DAGALGA */ + { 0x01920, 0x0192B }, /* LIMBU VOWEL SIGN A - LIMBU SUBJOINED LETTER WA */ + { 0x01930, 0x0193B }, /* LIMBU SMALL LETTER KA - LIMBU SIGN SA-I */ + { 0x01A17, 0x01A1B }, /* BUGINESE VOWEL SIGN I - BUGINESE VOWEL SIGN AE */ + { 0x01A55, 0x01A5E }, /* TAI THAM CONSONANT SIGN MEDIAL RA - TAI THAM CONSONANT SIGN SA */ + { 0x01A60, 0x01A7C }, /* TAI THAM SIGN SAKOT - TAI THAM SIGN KHUEN-LUE KARAN */ + { 0x01A7F, 0x01A7F }, /* TAI THAM COMBINING CRYPTOGRAMMIC DOT */ + { 0x01AB0, 0x01ACE }, /* COMBINING DOUBLED CIRCUMFLEX ACCENT - COMBINING LATIN SMALL LETTER INSULAR T */ + { 0x01B00, 0x01B04 }, /* BALINESE SIGN ULU RICEM - BALINESE SIGN BISAH */ + { 0x01B34, 0x01B44 }, /* BALINESE SIGN REREKAN - BALINESE ADEG ADEG */ + { 0x01B6B, 0x01B73 }, /* BALINESE MUSICAL SYMBOL COMBINING TEGEH - BALINESE MUSICAL SYMBOL COMBINING GONG */ + { 0x01B80, 0x01B82 }, /* SUNDANESE SIGN PANYECEK - SUNDANESE SIGN PANGWISAD */ + { 0x01BA1, 0x01BAD }, /* SUNDANESE CONSONANT SIGN PAMINGKAL - SUNDANESE CONSONANT SIGN PASANGAN WA */ + { 0x01BE6, 0x01BF3 }, /* BATAK SIGN TOMPI - BATAK PANONGONAN */ + { 0x01C24, 0x01C37 }, /* LEPCHA SUBJOINED LETTER YA - LEPCHA SIGN NUKTA */ + { 0x01CD0, 0x01CD2 }, /* VEDIC TONE KARSHANA - VEDIC TONE PRENKHA */ + { 0x01CD4, 0x01CE8 }, /* VEDIC SIGN YAJURVEDIC MIDLINE SVARITA - VEDIC SIGN VISARGA ANUDATTA WITH TAIL */ + { 0x01CED, 0x01CED }, /* VEDIC SIGN TIRYAK */ + { 0x01CF4, 0x01CF4 }, /* VEDIC TONE CANDRA ABOVE */ + { 0x01CF7, 0x01CF9 }, /* VEDIC SIGN ATIKRAMA - VEDIC TONE DOUBLE RING ABOVE */ + { 0x01DC0, 0x01DFF }, /* COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW */ + { 0x0200B, 0x0200E }, /* ZERO WIDTH SPACE - LEFT-TO-RIGHT MARK */ + { 0x0202A, 0x0202D }, /* LEFT-TO-RIGHT EMBEDDING - LEFT-TO-RIGHT OVERRIDE */ + { 0x02060, 0x02064 }, /* WORD JOINER - INVISIBLE PLUS */ + { 0x0206A, 0x0206F }, /* INHIBIT SYMMETRIC SWAPPING - NOMINAL DIGIT SHAPES */ + { 0x020D0, 0x020F0 }, /* COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE */ + { 0x02640, 0x02640 }, /* FEMALE SIGN */ + { 0x02642, 0x02642 }, /* MALE SIGN */ + { 0x026A7, 0x026A7 }, /* MALE WITH STROKE AND MALE AND FEMALE SIGN */ + { 0x02CEF, 0x02CF1 }, /* COPTIC COMBINING NI ABOVE - COPTIC COMBINING SPIRITUS LENIS */ + { 0x02D7F, 0x02D7F }, /* TIFINAGH CONSONANT JOINER */ + { 0x02DE0, 0x02DFF }, /* COMBINING CYRILLIC LETTER BE - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS */ + { 0x0302A, 0x0302F }, /* IDEOGRAPHIC LEVEL TONE MARK - HANGUL DOUBLE DOT TONE MARK */ + { 0x03099, 0x0309A }, /* COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK - COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */ + { 0x0A66F, 0x0A672 }, /* COMBINING CYRILLIC VZMET - COMBINING CYRILLIC THOUSAND MILLIONS SIGN */ + { 0x0A674, 0x0A67D }, /* COMBINING CYRILLIC LETTER UKRAINIAN IE - COMBINING CYRILLIC PAYEROK */ + { 0x0A69E, 0x0A69F }, /* COMBINING CYRILLIC LETTER EF - COMBINING CYRILLIC LETTER IOTIFIED E */ + { 0x0A6F0, 0x0A6F1 }, /* BAMUM COMBINING MARK KOQNDON - BAMUM COMBINING MARK TUKWENTIS */ + { 0x0A802, 0x0A802 }, /* SYLOTI NAGRI SIGN DVISVARA */ + { 0x0A806, 0x0A806 }, /* SYLOTI NAGRI SIGN HASANTA */ + { 0x0A80B, 0x0A80B }, /* SYLOTI NAGRI SIGN ANUSVARA */ + { 0x0A823, 0x0A827 }, /* SYLOTI NAGRI VOWEL SIGN A - SYLOTI NAGRI VOWEL SIGN OO */ + { 0x0A82C, 0x0A82C }, /* SYLOTI NAGRI SIGN ALTERNATE HASANTA */ + { 0x0A880, 0x0A881 }, /* SAURASHTRA SIGN ANUSVARA - SAURASHTRA SIGN VISARGA */ + { 0x0A8B4, 0x0A8C5 }, /* SAURASHTRA CONSONANT SIGN HAARU - SAURASHTRA SIGN CANDRABINDU */ + { 0x0A8E0, 0x0A8F1 }, /* COMBINING DEVANAGARI DIGIT ZERO - COMBINING DEVANAGARI SIGN AVAGRAHA */ + { 0x0A8FF, 0x0A8FF }, /* DEVANAGARI VOWEL SIGN AY */ + { 0x0A926, 0x0A92D }, /* KAYAH LI VOWEL UE - KAYAH LI TONE CALYA PLOPHU */ + { 0x0A947, 0x0A953 }, /* REJANG VOWEL SIGN I - REJANG VIRAMA */ + { 0x0A980, 0x0A983 }, /* JAVANESE SIGN PANYANGGA - JAVANESE SIGN WIGNYAN */ + { 0x0A9B3, 0x0A9C0 }, /* JAVANESE SIGN CECAK TELU - JAVANESE PANGKON */ + { 0x0A9E5, 0x0A9E5 }, /* MYANMAR SIGN SHAN SAW */ + { 0x0AA29, 0x0AA36 }, /* CHAM VOWEL SIGN AA - CHAM CONSONANT SIGN WA */ + { 0x0AA43, 0x0AA43 }, /* CHAM CONSONANT SIGN FINAL NG */ + { 0x0AA4C, 0x0AA4D }, /* CHAM CONSONANT SIGN FINAL M - CHAM CONSONANT SIGN FINAL H */ + { 0x0AA7B, 0x0AA7D }, /* MYANMAR SIGN PAO KAREN TONE - MYANMAR SIGN TAI LAING TONE-5 */ + { 0x0AAB0, 0x0AAB0 }, /* TAI VIET MAI KANG */ + { 0x0AAB2, 0x0AAB4 }, /* TAI VIET VOWEL I - TAI VIET VOWEL U */ + { 0x0AAB7, 0x0AAB8 }, /* TAI VIET MAI KHIT - TAI VIET VOWEL IA */ + { 0x0AABE, 0x0AABF }, /* TAI VIET VOWEL AM - TAI VIET TONE MAI EK */ + { 0x0AAC1, 0x0AAC1 }, /* TAI VIET TONE MAI THO */ + { 0x0AAEB, 0x0AAEF }, /* MEETEI MAYEK VOWEL SIGN II - MEETEI MAYEK VOWEL SIGN AAU */ + { 0x0AAF5, 0x0AAF6 }, /* MEETEI MAYEK VOWEL SIGN VISARGA - MEETEI MAYEK VIRAMA */ + { 0x0ABE3, 0x0ABEA }, /* MEETEI MAYEK VOWEL SIGN ONAP - MEETEI MAYEK VOWEL SIGN NUNG */ + { 0x0ABEC, 0x0ABED }, /* MEETEI MAYEK LUM IYEK - MEETEI MAYEK APUN IYEK */ + { 0x0FB1E, 0x0FB1E }, /* HEBREW POINT JUDEO-SPANISH VARIKA */ + { 0x0FE00, 0x0FE0F }, /* VARIATION SELECTOR-1 - VARIATION SELECTOR-16 */ + { 0x0FE20, 0x0FE2F }, /* COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF */ + { 0x0FEFF, 0x0FEFF }, /* ZERO WIDTH NO-BREAK SPACE */ + { 0x0FFF9, 0x0FFFB }, /* INTERLINEAR ANNOTATION ANCHOR - INTERLINEAR ANNOTATION TERMINATOR */ + { 0x101FD, 0x101FD }, /* U+101FD */ + { 0x102E0, 0x102E0 }, /* U+102E0 */ + { 0x10376, 0x1037A }, /* U+10376 - U+1037A */ + { 0x10A01, 0x10A03 }, /* U+10A01 - U+10A03 */ + { 0x10A05, 0x10A06 }, /* U+10A05 - U+10A06 */ + { 0x10A0C, 0x10A0F }, /* U+10A0C - U+10A0F */ + { 0x10A38, 0x10A3A }, /* U+10A38 - U+10A3A */ + { 0x10A3F, 0x10A3F }, /* U+10A3F */ + { 0x10AE5, 0x10AE6 }, /* U+10AE5 - U+10AE6 */ + { 0x10D24, 0x10D27 }, /* U+10D24 - U+10D27 */ + { 0x10D69, 0x10D6D }, /* U+10D69 - U+10D6D */ + { 0x10EAB, 0x10EAC }, /* U+10EAB - U+10EAC */ + { 0x10EFC, 0x10EFF }, /* U+10EFC - U+10EFF */ + { 0x10F46, 0x10F50 }, /* U+10F46 - U+10F50 */ + { 0x10F82, 0x10F85 }, /* U+10F82 - U+10F85 */ + { 0x11000, 0x11002 }, /* U+11000 - U+11002 */ + { 0x11038, 0x11046 }, /* U+11038 - U+11046 */ + { 0x11070, 0x11070 }, /* U+11070 */ + { 0x11073, 0x11074 }, /* U+11073 - U+11074 */ + { 0x1107F, 0x11082 }, /* U+1107F - U+11082 */ + { 0x110B0, 0x110BA }, /* U+110B0 - U+110BA */ + { 0x110BD, 0x110BD }, /* U+110BD */ + { 0x110C2, 0x110C2 }, /* U+110C2 */ + { 0x110CD, 0x110CD }, /* U+110CD */ + { 0x11100, 0x11102 }, /* U+11100 - U+11102 */ + { 0x11127, 0x11134 }, /* U+11127 - U+11134 */ + { 0x11145, 0x11146 }, /* U+11145 - U+11146 */ + { 0x11173, 0x11173 }, /* U+11173 */ + { 0x11180, 0x11182 }, /* U+11180 - U+11182 */ + { 0x111B3, 0x111C0 }, /* U+111B3 - U+111C0 */ + { 0x111C9, 0x111CC }, /* U+111C9 - U+111CC */ + { 0x111CE, 0x111CF }, /* U+111CE - U+111CF */ + { 0x1122C, 0x11237 }, /* U+1122C - U+11237 */ + { 0x1123E, 0x1123E }, /* U+1123E */ + { 0x11241, 0x11241 }, /* U+11241 */ + { 0x112DF, 0x112EA }, /* U+112DF - U+112EA */ + { 0x11300, 0x11303 }, /* U+11300 - U+11303 */ + { 0x1133B, 0x1133C }, /* U+1133B - U+1133C */ + { 0x1133E, 0x11344 }, /* U+1133E - U+11344 */ + { 0x11347, 0x11348 }, /* U+11347 - U+11348 */ + { 0x1134B, 0x1134D }, /* U+1134B - U+1134D */ + { 0x11357, 0x11357 }, /* U+11357 */ + { 0x11362, 0x11363 }, /* U+11362 - U+11363 */ + { 0x11366, 0x1136C }, /* U+11366 - U+1136C */ + { 0x11370, 0x11374 }, /* U+11370 - U+11374 */ + { 0x113B8, 0x113C0 }, /* U+113B8 - U+113C0 */ + { 0x113C2, 0x113C2 }, /* U+113C2 */ + { 0x113C5, 0x113C5 }, /* U+113C5 */ + { 0x113C7, 0x113CA }, /* U+113C7 - U+113CA */ + { 0x113CC, 0x113D0 }, /* U+113CC - U+113D0 */ + { 0x113D2, 0x113D2 }, /* U+113D2 */ + { 0x113E1, 0x113E2 }, /* U+113E1 - U+113E2 */ + { 0x11435, 0x11446 }, /* U+11435 - U+11446 */ + { 0x1145E, 0x1145E }, /* U+1145E */ + { 0x114B0, 0x114C3 }, /* U+114B0 - U+114C3 */ + { 0x115AF, 0x115B5 }, /* U+115AF - U+115B5 */ + { 0x115B8, 0x115C0 }, /* U+115B8 - U+115C0 */ + { 0x115DC, 0x115DD }, /* U+115DC - U+115DD */ + { 0x11630, 0x11640 }, /* U+11630 - U+11640 */ + { 0x116AB, 0x116B7 }, /* U+116AB - U+116B7 */ + { 0x1171D, 0x1172B }, /* U+1171D - U+1172B */ + { 0x1182C, 0x1183A }, /* U+1182C - U+1183A */ + { 0x11930, 0x11935 }, /* U+11930 - U+11935 */ + { 0x11937, 0x11938 }, /* U+11937 - U+11938 */ + { 0x1193B, 0x1193E }, /* U+1193B - U+1193E */ + { 0x11940, 0x11940 }, /* U+11940 */ + { 0x11942, 0x11943 }, /* U+11942 - U+11943 */ + { 0x119D1, 0x119D7 }, /* U+119D1 - U+119D7 */ + { 0x119DA, 0x119E0 }, /* U+119DA - U+119E0 */ + { 0x119E4, 0x119E4 }, /* U+119E4 */ + { 0x11A01, 0x11A0A }, /* U+11A01 - U+11A0A */ + { 0x11A33, 0x11A39 }, /* U+11A33 - U+11A39 */ + { 0x11A3B, 0x11A3E }, /* U+11A3B - U+11A3E */ + { 0x11A47, 0x11A47 }, /* U+11A47 */ + { 0x11A51, 0x11A5B }, /* U+11A51 - U+11A5B */ + { 0x11A8A, 0x11A99 }, /* U+11A8A - U+11A99 */ + { 0x11C2F, 0x11C36 }, /* U+11C2F - U+11C36 */ + { 0x11C38, 0x11C3F }, /* U+11C38 - U+11C3F */ + { 0x11C92, 0x11CA7 }, /* U+11C92 - U+11CA7 */ + { 0x11CA9, 0x11CB6 }, /* U+11CA9 - U+11CB6 */ + { 0x11D31, 0x11D36 }, /* U+11D31 - U+11D36 */ + { 0x11D3A, 0x11D3A }, /* U+11D3A */ + { 0x11D3C, 0x11D3D }, /* U+11D3C - U+11D3D */ + { 0x11D3F, 0x11D45 }, /* U+11D3F - U+11D45 */ + { 0x11D47, 0x11D47 }, /* U+11D47 */ + { 0x11D8A, 0x11D8E }, /* U+11D8A - U+11D8E */ + { 0x11D90, 0x11D91 }, /* U+11D90 - U+11D91 */ + { 0x11D93, 0x11D97 }, /* U+11D93 - U+11D97 */ + { 0x11EF3, 0x11EF6 }, /* U+11EF3 - U+11EF6 */ + { 0x11F00, 0x11F01 }, /* U+11F00 - U+11F01 */ + { 0x11F03, 0x11F03 }, /* U+11F03 */ + { 0x11F34, 0x11F3A }, /* U+11F34 - U+11F3A */ + { 0x11F3E, 0x11F42 }, /* U+11F3E - U+11F42 */ + { 0x11F5A, 0x11F5A }, /* U+11F5A */ + { 0x13430, 0x13440 }, /* U+13430 - U+13440 */ + { 0x13447, 0x13455 }, /* U+13447 - U+13455 */ + { 0x1611E, 0x1612F }, /* U+1611E - U+1612F */ + { 0x16AF0, 0x16AF4 }, /* U+16AF0 - U+16AF4 */ + { 0x16B30, 0x16B36 }, /* U+16B30 - U+16B36 */ + { 0x16F4F, 0x16F4F }, /* U+16F4F */ + { 0x16F51, 0x16F87 }, /* U+16F51 - U+16F87 */ + { 0x16F8F, 0x16F92 }, /* U+16F8F - U+16F92 */ + { 0x16FE4, 0x16FE4 }, /* U+16FE4 */ + { 0x16FF0, 0x16FF1 }, /* U+16FF0 - U+16FF1 */ + { 0x1BC9D, 0x1BC9E }, /* U+1BC9D - U+1BC9E */ + { 0x1BCA0, 0x1BCA3 }, /* U+1BCA0 - U+1BCA3 */ + { 0x1CF00, 0x1CF2D }, /* U+1CF00 - U+1CF2D */ + { 0x1CF30, 0x1CF46 }, /* U+1CF30 - U+1CF46 */ + { 0x1D165, 0x1D169 }, /* U+1D165 - U+1D169 */ + { 0x1D16D, 0x1D182 }, /* U+1D16D - U+1D182 */ + { 0x1D185, 0x1D18B }, /* U+1D185 - U+1D18B */ + { 0x1D1AA, 0x1D1AD }, /* U+1D1AA - U+1D1AD */ + { 0x1D242, 0x1D244 }, /* U+1D242 - U+1D244 */ + { 0x1DA00, 0x1DA36 }, /* U+1DA00 - U+1DA36 */ + { 0x1DA3B, 0x1DA6C }, /* U+1DA3B - U+1DA6C */ + { 0x1DA75, 0x1DA75 }, /* U+1DA75 */ + { 0x1DA84, 0x1DA84 }, /* U+1DA84 */ + { 0x1DA9B, 0x1DA9F }, /* U+1DA9B - U+1DA9F */ + { 0x1DAA1, 0x1DAAF }, /* U+1DAA1 - U+1DAAF */ + { 0x1E000, 0x1E006 }, /* U+1E000 - U+1E006 */ + { 0x1E008, 0x1E018 }, /* U+1E008 - U+1E018 */ + { 0x1E01B, 0x1E021 }, /* U+1E01B - U+1E021 */ + { 0x1E023, 0x1E024 }, /* U+1E023 - U+1E024 */ + { 0x1E026, 0x1E02A }, /* U+1E026 - U+1E02A */ + { 0x1E08F, 0x1E08F }, /* U+1E08F */ + { 0x1E130, 0x1E136 }, /* U+1E130 - U+1E136 */ + { 0x1E2AE, 0x1E2AE }, /* U+1E2AE */ + { 0x1E2EC, 0x1E2EF }, /* U+1E2EC - U+1E2EF */ + { 0x1E4EC, 0x1E4EF }, /* U+1E4EC - U+1E4EF */ + { 0x1E5EE, 0x1E5EF }, /* U+1E5EE - U+1E5EF */ + { 0x1E8D0, 0x1E8D6 }, /* U+1E8D0 - U+1E8D6 */ + { 0x1E944, 0x1E94A }, /* U+1E944 - U+1E94A */ + { 0x1F3FB, 0x1F3FF }, /* U+1F3FB - U+1F3FF */ + { 0x1F9B0, 0x1F9B3 }, /* U+1F9B0 - U+1F9B3 */ + { 0xE0001, 0xE0001 }, /* U+E0001 */ + { 0xE0020, 0xE007F }, /* U+E0020 - U+E007F */ + { 0xE0100, 0xE01EF }, /* U+E0100 - U+E01EF */ +}; + +/* Double-width character ranges */ +static const struct interval double_width_ranges[] = { + { 0x01100, 0x0115F }, /* HANGUL CHOSEONG KIYEOK - HANGUL CHOSEONG FILLER */ + { 0x0231A, 0x0231B }, /* WATCH - HOURGLASS */ + { 0x02329, 0x0232A }, /* LEFT-POINTING ANGLE BRACKET - RIGHT-POINTING ANGLE BRACKET */ + { 0x023E9, 0x023EC }, /* BLACK RIGHT-POINTING DOUBLE TRIANGLE - BLACK DOWN-POINTING DOUBLE TRIANGLE */ + { 0x023F0, 0x023F0 }, /* ALARM CLOCK */ + { 0x023F3, 0x023F3 }, /* HOURGLASS WITH FLOWING SAND */ + { 0x025FD, 0x025FE }, /* WHITE MEDIUM SMALL SQUARE - BLACK MEDIUM SMALL SQUARE */ + { 0x02614, 0x02615 }, /* UMBRELLA WITH RAIN DROPS - HOT BEVERAGE */ + { 0x02630, 0x02637 }, /* TRIGRAM FOR HEAVEN - TRIGRAM FOR EARTH */ + { 0x02648, 0x02653 }, /* ARIES - PISCES */ + { 0x0267F, 0x0267F }, /* WHEELCHAIR SYMBOL */ + { 0x0268A, 0x0268F }, /* MONOGRAM FOR YANG - DIGRAM FOR GREATER YIN */ + { 0x02693, 0x02693 }, /* ANCHOR */ + { 0x026A1, 0x026A1 }, /* HIGH VOLTAGE SIGN */ + { 0x026AA, 0x026AB }, /* MEDIUM WHITE CIRCLE - MEDIUM BLACK CIRCLE */ + { 0x026BD, 0x026BE }, /* SOCCER BALL - BASEBALL */ + { 0x026C4, 0x026C5 }, /* SNOWMAN WITHOUT SNOW - SUN BEHIND CLOUD */ + { 0x026CE, 0x026CE }, /* OPHIUCHUS */ + { 0x026D4, 0x026D4 }, /* NO ENTRY */ + { 0x026EA, 0x026EA }, /* CHURCH */ + { 0x026F2, 0x026F3 }, /* FOUNTAIN - FLAG IN HOLE */ + { 0x026F5, 0x026F5 }, /* SAILBOAT */ + { 0x026FA, 0x026FA }, /* TENT */ + { 0x026FD, 0x026FD }, /* FUEL PUMP */ + { 0x02705, 0x02705 }, /* WHITE HEAVY CHECK MARK */ + { 0x0270A, 0x0270B }, /* RAISED FIST - RAISED HAND */ + { 0x02728, 0x02728 }, /* SPARKLES */ + { 0x0274C, 0x0274C }, /* CROSS MARK */ + { 0x0274E, 0x0274E }, /* NEGATIVE SQUARED CROSS MARK */ + { 0x02753, 0x02755 }, /* BLACK QUESTION MARK ORNAMENT - WHITE EXCLAMATION MARK ORNAMENT */ + { 0x02757, 0x02757 }, /* HEAVY EXCLAMATION MARK SYMBOL */ + { 0x02795, 0x02797 }, /* HEAVY PLUS SIGN - HEAVY DIVISION SIGN */ + { 0x027B0, 0x027B0 }, /* CURLY LOOP */ + { 0x027BF, 0x027BF }, /* DOUBLE CURLY LOOP */ + { 0x02B1B, 0x02B1C }, /* BLACK LARGE SQUARE - WHITE LARGE SQUARE */ + { 0x02B50, 0x02B50 }, /* WHITE MEDIUM STAR */ + { 0x02B55, 0x02B55 }, /* HEAVY LARGE CIRCLE */ + { 0x02E80, 0x02E99 }, /* CJK RADICAL REPEAT - CJK RADICAL RAP */ + { 0x02E9B, 0x02EF3 }, /* CJK RADICAL CHOKE - CJK RADICAL C-SIMPLIFIED TURTLE */ + { 0x02F00, 0x02FD5 }, /* KANGXI RADICAL ONE - KANGXI RADICAL FLUTE */ + { 0x02FF0, 0x03029 }, /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT - HANGZHOU NUMERAL NINE */ + { 0x03030, 0x0303E }, /* WAVY DASH - IDEOGRAPHIC VARIATION INDICATOR */ + { 0x03041, 0x03096 }, /* HIRAGANA LETTER SMALL A - HIRAGANA LETTER SMALL KE */ + { 0x0309B, 0x030FF }, /* KATAKANA-HIRAGANA VOICED SOUND MARK - KATAKANA DIGRAPH KOTO */ + { 0x03105, 0x0312F }, /* BOPOMOFO LETTER B - BOPOMOFO LETTER NN */ + { 0x03131, 0x0318E }, /* HANGUL LETTER KIYEOK - HANGUL LETTER ARAEAE */ + { 0x03190, 0x031E5 }, /* IDEOGRAPHIC ANNOTATION LINKING MARK - CJK STROKE SZP */ + { 0x031EF, 0x0321E }, /* IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION - PARENTHESIZED KOREAN CHARACTER O HU */ + { 0x03220, 0x03247 }, /* PARENTHESIZED IDEOGRAPH ONE - CIRCLED IDEOGRAPH KOTO */ + { 0x03250, 0x0A48C }, /* PARTNERSHIP SIGN - YI SYLLABLE YYR */ + { 0x0A490, 0x0A4C6 }, /* YI RADICAL QOT - YI RADICAL KE */ + { 0x0A960, 0x0A97C }, /* HANGUL CHOSEONG TIKEUT-MIEUM - HANGUL CHOSEONG SSANGYEORINHIEUH */ + { 0x0AC00, 0x0D7A3 }, /* HANGUL SYLLABLE GA - HANGUL SYLLABLE HIH */ + { 0x0F900, 0x0FAFF }, /* U+0F900 - U+0FAFF */ + { 0x0FE10, 0x0FE19 }, /* PRESENTATION FORM FOR VERTICAL COMMA - PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS */ + { 0x0FE30, 0x0FE52 }, /* PRESENTATION FORM FOR VERTICAL TWO DOT LEADER - SMALL FULL STOP */ + { 0x0FE54, 0x0FE66 }, /* SMALL SEMICOLON - SMALL EQUALS SIGN */ + { 0x0FE68, 0x0FE6B }, /* SMALL REVERSE SOLIDUS - SMALL COMMERCIAL AT */ + { 0x0FF01, 0x0FF60 }, /* FULLWIDTH EXCLAMATION MARK - FULLWIDTH RIGHT WHITE PARENTHESIS */ + { 0x0FFE0, 0x0FFE6 }, /* FULLWIDTH CENT SIGN - FULLWIDTH WON SIGN */ + { 0x16FE0, 0x16FE3 }, /* U+16FE0 - U+16FE3 */ + { 0x17000, 0x187F7 }, /* U+17000 - U+187F7 */ + { 0x18800, 0x18CD5 }, /* U+18800 - U+18CD5 */ + { 0x18CFF, 0x18D08 }, /* U+18CFF - U+18D08 */ + { 0x1AFF0, 0x1AFF3 }, /* U+1AFF0 - U+1AFF3 */ + { 0x1AFF5, 0x1AFFB }, /* U+1AFF5 - U+1AFFB */ + { 0x1AFFD, 0x1AFFE }, /* U+1AFFD - U+1AFFE */ + { 0x1B000, 0x1B122 }, /* U+1B000 - U+1B122 */ + { 0x1B132, 0x1B132 }, /* U+1B132 */ + { 0x1B150, 0x1B152 }, /* U+1B150 - U+1B152 */ + { 0x1B155, 0x1B155 }, /* U+1B155 */ + { 0x1B164, 0x1B167 }, /* U+1B164 - U+1B167 */ + { 0x1B170, 0x1B2FB }, /* U+1B170 - U+1B2FB */ + { 0x1D300, 0x1D356 }, /* U+1D300 - U+1D356 */ + { 0x1D360, 0x1D376 }, /* U+1D360 - U+1D376 */ + { 0x1F000, 0x1F02F }, /* U+1F000 - U+1F02F */ + { 0x1F0A0, 0x1F0FF }, /* U+1F0A0 - U+1F0FF */ + { 0x1F18E, 0x1F18E }, /* U+1F18E */ + { 0x1F191, 0x1F19A }, /* U+1F191 - U+1F19A */ + { 0x1F200, 0x1F202 }, /* U+1F200 - U+1F202 */ + { 0x1F210, 0x1F23B }, /* U+1F210 - U+1F23B */ + { 0x1F240, 0x1F248 }, /* U+1F240 - U+1F248 */ + { 0x1F250, 0x1F251 }, /* U+1F250 - U+1F251 */ + { 0x1F260, 0x1F265 }, /* U+1F260 - U+1F265 */ + { 0x1F300, 0x1F3FA }, /* U+1F300 - U+1F3FA */ + { 0x1F400, 0x1F64F }, /* U+1F400 - U+1F64F */ + { 0x1F680, 0x1F9AF }, /* U+1F680 - U+1F9AF */ + { 0x1F9B4, 0x1FAFF }, /* U+1F9B4 - U+1FAFF */ + { 0x20000, 0x2FFFD }, /* U+20000 - U+2FFFD */ + { 0x30000, 0x3FFFD }, /* U+30000 - U+3FFFD */ +}; + + +static int ucs_cmp(const void *key, const void *element) { uint32_t cp = *(uint32_t *)key; - struct interval e = *(struct interval *) elt; + const struct interval *e = element; - if (cp > e.last) + if (cp > e->last) return 1; - else if (cp < e.first) + if (cp < e->first) return -1; return 0; } -static const struct interval double_width[] = { - { 0x1100, 0x115F }, { 0x2329, 0x232A }, { 0x2E80, 0x303E }, - { 0x3040, 0xA4CF }, { 0xAC00, 0xD7A3 }, { 0xF900, 0xFAFF }, - { 0xFE10, 0xFE19 }, { 0xFE30, 0xFE6F }, { 0xFF00, 0xFF60 }, - { 0xFFE0, 0xFFE6 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } -}; - -bool ucs_is_double_width(uint32_t cp) +static bool is_in_interval(uint32_t cp, const struct interval *intervals, size_t count) { - if (cp < double_width[0].first || - cp > double_width[ARRAY_SIZE(double_width) - 1].last) + if (cp < intervals[0].first || cp > intervals[count - 1].last) return false; - return bsearch(&cp, double_width, ARRAY_SIZE(double_width), - sizeof(struct interval), ucs_cmp) != NULL; + return __inline_bsearch(&cp, intervals, count, + sizeof(*intervals), ucs_cmp) != NULL; +} + +/** + * Determine if a Unicode code point is zero-width. + * + * @param ucs: Unicode code point (UCS-4) + * Return: true if the character is zero-width, false otherwise + */ +bool ucs_is_zero_width(uint32_t cp) +{ + return is_in_interval(cp, zero_width_ranges, ARRAY_SIZE(zero_width_ranges)); +} + +/** + * Determine if a Unicode code point is double-width. + * + * @param ucs: Unicode code point (UCS-4) + * Return: true if the character is double-width, false otherwise + */ +bool ucs_is_double_width(uint32_t cp) +{ + return is_in_interval(cp, double_width_ranges, ARRAY_SIZE(double_width_ranges)); } diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index 7d778752dc..b3a9118666 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -29,11 +29,7 @@ u32 conv_8bit_to_uni(unsigned char c); int conv_uni_to_8bit(u32 uni); void console_map_init(void); bool ucs_is_double_width(uint32_t cp); -static inline bool ucs_is_zero_width(uint32_t cp) -{ - /* coming soon */ - return false; -} +bool ucs_is_zero_width(uint32_t cp); #else static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode) From patchwork Thu Apr 10 01:13:58 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880893 Received: from fhigh-a1-smtp.messagingengine.com (fhigh-a1-smtp.messagingengine.com [103.168.172.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 607CA171E43; Thu, 10 Apr 2025 01:19:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247947; cv=none; b=aSirT5Yy3xmvadu28GCgYoq4W4YfETULJ0hyLN6LhOOT0JKgC6T/UkWzj2WV0ZdRGvuUuHy63LOD5uiEXhzpPkXeWDugdYmOHhme2e4R4vnjRrIsW1aY/wxvMWLNA7dLOisvdWG6VWnM/OgoODJ/MUfkv9yfRtDcjmDdMJPsGeM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247947; c=relaxed/simple; bh=QmRQEb17PcXXwTDp2HQFbsclPY+p8NKjJJNQZrm+eYg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=D8+X8hlNMz9kbkgY9zk7n6Zk9zzLiX27zSQDCdNutm0KGQqnusJqoMkZST1SS/vSljSRVkLMS3gG0QEIbMGk3jK/pcUctYLp5LwKbXBpjuQjEeDsa6342AmbgVdBHYZFSaf68OxKRgM3KhOGkqT7p9zB8bPowxW9+RC0+g3eqC0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=B19mAGPQ; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=U/vq/SHD; arc=none smtp.client-ip=103.168.172.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="B19mAGPQ"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="U/vq/SHD" Received: from phl-compute-12.internal (phl-compute-12.phl.internal [10.202.2.52]) by mailfhigh.phl.internal (Postfix) with ESMTP id 6BF6F1140292; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-12.internal (MEProxy); Wed, 09 Apr 2025 21:19:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1744247943; x=1744334343; bh=EeKRigeUKSXvO0/AJWCt45EUE5TK7dKvCH869MZ00d8=; b= B19mAGPQ+irW0NiSRb3VTIKvVvN6cyWlLZ6DFG/ZguQmNx4AEsNW80dYFTEMY7pV 3H8PIrwS7ZlcvpEJt4V8ds24zqTjrnl1rYQ5Seviv+qAkGlhPhVTMeQg3VzRkjC7 E591+DaidMSVlgleUtFs0Qy+Y0foiZ9m0adTBiUKZv+X11JvDinpYAAV9FWb0lVX WwJOyg3lsWKAs/K7a7zFpvzdUZRzyDTIhTWExew+okjlq/8yFyrw8RpU2G6vZCjs vZ6of9V3AqLGqhdKVtrsbLcpCqWlu2BOwpb9ZalLvEJhkCI8INUg6dTAAkrdppJq 26CDPcCi7t2C9x/01pGf2Q== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm2; t=1744247943; x= 1744334343; bh=EeKRigeUKSXvO0/AJWCt45EUE5TK7dKvCH869MZ00d8=; b=U /vq/SHD6cEqcPhbzWC9CvkYEh2iU+gf7oA/7+JHmAmKPgcoscrVkcpCIbrdwiqbt Xq9wywRdbsxjE6AIHMDHQxNFPi1jgjI6yD0tnDhzjXOzXuY5VDPglA3RDBswm5wx 3ydybIUzCwaBkjo59i7/E9Y3r6PqjNDvgQcVXMsQdDEYijDBxR+oksm/iSKjFOS1 eX43X4jfwzCoCaTLDh5DU0GdnH9A2NWGyeLyafcDMMEArZzzErJ1Yd8SWeBvrLxi NNEXfbzCkDpBttjPfYCLlxUABzR08+ZIAlkTqef2dm5hzu9rlauxXi0xEDO6GKxy xZeVCF/8wiDCYqiEWgeLg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhggtgfgsehtkeertder tdejnecuhfhrohhmpefpihgtohhlrghsucfrihhtrhgvuceonhhitghosehflhhugihnih gtrdhnvghtqeenucggtffrrghtthgvrhhnpedufedvleehffethffhffelkeekieekgeev ffelleejteehieeghfeluedvvdehieenucevlhhushhtvghrufhiiigvpedtnecurfgrrh grmhepmhgrihhlfhhrohhmpehnihgtohesfhhluhignhhitgdrnhgvthdpnhgspghrtghp thhtohepiedpmhhouggvpehsmhhtphhouhhtpdhrtghpthhtohepnhhpihhtrhgvsegsrg ihlhhisghrvgdrtghomhdprhgtphhtthhopehjihhrihhslhgrsgihsehkvghrnhgvlhdr ohhrghdprhgtphhtthhopehgrhgvghhkhheslhhinhhugihfohhunhgurghtihhonhdroh hrghdprhgtphhtthhopegurghvvgesmhhivghlkhgvrdgttgdprhgtphhtthhopehlihhn uhigqdhkvghrnhgvlhesvhhgvghrrdhkvghrnhgvlhdrohhrghdprhgtphhtthhopehlih hnuhigqdhsvghrihgrlhesvhhgvghrrdhkvghrnhgvlhdrohhrgh X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 43A1F10D8B75; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 06/11] vt: introduce gen_ucs_recompose.py to create ucs_recompose.c Date: Wed, 9 Apr 2025 21:13:58 -0400 Message-ID: <20250410011839.64418-7-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre The generated code includes a table that maps base character + combining mark pairs to their precomposed equivalents using Python's unicodedata module. It also provides the ucs_recompose() function to query that table. The default script behavior is to create a table with most commonly used Latin, Greek, and Cyrillic recomposition pairs only. It is much smaller than the table with all possible recomposition pairs (71 entries vs 1000 entries). But if one needs/wants the full table then simply running the script with the --full argument will generate it. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/gen_ucs_recompose.py | 321 ++++++++++++++++++++++++++++ 1 file changed, 321 insertions(+) create mode 100755 drivers/tty/vt/gen_ucs_recompose.py diff --git a/drivers/tty/vt/gen_ucs_recompose.py b/drivers/tty/vt/gen_ucs_recompose.py new file mode 100755 index 0000000000..64418803e4 --- /dev/null +++ b/drivers/tty/vt/gen_ucs_recompose.py @@ -0,0 +1,321 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +# +# This script uses Python's unicodedata module to generate ucs_recompose.c. +# The generated code maps base character + combining mark pairs to their +# precomposed equivalents. +# +# Usage: +# python gen_ucs_recompose.py # Generate with common recomposition pairs +# python gen_ucs_recompose.py --full # Generate with all recomposition pairs + +import unicodedata +import sys +import argparse +import textwrap + +common_recompose_description = "most commonly used Latin, Greek, and Cyrillic recomposition pairs only" +COMMON_RECOMPOSITION_PAIRS = [ + # Latin letters with accents - uppercase + (0x0041, 0x0300, 0x00C0), # A + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER A WITH GRAVE + (0x0041, 0x0301, 0x00C1), # A + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER A WITH ACUTE + (0x0041, 0x0302, 0x00C2), # A + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER A WITH CIRCUMFLEX + (0x0041, 0x0303, 0x00C3), # A + COMBINING TILDE = LATIN CAPITAL LETTER A WITH TILDE + (0x0041, 0x0308, 0x00C4), # A + COMBINING DIAERESIS = LATIN CAPITAL LETTER A WITH DIAERESIS + (0x0041, 0x030A, 0x00C5), # A + COMBINING RING ABOVE = LATIN CAPITAL LETTER A WITH RING ABOVE + (0x0043, 0x0327, 0x00C7), # C + COMBINING CEDILLA = LATIN CAPITAL LETTER C WITH CEDILLA + (0x0045, 0x0300, 0x00C8), # E + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER E WITH GRAVE + (0x0045, 0x0301, 0x00C9), # E + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER E WITH ACUTE + (0x0045, 0x0302, 0x00CA), # E + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER E WITH CIRCUMFLEX + (0x0045, 0x0308, 0x00CB), # E + COMBINING DIAERESIS = LATIN CAPITAL LETTER E WITH DIAERESIS + (0x0049, 0x0300, 0x00CC), # I + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER I WITH GRAVE + (0x0049, 0x0301, 0x00CD), # I + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER I WITH ACUTE + (0x0049, 0x0302, 0x00CE), # I + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER I WITH CIRCUMFLEX + (0x0049, 0x0308, 0x00CF), # I + COMBINING DIAERESIS = LATIN CAPITAL LETTER I WITH DIAERESIS + (0x004E, 0x0303, 0x00D1), # N + COMBINING TILDE = LATIN CAPITAL LETTER N WITH TILDE + (0x004F, 0x0300, 0x00D2), # O + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER O WITH GRAVE + (0x004F, 0x0301, 0x00D3), # O + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER O WITH ACUTE + (0x004F, 0x0302, 0x00D4), # O + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER O WITH CIRCUMFLEX + (0x004F, 0x0303, 0x00D5), # O + COMBINING TILDE = LATIN CAPITAL LETTER O WITH TILDE + (0x004F, 0x0308, 0x00D6), # O + COMBINING DIAERESIS = LATIN CAPITAL LETTER O WITH DIAERESIS + (0x0055, 0x0300, 0x00D9), # U + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER U WITH GRAVE + (0x0055, 0x0301, 0x00DA), # U + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER U WITH ACUTE + (0x0055, 0x0302, 0x00DB), # U + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER U WITH CIRCUMFLEX + (0x0055, 0x0308, 0x00DC), # U + COMBINING DIAERESIS = LATIN CAPITAL LETTER U WITH DIAERESIS + (0x0059, 0x0301, 0x00DD), # Y + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER Y WITH ACUTE + + # Latin letters with accents - lowercase + (0x0061, 0x0300, 0x00E0), # a + COMBINING GRAVE ACCENT = LATIN SMALL LETTER A WITH GRAVE + (0x0061, 0x0301, 0x00E1), # a + COMBINING ACUTE ACCENT = LATIN SMALL LETTER A WITH ACUTE + (0x0061, 0x0302, 0x00E2), # a + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER A WITH CIRCUMFLEX + (0x0061, 0x0303, 0x00E3), # a + COMBINING TILDE = LATIN SMALL LETTER A WITH TILDE + (0x0061, 0x0308, 0x00E4), # a + COMBINING DIAERESIS = LATIN SMALL LETTER A WITH DIAERESIS + (0x0061, 0x030A, 0x00E5), # a + COMBINING RING ABOVE = LATIN SMALL LETTER A WITH RING ABOVE + (0x0063, 0x0327, 0x00E7), # c + COMBINING CEDILLA = LATIN SMALL LETTER C WITH CEDILLA + (0x0065, 0x0300, 0x00E8), # e + COMBINING GRAVE ACCENT = LATIN SMALL LETTER E WITH GRAVE + (0x0065, 0x0301, 0x00E9), # e + COMBINING ACUTE ACCENT = LATIN SMALL LETTER E WITH ACUTE + (0x0065, 0x0302, 0x00EA), # e + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER E WITH CIRCUMFLEX + (0x0065, 0x0308, 0x00EB), # e + COMBINING DIAERESIS = LATIN SMALL LETTER E WITH DIAERESIS + (0x0069, 0x0300, 0x00EC), # i + COMBINING GRAVE ACCENT = LATIN SMALL LETTER I WITH GRAVE + (0x0069, 0x0301, 0x00ED), # i + COMBINING ACUTE ACCENT = LATIN SMALL LETTER I WITH ACUTE + (0x0069, 0x0302, 0x00EE), # i + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER I WITH CIRCUMFLEX + (0x0069, 0x0308, 0x00EF), # i + COMBINING DIAERESIS = LATIN SMALL LETTER I WITH DIAERESIS + (0x006E, 0x0303, 0x00F1), # n + COMBINING TILDE = LATIN SMALL LETTER N WITH TILDE + (0x006F, 0x0300, 0x00F2), # o + COMBINING GRAVE ACCENT = LATIN SMALL LETTER O WITH GRAVE + (0x006F, 0x0301, 0x00F3), # o + COMBINING ACUTE ACCENT = LATIN SMALL LETTER O WITH ACUTE + (0x006F, 0x0302, 0x00F4), # o + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER O WITH CIRCUMFLEX + (0x006F, 0x0303, 0x00F5), # o + COMBINING TILDE = LATIN SMALL LETTER O WITH TILDE + (0x006F, 0x0308, 0x00F6), # o + COMBINING DIAERESIS = LATIN SMALL LETTER O WITH DIAERESIS + (0x0075, 0x0300, 0x00F9), # u + COMBINING GRAVE ACCENT = LATIN SMALL LETTER U WITH GRAVE + (0x0075, 0x0301, 0x00FA), # u + COMBINING ACUTE ACCENT = LATIN SMALL LETTER U WITH ACUTE + (0x0075, 0x0302, 0x00FB), # u + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER U WITH CIRCUMFLEX + (0x0075, 0x0308, 0x00FC), # u + COMBINING DIAERESIS = LATIN SMALL LETTER U WITH DIAERESIS + (0x0079, 0x0301, 0x00FD), # y + COMBINING ACUTE ACCENT = LATIN SMALL LETTER Y WITH ACUTE + (0x0079, 0x0308, 0x00FF), # y + COMBINING DIAERESIS = LATIN SMALL LETTER Y WITH DIAERESIS + + # Common Greek characters + (0x0391, 0x0301, 0x0386), # Α + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ALPHA WITH TONOS + (0x0395, 0x0301, 0x0388), # Ε + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER EPSILON WITH TONOS + (0x0397, 0x0301, 0x0389), # Η + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ETA WITH TONOS + (0x0399, 0x0301, 0x038A), # Ι + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER IOTA WITH TONOS + (0x039F, 0x0301, 0x038C), # Ο + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMICRON WITH TONOS + (0x03A5, 0x0301, 0x038E), # Υ + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER UPSILON WITH TONOS + (0x03A9, 0x0301, 0x038F), # Ω + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMEGA WITH TONOS + (0x03B1, 0x0301, 0x03AC), # α + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ALPHA WITH TONOS + (0x03B5, 0x0301, 0x03AD), # ε + COMBINING ACUTE ACCENT = GREEK SMALL LETTER EPSILON WITH TONOS + (0x03B7, 0x0301, 0x03AE), # η + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ETA WITH TONOS + (0x03B9, 0x0301, 0x03AF), # ι + COMBINING ACUTE ACCENT = GREEK SMALL LETTER IOTA WITH TONOS + (0x03BF, 0x0301, 0x03CC), # ο + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMICRON WITH TONOS + (0x03C5, 0x0301, 0x03CD), # υ + COMBINING ACUTE ACCENT = GREEK SMALL LETTER UPSILON WITH TONOS + (0x03C9, 0x0301, 0x03CE), # ω + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMEGA WITH TONOS + + # Common Cyrillic characters + (0x0418, 0x0306, 0x0419), # И + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT I + (0x0438, 0x0306, 0x0439), # и + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT I + (0x0423, 0x0306, 0x040E), # У + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT U + (0x0443, 0x0306, 0x045E), # у + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT U +] + +full_recompose_description = "all possible recomposition pairs from the Unicode BMP" +def collect_all_recomposition_pairs(): + """Collect all possible recomposition pairs from the Unicode data.""" + # Map to store recomposition pairs: (base, combining) -> recomposed + recompose_map = {} + + # Process all assigned Unicode code points in BMP (Basic Multilingual Plane) + # We limit to BMP (0x0000-0xFFFF) to keep our table smaller with uint16_t + for cp in range(0, 0x10000): + try: + char = chr(cp) + + # Skip unassigned or control characters + if not unicodedata.name(char, ''): + continue + + # Find decomposition + decomp = unicodedata.decomposition(char) + if not decomp or '<' in decomp: # Skip compatibility decompositions + continue + + # Parse the decomposition + parts = decomp.split() + if len(parts) == 2: # Simple base + combining mark + base = int(parts[0], 16) + combining = int(parts[1], 16) + + # Only store if both are in BMP + if base < 0x10000 and combining < 0x10000: + recompose_map[(base, combining)] = cp + + except (ValueError, TypeError): + continue + + # Convert to a list of tuples and sort for binary search + recompose_list = [(base, combining, recomposed) + for (base, combining), recomposed in recompose_map.items()] + recompose_list.sort() + + return recompose_list + +def validate_common_pairs(full_list): + """Validate that all common pairs are in the full list. + + Raises: + ValueError: If any common pair is missing or has a different recomposition + value than what's in the full table. + """ + full_pairs = {(base, combining): recomposed for base, combining, recomposed in full_list} + for base, combining, recomposed in COMMON_RECOMPOSITION_PAIRS: + full_recomposed = full_pairs.get((base, combining)) + if full_recomposed is None: + error_msg = f"Error: Common pair (0x{base:04X}, 0x{combining:04X}) not found in full data" + print(error_msg) + raise ValueError(error_msg) + elif full_recomposed != recomposed: + error_msg = (f"Error: Common pair (0x{base:04X}, 0x{combining:04X}) has different recomposition: " + f"0x{recomposed:04X} vs 0x{full_recomposed:04X}") + print(error_msg) + raise ValueError(error_msg) + +def generate_recomposition_table(use_full_list=False): + """Generate the recomposition table C code.""" + # Output file name + c_file = "ucs_recompose.c" + + # Get Unicode version information + unicode_version = unicodedata.unidata_version + + # Collect all recomposition pairs for validation + full_recompose_list = collect_all_recomposition_pairs() + + # Decide which list to use + if use_full_list: + print("Using full recomposition list...") + recompose_list = full_recompose_list + table_description = full_recompose_description + alt_list = COMMON_RECOMPOSITION_PAIRS + alt_description = common_recompose_description + else: + print("Using common recomposition list...") + # Validate that all common pairs are in the full list + validate_common_pairs(full_recompose_list) + recompose_list = sorted(COMMON_RECOMPOSITION_PAIRS) + table_description = common_recompose_description + alt_list = full_recompose_list + alt_description = full_recompose_description + generation_mode = " --full" if use_full_list else "" + alternative_mode = " --full" if not use_full_list else "" + table_description_detail = f"{table_description} ({len(recompose_list)} entries)" + alt_description_detail = f"{alt_description} ({len(alt_list)} entries)" + + # Calculate min/max values for boundary checks + min_base = min(base for base, _, _ in recompose_list) + max_base = max(base for base, _, _ in recompose_list) + min_combining = min(combining for _, combining, _ in recompose_list) + max_combining = max(combining for _, combining, _ in recompose_list) + + # Generate implementation file + with open(c_file, 'w') as f: + f.write(f"""\ +// SPDX-License-Identifier: GPL-2.0 +/* + * ucs_recompose.c - Unicode character recomposition + * + * Auto-generated by gen_ucs_recompose.py{generation_mode} + * + * Unicode Version: {unicode_version} + * +{textwrap.fill( + f"This file contains a table with {table_description_detail}. " + + f"To generate a table with {alt_description_detail} instead, run:", + width=75, initial_indent=" * ", subsequent_indent=" * ")} + * + * python gen_ucs_recompose.py{alternative_mode} + */ + +#include +#include +#include +#include + +/* + * Structure for recomposition pairs. + * First element is the base character, second is the combining mark, + * third is the recomposed character. + * Using uint16_t to save space since all values are within BMP range. + */ +struct recomposition {{ + uint16_t base; + uint16_t combining; + uint16_t recomposed; +}}; + +/* + * Table of {table_description} + * Sorted by base character and then combining character for binary search + */ +static const struct recomposition recomposition_table[] = {{ +""") + + # Write the recomposition table with comments + for base, combining, recomposed in recompose_list: + try: + base_name = unicodedata.name(chr(base)) + combining_name = unicodedata.name(chr(combining)) + recomposed_name = unicodedata.name(chr(recomposed)) + comment = f"/* {base_name} + {combining_name} = {recomposed_name} */" + except ValueError: + comment = f"/* U+{base:04X} + U+{combining:04X} = U+{recomposed:04X} */" + f.write(f"\t{{ 0x{base:04X}, 0x{combining:04X}, 0x{recomposed:04X} }}, {comment}\n") + + f.write(f"""\ +}}; + +/* + * Boundary values for quick rejection + * These are calculated by analyzing the table during generation + */ +#define MIN_BASE_CHAR 0x{min_base:04X} +#define MAX_BASE_CHAR 0x{max_base:04X} +#define MIN_COMBINING_CHAR 0x{min_combining:04X} +#define MAX_COMBINING_CHAR 0x{max_combining:04X} + +struct compare_key {{ + uint16_t base; + uint16_t combining; +}}; + +static int recomposition_compare(const void *key, const void *element) +{{ + const struct compare_key *search_key = key; + const struct recomposition *table_entry = element; + + /* Compare base character first */ + if (search_key->base < table_entry->base) + return -1; + if (search_key->base > table_entry->base) + return 1; + + /* Base characters match, now compare combining character */ + if (search_key->combining < table_entry->combining) + return -1; + if (search_key->combining > table_entry->combining) + return 1; + + /* Both match */ + return 0; +}} + +/** + * Attempt to recompose two Unicode characters into a single character. + * + * @param previous: Previous Unicode code point (UCS-4) + * @param current: Current Unicode code point (UCS-4) + * Return: Recomposed Unicode code point, or 0 if no recomposition is possible + */ +uint32_t ucs_recompose(uint32_t base, uint32_t combining) +{{ + /* Check if characters are within the range of our table */ + if (base < MIN_BASE_CHAR || base > MAX_BASE_CHAR || + combining < MIN_COMBINING_CHAR || combining > MAX_COMBINING_CHAR) + return 0; + + struct compare_key key = {{ base, combining }}; + + struct recomposition *result = + __inline_bsearch(&key, recomposition_table, + ARRAY_SIZE(recomposition_table), + sizeof(*recomposition_table), + recomposition_compare); + + return result ? result->recomposed : 0; +}} +""") + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Generate Unicode recomposition table") + parser.add_argument("--full", action="store_true", + help="Generate a full recomposition table (default: common pairs only)") + args = parser.parse_args() + + generate_recomposition_table(use_full_list=args.full) From patchwork Thu Apr 10 01:13:59 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880088 Received: from fout-a6-smtp.messagingengine.com (fout-a6-smtp.messagingengine.com [103.168.172.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DA8E189906; Thu, 10 Apr 2025 01:19:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.149 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247947; cv=none; b=ZE5CQ+q2SKvjGA1IiTi+ulJN7aPmStOqw0Dj+w6pa4yxAOs556Kd/9xiASdCSGFLaeZTJ/iMB6st3ErRklBf/ZdZ3114Rc34vA+BN5huhp5/LWXkZtuoRNaDZTCrvkcgDQ1r6XruVz1ivoOOkigxLBptuOaqHJJF+Faf7u8+LNg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247947; c=relaxed/simple; bh=I74X3sNLnMcqf/fraokExTqQ2HZKrSMo4U9rtX1aImo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WdMMmiY3jDLYQZ8R0lpQycntNq8nn3pD5fVGjcBdLhg5rr85nDx+mZfGSfVdZyM4NGkeeDnD5FB5DTjD8auz199Sx1zhNK4B5yTuWgVdEjIQj/I5epa766vNtZnfByBQK6wJj7Ohv7M9A0QLEiTpMaILxaKIJ0x28fnU2XISafk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=GYTfy2k3; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=jBV5dJKW; arc=none smtp.client-ip=103.168.172.149 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="GYTfy2k3"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="jBV5dJKW" Received: from phl-compute-12.internal (phl-compute-12.phl.internal [10.202.2.52]) by mailfout.phl.internal (Postfix) with ESMTP id 7049A13801EB; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-12.internal (MEProxy); Wed, 09 Apr 2025 21:19:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247943; x= 1744334343; bh=bvC8V9DFp+JhuTcgzAWmVeg0PQPFYt6N1mRS0QfFSWk=; b=G YTfy2k3r3UJO8ftEYdZ1ldp/jIGNLo+MdmgJnYZUEbGDG4UJi6V+fze44veVs98/ lvFKKF6SPHzlTVPbOHL7xlIWiPjHt5QwNTb5bc+ryQ1Q9MvM7aLkTVHNVKh3JNim SBhk4dLpxKKaajSqEZBtTTQTDXtcJfBPh/JzWTSKPAY8wfg7Nm0vn8pthSein0qH SB6Z5c1pt6Szx9FenEN1qxVVk7tbV82BeZVM9bLavfsfsWFTwxYDIUMTjr2RmXZf qy1t1Asz4sg9S09luLLS9opQge8sFH0PCBr1u5/Rk8eWVcGCHErDtgUN/ERqLKTu qM5RzChAqiesktEKA56sg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247943; x=1744334343; bh=b vC8V9DFp+JhuTcgzAWmVeg0PQPFYt6N1mRS0QfFSWk=; b=jBV5dJKWzdPxFZ4XY 44rg8cs5+nqdkjGp1U1udhya6AKPm5l1gGKs22RipQnk6NnnAnyixUbsHcDQPrsv IHhGN7ps5D5S7iRNShOKsN/QrHvqQxY/EXiqa5f5bkrdG17NuTooeIvkweOJodCb 5lovt9N5KEsht/9LBBljo5YrZxXfwxYhK32ixeRP3qFijUm+qLDSd/aSywZg2MvZ Wv2ApcD3MmOZsRym1pJgARfKJqljnFhkeQFsjCwuIQrK2uZIwEb5VXnbBM0I5bCX 0llfvIznYXVy/fdOtVPS0bxHboBGeSRPv5l9RtXJb5C4N0jWJ1lbw2dtuHFrtUWl EFaVA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeeipdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtohepuggrvhgvsehmihgvlhhkvgdrtggtpdhrtghpthhtoheplhhinhhu gidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinh hugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 6C1C410D8B77; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 07/11] vt: create ucs_recompose.c using gen_ucs_recompose.py Date: Wed, 9 Apr 2025 21:13:59 -0400 Message-ID: <20250410011839.64418-8-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre This provides ucs_recompose() to recompose two Unicode characters into a single character if possible. This is needed for the VT to properly display decomposed UTF8 sequences. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/Makefile | 2 +- drivers/tty/vt/ucs_recompose.c | 170 +++++++++++++++++++++++++++++++++ include/linux/consolemap.h | 6 ++ 3 files changed, 177 insertions(+), 1 deletion(-) create mode 100644 drivers/tty/vt/ucs_recompose.c diff --git a/drivers/tty/vt/Makefile b/drivers/tty/vt/Makefile index bee69277bb..a63f6c9438 100644 --- a/drivers/tty/vt/Makefile +++ b/drivers/tty/vt/Makefile @@ -8,7 +8,7 @@ obj-$(CONFIG_VT) += vt_ioctl.o vc_screen.o \ selection.o keyboard.o \ vt.o defkeymap.o obj-$(CONFIG_CONSOLE_TRANSLATIONS) += consolemap.o consolemap_deftbl.o \ - ucs_width.o + ucs_width.o ucs_recompose.o # Files generated that shall be removed upon make clean clean-files := consolemap_deftbl.c defkeymap.c diff --git a/drivers/tty/vt/ucs_recompose.c b/drivers/tty/vt/ucs_recompose.c new file mode 100644 index 0000000000..5c30c989de --- /dev/null +++ b/drivers/tty/vt/ucs_recompose.c @@ -0,0 +1,170 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * ucs_recompose.c - Unicode character recomposition + * + * Auto-generated by gen_ucs_recompose.py + * + * Unicode Version: 16.0.0 + * + * This file contains a table with most commonly used Latin, Greek, and + * Cyrillic recomposition pairs only (71 entries). To generate a table with + * all possible recomposition pairs from the Unicode BMP (1000 entries) + * instead, run: + * + * python gen_ucs_recompose.py --full + */ + +#include +#include +#include +#include + +/* + * Structure for recomposition pairs. + * First element is the base character, second is the combining mark, + * third is the recomposed character. + * Using uint16_t to save space since all values are within BMP range. + */ +struct recomposition { + uint16_t base; + uint16_t combining; + uint16_t recomposed; +}; + +/* + * Table of most commonly used Latin, Greek, and Cyrillic recomposition pairs only + * Sorted by base character and then combining character for binary search + */ +static const struct recomposition recomposition_table[] = { + { 0x0041, 0x0300, 0x00C0 }, /* LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER A WITH GRAVE */ + { 0x0041, 0x0301, 0x00C1 }, /* LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER A WITH ACUTE */ + { 0x0041, 0x0302, 0x00C2 }, /* LATIN CAPITAL LETTER A + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER A WITH CIRCUMFLEX */ + { 0x0041, 0x0303, 0x00C3 }, /* LATIN CAPITAL LETTER A + COMBINING TILDE = LATIN CAPITAL LETTER A WITH TILDE */ + { 0x0041, 0x0308, 0x00C4 }, /* LATIN CAPITAL LETTER A + COMBINING DIAERESIS = LATIN CAPITAL LETTER A WITH DIAERESIS */ + { 0x0041, 0x030A, 0x00C5 }, /* LATIN CAPITAL LETTER A + COMBINING RING ABOVE = LATIN CAPITAL LETTER A WITH RING ABOVE */ + { 0x0043, 0x0327, 0x00C7 }, /* LATIN CAPITAL LETTER C + COMBINING CEDILLA = LATIN CAPITAL LETTER C WITH CEDILLA */ + { 0x0045, 0x0300, 0x00C8 }, /* LATIN CAPITAL LETTER E + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER E WITH GRAVE */ + { 0x0045, 0x0301, 0x00C9 }, /* LATIN CAPITAL LETTER E + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER E WITH ACUTE */ + { 0x0045, 0x0302, 0x00CA }, /* LATIN CAPITAL LETTER E + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER E WITH CIRCUMFLEX */ + { 0x0045, 0x0308, 0x00CB }, /* LATIN CAPITAL LETTER E + COMBINING DIAERESIS = LATIN CAPITAL LETTER E WITH DIAERESIS */ + { 0x0049, 0x0300, 0x00CC }, /* LATIN CAPITAL LETTER I + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER I WITH GRAVE */ + { 0x0049, 0x0301, 0x00CD }, /* LATIN CAPITAL LETTER I + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER I WITH ACUTE */ + { 0x0049, 0x0302, 0x00CE }, /* LATIN CAPITAL LETTER I + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER I WITH CIRCUMFLEX */ + { 0x0049, 0x0308, 0x00CF }, /* LATIN CAPITAL LETTER I + COMBINING DIAERESIS = LATIN CAPITAL LETTER I WITH DIAERESIS */ + { 0x004E, 0x0303, 0x00D1 }, /* LATIN CAPITAL LETTER N + COMBINING TILDE = LATIN CAPITAL LETTER N WITH TILDE */ + { 0x004F, 0x0300, 0x00D2 }, /* LATIN CAPITAL LETTER O + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER O WITH GRAVE */ + { 0x004F, 0x0301, 0x00D3 }, /* LATIN CAPITAL LETTER O + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER O WITH ACUTE */ + { 0x004F, 0x0302, 0x00D4 }, /* LATIN CAPITAL LETTER O + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER O WITH CIRCUMFLEX */ + { 0x004F, 0x0303, 0x00D5 }, /* LATIN CAPITAL LETTER O + COMBINING TILDE = LATIN CAPITAL LETTER O WITH TILDE */ + { 0x004F, 0x0308, 0x00D6 }, /* LATIN CAPITAL LETTER O + COMBINING DIAERESIS = LATIN CAPITAL LETTER O WITH DIAERESIS */ + { 0x0055, 0x0300, 0x00D9 }, /* LATIN CAPITAL LETTER U + COMBINING GRAVE ACCENT = LATIN CAPITAL LETTER U WITH GRAVE */ + { 0x0055, 0x0301, 0x00DA }, /* LATIN CAPITAL LETTER U + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER U WITH ACUTE */ + { 0x0055, 0x0302, 0x00DB }, /* LATIN CAPITAL LETTER U + COMBINING CIRCUMFLEX ACCENT = LATIN CAPITAL LETTER U WITH CIRCUMFLEX */ + { 0x0055, 0x0308, 0x00DC }, /* LATIN CAPITAL LETTER U + COMBINING DIAERESIS = LATIN CAPITAL LETTER U WITH DIAERESIS */ + { 0x0059, 0x0301, 0x00DD }, /* LATIN CAPITAL LETTER Y + COMBINING ACUTE ACCENT = LATIN CAPITAL LETTER Y WITH ACUTE */ + { 0x0061, 0x0300, 0x00E0 }, /* LATIN SMALL LETTER A + COMBINING GRAVE ACCENT = LATIN SMALL LETTER A WITH GRAVE */ + { 0x0061, 0x0301, 0x00E1 }, /* LATIN SMALL LETTER A + COMBINING ACUTE ACCENT = LATIN SMALL LETTER A WITH ACUTE */ + { 0x0061, 0x0302, 0x00E2 }, /* LATIN SMALL LETTER A + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER A WITH CIRCUMFLEX */ + { 0x0061, 0x0303, 0x00E3 }, /* LATIN SMALL LETTER A + COMBINING TILDE = LATIN SMALL LETTER A WITH TILDE */ + { 0x0061, 0x0308, 0x00E4 }, /* LATIN SMALL LETTER A + COMBINING DIAERESIS = LATIN SMALL LETTER A WITH DIAERESIS */ + { 0x0061, 0x030A, 0x00E5 }, /* LATIN SMALL LETTER A + COMBINING RING ABOVE = LATIN SMALL LETTER A WITH RING ABOVE */ + { 0x0063, 0x0327, 0x00E7 }, /* LATIN SMALL LETTER C + COMBINING CEDILLA = LATIN SMALL LETTER C WITH CEDILLA */ + { 0x0065, 0x0300, 0x00E8 }, /* LATIN SMALL LETTER E + COMBINING GRAVE ACCENT = LATIN SMALL LETTER E WITH GRAVE */ + { 0x0065, 0x0301, 0x00E9 }, /* LATIN SMALL LETTER E + COMBINING ACUTE ACCENT = LATIN SMALL LETTER E WITH ACUTE */ + { 0x0065, 0x0302, 0x00EA }, /* LATIN SMALL LETTER E + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER E WITH CIRCUMFLEX */ + { 0x0065, 0x0308, 0x00EB }, /* LATIN SMALL LETTER E + COMBINING DIAERESIS = LATIN SMALL LETTER E WITH DIAERESIS */ + { 0x0069, 0x0300, 0x00EC }, /* LATIN SMALL LETTER I + COMBINING GRAVE ACCENT = LATIN SMALL LETTER I WITH GRAVE */ + { 0x0069, 0x0301, 0x00ED }, /* LATIN SMALL LETTER I + COMBINING ACUTE ACCENT = LATIN SMALL LETTER I WITH ACUTE */ + { 0x0069, 0x0302, 0x00EE }, /* LATIN SMALL LETTER I + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER I WITH CIRCUMFLEX */ + { 0x0069, 0x0308, 0x00EF }, /* LATIN SMALL LETTER I + COMBINING DIAERESIS = LATIN SMALL LETTER I WITH DIAERESIS */ + { 0x006E, 0x0303, 0x00F1 }, /* LATIN SMALL LETTER N + COMBINING TILDE = LATIN SMALL LETTER N WITH TILDE */ + { 0x006F, 0x0300, 0x00F2 }, /* LATIN SMALL LETTER O + COMBINING GRAVE ACCENT = LATIN SMALL LETTER O WITH GRAVE */ + { 0x006F, 0x0301, 0x00F3 }, /* LATIN SMALL LETTER O + COMBINING ACUTE ACCENT = LATIN SMALL LETTER O WITH ACUTE */ + { 0x006F, 0x0302, 0x00F4 }, /* LATIN SMALL LETTER O + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER O WITH CIRCUMFLEX */ + { 0x006F, 0x0303, 0x00F5 }, /* LATIN SMALL LETTER O + COMBINING TILDE = LATIN SMALL LETTER O WITH TILDE */ + { 0x006F, 0x0308, 0x00F6 }, /* LATIN SMALL LETTER O + COMBINING DIAERESIS = LATIN SMALL LETTER O WITH DIAERESIS */ + { 0x0075, 0x0300, 0x00F9 }, /* LATIN SMALL LETTER U + COMBINING GRAVE ACCENT = LATIN SMALL LETTER U WITH GRAVE */ + { 0x0075, 0x0301, 0x00FA }, /* LATIN SMALL LETTER U + COMBINING ACUTE ACCENT = LATIN SMALL LETTER U WITH ACUTE */ + { 0x0075, 0x0302, 0x00FB }, /* LATIN SMALL LETTER U + COMBINING CIRCUMFLEX ACCENT = LATIN SMALL LETTER U WITH CIRCUMFLEX */ + { 0x0075, 0x0308, 0x00FC }, /* LATIN SMALL LETTER U + COMBINING DIAERESIS = LATIN SMALL LETTER U WITH DIAERESIS */ + { 0x0079, 0x0301, 0x00FD }, /* LATIN SMALL LETTER Y + COMBINING ACUTE ACCENT = LATIN SMALL LETTER Y WITH ACUTE */ + { 0x0079, 0x0308, 0x00FF }, /* LATIN SMALL LETTER Y + COMBINING DIAERESIS = LATIN SMALL LETTER Y WITH DIAERESIS */ + { 0x0391, 0x0301, 0x0386 }, /* GREEK CAPITAL LETTER ALPHA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ALPHA WITH TONOS */ + { 0x0395, 0x0301, 0x0388 }, /* GREEK CAPITAL LETTER EPSILON + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER EPSILON WITH TONOS */ + { 0x0397, 0x0301, 0x0389 }, /* GREEK CAPITAL LETTER ETA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER ETA WITH TONOS */ + { 0x0399, 0x0301, 0x038A }, /* GREEK CAPITAL LETTER IOTA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER IOTA WITH TONOS */ + { 0x039F, 0x0301, 0x038C }, /* GREEK CAPITAL LETTER OMICRON + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMICRON WITH TONOS */ + { 0x03A5, 0x0301, 0x038E }, /* GREEK CAPITAL LETTER UPSILON + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER UPSILON WITH TONOS */ + { 0x03A9, 0x0301, 0x038F }, /* GREEK CAPITAL LETTER OMEGA + COMBINING ACUTE ACCENT = GREEK CAPITAL LETTER OMEGA WITH TONOS */ + { 0x03B1, 0x0301, 0x03AC }, /* GREEK SMALL LETTER ALPHA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ALPHA WITH TONOS */ + { 0x03B5, 0x0301, 0x03AD }, /* GREEK SMALL LETTER EPSILON + COMBINING ACUTE ACCENT = GREEK SMALL LETTER EPSILON WITH TONOS */ + { 0x03B7, 0x0301, 0x03AE }, /* GREEK SMALL LETTER ETA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER ETA WITH TONOS */ + { 0x03B9, 0x0301, 0x03AF }, /* GREEK SMALL LETTER IOTA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER IOTA WITH TONOS */ + { 0x03BF, 0x0301, 0x03CC }, /* GREEK SMALL LETTER OMICRON + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMICRON WITH TONOS */ + { 0x03C5, 0x0301, 0x03CD }, /* GREEK SMALL LETTER UPSILON + COMBINING ACUTE ACCENT = GREEK SMALL LETTER UPSILON WITH TONOS */ + { 0x03C9, 0x0301, 0x03CE }, /* GREEK SMALL LETTER OMEGA + COMBINING ACUTE ACCENT = GREEK SMALL LETTER OMEGA WITH TONOS */ + { 0x0418, 0x0306, 0x0419 }, /* CYRILLIC CAPITAL LETTER I + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT I */ + { 0x0423, 0x0306, 0x040E }, /* CYRILLIC CAPITAL LETTER U + COMBINING BREVE = CYRILLIC CAPITAL LETTER SHORT U */ + { 0x0438, 0x0306, 0x0439 }, /* CYRILLIC SMALL LETTER I + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT I */ + { 0x0443, 0x0306, 0x045E }, /* CYRILLIC SMALL LETTER U + COMBINING BREVE = CYRILLIC SMALL LETTER SHORT U */ +}; + +/* + * Boundary values for quick rejection + * These are calculated by analyzing the table during generation + */ +#define MIN_BASE_CHAR 0x0041 +#define MAX_BASE_CHAR 0x0443 +#define MIN_COMBINING_CHAR 0x0300 +#define MAX_COMBINING_CHAR 0x0327 + +struct compare_key { + uint16_t base; + uint16_t combining; +}; + +static int recomposition_compare(const void *key, const void *element) +{ + const struct compare_key *search_key = key; + const struct recomposition *table_entry = element; + + /* Compare base character first */ + if (search_key->base < table_entry->base) + return -1; + if (search_key->base > table_entry->base) + return 1; + + /* Base characters match, now compare combining character */ + if (search_key->combining < table_entry->combining) + return -1; + if (search_key->combining > table_entry->combining) + return 1; + + /* Both match */ + return 0; +} + +/** + * Attempt to recompose two Unicode characters into a single character. + * + * @param previous: Previous Unicode code point (UCS-4) + * @param current: Current Unicode code point (UCS-4) + * Return: Recomposed Unicode code point, or 0 if no recomposition is possible + */ +uint32_t ucs_recompose(uint32_t base, uint32_t combining) +{ + /* Check if characters are within the range of our table */ + if (base < MIN_BASE_CHAR || base > MAX_BASE_CHAR || + combining < MIN_COMBINING_CHAR || combining > MAX_COMBINING_CHAR) + return 0; + + struct compare_key key = { base, combining }; + + struct recomposition *result = + __inline_bsearch(&key, recomposition_table, + ARRAY_SIZE(recomposition_table), + sizeof(*recomposition_table), + recomposition_compare); + + return result ? result->recomposed : 0; +} diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index b3a9118666..4d3a34c288 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -30,6 +30,7 @@ int conv_uni_to_8bit(u32 uni); void console_map_init(void); bool ucs_is_double_width(uint32_t cp); bool ucs_is_zero_width(uint32_t cp); +uint32_t ucs_recompose(uint32_t base, uint32_t combining); #else static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode) @@ -69,6 +70,11 @@ static inline bool ucs_is_zero_width(uint32_t cp) { return false; } + +static inline uint32_t ucs_recompose(uint32_t base, uint32_t combining) +{ + return 0; +} #endif /* CONFIG_CONSOLE_TRANSLATIONS */ #endif /* __LINUX_CONSOLEMAP_H__ */ From patchwork Thu Apr 10 01:14:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880087 Received: from fhigh-a1-smtp.messagingengine.com (fhigh-a1-smtp.messagingengine.com [103.168.172.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7D34C185B72; Thu, 10 Apr 2025 01:19:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247947; cv=none; b=O8/U/49b92LHgFN92Pwm7V8vtXgaMSxGu84OzKCCkToLbcQ414SnG767Dy1VeddLXdpXWlGBErAuS+PNV8Fm9B0OTT5RRcHAi8Ya3bQkXBT+SoJUmWH3DZnHyFq/TPgbaqwquvolqsZXGBTfgNZvVsy6VUA5DRf8CeUQv6aMIjM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247947; c=relaxed/simple; bh=7Fl01rmqtsQopHmhT4GKDbwNXFWXu9m4k9r0i5xZdNQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GqGRyR/vSmJdewj7s3emYl0iH6J0gVp12cMx9t8XF64PURwKwEfYjJoBaglO4UDF52xwxU4o8QOyUvbgylxVTiWWt2G24QUZ7YtkPMdJBLUxnoCkQppnkwQikslp9rNzBQsLeZ8+TBuUPVtmFd63aZ/Aw1mBkTeygUGk2ip40+Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=UOm7qTHH; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=hczixEIm; arc=none smtp.client-ip=103.168.172.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="UOm7qTHH"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="hczixEIm" Received: from phl-compute-09.internal (phl-compute-09.phl.internal [10.202.2.49]) by mailfhigh.phl.internal (Postfix) with ESMTP id 6D2731140293; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-09.internal (MEProxy); Wed, 09 Apr 2025 21:19:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247943; x= 1744334343; bh=kciBX3rNr/kvXLJfeX4cavlPoN4UItk15A/naDhaYic=; b=U Om7qTHHBWtgjBQ9Fc+UHcP0DIdWte3128f2MlWELtqN805gCUR18C94JFRECHMHg EUF0pO4432Fb3Ji7v3gFvkCcaRVIkP5fbZCxlGkzbjHuwkgyz7Kr2twiGMggRawk glkmpUIk7nvSFbNDgMmzQSuONpVH0bj717IIdLeJGE70+o0OE17Uhe2MZNZ6IarS WTkTGZiudSd69JsMsI4k4TmIuOnVF34t7CzEeJfBblOFssM4BGRKDzZJvx/hNoQI 8rGQ2JuvysWUDp86B/oD2s7EuN4UmLMiiifol0ZbMVZsv/VaaTsEtFP3ZMjrV0S/ NhtpAA2gSJDeN7VabQUOA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247943; x=1744334343; bh=k ciBX3rNr/kvXLJfeX4cavlPoN4UItk15A/naDhaYic=; b=hczixEImpv23Y9LqV t+MazNa/Iaq7NCvqMR1ky6HkoGz5IMbtp2fc11SoIw5B0ZBG622SBNpCZxyF6ou8 LGfcHOBawsHTFzdyvVYLX6jjq+1ur59JZ7i9OFU8+Wb2lN8FNK5NIhVsST1Q2hSl GTbpJfLzLkPcjTrPYd9jHv3OSuAslBZkVwFX4/FZouTO7yUhN4526w0L7slRfb2q /KxtFhQyp5twxurSzyc+VgdGWTZWgCYv/PmiWpKteMdmJBkgWfW7mQa1SKW42yrb 7y/bEeuSokWmt3+lMbI0v518BpD8cxXR2xCDzxgdA0BLgDuNkIp9FeVAlvYt8XJZ 4H8zg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheehucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeeipdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtohepuggrvhgvsehmihgvlhhkvgdrtggtpdhrtghpthhtoheplhhinhhu gidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinh hugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 92C3E10D8B78; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 08/11] vt: support Unicode recomposition Date: Wed, 9 Apr 2025 21:14:00 -0400 Message-ID: <20250410011839.64418-9-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Try replacing any decomposed Unicode sequence by the corresponding recomposed code point. Code point to glyph correspondance works best after recomposition, and this apply mostly to single-width code points therefore we can't preserve them in their decomposed form anyway. With all the infrastructure in place this is now trivial to do. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/vt.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 5d53feeb5d..e3d35c4f92 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -2953,8 +2953,15 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, * double-width. */ } else { - /* Otherwise zero-width code points are ignored */ - goto out; + /* try recomposition */ + prev_c = ucs_recompose(prev_c, c); + if (prev_c != 0) { + vc_con_rewind(vc); + c = prev_c; + } else { + /* Otherwise zero-width code points are ignored */ + goto out; + } } } } From patchwork Thu Apr 10 01:14:01 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880895 Received: from fhigh-a1-smtp.messagingengine.com (fhigh-a1-smtp.messagingengine.com [103.168.172.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6F3B189F5C; Thu, 10 Apr 2025 01:19:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247946; cv=none; b=Zqn6fQEzrKEfwUPjhZQ+dea8DwpnVVUsShPINAQ59NYjoSWWC+CIc5vX/AOY7wHB47m3XwFs5DGchj8hKFGtJYwni+Prbw3K4TzNdTZIwCnQXm2cZBIry6SHjYADof+mICqStvm8uOmNmrcbat3uYGwokOMqS3IaI/vxMPKVd8A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247946; c=relaxed/simple; bh=tmnfp8FQyes4UwNmiDICLenwz4JDdWtdfJogWU6cNxA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=qBu8xl0Nbk5VWjGGO9QEAt7RXCyQtbAfz/amliW7lf4jqWkY9kPULcUvR9ulGGojyC4VTY8jmYBG+XDJPJAonrzTf5u4rO68g4mLZU/uR9i5XHWc3uNqf06Ap5ZTdZUcsKwoXY0yIpFY+N6TSBWGGgtanvycUefjpdcnypOWnzE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=CkULa7kb; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=bIC/uEqs; arc=none smtp.client-ip=103.168.172.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="CkULa7kb"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="bIC/uEqs" Received: from phl-compute-09.internal (phl-compute-09.phl.internal [10.202.2.49]) by mailfhigh.phl.internal (Postfix) with ESMTP id B561A1140295; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-09.internal (MEProxy); Wed, 09 Apr 2025 21:19:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247943; x= 1744334343; bh=qnKT4y+VqSv7GIH2zX73h94FGzDTN8DZlsY8YrX30t8=; b=C kULa7kbDhheaVWNtZ8gHDiKJvFYW3iFzQRwIlIiaooTY6jm+ONf5pQ4FYhiDYzr8 /szP2YtrEsGdk/+n+tIDTMCldCvOi07/Sq7//As3qsnh0MDt+MFJaEA90FwFc+p6 6iekap76sisdqm4hMhU3OEyNdJx8IOziTXZuksTJi9F0TB6JUf1YZhbZlcnUYy5x LDEgGTCcZ4mA25XC8gkbSB/HcKE8MmzBe6EyUHqW4GHZtpMIoAvhWbm5FqfOYw+5 OjyHKrhp0yYqd0fCqQS/wAr8h5qG0MHK9mFd4nSjOlaYYd4Vv67gzs+m+/Nv+ZS5 h39fJAJ9+S6BTQ8B8flTw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247943; x=1744334343; bh=q nKT4y+VqSv7GIH2zX73h94FGzDTN8DZlsY8YrX30t8=; b=bIC/uEqs+IdbmJGLv Ld/StFzmcfootk9Yqe7r7l+t5Un4EOEjvWGO9613i3Fy6Ml+WMvGuBZ5mhWIRWNt R7sBB1hkKuIggI80WhAdOdQvUNUgiUVYMff1G52Tu4PF8tjmt+knZ9hJ4jXyKECY /Qyu0+GB73vkUiEa3IzxMkh1wOUtw7SVo1smyzxcYwdnpaDn/sEeyE7G8+DHMPEQ vMuBOU8002O68881k/vDQ670MM7XmF/F5Uqi0VfpH3d7WKE5/Y5z9zXWBmDoAFLA u8aP22Jr7P30bBnXRexaom3PQEunE2hzvRUtbeY0MbSzwfPLRsegdKEzpPiKAv7o RZaEQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheehucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeeipdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtohepuggrvhgvsehmihgvlhhkvgdrtggtpdhrtghpthhtoheplhhinhhu gidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinh hugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id B802810D8B7C; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 09/11] vt: update gen_ucs_width.py to produce more space efficient tables Date: Wed, 9 Apr 2025 21:14:01 -0400 Message-ID: <20250410011839.64418-10-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Split table ranges into BMP (16-bit) and non-BMP (above 16-bit). This reduces the corresponding text size by 20-25%. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/gen_ucs_width.py | 154 +++++++++++++++++++++++--------- 1 file changed, 113 insertions(+), 41 deletions(-) diff --git a/drivers/tty/vt/gen_ucs_width.py b/drivers/tty/vt/gen_ucs_width.py index 41997fe001..c6cbc93e83 100755 --- a/drivers/tty/vt/gen_ucs_width.py +++ b/drivers/tty/vt/gen_ucs_width.py @@ -132,13 +132,49 @@ def generate_ucs_width(): ranges.append((start, prev)) return ranges + # Function to split ranges into BMP (16-bit) and non-BMP (above 16-bit) + def split_ranges_by_size(ranges): + bmp_ranges = [] + non_bmp_ranges = [] + + for start, end in ranges: + if end <= 0xFFFF: + bmp_ranges.append((start, end)) + elif start > 0xFFFF: + non_bmp_ranges.append((start, end)) + else: + # Split the range at 0xFFFF + bmp_ranges.append((start, 0xFFFF)) + non_bmp_ranges.append((0x10000, end)) + + return bmp_ranges, non_bmp_ranges + # Extract ranges for each width zero_width_ranges = ranges_optimize(width_map, 0) double_width_ranges = ranges_optimize(width_map, 2) + # Split ranges into BMP and non-BMP + zero_width_bmp, zero_width_non_bmp = split_ranges_by_size(zero_width_ranges) + double_width_bmp, double_width_non_bmp = split_ranges_by_size(double_width_ranges) + # Get Unicode version information unicode_version = unicodedata.unidata_version + # Function to generate code point description comments + def get_code_point_comment(start, end): + try: + start_char_desc = unicodedata.name(chr(start)) + if start == end: + return f"/* {start_char_desc} */" + else: + end_char_desc = unicodedata.name(chr(end)) + return f"/* {start_char_desc} - {end_char_desc} */" + except: + if start == end: + return f"/* U+{start:04X} */" + else: + return f"/* U+{start:04X} - U+{end:04X} */" + # Generate C implementation file with open(c_file, 'w') as f: f.write(f"""\ @@ -156,62 +192,77 @@ def generate_ucs_width(): #include #include -struct interval {{ +struct interval16 {{ + uint16_t first; + uint16_t last; +}}; + +struct interval32 {{ uint32_t first; uint32_t last; }}; -/* Zero-width character ranges */ -static const struct interval zero_width_ranges[] = {{ +/* Zero-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */ +static const struct interval16 zero_width_bmp[] = {{ """) - for start, end in zero_width_ranges: - try: - start_char_desc = unicodedata.name(chr(start)) if start < 0x10000 else f"U+{start:05X}" - if start == end: - comment = f"/* {start_char_desc} */" - else: - end_char_desc = unicodedata.name(chr(end)) if end < 0x10000 else f"U+{end:05X}" - comment = f"/* {start_char_desc} - {end_char_desc} */" - except: - if start == end: - comment = f"/* U+{start:05X} */" - else: - comment = f"/* U+{start:05X} - U+{end:05X} */" + for start, end in zero_width_bmp: + comment = get_code_point_comment(start, end) + f.write(f"\t{{ 0x{start:04X}, 0x{end:04X} }}, {comment}\n") + + f.write("""\ +}; +/* Zero-width character ranges (non-BMP, U+10000 and above) */ +static const struct interval32 zero_width_non_bmp[] = { +""") + + for start, end in zero_width_non_bmp: + comment = get_code_point_comment(start, end) f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n") f.write("""\ }; -/* Double-width character ranges */ -static const struct interval double_width_ranges[] = { +/* Double-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */ +static const struct interval16 double_width_bmp[] = { """) - for start, end in double_width_ranges: - try: - start_char_desc = unicodedata.name(chr(start)) if start < 0x10000 else f"U+{start:05X}" - if start == end: - comment = f"/* {start_char_desc} */" - else: - end_char_desc = unicodedata.name(chr(end)) if end < 0x10000 else f"U+{end:05X}" - comment = f"/* {start_char_desc} - {end_char_desc} */" - except: - if start == end: - comment = f"/* U+{start:05X} */" - else: - comment = f"/* U+{start:05X} - U+{end:05X} */" + for start, end in double_width_bmp: + comment = get_code_point_comment(start, end) + f.write(f"\t{{ 0x{start:04X}, 0x{end:04X} }}, {comment}\n") + + f.write("""\ +}; +/* Double-width character ranges (non-BMP, U+10000 and above) */ +static const struct interval32 double_width_non_bmp[] = { +""") + + for start, end in double_width_non_bmp: + comment = get_code_point_comment(start, end) f.write(f"\t{{ 0x{start:05X}, 0x{end:05X} }}, {comment}\n") f.write("""\ }; -static int ucs_cmp(const void *key, const void *element) +static int ucs_cmp16(const void *key, const void *element) +{ + uint16_t cp = *(uint16_t *)key; + const struct interval16 *e = element; + + if (cp > e->last) + return 1; + if (cp < e->first) + return -1; + return 0; +} + +static int ucs_cmp32(const void *key, const void *element) { uint32_t cp = *(uint32_t *)key; - const struct interval *e = element; + const struct interval32 *e = element; if (cp > e->last) return 1; @@ -220,13 +271,22 @@ static int ucs_cmp(const void *key, const void *element) return 0; } -static bool is_in_interval(uint32_t cp, const struct interval *intervals, size_t count) +static bool is_in_interval16(uint16_t cp, const struct interval16 *intervals, size_t count) { if (cp < intervals[0].first || cp > intervals[count - 1].last) return false; return __inline_bsearch(&cp, intervals, count, - sizeof(*intervals), ucs_cmp) != NULL; + sizeof(*intervals), ucs_cmp16) != NULL; +} + +static bool is_in_interval32(uint32_t cp, const struct interval32 *intervals, size_t count) +{ + if (cp < intervals[0].first || cp > intervals[count - 1].last) + return false; + + return __inline_bsearch(&cp, intervals, count, + sizeof(*intervals), ucs_cmp32) != NULL; } /** @@ -237,7 +297,9 @@ static bool is_in_interval(uint32_t cp, const struct interval *intervals, size_t */ bool ucs_is_zero_width(uint32_t cp) { - return is_in_interval(cp, zero_width_ranges, ARRAY_SIZE(zero_width_ranges)); + return (cp <= 0xFFFF) + ? is_in_interval16(cp, zero_width_bmp, ARRAY_SIZE(zero_width_bmp)) + : is_in_interval32(cp, zero_width_non_bmp, ARRAY_SIZE(zero_width_non_bmp)); } /** @@ -248,17 +310,27 @@ bool ucs_is_zero_width(uint32_t cp) */ bool ucs_is_double_width(uint32_t cp) { - return is_in_interval(cp, double_width_ranges, ARRAY_SIZE(double_width_ranges)); + return (cp <= 0xFFFF) + ? is_in_interval16(cp, double_width_bmp, ARRAY_SIZE(double_width_bmp)) + : is_in_interval32(cp, double_width_non_bmp, ARRAY_SIZE(double_width_non_bmp)); } """) # Print summary - zero_width_count = sum(end - start + 1 for start, end in zero_width_ranges) - double_width_count = sum(end - start + 1 for start, end in double_width_ranges) + zero_width_bmp_count = sum(end - start + 1 for start, end in zero_width_bmp) + zero_width_non_bmp_count = sum(end - start + 1 for start, end in zero_width_non_bmp) + double_width_bmp_count = sum(end - start + 1 for start, end in double_width_bmp) + double_width_non_bmp_count = sum(end - start + 1 for start, end in double_width_non_bmp) + + total_zero_width = zero_width_bmp_count + zero_width_non_bmp_count + total_double_width = double_width_bmp_count + double_width_non_bmp_count print(f"Generated {c_file} with:") - print(f"- {len(zero_width_ranges)} zero-width ranges covering ~{zero_width_count} code points") - print(f"- {len(double_width_ranges)} double-width ranges covering ~{double_width_count} code points") + print(f"- {len(zero_width_bmp)} zero-width BMP ranges (16-bit) covering ~{zero_width_bmp_count} code points") + print(f"- {len(zero_width_non_bmp)} zero-width non-BMP ranges (32-bit) covering ~{zero_width_non_bmp_count} code points") + print(f"- {len(double_width_bmp)} double-width BMP ranges (16-bit) covering ~{double_width_bmp_count} code points") + print(f"- {len(double_width_non_bmp)} double-width non-BMP ranges (32-bit) covering ~{double_width_non_bmp_count} code points") + print(f"Total: {len(zero_width_bmp) + len(zero_width_non_bmp) + len(double_width_bmp) + len(double_width_non_bmp)} ranges covering ~{total_zero_width + total_double_width} code points") if __name__ == "__main__": generate_ucs_width() From patchwork Thu Apr 10 01:14:02 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880892 Received: from fhigh-a1-smtp.messagingengine.com (fhigh-a1-smtp.messagingengine.com [103.168.172.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6EA9419D8A7; Thu, 10 Apr 2025 01:19:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247950; cv=none; b=Qgrrn6NR4B0qxv6douta6cs3c5aAwUldquIASIzEZDUt39Wu5DMU7DzyniBTDk+dUF9lthB6M/Rc2/P+PR07j8D6CdKDkun8cqehol9cjJo87Z1gBJabRx7C12NmevFe+617hY4pDu5pDL5WKQGLwyXKU1hGO3XckFtkMQTqQfs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247950; c=relaxed/simple; bh=tqQrMDmgbqOon0XNswuTDTGICUn0Vs+L9jCqfvQi5yc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IievEuFISUoAfEyiAJNY6gY9y5hr5yo9vQXJ2p6LH2mdBqs4tAkAABLo9wfel3ypd6CSZY6+jFI8Fe8j2Np8SZVMajF4g9G6z8H2BsVMAnHweFfIeDuEzCEmzHzuIizrgpQjDa+j0dpScCwS+snVxxVuX/90gCRXw/2irV0htCQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=ZbRStC1m; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=QZ7ZM1vE; arc=none smtp.client-ip=103.168.172.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="ZbRStC1m"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="QZ7ZM1vE" Received: from phl-compute-03.internal (phl-compute-03.phl.internal [10.202.2.43]) by mailfhigh.phl.internal (Postfix) with ESMTP id 032291140298; Wed, 9 Apr 2025 21:19:04 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-03.internal (MEProxy); Wed, 09 Apr 2025 21:19:04 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247943; x= 1744334343; bh=n4c7j4f2Z1SdFUz5Xh16vGKND0JIGZsXtT6hNXJIlxw=; b=Z bRStC1mwm8eJOi0OHsz5YucWKOFt/ZI2XH0KWClUBUxA9keVk/LazxDVfnA1yfTm zT1dqtKktK2SrC2R4rsnamgWFTkRFjTZ2GsEpMKdHFCtgJEumtWh/OojuZkTTA7q O5Xn6nlCTLs+JDCNpDy0I/TVsb25cwxu5tHZsf0Q2nrlMIufUd7uF0iMWSF8Mjv4 Mxjsfl4/yRvGo8zeR2W8ZIiuXYHxlXKdtduS8bfKvGIzlNleOurX9B97/+MQ73Po s/eU7wtUaE4yaUfIc5baecpzH0Iihy2JtwweqtKbZpvm48P83qtQgIaOCBaf+UX+ tTqQs/hoRbHLSr/U4Zwkg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247943; x=1744334343; bh=n 4c7j4f2Z1SdFUz5Xh16vGKND0JIGZsXtT6hNXJIlxw=; b=QZ7ZM1vEUrZ/yvCys x+4XD7aapBhfjlf8MlcC8SFqoEj2YjsaSs0qK6u0GcLjlldPue3C/KzJO564ytml x3h+RRXTGLywHUYd3UXlt+szFD68nJtcTCDZ7VHplGpddC6GEhZPGW3tUjweOkUK gtbYLjUkRyap9RhfpjhYAHCmPaeDIoO+u37sLxysiJKoFSBR21MFCMvxy6Jo0WAA WVw1+mTfpmN0CiDwN/KXf13cTBRTWCBZY/HrV/sbd0s+ziRrkJeGotv38L5ROkkJ Slg16P0ut0jVqcvxaI4QuxTRV1xQoW1yRiqWG6+yOdB06NdoPeE+Dl1t/eidRoHa gDN7w== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeeipdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtohepuggrvhgvsehmihgvlhhkvgdrtggtpdhrtghpthhtoheplhhinhhu gidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinh hugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id E587D10D8B7D; Wed, 9 Apr 2025 21:19:02 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 10/11] vt: update ucs_width.c following latest gen_ucs_width.py Date: Wed, 9 Apr 2025 21:14:02 -0400 Message-ID: <20250410011839.64418-11-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre Split table ranges into BMP (16-bit) and non-BMP (above 16-bit). This reduces the corresponding text size by 20-25%. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/ucs_width.c | 902 +++++++++++++++++++------------------ 1 file changed, 470 insertions(+), 432 deletions(-) diff --git a/drivers/tty/vt/ucs_width.c b/drivers/tty/vt/ucs_width.c index 47b22583bd..060aa8ae7f 100644 --- a/drivers/tty/vt/ucs_width.c +++ b/drivers/tty/vt/ucs_width.c @@ -12,452 +12,477 @@ #include #include -struct interval { +struct interval16 { + uint16_t first; + uint16_t last; +}; + +struct interval32 { uint32_t first; uint32_t last; }; -/* Zero-width character ranges */ -static const struct interval zero_width_ranges[] = { - { 0x000AD, 0x000AD }, /* SOFT HYPHEN */ - { 0x00300, 0x0036F }, /* COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X */ - { 0x00483, 0x00489 }, /* COMBINING CYRILLIC TITLO - COMBINING CYRILLIC MILLIONS SIGN */ - { 0x00591, 0x005BD }, /* HEBREW ACCENT ETNAHTA - HEBREW POINT METEG */ - { 0x005BF, 0x005BF }, /* HEBREW POINT RAFE */ - { 0x005C1, 0x005C2 }, /* HEBREW POINT SHIN DOT - HEBREW POINT SIN DOT */ - { 0x005C4, 0x005C5 }, /* HEBREW MARK UPPER DOT - HEBREW MARK LOWER DOT */ - { 0x005C7, 0x005C7 }, /* HEBREW POINT QAMATS QATAN */ - { 0x00600, 0x00605 }, /* ARABIC NUMBER SIGN - ARABIC NUMBER MARK ABOVE */ - { 0x00610, 0x0061A }, /* ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM - ARABIC SMALL KASRA */ - { 0x0064B, 0x0065F }, /* ARABIC FATHATAN - ARABIC WAVY HAMZA BELOW */ - { 0x00670, 0x00670 }, /* ARABIC LETTER SUPERSCRIPT ALEF */ - { 0x006D6, 0x006DC }, /* ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA - ARABIC SMALL HIGH SEEN */ - { 0x006DF, 0x006E4 }, /* ARABIC SMALL HIGH ROUNDED ZERO - ARABIC SMALL HIGH MADDA */ - { 0x006E7, 0x006E8 }, /* ARABIC SMALL HIGH YEH - ARABIC SMALL HIGH NOON */ - { 0x006EA, 0x006ED }, /* ARABIC EMPTY CENTRE LOW STOP - ARABIC SMALL LOW MEEM */ - { 0x00711, 0x00711 }, /* SYRIAC LETTER SUPERSCRIPT ALAPH */ - { 0x00730, 0x0074A }, /* SYRIAC PTHAHA ABOVE - SYRIAC BARREKH */ - { 0x007A6, 0x007B0 }, /* THAANA ABAFILI - THAANA SUKUN */ - { 0x007EB, 0x007F3 }, /* NKO COMBINING SHORT HIGH TONE - NKO COMBINING DOUBLE DOT ABOVE */ - { 0x007FD, 0x007FD }, /* NKO DANTAYALAN */ - { 0x00816, 0x00819 }, /* SAMARITAN MARK IN - SAMARITAN MARK DAGESH */ - { 0x0081B, 0x00823 }, /* SAMARITAN MARK EPENTHETIC YUT - SAMARITAN VOWEL SIGN A */ - { 0x00825, 0x00827 }, /* SAMARITAN VOWEL SIGN SHORT A - SAMARITAN VOWEL SIGN U */ - { 0x00829, 0x0082D }, /* SAMARITAN VOWEL SIGN LONG I - SAMARITAN MARK NEQUDAA */ - { 0x00859, 0x0085B }, /* MANDAIC AFFRICATION MARK - MANDAIC GEMINATION MARK */ - { 0x00890, 0x00891 }, /* ARABIC POUND MARK ABOVE - ARABIC PIASTRE MARK ABOVE */ - { 0x00897, 0x0089F }, /* ARABIC PEPET - ARABIC HALF MADDA OVER MADDA */ - { 0x008CA, 0x00903 }, /* ARABIC SMALL HIGH FARSI YEH - DEVANAGARI SIGN VISARGA */ - { 0x0093A, 0x0093C }, /* DEVANAGARI VOWEL SIGN OE - DEVANAGARI SIGN NUKTA */ - { 0x0093E, 0x0094F }, /* DEVANAGARI VOWEL SIGN AA - DEVANAGARI VOWEL SIGN AW */ - { 0x00951, 0x00957 }, /* DEVANAGARI STRESS SIGN UDATTA - DEVANAGARI VOWEL SIGN UUE */ - { 0x00962, 0x00963 }, /* DEVANAGARI VOWEL SIGN VOCALIC L - DEVANAGARI VOWEL SIGN VOCALIC LL */ - { 0x00981, 0x00983 }, /* BENGALI SIGN CANDRABINDU - BENGALI SIGN VISARGA */ - { 0x009BC, 0x009BC }, /* BENGALI SIGN NUKTA */ - { 0x009BE, 0x009C4 }, /* BENGALI VOWEL SIGN AA - BENGALI VOWEL SIGN VOCALIC RR */ - { 0x009C7, 0x009C8 }, /* BENGALI VOWEL SIGN E - BENGALI VOWEL SIGN AI */ - { 0x009CB, 0x009CD }, /* BENGALI VOWEL SIGN O - BENGALI SIGN VIRAMA */ - { 0x009D7, 0x009D7 }, /* BENGALI AU LENGTH MARK */ - { 0x009E2, 0x009E3 }, /* BENGALI VOWEL SIGN VOCALIC L - BENGALI VOWEL SIGN VOCALIC LL */ - { 0x009FE, 0x009FE }, /* BENGALI SANDHI MARK */ - { 0x00A01, 0x00A03 }, /* GURMUKHI SIGN ADAK BINDI - GURMUKHI SIGN VISARGA */ - { 0x00A3C, 0x00A3C }, /* GURMUKHI SIGN NUKTA */ - { 0x00A3E, 0x00A42 }, /* GURMUKHI VOWEL SIGN AA - GURMUKHI VOWEL SIGN UU */ - { 0x00A47, 0x00A48 }, /* GURMUKHI VOWEL SIGN EE - GURMUKHI VOWEL SIGN AI */ - { 0x00A4B, 0x00A4D }, /* GURMUKHI VOWEL SIGN OO - GURMUKHI SIGN VIRAMA */ - { 0x00A51, 0x00A51 }, /* GURMUKHI SIGN UDAAT */ - { 0x00A70, 0x00A71 }, /* GURMUKHI TIPPI - GURMUKHI ADDAK */ - { 0x00A75, 0x00A75 }, /* GURMUKHI SIGN YAKASH */ - { 0x00A81, 0x00A83 }, /* GUJARATI SIGN CANDRABINDU - GUJARATI SIGN VISARGA */ - { 0x00ABC, 0x00ABC }, /* GUJARATI SIGN NUKTA */ - { 0x00ABE, 0x00AC5 }, /* GUJARATI VOWEL SIGN AA - GUJARATI VOWEL SIGN CANDRA E */ - { 0x00AC7, 0x00AC9 }, /* GUJARATI VOWEL SIGN E - GUJARATI VOWEL SIGN CANDRA O */ - { 0x00ACB, 0x00ACD }, /* GUJARATI VOWEL SIGN O - GUJARATI SIGN VIRAMA */ - { 0x00AE2, 0x00AE3 }, /* GUJARATI VOWEL SIGN VOCALIC L - GUJARATI VOWEL SIGN VOCALIC LL */ - { 0x00AFA, 0x00AFF }, /* GUJARATI SIGN SUKUN - GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE */ - { 0x00B01, 0x00B03 }, /* ORIYA SIGN CANDRABINDU - ORIYA SIGN VISARGA */ - { 0x00B3C, 0x00B3C }, /* ORIYA SIGN NUKTA */ - { 0x00B3E, 0x00B44 }, /* ORIYA VOWEL SIGN AA - ORIYA VOWEL SIGN VOCALIC RR */ - { 0x00B47, 0x00B48 }, /* ORIYA VOWEL SIGN E - ORIYA VOWEL SIGN AI */ - { 0x00B4B, 0x00B4D }, /* ORIYA VOWEL SIGN O - ORIYA SIGN VIRAMA */ - { 0x00B55, 0x00B57 }, /* ORIYA SIGN OVERLINE - ORIYA AU LENGTH MARK */ - { 0x00B62, 0x00B63 }, /* ORIYA VOWEL SIGN VOCALIC L - ORIYA VOWEL SIGN VOCALIC LL */ - { 0x00B82, 0x00B82 }, /* TAMIL SIGN ANUSVARA */ - { 0x00BBE, 0x00BC2 }, /* TAMIL VOWEL SIGN AA - TAMIL VOWEL SIGN UU */ - { 0x00BC6, 0x00BC8 }, /* TAMIL VOWEL SIGN E - TAMIL VOWEL SIGN AI */ - { 0x00BCA, 0x00BCD }, /* TAMIL VOWEL SIGN O - TAMIL SIGN VIRAMA */ - { 0x00BD7, 0x00BD7 }, /* TAMIL AU LENGTH MARK */ - { 0x00C00, 0x00C04 }, /* TELUGU SIGN COMBINING CANDRABINDU ABOVE - TELUGU SIGN COMBINING ANUSVARA ABOVE */ - { 0x00C3C, 0x00C3C }, /* TELUGU SIGN NUKTA */ - { 0x00C3E, 0x00C44 }, /* TELUGU VOWEL SIGN AA - TELUGU VOWEL SIGN VOCALIC RR */ - { 0x00C46, 0x00C48 }, /* TELUGU VOWEL SIGN E - TELUGU VOWEL SIGN AI */ - { 0x00C4A, 0x00C4D }, /* TELUGU VOWEL SIGN O - TELUGU SIGN VIRAMA */ - { 0x00C55, 0x00C56 }, /* TELUGU LENGTH MARK - TELUGU AI LENGTH MARK */ - { 0x00C62, 0x00C63 }, /* TELUGU VOWEL SIGN VOCALIC L - TELUGU VOWEL SIGN VOCALIC LL */ - { 0x00C81, 0x00C83 }, /* KANNADA SIGN CANDRABINDU - KANNADA SIGN VISARGA */ - { 0x00CBC, 0x00CBC }, /* KANNADA SIGN NUKTA */ - { 0x00CBE, 0x00CC4 }, /* KANNADA VOWEL SIGN AA - KANNADA VOWEL SIGN VOCALIC RR */ - { 0x00CC6, 0x00CC8 }, /* KANNADA VOWEL SIGN E - KANNADA VOWEL SIGN AI */ - { 0x00CCA, 0x00CCD }, /* KANNADA VOWEL SIGN O - KANNADA SIGN VIRAMA */ - { 0x00CD5, 0x00CD6 }, /* KANNADA LENGTH MARK - KANNADA AI LENGTH MARK */ - { 0x00CE2, 0x00CE3 }, /* KANNADA VOWEL SIGN VOCALIC L - KANNADA VOWEL SIGN VOCALIC LL */ - { 0x00CF3, 0x00CF3 }, /* KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT */ - { 0x00D00, 0x00D03 }, /* MALAYALAM SIGN COMBINING ANUSVARA ABOVE - MALAYALAM SIGN VISARGA */ - { 0x00D3B, 0x00D3C }, /* MALAYALAM SIGN VERTICAL BAR VIRAMA - MALAYALAM SIGN CIRCULAR VIRAMA */ - { 0x00D3E, 0x00D44 }, /* MALAYALAM VOWEL SIGN AA - MALAYALAM VOWEL SIGN VOCALIC RR */ - { 0x00D46, 0x00D48 }, /* MALAYALAM VOWEL SIGN E - MALAYALAM VOWEL SIGN AI */ - { 0x00D4A, 0x00D4D }, /* MALAYALAM VOWEL SIGN O - MALAYALAM SIGN VIRAMA */ - { 0x00D57, 0x00D57 }, /* MALAYALAM AU LENGTH MARK */ - { 0x00D62, 0x00D63 }, /* MALAYALAM VOWEL SIGN VOCALIC L - MALAYALAM VOWEL SIGN VOCALIC LL */ - { 0x00D81, 0x00D83 }, /* SINHALA SIGN CANDRABINDU - SINHALA SIGN VISARGAYA */ - { 0x00DCA, 0x00DCA }, /* SINHALA SIGN AL-LAKUNA */ - { 0x00DCF, 0x00DD4 }, /* SINHALA VOWEL SIGN AELA-PILLA - SINHALA VOWEL SIGN KETTI PAA-PILLA */ - { 0x00DD6, 0x00DD6 }, /* SINHALA VOWEL SIGN DIGA PAA-PILLA */ - { 0x00DD8, 0x00DDF }, /* SINHALA VOWEL SIGN GAETTA-PILLA - SINHALA VOWEL SIGN GAYANUKITTA */ - { 0x00DF2, 0x00DF3 }, /* SINHALA VOWEL SIGN DIGA GAETTA-PILLA - SINHALA VOWEL SIGN DIGA GAYANUKITTA */ - { 0x00E31, 0x00E31 }, /* THAI CHARACTER MAI HAN-AKAT */ - { 0x00E34, 0x00E3A }, /* THAI CHARACTER SARA I - THAI CHARACTER PHINTHU */ - { 0x00E47, 0x00E4E }, /* THAI CHARACTER MAITAIKHU - THAI CHARACTER YAMAKKAN */ - { 0x00EB1, 0x00EB1 }, /* LAO VOWEL SIGN MAI KAN */ - { 0x00EB4, 0x00EBC }, /* LAO VOWEL SIGN I - LAO SEMIVOWEL SIGN LO */ - { 0x00EC8, 0x00ECE }, /* LAO TONE MAI EK - LAO YAMAKKAN */ - { 0x00F18, 0x00F19 }, /* TIBETAN ASTROLOGICAL SIGN -KHYUD PA - TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS */ - { 0x00F35, 0x00F35 }, /* TIBETAN MARK NGAS BZUNG NYI ZLA */ - { 0x00F37, 0x00F37 }, /* TIBETAN MARK NGAS BZUNG SGOR RTAGS */ - { 0x00F39, 0x00F39 }, /* TIBETAN MARK TSA -PHRU */ - { 0x00F3E, 0x00F3F }, /* TIBETAN SIGN YAR TSHES - TIBETAN SIGN MAR TSHES */ - { 0x00F71, 0x00F84 }, /* TIBETAN VOWEL SIGN AA - TIBETAN MARK HALANTA */ - { 0x00F86, 0x00F87 }, /* TIBETAN SIGN LCI RTAGS - TIBETAN SIGN YANG RTAGS */ - { 0x00F8D, 0x00F97 }, /* TIBETAN SUBJOINED SIGN LCE TSA CAN - TIBETAN SUBJOINED LETTER JA */ - { 0x00F99, 0x00FBC }, /* TIBETAN SUBJOINED LETTER NYA - TIBETAN SUBJOINED LETTER FIXED-FORM RA */ - { 0x00FC6, 0x00FC6 }, /* TIBETAN SYMBOL PADMA GDAN */ - { 0x0102B, 0x0103E }, /* MYANMAR VOWEL SIGN TALL AA - MYANMAR CONSONANT SIGN MEDIAL HA */ - { 0x01056, 0x01059 }, /* MYANMAR VOWEL SIGN VOCALIC R - MYANMAR VOWEL SIGN VOCALIC LL */ - { 0x0105E, 0x01060 }, /* MYANMAR CONSONANT SIGN MON MEDIAL NA - MYANMAR CONSONANT SIGN MON MEDIAL LA */ - { 0x01062, 0x01064 }, /* MYANMAR VOWEL SIGN SGAW KAREN EU - MYANMAR TONE MARK SGAW KAREN KE PHO */ - { 0x01067, 0x0106D }, /* MYANMAR VOWEL SIGN WESTERN PWO KAREN EU - MYANMAR SIGN WESTERN PWO KAREN TONE-5 */ - { 0x01071, 0x01074 }, /* MYANMAR VOWEL SIGN GEBA KAREN I - MYANMAR VOWEL SIGN KAYAH EE */ - { 0x01082, 0x0108D }, /* MYANMAR CONSONANT SIGN SHAN MEDIAL WA - MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE */ - { 0x0108F, 0x0108F }, /* MYANMAR SIGN RUMAI PALAUNG TONE-5 */ - { 0x0109A, 0x0109D }, /* MYANMAR SIGN KHAMTI TONE-1 - MYANMAR VOWEL SIGN AITON AI */ - { 0x0135D, 0x0135F }, /* ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK - ETHIOPIC COMBINING GEMINATION MARK */ - { 0x01712, 0x01715 }, /* TAGALOG VOWEL SIGN I - TAGALOG SIGN PAMUDPOD */ - { 0x01732, 0x01734 }, /* HANUNOO VOWEL SIGN I - HANUNOO SIGN PAMUDPOD */ - { 0x01752, 0x01753 }, /* BUHID VOWEL SIGN I - BUHID VOWEL SIGN U */ - { 0x01772, 0x01773 }, /* TAGBANWA VOWEL SIGN I - TAGBANWA VOWEL SIGN U */ - { 0x017B4, 0x017D3 }, /* KHMER VOWEL INHERENT AQ - KHMER SIGN BATHAMASAT */ - { 0x017DD, 0x017DD }, /* KHMER SIGN ATTHACAN */ - { 0x0180B, 0x0180D }, /* MONGOLIAN FREE VARIATION SELECTOR ONE - MONGOLIAN FREE VARIATION SELECTOR THREE */ - { 0x0180F, 0x0180F }, /* MONGOLIAN FREE VARIATION SELECTOR FOUR */ - { 0x01885, 0x01886 }, /* MONGOLIAN LETTER ALI GALI BALUDA - MONGOLIAN LETTER ALI GALI THREE BALUDA */ - { 0x018A9, 0x018A9 }, /* MONGOLIAN LETTER ALI GALI DAGALGA */ - { 0x01920, 0x0192B }, /* LIMBU VOWEL SIGN A - LIMBU SUBJOINED LETTER WA */ - { 0x01930, 0x0193B }, /* LIMBU SMALL LETTER KA - LIMBU SIGN SA-I */ - { 0x01A17, 0x01A1B }, /* BUGINESE VOWEL SIGN I - BUGINESE VOWEL SIGN AE */ - { 0x01A55, 0x01A5E }, /* TAI THAM CONSONANT SIGN MEDIAL RA - TAI THAM CONSONANT SIGN SA */ - { 0x01A60, 0x01A7C }, /* TAI THAM SIGN SAKOT - TAI THAM SIGN KHUEN-LUE KARAN */ - { 0x01A7F, 0x01A7F }, /* TAI THAM COMBINING CRYPTOGRAMMIC DOT */ - { 0x01AB0, 0x01ACE }, /* COMBINING DOUBLED CIRCUMFLEX ACCENT - COMBINING LATIN SMALL LETTER INSULAR T */ - { 0x01B00, 0x01B04 }, /* BALINESE SIGN ULU RICEM - BALINESE SIGN BISAH */ - { 0x01B34, 0x01B44 }, /* BALINESE SIGN REREKAN - BALINESE ADEG ADEG */ - { 0x01B6B, 0x01B73 }, /* BALINESE MUSICAL SYMBOL COMBINING TEGEH - BALINESE MUSICAL SYMBOL COMBINING GONG */ - { 0x01B80, 0x01B82 }, /* SUNDANESE SIGN PANYECEK - SUNDANESE SIGN PANGWISAD */ - { 0x01BA1, 0x01BAD }, /* SUNDANESE CONSONANT SIGN PAMINGKAL - SUNDANESE CONSONANT SIGN PASANGAN WA */ - { 0x01BE6, 0x01BF3 }, /* BATAK SIGN TOMPI - BATAK PANONGONAN */ - { 0x01C24, 0x01C37 }, /* LEPCHA SUBJOINED LETTER YA - LEPCHA SIGN NUKTA */ - { 0x01CD0, 0x01CD2 }, /* VEDIC TONE KARSHANA - VEDIC TONE PRENKHA */ - { 0x01CD4, 0x01CE8 }, /* VEDIC SIGN YAJURVEDIC MIDLINE SVARITA - VEDIC SIGN VISARGA ANUDATTA WITH TAIL */ - { 0x01CED, 0x01CED }, /* VEDIC SIGN TIRYAK */ - { 0x01CF4, 0x01CF4 }, /* VEDIC TONE CANDRA ABOVE */ - { 0x01CF7, 0x01CF9 }, /* VEDIC SIGN ATIKRAMA - VEDIC TONE DOUBLE RING ABOVE */ - { 0x01DC0, 0x01DFF }, /* COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW */ - { 0x0200B, 0x0200E }, /* ZERO WIDTH SPACE - LEFT-TO-RIGHT MARK */ - { 0x0202A, 0x0202D }, /* LEFT-TO-RIGHT EMBEDDING - LEFT-TO-RIGHT OVERRIDE */ - { 0x02060, 0x02064 }, /* WORD JOINER - INVISIBLE PLUS */ - { 0x0206A, 0x0206F }, /* INHIBIT SYMMETRIC SWAPPING - NOMINAL DIGIT SHAPES */ - { 0x020D0, 0x020F0 }, /* COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE */ - { 0x02640, 0x02640 }, /* FEMALE SIGN */ - { 0x02642, 0x02642 }, /* MALE SIGN */ - { 0x026A7, 0x026A7 }, /* MALE WITH STROKE AND MALE AND FEMALE SIGN */ - { 0x02CEF, 0x02CF1 }, /* COPTIC COMBINING NI ABOVE - COPTIC COMBINING SPIRITUS LENIS */ - { 0x02D7F, 0x02D7F }, /* TIFINAGH CONSONANT JOINER */ - { 0x02DE0, 0x02DFF }, /* COMBINING CYRILLIC LETTER BE - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS */ - { 0x0302A, 0x0302F }, /* IDEOGRAPHIC LEVEL TONE MARK - HANGUL DOUBLE DOT TONE MARK */ - { 0x03099, 0x0309A }, /* COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK - COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */ - { 0x0A66F, 0x0A672 }, /* COMBINING CYRILLIC VZMET - COMBINING CYRILLIC THOUSAND MILLIONS SIGN */ - { 0x0A674, 0x0A67D }, /* COMBINING CYRILLIC LETTER UKRAINIAN IE - COMBINING CYRILLIC PAYEROK */ - { 0x0A69E, 0x0A69F }, /* COMBINING CYRILLIC LETTER EF - COMBINING CYRILLIC LETTER IOTIFIED E */ - { 0x0A6F0, 0x0A6F1 }, /* BAMUM COMBINING MARK KOQNDON - BAMUM COMBINING MARK TUKWENTIS */ - { 0x0A802, 0x0A802 }, /* SYLOTI NAGRI SIGN DVISVARA */ - { 0x0A806, 0x0A806 }, /* SYLOTI NAGRI SIGN HASANTA */ - { 0x0A80B, 0x0A80B }, /* SYLOTI NAGRI SIGN ANUSVARA */ - { 0x0A823, 0x0A827 }, /* SYLOTI NAGRI VOWEL SIGN A - SYLOTI NAGRI VOWEL SIGN OO */ - { 0x0A82C, 0x0A82C }, /* SYLOTI NAGRI SIGN ALTERNATE HASANTA */ - { 0x0A880, 0x0A881 }, /* SAURASHTRA SIGN ANUSVARA - SAURASHTRA SIGN VISARGA */ - { 0x0A8B4, 0x0A8C5 }, /* SAURASHTRA CONSONANT SIGN HAARU - SAURASHTRA SIGN CANDRABINDU */ - { 0x0A8E0, 0x0A8F1 }, /* COMBINING DEVANAGARI DIGIT ZERO - COMBINING DEVANAGARI SIGN AVAGRAHA */ - { 0x0A8FF, 0x0A8FF }, /* DEVANAGARI VOWEL SIGN AY */ - { 0x0A926, 0x0A92D }, /* KAYAH LI VOWEL UE - KAYAH LI TONE CALYA PLOPHU */ - { 0x0A947, 0x0A953 }, /* REJANG VOWEL SIGN I - REJANG VIRAMA */ - { 0x0A980, 0x0A983 }, /* JAVANESE SIGN PANYANGGA - JAVANESE SIGN WIGNYAN */ - { 0x0A9B3, 0x0A9C0 }, /* JAVANESE SIGN CECAK TELU - JAVANESE PANGKON */ - { 0x0A9E5, 0x0A9E5 }, /* MYANMAR SIGN SHAN SAW */ - { 0x0AA29, 0x0AA36 }, /* CHAM VOWEL SIGN AA - CHAM CONSONANT SIGN WA */ - { 0x0AA43, 0x0AA43 }, /* CHAM CONSONANT SIGN FINAL NG */ - { 0x0AA4C, 0x0AA4D }, /* CHAM CONSONANT SIGN FINAL M - CHAM CONSONANT SIGN FINAL H */ - { 0x0AA7B, 0x0AA7D }, /* MYANMAR SIGN PAO KAREN TONE - MYANMAR SIGN TAI LAING TONE-5 */ - { 0x0AAB0, 0x0AAB0 }, /* TAI VIET MAI KANG */ - { 0x0AAB2, 0x0AAB4 }, /* TAI VIET VOWEL I - TAI VIET VOWEL U */ - { 0x0AAB7, 0x0AAB8 }, /* TAI VIET MAI KHIT - TAI VIET VOWEL IA */ - { 0x0AABE, 0x0AABF }, /* TAI VIET VOWEL AM - TAI VIET TONE MAI EK */ - { 0x0AAC1, 0x0AAC1 }, /* TAI VIET TONE MAI THO */ - { 0x0AAEB, 0x0AAEF }, /* MEETEI MAYEK VOWEL SIGN II - MEETEI MAYEK VOWEL SIGN AAU */ - { 0x0AAF5, 0x0AAF6 }, /* MEETEI MAYEK VOWEL SIGN VISARGA - MEETEI MAYEK VIRAMA */ - { 0x0ABE3, 0x0ABEA }, /* MEETEI MAYEK VOWEL SIGN ONAP - MEETEI MAYEK VOWEL SIGN NUNG */ - { 0x0ABEC, 0x0ABED }, /* MEETEI MAYEK LUM IYEK - MEETEI MAYEK APUN IYEK */ - { 0x0FB1E, 0x0FB1E }, /* HEBREW POINT JUDEO-SPANISH VARIKA */ - { 0x0FE00, 0x0FE0F }, /* VARIATION SELECTOR-1 - VARIATION SELECTOR-16 */ - { 0x0FE20, 0x0FE2F }, /* COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF */ - { 0x0FEFF, 0x0FEFF }, /* ZERO WIDTH NO-BREAK SPACE */ - { 0x0FFF9, 0x0FFFB }, /* INTERLINEAR ANNOTATION ANCHOR - INTERLINEAR ANNOTATION TERMINATOR */ - { 0x101FD, 0x101FD }, /* U+101FD */ - { 0x102E0, 0x102E0 }, /* U+102E0 */ - { 0x10376, 0x1037A }, /* U+10376 - U+1037A */ - { 0x10A01, 0x10A03 }, /* U+10A01 - U+10A03 */ - { 0x10A05, 0x10A06 }, /* U+10A05 - U+10A06 */ - { 0x10A0C, 0x10A0F }, /* U+10A0C - U+10A0F */ - { 0x10A38, 0x10A3A }, /* U+10A38 - U+10A3A */ - { 0x10A3F, 0x10A3F }, /* U+10A3F */ - { 0x10AE5, 0x10AE6 }, /* U+10AE5 - U+10AE6 */ - { 0x10D24, 0x10D27 }, /* U+10D24 - U+10D27 */ - { 0x10D69, 0x10D6D }, /* U+10D69 - U+10D6D */ - { 0x10EAB, 0x10EAC }, /* U+10EAB - U+10EAC */ - { 0x10EFC, 0x10EFF }, /* U+10EFC - U+10EFF */ - { 0x10F46, 0x10F50 }, /* U+10F46 - U+10F50 */ - { 0x10F82, 0x10F85 }, /* U+10F82 - U+10F85 */ - { 0x11000, 0x11002 }, /* U+11000 - U+11002 */ - { 0x11038, 0x11046 }, /* U+11038 - U+11046 */ - { 0x11070, 0x11070 }, /* U+11070 */ - { 0x11073, 0x11074 }, /* U+11073 - U+11074 */ - { 0x1107F, 0x11082 }, /* U+1107F - U+11082 */ - { 0x110B0, 0x110BA }, /* U+110B0 - U+110BA */ - { 0x110BD, 0x110BD }, /* U+110BD */ - { 0x110C2, 0x110C2 }, /* U+110C2 */ - { 0x110CD, 0x110CD }, /* U+110CD */ - { 0x11100, 0x11102 }, /* U+11100 - U+11102 */ - { 0x11127, 0x11134 }, /* U+11127 - U+11134 */ - { 0x11145, 0x11146 }, /* U+11145 - U+11146 */ - { 0x11173, 0x11173 }, /* U+11173 */ - { 0x11180, 0x11182 }, /* U+11180 - U+11182 */ - { 0x111B3, 0x111C0 }, /* U+111B3 - U+111C0 */ - { 0x111C9, 0x111CC }, /* U+111C9 - U+111CC */ - { 0x111CE, 0x111CF }, /* U+111CE - U+111CF */ - { 0x1122C, 0x11237 }, /* U+1122C - U+11237 */ - { 0x1123E, 0x1123E }, /* U+1123E */ - { 0x11241, 0x11241 }, /* U+11241 */ - { 0x112DF, 0x112EA }, /* U+112DF - U+112EA */ - { 0x11300, 0x11303 }, /* U+11300 - U+11303 */ - { 0x1133B, 0x1133C }, /* U+1133B - U+1133C */ - { 0x1133E, 0x11344 }, /* U+1133E - U+11344 */ - { 0x11347, 0x11348 }, /* U+11347 - U+11348 */ - { 0x1134B, 0x1134D }, /* U+1134B - U+1134D */ - { 0x11357, 0x11357 }, /* U+11357 */ - { 0x11362, 0x11363 }, /* U+11362 - U+11363 */ - { 0x11366, 0x1136C }, /* U+11366 - U+1136C */ - { 0x11370, 0x11374 }, /* U+11370 - U+11374 */ - { 0x113B8, 0x113C0 }, /* U+113B8 - U+113C0 */ - { 0x113C2, 0x113C2 }, /* U+113C2 */ - { 0x113C5, 0x113C5 }, /* U+113C5 */ - { 0x113C7, 0x113CA }, /* U+113C7 - U+113CA */ - { 0x113CC, 0x113D0 }, /* U+113CC - U+113D0 */ - { 0x113D2, 0x113D2 }, /* U+113D2 */ - { 0x113E1, 0x113E2 }, /* U+113E1 - U+113E2 */ - { 0x11435, 0x11446 }, /* U+11435 - U+11446 */ - { 0x1145E, 0x1145E }, /* U+1145E */ - { 0x114B0, 0x114C3 }, /* U+114B0 - U+114C3 */ - { 0x115AF, 0x115B5 }, /* U+115AF - U+115B5 */ - { 0x115B8, 0x115C0 }, /* U+115B8 - U+115C0 */ - { 0x115DC, 0x115DD }, /* U+115DC - U+115DD */ - { 0x11630, 0x11640 }, /* U+11630 - U+11640 */ - { 0x116AB, 0x116B7 }, /* U+116AB - U+116B7 */ - { 0x1171D, 0x1172B }, /* U+1171D - U+1172B */ - { 0x1182C, 0x1183A }, /* U+1182C - U+1183A */ - { 0x11930, 0x11935 }, /* U+11930 - U+11935 */ - { 0x11937, 0x11938 }, /* U+11937 - U+11938 */ - { 0x1193B, 0x1193E }, /* U+1193B - U+1193E */ - { 0x11940, 0x11940 }, /* U+11940 */ - { 0x11942, 0x11943 }, /* U+11942 - U+11943 */ - { 0x119D1, 0x119D7 }, /* U+119D1 - U+119D7 */ - { 0x119DA, 0x119E0 }, /* U+119DA - U+119E0 */ - { 0x119E4, 0x119E4 }, /* U+119E4 */ - { 0x11A01, 0x11A0A }, /* U+11A01 - U+11A0A */ - { 0x11A33, 0x11A39 }, /* U+11A33 - U+11A39 */ - { 0x11A3B, 0x11A3E }, /* U+11A3B - U+11A3E */ - { 0x11A47, 0x11A47 }, /* U+11A47 */ - { 0x11A51, 0x11A5B }, /* U+11A51 - U+11A5B */ - { 0x11A8A, 0x11A99 }, /* U+11A8A - U+11A99 */ - { 0x11C2F, 0x11C36 }, /* U+11C2F - U+11C36 */ - { 0x11C38, 0x11C3F }, /* U+11C38 - U+11C3F */ - { 0x11C92, 0x11CA7 }, /* U+11C92 - U+11CA7 */ - { 0x11CA9, 0x11CB6 }, /* U+11CA9 - U+11CB6 */ - { 0x11D31, 0x11D36 }, /* U+11D31 - U+11D36 */ - { 0x11D3A, 0x11D3A }, /* U+11D3A */ - { 0x11D3C, 0x11D3D }, /* U+11D3C - U+11D3D */ - { 0x11D3F, 0x11D45 }, /* U+11D3F - U+11D45 */ - { 0x11D47, 0x11D47 }, /* U+11D47 */ - { 0x11D8A, 0x11D8E }, /* U+11D8A - U+11D8E */ - { 0x11D90, 0x11D91 }, /* U+11D90 - U+11D91 */ - { 0x11D93, 0x11D97 }, /* U+11D93 - U+11D97 */ - { 0x11EF3, 0x11EF6 }, /* U+11EF3 - U+11EF6 */ - { 0x11F00, 0x11F01 }, /* U+11F00 - U+11F01 */ - { 0x11F03, 0x11F03 }, /* U+11F03 */ - { 0x11F34, 0x11F3A }, /* U+11F34 - U+11F3A */ - { 0x11F3E, 0x11F42 }, /* U+11F3E - U+11F42 */ - { 0x11F5A, 0x11F5A }, /* U+11F5A */ - { 0x13430, 0x13440 }, /* U+13430 - U+13440 */ - { 0x13447, 0x13455 }, /* U+13447 - U+13455 */ - { 0x1611E, 0x1612F }, /* U+1611E - U+1612F */ - { 0x16AF0, 0x16AF4 }, /* U+16AF0 - U+16AF4 */ - { 0x16B30, 0x16B36 }, /* U+16B30 - U+16B36 */ - { 0x16F4F, 0x16F4F }, /* U+16F4F */ - { 0x16F51, 0x16F87 }, /* U+16F51 - U+16F87 */ - { 0x16F8F, 0x16F92 }, /* U+16F8F - U+16F92 */ - { 0x16FE4, 0x16FE4 }, /* U+16FE4 */ - { 0x16FF0, 0x16FF1 }, /* U+16FF0 - U+16FF1 */ - { 0x1BC9D, 0x1BC9E }, /* U+1BC9D - U+1BC9E */ - { 0x1BCA0, 0x1BCA3 }, /* U+1BCA0 - U+1BCA3 */ - { 0x1CF00, 0x1CF2D }, /* U+1CF00 - U+1CF2D */ - { 0x1CF30, 0x1CF46 }, /* U+1CF30 - U+1CF46 */ - { 0x1D165, 0x1D169 }, /* U+1D165 - U+1D169 */ - { 0x1D16D, 0x1D182 }, /* U+1D16D - U+1D182 */ - { 0x1D185, 0x1D18B }, /* U+1D185 - U+1D18B */ - { 0x1D1AA, 0x1D1AD }, /* U+1D1AA - U+1D1AD */ - { 0x1D242, 0x1D244 }, /* U+1D242 - U+1D244 */ - { 0x1DA00, 0x1DA36 }, /* U+1DA00 - U+1DA36 */ - { 0x1DA3B, 0x1DA6C }, /* U+1DA3B - U+1DA6C */ - { 0x1DA75, 0x1DA75 }, /* U+1DA75 */ - { 0x1DA84, 0x1DA84 }, /* U+1DA84 */ - { 0x1DA9B, 0x1DA9F }, /* U+1DA9B - U+1DA9F */ - { 0x1DAA1, 0x1DAAF }, /* U+1DAA1 - U+1DAAF */ - { 0x1E000, 0x1E006 }, /* U+1E000 - U+1E006 */ - { 0x1E008, 0x1E018 }, /* U+1E008 - U+1E018 */ - { 0x1E01B, 0x1E021 }, /* U+1E01B - U+1E021 */ - { 0x1E023, 0x1E024 }, /* U+1E023 - U+1E024 */ - { 0x1E026, 0x1E02A }, /* U+1E026 - U+1E02A */ - { 0x1E08F, 0x1E08F }, /* U+1E08F */ - { 0x1E130, 0x1E136 }, /* U+1E130 - U+1E136 */ - { 0x1E2AE, 0x1E2AE }, /* U+1E2AE */ - { 0x1E2EC, 0x1E2EF }, /* U+1E2EC - U+1E2EF */ - { 0x1E4EC, 0x1E4EF }, /* U+1E4EC - U+1E4EF */ - { 0x1E5EE, 0x1E5EF }, /* U+1E5EE - U+1E5EF */ - { 0x1E8D0, 0x1E8D6 }, /* U+1E8D0 - U+1E8D6 */ - { 0x1E944, 0x1E94A }, /* U+1E944 - U+1E94A */ - { 0x1F3FB, 0x1F3FF }, /* U+1F3FB - U+1F3FF */ - { 0x1F9B0, 0x1F9B3 }, /* U+1F9B0 - U+1F9B3 */ - { 0xE0001, 0xE0001 }, /* U+E0001 */ - { 0xE0020, 0xE007F }, /* U+E0020 - U+E007F */ - { 0xE0100, 0xE01EF }, /* U+E0100 - U+E01EF */ +/* Zero-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */ +static const struct interval16 zero_width_bmp[] = { + { 0x00AD, 0x00AD }, /* SOFT HYPHEN */ + { 0x0300, 0x036F }, /* COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X */ + { 0x0483, 0x0489 }, /* COMBINING CYRILLIC TITLO - COMBINING CYRILLIC MILLIONS SIGN */ + { 0x0591, 0x05BD }, /* HEBREW ACCENT ETNAHTA - HEBREW POINT METEG */ + { 0x05BF, 0x05BF }, /* HEBREW POINT RAFE */ + { 0x05C1, 0x05C2 }, /* HEBREW POINT SHIN DOT - HEBREW POINT SIN DOT */ + { 0x05C4, 0x05C5 }, /* HEBREW MARK UPPER DOT - HEBREW MARK LOWER DOT */ + { 0x05C7, 0x05C7 }, /* HEBREW POINT QAMATS QATAN */ + { 0x0600, 0x0605 }, /* ARABIC NUMBER SIGN - ARABIC NUMBER MARK ABOVE */ + { 0x0610, 0x061A }, /* ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM - ARABIC SMALL KASRA */ + { 0x064B, 0x065F }, /* ARABIC FATHATAN - ARABIC WAVY HAMZA BELOW */ + { 0x0670, 0x0670 }, /* ARABIC LETTER SUPERSCRIPT ALEF */ + { 0x06D6, 0x06DC }, /* ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA - ARABIC SMALL HIGH SEEN */ + { 0x06DF, 0x06E4 }, /* ARABIC SMALL HIGH ROUNDED ZERO - ARABIC SMALL HIGH MADDA */ + { 0x06E7, 0x06E8 }, /* ARABIC SMALL HIGH YEH - ARABIC SMALL HIGH NOON */ + { 0x06EA, 0x06ED }, /* ARABIC EMPTY CENTRE LOW STOP - ARABIC SMALL LOW MEEM */ + { 0x0711, 0x0711 }, /* SYRIAC LETTER SUPERSCRIPT ALAPH */ + { 0x0730, 0x074A }, /* SYRIAC PTHAHA ABOVE - SYRIAC BARREKH */ + { 0x07A6, 0x07B0 }, /* THAANA ABAFILI - THAANA SUKUN */ + { 0x07EB, 0x07F3 }, /* NKO COMBINING SHORT HIGH TONE - NKO COMBINING DOUBLE DOT ABOVE */ + { 0x07FD, 0x07FD }, /* NKO DANTAYALAN */ + { 0x0816, 0x0819 }, /* SAMARITAN MARK IN - SAMARITAN MARK DAGESH */ + { 0x081B, 0x0823 }, /* SAMARITAN MARK EPENTHETIC YUT - SAMARITAN VOWEL SIGN A */ + { 0x0825, 0x0827 }, /* SAMARITAN VOWEL SIGN SHORT A - SAMARITAN VOWEL SIGN U */ + { 0x0829, 0x082D }, /* SAMARITAN VOWEL SIGN LONG I - SAMARITAN MARK NEQUDAA */ + { 0x0859, 0x085B }, /* MANDAIC AFFRICATION MARK - MANDAIC GEMINATION MARK */ + { 0x0890, 0x0891 }, /* ARABIC POUND MARK ABOVE - ARABIC PIASTRE MARK ABOVE */ + { 0x0897, 0x089F }, /* ARABIC PEPET - ARABIC HALF MADDA OVER MADDA */ + { 0x08CA, 0x0903 }, /* ARABIC SMALL HIGH FARSI YEH - DEVANAGARI SIGN VISARGA */ + { 0x093A, 0x093C }, /* DEVANAGARI VOWEL SIGN OE - DEVANAGARI SIGN NUKTA */ + { 0x093E, 0x094F }, /* DEVANAGARI VOWEL SIGN AA - DEVANAGARI VOWEL SIGN AW */ + { 0x0951, 0x0957 }, /* DEVANAGARI STRESS SIGN UDATTA - DEVANAGARI VOWEL SIGN UUE */ + { 0x0962, 0x0963 }, /* DEVANAGARI VOWEL SIGN VOCALIC L - DEVANAGARI VOWEL SIGN VOCALIC LL */ + { 0x0981, 0x0983 }, /* BENGALI SIGN CANDRABINDU - BENGALI SIGN VISARGA */ + { 0x09BC, 0x09BC }, /* BENGALI SIGN NUKTA */ + { 0x09BE, 0x09C4 }, /* BENGALI VOWEL SIGN AA - BENGALI VOWEL SIGN VOCALIC RR */ + { 0x09C7, 0x09C8 }, /* BENGALI VOWEL SIGN E - BENGALI VOWEL SIGN AI */ + { 0x09CB, 0x09CD }, /* BENGALI VOWEL SIGN O - BENGALI SIGN VIRAMA */ + { 0x09D7, 0x09D7 }, /* BENGALI AU LENGTH MARK */ + { 0x09E2, 0x09E3 }, /* BENGALI VOWEL SIGN VOCALIC L - BENGALI VOWEL SIGN VOCALIC LL */ + { 0x09FE, 0x09FE }, /* BENGALI SANDHI MARK */ + { 0x0A01, 0x0A03 }, /* GURMUKHI SIGN ADAK BINDI - GURMUKHI SIGN VISARGA */ + { 0x0A3C, 0x0A3C }, /* GURMUKHI SIGN NUKTA */ + { 0x0A3E, 0x0A42 }, /* GURMUKHI VOWEL SIGN AA - GURMUKHI VOWEL SIGN UU */ + { 0x0A47, 0x0A48 }, /* GURMUKHI VOWEL SIGN EE - GURMUKHI VOWEL SIGN AI */ + { 0x0A4B, 0x0A4D }, /* GURMUKHI VOWEL SIGN OO - GURMUKHI SIGN VIRAMA */ + { 0x0A51, 0x0A51 }, /* GURMUKHI SIGN UDAAT */ + { 0x0A70, 0x0A71 }, /* GURMUKHI TIPPI - GURMUKHI ADDAK */ + { 0x0A75, 0x0A75 }, /* GURMUKHI SIGN YAKASH */ + { 0x0A81, 0x0A83 }, /* GUJARATI SIGN CANDRABINDU - GUJARATI SIGN VISARGA */ + { 0x0ABC, 0x0ABC }, /* GUJARATI SIGN NUKTA */ + { 0x0ABE, 0x0AC5 }, /* GUJARATI VOWEL SIGN AA - GUJARATI VOWEL SIGN CANDRA E */ + { 0x0AC7, 0x0AC9 }, /* GUJARATI VOWEL SIGN E - GUJARATI VOWEL SIGN CANDRA O */ + { 0x0ACB, 0x0ACD }, /* GUJARATI VOWEL SIGN O - GUJARATI SIGN VIRAMA */ + { 0x0AE2, 0x0AE3 }, /* GUJARATI VOWEL SIGN VOCALIC L - GUJARATI VOWEL SIGN VOCALIC LL */ + { 0x0AFA, 0x0AFF }, /* GUJARATI SIGN SUKUN - GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE */ + { 0x0B01, 0x0B03 }, /* ORIYA SIGN CANDRABINDU - ORIYA SIGN VISARGA */ + { 0x0B3C, 0x0B3C }, /* ORIYA SIGN NUKTA */ + { 0x0B3E, 0x0B44 }, /* ORIYA VOWEL SIGN AA - ORIYA VOWEL SIGN VOCALIC RR */ + { 0x0B47, 0x0B48 }, /* ORIYA VOWEL SIGN E - ORIYA VOWEL SIGN AI */ + { 0x0B4B, 0x0B4D }, /* ORIYA VOWEL SIGN O - ORIYA SIGN VIRAMA */ + { 0x0B55, 0x0B57 }, /* ORIYA SIGN OVERLINE - ORIYA AU LENGTH MARK */ + { 0x0B62, 0x0B63 }, /* ORIYA VOWEL SIGN VOCALIC L - ORIYA VOWEL SIGN VOCALIC LL */ + { 0x0B82, 0x0B82 }, /* TAMIL SIGN ANUSVARA */ + { 0x0BBE, 0x0BC2 }, /* TAMIL VOWEL SIGN AA - TAMIL VOWEL SIGN UU */ + { 0x0BC6, 0x0BC8 }, /* TAMIL VOWEL SIGN E - TAMIL VOWEL SIGN AI */ + { 0x0BCA, 0x0BCD }, /* TAMIL VOWEL SIGN O - TAMIL SIGN VIRAMA */ + { 0x0BD7, 0x0BD7 }, /* TAMIL AU LENGTH MARK */ + { 0x0C00, 0x0C04 }, /* TELUGU SIGN COMBINING CANDRABINDU ABOVE - TELUGU SIGN COMBINING ANUSVARA ABOVE */ + { 0x0C3C, 0x0C3C }, /* TELUGU SIGN NUKTA */ + { 0x0C3E, 0x0C44 }, /* TELUGU VOWEL SIGN AA - TELUGU VOWEL SIGN VOCALIC RR */ + { 0x0C46, 0x0C48 }, /* TELUGU VOWEL SIGN E - TELUGU VOWEL SIGN AI */ + { 0x0C4A, 0x0C4D }, /* TELUGU VOWEL SIGN O - TELUGU SIGN VIRAMA */ + { 0x0C55, 0x0C56 }, /* TELUGU LENGTH MARK - TELUGU AI LENGTH MARK */ + { 0x0C62, 0x0C63 }, /* TELUGU VOWEL SIGN VOCALIC L - TELUGU VOWEL SIGN VOCALIC LL */ + { 0x0C81, 0x0C83 }, /* KANNADA SIGN CANDRABINDU - KANNADA SIGN VISARGA */ + { 0x0CBC, 0x0CBC }, /* KANNADA SIGN NUKTA */ + { 0x0CBE, 0x0CC4 }, /* KANNADA VOWEL SIGN AA - KANNADA VOWEL SIGN VOCALIC RR */ + { 0x0CC6, 0x0CC8 }, /* KANNADA VOWEL SIGN E - KANNADA VOWEL SIGN AI */ + { 0x0CCA, 0x0CCD }, /* KANNADA VOWEL SIGN O - KANNADA SIGN VIRAMA */ + { 0x0CD5, 0x0CD6 }, /* KANNADA LENGTH MARK - KANNADA AI LENGTH MARK */ + { 0x0CE2, 0x0CE3 }, /* KANNADA VOWEL SIGN VOCALIC L - KANNADA VOWEL SIGN VOCALIC LL */ + { 0x0CF3, 0x0CF3 }, /* KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT */ + { 0x0D00, 0x0D03 }, /* MALAYALAM SIGN COMBINING ANUSVARA ABOVE - MALAYALAM SIGN VISARGA */ + { 0x0D3B, 0x0D3C }, /* MALAYALAM SIGN VERTICAL BAR VIRAMA - MALAYALAM SIGN CIRCULAR VIRAMA */ + { 0x0D3E, 0x0D44 }, /* MALAYALAM VOWEL SIGN AA - MALAYALAM VOWEL SIGN VOCALIC RR */ + { 0x0D46, 0x0D48 }, /* MALAYALAM VOWEL SIGN E - MALAYALAM VOWEL SIGN AI */ + { 0x0D4A, 0x0D4D }, /* MALAYALAM VOWEL SIGN O - MALAYALAM SIGN VIRAMA */ + { 0x0D57, 0x0D57 }, /* MALAYALAM AU LENGTH MARK */ + { 0x0D62, 0x0D63 }, /* MALAYALAM VOWEL SIGN VOCALIC L - MALAYALAM VOWEL SIGN VOCALIC LL */ + { 0x0D81, 0x0D83 }, /* SINHALA SIGN CANDRABINDU - SINHALA SIGN VISARGAYA */ + { 0x0DCA, 0x0DCA }, /* SINHALA SIGN AL-LAKUNA */ + { 0x0DCF, 0x0DD4 }, /* SINHALA VOWEL SIGN AELA-PILLA - SINHALA VOWEL SIGN KETTI PAA-PILLA */ + { 0x0DD6, 0x0DD6 }, /* SINHALA VOWEL SIGN DIGA PAA-PILLA */ + { 0x0DD8, 0x0DDF }, /* SINHALA VOWEL SIGN GAETTA-PILLA - SINHALA VOWEL SIGN GAYANUKITTA */ + { 0x0DF2, 0x0DF3 }, /* SINHALA VOWEL SIGN DIGA GAETTA-PILLA - SINHALA VOWEL SIGN DIGA GAYANUKITTA */ + { 0x0E31, 0x0E31 }, /* THAI CHARACTER MAI HAN-AKAT */ + { 0x0E34, 0x0E3A }, /* THAI CHARACTER SARA I - THAI CHARACTER PHINTHU */ + { 0x0E47, 0x0E4E }, /* THAI CHARACTER MAITAIKHU - THAI CHARACTER YAMAKKAN */ + { 0x0EB1, 0x0EB1 }, /* LAO VOWEL SIGN MAI KAN */ + { 0x0EB4, 0x0EBC }, /* LAO VOWEL SIGN I - LAO SEMIVOWEL SIGN LO */ + { 0x0EC8, 0x0ECE }, /* LAO TONE MAI EK - LAO YAMAKKAN */ + { 0x0F18, 0x0F19 }, /* TIBETAN ASTROLOGICAL SIGN -KHYUD PA - TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS */ + { 0x0F35, 0x0F35 }, /* TIBETAN MARK NGAS BZUNG NYI ZLA */ + { 0x0F37, 0x0F37 }, /* TIBETAN MARK NGAS BZUNG SGOR RTAGS */ + { 0x0F39, 0x0F39 }, /* TIBETAN MARK TSA -PHRU */ + { 0x0F3E, 0x0F3F }, /* TIBETAN SIGN YAR TSHES - TIBETAN SIGN MAR TSHES */ + { 0x0F71, 0x0F84 }, /* TIBETAN VOWEL SIGN AA - TIBETAN MARK HALANTA */ + { 0x0F86, 0x0F87 }, /* TIBETAN SIGN LCI RTAGS - TIBETAN SIGN YANG RTAGS */ + { 0x0F8D, 0x0F97 }, /* TIBETAN SUBJOINED SIGN LCE TSA CAN - TIBETAN SUBJOINED LETTER JA */ + { 0x0F99, 0x0FBC }, /* TIBETAN SUBJOINED LETTER NYA - TIBETAN SUBJOINED LETTER FIXED-FORM RA */ + { 0x0FC6, 0x0FC6 }, /* TIBETAN SYMBOL PADMA GDAN */ + { 0x102B, 0x103E }, /* MYANMAR VOWEL SIGN TALL AA - MYANMAR CONSONANT SIGN MEDIAL HA */ + { 0x1056, 0x1059 }, /* MYANMAR VOWEL SIGN VOCALIC R - MYANMAR VOWEL SIGN VOCALIC LL */ + { 0x105E, 0x1060 }, /* MYANMAR CONSONANT SIGN MON MEDIAL NA - MYANMAR CONSONANT SIGN MON MEDIAL LA */ + { 0x1062, 0x1064 }, /* MYANMAR VOWEL SIGN SGAW KAREN EU - MYANMAR TONE MARK SGAW KAREN KE PHO */ + { 0x1067, 0x106D }, /* MYANMAR VOWEL SIGN WESTERN PWO KAREN EU - MYANMAR SIGN WESTERN PWO KAREN TONE-5 */ + { 0x1071, 0x1074 }, /* MYANMAR VOWEL SIGN GEBA KAREN I - MYANMAR VOWEL SIGN KAYAH EE */ + { 0x1082, 0x108D }, /* MYANMAR CONSONANT SIGN SHAN MEDIAL WA - MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE */ + { 0x108F, 0x108F }, /* MYANMAR SIGN RUMAI PALAUNG TONE-5 */ + { 0x109A, 0x109D }, /* MYANMAR SIGN KHAMTI TONE-1 - MYANMAR VOWEL SIGN AITON AI */ + { 0x135D, 0x135F }, /* ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK - ETHIOPIC COMBINING GEMINATION MARK */ + { 0x1712, 0x1715 }, /* TAGALOG VOWEL SIGN I - TAGALOG SIGN PAMUDPOD */ + { 0x1732, 0x1734 }, /* HANUNOO VOWEL SIGN I - HANUNOO SIGN PAMUDPOD */ + { 0x1752, 0x1753 }, /* BUHID VOWEL SIGN I - BUHID VOWEL SIGN U */ + { 0x1772, 0x1773 }, /* TAGBANWA VOWEL SIGN I - TAGBANWA VOWEL SIGN U */ + { 0x17B4, 0x17D3 }, /* KHMER VOWEL INHERENT AQ - KHMER SIGN BATHAMASAT */ + { 0x17DD, 0x17DD }, /* KHMER SIGN ATTHACAN */ + { 0x180B, 0x180D }, /* MONGOLIAN FREE VARIATION SELECTOR ONE - MONGOLIAN FREE VARIATION SELECTOR THREE */ + { 0x180F, 0x180F }, /* MONGOLIAN FREE VARIATION SELECTOR FOUR */ + { 0x1885, 0x1886 }, /* MONGOLIAN LETTER ALI GALI BALUDA - MONGOLIAN LETTER ALI GALI THREE BALUDA */ + { 0x18A9, 0x18A9 }, /* MONGOLIAN LETTER ALI GALI DAGALGA */ + { 0x1920, 0x192B }, /* LIMBU VOWEL SIGN A - LIMBU SUBJOINED LETTER WA */ + { 0x1930, 0x193B }, /* LIMBU SMALL LETTER KA - LIMBU SIGN SA-I */ + { 0x1A17, 0x1A1B }, /* BUGINESE VOWEL SIGN I - BUGINESE VOWEL SIGN AE */ + { 0x1A55, 0x1A5E }, /* TAI THAM CONSONANT SIGN MEDIAL RA - TAI THAM CONSONANT SIGN SA */ + { 0x1A60, 0x1A7C }, /* TAI THAM SIGN SAKOT - TAI THAM SIGN KHUEN-LUE KARAN */ + { 0x1A7F, 0x1A7F }, /* TAI THAM COMBINING CRYPTOGRAMMIC DOT */ + { 0x1AB0, 0x1ACE }, /* COMBINING DOUBLED CIRCUMFLEX ACCENT - COMBINING LATIN SMALL LETTER INSULAR T */ + { 0x1B00, 0x1B04 }, /* BALINESE SIGN ULU RICEM - BALINESE SIGN BISAH */ + { 0x1B34, 0x1B44 }, /* BALINESE SIGN REREKAN - BALINESE ADEG ADEG */ + { 0x1B6B, 0x1B73 }, /* BALINESE MUSICAL SYMBOL COMBINING TEGEH - BALINESE MUSICAL SYMBOL COMBINING GONG */ + { 0x1B80, 0x1B82 }, /* SUNDANESE SIGN PANYECEK - SUNDANESE SIGN PANGWISAD */ + { 0x1BA1, 0x1BAD }, /* SUNDANESE CONSONANT SIGN PAMINGKAL - SUNDANESE CONSONANT SIGN PASANGAN WA */ + { 0x1BE6, 0x1BF3 }, /* BATAK SIGN TOMPI - BATAK PANONGONAN */ + { 0x1C24, 0x1C37 }, /* LEPCHA SUBJOINED LETTER YA - LEPCHA SIGN NUKTA */ + { 0x1CD0, 0x1CD2 }, /* VEDIC TONE KARSHANA - VEDIC TONE PRENKHA */ + { 0x1CD4, 0x1CE8 }, /* VEDIC SIGN YAJURVEDIC MIDLINE SVARITA - VEDIC SIGN VISARGA ANUDATTA WITH TAIL */ + { 0x1CED, 0x1CED }, /* VEDIC SIGN TIRYAK */ + { 0x1CF4, 0x1CF4 }, /* VEDIC TONE CANDRA ABOVE */ + { 0x1CF7, 0x1CF9 }, /* VEDIC SIGN ATIKRAMA - VEDIC TONE DOUBLE RING ABOVE */ + { 0x1DC0, 0x1DFF }, /* COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW */ + { 0x200B, 0x200E }, /* ZERO WIDTH SPACE - LEFT-TO-RIGHT MARK */ + { 0x202A, 0x202D }, /* LEFT-TO-RIGHT EMBEDDING - LEFT-TO-RIGHT OVERRIDE */ + { 0x2060, 0x2064 }, /* WORD JOINER - INVISIBLE PLUS */ + { 0x206A, 0x206F }, /* INHIBIT SYMMETRIC SWAPPING - NOMINAL DIGIT SHAPES */ + { 0x20D0, 0x20F0 }, /* COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE */ + { 0x2640, 0x2640 }, /* FEMALE SIGN */ + { 0x2642, 0x2642 }, /* MALE SIGN */ + { 0x26A7, 0x26A7 }, /* MALE WITH STROKE AND MALE AND FEMALE SIGN */ + { 0x2CEF, 0x2CF1 }, /* COPTIC COMBINING NI ABOVE - COPTIC COMBINING SPIRITUS LENIS */ + { 0x2D7F, 0x2D7F }, /* TIFINAGH CONSONANT JOINER */ + { 0x2DE0, 0x2DFF }, /* COMBINING CYRILLIC LETTER BE - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS */ + { 0x302A, 0x302F }, /* IDEOGRAPHIC LEVEL TONE MARK - HANGUL DOUBLE DOT TONE MARK */ + { 0x3099, 0x309A }, /* COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK - COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */ + { 0xA66F, 0xA672 }, /* COMBINING CYRILLIC VZMET - COMBINING CYRILLIC THOUSAND MILLIONS SIGN */ + { 0xA674, 0xA67D }, /* COMBINING CYRILLIC LETTER UKRAINIAN IE - COMBINING CYRILLIC PAYEROK */ + { 0xA69E, 0xA69F }, /* COMBINING CYRILLIC LETTER EF - COMBINING CYRILLIC LETTER IOTIFIED E */ + { 0xA6F0, 0xA6F1 }, /* BAMUM COMBINING MARK KOQNDON - BAMUM COMBINING MARK TUKWENTIS */ + { 0xA802, 0xA802 }, /* SYLOTI NAGRI SIGN DVISVARA */ + { 0xA806, 0xA806 }, /* SYLOTI NAGRI SIGN HASANTA */ + { 0xA80B, 0xA80B }, /* SYLOTI NAGRI SIGN ANUSVARA */ + { 0xA823, 0xA827 }, /* SYLOTI NAGRI VOWEL SIGN A - SYLOTI NAGRI VOWEL SIGN OO */ + { 0xA82C, 0xA82C }, /* SYLOTI NAGRI SIGN ALTERNATE HASANTA */ + { 0xA880, 0xA881 }, /* SAURASHTRA SIGN ANUSVARA - SAURASHTRA SIGN VISARGA */ + { 0xA8B4, 0xA8C5 }, /* SAURASHTRA CONSONANT SIGN HAARU - SAURASHTRA SIGN CANDRABINDU */ + { 0xA8E0, 0xA8F1 }, /* COMBINING DEVANAGARI DIGIT ZERO - COMBINING DEVANAGARI SIGN AVAGRAHA */ + { 0xA8FF, 0xA8FF }, /* DEVANAGARI VOWEL SIGN AY */ + { 0xA926, 0xA92D }, /* KAYAH LI VOWEL UE - KAYAH LI TONE CALYA PLOPHU */ + { 0xA947, 0xA953 }, /* REJANG VOWEL SIGN I - REJANG VIRAMA */ + { 0xA980, 0xA983 }, /* JAVANESE SIGN PANYANGGA - JAVANESE SIGN WIGNYAN */ + { 0xA9B3, 0xA9C0 }, /* JAVANESE SIGN CECAK TELU - JAVANESE PANGKON */ + { 0xA9E5, 0xA9E5 }, /* MYANMAR SIGN SHAN SAW */ + { 0xAA29, 0xAA36 }, /* CHAM VOWEL SIGN AA - CHAM CONSONANT SIGN WA */ + { 0xAA43, 0xAA43 }, /* CHAM CONSONANT SIGN FINAL NG */ + { 0xAA4C, 0xAA4D }, /* CHAM CONSONANT SIGN FINAL M - CHAM CONSONANT SIGN FINAL H */ + { 0xAA7B, 0xAA7D }, /* MYANMAR SIGN PAO KAREN TONE - MYANMAR SIGN TAI LAING TONE-5 */ + { 0xAAB0, 0xAAB0 }, /* TAI VIET MAI KANG */ + { 0xAAB2, 0xAAB4 }, /* TAI VIET VOWEL I - TAI VIET VOWEL U */ + { 0xAAB7, 0xAAB8 }, /* TAI VIET MAI KHIT - TAI VIET VOWEL IA */ + { 0xAABE, 0xAABF }, /* TAI VIET VOWEL AM - TAI VIET TONE MAI EK */ + { 0xAAC1, 0xAAC1 }, /* TAI VIET TONE MAI THO */ + { 0xAAEB, 0xAAEF }, /* MEETEI MAYEK VOWEL SIGN II - MEETEI MAYEK VOWEL SIGN AAU */ + { 0xAAF5, 0xAAF6 }, /* MEETEI MAYEK VOWEL SIGN VISARGA - MEETEI MAYEK VIRAMA */ + { 0xABE3, 0xABEA }, /* MEETEI MAYEK VOWEL SIGN ONAP - MEETEI MAYEK VOWEL SIGN NUNG */ + { 0xABEC, 0xABED }, /* MEETEI MAYEK LUM IYEK - MEETEI MAYEK APUN IYEK */ + { 0xFB1E, 0xFB1E }, /* HEBREW POINT JUDEO-SPANISH VARIKA */ + { 0xFE00, 0xFE0F }, /* VARIATION SELECTOR-1 - VARIATION SELECTOR-16 */ + { 0xFE20, 0xFE2F }, /* COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF */ + { 0xFEFF, 0xFEFF }, /* ZERO WIDTH NO-BREAK SPACE */ + { 0xFFF9, 0xFFFB }, /* INTERLINEAR ANNOTATION ANCHOR - INTERLINEAR ANNOTATION TERMINATOR */ +}; + +/* Zero-width character ranges (non-BMP, U+10000 and above) */ +static const struct interval32 zero_width_non_bmp[] = { + { 0x101FD, 0x101FD }, /* PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE */ + { 0x102E0, 0x102E0 }, /* COPTIC EPACT THOUSANDS MARK */ + { 0x10376, 0x1037A }, /* COMBINING OLD PERMIC LETTER AN - COMBINING OLD PERMIC LETTER SII */ + { 0x10A01, 0x10A03 }, /* KHAROSHTHI VOWEL SIGN I - KHAROSHTHI VOWEL SIGN VOCALIC R */ + { 0x10A05, 0x10A06 }, /* KHAROSHTHI VOWEL SIGN E - KHAROSHTHI VOWEL SIGN O */ + { 0x10A0C, 0x10A0F }, /* KHAROSHTHI VOWEL LENGTH MARK - KHAROSHTHI SIGN VISARGA */ + { 0x10A38, 0x10A3A }, /* KHAROSHTHI SIGN BAR ABOVE - KHAROSHTHI SIGN DOT BELOW */ + { 0x10A3F, 0x10A3F }, /* KHAROSHTHI VIRAMA */ + { 0x10AE5, 0x10AE6 }, /* MANICHAEAN ABBREVIATION MARK ABOVE - MANICHAEAN ABBREVIATION MARK BELOW */ + { 0x10D24, 0x10D27 }, /* HANIFI ROHINGYA SIGN HARBAHAY - HANIFI ROHINGYA SIGN TASSI */ + { 0x10D69, 0x10D6D }, /* GARAY VOWEL SIGN E - GARAY CONSONANT NASALIZATION MARK */ + { 0x10EAB, 0x10EAC }, /* YEZIDI COMBINING HAMZA MARK - YEZIDI COMBINING MADDA MARK */ + { 0x10EFC, 0x10EFF }, /* ARABIC COMBINING ALEF OVERLAY - ARABIC SMALL LOW WORD MADDA */ + { 0x10F46, 0x10F50 }, /* SOGDIAN COMBINING DOT BELOW - SOGDIAN COMBINING STROKE BELOW */ + { 0x10F82, 0x10F85 }, /* OLD UYGHUR COMBINING DOT ABOVE - OLD UYGHUR COMBINING TWO DOTS BELOW */ + { 0x11000, 0x11002 }, /* BRAHMI SIGN CANDRABINDU - BRAHMI SIGN VISARGA */ + { 0x11038, 0x11046 }, /* BRAHMI VOWEL SIGN AA - BRAHMI VIRAMA */ + { 0x11070, 0x11070 }, /* BRAHMI SIGN OLD TAMIL VIRAMA */ + { 0x11073, 0x11074 }, /* BRAHMI VOWEL SIGN OLD TAMIL SHORT E - BRAHMI VOWEL SIGN OLD TAMIL SHORT O */ + { 0x1107F, 0x11082 }, /* BRAHMI NUMBER JOINER - KAITHI SIGN VISARGA */ + { 0x110B0, 0x110BA }, /* KAITHI VOWEL SIGN AA - KAITHI SIGN NUKTA */ + { 0x110BD, 0x110BD }, /* KAITHI NUMBER SIGN */ + { 0x110C2, 0x110C2 }, /* KAITHI VOWEL SIGN VOCALIC R */ + { 0x110CD, 0x110CD }, /* KAITHI NUMBER SIGN ABOVE */ + { 0x11100, 0x11102 }, /* CHAKMA SIGN CANDRABINDU - CHAKMA SIGN VISARGA */ + { 0x11127, 0x11134 }, /* CHAKMA VOWEL SIGN A - CHAKMA MAAYYAA */ + { 0x11145, 0x11146 }, /* CHAKMA VOWEL SIGN AA - CHAKMA VOWEL SIGN EI */ + { 0x11173, 0x11173 }, /* MAHAJANI SIGN NUKTA */ + { 0x11180, 0x11182 }, /* SHARADA SIGN CANDRABINDU - SHARADA SIGN VISARGA */ + { 0x111B3, 0x111C0 }, /* SHARADA VOWEL SIGN AA - SHARADA SIGN VIRAMA */ + { 0x111C9, 0x111CC }, /* SHARADA SANDHI MARK - SHARADA EXTRA SHORT VOWEL MARK */ + { 0x111CE, 0x111CF }, /* SHARADA VOWEL SIGN PRISHTHAMATRA E - SHARADA SIGN INVERTED CANDRABINDU */ + { 0x1122C, 0x11237 }, /* KHOJKI VOWEL SIGN AA - KHOJKI SIGN SHADDA */ + { 0x1123E, 0x1123E }, /* KHOJKI SIGN SUKUN */ + { 0x11241, 0x11241 }, /* KHOJKI VOWEL SIGN VOCALIC R */ + { 0x112DF, 0x112EA }, /* KHUDAWADI SIGN ANUSVARA - KHUDAWADI SIGN VIRAMA */ + { 0x11300, 0x11303 }, /* GRANTHA SIGN COMBINING ANUSVARA ABOVE - GRANTHA SIGN VISARGA */ + { 0x1133B, 0x1133C }, /* COMBINING BINDU BELOW - GRANTHA SIGN NUKTA */ + { 0x1133E, 0x11344 }, /* GRANTHA VOWEL SIGN AA - GRANTHA VOWEL SIGN VOCALIC RR */ + { 0x11347, 0x11348 }, /* GRANTHA VOWEL SIGN EE - GRANTHA VOWEL SIGN AI */ + { 0x1134B, 0x1134D }, /* GRANTHA VOWEL SIGN OO - GRANTHA SIGN VIRAMA */ + { 0x11357, 0x11357 }, /* GRANTHA AU LENGTH MARK */ + { 0x11362, 0x11363 }, /* GRANTHA VOWEL SIGN VOCALIC L - GRANTHA VOWEL SIGN VOCALIC LL */ + { 0x11366, 0x1136C }, /* COMBINING GRANTHA DIGIT ZERO - COMBINING GRANTHA DIGIT SIX */ + { 0x11370, 0x11374 }, /* COMBINING GRANTHA LETTER A - COMBINING GRANTHA LETTER PA */ + { 0x113B8, 0x113C0 }, /* TULU-TIGALARI VOWEL SIGN AA - TULU-TIGALARI VOWEL SIGN VOCALIC LL */ + { 0x113C2, 0x113C2 }, /* TULU-TIGALARI VOWEL SIGN EE */ + { 0x113C5, 0x113C5 }, /* TULU-TIGALARI VOWEL SIGN AI */ + { 0x113C7, 0x113CA }, /* TULU-TIGALARI VOWEL SIGN OO - TULU-TIGALARI SIGN CANDRA ANUNASIKA */ + { 0x113CC, 0x113D0 }, /* TULU-TIGALARI SIGN ANUSVARA - TULU-TIGALARI CONJOINER */ + { 0x113D2, 0x113D2 }, /* TULU-TIGALARI GEMINATION MARK */ + { 0x113E1, 0x113E2 }, /* TULU-TIGALARI VEDIC TONE SVARITA - TULU-TIGALARI VEDIC TONE ANUDATTA */ + { 0x11435, 0x11446 }, /* NEWA VOWEL SIGN AA - NEWA SIGN NUKTA */ + { 0x1145E, 0x1145E }, /* NEWA SANDHI MARK */ + { 0x114B0, 0x114C3 }, /* TIRHUTA VOWEL SIGN AA - TIRHUTA SIGN NUKTA */ + { 0x115AF, 0x115B5 }, /* SIDDHAM VOWEL SIGN AA - SIDDHAM VOWEL SIGN VOCALIC RR */ + { 0x115B8, 0x115C0 }, /* SIDDHAM VOWEL SIGN E - SIDDHAM SIGN NUKTA */ + { 0x115DC, 0x115DD }, /* SIDDHAM VOWEL SIGN ALTERNATE U - SIDDHAM VOWEL SIGN ALTERNATE UU */ + { 0x11630, 0x11640 }, /* MODI VOWEL SIGN AA - MODI SIGN ARDHACANDRA */ + { 0x116AB, 0x116B7 }, /* TAKRI SIGN ANUSVARA - TAKRI SIGN NUKTA */ + { 0x1171D, 0x1172B }, /* AHOM CONSONANT SIGN MEDIAL LA - AHOM SIGN KILLER */ + { 0x1182C, 0x1183A }, /* DOGRA VOWEL SIGN AA - DOGRA SIGN NUKTA */ + { 0x11930, 0x11935 }, /* DIVES AKURU VOWEL SIGN AA - DIVES AKURU VOWEL SIGN E */ + { 0x11937, 0x11938 }, /* DIVES AKURU VOWEL SIGN AI - DIVES AKURU VOWEL SIGN O */ + { 0x1193B, 0x1193E }, /* DIVES AKURU SIGN ANUSVARA - DIVES AKURU VIRAMA */ + { 0x11940, 0x11940 }, /* DIVES AKURU MEDIAL YA */ + { 0x11942, 0x11943 }, /* DIVES AKURU MEDIAL RA - DIVES AKURU SIGN NUKTA */ + { 0x119D1, 0x119D7 }, /* NANDINAGARI VOWEL SIGN AA - NANDINAGARI VOWEL SIGN VOCALIC RR */ + { 0x119DA, 0x119E0 }, /* NANDINAGARI VOWEL SIGN E - NANDINAGARI SIGN VIRAMA */ + { 0x119E4, 0x119E4 }, /* NANDINAGARI VOWEL SIGN PRISHTHAMATRA E */ + { 0x11A01, 0x11A0A }, /* ZANABAZAR SQUARE VOWEL SIGN I - ZANABAZAR SQUARE VOWEL LENGTH MARK */ + { 0x11A33, 0x11A39 }, /* ZANABAZAR SQUARE FINAL CONSONANT MARK - ZANABAZAR SQUARE SIGN VISARGA */ + { 0x11A3B, 0x11A3E }, /* ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA - ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA */ + { 0x11A47, 0x11A47 }, /* ZANABAZAR SQUARE SUBJOINER */ + { 0x11A51, 0x11A5B }, /* SOYOMBO VOWEL SIGN I - SOYOMBO VOWEL LENGTH MARK */ + { 0x11A8A, 0x11A99 }, /* SOYOMBO FINAL CONSONANT SIGN G - SOYOMBO SUBJOINER */ + { 0x11C2F, 0x11C36 }, /* BHAIKSUKI VOWEL SIGN AA - BHAIKSUKI VOWEL SIGN VOCALIC L */ + { 0x11C38, 0x11C3F }, /* BHAIKSUKI VOWEL SIGN E - BHAIKSUKI SIGN VIRAMA */ + { 0x11C92, 0x11CA7 }, /* MARCHEN SUBJOINED LETTER KA - MARCHEN SUBJOINED LETTER ZA */ + { 0x11CA9, 0x11CB6 }, /* MARCHEN SUBJOINED LETTER YA - MARCHEN SIGN CANDRABINDU */ + { 0x11D31, 0x11D36 }, /* MASARAM GONDI VOWEL SIGN AA - MASARAM GONDI VOWEL SIGN VOCALIC R */ + { 0x11D3A, 0x11D3A }, /* MASARAM GONDI VOWEL SIGN E */ + { 0x11D3C, 0x11D3D }, /* MASARAM GONDI VOWEL SIGN AI - MASARAM GONDI VOWEL SIGN O */ + { 0x11D3F, 0x11D45 }, /* MASARAM GONDI VOWEL SIGN AU - MASARAM GONDI VIRAMA */ + { 0x11D47, 0x11D47 }, /* MASARAM GONDI RA-KARA */ + { 0x11D8A, 0x11D8E }, /* GUNJALA GONDI VOWEL SIGN AA - GUNJALA GONDI VOWEL SIGN UU */ + { 0x11D90, 0x11D91 }, /* GUNJALA GONDI VOWEL SIGN EE - GUNJALA GONDI VOWEL SIGN AI */ + { 0x11D93, 0x11D97 }, /* GUNJALA GONDI VOWEL SIGN OO - GUNJALA GONDI VIRAMA */ + { 0x11EF3, 0x11EF6 }, /* MAKASAR VOWEL SIGN I - MAKASAR VOWEL SIGN O */ + { 0x11F00, 0x11F01 }, /* KAWI SIGN CANDRABINDU - KAWI SIGN ANUSVARA */ + { 0x11F03, 0x11F03 }, /* KAWI SIGN VISARGA */ + { 0x11F34, 0x11F3A }, /* KAWI VOWEL SIGN AA - KAWI VOWEL SIGN VOCALIC R */ + { 0x11F3E, 0x11F42 }, /* KAWI VOWEL SIGN E - KAWI CONJOINER */ + { 0x11F5A, 0x11F5A }, /* KAWI SIGN NUKTA */ + { 0x13430, 0x13440 }, /* EGYPTIAN HIEROGLYPH VERTICAL JOINER - EGYPTIAN HIEROGLYPH MIRROR HORIZONTALLY */ + { 0x13447, 0x13455 }, /* EGYPTIAN HIEROGLYPH MODIFIER DAMAGED AT TOP START - EGYPTIAN HIEROGLYPH MODIFIER DAMAGED */ + { 0x1611E, 0x1612F }, /* GURUNG KHEMA VOWEL SIGN AA - GURUNG KHEMA SIGN THOLHOMA */ + { 0x16AF0, 0x16AF4 }, /* BASSA VAH COMBINING HIGH TONE - BASSA VAH COMBINING HIGH-LOW TONE */ + { 0x16B30, 0x16B36 }, /* PAHAWH HMONG MARK CIM TUB - PAHAWH HMONG MARK CIM TAUM */ + { 0x16F4F, 0x16F4F }, /* MIAO SIGN CONSONANT MODIFIER BAR */ + { 0x16F51, 0x16F87 }, /* MIAO SIGN ASPIRATION - MIAO VOWEL SIGN UI */ + { 0x16F8F, 0x16F92 }, /* MIAO TONE RIGHT - MIAO TONE BELOW */ + { 0x16FE4, 0x16FE4 }, /* KHITAN SMALL SCRIPT FILLER */ + { 0x16FF0, 0x16FF1 }, /* VIETNAMESE ALTERNATE READING MARK CA - VIETNAMESE ALTERNATE READING MARK NHAY */ + { 0x1BC9D, 0x1BC9E }, /* DUPLOYAN THICK LETTER SELECTOR - DUPLOYAN DOUBLE MARK */ + { 0x1BCA0, 0x1BCA3 }, /* SHORTHAND FORMAT LETTER OVERLAP - SHORTHAND FORMAT UP STEP */ + { 0x1CF00, 0x1CF2D }, /* ZNAMENNY COMBINING MARK GORAZDO NIZKO S KRYZHEM ON LEFT - ZNAMENNY COMBINING MARK KRYZH ON LEFT */ + { 0x1CF30, 0x1CF46 }, /* ZNAMENNY COMBINING TONAL RANGE MARK MRACHNO - ZNAMENNY PRIZNAK MODIFIER ROG */ + { 0x1D165, 0x1D169 }, /* MUSICAL SYMBOL COMBINING STEM - MUSICAL SYMBOL COMBINING TREMOLO-3 */ + { 0x1D16D, 0x1D182 }, /* MUSICAL SYMBOL COMBINING AUGMENTATION DOT - MUSICAL SYMBOL COMBINING LOURE */ + { 0x1D185, 0x1D18B }, /* MUSICAL SYMBOL COMBINING DOIT - MUSICAL SYMBOL COMBINING TRIPLE TONGUE */ + { 0x1D1AA, 0x1D1AD }, /* MUSICAL SYMBOL COMBINING DOWN BOW - MUSICAL SYMBOL COMBINING SNAP PIZZICATO */ + { 0x1D242, 0x1D244 }, /* COMBINING GREEK MUSICAL TRISEME - COMBINING GREEK MUSICAL PENTASEME */ + { 0x1DA00, 0x1DA36 }, /* SIGNWRITING HEAD RIM - SIGNWRITING AIR SUCKING IN */ + { 0x1DA3B, 0x1DA6C }, /* SIGNWRITING MOUTH CLOSED NEUTRAL - SIGNWRITING EXCITEMENT */ + { 0x1DA75, 0x1DA75 }, /* SIGNWRITING UPPER BODY TILTING FROM HIP JOINTS */ + { 0x1DA84, 0x1DA84 }, /* SIGNWRITING LOCATION HEAD NECK */ + { 0x1DA9B, 0x1DA9F }, /* SIGNWRITING FILL MODIFIER-2 - SIGNWRITING FILL MODIFIER-6 */ + { 0x1DAA1, 0x1DAAF }, /* SIGNWRITING ROTATION MODIFIER-2 - SIGNWRITING ROTATION MODIFIER-16 */ + { 0x1E000, 0x1E006 }, /* COMBINING GLAGOLITIC LETTER AZU - COMBINING GLAGOLITIC LETTER ZHIVETE */ + { 0x1E008, 0x1E018 }, /* COMBINING GLAGOLITIC LETTER ZEMLJA - COMBINING GLAGOLITIC LETTER HERU */ + { 0x1E01B, 0x1E021 }, /* COMBINING GLAGOLITIC LETTER SHTA - COMBINING GLAGOLITIC LETTER YATI */ + { 0x1E023, 0x1E024 }, /* COMBINING GLAGOLITIC LETTER YU - COMBINING GLAGOLITIC LETTER SMALL YUS */ + { 0x1E026, 0x1E02A }, /* COMBINING GLAGOLITIC LETTER YO - COMBINING GLAGOLITIC LETTER FITA */ + { 0x1E08F, 0x1E08F }, /* COMBINING CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I */ + { 0x1E130, 0x1E136 }, /* NYIAKENG PUACHUE HMONG TONE-B - NYIAKENG PUACHUE HMONG TONE-D */ + { 0x1E2AE, 0x1E2AE }, /* TOTO SIGN RISING TONE */ + { 0x1E2EC, 0x1E2EF }, /* WANCHO TONE TUP - WANCHO TONE KOINI */ + { 0x1E4EC, 0x1E4EF }, /* NAG MUNDARI SIGN MUHOR - NAG MUNDARI SIGN SUTUH */ + { 0x1E5EE, 0x1E5EF }, /* OL ONAL SIGN MU - OL ONAL SIGN IKIR */ + { 0x1E8D0, 0x1E8D6 }, /* MENDE KIKAKUI COMBINING NUMBER TEENS - MENDE KIKAKUI COMBINING NUMBER MILLIONS */ + { 0x1E944, 0x1E94A }, /* ADLAM ALIF LENGTHENER - ADLAM NUKTA */ + { 0x1F3FB, 0x1F3FF }, /* EMOJI MODIFIER FITZPATRICK TYPE-1-2 - EMOJI MODIFIER FITZPATRICK TYPE-6 */ + { 0x1F9B0, 0x1F9B3 }, /* EMOJI COMPONENT RED HAIR - EMOJI COMPONENT WHITE HAIR */ + { 0xE0001, 0xE0001 }, /* LANGUAGE TAG */ + { 0xE0020, 0xE007F }, /* TAG SPACE - CANCEL TAG */ + { 0xE0100, 0xE01EF }, /* VARIATION SELECTOR-17 - VARIATION SELECTOR-256 */ +}; + +/* Double-width character ranges (BMP - Basic Multilingual Plane, U+0000 to U+FFFF) */ +static const struct interval16 double_width_bmp[] = { + { 0x1100, 0x115F }, /* HANGUL CHOSEONG KIYEOK - HANGUL CHOSEONG FILLER */ + { 0x231A, 0x231B }, /* WATCH - HOURGLASS */ + { 0x2329, 0x232A }, /* LEFT-POINTING ANGLE BRACKET - RIGHT-POINTING ANGLE BRACKET */ + { 0x23E9, 0x23EC }, /* BLACK RIGHT-POINTING DOUBLE TRIANGLE - BLACK DOWN-POINTING DOUBLE TRIANGLE */ + { 0x23F0, 0x23F0 }, /* ALARM CLOCK */ + { 0x23F3, 0x23F3 }, /* HOURGLASS WITH FLOWING SAND */ + { 0x25FD, 0x25FE }, /* WHITE MEDIUM SMALL SQUARE - BLACK MEDIUM SMALL SQUARE */ + { 0x2614, 0x2615 }, /* UMBRELLA WITH RAIN DROPS - HOT BEVERAGE */ + { 0x2630, 0x2637 }, /* TRIGRAM FOR HEAVEN - TRIGRAM FOR EARTH */ + { 0x2648, 0x2653 }, /* ARIES - PISCES */ + { 0x267F, 0x267F }, /* WHEELCHAIR SYMBOL */ + { 0x268A, 0x268F }, /* MONOGRAM FOR YANG - DIGRAM FOR GREATER YIN */ + { 0x2693, 0x2693 }, /* ANCHOR */ + { 0x26A1, 0x26A1 }, /* HIGH VOLTAGE SIGN */ + { 0x26AA, 0x26AB }, /* MEDIUM WHITE CIRCLE - MEDIUM BLACK CIRCLE */ + { 0x26BD, 0x26BE }, /* SOCCER BALL - BASEBALL */ + { 0x26C4, 0x26C5 }, /* SNOWMAN WITHOUT SNOW - SUN BEHIND CLOUD */ + { 0x26CE, 0x26CE }, /* OPHIUCHUS */ + { 0x26D4, 0x26D4 }, /* NO ENTRY */ + { 0x26EA, 0x26EA }, /* CHURCH */ + { 0x26F2, 0x26F3 }, /* FOUNTAIN - FLAG IN HOLE */ + { 0x26F5, 0x26F5 }, /* SAILBOAT */ + { 0x26FA, 0x26FA }, /* TENT */ + { 0x26FD, 0x26FD }, /* FUEL PUMP */ + { 0x2705, 0x2705 }, /* WHITE HEAVY CHECK MARK */ + { 0x270A, 0x270B }, /* RAISED FIST - RAISED HAND */ + { 0x2728, 0x2728 }, /* SPARKLES */ + { 0x274C, 0x274C }, /* CROSS MARK */ + { 0x274E, 0x274E }, /* NEGATIVE SQUARED CROSS MARK */ + { 0x2753, 0x2755 }, /* BLACK QUESTION MARK ORNAMENT - WHITE EXCLAMATION MARK ORNAMENT */ + { 0x2757, 0x2757 }, /* HEAVY EXCLAMATION MARK SYMBOL */ + { 0x2795, 0x2797 }, /* HEAVY PLUS SIGN - HEAVY DIVISION SIGN */ + { 0x27B0, 0x27B0 }, /* CURLY LOOP */ + { 0x27BF, 0x27BF }, /* DOUBLE CURLY LOOP */ + { 0x2B1B, 0x2B1C }, /* BLACK LARGE SQUARE - WHITE LARGE SQUARE */ + { 0x2B50, 0x2B50 }, /* WHITE MEDIUM STAR */ + { 0x2B55, 0x2B55 }, /* HEAVY LARGE CIRCLE */ + { 0x2E80, 0x2E99 }, /* CJK RADICAL REPEAT - CJK RADICAL RAP */ + { 0x2E9B, 0x2EF3 }, /* CJK RADICAL CHOKE - CJK RADICAL C-SIMPLIFIED TURTLE */ + { 0x2F00, 0x2FD5 }, /* KANGXI RADICAL ONE - KANGXI RADICAL FLUTE */ + { 0x2FF0, 0x3029 }, /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT - HANGZHOU NUMERAL NINE */ + { 0x3030, 0x303E }, /* WAVY DASH - IDEOGRAPHIC VARIATION INDICATOR */ + { 0x3041, 0x3096 }, /* HIRAGANA LETTER SMALL A - HIRAGANA LETTER SMALL KE */ + { 0x309B, 0x30FF }, /* KATAKANA-HIRAGANA VOICED SOUND MARK - KATAKANA DIGRAPH KOTO */ + { 0x3105, 0x312F }, /* BOPOMOFO LETTER B - BOPOMOFO LETTER NN */ + { 0x3131, 0x318E }, /* HANGUL LETTER KIYEOK - HANGUL LETTER ARAEAE */ + { 0x3190, 0x31E5 }, /* IDEOGRAPHIC ANNOTATION LINKING MARK - CJK STROKE SZP */ + { 0x31EF, 0x321E }, /* IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION - PARENTHESIZED KOREAN CHARACTER O HU */ + { 0x3220, 0x3247 }, /* PARENTHESIZED IDEOGRAPH ONE - CIRCLED IDEOGRAPH KOTO */ + { 0x3250, 0xA48C }, /* PARTNERSHIP SIGN - YI SYLLABLE YYR */ + { 0xA490, 0xA4C6 }, /* YI RADICAL QOT - YI RADICAL KE */ + { 0xA960, 0xA97C }, /* HANGUL CHOSEONG TIKEUT-MIEUM - HANGUL CHOSEONG SSANGYEORINHIEUH */ + { 0xAC00, 0xD7A3 }, /* HANGUL SYLLABLE GA - HANGUL SYLLABLE HIH */ + { 0xF900, 0xFAFF }, /* U+F900 - U+FAFF */ + { 0xFE10, 0xFE19 }, /* PRESENTATION FORM FOR VERTICAL COMMA - PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS */ + { 0xFE30, 0xFE52 }, /* PRESENTATION FORM FOR VERTICAL TWO DOT LEADER - SMALL FULL STOP */ + { 0xFE54, 0xFE66 }, /* SMALL SEMICOLON - SMALL EQUALS SIGN */ + { 0xFE68, 0xFE6B }, /* SMALL REVERSE SOLIDUS - SMALL COMMERCIAL AT */ + { 0xFF01, 0xFF60 }, /* FULLWIDTH EXCLAMATION MARK - FULLWIDTH RIGHT WHITE PARENTHESIS */ + { 0xFFE0, 0xFFE6 }, /* FULLWIDTH CENT SIGN - FULLWIDTH WON SIGN */ }; -/* Double-width character ranges */ -static const struct interval double_width_ranges[] = { - { 0x01100, 0x0115F }, /* HANGUL CHOSEONG KIYEOK - HANGUL CHOSEONG FILLER */ - { 0x0231A, 0x0231B }, /* WATCH - HOURGLASS */ - { 0x02329, 0x0232A }, /* LEFT-POINTING ANGLE BRACKET - RIGHT-POINTING ANGLE BRACKET */ - { 0x023E9, 0x023EC }, /* BLACK RIGHT-POINTING DOUBLE TRIANGLE - BLACK DOWN-POINTING DOUBLE TRIANGLE */ - { 0x023F0, 0x023F0 }, /* ALARM CLOCK */ - { 0x023F3, 0x023F3 }, /* HOURGLASS WITH FLOWING SAND */ - { 0x025FD, 0x025FE }, /* WHITE MEDIUM SMALL SQUARE - BLACK MEDIUM SMALL SQUARE */ - { 0x02614, 0x02615 }, /* UMBRELLA WITH RAIN DROPS - HOT BEVERAGE */ - { 0x02630, 0x02637 }, /* TRIGRAM FOR HEAVEN - TRIGRAM FOR EARTH */ - { 0x02648, 0x02653 }, /* ARIES - PISCES */ - { 0x0267F, 0x0267F }, /* WHEELCHAIR SYMBOL */ - { 0x0268A, 0x0268F }, /* MONOGRAM FOR YANG - DIGRAM FOR GREATER YIN */ - { 0x02693, 0x02693 }, /* ANCHOR */ - { 0x026A1, 0x026A1 }, /* HIGH VOLTAGE SIGN */ - { 0x026AA, 0x026AB }, /* MEDIUM WHITE CIRCLE - MEDIUM BLACK CIRCLE */ - { 0x026BD, 0x026BE }, /* SOCCER BALL - BASEBALL */ - { 0x026C4, 0x026C5 }, /* SNOWMAN WITHOUT SNOW - SUN BEHIND CLOUD */ - { 0x026CE, 0x026CE }, /* OPHIUCHUS */ - { 0x026D4, 0x026D4 }, /* NO ENTRY */ - { 0x026EA, 0x026EA }, /* CHURCH */ - { 0x026F2, 0x026F3 }, /* FOUNTAIN - FLAG IN HOLE */ - { 0x026F5, 0x026F5 }, /* SAILBOAT */ - { 0x026FA, 0x026FA }, /* TENT */ - { 0x026FD, 0x026FD }, /* FUEL PUMP */ - { 0x02705, 0x02705 }, /* WHITE HEAVY CHECK MARK */ - { 0x0270A, 0x0270B }, /* RAISED FIST - RAISED HAND */ - { 0x02728, 0x02728 }, /* SPARKLES */ - { 0x0274C, 0x0274C }, /* CROSS MARK */ - { 0x0274E, 0x0274E }, /* NEGATIVE SQUARED CROSS MARK */ - { 0x02753, 0x02755 }, /* BLACK QUESTION MARK ORNAMENT - WHITE EXCLAMATION MARK ORNAMENT */ - { 0x02757, 0x02757 }, /* HEAVY EXCLAMATION MARK SYMBOL */ - { 0x02795, 0x02797 }, /* HEAVY PLUS SIGN - HEAVY DIVISION SIGN */ - { 0x027B0, 0x027B0 }, /* CURLY LOOP */ - { 0x027BF, 0x027BF }, /* DOUBLE CURLY LOOP */ - { 0x02B1B, 0x02B1C }, /* BLACK LARGE SQUARE - WHITE LARGE SQUARE */ - { 0x02B50, 0x02B50 }, /* WHITE MEDIUM STAR */ - { 0x02B55, 0x02B55 }, /* HEAVY LARGE CIRCLE */ - { 0x02E80, 0x02E99 }, /* CJK RADICAL REPEAT - CJK RADICAL RAP */ - { 0x02E9B, 0x02EF3 }, /* CJK RADICAL CHOKE - CJK RADICAL C-SIMPLIFIED TURTLE */ - { 0x02F00, 0x02FD5 }, /* KANGXI RADICAL ONE - KANGXI RADICAL FLUTE */ - { 0x02FF0, 0x03029 }, /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT - HANGZHOU NUMERAL NINE */ - { 0x03030, 0x0303E }, /* WAVY DASH - IDEOGRAPHIC VARIATION INDICATOR */ - { 0x03041, 0x03096 }, /* HIRAGANA LETTER SMALL A - HIRAGANA LETTER SMALL KE */ - { 0x0309B, 0x030FF }, /* KATAKANA-HIRAGANA VOICED SOUND MARK - KATAKANA DIGRAPH KOTO */ - { 0x03105, 0x0312F }, /* BOPOMOFO LETTER B - BOPOMOFO LETTER NN */ - { 0x03131, 0x0318E }, /* HANGUL LETTER KIYEOK - HANGUL LETTER ARAEAE */ - { 0x03190, 0x031E5 }, /* IDEOGRAPHIC ANNOTATION LINKING MARK - CJK STROKE SZP */ - { 0x031EF, 0x0321E }, /* IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION - PARENTHESIZED KOREAN CHARACTER O HU */ - { 0x03220, 0x03247 }, /* PARENTHESIZED IDEOGRAPH ONE - CIRCLED IDEOGRAPH KOTO */ - { 0x03250, 0x0A48C }, /* PARTNERSHIP SIGN - YI SYLLABLE YYR */ - { 0x0A490, 0x0A4C6 }, /* YI RADICAL QOT - YI RADICAL KE */ - { 0x0A960, 0x0A97C }, /* HANGUL CHOSEONG TIKEUT-MIEUM - HANGUL CHOSEONG SSANGYEORINHIEUH */ - { 0x0AC00, 0x0D7A3 }, /* HANGUL SYLLABLE GA - HANGUL SYLLABLE HIH */ - { 0x0F900, 0x0FAFF }, /* U+0F900 - U+0FAFF */ - { 0x0FE10, 0x0FE19 }, /* PRESENTATION FORM FOR VERTICAL COMMA - PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS */ - { 0x0FE30, 0x0FE52 }, /* PRESENTATION FORM FOR VERTICAL TWO DOT LEADER - SMALL FULL STOP */ - { 0x0FE54, 0x0FE66 }, /* SMALL SEMICOLON - SMALL EQUALS SIGN */ - { 0x0FE68, 0x0FE6B }, /* SMALL REVERSE SOLIDUS - SMALL COMMERCIAL AT */ - { 0x0FF01, 0x0FF60 }, /* FULLWIDTH EXCLAMATION MARK - FULLWIDTH RIGHT WHITE PARENTHESIS */ - { 0x0FFE0, 0x0FFE6 }, /* FULLWIDTH CENT SIGN - FULLWIDTH WON SIGN */ - { 0x16FE0, 0x16FE3 }, /* U+16FE0 - U+16FE3 */ +/* Double-width character ranges (non-BMP, U+10000 and above) */ +static const struct interval32 double_width_non_bmp[] = { + { 0x16FE0, 0x16FE3 }, /* TANGUT ITERATION MARK - OLD CHINESE ITERATION MARK */ { 0x17000, 0x187F7 }, /* U+17000 - U+187F7 */ - { 0x18800, 0x18CD5 }, /* U+18800 - U+18CD5 */ + { 0x18800, 0x18CD5 }, /* TANGUT COMPONENT-001 - KHITAN SMALL SCRIPT CHARACTER-18CD5 */ { 0x18CFF, 0x18D08 }, /* U+18CFF - U+18D08 */ - { 0x1AFF0, 0x1AFF3 }, /* U+1AFF0 - U+1AFF3 */ - { 0x1AFF5, 0x1AFFB }, /* U+1AFF5 - U+1AFFB */ - { 0x1AFFD, 0x1AFFE }, /* U+1AFFD - U+1AFFE */ - { 0x1B000, 0x1B122 }, /* U+1B000 - U+1B122 */ - { 0x1B132, 0x1B132 }, /* U+1B132 */ - { 0x1B150, 0x1B152 }, /* U+1B150 - U+1B152 */ - { 0x1B155, 0x1B155 }, /* U+1B155 */ - { 0x1B164, 0x1B167 }, /* U+1B164 - U+1B167 */ - { 0x1B170, 0x1B2FB }, /* U+1B170 - U+1B2FB */ - { 0x1D300, 0x1D356 }, /* U+1D300 - U+1D356 */ - { 0x1D360, 0x1D376 }, /* U+1D360 - U+1D376 */ + { 0x1AFF0, 0x1AFF3 }, /* KATAKANA LETTER MINNAN TONE-2 - KATAKANA LETTER MINNAN TONE-5 */ + { 0x1AFF5, 0x1AFFB }, /* KATAKANA LETTER MINNAN TONE-7 - KATAKANA LETTER MINNAN NASALIZED TONE-5 */ + { 0x1AFFD, 0x1AFFE }, /* KATAKANA LETTER MINNAN NASALIZED TONE-7 - KATAKANA LETTER MINNAN NASALIZED TONE-8 */ + { 0x1B000, 0x1B122 }, /* KATAKANA LETTER ARCHAIC E - KATAKANA LETTER ARCHAIC WU */ + { 0x1B132, 0x1B132 }, /* HIRAGANA LETTER SMALL KO */ + { 0x1B150, 0x1B152 }, /* HIRAGANA LETTER SMALL WI - HIRAGANA LETTER SMALL WO */ + { 0x1B155, 0x1B155 }, /* KATAKANA LETTER SMALL KO */ + { 0x1B164, 0x1B167 }, /* KATAKANA LETTER SMALL WI - KATAKANA LETTER SMALL N */ + { 0x1B170, 0x1B2FB }, /* NUSHU CHARACTER-1B170 - NUSHU CHARACTER-1B2FB */ + { 0x1D300, 0x1D356 }, /* MONOGRAM FOR EARTH - TETRAGRAM FOR FOSTERING */ + { 0x1D360, 0x1D376 }, /* COUNTING ROD UNIT DIGIT ONE - IDEOGRAPHIC TALLY MARK FIVE */ { 0x1F000, 0x1F02F }, /* U+1F000 - U+1F02F */ { 0x1F0A0, 0x1F0FF }, /* U+1F0A0 - U+1F0FF */ - { 0x1F18E, 0x1F18E }, /* U+1F18E */ - { 0x1F191, 0x1F19A }, /* U+1F191 - U+1F19A */ - { 0x1F200, 0x1F202 }, /* U+1F200 - U+1F202 */ - { 0x1F210, 0x1F23B }, /* U+1F210 - U+1F23B */ - { 0x1F240, 0x1F248 }, /* U+1F240 - U+1F248 */ - { 0x1F250, 0x1F251 }, /* U+1F250 - U+1F251 */ - { 0x1F260, 0x1F265 }, /* U+1F260 - U+1F265 */ - { 0x1F300, 0x1F3FA }, /* U+1F300 - U+1F3FA */ - { 0x1F400, 0x1F64F }, /* U+1F400 - U+1F64F */ - { 0x1F680, 0x1F9AF }, /* U+1F680 - U+1F9AF */ + { 0x1F18E, 0x1F18E }, /* NEGATIVE SQUARED AB */ + { 0x1F191, 0x1F19A }, /* SQUARED CL - SQUARED VS */ + { 0x1F200, 0x1F202 }, /* SQUARE HIRAGANA HOKA - SQUARED KATAKANA SA */ + { 0x1F210, 0x1F23B }, /* SQUARED CJK UNIFIED IDEOGRAPH-624B - SQUARED CJK UNIFIED IDEOGRAPH-914D */ + { 0x1F240, 0x1F248 }, /* TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C - TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557 */ + { 0x1F250, 0x1F251 }, /* CIRCLED IDEOGRAPH ADVANTAGE - CIRCLED IDEOGRAPH ACCEPT */ + { 0x1F260, 0x1F265 }, /* ROUNDED SYMBOL FOR FU - ROUNDED SYMBOL FOR CAI */ + { 0x1F300, 0x1F3FA }, /* CYCLONE - AMPHORA */ + { 0x1F400, 0x1F64F }, /* RAT - PERSON WITH FOLDED HANDS */ + { 0x1F680, 0x1F9AF }, /* ROCKET - PROBING CANE */ { 0x1F9B4, 0x1FAFF }, /* U+1F9B4 - U+1FAFF */ { 0x20000, 0x2FFFD }, /* U+20000 - U+2FFFD */ { 0x30000, 0x3FFFD }, /* U+30000 - U+3FFFD */ }; -static int ucs_cmp(const void *key, const void *element) +static int ucs_cmp16(const void *key, const void *element) +{ + uint16_t cp = *(uint16_t *)key; + const struct interval16 *e = element; + + if (cp > e->last) + return 1; + if (cp < e->first) + return -1; + return 0; +} + +static int ucs_cmp32(const void *key, const void *element) { uint32_t cp = *(uint32_t *)key; - const struct interval *e = element; + const struct interval32 *e = element; if (cp > e->last) return 1; @@ -466,13 +491,22 @@ static int ucs_cmp(const void *key, const void *element) return 0; } -static bool is_in_interval(uint32_t cp, const struct interval *intervals, size_t count) +static bool is_in_interval16(uint16_t cp, const struct interval16 *intervals, size_t count) +{ + if (cp < intervals[0].first || cp > intervals[count - 1].last) + return false; + + return __inline_bsearch(&cp, intervals, count, + sizeof(*intervals), ucs_cmp16) != NULL; +} + +static bool is_in_interval32(uint32_t cp, const struct interval32 *intervals, size_t count) { if (cp < intervals[0].first || cp > intervals[count - 1].last) return false; return __inline_bsearch(&cp, intervals, count, - sizeof(*intervals), ucs_cmp) != NULL; + sizeof(*intervals), ucs_cmp32) != NULL; } /** @@ -483,7 +517,9 @@ static bool is_in_interval(uint32_t cp, const struct interval *intervals, size_t */ bool ucs_is_zero_width(uint32_t cp) { - return is_in_interval(cp, zero_width_ranges, ARRAY_SIZE(zero_width_ranges)); + return (cp <= 0xFFFF) + ? is_in_interval16(cp, zero_width_bmp, ARRAY_SIZE(zero_width_bmp)) + : is_in_interval32(cp, zero_width_non_bmp, ARRAY_SIZE(zero_width_non_bmp)); } /** @@ -494,5 +530,7 @@ bool ucs_is_zero_width(uint32_t cp) */ bool ucs_is_double_width(uint32_t cp) { - return is_in_interval(cp, double_width_ranges, ARRAY_SIZE(double_width_ranges)); + return (cp <= 0xFFFF) + ? is_in_interval16(cp, double_width_bmp, ARRAY_SIZE(double_width_bmp)) + : is_in_interval32(cp, double_width_non_bmp, ARRAY_SIZE(double_width_non_bmp)); } From patchwork Thu Apr 10 01:14:03 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880086 Received: from fhigh-a1-smtp.messagingengine.com (fhigh-a1-smtp.messagingengine.com [103.168.172.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8AE0B1A2C3A; Thu, 10 Apr 2025 01:19:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247948; cv=none; b=HSepUIVr+rZHs9HshCOTnPMOpv2k/l/KBtLQopbH8gCKJu9Onou0rywN9UkzgT/TcfD/0kiRVl+LMmEwGqUW7jjG8uh/T1KY5FxC+m+FXLYOk00naZkaALfIZ3RqJ40EdmBqdRahfJkhIFS+FqRS+LX/YwMqDGoTDCUoky4GhMc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744247948; c=relaxed/simple; bh=/Y85mHkgYi5NV6tiAamTtPpChyrVUnAlTn58mog05iI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=S5rT32rdDtiq45hterQWds91uYftr+eAQxwezeVQpBqltgj+CpXRD2WsJ35uUPPvMKsO6coo8qJgCjzmU+s0db6YQk62uhYIxvO+3aUzLHuMksXh6V4hAY1d2/7QDqUdsK/u5/e9H4HICKuD9qhtdNfp5XYFbjfrD6fyz4pAOW4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=isrpxXJe; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=t+K8wkck; arc=none smtp.client-ip=103.168.172.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="isrpxXJe"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="t+K8wkck" Received: from phl-compute-01.internal (phl-compute-01.phl.internal [10.202.2.41]) by mailfhigh.phl.internal (Postfix) with ESMTP id 1307F1140299; Wed, 9 Apr 2025 21:19:04 -0400 (EDT) Received: from phl-frontend-01 ([10.202.2.160]) by phl-compute-01.internal (MEProxy); Wed, 09 Apr 2025 21:19:04 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm2; t=1744247944; x= 1744334344; bh=ixNtqElVlHFv8vJYBrN2UgLQXSpShh/TyAFUVWLQKrw=; b=i srpxXJeV0XTV2yZAIl6BQ8CbVo+q4ZaHsVIffubTk2Vm4NFGMDnA4eM41YH0FUFK vYDZGL4zNLyKknMEIb+4RkC6zYJf5MgBd389pXyZVUjgDa/6S5rThlvl4rgwTDoE O99gM8MPLI6apv3JQwXLkPq0iITJFhTbtEV3qR6rC45Tt+W8YbWd9jk5g+VHsbKb PZpLZjBeJYVE/+/xH38Ao2DGPE54yMUWbEdBcV7j8NiVn8nke9GzDzmimDYa98BV dmJsj20Lo0+q2jSp9hGeLqi3aymRl7o5WjE2cFYzsTiNVbFMy2rIZOC2s5z724YV 0aiQruBKlG8sgvKcXx6vQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1744247944; x=1744334344; bh=i xNtqElVlHFv8vJYBrN2UgLQXSpShh/TyAFUVWLQKrw=; b=t+K8wkck8CZTT2Tz0 ie6UYfCqx5khSerUPe1CDF8pLZ/3+exgFTFWq0O7Gsx9jv961v2JslPv3KfMLWgW PAgBOm5AQgrDofZ5KgbyPyf3A2MWcFMxSYosDIxheEotRhFxgP+ypewtpBIcQnkA EXEsuLNClQ1zjXspZOuljG/FkbD9XBHCSFDeAlxZnwAYYi+WG/hjgQCO0L3QZWtR zQa1IXOQTC6amFPGjeUW+1A1soT028OkO27JKijbaqRW6U/J4kK5bUfa84T3/Hz9 pRwsB8IpeMlHVdiF2BPuhrKYL9ZBeNWh1gBxvf8DAC7lCEEIyouxaiSCGivXlX2I n3Cag== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdejheehucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredt tdenucfhrhhomheppfhitgholhgrshcurfhithhrvgcuoehnihgtohesfhhluhignhhitg drnhgvtheqnecuggftrfgrthhtvghrnheptdejueeiieehieeuffduvdffleehkeelgeek udekfeffhfduffdugedvteeihfetnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrg hmpehmrghilhhfrhhomhepnhhitghosehflhhugihnihgtrdhnvghtpdhnsggprhgtphht thhopeeipdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehnphhithhrvgessggrhi hlihgsrhgvrdgtohhmpdhrtghpthhtohepjhhirhhishhlrggshieskhgvrhhnvghlrdho rhhgpdhrtghpthhtohepghhrvghgkhhhsehlihhnuhigfhhouhhnuggrthhiohhnrdhorh hgpdhrtghpthhtohepuggrvhgvsehmihgvlhhkvgdrtggtpdhrtghpthhtoheplhhinhhu gidqkhgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinh hugidqshgvrhhirghlsehvghgvrhdrkhgvrhhnvghlrdhorhhg X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) Received: from xanadu.lan (OpenWrt.lan [192.168.1.1]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 1A60410D8B80; Wed, 9 Apr 2025 21:19:03 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby Cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 11/11] vt: pad double-width code points with a zero-white-space Date: Wed, 9 Apr 2025 21:14:03 -0400 Message-ID: <20250410011839.64418-12-nico@fluxnic.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre In the Unicode screen buffer, we follow double-width code points with a space to maintain proper column alignment. This, however, creates semantic problems when e.g. using cut and paste or selection. Let's use a better code point for the column padding's purpose i.e. a zero-white-space rather than a full space. Signed-off-by: Nicolas Pitre --- drivers/tty/vt/vt.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index e3d35c4f92..dc84f9c6b7 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -2937,12 +2937,13 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, width = 2; } else if (ucs_is_zero_width(c)) { prev_c = vc_uniscr_getc(vc, -1); - if (prev_c == ' ' && + if (prev_c == 0x200B && ucs_is_double_width(vc_uniscr_getc(vc, -2))) { /* * Let's merge this zero-width code point with * the preceding double-width code point by - * replacing the existing whitespace padding. + * replacing the existing zero-white-space + * padding. */ vc_con_rewind(vc); } else if (c == 0xfe0f && prev_c != 0) { @@ -3040,7 +3041,11 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, tc = conv_uni_to_pc(vc, ' '); if (tc < 0) tc = ' '; - next_c = ' '; + /* + * Store a zero-white-space in the Unicode screen given that + * the previous code point is semantically double-width. + */ + next_c = 0x200B; } out: From patchwork Thu Apr 10 19:38:13 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Pitre X-Patchwork-Id: 880886 Received: from fhigh-a4-smtp.messagingengine.com (fhigh-a4-smtp.messagingengine.com [103.168.172.155]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C0BC51E5206; Thu, 10 Apr 2025 19:38:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.155 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744313899; cv=none; b=nb/4d3I/Z8nLFGtXVhof6VbJNTvn36jjeudimv+wLX3JFDFFxC4/Low+H9lO5Fu5MSCHn4+HGeulwlSg17n1xeKBjn+Go4ocTpUbWZoNPD0t6jDesRMv6oWHmZAmq0MH5K9eG8ML+RWRLnqrfCvcRfxJwNwseGOg87udYRg8QBs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744313899; c=relaxed/simple; bh=GvnGRvIkZRasWUD1RFkAElt1RcGj5qgfsOegZr2uSmE=; h=Date:From:To:cc:Subject:In-Reply-To:Message-ID:References: MIME-Version:Content-Type; b=kVultGfpqtp2UBF2i/nUNeUTumyR59LfF4ZaNaGat9PJEnGjXtMoZaObJRW6qYSoNBu6bJ+GzK5u63oKJo+cibhfw1nDfa+DmaT0zRjueGzokni9vdRTh9g8Z7TJYRgZ536HTOhJozhZ4LuaSsAeRT6STRjAVQ4X3dT4kENx9yY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net; spf=pass smtp.mailfrom=fluxnic.net; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b=bBg0dh1Q; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=Xc1heRjr; arc=none smtp.client-ip=103.168.172.155 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fluxnic.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fluxnic.net header.i=@fluxnic.net header.b="bBg0dh1Q"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="Xc1heRjr" Received: from phl-compute-11.internal (phl-compute-11.phl.internal [10.202.2.51]) by mailfhigh.phl.internal (Postfix) with ESMTP id 762E911400D8; Thu, 10 Apr 2025 15:38:15 -0400 (EDT) Received: from phl-frontend-02 ([10.202.2.161]) by phl-compute-11.internal (MEProxy); Thu, 10 Apr 2025 15:38:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fluxnic.net; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1744313895; x=1744400295; bh=Br48fJmS9v Auob4t1uWv3/F/2XSGgECcK3NoQZM6ImA=; b=bBg0dh1QZdlPax9wets0Nv20AQ wIr+4gX9T2awVw7/GjO6lPyjMB5Spn9TB6Yl255iagrYWFIEPHGgjUj415OzBabK +Op2f73YeL1TEbZ8RGiTa8NtTRB4usR0G/S+3Q6ZrITPXMN1DrihGPzxuC1itu+N PZKiaVGsnucSIiShX9s3hJFIAmqd+x8BkqmSSCc06fr3LW4dMclUfi8SMNTpwBkX f8E0xNYXm0ITmaWdhV4v3HLKr6MxGxYExoXzHUgVN5j4SUAHXiVgBgHAONiPD8WE 9IwwVFyJXrrr/lG+0E5q5NV/YMESNl+dFZNLIeASdUM9VFTR7mYGWbYUWtsg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm2; t= 1744313895; x=1744400295; bh=Br48fJmS9vAuob4t1uWv3/F/2XSGgECcK3N oQZM6ImA=; b=Xc1heRjrg61lw53vEb2ZF7MNVXcb5WZ8aXNXJbJx4LITtomEGxm 5PlrKaQddo1tKcUIRrklaM4j3t0sfydfeSHXoBD+gg2yEwQ0uPAVf5R0bYf2+w6S Ww5zmoTiMbEfaYnLvMJTARhPfzRUg6kHDpUthbyUJsbXKark+kQa0YPzD4k45j5I MaaDEpTCJr3Br9zxaJa4lzyM/rMupzBbo5zg27Rcye2WK8L32DX6nEAbB7EIuYcx zKd+8YMu4u84hwb7D1pU076J7UmHRl8yLIgIHZncrgd8jrugwm1QuvYhb06TR7Iz 7luz+WlLh47JU1fmuI3dO6pEFy2dIFAVFhg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtdeljeejucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhepfffhvfevufgjkfhfgggtsehttdertddttddv necuhfhrohhmpefpihgtohhlrghsucfrihhtrhgvuceonhhitghosehflhhugihnihgtrd hnvghtqeenucggtffrrghtthgvrhhnpefgvedvhfefueejgefggfefhfelffeiieduvdeh ffduheduffekkefhgeffhfefveenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpehnihgtohesfhhluhignhhitgdrnhgvthdpnhgspghrtghpthht ohepiedpmhhouggvpehsmhhtphhouhhtpdhrtghpthhtohepnhhpihhtrhgvsegsrgihlh hisghrvgdrtghomhdprhgtphhtthhopehjihhrihhslhgrsgihsehkvghrnhgvlhdrohhr ghdprhgtphhtthhopehgrhgvghhkhheslhhinhhugihfohhunhgurghtihhonhdrohhrgh dprhgtphhtthhopegurghvvgesmhhivghlkhgvrdgttgdprhgtphhtthhopehlihhnuhig qdhkvghrnhgvlhesvhhgvghrrdhkvghrnhgvlhdrohhrghdprhgtphhtthhopehlihhnuh igqdhsvghrihgrlhesvhhgvghrrdhkvghrnhgvlhdrohhrgh X-ME-Proxy: Feedback-ID: i58514971:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 10 Apr 2025 15:38:14 -0400 (EDT) Received: from xanadu (xanadu.lan [192.168.1.120]) by yoda.fluxnic.net (Postfix) with ESMTPSA id 0B62D10DB8BC; Thu, 10 Apr 2025 15:38:14 -0400 (EDT) Date: Thu, 10 Apr 2025 15:38:13 -0400 (EDT) From: Nicolas Pitre To: Greg Kroah-Hartman , Jiri Slaby cc: Nicolas Pitre , Dave Mielke , linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 12/11] vt: remove zero-white-space handling from conv_uni_to_pc() In-Reply-To: <20250410011839.64418-1-nico@fluxnic.net> Message-ID: <6o2ss437-6nps-s943-1n38-54np5587r08s@syhkavp.arg> References: <20250410011839.64418-1-nico@fluxnic.net> Precedence: bulk X-Mailing-List: linux-serial@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Nicolas Pitre This is now taken care of by ucs_is_zero_width(). And in the case where we do want a padding from some zero-width code point then we should also give the legacy displays a space character to work with. Signed-off-by: Nicolas Pitre --- This is a fix for a small issue discovered during everyday usage. I didn't think it is worth resending the whole series for this but if you prefer otherwise please let me know. diff --git a/drivers/tty/vt/consolemap.c b/drivers/tty/vt/consolemap.c index 82d70083fe..bb4bb272eb 100644 --- a/drivers/tty/vt/consolemap.c +++ b/drivers/tty/vt/consolemap.c @@ -870,8 +870,6 @@ int conv_uni_to_pc(struct vc_data *conp, long ucs) return -4; /* Not found */ else if (ucs < 0x20) return -1; /* Not a printable character */ - else if (ucs == 0xfeff || (ucs >= 0x200b && ucs <= 0x200f)) - return -2; /* Zero-width space */ /* * UNI_DIRECT_BASE indicates the start of the region in the User Zone * which always has a 1:1 mapping to the currently loaded font. The diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index dc84f9c6b7..0d1d663c78 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -2964,13 +2964,15 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c, goto out; } } + /* padding for the legacy display like done below */ + tc = ' '; } } /* Now try to find out how to display it */ tc = conv_uni_to_pc(vc, tc); if (tc & ~charmask) { - if (tc == -1 || tc == -2) + if (tc == -1) return -1; /* nothing to display */ /* Glyph not found */