Universal character names with C99

International features of C99

For many years, C was not well suited to programmers working in languages other than English

C99, however, allows us to embed characters from the universal character set in the source code, such as

Greek letters

ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧ

Arabic letters

اب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك

Russian letters

А Б В Г Д Е Ё Ж З И Й К Л М Н О П Р С Т У Ф Х

Japanese letters

あぁかさたなはまやゃらわがざだばぱいぃきしちにひみりゐぎじぢ

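Char | Unicode | Escape sequence (\u) | Escape sequence (\U) | HTML numeric code | Char name
-----|---------|----------------------|----------------------|-------------------|--------------------
ء    | U+0621  | \u0621               | \U00000621           | &#1569;           | Arabic Letter Hamza
ا    | U+0627  | \u0627               | \U00000627           | &#1575;           | Arabic Letter Alef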

The universal character set (UCS) is closely related to Unicode. C99 provides a special feature called universal character names, which lets us use UCS characters in the source code of a program

A universal character name resembles an escape sequence but, unlike an ordinary escape sequence, it may appear in identifiers as well as in character constants and string literals

Native languages with C99

In short, C99 lets us use letters from our native language in identifiers such as variable and function names in the source code of a program

Notation for universal character names

There are two notations

The first is \Udddddddd (for example, \U00000627 for ا)

The second is \udddd (for example, \u0627 for ا)

Here, each d is a hexadecimal digit

To demonstrate, let’s see an example

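#include <stdio.h>
#include <string.h>

int main(void)
{
    // ARABIC LETTER KHAH خ, stored as a multibyte sequence in the
    // execution character set (typically UTF-8, e.g. with GCC defaults)
    char str[] = "\u062E";
    size_t len = strlen(str); // number of bytes, not characters
    size_t i;

    // Print the string one byte at a time
    for (i = 0; i < len; i++)
        printf("%c", str[i]);
    return 0;
}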

Encoding of Unicode

Unicode assigns each character a unique number, called a code point, and there are several ways to represent these code points using bytes

One of them uses wide characters (UCS-2) and another uses multibyte characters (UTF-8)

The UCS code point for the Arabic letter خ (ARABIC LETTER KHAH) is 0000062E, so the universal character name for this character is \U0000062E or \U0000062e

Because the first four hexadecimal digits of the code point are 0, we can also use the \u notation and write the character as \u062E or \u062e

See this link for all universal character name codes

Mohammed Anees

