International features of C99
For many years, C was poorly suited to programmers working in languages other than English
C99, however, allows us to embed characters from the universal character set in source code, such as
Greek letters
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
Arabic letters
اب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك
Russian letters
А Б В Г Д Е Ё Ж З И Й К Л М Н О П Р С Т У Ф Х
Japanese letters
あぁかさたなはまやゃらわがざだばぱいぃきしちにひみりゐぎじぢ
Char | Unicode | Escape sequence (\u) | Escape sequence (\U) | HTML numeric code | Char name |
---|---|---|---|---|---|
ء | U+0621 | \u0621 | \U00000621 | &#1569; | Arabic Letter Hamza |
ا | U+0627 | \u0627 | \U00000627 | &#1575; | Arabic Letter Alef |
The universal character set (UCS) is closely related to Unicode. C99 provides a special feature called universal character names, which let us use UCS characters in the source code of a program
To clarify, a universal character name resembles an escape sequence
Native languages with C99
In short, C99 allows us to write variable and function names in our native language, because identifiers may contain universal character names
Notation for universal character names
There are two notations
The first is \Udddddddd (for example, \U00000627 for ا)
The second is \udddd (for example, \u0627 for ا)
Where each d is a hexadecimal digit
To demonstrate, let’s see an example
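#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "\u062E"; /* ARABIC LETTER KHAH خ */
    size_t i;
    size_t len = strlen(str); /* number of bytes, not characters */

    /* Print the encoded character one byte at a time */
    for (i = 0; i < len; i++)
        printf("%c", str[i]);
    return 0;
}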
Encoding of Unicode
Unicode assigns each character a unique number called a code point, and there are several ways to represent these code points using bytes
One of them uses wide characters (UCS-2) and another uses multibyte characters (UTF-8)
The UCS code point for the Arabic letter خ (ARABIC LETTER KHAH) is 0000062E, so the universal character name for this character is \U0000062E or \U0000062e
As can be seen, the first four hexadecimal digits of the code point are 0, which means we can also use the \u notation and write the character as \u062E or \u062e