Unicode in qooxdoo

In a recent post I wrote about I18N in qooxdoo. Another topic in that vein is Unicode handling.

Unicode is a universal standard for encoding the characters of the world's writing systems. At its core it enumerates those characters, assigning each one a number (its code point) in the range 0 to 0x10FFFF, a little over 1.1 million values. On computers, these numbers are then stored as bytes, again using one of several encodings. One of the most popular of these is UTF-8, a variable-length encoding in which each code point is encoded in one to four octets (8-bit bytes). The encoding goes like this:

Scalar Value                 First Byte Second Byte Third Byte Fourth Byte
00000000 0xxxxxxx            0xxxxxxx
00000yyy yyxxxxxx            110yyyyy   10xxxxxx
zzzzyyyy yyxxxxxx            1110zzzz   10yyyyyy    10xxxxxx
000uuuuu zzzzyyyy yyxxxxxx   11110uuu   10uuzzzz    10yyyyyy   10xxxxxx

(source: unicode.org)

The table shows how the bits of the original code point (“Scalar Value”) are distributed over the bytes representing it. There are other encodings, such as the fixed-length UCS-2 and UCS-4 encodings. The nice thing about UTF-8 is that it is space-efficient for text that is mostly ASCII and, maybe more importantly, fully backwards compatible with ASCII: a single-byte UTF-8 value is identical to its ASCII value (7-bit ASCII, that is). For code points beyond that, the upper bits of each byte merely tag it (a lead byte starts with 110, 1110 or 11110, announcing a two-, three- or four-byte sequence, and every continuation byte starts with 10), while the remaining lower bits hold the actual code point bits.
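To make the bit shuffling in the table concrete, here is a minimal sketch in plain JavaScript. The names encodeUtf8 and toHex are made up for this illustration and are not part of qooxdoo; the function just follows the four rows of the table, one branch per row.

```javascript
// Illustrative helper (not a qooxdoo API): encode one Unicode code point
// as an array of UTF-8 byte values, following the table row by row.
function encodeUtf8(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
    // Surrogates are not scalar values and have no UTF-8 form.
    throw new RangeError("surrogate code points cannot be encoded");
  }
  if (codePoint < 0x80) {
    // Row 1: 0xxxxxxx (identical to 7-bit ASCII)
    return [codePoint];
  }
  if (codePoint < 0x800) {
    // Row 2: 110yyyyy 10xxxxxx
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)];
  }
  if (codePoint < 0x10000) {
    // Row 3: 1110zzzz 10yyyyyy 10xxxxxx
    return [
      0xE0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3F),
      0x80 | (codePoint & 0x3F)
    ];
  }
  // Row 4: 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
  return [
    0xF0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3F),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F)
  ];
}

// Illustrative helper: format byte values as two-digit hex for display.
function toHex(bytes) {
  return bytes.map(function (b) {
    return b.toString(16).padStart(2, "0");
  }).join(" ");
}

console.log(toHex(encodeUtf8(0x41)));    // "A"       -> 41
console.log(toHex(encodeUtf8(0xE9)));    // "é"       -> c3 a9
console.log(toHex(encodeUtf8(0x20AC)));  // "€"       -> e2 82 ac
console.log(toHex(encodeUtf8(0x1F600))); // U+1F600   -> f0 9f 98 80
```

In real code you would normally leave this to the platform (for example the standard TextEncoder); the hand-written version is only meant to show where each bit of the code point ends up.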

qooxdoo treats all source code files and generated output, in fact all text files, as UTF-8. This ensures, among other things, that you can use all kinds of funny characters in string literals and comments in your code. Unicode characters in arguments to tr() are passed correctly to the corresponding .po files, and their Unicode translations are correctly retrieved at run time. Since browsers usually support UTF-8, they are even rendered nicely on screen :-)
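As a small illustration of what “treated as UTF-8” means for a string literal, the following sketch uses only the standard TextEncoder and TextDecoder objects, not any qooxdoo API. It assumes the file containing it is itself saved as UTF-8; the sample text is arbitrary.

```javascript
// A string literal with non-ASCII characters; this file must be saved as UTF-8.
var text = "Grüße, señor, 日本語";

// Encode the string to UTF-8 bytes, as it would be written to a file.
var bytes = new TextEncoder().encode(text);

// Decode the bytes again, explicitly as UTF-8.
var roundTrip = new TextDecoder("utf-8").decode(bytes);

console.log(text.length);        // number of UTF-16 code units in the string
console.log(bytes.length);       // number of UTF-8 bytes, larger for non-ASCII text
console.log(roundTrip === text); // true when encoding and decoding agree
```

If the bytes were decoded with a different encoding instead (Latin-1, say), the accented and CJK characters would come out garbled, which is the usual symptom when a file or HTTP response is not actually delivered as UTF-8.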

5 thoughts on “Unicode in qooxdoo”

  1. Pingback: qooxdoo » News » The week in qooxdoo (2009-02-27)

  2. Pingback: qooxdoo » News » Generator and Unicode Application Name Spaces

  3. In my application using UTF-8, accented characters are not being accepted. Why would that be?