The PNG Guide is an eBook based on Greg Roelofs' book, originally published by O'Reilly.
International Text Annotations (iTXt)
The layout of iTXt is a generalization of tEXt and zTXt, as shown in Table 11-2.
The first field is a keyword, with exactly the same restrictions and officially registered values (Author, Description, and so on) as in the tEXt and zTXt chunks. Latin-1 (ISO/IEC 8859-1) was chosen so that existing PNG source code could be used without modification to parse and optionally recognize the keyword. The keyword is followed by a null separator byte and two compression-related bytes. The first indicates whether the main text is compressed (if its value is 1) or not (if it's 0). If the text is compressed, the next byte indicates its compression method, which currently must be zero for the zlib-encoded deflate algorithm. The two bytes could have been combined, but for historical reasons relating to the method byte in IHDR, the split approach was favored.
After the compression bytes comes an optional, case-insensitive field indicating the (human) language used in the remaining two text fields. This is necessary not only to render Unicode text properly but also to allow decoders to distinguish between multiple iTXt chunks, which may contain the same text in different languages but with identical keywords. Unlike both the keyword and the main text, the language tag is plain ASCII text (specifically, the "invariant" ASCII subset of ISO 646, which is itself a subset of both Latin-1 and Unicode UTF-8) conforming to Internet Standard RFC 1766. It consists of hyphen-separated "words" of between one and eight characters each, where the first word is either a two-letter ISO language code (ISO 639), the letter i for tags registered by the Internet Assigned Numbers Authority (IANA)[88], or the letter x for private tags. The second "word" is interpreted as an ISO 3166 country code if it is exactly two characters long or as an IANA-registered code if it is between three and eight characters. Subsequent "words" may be anything, as long as they conform to the general rules.
Examples of language tags include zh (Chinese), en-US (American English), no-bok (Norwegian bokmål or "book language"), i-navajo (Navajo), and x-klingon (Klingon, from the fictional Star Trek universe).
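The shape rules for language tags can be sketched as a small check. This is only an illustration of the rules as described here, not code from libpng or any PNG library; the function name `check_language_tag` is hypothetical, and the sketch tests only the form of the tag, not whether a code is actually registered with ISO or IANA.

```python
import re

# Each hyphen-separated "word" is one to eight ASCII letters (RFC 1766).
_WORD = re.compile(r"[A-Za-z]{1,8}")


def check_language_tag(tag: str) -> bool:
    """Return True if `tag` has the RFC 1766 shape described in the text.

    Hypothetical helper: it checks form only and does not look up
    ISO 639, ISO 3166, or IANA registrations.
    """
    if tag == "":
        return True  # the language field is optional
    words = tag.split("-")
    if not all(_WORD.fullmatch(word) for word in words):
        return False
    first = words[0].lower()  # tags are case-insensitive
    # First word: a two-letter ISO 639 code, "i" (IANA), or "x" (private).
    return len(first) == 2 or first in ("i", "x")
```

A decoder might use such a check only to warn about malformed tags, since rejecting a chunk outright over an unusual tag would discard otherwise readable text.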
A null separator byte terminates the language tag, which is followed by an optional translation of the keyword into the given language. The translated keyword is represented in the UTF-8 encoding of the Unicode character set, which is described in the International Standard ISO/IEC 10646-1, in Internet RFC 2279, and in the Unicode Consortium's reference, The Unicode Standard. Like the primary keyword, it should not contain any newline characters, and it is also followed by a null byte. The remaining chunk data is the main UTF-8 text, either zlib-compressed or not, according to the compression flag. Since its length can be determined from the chunk length, it is not null-terminated. As with the other two text chunks, newlines should be represented by single line-feed characters (decimal 10), and all other control characters (1-9, 11-31, and 127-159) are discouraged. Note, however, that UTF-8 encodings may contain any of the bytes between 128 and 159; what is discouraged is the set of Unicode characters whose four-byte integer values are 128-159.
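As a rough illustration of the layout described above, the following sketch splits the data portion of an iTXt chunk into its fields. It assumes the caller has already stripped the chunk length, type, and CRC; the names `ITXt`, `parse_itxt`, and `discouraged_controls` are hypothetical and not part of any PNG library.

```python
import zlib
from typing import List, NamedTuple


class ITXt(NamedTuple):
    keyword: str             # Latin-1
    language: str            # ASCII language tag, may be empty
    translated_keyword: str  # UTF-8, may be empty
    text: str                # UTF-8 main text


def parse_itxt(data: bytes) -> ITXt:
    """Parse iTXt chunk data (hypothetical helper, chunk header excluded)."""
    keyword, sep, rest = data.partition(b"\x00")
    if not sep or len(rest) < 2:
        raise ValueError("truncated iTXt data")
    flag, method = rest[0], rest[1]
    language, sep, rest = rest[2:].partition(b"\x00")
    if not sep:
        raise ValueError("missing null after language tag")
    translated, sep, text = rest.partition(b"\x00")
    if not sep:
        raise ValueError("missing null after translated keyword")
    if flag == 1:
        if method != 0:
            raise ValueError("unknown compression method")
        text = zlib.decompress(text)  # zlib-wrapped deflate stream
    elif flag != 0:
        raise ValueError("invalid compression flag")
    # The main text runs to the end of the chunk; it has no terminator.
    return ITXt(keyword.decode("latin-1"), language.decode("ascii"),
                translated.decode("utf-8"), text.decode("utf-8"))


def discouraged_controls(text: str) -> List[str]:
    """List characters whose code points are 1-9, 11-31, or 127-159."""
    return [c for c in text
            if 1 <= ord(c) <= 9 or 11 <= ord(c) <= 31 or 127 <= ord(c) <= 159]
```

Note that `discouraged_controls` works on decoded characters, not bytes: a character such as U+0100 encodes to the bytes 196 and 128, so byte 128 appears in the UTF-8 stream even though no discouraged character is present.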