In a UDO source file you can use non-ASCII ("higher") characters without having to know how a character must be written in a destination format such as LaTeX or Windows Help. You can enter a German ß or ä without worry; UDO converts it for you automatically.
UDO expects source files to use the system character set of your operating system. On an MS-DOS computer, UDO expects text files written in the IBM PC character set by default; on an Atari computer, it expects the TOS character set by default.
UDO can also handle files written in another character set. Simply tell UDO which character set your source file uses with !code_source [<charset>].
Below is an overview of the character sets UDO knows about:
UDO supports various codepages for various systems. The list covers all currently supported systems and codepages; some codepages have several descriptors. Descriptors are not case-sensitive, so you can write them in upper or lower case. (The descriptors are based on the former UDO descriptors and on those supported by the Unix command iconv.)
System | Encoding | Descriptors |
Unicode | UTF-8 | UTF-8 UTF8 |
Windows | Codepage 1250 | CP1250 MS-EE WINDOWS-1250 |
Windows | Codepage 1251 | CP1251 MS-CYRL RUSSIAN WINDOWS-1251 |
Windows | Codepage 1252 | CP1252 MS-ANSI WINDOWS-1252 WIN |
Windows | Codepage 1253 | CP1253 GREEK MS-GREEK WINDOWS-1253 |
Windows | Codepage 1254 | CP1254 MS-TURK TURKISH WINDOWS-1254 |
Windows | Codepage 1255 | CP1255 HEBREW MS-HEBR WINDOWS-1255 |
Windows | Codepage 1256 | CP1256 ARABIC MS-ARAB WINDOWS-1256 |
Windows | Codepage 1257 | CP1257 BALTIC WINBALTRIM WINDOWS-1257 |
Windows | Codepage 1258 | CP1258 WINDOWS-1258 |
ISO | 8859-1 | ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 LATIN1 L1 CSISOLATIN1 |
ISO | 8859-2 | ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2 LATIN2 L2 CSISOLATIN2 |
ISO | 8859-3 | ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3 LATIN3 L3 CSISOLATIN3 |
ISO | 8859-4 | ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4 LATIN4 L4 CSISOLATIN4 |
ISO | 8859-5 | ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5 CYRILLIC CSISOLATINCYRILLIC |
ISO | 8859-6 | ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6 ARABIC CSISOLATINARABIC ASMO-708 ECMA-114 |
ISO | 8859-7 | ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7 GREEK GREEK8 CSISOLATINGREEK ECMA-118 ELOT_928 |
ISO | 8859-8 | ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8 HEBREW CSISOLATINHEBREW |
ISO | 8859-9 | ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9 LATIN5 L5 CSISOLATIN5 TURKISH |
ISO | 8859-10 | ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 LATIN6 L6 CSISOLATIN6 NORDIC |
ISO | 8859-11 | ISO-8859-11 ISO8859-11 ISO_8859-11 THAI |
ISO | 8859-13 | ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 LATIN7 L7 CSISOLATIN7 BALTIC |
ISO | 8859-14 | ISO-8859-14 ISO-IR-199 ISO8859-14 ISO_8859-14 LATIN8 L8 CSISOLATIN8 CELTIC ISO-CELTIC |
ISO | 8859-15 | ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 LATIN9 L9 CSISOLATIN9 |
ISO | 8859-16 | ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 LATIN10 L10 CSISOLATIN10 |
Apple | Mac Roman | MAC MACINTOSH MACROMAN CSMACINTOSH |
Apple | Mac CentEuro | MAC_CE MACCENTRALEUROPE |
Atari | TOS | ATARI ATARIST TOS |
DOS | Codepage 437 | 437 CP437 IBM437 CSPC8CODEPAGE437 DOS |
DOS | Codepage 850 | 850 CP850 IBM850 CSPC850MULTILINGUAL OS2 |
HP | Roman8 | HP8 HP-ROMAN8 R8 ROMAN8 CSHPROMAN8 |
NeXTStep | NeXTStep | NEXT NEXTSTEP |
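The case-insensitive matching of descriptors can be pictured with a small Python sketch. It is only an illustration, not UDO's actual implementation: the names ALIASES, LOOKUP and resolve_descriptor are hypothetical, and the table holds just a few rows copied from the list above.

```python
# Hypothetical alias table with a few rows from the list above.
# This is an illustration, not UDO's internal data structure.
ALIASES = {
    "UTF-8": ["UTF-8", "UTF8"],
    "Codepage 1252": ["CP1252", "MS-ANSI", "WINDOWS-1252", "WIN"],
    "ISO 8859-1": ["ISO-8859-1", "ISO-IR-100", "ISO8859-1", "ISO_8859-1",
                   "LATIN1", "L1", "CSISOLATIN1"],
    "Atari TOS": ["ATARI", "ATARIST", "TOS"],
    "Codepage 437": ["437", "CP437", "IBM437", "CSPC8CODEPAGE437", "DOS"],
}

# Flatten to one dictionary keyed by the upper-cased descriptor.
LOOKUP = {
    alias.upper(): encoding
    for encoding, names in ALIASES.items()
    for alias in names
}


def resolve_descriptor(descriptor):
    """Return the encoding name for a descriptor, or None if unknown."""
    return LOOKUP.get(descriptor.strip().upper())


print(resolve_descriptor("latin1"))
print(resolve_descriptor("Win"))
```

Upper-casing both the table keys and the query means that latin1, Latin1 and LATIN1 all resolve to the same entry.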
Important: If you used latin1 in older UDO documents, you should change it to, e.g., cp1252. Before version 7, UDO assigned Windows codepage 1252 to latin1; since version 7 it correctly assigns ISO-8859-1.
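Why the distinction matters can be seen with Python's built-in codecs, used here only as a stand-in for UDO's own tables. The two encodings agree on letters such as ä and ß but differ in the byte range 0x80 to 0x9F, where codepage 1252 has printable characters (for example the euro sign) and ISO-8859-1 has control characters. The function name decode_both is hypothetical.

```python
def decode_both(raw):
    """Decode the same bytes as Windows codepage 1252 and as ISO-8859-1."""
    return raw.decode("cp1252"), raw.decode("latin-1")


# 0x80 is the euro sign in cp1252 but a control character in ISO-8859-1;
# 0xE4 (ä) and 0xDF (ß) are identical in both encodings.
cp1252_text, latin1_text = decode_both(bytes([0x80, 0xE4, 0xDF]))

print(repr(cp1252_text))
print(repr(latin1_text))
```

A document that was written under the old latin1 mapping and contains such characters will therefore only be read correctly if it is declared as cp1252.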
When you use so-called 1-byte codepages (all codepages supported by UDO except Unicode) and use one codepage for your UDO documents but a different one for your output documents, keep in mind that each codepage contains a different set of characters. A codepage is a selection of 256 characters from the full range of characters defined in the Unicode standard.
Imagine you have created a UDO document in the DOS encoding and used DOS graphics characters, but your target format uses, e.g., Apple Mac Roman. You will then not be able to see your DOS graphics characters, because Mac Roman does not contain them. Likewise, if you have used the Hebrew letters from the Atari TOS encoding, you are unlikely to see them in most other codepages.
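The problem can be reproduced with Python's standard codecs, which serve here only as an illustration of the principle and say nothing about how UDO reports or replaces such characters. The helper can_encode is a hypothetical name; Python ships no Atari TOS codec, so the Hebrew letter is written as a Unicode literal.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codepage."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Three box-drawing characters as they are stored in DOS codepage 437.
box = b"\xc9\xcd\xbb".decode("cp437")
alef = "\u05d0"  # Hebrew letter Alef

print(can_encode(box, "cp437"))      # source codepage
print(can_encode(box, "mac_roman"))  # target without box-drawing characters
print(can_encode(alef, "mac_roman"))
print(can_encode("\u00e4", "mac_roman"))  # ä exists in Mac Roman
```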
In these cases we recommend using UTF-8 if it is available for the target format. Internally, UDO keeps all codepages in Unicode format, so you can use, e.g., the Hebrew Alef from the TOS character set and see it properly in both UTF-8 and Windows codepage 1255.
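Converting through Unicode as the common intermediate form can be sketched in a few lines of Python. This is a sketch under assumptions, not UDO's code: the function transcode is hypothetical, and because Python has no Atari TOS codec, codepage 1255 stands in as the source encoding of the Alef.

```python
def transcode(data, source, target):
    """Convert bytes between codepages via Unicode as the pivot format."""
    text = data.decode(source)   # source codepage -> Unicode
    return text.encode(target)   # Unicode -> target codepage


ALEF = "\u05d0"  # Hebrew letter Alef

alef_cp1255 = ALEF.encode("cp1255")
alef_utf8 = transcode(alef_cp1255, "cp1255", "utf-8")

print(alef_cp1255)
print(alef_utf8)
```

Because every codepage is mapped to Unicode first, only one table per codepage is needed rather than one table per pair of codepages.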
for x in `find . -name '*.cs'`; do iconv -f ISO-8859-2 -t UTF-8 "$x" > "$x.utf8" && rm "$x" && mv "$x.utf8" "$x"; done
The encoding conversion cannot write directly to the file it reads, because the shell would truncate that file before iconv reads it and the result would be empty. We therefore need the temporary *.utf8 files, which are renamed to the original file names after the original files have been deleted. (The loop assumes file names without whitespace.)
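The same temporary-file approach can be sketched in Python for readers who prefer a script over a shell loop. The function name convert_in_place and its default encodings are assumptions chosen to mirror the command above; the sketch reads each file fully into memory, so it is meant for ordinary text files.

```python
import os
import tempfile


def convert_in_place(path, source="iso-8859-2", target="utf-8"):
    """Re-encode one file, writing to a temporary file before replacing it."""
    with open(path, "rb") as infile:
        text = infile.read().decode(source)

    # Create the temporary file next to the original so the final
    # replacement stays on the same file system.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".utf8")
    try:
        with os.fdopen(fd, "wb") as outfile:
            outfile.write(text.encode(target))
        os.replace(tmp_path, path)  # swap in the converted file
    except BaseException:
        # Leave the original untouched and clean up on any failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call such as convert_in_place("example.cs") converts a single file; walking a directory tree with os.walk would reproduce the find loop.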