User Manual UDO: Converting 8-bit characters

In an UDO source file you can use higher characters without having to know how a character has to look like in a destination format like LaTeX or Windows Help. So you can enter a German ß or ä without any fear, UDO converts it for you automatically.

UDO expects files containing chars of the system charset of your operating system. If you run UDO on a MS-DOS computer UDO expects text files that are written with the IBM PC character set by default. If UDO runs on an Atari computer UDO will expect the TOS character set by default.

But UDO can manage file that are written with another character set, too. You have simply to tell UDO which character set your source file uses with !code_source [<charset>].

UDO supports various codepages for various systems. Below you see a list of all currently supported systems and codepages, some of which with multiple descriptors for the same codepage. It doesn't matter if you use these descriptors upper- or lowercase. (The descriptors base on the former UDO descriptors and on those supported by the Unix command iconv.)

System	Encoding	Descriptor0
Unicode	UTF-8	UTF-8 UTF8
Windows	Codepage 1250	CP1250 MS-EE WINDOWS-1250
	Codepage 1251	CP1251 MS-CYRL RUSSIAN WINDOWS-1251
	Codepage 1252	CP1252 MS-ANSI WINDOWS-1252 WIN
	Codepage 1253	CP1253 GREEK MS-GREEK WINDOWS-1253
	Codepage 1254	CP1254 MS-TURK TURKISH WINDOWS-1254
	Codepage 1255	CP1255 HEBREW MS-HEBR WINDOWS-1255
	Codepage 1256	CP1256 ARABIC MS-ARAB WINDOWS-1256
	Codepage 1257	CP1257 BALTIC WINBALTRIM WINDOWS-1257
	Codepage 1258	CP1258 WINDOWS-1258
ISO	8859-1	ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 LATIN1 L1 CSISOLATIN1
	8859-2	ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2 LATIN2 L2 CSISOLATIN2
	8859-3	ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3 LATIN3 L3 CSISOLATIN3
	8859-4	ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4 LATIN4 L4 CSISOLATIN4
	8859-5	ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5 CYRILLIC CSISOLATINCYRILLIC
	8859-6	ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6 ARABIC CSISOLATINARABIC ASMO-708 ECMA-114
	8859-7	ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7 GREEK GREEK8 CSISOLATINGREEK ECMA-118 ELOT_928
	8859-8	ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8 HEBREW CSISOLATINHEBREW
	8859-9	ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9 LATIN5 L5 CSISOLATIN5 TURKISH
	8859-10	ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 LATIN6 L6 CSISOLATIN6 NORDIC
	8859-11	ISO-8859-11 ISO8859-11 ISO_8859-11 THAI
	8859-13	ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 LATIN7 L7 CSISOLATIN7 BALTIC
	8859-14	ISO-8859-14 ISO-IR-199 ISO8859-14 ISO_8859-14 LATIN8 L8 CSISOLATIN8 CELTIC ISO-CELTIC
	8859-15	ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 LATIN9 L9 CSISOLATIN9
	8859-16	ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 LATIN10 L10 CSISOLATIN10
Apple	Mac Roman	MAC MACINTOSH MACROMAN CSMACINTOSH
	Mac CentEuro	MAC_CE MACCENTRALEUROPE
Atari	TOS	ATARI ATARIST TOS
DOS	Codepage 437	437 CP437 IBM437 CSPC8CODEPAGE437 DOS
	Codepage 850	850 CP850 IBM850 CSPC850MULTILINGUAL OS2
HP	Roman8	HP8 HP-ROMAN8 R8 ROMAN8 CSHPROMAN8
NeXTStep	NeXTStep	NEXT NEXTSTEP

Important: If you have used latin1 in your old UDO documents, you should switch it to e.g. cp1252 because UDO used to assign Windows codepage 1252 to it before version 7 which correctly assigns ISO-8859-1 to it!

When you use so-called 1-byte codepages (all codepages supported by UDO, except Unicode) and use one codepage for your UDO documents, but a different one for your output documents, you might want to keep in mind that all codepages have different settings. A codepage is a collection of 256 characters from the whole range of all characters which have been defined in the Unicode standard already.

Imagine you have created an UDO document using the DOS encoding and use DOS graphic signs, but your target format is e.g. Apple MacRoman. Then you will not be able to see your DOS graphic signs. When you have used the Hebrew letters from the Atari TOS encoding, you will not be lucky to see them in most other codepages.

In these cases we recommend to use UTF-8, if it is available for the target format. Internally, UDO keeps all codepages in Unicode format so you will be able to use e.g. the Hebrew Alef from the TOS character set and see it properly even in UTF-8 and Windows codepage 1255.