Last Revised: Thursday, December 2, 1999
I'm investigating how the HTML i18n protocol (IETF RFC 2070), Unicode, and the current ISO entity sets may work together in SGML, HTML and XML. These ISO entity sets are referenced but not instantiated in the SHML 1.0 DTD draft as delivered.
Rick Jelliffe has an explanation of charent usage on the oasis-open.org site.
Following is a list of available ISO character entity sets. If you happen to know of the whereabouts of a missing set, let me know (or send it to me) and I'll post it here.
The .ent files are CDATA numeric character references; .gml are SDATA 'square bracketed' strings; .pen are XML-compatible Unicode numeric character references (thanks to Rick Jelliffe of Allette Systems).
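As a rough illustration of the three file flavours, the sketch below builds one declaration per flavour for a single entity and then checks that the XML form resolves with Python's standard parser. The helper name `declarations`, the choice of `eacute`, and the exact formatting (decimal references, unpadded SDATA string) are assumptions made for illustration; the distributed files may format their declarations differently.

```python
import xml.etree.ElementTree as ET


def declarations(name: str, codepoint: int) -> dict:
    """Return one illustrative declaration per file flavour (hypothetical helper)."""
    return {
        # SGML: replacement text is a CDATA numeric character reference
        ".ent": f'<!ENTITY {name} CDATA "&#{codepoint};">',
        # SGML: system-specific SDATA string in square brackets
        ".gml": f'<!ENTITY {name} SDATA "[{name}]">',
        # XML: ordinary internal entity holding a Unicode character reference
        ".pen": f'<!ENTITY {name} "&#{codepoint};">',
    }


decls = declarations("eacute", 0xE9)

# Only the XML form is legal inside an XML internal DTD subset.
doc = f"""<!DOCTYPE p [
  {decls['.pen']}
]>
<p>caf&eacute;</p>"""

root = ET.fromstring(doc)
print(root.text)
```

The CDATA and SDATA keywords are SGML-only, which is why an XML parser can consume the .pen form but not the other two.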
Files | FPI/Description | |
---|---|---|
iso-lat1.ent iso-lat1.gml ISOlat1.pen | "ISO 8879-1986//ENTITIES Added Latin 1//EN" "ISO 8879-1986//ENTITIES Added Latin 1//EN//XML" Latin 1 covers most West European languages such as Albanian, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Galician, Irish, Icelandic, Italian, Norwegian, Portuguese, Spanish, and Swedish. The lack of the Dutch ij and French oe ligatures and of „German“ quotation marks is somewhat tolerable. This entity set is included in the HTML 2.0 specification. | |
iso-lat2.ent iso-lat2.gml ISOlat2.pen | "ISO 8879-1986//ENTITIES Added Latin 2//EN" "ISO 8879-1986//ENTITIES Added Latin 2//EN//XML" Latin 2 works for most Latin-written Slavic and Central European languages: Czech, German, Hungarian, Polish, Rumanian, Croatian, Slovak, Slovene. | |
ISO-8859-3 ISO 8859 Latin 3: | Latin 3 is popular with authors of Esperanto, Galician, Maltese, and Turkish. | |
ISO-8859-4 ISO 8859 Latin 4: | Latin 4 introduces letters for Estonian, Latvian, and Lithuanian. It is an incomplete predecessor of Latin 6. | |
ISO-8859-10 ISO Latin 6: | Latin 6 adds the last Inuit (Greenlandic) and Sami (Lappish) letters that were missing in Latin 4 to cover the entire Nordic area. RFC 1345 listed a preliminary and different 'latin6'. Skolt Sami still needs a few more accents than these. | |
iso8859-6.map teiarb.gml | ISO 8859-6 Arabic character mapping table "-//TEI TR1 W4:1992//ENTITIES Basic Arabic Letters//EN" Each Arabic letter occurs in four easily predictable forms: initial, medial, final or separate. To make Arabic text legible you'll need a display engine that combines the appropriate glyphs; the fixed font is not an acceptable rendering. For information on Arabic on the Web, see the Arabic ISO 8859-6 Web Page links. | |
teicop.gml | "-//TEI TR1 W4:1992//ENTITIES Coptic Letters//EN" This is the TEI Coptic Letter set. Coptic is the Egyptian language written in a modified Greek script. | |
iso-cyr1.ent iso-cyr1.gml | "ISO 8879-1986//ENTITIES Russian Cyrillic//EN" This entity set contains the Cyrillic characters used in the Russian language. | |
iso-cyr2.ent iso-cyr2.gml | "ISO 8879-1986//ENTITIES Non-Russian Cyrillic//EN" With these non-Russian Cyrillic letters you can type Bulgarian, Byelorussian, Macedonian, Russian, Serbian and Ukrainian. But Ukrainians read the letter ghe with downstroke as heh and would need a ghe with upstroke to write a correct ghe. Stalin's officials tried to abolish this distinction. | |
iso-grk1.ent iso-grk1.gml ISOgrk1.pen | "ISO 8879-1986//ENTITIES Greek Letters//EN" This is a set of modern Greek letters for use as language characters. Technical use of Greek letters (as in formulae) is covered by the sets under Math/Technical below. | |
iso-grk2.ent iso-grk2.gml ISOgrk2.pen | "ISO 8879-1986//ENTITIES Monotoniko Greek//EN" This contains additional characters needed for Monotoniko Greek. | |
ISOgrk3.pen isogrk3.gml | "ISO 8879:1986//ENTITIES Greek Symbols//EN//XML" "ISO 9573-13:1991//ENTITIES Greek Symbols//EN" | |
ISOgrk4.pen isogrk4.gml | "ISO 8879:1986//ENTITIES Alternative Greek Symbols//EN//XML" "ISO 9573-13:1991//ENTITIES Alternative Greek Symbols//EN" | |
ISO-8859-8 ISO Hebrew: | This is Hebrew. Like Arabic it is written from right to left. | |
ISO-8859-9 ISO Turkish: | Latin 5 replaces the rarely needed Icelandic letters in Latin 1 with the Turkish ones. | |
Math/Technical | | |
iso-dia.ent iso-dia.gml ISOdia.pen | "ISO 8879-1986//ENTITIES Diacritical Marks//EN" "ISO 8879:1986//ENTITIES Diacritical Marks//EN//XML" | |
iso-num.ent iso-num.gml ISOnum.pen | "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN" "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML" | |
iso-pub.ent iso-pub.gml ISOpub.pen | "ISO 8879-1986//ENTITIES Publishing//EN" "ISO 8879:1986//ENTITIES Publishing//EN//XML" | |
iso-tech.ent iso-tech.gml ISOtech.pen | "ISO 8879-1986//ENTITIES General Technical//EN" "ISO 8879:1986//ENTITIES General Technical//EN//XML" | |
isomscr.gml | "ISO 9573-13:1991//ENTITIES Math Alphabets: Script//EN" | |
iso-amsa.ent iso-amsa.gml | "ISO 8879-1986//ENTITIES Added Math Symbols: Arrow Relations//EN" | |
iso-amsb.ent iso-amsb.gml | "ISO 8879-1986//ENTITIES Added Math Symbols: Binary Operators//EN" | |
iso-amsc.ent iso-amsc.gml | "ISO 8879-1986//ENTITIES Added Math Symbols: Delimiters//EN" | |
iso-amsn.ent iso-amsn.gml | "ISO 8879-1986//ENTITIES Added Math Symbols: Negated Relations//EN" | |
iso-amso.ent iso-amso.gml | "ISO 8879-1986//ENTITIES Added Math Symbols: Ordinary//EN" | |
iso-amsr.ent iso-amsr.gml | "ISO 8879-1986//ENTITIES Added Math Symbols: Relations//EN" | |
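The formal public identifiers (FPIs) quoted in the table follow a regular pattern: an owner, a public text class with a description, a language, and an optional display version, separated by `//`. The sketch below splits an FPI into those parts. The names `FPI` and `parse_fpi` are hypothetical, and the parser is deliberately simplified: it assumes well-formed input and only special-cases the `-//` and `+//` owner prefixes.

```python
from typing import NamedTuple, Optional


class FPI(NamedTuple):
    owner: str
    text_class: str
    description: str
    language: str
    display_version: Optional[str]


def parse_fpi(fpi: str) -> FPI:
    """Split a formal public identifier into its fields (simplified sketch)."""
    # Unregistered ("-//") and registered ("+//") owners carry a prefix that
    # itself contains the "//" separator, so peel it off before splitting.
    prefix, rest = "", fpi
    if fpi.startswith(("-//", "+//")):
        prefix, rest = fpi[:3], fpi[3:]

    parts = rest.split("//")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected number of fields in FPI: {fpi!r}")

    # The second field holds the class keyword, a space, then the description.
    text_class, _, description = parts[1].partition(" ")
    display_version = parts[3] if len(parts) == 4 else None
    return FPI(prefix + parts[0], text_class, description, parts[2], display_version)


print(parse_fpi("ISO 8879:1986//ENTITIES Added Latin 1//EN//XML"))
print(parse_fpi("-//TEI TR1 W4:1992//ENTITIES Coptic Letters//EN"))
```

Read this way, the trailing `//XML` on some identifiers marks the XML-compatible version of a set, while the leading `-//` on the TEI identifiers marks an unregistered owner.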
Copyright © 1997 Murray Altheim
Curator: Murray Altheim
<altheim@eng.sun.com>
Last Revised: Mon, Sept. 22, 1997