I18n and MS WEFT

Overview

After a brief overview of MS WEFT, and a critique of some of the applications of it that are found "out there", this page focuses on the use of MS WEFT in an i18n context.

We show how an HTML author can stay within the W3C interworking specifications, producing a page which can be rendered by any WWW-compatible browser in the presence of appropriate installed fonts, and yet also offer MSIE-using readers the convenience of an embedded font. The embedded font also serves as a workaround for an unfortunate shortcoming of IE which otherwise tends to lead to unnecessary display of missing-character indicators.

Introduction

MS WEFT is an MS-specific technique for creating embeddable fonts to be used in web pages. These fonts are then "embedded" by using W3C-conforming directives in CSS and, as such, require no non-standard constructs in the web pages and stylesheets themselves. The embedded fonts are of no use to other browsers (other than those "browsers" which are mere wrappers for WinMSIE itself) but, when correctly applied, they do no harm either. A non-supporting client agent ignores the relevant CSS sections, either because it does not implement the relevant constructs (the CSS parsing rules for unknown features are designed to make this harmless), or because, although it does understand the construct, it recognises that the font format is not one which it can use.

The WEFT technique is promoted mostly (I think it's fair to say) to typographers, as a means of improving the cosmetics of web pages. While this is a perfectly valid thing to want to do, it doesn't happen to be my particular area of interest, and I'll say rather little about it in these pages. I'm more interested in i18n support, which seems to receive scant mention in the MS WEFT documentation, and this is what the present web page is chiefly about.

At time of writing this note, the most-recent version of WEFT available from the public URL at MS was WEFT3 - specifically, a version which called itself "WEFT III (V5.3.2)", and is cited from some other contexts as "WEFT 3.2". This is the only version actually used in the present discussion. It's a couple of years old, and does not support some of the newer Unicode character ranges; but hunting around did not locate any reference to a newer version - not even in development status.

The term "embedding" does not mean that the fonts are literally wrapped into the HTML source code itself. What actually happens is that the HTML is associated with some CSS specifications which use the CSS/2.0 @font-face construct (see the CSS/2.0 specification, section 15.3.1; this has been removed from CSS/2.1) to reference an external file. This .EOT file (also known as a "font object" in the WEFT documentation) contains the "embeddable" synthetic font (think: "embedded open type"). Perhaps a more appropriate term would be "dynamic fonts" - the CSS/2.0 spec refers to them as "downloadable font data" - but I'm going along with the term that's used in the WEFT documentation.
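To make the shape of that CSS concrete, here is a minimal sketch in Python which merely prints the kind of rules involved. The family name "DemoSyllabics", the file name DEMOSYL0.eot and the selector are invented for illustration, and the stylesheet which the WEFT tool itself writes may well differ in detail.

```python
def font_face_rule(family, eot_url):
    """Return a CSS/2.0 @font-face rule that points at an external .EOT file."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f"  src: url({eot_url});\n"
        "}\n"
    )


def usage_rule(selector, families, generic="sans-serif"):
    """Return a rule naming the embedded font first, then installed fallbacks."""
    quoted = ", ".join(f'"{name}"' for name in families)
    return f"{selector} {{ font-family: {quoted}, {generic}; }}\n"


if __name__ == "__main__":
    # Hypothetical names: "DemoSyllabics" and DEMOSYL0.eot are not from WEFT.
    print(font_face_rule("DemoSyllabics", "DEMOSYL0.eot"))
    print(usage_rule(".syllabics", ["DemoSyllabics", "Pigiarniq"]))
```

Naming an installable font after the embedded one is the point of the second rule: a browser which ignores the @font-face rule simply moves on to the next family in the list.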

By the way, Netscape 4.* also had a proprietary font-embedding technique, which would be called out from HTML and CSS using the CSS/2.0 constructs in a similar way. However, I won't be dealing with that here.

I would broadly categorise the uses that I have seen of the WEFT technique under three headings, as follows.

Typography / Cosmetics

In this usage, the actual text contains just a normal repertoire of characters, and the aim of using WEFT is to provide some cosmetically attractive font instead of the boring ones which come bundled with the reader's OS. As I already noted, this is a perfectly valid activity, but not one that is of interest to me in the present context.

Symbol-type fonts, in the broad sense - BAD

In this (mis)usage, the embedded font uses font indexes in the range 0-127 or 0-255 as the case may be, and populates them with the desired exotic character glyphs. On being presented with normal ASCII (or Windows-1252 etc.) characters in the HTML source, under selection of the named embedded font, the browser is expected to display the exotic characters instead of the regular ASCII or Windows-1252 characters which were in the original HTML source.

In an HTML context, this technique is just as bogus when used in an embedded font as it is when used in an installable font, as I already discuss in my page "Using FONT FACE to extend repertoire?" (my "fontface harmful" page). This goes not only for the old HTML/3.2 font face markup, but even more so for modern CSS font-family in a stylesheet. As the relevant Mozilla FAQ on symbol/dingbat fonts rightly points out:

Characters in HTML 4 and XML documents are Unicode characters (even if the document has been encoded using a legacy encoding for transfer) - not font glyph indexes.

There's a more-detailed discussion of symbol font techniques "in the broad sense", within my cited "Fontface harmful" page, where three different sub-types are identified. It has to be said that, in an HTML context, all three approaches are bogus: this is not HTML's way of supporting an extended character repertoire. Please read my cited web page for the detailed story.

There might have been some excuse for using this trickery some 10 or more years ago, despite it being contrary to the published HTML interworking specifications, as web page authors could not at that time rely on an adequate level of Unicode support in browsers. But nowadays (2005) such browser versions must be considered long-since obsolete, and there is frankly no excuse for using this misbegotten trick in HTML pages now - and even less excuse for promoting the technique to naive web authors, as some web sites continue to do.

Properly-encoded Unicode fonts - good!

WEFT can be used to prepare properly-encoded Unicode-based embedded fonts, and those fonts then used for displaying properly-made HTML web pages in MSIE, irrespective of whether the reader already has a suitable installed font or not. And this is the main topic of the present web page.

Background to the present web page

As it happens, I became aware of the ability of WEFT in the i18n field in a particular context: a discussion of Canadian Syllabics. See the archived usenet thread which includes the message-id (but that discussion is over now, and some of my temporary test documents for it have been deleted since).

In the course of this discussion (on usenet and in private email), we discussed several fonts that were being offered specifically for rendering of Canadian Syllabics: of those, I will mention here:

Gérard Talbot points out that there must be an awareness of this issue in the Nunavut Government and Assembly, as they have sponsored the Tiro Typeworks font and use it in their own pages: he has also seen it promoted in school web pages and other places. Nevertheless, there is still a legacy of Native Canadian web pages based on pseudo-ASCII fonts.

So much for the accidental detail of how I got here! However, the technique is by no means confined to such exotic usages. For years, MSIE users have been complaining about the fact that certain less-usual mathematical operators are not getting displayed, despite the fact that they have a font which contains them. This is an example of a general shortcoming of IE versions up to and including at least IE6: IE makes up its mind which font it is going to use for a whole group of characters and, once it has made that choice, if one or other of the characters from that group is not present in the font, it makes no attempt to look elsewhere - it simply displays a missing-character glyph.
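The difference between the two behaviours can be illustrated with a toy model. This is only a sketch of the behaviour as described above, not IE's actual algorithm: fonts are modelled as plain sets of characters, and the rule "choose the first font which covers the first character of the group" is my own assumption, made just to have something definite to run.

```python
def render_per_group(text, fonts):
    """One font is chosen for the whole group; gaps in it are not filled."""
    # Assumed rule: take the first font that covers the first character.
    chosen = next(
        (name for name, cover in fonts.items() if text and text[0] in cover),
        None,
    )
    # None stands for a missing-character glyph.
    return [
        (ch, chosen if chosen is not None and ch in fonts[chosen] else None)
        for ch in text
    ]


def render_per_character(text, fonts):
    """Each character is looked up in every font in turn."""
    return [
        (ch, next((name for name, cover in fonts.items() if ch in cover), None))
        for ch in text
    ]


if __name__ == "__main__":
    # Hypothetical fonts: "TextFont" lacks the operator which "MathFont" has.
    fonts = {"TextFont": set("ab+"), "MathFont": {"\u2218"}}
    print(render_per_group("a\u2218b", fonts))
    print(render_per_character("a\u2218b", fonts))
```

In the first function the ring operator comes out as a missing glyph even though "MathFont" contains it; in the second it is found.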

This shortcoming of IE can be worked around by the user in a number of ways; but if the author provides (and suitably selects) a custom embedded font for the character repertoire in question, then IE simply works, and no user workaround is needed. Other browsers (e.g. Mozilla family, or Opera) in my experience "just work" anyway, provided the glyphs can be found somewhere in the user's fonts (I don't know the exact machinery, so this might be over-simplified): they have no need of embedded fonts as a workaround for a problem which they haven't got, and they are in no way harmed by being offered a WEFT embedded font, which they will happily ignore.

Security aspects

Authors should be aware that embedded fonts are a potential route for security compromise of browsers; MSIE consequently has some security option settings relating to embedded fonts, with the choice of prompting the user for permission, or rejecting embedded fonts outright (depending on the security level which has been defined for the originating web site). These settings can be found in Internet Options> Security> Custom Level. Consequently, any page which uses the technique is advised to include some kind of polite explanation to the user. For the purposes discussed here, the user is equally free to acquire an installable font for the required characters, configure their browser accordingly, and disregard your offer of the embedded font, just as the users of non-IE browsers would do; so there's no need to give users the impression that you're commanding them outright to enable this security-relevant feature in their browser (some of them might be prevented by local site policies from doing that anyway). You're better advised to offer it as an available option, rather than as a mandatory feature of using your web page.

Font Licensing

Commercial fonts come with licensing conditions, which set out what you can and cannot do with the fonts, e.g. vis-à-vis third parties. This is not surprising, considering the amount of work which goes into designing and creating a quality commercial font. Some of these conditions are reflected by flags stored in the TrueType and OpenType font formats. One of the aspects of these conditions is whether the font is eligible for embedding, in various contexts. Not surprisingly, MS WEFT honours the relevant flags, and will not allow you to derive an embedded font from one of these fonts unless the flags allow it.
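The flags in question live in the fsType field of the font's "OS/2" table. As a rough sketch, the bit values below are those given in the OpenType specification; the function names are my own, and how WEFT itself weighs combinations of bits is not something I have verified, so the "restricted" test is only an assumption based on the older rule that the least restrictive bit wins.

```python
# Embedding-related bits of the 16-bit fsType field (OpenType "OS/2" table).
FS_TYPE_FLAGS = {
    0x0002: "restricted licence embedding",
    0x0004: "preview and print embedding",
    0x0008: "editable embedding",
    0x0100: "no subsetting",
    0x0200: "bitmap embedding only",
}


def describe_fs_type(fs_type):
    """List the embedding conditions flagged in an fsType value."""
    flagged = [text for bit, text in FS_TYPE_FLAGS.items() if fs_type & bit]
    # A value with none of these bits set means installable embedding.
    return flagged or ["installable embedding"]


def is_restricted_only(fs_type):
    """True if 'restricted' is set and no more permissive bit is also set."""
    # Assumption: when several permission bits are set, the least
    # restrictive one is taken to apply.
    return (fs_type & 0x000E) == 0x0002


if __name__ == "__main__":
    for value in (0x0000, 0x0002, 0x0108):
        print(hex(value), describe_fs_type(value), is_restricted_only(value))
```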

Recap: what it is and why we're talking about it here

WEFT is an MS-specific technique for providing embedded (dynamic, downloadable) fonts for web pages.

OK, that's roughly it. Some authors take the view that it's more important for the author to achieve successful communication with their readers no matter what browser the reader might have chosen - even this "browser-like operating system component" called IE. Others would argue that MSIE ought to be given every opportunity to present itself to its users, warts and all, rather than web authors pandering to its shortcomings and putting in too much effort to help it out when its own developers don't really seem to care. I'd certainly sympathise with that attitude, but nevertheless, I'm offering here some details of what I learned, should you choose to go that way.

WEFT3 in action

For full value from this section, I'd recommend you to download and install the WEFT3 tool, and play with it alongside the MS WEFT tutorial.

Several features of the program are, to my mind, somewhat unintuitive on first encounter, but I don't believe I can write better documentation than is already there, so I suggest reading and re-reading the documentation until it becomes clear. But most of the documentation and tutorials relate to web pages which contain only a limited repertoire of characters, and concentrate on the resulting typography. When we start to take an interest in more-exotic characters, the documentation (and especially the tutorials) runs out of steam, and the would-be author is rather left to their own devices in exploring the available menus and guessing what the proffered options will do: that was the motivation for writing the present page on this area of usage.

First you'll need to get a grasp of "allowed roots" (also referred to as "Binding" on some menus). This is a technique for preventing readers from stealing your embedded fonts and using them on other web pages. The "allowed roots" specify the leading parts of URLs from which the embedded fonts can be used: attempts to use those fonts from other URLs will be unsuccessful. If you want the fonts to be usable from a whole hierarchy of URLs or from entire web site(s) then you can do so, or you can make them specific to one exact URL if you so choose. But you do have to specify something: WEFT does not allow you to wild-card this feature.
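The idea of "allowed roots" amounts to a leading-substring test on the URL of the page which tries to use the font. The sketch below is only a model of that idea as described above, not the check which WEFT or IE actually perform; the function name and the example.org URLs are invented.

```python
def font_usable_from(page_url, allowed_roots):
    """True if the page URL begins with one of the allowed roots."""
    return any(page_url.startswith(root) for root in allowed_roots)


if __name__ == "__main__":
    # Hypothetical roots: one whole hierarchy and one exact page.
    roots = [
        "http://www.example.org/i18n/",
        "http://www.example.org/misc/weft-test.html",
    ]
    print(font_usable_from("http://www.example.org/i18n/syllabics.html", roots))
    print(font_usable_from("http://elsewhere.example.net/copy.html", roots))
```

An empty list of roots matches nothing, which fits the observation that you do have to specify something.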

I would also stress the benefit of inventing your own fantasy name (preferably something more creative than myfont as shown in the tutorial) for the downloaded font, to avoid confusion between downloaded and installed fonts, and to help with troubleshooting and testing.

Page analysis

One method of constructing the WEFT embedded font is by means of page analysis: you create a web page which includes the problematic characters, and then, in the presence of appropriate installed fonts, you run the WEFT "wizard" on the HTML page. This ultimately produces a somewhat modified HTML web page, and the embedded font file for it. On the way there are various options for tailoring what goes into the embedded font (.EOT) file. I'd recommend actually carrying out the example in the tutorial, just to get the exercise in performing the relevant actions, even though the result itself is pretty trivial.

On this simple plan, the "wizard" produces a page-specific .EOT file for each page that is processed. Given a cluster of HTML pages of a similar nature to each other, it may make more sense to produce a single .EOT file which covers the character repertoire of all of the pages in the cluster, and to reference this "common" .EOT file from each of the web pages; and indeed the WEFT tool includes provision for this, as well as for various other kinds of "subsetting", some of which seem to be useful and others of which seem to me to be less so.

In practice, if you're using WEFT for the kind of purpose which I envisage in this writeup, you probably want to toss-out of the .EOT file all of the "ordinary" characters for which MSIE has no problem with the rendering, limiting the embedded font to contain only the problematical characters or character ranges. There is a dialogue at a certain stage of the "Wizard" (it's step 4 in their documentation) where you can influence this. Note that the "Embed/Don't Embed" button is a User Interface Disaster: when the button reads "Embed" it means that the font will be embedded - it does not mean that you have to press the button in order to embed the font. If you mistakenly press the button, it will change to read "Don't Embed", and the font will not be embedded. Who on Earth taught these folks User Interfake design???

As an alternative to the "Wizard", the individual steps can be called up interactively from the WEFT menus. Either way, the above methods of creating embedded fonts do depend on analyzing existing HTML pages, and the created fonts will (in the absence of additional tweaking) only contain those characters which happen to have been used on the pages.

Manually creating fonts?

Suppose that you prefer to pro-actively create an .EOT which will contain all of the characters from a particular problematical range (let's say mathematical operators, or let's say Canadian Syllabics), for use also from your future HTML pages on such topics. In that case an alternative technique seems to be more plausible. Take a look at the WEFT menu Tools> Expert Create Fonts. Use its "Add" button to pick a font (I'm exercising this as I write, using "Arial Unicode MS") which you know contains the range which you want (e.g. Mathematical Operators). Now use the "Subset..." button to open the "Subset Editor" window. Towards the right is the confusingly-named "Language" drop-down: use it to select the character range which is denoted "Mathematical Operators", and the central window will display the available characters from the selected font, greyed-out. As you click on these characters, they are added into the embedded subset shown towards the foot of the window. You can select further so-called "Language" subsets from the pulldown (e.g. "Miscellaneous Symbols") and add some or all of those into the selection for embedding. What I don't know is how to select all of the available characters at once (the usual UI gesture of ctrl/A doesn't do it, anyway, nor is right-click honoured - another UI disaster area, I'm afraid).

This is all very well for the "Language" subsets which are known to the WEFT tool, such as the Math Operators mentioned above. Unfortunately, this version (5.3.2) of the tool does not support Canadian Syllabics explicitly, and things go sadly wrong with this otherwise plausible method of font creation, inasmuch as the Canadian Syllabics "Language" subset cannot be selected from the drop-down.

The values of the "Language" subsets in OpenType format are shown, for example, in Adobe's documentation for the OpenType "OS/2" table. It can be seen that, for example, Canadian Syllabics are at "Unicode Range" index 77. Related to this, the MS "Font Properties Extension" (of which the newest version seems to be from 1999, with an FAQ dated 1997), when pointed at e.g. the Pigiarniq font, reports that it contains, not "Canadian Syllabics", but "Reserved for Unicode SubRanges Bit 77", showing that this utility, too, isn't properly up to date with Unicode itself. MSIE6, however, handles these correctly (and its font defaults configuration contains a drop-down menu where "Canadian Syllabic" can be selected: by default on my Win/2000 Pro system, the associated list of available fonts is empty, but if I install the Pigiarniq font mentioned elsewhere on this page, then it appears as the only available selection here). MS has some documentation, now outdated, on Unicode range bit settings (compare this with the Adobe page cited earlier), and on "Character sets and code pages" (their terminology).
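For anyone wanting to check such a bit by hand: in the "OS/2" table the 128 "Unicode Range" bits are spread over four 32-bit fields, ulUnicodeRange1 to ulUnicodeRange4, so an index such as 77 has to be split into a field number and a bit within that field. A small sketch of the arithmetic follows; the function names are my own.

```python
def locate_unicode_range_bit(index):
    """Map a Unicode Range index (0-127) to (field number 1-4, bit 0-31)."""
    if not 0 <= index <= 127:
        raise ValueError("Unicode Range index must be in 0..127")
    field, bit = divmod(index, 32)
    return field + 1, bit


def has_unicode_range(ul_ranges, index):
    """Test an index against the four ulUnicodeRange values, given in order."""
    field, bit = locate_unicode_range_bit(index)
    return bool((ul_ranges[field - 1] >> bit) & 1)


if __name__ == "__main__":
    # Index 77 (Canadian Syllabics) falls in ulUnicodeRange3.
    print(locate_unicode_range_bit(77))
```

So a font which claims Canadian Syllabics coverage should have bit 13 of ulUnicodeRange3 set, i.e. that field ANDed with 0x2000 is non-zero.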

A superficial inspection of the release dates of the WEFT tool at first seems to suggest that version 5.3.1 of WEFT is contemporary with version 3.1 (3.1.0) of Unicode, while WEFT 5.3.2 is contemporary with Unicode version 3.2. That hints at a deliberate alignment of the version numbers. But Canadian Syllabics were already present in Unicode version 3.1.0, whereas even WEFT 5.3.2 still lacks explicit support for them, and for some other "Language" subsets which were present in Unicode 3.1. So the apparent alignment of version numbers falls down in this regard. Some parts of the WEFT tool seem happy to process Unicode characters from "Language" subsets which they do not explicitly support; other parts seem to get confused and, if used on such characters, produce unexpected results in the constructed .EOT "font object". Exploring these parts of the WEFT tool has been a somewhat frustrating exercise. But Gérard Talbot showed me, and I subsequently confirmed to my own satisfaction, that usable "font objects" can be built, if the right combination of WEFT options is used.

This brought me to the conclusion that, for any "Language" subset unsupported by the WEFT tool, the "Expert Create Fonts" menu was impractical. Considering the successful results reported by Gérard Talbot, I concluded that an adequate workaround was to create a dummy HTML page containing at least one instance of every required character, and then to go through the motions of analyzing this page, e.g. with the "wizard". After defining sufficiently generous "allowed roots" to permit the .EOT file to be used from any of your web pages, you can create the .EOT file, after which the dummy HTML page is of no further immediate use: it could be archived in case of future need, as a means of documenting what went into the .EOT font, and as a starting point if the font should later need to be extended by adding extra characters.
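Typing such a dummy page by hand is tedious, so it may be easier to generate it. The following is a sketch under stated assumptions: it writes every assigned character of a block as a numeric character reference, the function name and page title are invented, and I have not verified that the WEFT page analysis treats numeric references the same way as literally-coded characters. Note too that "assigned" here means assigned in the Unicode version known to your Python, which may be newer than the font's coverage.

```python
import unicodedata


def dummy_page(first, last, title):
    """Build an ASCII-only HTML page holding each assigned character once."""
    refs = [
        f"&#x{cp:04X};"
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) != "Cn"  # skip unassigned code points
    ]
    # Sixteen characters per paragraph, purely for readability.
    rows = ["".join(refs[i:i + 16]) for i in range(0, len(refs), 16)]
    body = "\n".join(f"<p>{row}</p>" for row in rows)
    return (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n'
        f"<html><head><title>{title}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    # U+1400..U+167F is the Unified Canadian Aboriginal Syllabics block.
    page = dummy_page(0x1400, 0x167F, "Canadian Syllabics repertoire")
    print(page[:200])
```

The resulting page would still need a font-family rule selecting an installed font which covers the block before being analyzed.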

Refer back to the WEFT tutorial to understand how to reference the .EOT file from your web pages.

Trivia points for troubleshooting:

