Using the Schema

Programmers expect compilers to check the validity of their programs and to provide helpful error messages and warnings. Xgridfit splits this function between the compiler itself and a schema: a file that describes the structure of an XML document and can be used to check its correctness (that is, to validate it). It is a good idea not to rely solely on the error checking done by the Xgridfit compiler, and there are several ways to use the schema.

The division of labor between the Xgridfit compiler and schema is roughly as follows: the schema checks that required elements and attributes are present and that elements and attributes that are not allowed (for example, those you have mistyped) are not present. Where the order of elements is significant, the schema checks that they are in the right order. The schema enforces naming conventions, making sure, for example, that glyph names are valid and that the names of control values and variables are in the correct form. The schema validates attribute values when it can: for example, when an attribute requires an integer within a certain range, the schema checks that the value falls within that range.

There are certain important tasks that a schema cannot do. It cannot validate complex expressions (like "(a + b) / 2"), and it cannot check to make sure that a named value matches the name of a <constant>, <control-value> or <variable>. It cannot make sure that there are <function> definitions to match <call-function> elements. These and a variety of other tasks are for the compiler.
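To make this concrete, suppose a program contains the fragment below but declares no <constant> or <variable> named stem (a hypothetical name chosen for this illustration). Assuming the name satisfies the schema's naming rules, as the short names used elsewhere in this chapter do, the fragment passes validation; only the compiler can discover that stem refers to nothing.

    <!-- Valid according to the schema; the undefined name is caught only by the compiler -->
    <move>
      <point num="stem"/>
    </move>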

There is a certain amount of overlap between the checks performed by the schema and those performed by the compiler; this redundancy is partly due to historical factors (the schema did not acquire its present abilities all at once), and partly because some checks are so easy for the compiler that it seems absurd to omit them. But the schema adds considerable value to what the compiler can do.

Limitations of the compiler

The Xgridfit compiler is (perhaps uniquely among compilers) written in XSLT, an XML-based transformation language. It is basic to the working of XSLT that it responds to the elements it expects to find and ignores those that it does not expect. If you were to write this,

    <move>
      <point num="a"/>
      <myelement>
        Here's my very own element!
      </myelement>
    </move>

the Xgridfit compiler would not complain; it would not notice the presence of <myelement> at all.

Ordinarily this is not a problem. Xgridfit does complain, after all, when it fails to find what it requires. For example, if you were to type this:

    <move>
      <poit num="a"/>
    </move>

The compiler would report the lack of a <point> element as an error, since a <point> is the only required child of <move>. Note, however, that it does not detect that "point" is misspelled or even that an extraneous element is present; rather, it fails to see the <poit> element entirely and notes the absence of <point>. In a case like the following, on the other hand, the compiler would report no error:

    <move>
      <refernce>
        <point num="r"/>
      </refernce>
      <point num="a"/>
    </move>

The compiler ignores the misspelled <refernce> element and, just as it would if no <reference> element were present, produces an MDAP instruction rather than MDRP. The resulting bug is likely to be subtle and hard to diagnose.
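For comparison, here is the same instruction with the element name spelled correctly. With <reference> recognized, the compiler generates MDRP, moving point a relative to reference point r, instead of MDAP:

    <move>
      <!-- Correctly spelled: point r becomes the reference point -->
      <reference>
        <point num="r"/>
      </reference>
      <point num="a"/>
    </move>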

The compiler could be taught to spot elements and attributes that don't belong, but only by writing expensive code that forces XSLT to perform jobs it is not designed for. The schema, however, is very good at detecting such elements and attributes; it would flag both <poit> and <refernce> as errors.

RELAX NG

Since its first release in 2006, Xgridfit has been packaged with a variety of schemas: with a classic DTD (Document Type Definition), with an XML Schema, and with RELAX NG schemas. These different schema languages have very different capabilities. At present, the only schema packaged with Xgridfit is RELAX NG, as this is the only one that comes close to describing the constraints and requirements of an Xgridfit program.

For example, consider the <delta-set> element. When it is the child of a <delta> element, it may contain a <point>, but not if it is the child of <control-value-delta>. The <point> may be omitted if the first element of the <delta> is a <point> or if the <delta> is a child of <move>; but otherwise it must be present. These complex constraints can be expressed with RELAX NG and with an XML Schema, but not with a DTD.
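The first of these constraints depends on a RELAX NG feature that a DTD lacks: the same element name can have different content models in different contexts. The fragment below is a simplified sketch in the XML syntax, not an excerpt from xgridfit.rng. The define names (delta-set.with-point and so on) are hypothetical, all attributes are omitted, the condition about the first child of <delta> is left out, the number of <delta-set> children is a guess, and a pattern named point is assumed to be defined elsewhere in the grammar.

    <!-- Simplified sketch, not taken from xgridfit.rng -->
    <define name="delta">
      <element name="delta">
        <!-- Inside <delta>, a <delta-set> may carry its own <point> -->
        <oneOrMore>
          <ref name="delta-set.with-point"/>
        </oneOrMore>
      </element>
    </define>

    <define name="control-value-delta">
      <element name="control-value-delta">
        <!-- Inside <control-value-delta>, it may not -->
        <oneOrMore>
          <ref name="delta-set.without-point"/>
        </oneOrMore>
      </element>
    </define>

    <!-- Two defines describe the same element name, <delta-set> -->
    <define name="delta-set.with-point">
      <element name="delta-set">
        <optional>
          <ref name="point"/>
        </optional>
      </element>
    </define>

    <define name="delta-set.without-point">
      <element name="delta-set">
        <empty/>
      </element>
    </define>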

Here are some other oddities:

In <store-projection-vector>, the y-component and x-component attributes may both be omitted, but if one is present the other must be as well. This cannot be handled by either a DTD or an XML Schema.
The <line> element must contain either a ref attribute or two <point> elements. This also can be handled only by RELAX NG.
In <move>, the distance and pixel-distance attributes are mutually exclusive: if one is present, the other cannot be used. This can be handled only by RELAX NG and the XML Schema.
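As a rough illustration of how RELAX NG expresses these three constraints, here is a small, self-contained grammar in the XML syntax. It is a simplified sketch, not an excerpt from xgridfit.rng: it ignores namespaces and the many other attributes and children these elements accept, and its define names are hypothetical.

    <grammar xmlns="http://relaxng.org/ns/structure/1.0">
      <start>
        <choice>
          <ref name="store-projection-vector"/>
          <ref name="line"/>
          <ref name="move"/>
        </choice>
      </start>

      <define name="store-projection-vector">
        <element name="store-projection-vector">
          <!-- Both attributes together, or neither -->
          <optional>
            <attribute name="x-component"/>
            <attribute name="y-component"/>
          </optional>
        </element>
      </define>

      <define name="line">
        <element name="line">
          <!-- Either a ref attribute or exactly two points -->
          <choice>
            <attribute name="ref"/>
            <group>
              <ref name="point"/>
              <ref name="point"/>
            </group>
          </choice>
        </element>
      </define>

      <define name="move">
        <element name="move">
          <!-- At most one of distance and pixel-distance -->
          <optional>
            <choice>
              <attribute name="distance"/>
              <attribute name="pixel-distance"/>
            </choice>
          </optional>
          <ref name="point"/>
        </element>
      </define>

      <define name="point">
        <element name="point">
          <attribute name="num"/>
        </element>
      </define>
    </grammar>

Under this toy grammar, a <store-projection-vector> with only an x-component attribute, a <line> with a single <point>, and a <move> carrying both distance and pixel-distance would all be rejected.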

There are many features of an Xgridfit program file that a DTD cannot check, and enough that an XML Schema cannot check to make RELAX NG the obvious choice.

RELAX NG comes in two flavors: an XML syntax and a compact syntax. Since it is possible to convert automatically from one to the other, both are provided with Xgridfit. The compact schema is named xgridfit.rnc; the XML schema is xgridfit.rng.

Using the schema: validation

Xgridfit validates all program files before compilation. To skip this step, include the --skip-validation option on the command-line:

    $ xgridfit --skip-validation myfont.xgf

By default the xgridfit executable uses xmllint (part of the libxml package) to validate program files. Error messages from xmllint can sometimes be a little puzzling. For example, if a <point> element is omitted from <move-point-to-intersection>, the output looks something like the following:

    myfont.xgf:740: element line: Relax-NG validity error :
                        Expecting element point, got line
    myfont.xgf:734: element move-point-to-intersection: Relax-NG validity error :
                        Element move-point-to-intersection failed to validate content
    myfont.xgf:691: element function: Relax-NG validity error :
                        Expecting element variant, got param
    myfont.xgf:692: element param: Relax-NG validity error :
                        Element function has extra content: param
    Relax-NG validity error : Extra element function in interleave
    myfont.xgf:691: element function: Relax-NG validity error :
                        Element xgridfit failed to validate content
    myfont.xgf fails to validate

The first message pinpoints the source of the error: xmllint was expecting a <point> element as a child of <move-point-to-intersection>, but it found something else instead. The following messages are superfluous, as each ancestor of <move-point-to-intersection> is marked as invalid and causes its own error message. The lesson here is that the first error message generated by xmllint is very likely the only one you need to pay attention to. This is all the more true because xmllint stops after finding a single error, without validating the rest of the file.

It is simple to use other validators. James Clark's Jing continues to validate even after it has found an error, and its error messages can be more intelligible than those of xmllint (though not always!). Its message for the same error is terse, free of the unhelpful clutter, but also a little less informative:

    /path/to/myfont.xgf:744:34: error: unfinished element

You can use xgfconfig (with the --validators or -V option) to make Xgridfit use the Jing processor; you must specify both its name and the location of its jar file. For example:

    $ xgfconfig -V jing#~/jing/bin/jing.jar

MSV (the Sun Multi-Schema XML Validator, released under the Apache Software License) is also highly usable. Its output is verbose but helpful:

    start parsing a grammar.
    validating Junicode-Italic.xgf
    Error at line:744, column:34 of file:///path/to/myfont.xgf
      uncompleted content model. expecting: <point>

    the document is NOT valid.

To make Xgridfit use MSV, invoke xgfconfig as follows:

    $ xgfconfig -V msv#/path/to/msv.jar

The RNV validator is fast and claims to provide intelligible error messages. To use it, run xgfconfig with the option -V rnv. Here is its output for our sample error:

    myfont.xgf
    myfont.xgf:744:4: error: unfinished content of element ^move-point-to-intersection
    required:
        element ^point
    error: some documents are invalid

To validate a file quickly without compiling it, run the xgridfit executable with the -x option, which suppresses compilation:

    $ xgridfit -x myfont.xgf

Note that the version of xmllint shipped with Mac OS X 10.5 ("Leopard") does not accept the Xgridfit schema; users of that release should choose one of the other validators instead. Mac OS X 10.6 ("Snow Leopard") does not have this problem.

Using the schema: guided editing

Some editors can validate XML documents in the background while you edit. Among the most commonly used are Emacs with nxml-mode (Free Software) and <oXygen/> (inexpensive for individuals). Emacs and nxml-mode are both in the major Linux repositories, making them simple to install. After installation, find the nxml-mode schema directory (in Ubuntu it is /usr/share/emacs/site-lisp/nxml-mode/schema/), copy xgridfit.rnc there, and edit the file schemas.xml, adding the following line:

    <documentElement localName="xgridfit" uri="xgridfit.rnc"/>
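The new line goes inside the file's root <locatingRules> element, alongside the rules already there. Stripped down to this single rule, the file would look like the following. The namespace is the one used by nxml-mode's locating-rules format, and the relative uri is resolved against the directory that contains schemas.xml, which is why xgridfit.rnc is copied into that directory.

    <?xml version="1.0"?>
    <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
      <!-- Use xgridfit.rnc for any document whose root element is <xgridfit> -->
      <documentElement localName="xgridfit" uri="xgridfit.rnc"/>
    </locatingRules>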

Then, after you have loaded an Xgridfit program file into Emacs, type Alt-x (or Meta-x), followed by nxml-mode and Return. (The mode can be loaded automatically if you know how to edit the .emacs configuration file.)

Now your editor will validate on the fly, and if you type Ctrl-Return in a variety of contexts, it will either complete the tag or attribute you are typing or offer you a list of possible completions. Errors are underlined in red, and error messages (displayed at the bottom of the editing window) are mostly clear and informative.

Guided editing with <oXygen/> (see the editor's documentation to get started) is similarly straightforward, relying more on context menus than on keyboard shortcuts. Either of these packages can save you a great deal of typing (and therefore a good bit of time).