ChessGML stands for Chess Game Markup Language. It is, to the best of my knowledge, the first serious attempt to define a chess data format based on XML, the Extensible Markup Language.
The ChessGML homepage is http://www.saremba.de/chessgml, where you will always get the most current information.
This distribution contains a prototypical implementation of an XML DTD for chess, a few sample documents and programs intended for processing ChessGML documents. The intention is to show that XML is a sound technical base for defining chess document structures and that such documents can be processed with reasonable effort.
At the moment, processing means transforming ChessGML into one of the traditional presentation formats, which include printable documents and WWW hypertext documents (HTML). Undoubtedly, there will be other formats in the future; for a format like ChessGML that stores logical and structural rather than presentational information, this will pose no problem.
The material presented in this distribution is not considered to be in a mature state in its entirety; instead, it is meant to serve as a starting point for a discussion.
This does not mean that the programs are untested or known to be unstable. They have, however, not been used or tested by anybody but the author.
Please send comments and contributions to chessgml@saremba.de. Error reports or other complaints should be sent to chessgml-bugs@saremba.de. If there is enough public interest, a mailing list would make sense.
For the implementation of an open standard it is desirable to make use of open, non-proprietary tools. Potential users should not be forced to use one particular computing platform.
I have taken care to use tools and standards that are available to a large audience. This includes users of Microsoft Windows and Unix/Linux platforms. (Unfortunately, I don't know anything about XML tools for the Mac.)
This means that everything that is presented here can be done on the platforms mentioned. At the moment, the software is implemented and tested only on Windows NT 4. I have no computer running Windows 9x and no intention to change that in the future. I do have a Linux machine, but for the time being I prefer to concentrate on conceptual and development work rather than porting. Feel invited to help with Linux (or other Unix) testing.
When reading further, you will note that RTF is the preferred format for print output for now. This is clearly both platform dependent and proprietary. But there is no reason why other output formats, in particular TeX, shouldn't be possible, and any competent help in this sector would be greatly appreciated.
Despite the current limitations, the ultimate goal is to implement a chess publishing system that uses open source tools on Unix/Linux and Windows (or whatever platforms will become popular in the future).
Due to the preliminary state of ChessGML, this material is not (yet) targeted to end users. You should have some technical understanding if you want to work with ChessGML in its present state, and you will have to acquire some XML knowledge if you want to contribute. (This will not be wasted time if you are a computer professional, because XML-based data formats already play an important role in the computer world, and their importance will increase.)
There is no introductory material on XML in this distribution or on my website, but you will find more information on the Internet than you will ever have time to read. Good starting points are http://www.oasis-open.org/cover and http://www.xml.com.
The following graphic provides a bird's-eye view of the various document formats that are used and the transformations between them.
PGN is important as a source format because presently most freely available chess games are distributed in this format.
ChessGML SAN is an XML format that puts the movetext of the games inside a thin XML wrapper providing information about tournaments, players etc. The transformation from PGN to ChessGML SAN does not have to know anything about chess because the movetext of the games is left untouched.
Usually, you will have to do some hand-editing to polish up the ChessGML SAN files: Correct player names, add dates etc. This will be easy because it's not necessary to edit redundant information.
ChessGML TAG format substitutes a pure XML representation for the moves of a chess game and adds diagrams. Transforming from SAN to TAG format involves semantic (chess) knowledge in addition to the purely syntactic (XML) knowledge the other transformations have to employ. This simply means that the software has to be able to understand SAN notation and play through a game of chess; this is necessary, for example, when a move like Nbd2 has to be interpreted and disambiguated.
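The purely syntactic half of reading a SAN move can be sketched in Java. The class below is a hypothetical illustration, not code from the ChessGML chess library: it splits a move such as Nbd2 into piece, disambiguating file or rank, capture flag, target square and promotion piece, and deliberately leaves castling out. What it cannot do is decide which knight actually moves; that requires a board position, which is exactly the chess knowledge the SAN-to-TAG transformation adds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of de.saremba.chess.lib): splits a SAN move
// into its syntactic parts. It never looks at a position, so it cannot tell
// WHICH knight "Nbd2" moves; that is the semantic step of the SAN-to-TAG work.
final class SanMove {
    // piece letter (absent for pawns), optional origin file and rank,
    // optional capture mark, target square, optional promotion, optional +/#
    private static final Pattern SAN = Pattern.compile(
            "([KQRBN])?([a-h])?([1-8])?(x)?([a-h][1-8])(?:=([QRBN]))?[+#]?");

    final char piece;           // 'P' for pawn moves
    final Character fromFile;   // disambiguating file, or null
    final Character fromRank;   // disambiguating rank, or null
    final boolean capture;
    final String target;        // target square, e.g. "d2"
    final Character promotion;  // promotion piece, or null

    private SanMove(Matcher m) {
        piece = m.group(1) == null ? 'P' : m.group(1).charAt(0);
        fromFile = m.group(2) == null ? null : m.group(2).charAt(0);
        fromRank = m.group(3) == null ? null : m.group(3).charAt(0);
        capture = m.group(4) != null;
        target = m.group(5);
        promotion = m.group(6) == null ? null : m.group(6).charAt(0);
    }

    // Castling (O-O, O-O-O) is deliberately not covered by this sketch.
    static SanMove parse(String san) {
        Matcher m = SAN.matcher(san);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a non-castling SAN move: " + san);
        }
        return new SanMove(m);
    }
}
```

For example, SanMove.parse("Nbd2") yields piece N, origin file b and target square d2; finding the knight on the b-file that can legally reach d2 is the job of a position-aware move generator.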
RTF and HTML are two formats that are in common use today, one for printable and the other for online documents. It's common practice to prove that you can generate the traditional output formats automatically when you define a new (and supposedly better) format. The real benefit will only come later when there are generally available tools that can directly use the new (ChessGML) format. We are still far away from this point, but the day will come...
There is always the problem of what to include in a distribution: Should you include derived files (compiled programs, results of processing) or not? Should you include resources that are not easily available, but don't belong to the distribution in a strict sense? I have chosen the following approach:
The file chessgml.tgz contains the distribution in the strict sense, i.e. without files that can be derived from the original files. (There are some small exceptions, though.)
The file chessgml-full.tgz contains the same files as chessgml.tgz plus the following:
The compiled class files for the Java programs in the java directory. You would need Sun's Java SDK 1.2 in order to compile them yourself. With the full distribution, you only need the Java 1.2 Runtime Environment. (The Microsoft JVM will not suffice!)
The sample files in the documents subdirectory, transformed to RTF. You need James Clark's DSSSL engine Jade to perform this transformation yourself.
The sample files in the documents subdirectory, transformed to HTML. You need James Clark's XSL engine XT to perform this transformation yourself.
This section gives a short description for each one of the directories of the ChessGML distribution, in alphabetical order.
The documents directory contains a few sample documents in XML format, in particular a commented game (Rubinstein – Teichmann, Karlsbad 1907) and a complete set of uncommented games from a recent tournament (Corus 2000). These files are used to test the programs in the dsssl, xsl and java directories.
The full distribution also has the transformed files in RTF and HTML formats.
The dsssl directory contains a set of DSSSL scripts that are used to transform the ChessGML files to a printable format. RTF is used because Winword is accessible to most people, but there would be other possibilities like MIF (FrameMaker) or TeX.
The dtd directory contains the DTD (Document Type Definition) for ChessGML files, consisting of several modules.
The homepage directory contains the files that make up the ChessGML homepage. Some of them are written in XML format and then transformed to HTML.
The homepage/standards/pgn subdirectory also contains the PGN standard, tagged as Docbook XML and transformed to RTF and HTML.
The java directory contains two ChessGML tools and the chess library these tools are based on.
The resources directory has the sets of chess pieces that are needed to view the sample files in RTF and HTML format. There are three TrueType fonts and one set of GIF files.
The scripts directory contains some wrapper scripts that show how to call the programs in the distribution.
The xsl directory contains XSL (Extensible Stylesheet Language) scripts that transform ChessGML files, for example to HTML or to PGN.
The distribution unpacks into a directory named chessgml. You must set an environment variable named CHESSGML_DIR to the full pathname of this directory. Use the backslash (\) as the directory separator; although Windows NT accepts slashes in many places, some old commands used in the scripts (like move) will fail if you use a slash (/).
The following tools are needed if you want to work with the ChessGML samples.
As a minimum, you need the viewing tools for the files derived from the samples.
For the HTML files, obviously a browser is required; only Netscape 4.6 and Internet Explorer 5 were tested. The support for Cascading Style Sheets is known to be better in IE5.
For the RTF files, Microsoft's free (Gates-free, that is, not Stallman-free) Winword viewer is sufficient if you don't want to edit the files. No tests with Office 2000 have been done. Of the other RTF-aware office products, only StarOffice was tested, and it failed miserably.
If you want to transform files from PGN to ChessGML (SAN) or from ChessGML SAN to TAG format, you have to use one of the Java applications. You need Sun's Java Runtime Environment or, if you want to change and recompile the programs, the Java SDK. Version 1.2 is required, available at http://java.sun.com.
Transforming the ChessGML TAG format to RTF with the DSSSL scripts requires James Clark's Jade. You can either get the original from http://www.jclark.com or OpenJade which is developed by a group of volunteers (see http://openjade.sourceforge.net).
Transforming ChessGML TAG format to HTML via the XSL scripts requires an XSL engine. Again, I used a James Clark tool (XT, available at http://www.jclark.com/xml/xt.html), but there are alternatives. I intend to use the Apache XML Project's Xalan (http://xml.apache.org/xalan) in the future.
For the transformation from ChessGML SAN to TAG format, the Java program CgmlSan2Tag.java uses the Apache XML Project's Xerces parser, available at http://xml.apache.org/xerces. You will have to set your CLASSPATH environment variable accordingly; see Sun's JRE documentation for this topic.
The documents directory contains four ChessGML sample files in its xml subdirectory:
Rubinstein-Teichmann.xml is a game from Karlsbad 1907 with comments by Georg Marco from the tournament book. Marco died in 1923, so his writings are no longer under copyright. (This applies to the complete book as his co-author, Carl Schlechter, died in 1918. Wouldn't this great book make a prime candidate for the first ChessGML-ified book? But remember it's written in German, so it would be nice if a native English speaker could translate it.)
You will notice that the notes to this game are rather short and without lengthy variations. That's exactly why I chose this game: editing ChessGML files with a text editor is a real pain. Where is the volunteer who writes a good tool for editing games that can export ChessGML files?
This file is intended to serve as an example to show that games with comments in ChessGML can be automatically transformed to something that looks good in print. The file Rubinstein-Teichmann.rtf in the rtf subdirectory is the result of the transformation with the DSSSL script chess.dsl; see the description of the wrapper script cgmlc2rtf.bat. This file has been left as is, without any manual post-editing.
You need the font Chess Leipzig installed on your computer to see the diagrams; this font is included in the distribution (in the resources/fonts directory).
2000-corus.san.xml is the complete Corus tournament, played in Wijk aan Zee in January 2000. There are no comments, and all the games are in SAN (Standard Algebraic Notation) format. This file (including the tournament crosstable and the progressive table) was produced automatically from a PGN file by the program Pgn2Cgml.java, followed by some manual editing of the player names.
The full distribution also contains the following derived files; all the files have been left as the automatic transformation has produced them, without any manual post-editing.
2000-corus.tag.xml in the xml subdirectory is the same tournament with all the games in ChessGML TAG format; this means that the games are given not in SAN but with every move tagged as an XML element. This file has been produced from 2000-corus.san.xml by the program CgmlSan2Tag.java.
2000-corus.rtf in the rtf subdirectory is an RTF file produced from 2000-corus.tag.xml by applying the DSSSL script wildhagen.dsl. See the description of the wrapper script cgmlt2rtf.bat.
Just in case you cannot make sense of the DSSSL script's name: Eduard Wildhagen was a publisher in Hamburg, Germany, who some forty years ago created a series called “Weltgeschichte des Schachs” (World History of Chess), mostly containing game collections of great players. The games were printed with small diagrams, one after every five moves, which made them easy to follow even without a board. I have adopted this style of printing for my sample DSSSL scripts.
You need the font Chess Cases installed on your computer to see the diagrams in this document; the font is included in the distribution (in the resources/fonts directory).
The documents/html/2000-corus subdirectory contains the tournament tables (crosstable and round by round table) and all the games in HTML format. These files have been produced from 2000-corus.tag.xml by applying the XSL script tournament.xsl; see the description of the wrapper script cgmlt2html.bat.
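As an illustration of what a crosstable contains, here is a small Java sketch, assuming games are available as (White, Black, result) triples with PGN-style results. The Crosstable and Game classes are hypothetical and do not reflect how tournament.xsl or wildhagen.dsl compute their tables; they only show the arithmetic: 1 point for a win, ½ for a draw, and one cell per pair of players.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the arithmetic behind a round-robin crosstable;
// it is not how tournament.xsl or wildhagen.dsl compute their tables.
final class Crosstable {
    static final class Game {
        final String white;
        final String black;
        final String result; // PGN-style result: "1-0", "0-1" or "1/2-1/2"

        Game(String white, String black, String result) {
            this.white = white;
            this.black = black;
            this.result = result;
        }
    }

    // Points scored by White; Black scores 1 minus this value.
    static double whitePoints(String result) {
        switch (result) {
            case "1-0": return 1.0;
            case "0-1": return 0.0;
            case "1/2-1/2": return 0.5;
            default: throw new IllegalArgumentException("no final result: " + result);
        }
    }

    // Total score per player, in order of first appearance.
    static Map<String, Double> totals(List<Game> games) {
        Map<String, Double> total = new LinkedHashMap<>();
        for (Game g : games) {
            double w = whitePoints(g.result);
            total.merge(g.white, w, Double::sum);
            total.merge(g.black, 1.0 - w, Double::sum);
        }
        return total;
    }

    // One crosstable cell: points of 'row' against 'col', or null if they did not meet.
    // Summing over all games also covers double round robins.
    static Double cell(List<Game> games, String row, String col) {
        double sum = 0;
        boolean met = false;
        for (Game g : games) {
            double w = whitePoints(g.result);
            if (g.white.equals(row) && g.black.equals(col)) {
                sum += w;
                met = true;
            } else if (g.black.equals(row) && g.white.equals(col)) {
                sum += 1.0 - w;
                met = true;
            }
        }
        return met ? Double.valueOf(sum) : null;
    }
}
```

A full crosstable is then just cell(games, row, col) for every pair of players, with the totals in the last column.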
loyd.xml is a file with some problems by Sam Loyd. The only purpose of this little sample is to show that problems can be encoded in ChessGML with the same ease as games. So far, there are no DSSSL or XSL scripts to transform this file to RTF or HTML; implementing them is left as an exercise to the interested reader. (Ha! I have wanted to write this sentence ever since I was a student of mathematics.)
studies.xml is a file with a few studies in ChessGML format. The same comments as for the Loyd problems apply here.
The name DSSSL has been mentioned several times now, and you might wonder what this funny acronym means. The short answer is “Document Style Semantics and Specification Language”, which is certainly a lot more comprehensible than the five letter word, isn't it?
The long answer is that defining a standard for describing the structure of documents (SGML, XML) is not enough – a standard for producing some readable format from an SGML/XML document is equally desirable. DSSSL defines what style means in the context of structured documents, and it defines a language to transform an SGML/XML document into a readable format. (In fact, it consists of two languages, a style language and a transformation language.)
When DSSSL was accepted as an ISO standard in 1997, it was intended to supersede all the proprietary styling languages that various vendors had created, but it hasn't become very popular since then, with either vendors or users. It's based on Scheme, a language from the Lisp family, which makes it attractive to computer scientists but arcane for the rest of mankind. And there is little tutorial or reference material other than the ISO standard document, which makes it difficult to learn.
So, does it make sense to learn DSSSL? Perhaps not from a practical point of view, because there is already a successor named XSL (Extensible Stylesheet Language), being defined by the W3C. Certainly yes if we are talking about the basic ideas DSSSL is based on: the view of an SGML/XML document as a grove (a generalized tree) and the processing model. XSL is based on the same ideas that have been developed for DSSSL, but it uses a syntax that looks more accessible to most people. So, if you try to understand the DSSSL scripts, this will not be wasted time, as long as you look past the syntax and grasp the underlying ideas.
The dsssl directory contains several files because the scripts are modularized. The structure is as follows:
There are two “main” scripts for two different purposes:
wildhagen.dsl implements the generation of a complete tournament book in Wildhagen style. This means generation of a tournament table (crosstable for a round-robin event, round-by-round table for a Swiss system event), then all the rounds with all the games in long algebraic notation with a diagram inserted after every n moves. Usually you will use n=5, but other values are possible.
Note that the DSSSL script doesn't know anything about chess and its rules, so it will neither be able to output the moves in long algebraic notation if the ChessGML file contains them in SAN notation, nor will it be able to generate diagrams; they have to be in the ChessGML file as diagram elements. The DSSSL scripts just translate the abstract diagrams to characters in a diagram font!
For a sample, see the file 2000-corus.tag.xml.
chess.dsl is the DSSSL script that translates a commented chess game to print format. It was used to transform the afore-mentioned game Rubinstein-Teichmann.xml to RTF.
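diagram.dsl implements the general logic of translating the abstract ChessGML diagrams. The character to which each individual piece is translated depends on the diagram font, so the font-specific files dia-xxx.dsl take care of that.
But how do you specify which one of these files to use? This is done by a typical SGML mechanism, entities and PUBLIC identifiers. Look at the lines

    <!ENTITY user-chessfont.dsl PUBLIC "-//user//NOTATION chessfont//EN">
    &user-chessfont.dsl;

The first line declares an entity by its PUBLIC identifier only; the second line includes it. Which file the PUBLIC identifier stands for, i.e. which dia-xxx.dsl file is actually used, is decided by the catalog file in effect (the scripts cgmlc2rtf.bat and cgmlt2rtf.bat take its name as a parameter; see the catalog files in dsssl/customize).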
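The way a PUBLIC identifier is mapped to a concrete file can be sketched in Java. This is a minimal, hypothetical reader for SGML Open catalog entries of the form PUBLIC "public-id" "file", far less complete than the resolver built into Jade. The file names dia-leipzig.dsl and dia-cases.dsl used below are invented examples of the dia-xxx.dsl pattern, not files taken from the distribution.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of SGML Open catalog lookup for PUBLIC identifiers.
// Only single-line PUBLIC entries are recognized; comments and all other
// entry types (SYSTEM, DELEGATE, CATALOG, ...) are skipped.
final class CatalogSketch {
    private static final Pattern PUBLIC_ENTRY =
            Pattern.compile("^\\s*PUBLIC\\s+\"([^\"]+)\"\\s+\"?([^\"\\s]+)\"?");

    // Returns a map from PUBLIC identifier to the file it resolves to.
    static Map<String, String> read(Reader in) {
        Map<String, String> publicIds = new HashMap<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = PUBLIC_ENTRY.matcher(line);
                if (m.find()) {
                    // Keep the first entry seen for an identifier.
                    publicIds.putIfAbsent(m.group(1), m.group(2));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return publicIds;
    }
}
```

Given one catalog containing PUBLIC "-//user//NOTATION chessfont//EN" "dia-leipzig.dsl" and another containing the same identifier with "dia-cases.dsl", the same DSSSL source resolves to a different font file depending on which catalog is used, without any change to the scripts themselves.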
gentext.dsl implements the general logic for mapping terms (for example, “white-resigned”) to a language-specific form (“White resigned” in English, “Weiß gab auf” in German, etc.). This is done by exactly the same mechanism as the one explained before.
The rest of the DSSSL files in the dsssl directory implement those parts of the transformation logic that are used by both wildhagen.dsl and chess.dsl.
The dsssl/customize subdirectory contains not only the catalog files already mentioned, but also some DSSSL files. These are just wrappers for the “real” DSSSL scripts; they override variable values used in the transformation scripts. (Note that the naming convention %varname% is not enforced by DSSSL or Jade; it is merely used to indicate that this is a variable intended for customization.) cgmlt2rtf.bat shows how to use such a customization script.
Using a pre-built customization script is nice, but a graphical user interface for playing around with the customizable values would be even nicer. Pomade (Poor Man's DSSSL Environment), implemented in Tcl/Tk, is a tool for providing this facility, and the dsssl/pomade subdirectory contains driver files for Pomade.
The dsssl/jade subdirectory has a few files from the Jade distribution; they are not strictly necessary here, but they make using Jade easier for you.
The dtd directory contains the prototype Document Type Definition for ChessGML. It's modularized into several files, with the file chess.dtd containing only a few definitions and the inclusion of several modules. This uses the entity technique described above; here, the entities are mostly defined by SYSTEM identifiers containing a filename.
Some tools have difficulties with modularized DTDs, and most XML tools don't know about PUBLIC identifiers. For this reason, you should normally have a file named chess.dtd containing the complete DTD in the same directory as the ChessGML file you are processing. The batch file xdtdflat.bat “flattens” the DTD, which means that all external entities referenced in chess.dtd are copied into the DTD. The resulting file chess.flat.dtd may be copied to the document directory as chess.dtd. (This is already done before you get the distribution, which is the exception mentioned above. See the description of dtdflat.pl.)
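The idea behind flattening can be sketched in a few lines of Java. This hypothetical DtdFlattener handles only the simplest case: parameter entities declared as <!ENTITY % name SYSTEM "file"> in the same text where they are referenced. It does not handle PUBLIC identifiers, conditional sections or references to entities declared in an enclosing module, so it is no substitute for spam.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, much simplified DTD flattener: inlines external parameter
// entities declared with a SYSTEM identifier. Not a replacement for spam.
final class DtdFlattener {
    private static final Pattern EXTERNAL_PE = Pattern.compile(
            "<!ENTITY\\s+%\\s+([A-Za-z][\\w.-]*)\\s+SYSTEM\\s+\"([^\"]+)\"\\s*>");
    private static final Pattern PE_REF = Pattern.compile("%([A-Za-z][\\w.-]*);");

    // loader maps a system identifier (file name) to the module's text
    static String flatten(String dtd, Function<String, String> loader) {
        return flatten(dtd, loader, 0);
    }

    private static String flatten(String dtd, Function<String, String> loader, int depth) {
        if (depth > 32) {
            throw new IllegalStateException("modules nested too deeply (cyclic inclusion?)");
        }
        // 1. Collect the external parameter entities declared in this text.
        Map<String, String> external = new HashMap<>();
        Matcher decl = EXTERNAL_PE.matcher(dtd);
        while (decl.find()) {
            external.putIfAbsent(decl.group(1), decl.group(2)); // first declaration wins
        }
        // 2. Drop those declarations; the flat DTD no longer needs them.
        String body = EXTERNAL_PE.matcher(dtd).replaceAll("");
        // 3. Replace each reference to an external entity by the flattened module text.
        Matcher ref = PE_REF.matcher(body);
        StringBuffer out = new StringBuffer();
        while (ref.find()) {
            String systemId = external.get(ref.group(1));
            String text = systemId == null
                    ? ref.group()                                        // internal entity: keep
                    : flatten(loader.apply(systemId), loader, depth + 1); // external: inline
            ref.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        ref.appendTail(out);
        return out.toString();
    }
}
```

For instance, a DTD containing <!ENTITY % pieces SYSTEM "pieces.mod"> followed by %pieces; (pieces.mod being a made-up module name) ends up with the text of pieces.mod in place of the reference, and the declaration itself disappears.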
The homepage directory contains the XML sources for the HTML files in the ChessGML homepage. They are tagged against the Docbook XML DTD; see Norman Walsh's website for details. Also included are the generated HTML files, because you are not expected to have the Docbook DSSSL stylesheets, which can also be obtained from Norman Walsh.
In the homepage/standards/pgn subdirectory, there is an XML version of the PGN standard document.
The java directory contains not the C++ sources you might have expected here but a few Java programs and a chess library. All the files are a bit buried in a directory hierarchy that follows Sun's recommendations for building unique package names. The first part of the directory hierarchy is the author's domain name reversed (saremba.de becomes de/saremba).
java/de/saremba/chess/lib contains the sources of the chess library. The implementation is rather straightforward and not very sophisticated; you should not build a chess-playing program on top of it. In fact, it is designed for extensibility rather than efficiency; in particular, the representation of the board uses a simple numbering scheme that will allow for larger boards. (My plan is to implement Janus Chess on an 8x10 board in the next release.)
java/de/saremba/chess/app contains two applications built on top of the chess library:
Pgn2Cgml.java parses a PGN file and converts it to ChessGML, either in SAN or TAG format. You have to tell the program whether the file is a tournament (TOURNAMENT or SWISS) or just a simple game collection. (You can also choose PLAYER, but this case isn't yet handled more intelligently.) This will be extended in the future because I expect PGN files to be the primary source for ChessGML files.
The PGN parser has not received much work yet. It expects standard PGN without comments and idiosyncrasies; if you have a PGN file that does not meet these requirements, normalize it with David Barnes' fine utility extract.
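To illustrate what a board numbering that allows for larger boards might look like, here is a hypothetical Java sketch; it is not the scheme actually used in java/de/saremba/chess/lib. Squares are numbered rank * files + file (both 0-based), so the same code serves the 8x8 board and the 10-file, 8-rank Janus Chess board.

```java
// Hypothetical square-numbering scheme for rectangular boards (not the one
// used in de.saremba.chess.lib): index = rank * files + file, both 0-based.
final class BoardGeometry {
    final int files; // 8 for orthodox chess, 10 for the 8x10 Janus Chess board
    final int ranks; // 8 in both cases

    BoardGeometry(int files, int ranks) {
        if (files < 1 || files > 26 || ranks < 1) {
            throw new IllegalArgumentException("unsupported board size");
        }
        this.files = files;
        this.ranks = ranks;
    }

    boolean onBoard(int file, int rank) {
        return file >= 0 && file < files && rank >= 0 && rank < ranks;
    }

    int index(int file, int rank) {
        if (!onBoard(file, rank)) throw new IllegalArgumentException("off board");
        return rank * files + file;
    }

    // Algebraic name of a square, e.g. "e4" or, on a 10-file board, "j8".
    String name(int index) {
        if (index < 0 || index >= files * ranks) throw new IllegalArgumentException("bad index");
        return "" + (char) ('a' + index % files) + (index / files + 1);
    }

    int parse(String square) {
        int file = square.charAt(0) - 'a';
        int rank = Integer.parseInt(square.substring(1)) - 1;
        return index(file, rank);
    }

    // Step by (df, dr) from a square; -1 if the step leaves the board.
    int step(int index, int df, int dr) {
        int file = index % files + df;
        int rank = index / files + dr;
        return onBoard(file, rank) ? index(file, rank) : -1;
    }
}
```

Steps are computed on file/rank coordinates rather than by adding offsets to the index, because with this numbering a step to the right from the h-file would otherwise wrap around to the a-file of the next rank. Padded layouts such as the classical 10x12 mailbox avoid the check with sentinel squares, at the cost of tying the array layout to one board size.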
CgmlSan2Tag.java transforms a ChessGML (tournament) file from SAN to TAG format. This program is based on the chess library (for the semantic aspects, i.e. playing through the chess games and generating the tagged moves and perhaps the diagrams) and on the Apache XML Project's Xerces parser for the syntactic aspects (i.e. parsing the XML file).
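The kind of input Pgn2Cgml.java expects can be illustrated with a small Java sketch for reading one game of clean PGN: tag pairs such as [Event "..."] followed by movetext. The class is hypothetical, not the parser in java/de/saremba/chess/app; it assumes there are no comments, variations or other idiosyncrasies (normalize the file with extract first). The SAN moves are collected as plain strings without any chess knowledge, just as the PGN-to-ChessGML SAN step leaves the movetext untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of reading ONE game of "clean" PGN (no comments,
// variations, NAGs or escape lines), i.e. the kind of input extract produces.
final class PgnGameSketch {
    // [Name "value"] where the value may contain \" and \\ escapes
    private static final Pattern TAG =
            Pattern.compile("^\\[(\\w+)\\s+\"((?:[^\"\\\\]|\\\\.)*)\"\\]$");
    private static final List<String> RESULTS = Arrays.asList("1-0", "0-1", "1/2-1/2", "*");

    final Map<String, String> tags = new LinkedHashMap<>();
    final List<String> moves = new ArrayList<>(); // SAN moves, left untouched
    String result = "*";

    static PgnGameSketch parse(String pgn) {
        PgnGameSketch game = new PgnGameSketch();
        StringBuilder movetext = new StringBuilder();
        for (String rawLine : pgn.split("\\R")) {
            String line = rawLine.trim();
            Matcher tag = TAG.matcher(line);
            if (tag.matches()) {
                String value = tag.group(2).replaceAll("\\\\(.)", "$1"); // unescape \" and \\
                game.tags.put(tag.group(1), value);
            } else {
                movetext.append(line).append(' ');
            }
        }
        for (String token : movetext.toString().trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            if (RESULTS.contains(token)) {
                game.result = token; // game termination marker
            } else {
                // strip move numbers: "1.", "12...", or glued forms like "1.e4"
                String san = token.replaceFirst("^\\d+\\.+", "");
                if (!san.isEmpty()) game.moves.add(san);
            }
        }
        return game;
    }
}
```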
java/de/saremba/xml/helpers contains a few XML modules that are used by the applications described before.
The resources directory contains some files that are not strictly necessary but make it easier to work with the ChessGML distribution out of the box.
The fonts directory contains three TrueType fonts with chess pieces: Leipzig and Cases by Armando Hernandez Marroquin and Berlin by Eric Bentzen. They are referenced by the DSSSL scripts in the dsssl directory and are needed for printing the RTF files produced from ChessGML.
The pieces directory contains a set of chess pieces as GIF files (35x35 pixels). They have the names that are used by the XSL script that transforms ChessGML to HTML. (Each diagram is built as a table of 64 individual images, not as one image.)
The scripts directory contains several scripts, mostly written as DOS/Windows batch files. If you have ever seen a command shell like GNU bash, you know that the DOS/Windows batch “language” is primitive and unaesthetic, but it's available on every Microsoft system. Sigh.
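Building a diagram from 64 individual images can be sketched in Java as follows. For brevity, this hypothetical HtmlDiagram class takes a FEN piece-placement string rather than a ChessGML diagram element, and the GIF file names it produces (wk-light.gif, empty-dark.gif and so on) are invented; the real names are whatever the XSL script in the xsl directory expects.

```java
// Hypothetical sketch: renders a FEN piece placement as an HTML table of
// 64 images (35x35 pixels), one <img> per square. The GIF names are invented.
final class HtmlDiagram {
    static String toHtml(String fenPlacement) {
        String[] rows = fenPlacement.split("/");
        if (rows.length != 8) {
            throw new IllegalArgumentException("expected 8 ranks: " + fenPlacement);
        }
        StringBuilder html = new StringBuilder(
                "<table class=\"diagram\" cellspacing=\"0\" cellpadding=\"0\">\n");
        for (int i = 0; i < 8; i++) {
            int rank = 7 - i; // FEN lists rank 8 first, like a printed diagram
            int file = 0;
            html.append("<tr>");
            for (char c : rows[i].toCharArray()) {
                if (c >= '1' && c <= '8') {
                    for (int n = c - '0'; n > 0; n--) appendSquare(html, ' ', file++, rank);
                } else {
                    appendSquare(html, c, file++, rank);
                }
            }
            if (file != 8) {
                throw new IllegalArgumentException("rank " + (rank + 1) + " does not have 8 squares");
            }
            html.append("</tr>\n");
        }
        return html.append("</table>\n").toString();
    }

    private static void appendSquare(StringBuilder html, char piece, int file, int rank) {
        boolean light = (file + rank) % 2 == 1; // a1 (0 + 0) is a dark square
        html.append("<td><img src=\"").append(imageName(piece, light))
            .append("\" width=\"35\" height=\"35\" alt=\"\"></td>");
    }

    // ' ' = empty square; uppercase = White, lowercase = Black (FEN convention)
    static String imageName(char piece, boolean light) {
        String square = light ? "light" : "dark";
        if (piece == ' ') return "empty-" + square + ".gif";
        String color = Character.isUpperCase(piece) ? "w" : "b";
        return color + Character.toLowerCase(piece) + "-" + square + ".gif";
    }
}
```

Separate images per square need a separate GIF for every piece on both square colors, but in exchange no image has to be generated per diagram; the browser caches the small set of piece images.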
The parameters expected by the scripts are described in every individual script file. You will find examples in the file init.bat in the distribution's top-level directory.
cgmlc2rtf.bat transforms a ChessGML file in TAG format containing a commented game to RTF. You may specify the name of a catalog file that selects a language and a chess font (take one from the dsssl/customize directory).
CgmlSan2Tag.bat is a wrapper for calling the Java program that transforms a ChessGML file from SAN to TAG format. This is necessary if you want to output a ChessGML file in print or HTML format, because the SGML/XML tools don't know anything about the rules of chess. Running CgmlSan2Tag.bat replaces the moves in Standard Algebraic Notation by moves in ChessGML tagged format. You specify (as the third parameter) the number of halfmoves after which a diagram is inserted; a value of 0 means that no diagrams are generated.
cgmlt2html.bat transforms a ChessGML file in tagged format to HTML by feeding an XSLT script (specified as the third parameter) to James Clark's XT engine. This is currently only tested for tournament files with uncommented games; comments will be handled in a future release.
cgmlt2rtf.bat transforms a ChessGML file in tagged format containing a tournament with uncommented games to RTF. In addition to the name of the ChessGML file to be transformed you must also specify the name of the DSSSL script to be used (preferably one of those in dsssl/customize) and the name of a catalog file that selects a language and a chess font (take one from the same directory).
compile.bat is a script that forces recompilation of all the Java sources. Remember you need a Java SDK from Sun, Version >= 1.2 for compiling.
dtdflat.pl is a Perl script that calls spam (SP Add Markup) from James Clark's SP toolkit to flatten the chess DTD, which means that all the modules referenced in the DTD are included to produce a complete DTD file. You don't need Perl installed on your computer, however; the flattened DTD file is already included in the distribution. But if you want to experiment with editing the modules, you can make use of this script. (Do it indirectly, via xdtdflat.bat in the dtd directory!)
pgn2cgml.bat is a wrapper script for transforming a PGN file to ChessGML. It does little more than calling the Java program Pgn2Cgml.java with the correct parameters. You should give it a hint what type of file the PGN is (TOURNAMENT or SWISS). With the 4th parameter you can specify whether you want ChessGML SAN (value < 0) or TAG format (value >=0) and, in the latter case, after how many halfmoves you want diagrams to be generated (value 0 means no diagrams).
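The documented meaning of the diagram parameter (the fourth parameter of pgn2cgml.bat, the third of CgmlSan2Tag.bat) can be summarized in a short Java sketch. The DiagramOption class is hypothetical, and the placement rule it uses, a diagram after every halfmove that is a multiple of the interval, is an assumption rather than a description of what the actual programs do.

```java
// Hypothetical sketch of the documented meaning of the diagram parameter.
// The exact rule for where diagrams are placed is an assumption.
final class DiagramOption {
    enum Format { SAN, TAG }

    final Format format;
    final int interval; // halfmoves between diagrams; 0 = no diagrams

    private DiagramOption(Format format, int interval) {
        this.format = format;
        this.interval = interval;
    }

    // value < 0: ChessGML SAN; value >= 0: TAG format with that diagram interval
    static DiagramOption fromArgument(String arg) {
        int value = Integer.parseInt(arg.trim());
        return value < 0 ? new DiagramOption(Format.SAN, 0)
                         : new DiagramOption(Format.TAG, value);
    }

    // Assumed rule: a diagram after every halfmove (1-based) that is a multiple of the interval.
    boolean diagramAfter(int halfmove) {
        return format == Format.TAG && interval > 0 && halfmove > 0 && halfmove % interval == 0;
    }
}
```

Under this assumption, the Wildhagen style of one diagram every five moves corresponds to an interval of 10 halfmoves, since each move consists of a White and a Black halfmove.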
validate.bat is a very simple wrapper around James Clark's validating parser nsgmls. Use it if you want to validate a ChessGML file from the command line or from a script. (You can also do this with Internet Explorer 5, but don't try it with large files like 2000-corus.tag.xml in ChessGML TAG format. I tried it out of curiosity and killed IE5 halfway through, when it had eaten 300 MB of virtual memory.)
The xsl directory contains a few XSL scripts. XSLT scripts, to be exact, because the Transformation part of the three-part XSL standard has been published as a W3C Recommendation, while the Formatting part is still a Working Draft, with heavy changes between revisions. Although the Last Call was issued on March 27th, 2000, I have decided not to invest my time in a moving target and to concentrate on things that are stable and reliable.
For a clever chess/XSL hack, see http://www.renderx.com/chess.html. (Clever because they managed to implement some of the semantic aspects, such as move interpretation and chessboard representation, in XSL; hack because they took a lot of liberties with SAN notation and avoided en passant captures.)
The scripts are mostly for demonstrating how easy it is to transform ChessGML files to other formats. The most interesting one, from a practical point of view, is certainly tournament.xsl.
cgml2pgn.xsl transforms a ChessGML file in SAN format to PGN. It's not surprising that this is considerably easier than the transformation from PGN to ChessGML, because a) there's no need to parse the XML file “by hand”, since XSL handles all the dirty little details for you, and b) a transformation from PGN to ChessGML adds information, while the reverse throws something away.
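For comparison with cgml2pgn.xsl, the PGN-writing side of such a conversion can be sketched in Java. The PgnWriterSketch class is hypothetical; it writes the Seven Tag Roster in its standard order (using "?" and "????.??.??" for unknown values), then the remaining tags sorted by name, then numbered movetext. It assumes the game starts with a White move and does not wrap long movetext lines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the PGN side of a ChessGML-to-PGN conversion;
// cgml2pgn.xsl itself is an XSLT stylesheet, not Java.
final class PgnWriterSketch {
    private static final List<String> SEVEN_TAG_ROSTER =
            Arrays.asList("Event", "Site", "Date", "Round", "White", "Black", "Result");

    static String write(Map<String, String> tags, List<String> sanMoves, String result) {
        StringBuilder out = new StringBuilder();
        // Seven Tag Roster first, in its fixed order; "?" marks unknown values.
        for (String name : SEVEN_TAG_ROSTER) {
            String value = name.equals("Result") ? result : tags.get(name);
            if (value == null) value = name.equals("Date") ? "????.??.??" : "?";
            appendTag(out, name, value);
        }
        // Remaining tags afterwards, sorted by name in this sketch.
        Map<String, String> rest = new TreeMap<>(tags);
        rest.keySet().removeAll(SEVEN_TAG_ROSTER);
        for (Map.Entry<String, String> e : rest.entrySet()) {
            appendTag(out, e.getKey(), e.getValue());
        }
        out.append('\n');
        // Movetext: a move number before every White move; no line wrapping.
        for (int i = 0; i < sanMoves.size(); i++) {
            if (i % 2 == 0) out.append(i / 2 + 1).append(". ");
            out.append(sanMoves.get(i)).append(' ');
        }
        return out.append(result).append('\n').toString();
    }

    // PGN strings escape backslash and double quote with a backslash.
    private static void appendTag(StringBuilder out, String name, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        out.append('[').append(name).append(" \"").append(escaped).append("\"]\n");
    }
}
```

The loss of information mentioned above is visible here: anything in a ChessGML file that has no PGN tag or movetext counterpart simply has nowhere to go.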
chessgml.css is not an XSL file, obviously, but a Cascading Style Sheet. It is used for making the HTML files generated by tournament.xsl look a bit nicer.
extract-games.xsl extracts the games from a ChessGML tournament file. You shouldn't try to do this by hand-editing with a text editor, because some information is contained only once in the tournament file and referenced from each game. This script collects the necessary information and puts it in every game file.
identity.xsl is the identity transform, but it does a bit more than just “copy a b”. All the attributes in the input file that have default values in the DTD are written explicitly to the output file (this is sometimes called attribute normalization). The encoding of the output is set to ISO-8859-1, although this feature is not yet supported by all XSL engines (including XT).
strip-dia.xsl is just a slight modification of identity.xsl (3 lines added), yet it makes all diagrams disappear from a ChessGML TAG file.
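What identity.xsl and strip-dia.xsl do together can be approximated in Java with the standard JAXP API: a Transformer created without a stylesheet performs an identity transform, and removing elements from the DOM first makes them disappear from the output. This is a hypothetical sketch; the element name diagram is an assumption about the ChessGML DTD, and unlike identity.xsl it makes no attempt at attribute normalization. For real ChessGML files the parser will normally try to read the DTD named in the DOCTYPE, so chess.dtd should be in place.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class StripElements {
    // Copies an XML document unchanged except that every element with the
    // given name (and its content) is removed; output encoding is ISO-8859-1.
    static String strip(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getElementsByTagName is a live list: remove from the end so indices stay valid.
            NodeList hits = doc.getElementsByTagName(elementName);
            for (int i = hits.getLength() - 1; i >= 0; i--) {
                Node hit = hits.item(i);
                hit.getParentNode().removeChild(hit);
            }
            // A Transformer created without a stylesheet performs the identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process document", e);
        }
    }
}
```

Like strip-dia.xsl, this is a small change to an identity copy: the copying logic stays generic, and only the removal loop knows about diagrams.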
tournament.xsl is the most useful of this small collection of XSL scripts, transforming a ChessGML tournament file in TAG format to HTML: one file with tournament tables plus one file per game, hyperlinked from the table entries. See the batch script cgmlt2html.bat in the scripts directory.