Talk:TaNaKh XML to XHTML Conversion Demonstration

CUT TECH INFO

A note on the CSS
If you don't like the way the webpage looks, and think you can do better, feel free to play with the CSS style sheet and get back to us! The XHTML is produced using the JLP Compiler App(let), which is explained below.

Some things to look for
The archival format represents the semantic structure of the text. The CSS styles it for output. To see how differences in textual structure can be represented visually in the applet, compare Genesis 1 (בְּרֵאשִׁית א), represented as prose; Psalm 92 (תְּהִילִּים צב), represented as poetry; and Exodus 15 (שְׁמוֹת טו), which mixes prose and poetry.

Techno-speak
We have reached a milestone in our project. Efraim has developed XSL stylesheets that can transform the raw data in our database to XHTML. The XHTML actually used is a small subset of the XHTML standard, which we are calling muXHTML, or micro-XHTML, for now. The plan is to rely on the XHTML box model so that the markup can easily be transformed further to PDF and other formats (through some other tool chain). The XHTML consists mainly of heavily classed div tags. Producing proper documents from this XHTML (anything more than a simple web page printout) requires a far more advanced version of CSS, one which no free software we know of supports. For now it is a good demonstration, and it can be used for online and other computer applications.
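To make the "XML in, heavily classed divs out" idea concrete, here is a minimal sketch using the XSLT processor built into the JDK (JAXP). It is not Efraim's stylesheet: the element names `chapter` and `verse`, the class names, and the `MuXhtmlSketch` class are all made up for illustration, and the real database markup is certainly richer than this.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MuXhtmlSketch {
    // Hypothetical source markup; the real JLP raw data format is not shown here.
    static final String SOURCE =
        "<chapter n=\"1\"><verse n=\"1\">In the beginning</verse>"
      + "<verse n=\"2\">And the earth was</verse></chapter>";

    // Hypothetical XSLT 1.0 stylesheet: every source element becomes a div
    // whose class attribute records what the element was.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns=\"http://www.w3.org/1999/xhtml\">"
      + "<xsl:template match=\"chapter\">"
      + "<div class=\"chapter\"><xsl:apply-templates/></div>"
      + "</xsl:template>"
      + "<xsl:template match=\"verse\">"
      + "<div class=\"verse\"><xsl:value-of select=\".\"/></div>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML and returns the result as a string.
    static String transform(String xml, String xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SOURCE, STYLESHEET));
    }
}
```

Because the structure survives only as class names on divs, all of the presentation (prose versus poetry, verse numbering, and so on) is left to the CSS.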

Browser Compatibility
Thanks to http://browsershots.org/, I was able to obtain screenshots of different browsers rendering muXHTML. I have uploaded the images here. A description of the test is in browsertest.txt. Feel free to go through the screenshots and fill out the table.

The next step
The next step is to allow more complex raw data to be converted and handled well in XHTML. For example, choices (e.g. only say this on Shabbos), out-of-line text (e.g. footnotes, side notes, annotations, commentary) and overlapping hierarchies (e.g. a sentence that spans the boundary of a pasuk) have to be convertible into good XHTML that can be classified well using CSS.

JLP Compiler App(let)
To produce the XHTML, we provide a demo applet that searches the database for entry points and presents the user with a menu of choices to compile. For example, one can pick Ruth, or Ruth perek 1, etc. (only Tanach is up there). After a short while, the page will display the output, which can then be saved using the web browser just like any other page. To run the applet, simply visit the link below. In order for the applet to work, you must trust it. The applet and its libraries are quite large (in the 10 MB range) and can thus take some time to load. The processing can also take some time, so be patient and let it run for a few minutes.

Link to JLP Compiler App(let): http://jewishliturgy.org/dump/jlpdemo/jnlp/jlpdemo.htm

Techno-speak
The applet is essentially a driver that bundles up the transformation process: it downloads both the transform and the chosen XML data from our database, applies the transform to the data, and presents the resulting XHTML to the user.

The applet is unique in that it hosts a local web server on the client side, and uses that web server both as the user interface and to serve the XHTML output that it generates. The entry points are discovered using an XQuery against the eXist XML database that we host, which contains all of the raw data. The transforms are performed using the Saxon XSLT library and Efraim's beautiful XSLT stylesheets. The Ezra fonts are served from the applet as well, and are embedded into the XHTML output via the CSS stylesheet (also served) using @font-face from CSS2 (I'd bet it'd be hard to find another applet which does that!).
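For readers wondering what "hosting a local web server on the client side" can look like, here is a rough sketch using the small HTTP server that ships with the JDK (`com.sun.net.httpserver`). This is an assumption made for illustration, not necessarily what the applet uses: the class name `LocalXhtmlServer`, the `/output.xhtml` path and the sample page are all invented.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class LocalXhtmlServer {
    // Starts a server on the loopback interface only, on any free port,
    // and serves the given XHTML at the hypothetical path /output.xhtml.
    static HttpServer serve(String xhtml) throws Exception {
        HttpServer server = HttpServer.create(
            new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
        byte[] body = xhtml.getBytes(StandardCharsets.UTF_8);
        server.createContext("/output.xhtml", exchange -> {
            exchange.getResponseHeaders()
                .set("Content-Type", "application/xhtml+xml; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder page; in the applet this would be the transform output.
        HttpServer server = serve(
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>demo</title></head>"
          + "<body><div class=\"verse\">sample</div></body></html>");
        // The server keeps running until the process is stopped.
        System.out.println("Open http://127.0.0.1:"
            + server.getAddress().getPort() + "/output.xhtml");
    }
}
```

Further contexts registered the same way could serve the CSS stylesheet and the font files, so that the generated page can reference them with ordinary relative URLs.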

One of the main advantages of using an applet is that it offloads the heavy processing from the server to the client.

The applet can just as easily be run via JNLP (Java Web Start) and as a regular Java jar application. It is signed (with a random key for now) to give it the necessary permissions to operate.