Intro to hacking
From the Open Siddur Project Development Wiki
This page introduces interested programmers to the Open Siddur code base. It assumes an interest in the source code behind the Open Siddur technology and familiarity with common programming terms. If you're interested in transcribing texts instead, see the wiki's transcription documentation.
Note: This document is in a very rough state. The information in it should be correct, but following it may not be smooth sailing. If you see something blatantly wrong, please complain!
Joining the project
Optionally, you may fill out our survey form and/or join the opensiddur-tech email list. A second email list, opensiddur-dev, sends automated messages after each code commit and bug report.
Obtaining the source code
The Open Siddur source code is stored in a Subversion repository on Google Code. The checkout instructions there describe how to get the code. Non-committers have read-only access to the repository. Commit access is granted to those who contribute code to the project, either through the discussion list or by contacting the developers.
Compiling the full source code assumes you have the following programming tools:
- Basic UNIX utilities: cp and rm (usually in the coreutils package) and a shell
- Subversion source code management system (svn)
- GNU make
- A Java Development Kit from Sun or OpenJDK, version 1.6 or above
- Python 2.6 or higher, needed only to install a copy of the database with our installer
- TEI Roma and its dependencies:
  - The TEI Roma script, Saxon XSLT, the TEI stylesheets, and jing-trang are shipped in our Subversion repository.
  - libxml2 (for xmllint and xsltproc) and a Perl programming environment are expected to be present on most development systems, or are easily downloadable.
On most GNU/Linux distributions, the required utilities that we don't ship can be installed as packages from the standard software repositories.
On Apple OS X, the Xcode development kit includes the required utilities.
On Microsoft Windows, Cygwin provides a Unix-like development environment.
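Before building, it can help to confirm that the tools listed above are actually on your PATH. The function below is a minimal sketch, not part of the Open Siddur tree; its name is hypothetical, and the command names you pass it are assumptions about how each tool is invoked on a typical system (for example, your Python 2 interpreter may be installed as python2).

```shell
# Sketch (hypothetical helper): report which of the build tools listed above
# are on your PATH. The command names passed in are assumptions about how each
# tool is invoked on a typical system (e.g. the JDK providing java and javac).
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$tool"
        else
            printf 'MISSING: %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every tool was found.
    [ "$missing" -eq 0 ]
}
```

For example, `check_prereqs cp rm sh svn make java javac python perl xmllint xsltproc` prints one line per tool and returns a non-zero status if any of them is missing.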
Additional setup instructions for schema compilation and validation
Not all development requires these additional steps. You need to follow them only if you want to compile the schema or the schema documentation yourself, or if you want to validate JLPTEI documents against the schema.
Create symbolic links (do not copy the files) to the following scripts in a directory on your PATH. All paths are relative to trunk. The scripts themselves must stay in the same directory as their jars.
- lib/jing
- lib/trang
- lib/saxon
- lib/absolutize
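The linking step above can be scripted. This is a minimal sketch: the function name is hypothetical, and it assumes that the directory you pass as the second argument (for example ~/bin) is already on your PATH.

```shell
# Sketch (hypothetical helper): symlink, rather than copy, the wrapper scripts
# in lib/ into a directory that is on your PATH, so each script stays next to
# its jar. Usage: link_lib_scripts TRUNK_DIR BIN_DIR
link_lib_scripts() {
    # Resolve the trunk checkout to an absolute path so the links work from
    # any working directory.
    trunk=$(cd "$1" && pwd) || return 1
    bindir=$2
    mkdir -p "$bindir" || return 1
    for script in jing trang saxon absolutize; do
        # -f replaces an existing link of the same name, for example one left
        # over from an older checkout.
        ln -sf "$trunk/lib/$script" "$bindir/$script" || return 1
    done
}
```

Run from the trunk directory, `link_lib_scripts . "$HOME/bin"` would create the four links. Linking instead of copying means the scripts keep running from lib/, beside their jars, as required above.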
Organization of the repository
The top level of the source code repository is divided into a number of directories:
- trunk: ongoing development
- tags: development snapshots at key points
- branches: personal code branches where developers can break things and test new technologies without interfering with mainline development.
- wiki: content of the Google Code site.
- sources: sources of texts that are not kept in trunk, usually because of their large size.
trunk
The trunk contains the following major divisions:
- code: Source code
- schema: definition files for the JLPTEI XML schema
- common: Files needed by more than one part of the system (e.g., the XML catalog)
- text: Generated texts (other than in the database)
- doc: Generated documentation, see below
- db: A mirror of the parts of the XML database versioned in Subversion, including the XQuery API and collection-specific configuration files. The mirror is kept up to date mostly through svn:externals definitions, which make `svn update` update it; the generated parts are kept up to date by the `make db` target. No code editing takes place in this directory.
- lib: Third-party library and program dependencies
Make targets
$ make clean
Removes everything that make has created, except params.xsl2.
$ make xsltdoc
Produces detailed code documentation in the doc/code directory. The documentation is also available on the web, although the online copy is not always up to date with the Subversion source.
$ make schema
Compiles the schema and the schema documentation into RelaxNG and HTML, respectively, in the doc/jlp directory. Requires TEI Roma.
$ make tanach
Compiles the Tanach XML documents from a non-XML source. Requires a checkout of the repository from its root.
$ make code
Builds the parts of the code that are automatically generated, including params.xsl2, the file that stores local settings (created only if it does not already exist).
Troubleshooting
If Java throws an OutOfMemoryError during any of the builds, try creating a new file trunk/Makefile.local containing the following line:
JAVAOPTIONS = -Xms1024m -Xmx1024m
If you run a transform on a large XML file with many links, you may need an even larger heap. For example, running lib/transform.sh on the strongs.xml file required a heap size of more than 1.4 GB. You can enter the following setting in the shell before running the transform, or add it to your ~/.bashrc file:
export JAVAOPTIONS="-Xms2g -Xmx2g"
Input conversion
Input converters are standalone scripts that convert documents from another format to JLPTEI. These are typically one-time-use programs. The make target "tanach" runs an XSLT 2.0 input converter to convert a Tanach sourced from the Internet Sacred Text Archive to JLPTEI.
Transforms
The transforms are a set of XSLT 2.0 programs that convert JLPTEI to (a minimalist form of) XHTML.
Before running the transforms, you must run the following from the trunk directory:
$ make code
The transforms rely on an XML catalog (`common/catalog.xml`) in order to find included transforms. A script to run saxon has been included in the lib directory.
For example, when run from the trunk directory:
$ make tanach
$ make code
$ lib/transform.sh text/tanach/Ruth.xml Ruth.html
$ cp code/JLPDemo/src/style.css .
will generate the HTML for the book of Ruth. The HTML, rendered with instructions in style.css, is viewable in any web browser.
Note that the transform takes some time to run, especially when compiling the entire Tanach (Tanach.xml to Tanach.html).
You can change the amount of debugging information the transform outputs in two ways:
- Pass a parameter to saxon (through transform.sh), for example:
  $ lib/transform.sh text/wlc/Ruth.xml Ruth.html debug-level=4
- Edit code/common/params.xsl2 and set the $debug-level parameter, as described in that file.
Higher numbers indicate higher levels of debugging information. Level 1 is errors only, level 2 includes warnings, level 3 includes info, level 4 includes detail (a lot of debugging output), and higher levels indicate excruciatingly painful levels of detail.
To run a single transform stage directly, run saxon as follows, where $XMLTOTRANSFORM is the input XML file, $OUTPUT is the output file, and $STAGE is the full path to the stage's stylesheet:
lib/saxon -s $XMLTOTRANSFORM -o $OUTPUT $STAGE
The stages run in sequence: the output of stage1.xsl2 is the input to stage2.xsl2, and the output of stage2.xsl2 is the input to xhtml.xsl2.
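The three stages can also be chained by hand. The sketch below reuses the lib/saxon invocation shown above and assumes it is run from the trunk directory. The function name is hypothetical, and so are the variables STAGE1, STAGE2 and XHTML, which you would set to the full paths of stage1.xsl2, stage2.xsl2 and xhtml.xsl2 in your checkout (their exact locations are not given here).

```shell
# Sketch (hypothetical helper): chain stage1.xsl2 -> stage2.xsl2 -> xhtml.xsl2
# using lib/saxon. STAGE1, STAGE2 and XHTML are hypothetical variables holding
# the full paths to the three stylesheets; run this from the trunk directory.
run_stages() {
    input=$1     # JLPTEI input document
    output=$2    # final XHTML output
    workdir=$3   # directory that keeps the intermediate stage outputs
    mkdir -p "$workdir" || return 1
    # Each stage reads the previous stage's output; stop at the first failure.
    lib/saxon -s "$input" -o "$workdir/stage1.xml" "$STAGE1" || return 1
    lib/saxon -s "$workdir/stage1.xml" -o "$workdir/stage2.xml" "$STAGE2" || return 1
    lib/saxon -s "$workdir/stage2.xml" -o "$output" "$XHTML"
}
```

After setting the three variables, `run_stages text/tanach/Ruth.xml Ruth.html ruth-stages` would produce Ruth.html and leave stage1.xml and stage2.xml in ruth-stages. Keeping the intermediate files makes it easier to see which stage introduced a problem when debugging.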
Coding
A copy of the generated code documentation is available online. The online documentation usually lags a few revisions behind the current head revision in Subversion.
If you're interested in coding XSLT for the project, check out our draft coding conventions.
Some parts of the project code are separable and usable on their own. These include:
- the automated transliterator
- the XSLT Grammar Parser
- a script to turn any directory in a filesystem into a "backup" that can be "restored" into the eXist database, with control over excluded files and user and group ownership and permissions.
Web applications
Development on the web applications is in the early stages. To participate, you will want your own development copy of the database. The current development philosophy and coding tutorial are described on the XRX Toolkit page.
Compiler applet
The demos use a compiler applet. Its technology will eventually form the basis of the program that compiles a siddur recipe (an XML file that stores the user's settings) into displayable formats. The applet lets the user choose a part of the Tanach stored in our database, compile it on his or her own machine, and view the result in a web browser.
The source code is written in Java and is located in the trunk/code/JLPDemo directory. To build from source, cd to that directory, and type:
$ ant build
The build compiles the Java code, copies the necessary library files, and signs all the jars. The result is placed in the jars subdirectory and can be used from any web browser with Java support. The applet also runs as a standalone application or via Java Web Start.
The applet requires Java 1.5 and runs best with Java 1.6. For compatibility information, see the demos page.
