Looseleaf Binder

= What it is =

Looseleaf binder is [going to be] an extensible XML bindery for Python. A bindery represents XML elements as Python objects.

It is intended as a support library for the Jewish Liturgy Project's XML applications. It is not intended to represent every possible corner case of XML.

== Why a new bindery? ==
A number of XML libraries already exist for Python, including the standard library DOM implementations, ElementTree/lxml.etree, and other binderies: gnosis.xml.objectify, lxml.objectify, and Amara bindery.

DOM is not particularly extensible and has a very non-Pythonic interface. A DOM wrapper class could do the job, but all work would then have to be duplicated. It also makes carrying around XML fragments rather annoying. Looseleaf currently uses DOM for XML serialization.

ElementTree is nice and simple, but is somewhat awkward to use for mixed content documents. Looseleaf borrows ElementTree's attitude toward XML namespaces.

lxml has many nice features, but both lxml.etree and lxml objectify use thin wrapper classes around C structures, limiting the types of extension classes that can be written.

Amara is currently in transition to a new version, which is not yet well documented and is not guaranteed to be fully backward compatible. It is also released under a license derived from the Apache License v1.1, which makes any application that relies on it GPL-incompatible.

= Loading the library =

Any of the following lines will import the library into your program:

The remainder of this documentation assumes that the Looseleaf library is bound to the name L.

= Loading a string =

In-place strings containing XML can be loaded with the fromString function:

fromString returns two values. The first is the bound document. The second is a PrefixManager, which records the namespace prefix mappings used by the document (see below).
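The PrefixManager interface is not spelled out here, so the following is only a rough model of the kind of information it records. It uses the standard library's xml.dom.minidom (not Looseleaf) to collect every xmlns declaration into a plain dict mapping namespace URI to prefix. The function name collect_prefixes is hypothetical.

```python
from xml.dom import minidom


def collect_prefixes(xml_text):
    """Parse xml_text and return (dom_document, {namespace_uri: prefix}).

    Hypothetical helper, not part of Looseleaf. Only the first prefix seen
    for a namespace (in document order) is kept. A default namespace
    declaration (xmlns="...") is recorded with the prefix None.
    """
    doc = minidom.parseString(xml_text)
    prefixes = {}
    # "*" matches every element; results come back in document order.
    for elem in doc.getElementsByTagName("*"):
        attrs = elem.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.name == "xmlns":
                prefixes.setdefault(attr.value, None)
            elif attr.name.startswith("xmlns:"):
                prefixes.setdefault(attr.value, attr.name[len("xmlns:"):])
    return doc, prefixes
```

Like fromString, this returns a pair: the parsed document and the prefix information needed to restore the original prefixes later.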

= Loading a file =

fromFile works exactly like fromString.

= Building a document from scratch =

A document may be built from scratch by instantiating Element, and adding child Node instances through the object interfaces.

= Namespaces =

Looseleaf bindery is fully XML namespace aware. All names that can be qualified are represented as a tuple:

In the list above:
 * (1) represents the name of the TEI element div.
 * (2) represents the name of the XHTML element div.
 * (3) represents the name of the element div in no namespace.
 * (4) is shorthand for (3) and is usable anywhere a name in no namespace can be used.
 * (5) represents the name of the JLPTEI option element.
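Since Looseleaf borrows ElementTree's attitude toward namespaces, a (namespaceUri, tag) tuple carries the same information as ElementTree's `{uri}tag` string. The sketch below is an illustration, not Looseleaf code: normalize_name, to_clark and from_clark are hypothetical helpers showing the bare-string shorthand for no namespace and the round trip to ElementTree's notation. The TEI URI comes from this page; the XHTML URI is the standard one.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XHTML = "http://www.w3.org/1999/xhtml"


def normalize_name(name):
    """Turn the shorthand 'div' into (None, 'div'); pass tuples through."""
    if isinstance(name, str):
        return (None, name)
    namespace, tag = name
    return (namespace, tag)


def to_clark(name):
    """Render a name as ElementTree's '{uri}tag' (bare 'tag' if no namespace)."""
    namespace, tag = normalize_name(name)
    return tag if namespace is None else "{%s}%s" % (namespace, tag)


def from_clark(text):
    """Parse '{uri}tag' or a bare 'tag' back into a (uri, tag) tuple."""
    if text.startswith("{"):
        namespace, _, tag = text[1:].partition("}")
        return (namespace, tag)
    return (None, text)


# ElementTree reports parsed names in the same '{uri}tag' form.
root = ET.fromstring('<div xmlns="%s"/>' % TEI)
assert from_clark(root.tag) == (TEI, "div")
```

The TEI and XHTML div names differ only in their first tuple member, so ordinary tuple comparison is enough to tell them apart.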

During processing in Looseleaf objects, namespace prefixes are not bound to the namespace. Prefixes may be stored in a PrefixManager instance at load time, and restored at save time. Attributes in the xmlns namespace (http://www.w3.org/2000/xmlns/) will have their usual meaning of binding a prefix to a namespace at serialization time.

By default, the xml and xmlns reserved namespace prefixes are known to the bindery and will work without being declared explicitly in a PrefixManager. Also, by default, no namespace (namespace None) is bound to no prefix.

At load time, the PrefixManager is only guaranteed to remember one prefix for each namespace.
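To make the prefix rules above concrete, here is a minimal sketch that assumes a prefix mapping can be modeled as a dict from namespace URI to prefix (the dict keys enforce one prefix per namespace). DEFAULT_PREFIXES and qualified_name are hypothetical names; the xml and xmlns namespace URIs are the standard reserved ones.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Hypothetical defaults: the reserved prefixes are always known, and
# no namespace (None) is bound to no prefix.
DEFAULT_PREFIXES = {XML_NS: "xml", XMLNS_NS: "xmlns", None: None}


def qualified_name(name, prefixes):
    """Render a (namespace, tag) tuple as 'prefix:tag'.

    `prefixes` maps namespace URI -> prefix; a prefix of None means the
    name is written unprefixed. Raises KeyError if no prefix is known.
    """
    namespace, tag = name
    known = dict(DEFAULT_PREFIXES)
    known.update(prefixes)
    if namespace not in known:
        raise KeyError("no prefix bound to namespace %r" % (namespace,))
    prefix = known[namespace]
    return tag if prefix is None else "%s:%s" % (prefix, tag)
```

Because prefixes are only consulted at serialization time, the same tree can be written out with different prefix mappings without changing any element names.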

= Elements =

Elements can be instantiated manually. The following code instantiates a JLP option element, containing a TEI seg element:

Element instances support all the usual Python list operations; the element's list contains all child nodes in document order. Element differs from list in the following ways:
 * Only certain Node objects can be inserted into the list and become child elements (Element, Comment, PI, CData, Text).
 * Indexing a nonexistent child element returns an empty list instead of raising IndexError.
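The two list differences can be modeled with a small list subclass. This is a toy stand-in written for illustration, not Looseleaf's Element: only append and insert are type-checked here, PI and CData are left out for brevity, and the namespace strings in the usage lines are placeholders.

```python
class Node:
    """Toy base class for anything that can appear in the tree."""


class Text(Node):
    def __init__(self, value):
        self.value = value


class Comment(Node):
    def __init__(self, value):
        self.value = value


class Element(Node, list):
    """Toy element whose children are stored in the list itself."""

    def __init__(self, name, children=()):
        list.__init__(self)
        self.name = name
        for child in children:
            self.append(child)

    def _check(self, child):
        # Only node types that may legally be children are accepted.
        if not isinstance(child, (Element, Text, Comment)):
            raise TypeError("cannot insert %r as a child" % (child,))

    def append(self, child):
        self._check(child)
        list.append(self, child)

    def insert(self, index, child):
        self._check(child)
        list.insert(self, index, child)

    def __getitem__(self, index):
        try:
            return list.__getitem__(self, index)
        except IndexError:
            return []  # a missing child yields an empty list, not IndexError


seg = Element(("tei-ns", "seg"), [Text("hello")])
opt = Element(("jlp-ns", "option"), [seg])
```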

Element indexing can be done a number of ways (here, we assume that TEI and JLP are defined as above, and that elem is an instantiated Element):

If no such node is found, an empty NodeSequence is returned. If one node is found, a single Node is returned. If more than one is found, a NodeSequence is returned.
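The zero/one/many rule can be written as a small helper. In this sketch NodeSequence is a plain list subclass and collapse and find_children are hypothetical names; SimpleNamespace objects stand in for real child nodes.

```python
from types import SimpleNamespace

TEI = "http://www.tei-c.org/ns/1.0"


class NodeSequence(list):
    """Toy NodeSequence: just a list of nodes."""


def collapse(matches):
    """0 matches -> empty NodeSequence, 1 -> that node, 2+ -> NodeSequence."""
    matches = NodeSequence(matches)
    if len(matches) == 1:
        return matches[0]
    return matches


def find_children(children, name):
    """Return the children whose .name equals name, collapsed as above."""
    return collapse(c for c in children if getattr(c, "name", None) == name)


children = [
    SimpleNamespace(name=(TEI, "seg")),
    SimpleNamespace(name=(TEI, "note")),
    SimpleNamespace(name=(TEI, "seg")),
]
```

One consequence of this rule: code that may receive either a single node or a sequence has to check which one it got before iterating.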

The Element class presents the following interfaces:
 * elem.namespaceUri = the namespace URI, or None if the element is in no namespace.
 * elem.tag = the tag name
 * elem.name = (namespaceUri, tag)
 * elem.a = an AttributeSequence containing the element's attributes

The Element's attributes are available as a sequence in elem.a.

= Attributes =

Attributes are stored in Attribute objects; the Attribute class descends from Node.

The Attribute class presents the following interfaces:
 * attr.namespaceUri = the namespace URI, or None if no namespace. (Note: attributes ignore default namespaces, so all unqualified attributes have no namespace.)
 * attr.tag = the attribute name
 * attr.name = (namespaceUri, tag)
 * attr.value = the value of the attribute

Attributes are accessed via the Element's a property:

== Special xml:* attributes ==
In addition to the normal way to access attributes, the following special attributes are available to an Element elem:
 * elem.xmlId = the element's xml:id
 * elem.xmlBase = the element's base URI
 * elem.xmlLang = the element's xml:lang

Getting the value of xml:base or xml:lang returns the value set on the element itself or, if it is not set there, on one of its ancestors. Setting the value sets it on the current element only.
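The inherited lookup can be sketched with the standard library's ElementTree. Because ElementTree elements have no parent pointer, the sketch builds a child-to-parent map first. inherited_lang and build_parent_map are hypothetical helpers. This only models lookup: real xml:base handling also resolves relative URIs against ancestor bases, which is not shown here.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved xml namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_parent_map(root):
    """Map every element under root to its parent element."""
    return {child: parent for parent in root.iter() for child in parent}


def inherited_lang(elem, parents):
    """Return xml:lang from elem or its nearest ancestor that sets it, else None."""
    while elem is not None:
        if XML_LANG in elem.attrib:
            return elem.attrib[XML_LANG]
        elem = parents.get(elem)
    return None


doc = ET.fromstring('<text xml:lang="he"><p><seg>...</seg></p><p xml:lang="en"/></text>')
parents = build_parent_map(doc)
```

Setting the value corresponds to writing the attribute on that one element (for example `seg.set(XML_LANG, "ar")`), which leaves its ancestors unchanged.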

= Text Nodes =

Text nodes are represented by the Text object.

Text objects' data is stored in the value property.

= Processing Instructions =

Processing instructions are represented by the PI object:

= Comments =

Comments are represented by the Comment object.

= Navigation and search axes =

Navigation within a tree of Nodes is accomplished through a set of axes that resemble XPath axes. These navigation axes exist on all Nodes and NodeSequences, and will return an empty NodeSequence if no matching nodes exist:
 * elem.a = attributes
 * elem.parent = parent node
 * elem.ancestor = all ancestors in reverse document order
 * elem.descendant = all descendants in document order
 * elem.sibling = all siblings in document order
 * elem.precedingSibling = all preceding siblings in reverse document order
 * elem.followingSibling = all following siblings in document order
 * elem.preceding = all preceding nodes in reverse document order
 * elem.following = all following nodes in document order
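To show the ordering conventions of these axes, here is a minimal sketch over standard library minidom nodes rather than Looseleaf nodes. The function names mirror the axis names but are hypothetical. Stopping the ancestor axis before the Document node is an assumption made for this sketch.

```python
from xml.dom import minidom


def ancestor(node):
    """Ancestors, nearest first (reverse document order), excluding the Document."""
    result = []
    node = node.parentNode
    while node is not None and node.nodeType != node.DOCUMENT_NODE:
        result.append(node)
        node = node.parentNode
    return result


def preceding_sibling(node):
    """Preceding siblings, nearest first (reverse document order)."""
    result = []
    node = node.previousSibling
    while node is not None:
        result.append(node)
        node = node.previousSibling
    return result


def following_sibling(node):
    """Following siblings in document order."""
    result = []
    node = node.nextSibling
    while node is not None:
        result.append(node)
        node = node.nextSibling
    return result


def sibling(node):
    """All siblings in document order."""
    return preceding_sibling(node)[::-1] + following_sibling(node)


def descendant(node):
    """All descendants in document order (pre-order traversal)."""
    result = []
    for child in node.childNodes:
        result.append(child)
        result.extend(descendant(child))
    return result


doc = minidom.parseString("<a><b/><c><d/></c><e/></a>")
c = doc.getElementsByTagName("c")[0]
```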

== Conditions ==
Special conditions, derived from the class Condition, can also be used for navigation. The Condition object defines two methods, test and testSequence:

The test method returns a boolean indicating whether the condition applies to the given node.

The testSequence method returns a NodeSequence containing the items in the sequence being tested for which the condition applies.

Neither method may have side effects on the original node or sequence.

A Condition may also be used to test a NodeSequence as a whole.

At this time, the following Condition-derived objects are defined:
 * NS, which checks whether a node's namespace URI is the same as the given URI
 * A, which selects an attribute from the current Element. If the node being tested is not an Element, the result is empty.
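The test/testSequence contract can be sketched as a small base class with an NS subclass. This is a toy written from the description above, not Looseleaf's implementation. Here testSequence returns a plain list, and SimpleNamespace objects stand in for nodes. The sentinel ensures that nodes without a namespaceUri (such as text) never match, not even NS(None).

```python
from types import SimpleNamespace

TEI = "http://www.tei-c.org/ns/1.0"
_MISSING = object()  # marks nodes that have no namespaceUri at all


class Condition:
    """Toy Condition: subclasses override test(); testSequence filters."""

    def test(self, node):
        raise NotImplementedError

    def testSequence(self, sequence):
        # Builds a new list; the input sequence is not modified.
        return [node for node in sequence if self.test(node)]


class NS(Condition):
    """True for nodes whose namespace URI equals the given URI."""

    def __init__(self, uri):
        self.uri = uri

    def test(self, node):
        return getattr(node, "namespaceUri", _MISSING) == self.uri


nodes = [
    SimpleNamespace(namespaceUri=TEI, tag="div"),
    SimpleNamespace(namespaceUri=None, tag="div"),
    SimpleNamespace(value="some text"),
]
```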

Examples:

= NodeSequence and AttributeSequence =

A NodeSequence is a list of nodes (which may include any Node-derived types). An AttributeSequence is a sequence of Attribute nodes. Both sequence types support all the navigation axes.

= Saving a document =

To save a tree t to a string (1) or file (2) use:

The keyword pretty determines whether additional indentation is added to format the XML into a more human readable form. It defaults to False.
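Looseleaf currently uses DOM for serialization, so the effect of pretty can be approximated with the standard library's minidom. The assumption that pretty behaves like minidom's toprettyxml is only an assumption; serialize is a hypothetical helper.

```python
from xml.dom import minidom


def serialize(doc, pretty=False):
    """Compact XML by default; two-space indentation when pretty=True."""
    if pretty:
        return doc.documentElement.toprettyxml(indent="  ")
    return doc.documentElement.toxml()


doc = minidom.parseString("<a><b>text</b></a>")
```

Pretty-printing adds whitespace text nodes, so for mixed-content documents the default of False is the safer choice when the output will be parsed again.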

To reuse saved XML namespace prefixes, you may also pass a PrefixManager instance to toString or toFile.

= Custom binding classes =

Custom binding classes must derive from Node. Ideally, they should derive from the most specific applicable node class; e.g., a class that primarily binds an element should derive from Element.

To set up custom binding, you need a custom binding class and a function that will return the derived type if the conditions are met, and return None if they aren't. A binding function may also return ExcludedNode if the node being tested should not be included in the tree.

Note that the binding function operates on a DOM node. The entire DOM, including parent and child elements, can be considered accessible.

A list of such binding functions can be passed to fromString or fromFile using the customBinding keyword:

Every {http://www.tei-c.org/ns/1.0}div element will then be bound using the custom class.

Binding functions are processed in list order.

If you want to remove text nodes that contain only whitespace, you can use the provided binding factory looseleaf.whitespaceRemovalBinder as a customBinding.
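The dispatch described above (each function inspects a DOM node and returns a class, None, or an exclusion marker, and the first non-None answer in list order wins) can be sketched with minidom nodes. Every name here is a hypothetical stand-in: ExcludedNode is a local placeholder for Looseleaf's marker, and skip_whitespace_text only imitates what whitespaceRemovalBinder is described as doing. Its real interface is not shown on this page.

```python
from xml.dom import minidom

TEI = "http://www.tei-c.org/ns/1.0"


class ExcludedNode:
    """Placeholder marker: returned to drop a node from the tree."""


class GenericNode:
    """Fallback class used when no binding function matches."""


class TeiDiv(GenericNode):
    """Hypothetical custom binding class for TEI div elements."""


def bind_tei_div(dom_node):
    if (dom_node.nodeType == dom_node.ELEMENT_NODE
            and dom_node.namespaceURI == TEI
            and dom_node.localName == "div"):
        return TeiDiv
    return None


def skip_whitespace_text(dom_node):
    if dom_node.nodeType == dom_node.TEXT_NODE and not dom_node.data.strip():
        return ExcludedNode
    return None


def choose_class(dom_node, binders, default=GenericNode):
    """Ask each binding function in list order; the first non-None answer wins."""
    for binder in binders:
        result = binder(dom_node)
        if result is not None:
            return result
    return default


doc = minidom.parseString('<TEI xmlns="%s">\n  <div/>\n</TEI>' % TEI)
root = doc.documentElement
binders = [skip_whitespace_text, bind_tei_div]
```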

= Iteration =

The basic iterator object used to walk a Looseleaf in document order is NodeIterator.

The following example prints the content of all text nodes in the document test.xml:
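NodeIterator's interface is not documented on this page, so the following is only an analogue built on the standard library's minidom. A recursive generator yields nodes in document order, and the text nodes are filtered out of that stream. iter_document_order, text_values and print_text_nodes are hypothetical names.

```python
from xml.dom import minidom


def iter_document_order(node):
    """Yield node and all of its descendants in document (pre-order) order."""
    yield node
    for child in node.childNodes:
        yield from iter_document_order(child)


def text_values(doc):
    """Return the data of every text node, in document order."""
    return [n.data for n in iter_document_order(doc) if n.nodeType == n.TEXT_NODE]


def print_text_nodes(path):
    """Print every text node of the XML file at path, e.g. 'test.xml'."""
    for text in text_values(minidom.parse(path)):
        print(text)
```

Comments and processing instructions are skipped because only nodes of type TEXT_NODE are kept.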

= Transformation =

= Keys =

= Licensing =

Like the rest of the code in JLP, Looseleaf binder is released under the GNU Lesser General Public License, version 3 or later.