Scan index

From Open Siddur Project Development Wiki


Scan index files map the page numbers of a scanned source to the sequence of page images that make up the scan.

The files have a standard TEI header and are born-digital documents.

The tei:body of a scan index contains a single tei:div[@type='scan-index'], and this division contains only a single tei:list. Each tei:item in the list has an @n attribute that gives the page number as written in the book, and a single child tei:ptr element. The @xml:id attribute of the tei:ptr must be "p" followed by the page number in Arabic or Roman numerals (i, ii, iii, 1, 2, 3, 4, ...), except for the covers and the title page, which use "frontcover", "backcover", and "title" in place of a number (for example, ptitle). Page numbers are not zero-padded. The @target attribute of the tei:ptr contains the URL of the page image.
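The structural rules above can be checked mechanically. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function name check_scan_index is hypothetical; the sketch assumes the usual TEI layout (TEI/text/body) in the TEI namespace, accepts only lowercase Roman numerals, and rejects zero-padded ids such as p01, which is a stricter reading of the rule than the text strictly requires.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Labels used instead of a page number for the covers and the title page.
SPECIAL_LABELS = {"frontcover", "backcover", "title"}
# Arabic (1, 2, 3...) or lowercase Roman (i, ii, iii...) numbers; a loose check that
# does not validate Roman numeral syntax, and (assumption) forbids zero padding.
PAGE_NUMBER = re.compile(r"^(?:[1-9][0-9]*|[ivxlcdm]+)$")


def check_scan_index(root):
    """Return a list of problems found in a parsed scan index; empty means it passed."""
    body = root.find(f"{TEI}text/{TEI}body")
    if body is None:
        return ["missing tei:text/tei:body"]
    divs = body.findall(f"{TEI}div")
    if len(divs) != 1 or divs[0].get("type") != "scan-index":
        return ["tei:body must contain exactly one tei:div[@type='scan-index']"]
    children = list(divs[0])
    if len(children) != 1 or children[0].tag != f"{TEI}list":
        return ["the scan-index division must contain only a single tei:list"]

    problems, seen = [], set()
    for item in children[0]:
        if item.tag != f"{TEI}item":
            problems.append(f"unexpected element {item.tag}")
            continue
        n = item.get("n")
        if not n:
            problems.append("tei:item without @n")
        ptrs = list(item)
        if len(ptrs) != 1 or ptrs[0].tag != f"{TEI}ptr":
            problems.append(f"item n={n!r} must have exactly one tei:ptr child")
            continue
        ptr = ptrs[0]
        xml_id = ptr.get(XML_ID, "")
        # The id is "p" followed by a page number or one of the special labels.
        label = xml_id[1:] if xml_id.startswith("p") else None
        if label is None or not (label in SPECIAL_LABELS or PAGE_NUMBER.match(label)):
            problems.append(f"bad xml:id {xml_id!r} on item n={n!r}")
        elif xml_id in seen:
            problems.append(f"duplicate xml:id {xml_id!r}")
        seen.add(xml_id)
        if not ptr.get("target"):
            problems.append(f"item n={n!r} has no @target")
    return problems
```

Returning a list of problems rather than raising on the first one lets an editor fix every issue in a file in one pass.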

An example for a short book whose scans are stored in the scan directory of the database is shown here:

<tei:list>
 <tei:item n="title"><tei:ptr xml:id="ptitle" target="title.jpg"/></tei:item>
 <tei:item n="1"><tei:ptr xml:id="p1" target="1.jpg"/></tei:item>
 <tei:item n="backcover"><tei:ptr xml:id="pbackcover" target="backcover.jpg"/></tei:item>
</tei:list>
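In the example above, each image file is named after its page label (title.jpg, 1.jpg, backcover.jpg). Assuming that naming scheme holds for a locally stored scan (an assumption, not a stated requirement), a list like this one could be generated with the following Python sketch. build_scan_list is a hypothetical helper, and the labels passed to it must already be in page order.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Serialize the TEI namespace with the "tei:" prefix used on this page.
ET.register_namespace("tei", TEI_NS)


def build_scan_list(labels, extension=".jpg"):
    """Build a tei:list for images stored as <label><extension> next to the index.

    Labels must be given in page order, e.g. ["title", "1", "backcover"].
    """
    tei_list = ET.Element(f"{{{TEI_NS}}}list")
    for label in labels:
        item = ET.SubElement(tei_list, f"{{{TEI_NS}}}item", {"n": label})
        # The pointer id is "p" + label; the target is the (assumed) image file name.
        ET.SubElement(item, f"{{{TEI_NS}}}ptr",
                      {XML_ID: f"p{label}", "target": f"{label}{extension}"})
    return tei_list


print(ET.tostring(build_scan_list(["title", "1", "backcover"]), encoding="unicode"))
```

For books where the printed page number differs from the id (for example, Hebrew numerals), @n and @xml:id would have to be supplied separately.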

An example of a book that is stored on Google Books is shown here:

<tei:list>
 <tei:item n="frontcover"><tei:ptr xml:id="pfrontcover" target="http://books.google.com/books?id=9tY8AAAAYAAJ&amp;pg=PP1"/></tei:item>
 <tei:item n="title"><tei:ptr xml:id="ptitle" target="http://books.google.com/books?id=9tY8AAAAYAAJ&amp;pg=PP5"/></tei:item>
 <tei:item n="א"><tei:ptr xml:id="p1" target="http://books.google.com/books?id=9tY8AAAAYAAJ&amp;pg=PP9"/></tei:item>
</tei:list>


Because the book may be reconstructed from its scans using the index file, the items in the index must appear in page order. If a page is missing from the scans, its tei:ptr may point to a URL that returns an HTTP 404 error.
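Since item order is page order, reconstructing the book amounts to reading the items in document order. The following is a minimal Python sketch; page_sequence is a hypothetical name, and it assumes that relative targets such as 1.jpg are resolved against the URL of the scan index itself (the base URL in any call is up to the caller). It does not check for missing (404) images.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

TEI = "{http://www.tei-c.org/ns/1.0}"
ITEMS = f".//{TEI}div[@type='scan-index']/{TEI}list/{TEI}item"


def page_sequence(root, index_url):
    """Return (printed page label, absolute image URL) pairs in reading order.

    The scan index must list items in page order, so document order is
    reading order and no sorting is done here.
    """
    pages = []
    for item in root.iterfind(ITEMS):
        target = item.find(f"{TEI}ptr").get("target")
        # Assumption: relative targets are relative to the index's own URL;
        # absolute targets (e.g. Google Books links) pass through urljoin unchanged.
        pages.append((item.get("n"), urljoin(index_url, target)))
    return pages
```

Relying on document order instead of sorting by @n matters here, because labels such as title, i, 1, and א have no common sort order.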

By convention, scan index files are located at /scans/BIBLIOGRAPHY_ID/scan-index.xml.

This way, any page N can be located at /scans/BIBLIOGRAPHY_ID/scan-index.xml#pN.
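That lookup can be sketched in Python as follows: page_reference builds the conventional reference, and resolve_page splits off the fragment and finds the tei:ptr whose @xml:id matches it. Both function names are hypothetical, and load_index is a stand-in for however the caller retrieves and parses the index from the database. Because the lookup uses @xml:id, the page printed as א in the Google Books example is found with #p1.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def page_reference(bibliography_id, page):
    """Build the conventional reference to a page, e.g. /scans/ID/scan-index.xml#p3."""
    return f"/scans/{bibliography_id}/scan-index.xml#p{page}"


def resolve_page(reference, load_index):
    """Return the image target that a page reference points to.

    `load_index` is a caller-supplied (hypothetical) function that maps the
    index path to a parsed ElementTree root element.
    """
    path, fragment = urldefrag(reference)
    root = load_index(path)
    for ptr in root.iter(f"{TEI}ptr"):
        if ptr.get(XML_ID) == fragment:
            return ptr.get("target")
    raise KeyError(f"no page {fragment!r} in {path}")
```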
