Talk:JLPTEI
From the Open Siddur Project Development Wiki
Removed text:
There are four types of multiply sourced texts that I'll discuss in this section: (A) texts with large variations, (B) texts with variations in the XML representation of the content, (C) texts with small variations, and (D) functionally variant texts.
How to indicate which choice will be processed will be the subject of a later email.
Texts with large variations in content
A "large" variation, in this case, is at least one segment in length. When texts have large variations, a choice has to be made -- should they be considered the same text, where every variation is marked up, or should they be considered separate texts, even though they share a lot of text? If they're treated as separate texts, the methods of part II completely describe how to encode them. If the variation is encoded within the same text, the following type of encoding may be used in the original hierarchy:
<ptr target="#chc1"/>
and at some other point in the file:
<choice xml:id="chc1">
  <j:option xml:id="opt1"><ptr target="#seg1"/></j:option>
  <j:option xml:id="opt2"><ptr target="#seg2"/></j:option>
</choice>
where #seg1 and #seg2 refer to segments in the text repository.
The reference to chc1 can then be made identically in all concurrent hierarchies where the same text is needed.
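As a rough illustration of how a processor might follow such a pointer, here is a minimal Python sketch. It assumes the pointer and the choice live in the same file, and the j: namespace URI used here (urn:example:jlptei) is a made-up placeholder, not the project's real one. The function name options_for is also hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
J_NS = "urn:example:jlptei"  # hypothetical placeholder for the j: namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = f"""
<body xmlns="{TEI_NS}" xmlns:j="{J_NS}">
  <ptr target="#chc1"/>
  <choice xml:id="chc1">
    <j:option xml:id="opt1"><ptr target="#seg1"/></j:option>
    <j:option xml:id="opt2"><ptr target="#seg2"/></j:option>
  </choice>
</body>
"""


def options_for(root, pointer_target):
    """Map each option id of the referenced choice to the segment URIs it points at."""
    wanted = pointer_target.lstrip("#")
    for choice in root.iter(f"{{{TEI_NS}}}choice"):
        if choice.get(XML_ID) == wanted:
            return {
                option.get(XML_ID): [
                    ptr.get("target") for ptr in option.iter(f"{{{TEI_NS}}}ptr")
                ]
                for option in choice.findall(f"{{{J_NS}}}option")
            }
    return {}  # no choice with that id in this file


root = ET.fromstring(DOC)
first_ptr = root.find(f"{{{TEI_NS}}}ptr")  # the pointer in the hierarchy
resolved = options_for(root, first_ptr.get("target"))
print(resolved)
```

Which of the returned options actually gets processed is a separate decision and is not modeled here.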
Texts with large variations in the XML representation of content
Variations only in XML representation of the same hierarchy type (eg, a paragraph that may be divided into sentences) can be handled by the methods described in part II.
Texts with small variations
Small variations are smaller than a segment in length. These variations may appear within a segment in the text repository. This type of encoding might be used where the variation involves one or more words that differ only in pointing (vowels), as shown below:
<seg xmlns="http://www.tei-c.org/ns/1.0">
  <choice>
    <j:option><w>יִתְגַּדַּל</w></j:option>
    <j:option><w>יִתְגַּדֵּל</w></j:option>
  </choice>
</seg>
When the variant text requires an instruction for the user, it should be considered to be at least one segment long. If it requires an instruction, it will likely need to be an independently addressable unit.
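To make the inline case concrete, the following Python sketch renders a segment while keeping only one option of each inline choice. It is only an illustration: selecting options by position is my assumption (the real selection mechanism is not described here), the j: namespace URI is a placeholder, and render is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
J_NS = "urn:example:jlptei"  # hypothetical placeholder for the j: namespace

SEG = f"""
<seg xmlns="{TEI_NS}" xmlns:j="{J_NS}">
  <choice>
    <j:option><w>יִתְגַּדַּל</w></j:option>
    <j:option><w>יִתְגַּדֵּל</w></j:option>
  </choice>
</seg>
"""


def render(element, pick):
    """Return the words under `element`, keeping option number `pick` of each choice."""
    words = []
    for child in element:
        if child.tag == f"{{{TEI_NS}}}choice":
            options = child.findall(f"{{{J_NS}}}option")
            words.extend(render(options[pick], pick))
        elif child.tag == f"{{{TEI_NS}}}w":
            words.append(child.text)
        else:
            words.extend(render(child, pick))
    return words


seg = ET.fromstring(SEG)
first = render(seg, 0)   # the first pointing variant
second = render(seg, 1)  # the second pointing variant
```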
Functionally varying texts
"Functionally" varying texts are texts that are substantially similar, but have different forms depending on their function at any particular place in the service. An example would be the kaddish, which has 5 functionally varying forms, which share the vast majority of their text. Each functional variant may be considered its own XML hierarchy. Shared parts may then be included in each hierarchy by reference.
Linking corresponding content
This part concerns content that has a 1:1 correspondence between itself and a text, including translations and transliterations. Other forms of correspondence of secondary material (instructions, commentary, and graphics) will be in a later section.
Translations
The project has to support multiple translations into English and other languages.*
The points that are used to link translations to the original text are the identifiers (@xml:id attributes) on the original text. Translation linkages define the nodes of alignment between the original and translated texts.
The smallest addressable unit is the word; the next smallest is the segment. Translations may also be aligned with respect to larger units, such as sentences, paragraphs, lines, or line groups. The primary constraint is that the translation may only be linked once within each unit of translation. If the translation is linked by paragraph, the same translation cannot also contain a linkage to a segment inside the paragraph.
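The "linked only once per unit" constraint could be checked mechanically. The Python sketch below is one way to do it, assuming the processor already has a child-to-parent map of xml:id values for the hierarchy. Both that map and the function name find_conflicts are hypothetical.

```python
def find_conflicts(parent_of, linked_ids):
    """Return (inner, outer) pairs where a linked unit lies inside another linked unit."""
    linked = set(linked_ids)
    conflicts = []
    for unit in linked_ids:
        ancestor = parent_of.get(unit)
        while ancestor is not None:  # walk up the hierarchy
            if ancestor in linked:
                conflicts.append((unit, ancestor))
            ancestor = parent_of.get(ancestor)
    return conflicts


# Hypothetical structure: paragraph p1 contains segments seg1 and seg2.
parent_of = {"seg1": "p1", "seg2": "p1"}

segment_level = find_conflicts(parent_of, ["seg1", "seg2"])  # consistent
mixed_level = find_conflicts(parent_of, ["p1", "seg2"])      # seg2 is inside p1
```

A translation linked only by segment produces no conflicts, while one that links both the paragraph and a segment inside it is flagged.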
Because both words and segments are stored in the text repository aside from the XML hierarchy, a translation that only links to words or segments need not redefine the complete structural hierarchy of the text.
Translation units are linked to their original texts using either tei:link[@type="translation"] or tei:linkGrp[@type="translation"]/tei:link+ .
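For illustration, a processor could gather both forms of translation linkage along these lines. This Python sketch reads the @target attribute and simply returns the URIs in document order. It makes no claim about which URI is the original and which is the translation, and the file name en.xml and the ids are invented for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

DOC = f"""
<body xmlns="{TEI_NS}">
  <link type="translation" target="#seg1 en.xml#t1"/>
  <linkGrp type="translation">
    <link target="#seg2 en.xml#t2"/>
    <link target="#seg3 en.xml#t3"/>
  </linkGrp>
  <link type="note" target="#seg1 #note1"/>
</body>
"""


def translation_links(root):
    """Collect the URI tuples of every translation link, typed directly or via its group."""
    links = []
    # Form 1: tei:link[@type="translation"]
    for link in root.iter(f"{{{TEI_NS}}}link"):
        if link.get("type") == "translation":
            links.append(link)
    # Form 2: tei:linkGrp[@type="translation"]/tei:link
    for group in root.iter(f"{{{TEI_NS}}}linkGrp"):
        if group.get("type") == "translation":
            for link in group.findall(f"{{{TEI_NS}}}link"):
                if link not in links:  # avoid counting a link twice
                    links.append(link)
    return [tuple(link.get("target", "").split()) for link in links]


pairs = translation_links(ET.fromstring(DOC))
```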
Two examples are provided, based on Yaaleh V'Yavo, which is divided into segments in a text repository and a paragraph in YaalehVYavo.xml .
The first example (YaalehVYavo-en1.xml) shows a segment-by-segment linkage. In some cases, one segment cannot be translated separately, so an intermediate pointer (tei:ptr inside tei:linkGrp) is used to group the original text into a more convenient translation unit. Note that this example (1) does not redefine paragraphs or sentences and (2) does not redefine the choice/option framework in the original. A processor that is aligning these translations must reuse the same text structure as the original document.
The second example (YaalehVYavo-en2.xml) shows a paragraph-by-paragraph linkage. Here, both paragraphs and sentences are redefined, and the choice/option structure is also redefined. The conditionals that define when to include which options and which instructions to include will also need to be redefined. Because the linkage is at the paragraph level, the processor has no way to align those blocks internally.
Translations are linked to the files where they are stored by a special link type: <ptr type="translation" target="translation-uri"/> These links can be stored in the metadata section of the file containing the original text, along with the conditional information associated with the translation.
- As a matter of policy, translations in the official distribution should be required to have a mission statement explaining the goal of the translation. That would help users figure out which translation to choose, and contributing editors figure out whether a change is appropriate. Translators are also offered the opportunity to use translation notes to explain why a certain choice was made, or possible alternatives.
Implementation note: It should be possible to implement the above by transforming the linkages into an alternate hierarchy and aligning it to the original hierarchy using the same methods as would be used in any multi-hierarchy alignment. Temporary block-level elements (such as j:translationGrp/j:translation|j:original) may need to be used.
Transliterations
In most cases transliterations will not have to be stored. An automated transliterator (translit.xsl2) can be used instead. The automated transliterator relies on user-supplied tables to convert letters and vowels into another alphabet. Two transliteration tables (Modern Israeli Hebrew->English and SBL-style academic transliteration) already exist (both in the common/ directory) and can be used as templates for new tables.
There is a distinct advantage to avoiding non-automated transliterations -- given the number of transliteration styles, if a transliteration is hard-coded, it will likely result in stylistic inconsistencies in a given printed siddur (eg, the hard-coding transliterates ת as 's' and the remaining transliteration uses a 't').
In general, a processor can process transliterations at the same time as the remainder of the text. Running the transliterator segment-by-segment should always be able to give a correctly aligned result.
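The idea of a table-driven, segment-by-segment transliterator can be sketched in a few lines of Python. This is not how translit.xsl2 works internally. It is a character-by-character simplification with a deliberately tiny, made-up table, and it ignores context-dependent rules that a real transliterator would need.

```python
# Hypothetical, deliberately tiny table; a real table would cover the whole alphabet.
TABLE = {
    "\u05DE": "m",   # mem
    "\u05E6": "tz",  # tsadi
    "\u05D4": "h",   # he
    "\u05B7": "a",   # patah
    "\u05B8": "a",   # qamats
    "\u05BC": "",    # dagesh, ignored in this simplified table
}


def transliterate(text, table=TABLE):
    """Replace each code point by its table entry; pass unknown characters through."""
    return "".join(table.get(ch, ch) for ch in text)


def transliterate_segments(segments, table=TABLE):
    """Transliterate segment by segment, so output i always aligns with input i."""
    return [transliterate(segment, table) for segment in segments]


MATZAH = "\u05DE\u05B7\u05E6\u05B8\u05BC\u05D4"  # mem, patah, tsadi, qamats, dagesh, he
print(transliterate(MATZAH))
```

Because each segment is converted independently, the output list has the same length and order as the input, which is what keeps the result aligned.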
In order to do on-the-fly transliterations, a special type of element is introduced:
<j:segGen type="translit"><!-- Unicode Hebrew here --></j:segGen>
This type of construct can be used in instructions so that:
<j:segGen type="translit">מַצָּה</j:segGen>
can generate something like this in the final copy:
/matzah/ (מַצָּה)
The j:segGen (generated segment) tag is supposed to evoke the same idea as tei:divGen, which represents an automatically generated division-level section.
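A processor might expand j:segGen roughly as in the Python sketch below. The j: namespace URI is a placeholder, the surrounding note element and its English wording are invented, and the transliterator is passed in as a function so that any table can be plugged in. Here a one-word lookup stands in for it.

```python
import xml.etree.ElementTree as ET

J_NS = "urn:example:jlptei"  # hypothetical placeholder for the j: namespace
MATZAH = "\u05DE\u05B7\u05E6\u05B8\u05BC\u05D4"


def expand_seg_gen(xml_text, transliterate):
    """Flatten an element to text, expanding each j:segGen[@type='translit'] child."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == f"{{{J_NS}}}segGen" and child.get("type") == "translit":
            hebrew = child.text or ""
            parts.append(f"/{transliterate(hebrew)}/ ({hebrew})")
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


INSTRUCTION = (
    f'<note xmlns:j="{J_NS}">Eat the '
    f'<j:segGen type="translit">{MATZAH}</j:segGen> now.</note>'
)

stand_in = {MATZAH: "matzah"}  # stand-in for a real transliterator
result = expand_seg_gen(INSTRUCTION, lambda text: stand_in.get(text, text))
```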
Instructions and notes
Instructions and notes are both new text that link to parts of another text, usually, the prayer text itself or its translation.
Instructions
Instructions are meant to be displayed inline, usually before the section of text to which they refer. An instruction links to the section of text to which it applies. Instruction linkages will (usually) appear in the same file as the text to which the instructions refer. The instruction text may be stored in a separate file, and the same instruction text may be reused. Instructions themselves may be translated, and the preferred language of display is a user setting.
Instructions that represent the same idea (eg, translations of the same instruction) are grouped under a j:instructGrp element. Each implementation of the instruction is kept under a j:instruct element (which is a synonym for tei:note[@type='instruct']). Instructions may contain normal text (text with no elements, segments, sentences, paragraphs, etc.)
Example of a simple instruction:
<j:instructGrp xml:id="instruct_RC">
  <j:instruct xml:lang="en">On the New Moon:</j:instruct>
  <j:instruct xml:lang="he">בְּרֹאשׁ חֹדֶשׁ</j:instruct>
</j:instructGrp>
Instructions are linked to text using the tei:link[@type='instruct'] element. Unlike most links, the "instruct" type defines a directionality. The first URI in the target is the origin, the second URI is the instruction (usually referencing the xml:id of the j:instructGrp).
<tei:link type="instruct" target="origin_uri instruction_uri"/>
In the case of the following XML (from the last Yaaleh V'Yavo example):
<tei:choice>
  <j:option xml:id="opt_RC">
    <!-- this one points to the segment containing "Rosh Chodesh" -->
    <tei:ptr target="#d18e116"/>
  </j:option>
</tei:choice>
The link to the instruction may be written as:
<tei:link type="instruct" target="#opt_RC #instruct_RC"/>
The instruction may be linked to any level of text (eg: seg, s, p, ab, l, lg) in any of the text's structural hierarchies. If the same instruction applies to more than one nonconsecutive section of text, an intermediate pointer may be used as the first target.
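The directed linkage and the language preference can be put together as in the following Python sketch. It only handles same-file fragment URIs, the j: namespace URI is a placeholder, and instructions_for is a hypothetical helper rather than part of any existing processor.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
J_NS = "urn:example:jlptei"  # hypothetical placeholder for the j: namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"

DOC = f"""
<body xmlns="{TEI_NS}" xmlns:j="{J_NS}">
  <j:instructGrp xml:id="instruct_RC">
    <j:instruct xml:lang="en">On the New Moon:</j:instruct>
    <j:instruct xml:lang="he">בְּרֹאשׁ חֹדֶשׁ</j:instruct>
  </j:instructGrp>
  <link type="instruct" target="#opt_RC #instruct_RC"/>
</body>
"""


def instructions_for(root, origin, lang):
    """Return the instruction texts in `lang` that are linked from `origin`."""
    groups = {
        group.get(f"{{{XML_NS}}}id"): group
        for group in root.iter(f"{{{J_NS}}}instructGrp")
    }
    found = []
    for link in root.iter(f"{{{TEI_NS}}}link"):
        if link.get("type") != "instruct":
            continue
        # Directional: the first URI is the origin, the second is the instruction.
        source, instruction = link.get("target").split()
        if source != origin:
            continue
        group = groups.get(instruction.lstrip("#"))
        if group is None:
            continue  # instruction stored elsewhere; not handled in this sketch
        for instruct in group.findall(f"{{{J_NS}}}instruct"):
            if instruct.get(f"{{{XML_NS}}}lang") == lang:
                found.append(instruct.text.strip())
    return found


root = ET.fromstring(DOC)
english = instructions_for(root, "#opt_RC", "en")
```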
Instructions that are included conditionally will be addressed in a future email.
Notes
Notes link back to a part of the text. They may link into the text repository and/or structural hierarchies, and even into instructions (say, to explain detail) and they may link to original text or to a translation. The note need only be processed if the part of the text to which it refers is processed.
Notes are represented by tei:note elements, which may contain paragraph-like content as specified in the TEI Guidelines. Each tei:note element should carry an @type attribute, which the user can use to select which types of notes should appear, before any additional conditional processing. *A list of allowed note types has to be enumerated.*
A file containing notes can be linked to the file it annotates using tei:ptr[@type='notes-link']**. The notes themselves are then linked back to the original text using tei:link[@type='note'], where the first target is the URI of the note target and the second URI points to the note. A single note may reference more than one text.
- I'm not sure if I like this requirement for bidirectional linkages, but I don't know how to relax it.
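The two selection rules for notes (process a note only if its target is processed, and let the user filter by @type) can be sketched in Python as follows. The note types "commentary" and "source" are invented for the example, since the allowed types have not been enumerated, and the inputs are assumed to have been extracted from the tei:link[@type='note'] elements already.

```python
def notes_to_render(note_links, note_types, processed_ids, wanted_types):
    """Select notes whose target is being processed and whose type the user wants.

    note_links: (target_uri, note_uri) pairs, in the order used by the note links.
    note_types: maps each note URI to the value of its @type attribute.
    """
    selected = []
    for target, note in note_links:
        if target not in processed_ids:
            continue  # the annotated text is not part of the output
        if note_types.get(note) not in wanted_types:
            continue  # the user did not ask for this type of note
        if note not in selected:  # one note may reference several texts
            selected.append(note)
    return selected


links = [("#seg1", "#n1"), ("#seg2", "#n1"), ("#seg3", "#n2"), ("#seg1", "#n3")]
types = {"#n1": "commentary", "#n2": "commentary", "#n3": "source"}

chosen = notes_to_render(links, types, {"#seg1", "#seg2"}, {"commentary"})
```

Here only #n1 is selected: #n2 annotates a segment that is not processed, and #n3 is of a type the user did not request.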
Conditional text
Automatically defined conditions
Every tei:choice/j:option defines a condition...
Defining conditions
Conditions can have natural language descriptions...
Linking text to conditions
tei:link[@type='condition'], conditional instructions
Conditional blocks
Conditional blocks are a concurrent hierarchy...
Choosing which conditions to evaluate
@j:set vs. element?
Metadata
Original text sourcing
Tanach
Rabbinic works
Multiple manuscripts
Bibliographic sourcing
Of the source text
For instructions
For commentary/notes
Chain of responsibility
To what level of detail?
Revision history
Copyright licensing
tei:availability ... use Creative Commons RDF?
Organization of the archive
what's a file
Independent unit?
