Transcription
From the Open Siddur Project Development Wiki
Transcription is the process of converting a scanned image of text into machine-readable text. For the Open Siddur Project, we do this by manually typing the text as it appears in the scanned pages of historic siddurim and saving it in our database. This text is then proofread to make sure its content is identical to the text on the original scanned page. After proofreading, the text is encoded in an XML form that carries high-level structural data about the text. The encoded form of this text is then stored in our XML database.
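As a rough illustration of the encoding step described above, the following Python sketch wraps the proofread lines of one page in a small XML tree. The element and attribute names (`page`, `line`, `n`, `lang`) and the function `encode_page` are hypothetical and chosen only for this example; they are not the project's actual XML schema.

```python
import xml.etree.ElementTree as ET


def encode_page(page_number, lines, lang="he"):
    """Wrap the proofread lines of one scanned page in a small XML tree.

    The tag and attribute names here are illustrative assumptions,
    not the schema used by the Open Siddur Project.
    """
    page = ET.Element("page", {"n": str(page_number), "lang": lang})
    for i, text in enumerate(lines, start=1):
        # Each transcribed line keeps its position on the page.
        line = ET.SubElement(page, "line", {"n": str(i)})
        line.text = text
    # encoding="unicode" returns a str and keeps Hebrew characters as-is.
    return ET.tostring(page, encoding="unicode")


# Example: a single proofread line of pointed Hebrew from a hypothetical page 12.
print(encode_page(12, ["בָּרוּךְ אַתָּה"]))
```

Even a minimal structure like this records which page and line each piece of text came from, which is the kind of structural data that plain typed text does not carry.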
Read the Transcription Tutorial
Before you start transcribing, read one of our transcription tutorials: one for Hebrew transcription and one for English transcription.
Status of Transcriptions
| Stage Name | Pages in Stage |
|---|---|
| Ready for Transcription | |
| Assigned to a transcriber | 5 |
NOTE: You must be logged in to receive credit for your work.
Credit for Transcription Work
While anonymous transcription is certainly allowed, credit can only be assigned to those who are logged in (otherwise, there is no record of whom to credit aside from an IP address!). Registration is free, easy, and most of all encouraged, because we want to give everyone who contributes to this project credit for their efforts. If you would like to remain anonymous or pseudonymous, you may be credited under a name different from your real name. Please indicate the name you would like us to use in credit lists by editing your user page.
Also, please read our policy on copyright for details on how we license all work contributed through this project. All transcriptions from texts in the public domain will remain public domain.
Getting additional help and providing feedback
The Open Siddur Project's processes are all about feedback. We expect plenty of bugs, inconsistencies, and inadequate explanations as we strive to make the first-ever free and open repository of Siddur texts available. Please (please!) send any bug reports or feature requests to the issue tracker. Even "I got here, read the docs, and still have no clue what to do" is helpful feedback. Just tell us what you did, and we'll try to correct the problem. If you have any questions or comments, don't hesitate to post to the Discussion List. (You will need a Google account to post to the issue tracker, and you must join the Google Group to post to the Discussion List.)
The Future of the Open Siddur Transcription Interface
This wiki is not intended as the final interface for transcription, but work performed here will not be lost when we move to our new transcription interface. More ready-for-transcription pages will be added as work progresses on the ones already listed.
Why not use OCR for Hebrew?
With technology in its current state, manual transcription (typing) is the only reliable way to transcribe Hebrew text with vowels. Open source tools for the automated transcription of Hebrew (such as hOCR) cannot yet reliably convert images of Hebrew letters and diacritical marks into machine-readable Hebrew text: proofreading their output takes more work than transcribing the text from scratch. We have not found closed source tools to be any better. Until such tools improve, projects such as the Open Siddur must depend on manual transcription by humans.
