Transcription

From the Open Siddur Project Development Wiki


Transcription is the process of converting a scanned image of text into machine-readable text. For the Open Siddur Project, we do this by manually typing the text exactly as it appears in the scanned pages of historic siddurim and saving it in our database. This text is then proofread to make sure its content is identical to the text on the original scanned page. After proofreading, the text is encoded in an XML form that carries high-level structural data about the text, and the encoded text is stored in our XML database.
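Proofreading comes down to checking that two versions of a page's text match. As a rough illustration only (this is not part of the project's own tooling), a contributor could compare a transcription with a proofread copy using Python's standard difflib and unicodedata modules. The function name compare_transcriptions is hypothetical. The sketch normalizes both texts to Unicode NFC first, so a letter whose vowel and dagesh were typed in a different order still compares equal; whether the project's database stores text in NFC is an assumption here.

```python
import difflib
import unicodedata


def compare_transcriptions(transcribed: str, proofread: str) -> list[str]:
    """Hypothetical helper: return unified-diff lines between two versions
    of the same page. An empty list means the two texts match."""
    # NFC puts combining marks into canonical order, so "bet + dagesh + patah"
    # and "bet + patah + dagesh" become the same string before comparing.
    a = unicodedata.normalize("NFC", transcribed).splitlines()
    b = unicodedata.normalize("NFC", proofread).splitlines()
    return list(
        difflib.unified_diff(a, b, fromfile="transcription",
                             tofile="proofread", lineterm="")
    )
```

Lines that begin with "-" or "+" in the result show where the transcription and the proofread text disagree, including a single missing vowel point.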

Status of Transcriptions

Pages ready for transcription or assigned

Stage name: pages in stage
Ready for Transcription
Assigned to a transcriber: 5

NOTE: You must be logged in to receive credit for your work.


Before you begin

Set Up Your Computer for Hebrew

The default fonts on many systems do not correctly display Hebrew text with vowels and other diacritical marks. To correctly view Hebrew text, download and install Unicode Hebrew fonts.

Transcribing Hebrew texts requires a keyboard layout that supports vowels and other diacritical marks. You won't need a special keyboard with Hebrew letters and vowels printed on it; all modern operating systems allow you to change the keyboard layout.

If you're already familiar with a Hebrew keyboard layout and simply want to begin typing Hebrew in your browser, check out Ze'ev Clementson's open source Hebrew keyboard layout bookmarklets. All you'll need is a modern browser with a bookmarks bar (Firefox, for example).

Many modern operating systems come with Hebrew keyboard layouts pre-installed, but these layouts are insufficient for typing the full set of diacritical marks. We have used the layout designed by Tiro Typeworks successfully, and we recommend it. See the keyboard setup page for more information on installing a Hebrew keyboard layout for your operating system.

Have you completed installing Unicode Hebrew fonts and setting up your keyboard? If so, then continue reading this tutorial.

Familiarize Yourself with a Hebrew Keyboard Layout

If your keyboard doesn't have Hebrew letters printed on its keys, it will take some time to learn the layout. During this time, an on-screen keyboard helps a great deal: it shows where the Hebrew letters are, as well as where the vowels and other diacritical marks appear when you press the "shift" or "alt" keys. If you're using MS Windows, an on-screen keyboard is installed with your operating system. To run it, click the Start button, select "Run...", type osk.exe, and click OK. Other on-screen keyboards are available for Macintosh and other popular operating systems.


Some people find it easier to type the Hebrew consonants first and add the nikud (vowel marks) later. Others find it easier to type both in order. Either way, all diacritical marks and punctuation must be transcribed exactly as they appear in the original source.
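If you type the consonants first and add the nikud afterwards, it can be useful to confirm that adding the marks did not change the letters underneath. Below is a minimal sketch, assuming the text is Unicode and treating every combining mark (Unicode category Mn, which covers Hebrew vowel points, dagesh, the shin and sin dots, and cantillation marks) as a diacritic. The helper names are hypothetical and are not part of any Open Siddur tool. Punctuation such as maqaf and sof pasuq is not a combining mark, so it is kept.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (category Mn): vowel points, dagesh,
    shin/sin dots and cantillation marks in the Hebrew block."""
    return unicodedata.category(ch) == "Mn"


def consonants_only(text: str) -> str:
    """Hypothetical helper: strip all combining marks, keeping letters,
    spaces and punctuation such as maqaf and sof pasuq."""
    return "".join(ch for ch in text if not is_mark(ch))


def letters_with_marks(text: str) -> list[tuple[str, list[str]]]:
    """Hypothetical helper: pair each base character with the Unicode names
    of the marks that follow it, in the order they were typed."""
    groups: list[tuple[str, list[str]]] = []
    for ch in text:
        if is_mark(ch) and groups:
            # Attach the mark to the most recent base character.
            groups[-1][1].append(unicodedata.name(ch))
        else:
            groups.append((ch, []))
    return groups
```

For example, a consonant-only draft of "shalom" and the fully pointed word should give the same result from consonants_only, and letters_with_marks lists the shin dot and qamats on the first letter, which makes a missing or extra mark easy to spot while proofreading.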

Transcription Rules

Please familiarize yourself with our tags and conventions for transcribing text.

Begin Transcribing!

Make certain you're familiar with our tags and conventions for transcribing text.

Logging in

Log in or create an account now, then return to this page.

Credit for Transcription Work

Credit can only be assigned to those who are logged in (otherwise, there is no record of whom to credit aside from an IP address!). Registration is free, easy, and above all encouraged, because we want to give everyone who contributes to this project credit for their efforts. If you would like to remain anonymous or pseudonymous, you may be credited under a name different from your real name. Please indicate the name you would like us to use in credit lists by editing your user page.

Also, please read our policy on copyright for details on how we license all work contributed through this project. All transcriptions from texts in the public domain will remain public domain.

Selecting a page for transcription

To select a page for a new transcription, go to the box at the top of this page labeled "Workflow status" and click "Ready for Transcription." This will bring you to a list of pages that have not yet been assigned for transcription and are available for you to take on. Pages with the word "English" in their titles involve English-language transcription. You may choose any page from that list. To adopt a page, either click "Assign me" in the "Transcription toolbox" or click the "Start transcription now" link in the pink box and save the page after beginning your transcription work.

Take Advantage of Your User Information

It helps to take a look at your user page. This is where you can let other transcribers know your background and which pages you've been working on. You can reach your user page from the top row of this wiki; just click your username there.

To find incomplete pages that you started work on, look at the "my contributions" link on the top row of the wiki. It lists all pages that you have edited.

Working with the Transcription Interface

All of the transcription work is done inside the transcription interface. To see an example of what sort of work you'll be transcribing, here's an image of a page from the 1917 JPS. (Please open it in a new tab or window.)

When you first select a page to transcribe, you'll see an information page indicating the document's status, as shown in the image below:

The "Transcription Toolbox" links back to this tutorial and provides helpful reminders of important keys.

Note that, on this page, the text may appear right-justified and punctuation marks may be placed incorrectly. This is a technical artifact and is of no importance to the final text.

Just under the toolbox is a "Start Transcribing Now" link inside a pink highlighted line. Click this link and the page is assigned to you. This ensures that nobody else will edit the page while you are working on it. (Instructions on adjusting the page status after you've begun working on it are described later on.)

When you first see the editing page, it has a page viewer on top and an edit box just below it, as shown here:

Note that you don't have to be online to transcribe text for the Open Siddur Project. To transcribe offline, first right-click the image and save it to your computer. At your convenience, open the image in your favorite image viewer and begin your transcription in your favorite application for typing Hebrew. Later, copy and paste your transcription from that application into the text box in the transcription interface and click "submit" to submit your work.

Moving and Zooming Images

Near the top of the page to be transcribed, you will see an image taking up most of the width of your screen. This is the page image which is to be transcribed. Initially, it will probably appear too small to be readable, and the whole image will not be shown on screen.

To pan the image (move it within the viewport), move the mouse over the image, hold the left mouse button down, and move the mouse around (drag the mouse). It will pan in the direction of the mouse movement.

To zoom in, hold down the Shift key and the left mouse button and drag upwards. To zoom out, hold down the Shift key and the left mouse button and drag downwards. A zoomed image may also be panned by dragging it.

While you're typing, you may find it helpful to move and zoom the viewport.

Returning to a Saved Transcription

If you want to pause the transcription and continue later, click Save page. The page will remain assigned to you (in "Transcribe-assigned" status) until its status is changed. It may help to note the number of the page you were transcribing on your user page. The page will also be listed on your "my contributions" page.

To return to a saved transcription, go to the Transcription page and, in the box labeled "Workflow status", click on "Assigned to a transcriber." This will bring you to an index of pages whose transcription has already been assigned, and the page you worked on previously should be listed here. Select your page and when you get to the page, click on "Continue transcribing now" in the pink box.

Finishing a Page (or handing it off to someone else)

When you will no longer be transcribing the page, you must change the page status to reflect that it is now either ready to be proofread or that the transcription should be completed by someone else. Near the "Save page" button at the bottom of the transcription page is a Page status box, as shown in the image on the right.

To indicate that transcription is finished and to pass the page on to proofreading, click the Proofread-open radio button, then click Save page.

To indicate that you would like to stop transcribing the page and that someone else should continue where you left off, click the Transcribe-open radio button, then click Save page.

Getting additional help and providing feedback

Open Siddur Project's processes are all about feedback. We expect plenty of bugs, inconsistencies, and inadequate explanations as we work to make the first ever free and open repository of Siddur texts available. Please (please!) send any bug reports or feature requests to the issue tracker. Even "I got here, read the docs, and still have no clue what to do" is helpful feedback. Just tell us what you did, and we'll try to correct the problem. If you have any questions or comments, don't hesitate to post to the Discussion List. (You will need a Google account to post to the issue tracker, and you need to join the Google group to post to the Discussion List.)

The Future of the Open Siddur Transcription Interface

Even though this wiki is not intended as the final interface for transcription, work that is performed here will not be lost as we advance to our new transcription interface. More ready-for-transcription pages will be added as work progresses on the ones that are already listed.

Why not use OCR for Hebrew?

With technology in its current state, manual transcription (typing) is the only reliable way to transcribe Hebrew text with vowels. Open source tools for the automated transcription of Hebrew (such as hOCR) cannot yet reliably convert images of Hebrew letters and diacritical marks into machine-readable Hebrew text without requiring more proofreading work than transcribing the text from scratch would take. We have not found closed source tools to be any better. Until such tools improve, projects such as the Open Siddur must depend on the manual transcription of text by humans.
