IRC Conference/logs/2010-03-14

 (hi)

=-= ilan_ is now known as Guest57709

 Unexpected busyness at home

 so, a dictionary definition might be kind of like a note

yes

Did you have any other ideas for integrating it?

 for the web-based version, dictionaries could be useful in a sense similar to what TanakhML does

 (that's a less useful mode in print)

 it's still a type of note


 * <-- realazthat has left freenode (Read error: Connection reset by peer)

for the web-based version, we could also do something similar to what I do in my iphone app: provide a dictionary pop-up when the hebrew word is touched

sorry

dc'd

 Similar concept...

but, that is harder to do in a paper Siddur

 Either way, it requires a linkage between words and dictionary entries

-->| realazthat_ (~realaztha@ool-182e4988.dyn.optonline.net) has joined #jewishliturgy

 Azriel needs the duct tape

=-= realazthat_ is now known as azriel

David has already annotated the wlc with the strongs dictionary refs

 So, it's another input converter

=-= azriel is now known as azriel_fasten

 linking my xml:ids to David's osisIds

since we already have a wlc input converter, we could just grab his

yes, what you said

=-= Guest57709 is now known as ilanc

 I know I'm a newbie, but it sounds like maybe it could look like the definitions found at the bottom of the page in siddur rinat yisrael

 Yes, I think that's kind of like what Ze'ev is envisioning

 insofar as options for template layout in paper siddurs go

 it's an idea I hadn't thought of. great!

 Is your development blocked on anything?

free time

<EfraimDF> Unfortunately, we can't code that for you

darn

<EfraimDF> Probably the next in logical order would be Azriel...

Same here.

I've almost got the pan/zoom jquery code up and running.

<EfraimDF> actually

<EfraimDF> Probably I should segue next

<EfraimDF> because everyone else is talking about the web app...

<EfraimDF> The XSLT transforms aren't *done*, but I think they're in a good enough state for an alpha release

<EfraimDF> In order to have an alpha release of something that a normal person can use, the web app becomes the largest dev priority

<EfraimDF> (As I commented on list, the transforms in their current state can perform all the operations that BBYO's released product can do)

<EfraimDF> The web app is a two-part beast:

<EfraimDF> A UI and a server-side API

<EfraimDF> (so then functions are XQuery functions, and the app handles XQuery)

<EfraimDF> or use an XML-RPC or REST style API of our own

<EfraimDF> which is a front-end for server-side XQuery

what are the pros/cons of each approach?

<EfraimDF> XQuery directly gives the web app all the power of the XML database

<EfraimDF> that's both a pro and a con

why is it a con?

<EfraimDF> Possible corruption/invalidity of the data if the app messes up

what's the difference between the data getting corrupted by front-end code as opposed to back-end code?

<EfraimDF> You will always get well-formed XML, you can't guarantee, for example, that there are no dangling pointers
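
A minimal sketch of the point being made here: a document can be well-formed XML and still contain pointers to ids that do not exist. It assumes pointers are `target` attributes of the form `#some-id` and that ids are `xml:id` attributes. The function name `find_dangling_pointers` and the sample document are hypothetical and not taken from the project's code.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def find_dangling_pointers(xml_text):
    """Return every '#id' reference whose id is not defined in the document."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    dangling = []
    for el in root.iter():
        # A target attribute may hold several space-separated references.
        for ref in (el.get("target") or "").split():
            if ref.startswith("#") and ref[1:] not in ids:
                dangling.append(ref)
    return dangling

# Hypothetical sample: well-formed, but "#s2" points at nothing.
sample = '<text><seg xml:id="s1">shalom</seg><ptr target="#s1"/><ptr target="#s2"/></text>'
print(find_dangling_pointers(sample))
```

A check like this would run on the server after the well-formedness check, which is the kind of restriction a raw XQuery interface does not enforce by itself.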

it is indeed best to restrict as much as possible

but

it will also limit its power

<EfraimDF> I'm assuming that there's going to be a lot more parallel development on the front end side

unless the api becomes more complex

<EfraimDF> Multiple clients, for example

Front-end - yes, finally a bit I can understand.

<EfraimDF> Imagine that we write an official front end in jQuery

<EfraimDF> and someone else writes a front end for iPhones

<EfraimDF> they can share code (to some extent)

This is the siddur-creation, not the transcription, we're talking about, right?

<EfraimDF> any aspect

<EfraimDF> I don't see any reason to restrict db access dependent on the data's stage in the workflow

<EfraimDF> XML RPC means writing some URIs and effectively tunneling a calling protocol over it

<EfraimDF> (eg, SOAP messages)

<Aharon> zeev, are you committed to iphone or is android also doable?

<EfraimDF> REST means writing URI handlers that respond to standard http methods (GET, POST, PUT, DELETE)

But when we're talking about the front-end, we're talking interfaces, right? And each interface is task-dependent.

<EfraimDF> so, for example, a POST to http://opensiddur.org/user might result in user creation
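
A minimal sketch of the REST idea described above, in which one URI responds differently to each standard HTTP method. The handler names, return values, and the `handle` function are hypothetical; a real handler would talk to the XML database rather than return a fixed string.

```python
# Hypothetical handlers; each returns (HTTP status code, message).
def create_user(body):
    return 201, "created user from posted profile"

def get_user(body):
    return 200, "user profile"

# One entry per (HTTP method, path) pair the API understands.
ROUTES = {
    ("POST", "/user"): create_user,
    ("GET", "/user"): get_user,
}

def handle(method, path, body=None):
    """Dispatch a request to its handler, or report why it cannot be served."""
    handler = ROUTES.get((method, path))
    if handler is None:
        known_paths = {p for (_, p) in ROUTES}
        if path in known_paths:
            return 405, "method not allowed"
        return 404, "not found"
    return handler(body)

print(handle("POST", "/user", "<profile/>"))
```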

I do mostly iphone stuff, but might do android if there's a market for android apps

<EfraimDF> There's the UI (very task dependent) and there's the interface from the front end to the db (should at least look consistent)

<EfraimDF> It would be nice to have an integrated UI too

<EfraimDF> which is why everything in the UI should effectively act like a widget

<EfraimDF> disconnected from its origin page

to(zeev) I have a Nexus One. Market for android is everyone who hasn't been using Verizon

<EfraimDF> for transcription editing, for browsing the page images, for post-transcription editing (proofreading, even after encoding)

<EfraimDF> is this making any sense?

Yeah.

Good that I'm writing the pan-zoom tool as a plugin.

I'll do so with the rest of the jquery code.

<EfraimDF> In terms of what blocks on what (practically)

<EfraimDF> Pretty much the whole UI app blocks on user management

ouch

<EfraimDF> (which is why I'm using that now as a springboard for thinking about APIs)

<EfraimDF> well, if you can't add or authenticate a user, you can't do much in a database

<EfraimDF> particularly if you want to keep track of who contributed what

<EfraimDF> From a UI perspective, what's the most useful way to communicate between app and db?

<EfraimDF> TEI and TEI fragments directly? (easier to work with on the backend)

<EfraimDF> some kind of bidirectionally-processed version?

give examples

<EfraimDF> specialized microformats for each type of API call (getting close to XMLRPC)

<EfraimDF> ok...

<EfraimDF> (I know Azriel thinks user management is special, but it really isn't that special )

To communicate between the app and DB - why not use JSON and AJAX- or am I missing the question?

<EfraimDF> AJAX is a protocol

to(GabeSeed) how's this going for you?

<EfraimDF> JSON is a data format

I know.

<EfraimDF> the question is -- what to send in AJAX messages

Ah.

<EfraimDF> I was taking AJAX as a given

to(GabeSeed) to private message in IRC type /msg username

<EfraimDF> example:

<EfraimDF> I want to create a new user

<EfraimDF> A user needs certain information (username, password, password hint?, real name (first, middle?, last), email address, institutional affiliation?)

<EfraimDF> all of which will be stored in the user's profile

<EfraimDF> On the db, the profile is an XML file located in the user's /home/$USER collection

<EfraimDF> (where $USER will stand in for the username)

<EfraimDF> so...

<EfraimDF> how should the web app make a call to the db to create a new user and profile?

Why not just write CRUD (Create Retrieve Update Delete) methods for each kind of data? Each one gets a unique ID, plus an optional JSON or XML set of data.

<EfraimDF> HTTP POST the TEI profile(?) to http://opensiddur.net/user

<EfraimDF> XMLRPC way:

<EfraimDF> POST a set of parameters in an ad-hoc format to some API handler
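
A rough sketch contrasting the two request styles for creating a user. In the REST style the document itself is the message; in the RPC style a method name and parameters are tunneled through a single endpoint. The `<profile>` element names are placeholders rather than real TEI, the `/api` endpoint and the `user.create` method name are hypothetical, and XML-RPC (via Python's standard `xmlrpc.client`) stands in for whatever calling protocol would actually be used.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

def rest_request(username, email):
    """REST style: POST the profile document to the resource's own URI."""
    profile = ET.Element("profile")  # placeholder markup, not real TEI
    ET.SubElement(profile, "username").text = username
    ET.SubElement(profile, "email").text = email
    body = ET.tostring(profile, encoding="unicode")
    return "POST", "http://opensiddur.net/user", body

def rpc_request(username, email):
    """RPC style: POST a method call with parameters to a generic handler."""
    params = ({"username": username, "email": email},)
    body = xmlrpc.client.dumps(params, methodname="user.create")
    return "POST", "http://opensiddur.net/api", body  # hypothetical endpoint

for request in (rest_request("example", "user@example.org"),
                rpc_request("example", "user@example.org")):
    print(request)
```

In the first form the client has to know the profile format; in the second, the server needs a specialized handler for each kind of call.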

<EfraimDF> One question is: how much TEI should the UI app need to know

The latter sounds better to me.

<EfraimDF> Which would involve writing specialized handlers for every type of data

Which I think we ought to do anyway.

That work will have to be done one way or another.

the question is where

the advantage of having it on the server

is that it is easier to write more clients

the advantage of having it on the client

is that it is easier to decouple the client from the db

(and use it for JLPTEI on say, a filesystem)

<EfraimDF> There's also the question of complex data structures

<EfraimDF> How much API do we need to make a JLPTEI editor work on the client side?

How does having it on the client decouple it any better?

well

if we completely abstract away from using TEI on the client entirely

then when the client has to manipulate a TEI file

(which is basically all it actually does to the backend)

that will have to be rewritten

<EfraimDF> There's also a question of how we want to serve linked data to the outside

admittedly, the xquery that will be running on the server would have to be reimplemented anyway

I don't think decoupling is a short term goal

<EfraimDF> (depends on how you implement your decoupling)

<EfraimDF> (if you mean decoupling from the Internet, that's different from decoupling from the db)

true, if decoupling involves bundling eXist etc. with the client, it can be easily decoupled

without major changes

I think for the sake of expediency

<EfraimDF> (for those who don't know, eXist can run in an embedded mode in a Java application)

we should follow Ilan's advice

and do as much as possible on the server

just try to put all of this type of logic in an interface that can be reimplemented later on

if required

since the UI is the biggest block

or will be

we should make the clients as simple as possible

and only later migrate more complex things to it

example?

<EfraimDF> http://opensiddur.org/user maps to user management functions

<EfraimDF> passing a SOAP message to it fills in a profile template

I hate to do this, but I need to go. I'll leave the client on, but I'll be AFK. Bottom line for me - give me specs for the front-end(s), and I'll implement.

<EfraimDF> passing a new profile to it sets up a new user

<EfraimDF> these SOAP things are going to get harder when we have more irregular data

<EfraimDF> A bibliography API, for example, that has to accept lots of different types of entries

<EfraimDF> incidentally, my order of operations will likely be: user management, bibliography management, page images, text repositories, ...

For text repositories, won't the SOAP message schema wind up being quite complex?

<EfraimDF> yes

to(ilanc) thanks Ilan!

Won't that be a bottleneck in our being able to provide new functionality?

(and, also a real pain to test/debug)

<EfraimDF> someone's gonna have to write the code to deal with complex situations. What we're trying to figure out here is who's the lucky one?

<EfraimDF> Generic schemas may be the easier way to go

lucky for me I'm just a simple xslt programmer ...

the way I started implementing the encoding app

was direct xml access

and the client was responsible for everything

it would then reupload the final xml

which could then be validated before being stored

now

that is complex

but at least it's all xml/TEI etc.

if we abstract too much from that

<EfraimDF> which made sense b/c you had no API. The only change for you would be that you would do an HTTP POST or PUT and either get back a 200, 201 or 400+error report

we will be reimplementing some sort of language that is simpler than TEI but still complex

<EfraimDF> that is what I'm afraid of

what we should do

is abstract the easy stuff for sure

like user management

etc.

same with the contributor/biblio stuff

<EfraimDF> biblio can get complex too

hmm

<EfraimDF> although its format is a bit more constrained

the client needs access to the text repository, the selection, the views

which pretty much makes up the entire file

are we going to repackage that in some other protocol?

make it multiple requests?

etc.

it's complicated

but in the end it's just a language/protocol that has the info we need

<EfraimDF> the alternative is to make it multiple requests... so, add segment to repository is one, reorder selection is another, and so on

in some cases, you might be able to get away with a restricted view

yes

that's definitely more complicated IMO

right now

it's simply a fancy xml editor

you would have to write a protocol for reordering

<EfraimDF> the protocol could be as simple as "PUT the new version"

yes

<EfraimDF> followed by "check if it still validates"
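
A minimal sketch of the "PUT the new version, then check that it still validates" protocol, returning the 200, 201, or 400 plus error report mentioned earlier in the discussion. The in-memory `STORE` dictionary stands in for the XML database, and `put_document` and its `validate` hook are hypothetical names. Real validation would run against the project's schema instead of the default no-op.

```python
import xml.etree.ElementTree as ET

STORE = {}  # stand-in for the XML database: path -> document text

def put_document(path, xml_text, validate=lambda root: []):
    """Store a whole document; reject it if it is malformed or invalid.

    Returns (status, errors): 201 if created, 200 if replaced, 400 if rejected.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return 400, ["not well-formed: %s" % err]
    errors = validate(root)  # e.g. schema or pointer checks
    if errors:
        return 400, errors
    created = path not in STORE
    STORE[path] = xml_text
    return (201 if created else 200), []

print(put_document("/group/doc.xml", "<TEI/>"))
print(put_document("/group/doc.xml", "<TEI/>"))
print(put_document("/group/doc.xml", "<TEI>"))
```

The client stays a plain XML editor under this scheme; nothing invalid is stored because the rejected version never replaces the old one.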

that's what I intended

but this is only for the complex stuff

thus for things that could be simplified

we shouldn't use the xquery api

but our own

but for complex stuff

unless it's worth our while

we should stick to direct xml manipulation

until we see a way to simplify it

figuring out the requirements for something like this without trying it first is very difficult

heh

<EfraimDF> ok, so I'll work on the server providing a direct XML type API

<EfraimDF> which could have a simpler API grafted over it

<EfraimDF> (so, the default case is that the client app can do everything you can do by reading and writing XML)

btw

the data format

in which everything is returned

if it is JSON has a distinct advantage

in that it can overcome javascript's SOP (same origin policy)

yes

raw manipulation is ugly

just to reiterate with an example

if you request the text repository, views, and selection of a TEI file

no matter what protocol you use

you will get back something that pretty much looks like a TEI file anyway

that will anyway have to be parsed etc.

if it is TEI

then there was no point in putting a layer of API on top

if it isn't TEI, then you now have to parse whatever it *is*

<EfraimDF> well, you need to put at least enough API on top of it to prevent a POST of obviously invalid data

well

the db can validate

right

on top of everything else

whatever format is used at the end

putting another layer on top

converting it to JSON has at least two advantages

in that it can overcome SOP

and that it is easy to manipulate in JS

and Ilan won't have to do it himself

btw

in case you don't know what SOP is

javascript only allows ajax requests to pages on the same domain as the javascript is executed under

which is quite annoying sometimes

it means that the webapp *must* run on jewishliturgy.org or opensiddur.org etc.

<EfraimDF> how do you make the outgoing call to overcome SOP?

I don't remember the details

http://en.wikipedia.org/wiki/JSON#Using_JSON_in_Ajax
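
The workaround usually meant here is JSONP rather than plain JSON: the page loads the URL through a `<script>` tag, which is not subject to the same-origin restriction on Ajax requests, and the server wraps the JSON in a call to a function that the client names. Below is a minimal server-side sketch of that wrapping. The function name `jsonp_response`, the callback name, and the sample data are all hypothetical.

```python
import json
import re

# Only accept plain JavaScript identifiers as callback names, so a caller
# cannot inject arbitrary script through the callback parameter.
_SAFE_CALLBACK = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")

def jsonp_response(data, callback):
    """Wrap JSON-serializable data in a call to the client's callback."""
    if not _SAFE_CALLBACK.match(callback):
        raise ValueError("unsafe callback name: %r" % callback)
    return "%s(%s);" % (callback, json.dumps(data))

print(jsonp_response({"user": "example"}, "handleProfile"))
```

Because the response is executed as script, this only suits read-style requests to a server the page trusts.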

<EfraimDF> ok...

<EfraimDF> Anything else we need to look at as a group?

is anybody here a PDF maven?

like the specs?

ah

<EfraimDF> not me...

me neither

also JLPDemo 0.4

we can get that done this week

What's JLPDemo?

<EfraimDF> the demo app

What's JLPDemo 0.4?

<EfraimDF> 0.4 is the next set of demos... things we want to include:

<EfraimDF> - auto transliteration (it'll be wrong on qamats qatan!)

http://wiki.jewishliturgy.org/Demos/Compiler/0.4

http://wiki.jewishliturgy.org/Demos

<EfraimDF> - ah right, we have a handy wiki page for it

<EfraimDF> it's a demo of the end-process for texts, assuming they're already encoded

<EfraimDF> would be nice to finish 1917 JPS Tehillim and input convert it so we could do a translated Tehillim book

So, what will "Simple recipes: build a new, valid, JLPTEI file with a selection that references data in existing files in order" use?

<EfraimDF> eg, generate Hallel or Pesukei d'zimrah

<EfraimDF> or the week's parsha

How far along is the 1917 JPS transcription?

<EfraimDF> I'd need to check back w/JB Hare to find out how much has been done aside from our work on it

any approximate %age?

<EfraimDF> but for our part --

<EfraimDF> ~85% transcribed

<EfraimDF> Needs to be proofread (the easiest job in the world!)

~85% of the entire 1917 JPS?

<EfraimDF> no, of Tehillim

<EfraimDF> I don't have accurate numbers on the whole thing

<EfraimDF> (all of the Torah is done, I think all of the megillot are done, some of kethuvim isn't)

<EfraimDF> (and possibly some neviim)

<EfraimDF> Last I checked, I think Job wasn't done (that's huge)

<EfraimDF> As of Dec 2009, here's what was left: Joshua, Kings (both books), Isaiah, Jeremiah, Ezekiel, Psalms, Job, Daniel

ok, thanks

<EfraimDF> As I said, I don't know who else was making progress

<Aharon> we were contributed some text of an important siddur.

<Aharon> unfortunately the text was encoded in a PDF

<Aharon> some of the hebrew in the text is readable

<Aharon> (specifically) the footnotes and commentary

<Aharon> however, the liturgy would need to be decrypted

what license is it?

send me the pdf

<Aharon> the contributor, AviChai foundation made a claim that the material should be considered Public Domain

awesome

is the text in the pdf actual text or images of text?

it probably has a weird encoding

<EfraimDF> are we sure it's upper level ASCII and not UTF-16?

if I had the PDF, I can figure it out

so, when you said "some of the hebrew in the text is readable", what did you mean?

but I've seen text in proprietary encodings

which means it would have to be "decoded"

to(zeev) which of your gmail addresses should i email you at?

<Aharon> the text that is readable has no nikkud

now I'm itching for the pdf

what font does it use?

is that included?

heh

Ah, could be using a non-embedded font?

<Aharon> lots of nice fonts are embedded in the pdf

<EfraimDF> the fonts being embedded makes it readable by a PDF viewer, but not necessarily as the text layer

<Aharon> zeev, which of your two gmail addresses should i email you at?

Aharon: either one, they all get to me

<EfraimDF> cc me too

<EfraimDF> nm i found it (I think)

<EfraimDF> I haven't got a clue what encoding that is

Aharon: did you send it? I didn't receive anything?

<Aharon> it's in the gmail

<Aharon> you'll get it any nanosecond

<Aharon> we were also contributed a text in native Davka format

got it - both docs look ok to me. Which pages don't look right to you?

<Aharon> we can convert almost all of that text without trouble except the sheva na character is not preserved

<Aharon> the problem isn't that the pages of the AviChai texts look ok, it's extracting usable text from them

<Aharon> i think it should be obvious to everyone how incredibly valuable these texts would be if they were freely distributable in an open, standard format

<Aharon> they represent the scholarship of Dr. Avigdor Shinan of Hebrew University

<Aharon> just an aside, Rabbi Kaunfer who is working on his PhD at JTS in Jewish liturgy also recommended Dr. Shinan to me, as a very open and affable academic contributor

<Aharon> so it's nice to see a lot of this effort coming together in our project

<Aharon> zeev, can you discern the character encoding of any of the jewish liturgy texts?

only with a hex editor - is there some utility that you use?

<Aharon> a simple text editor

<EfraimDF> one thing I discovered ---

<EfraimDF> it's written backwards

<EfraimDF> \xd9 = pe

<EfraimDF> \xc8 = yod

<EfraimDF> \xcf = lamed

<EfraimDF> \xc2 = vav

<EfraimDF> \u2030 = heh

<EfraimDF> note that it's partially 8 bit codes and partially 16 bit codes

<Aharon> interesting

<Aharon> how are you analyzing it?

<EfraimDF> loaded the title page's text (no vowels) into a Python unicode string

<EfraimDF> determine that ASCII space = space

<EfraimDF> and count letters
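
A minimal sketch of the counting step just described: treat whitespace as whitespace and tally every other code point, so the most frequent unknown characters can be matched against letters expected in a known text such as the title page. The function name `letter_frequencies` and the sample string are made up for illustration.

```python
from collections import Counter

def letter_frequencies(text):
    """Count non-whitespace characters, most frequent first."""
    return Counter(ch for ch in text if not ch.isspace()).most_common()

# Made-up sample built from code points mentioned in the discussion.
sample = "\u00c8\u00c2\u00c8 \u00cf\u00c8"
for char, count in letter_frequencies(sample):
    print("U+%04X" % ord(char), count)
```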

can it be backward utf8?

or is it entirely custom

<EfraimDF> it's not UTF-8

<EfraimDF> the tav corresponds to DOT ABOVE

I like the font

it looks artscrolly

I know u prob don't

<Aharon> heh, it's the Koren font

<EfraimDF> Artscroll uses Hadassah

<EfraimDF> he's probably looking at the part that's in Frank Ruehl, which resembles Hadassah

they are all embedded subsets

are the fonts also released to us?

<EfraimDF> no

or is that impossible

<EfraimDF> you should look at Culmus Ancient scripts Keter

I read ur posts

on jtech

that was funny

mention of fonts

and we pounced

<Aharon> other fonts in the doc (none of which are open): Sucariot, Narkisim, Palatino

<EfraimDF> the fonts are totally irrelevant

if you write up a table

of correspondence

I can write a parser

using spirit

and ebnf

<EfraimDF> why not just use a ... um... lookup table?

well

<EfraimDF> it doesn't

ok then

<EfraimDF> it's really just mapping high ASCII

<EfraimDF> It's about 4 lines of Python + the lookup table

keep in mind the backwardness might be per-line

<EfraimDF> it is
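
A sketch of the "few lines of Python plus a lookup table" approach, assuming the text is stored reversed line by line. The table holds only the six correspondences given in the discussion above; a real table would need every consonant and, for the liturgy, the vowel points. The function names are hypothetical, and characters missing from the table pass through unchanged so gaps stay visible.

```python
# Partial table: only the mappings identified so far in the discussion.
LOOKUP = {
    "\u00d9": "\u05e4",  # pe
    "\u00c8": "\u05d9",  # yod
    "\u00cf": "\u05dc",  # lamed
    "\u00c2": "\u05d5",  # vav
    "\u2030": "\u05d4",  # heh
    "\u02d9": "\u05ea",  # tav (extracted as DOT ABOVE)
}

def decode_line(line):
    """Reverse one line into logical order and map each character."""
    return "".join(LOOKUP.get(ch, ch) for ch in reversed(line))

def decode_text(text):
    """Decode each line separately, keeping the lines in their original order."""
    return "\n".join(decode_line(line) for line in text.split("\n"))

# heh, vav, lamed as extracted; decoding reverses them to lamed, vav, heh.
print(decode_text("\u2030\u00c2\u00cf"))
```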

I hope we don't have to resort to backward text

that's almost as good as drawing vectors

<EfraimDF> this was backwardsed by Adobe Distiller

I was trying to find the specs for bidi for pdf

so I could use the primitives to copy the spec


 * <-- GabeSeed has left freenode (Quit: Page closed)

to(EfraimDF) Gabe said he'd get Heshy to join us next time around

<Aharon> thanks Azriel. looking forward to it

heh

I don't know how possible it is

<Aharon> no longer feeling the itch?

I haven't had that much time to kill on it recently

it's really quite frustrating though

how many libraries there are

that *almost* support what we need

<Aharon> oh, i was talking about decrypting these two pdfs

efraim is probably going to do it

<Aharon> w00t!

as he is best at decoding the tables

and the actual script is quite easy

<EfraimDF> heh -- I was going to write a relatively quick user management thing

<EfraimDF> and see what it takes to code what I was talking about before

<EfraimDF> deciphering the tables just takes time

I can do the script then

<Aharon> with this text converted, i can begin making some very nice proof-of-concept printed siddurim, to help give folk an idea of what's possible

Aharon:we would still need the encoding app to do that

<EfraimDF> the script should take no time

lol

<EfraimDF> Aharon -- I don't see how you can do that w/o encoding?

<Aharon> it's the end product people are wondering about

<EfraimDF> right, but any moron can put text in a word processor

<Aharon> in other words, mockups need not be virtual

<EfraimDF> what differentiates your mockup from something someone puts out in a wp?

<Aharon> to show folks in person

<Aharon> it isn't so different except that the sourcetext we're using makes a great argument for the importance of cooperating on sharing texts

<Aharon> having a document in hand which shows what a remixed text looks like, with actual remixed text freely licensed is convincing

<EfraimDF> what about a plain-text input converter?

<EfraimDF> I've wanted one for a while, but haven't had the time to do it.

<Aharon> what does it do?

<Aharon> converts text to __ ?

<EfraimDF> takes simple commands from a text file (newline = new segment, double newline=new paragraph) and converts to valid JLPTEI
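
A rough sketch of the plain-text convention described above, where a newline starts a new segment and a blank line starts a new paragraph. The output is bare TEI-like `<p>` and `<seg>` markup, not valid JLPTEI: the header, `xml:id`s, and the project's repository and selection structure are all omitted, and the function name `text_to_xml` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def text_to_xml(text):
    """Convert plain text to nested paragraph and segment elements."""
    body = ET.Element("body")
    # One or more blank lines separate paragraphs.
    for block in re.split(r"\n\s*\n", text.strip()):
        segments = [line.strip() for line in block.split("\n") if line.strip()]
        if not segments:
            continue
        paragraph = ET.SubElement(body, "p")
        for line in segments:
            ET.SubElement(paragraph, "seg").text = line  # one line, one segment
    return ET.tostring(body, encoding="unicode")

print(text_to_xml("first segment\nsecond segment\n\nnew paragraph"))
```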

can you put that into ebnf?

<EfraimDF> one could

<Aharon> what is ebnf?

<EfraimDF> I would think it would be the harder way to do it

ebnf would define what you mean

newline == new segment etc.

<EfraimDF> oh, you didn't get what I was saying...

Would the purpose of a "plain text input converter" be to provide an end-user with the ability to enter ad-hoc texts which could subsequently be linked to existing texts via the os client app?

<EfraimDF> the purpose would be as a stopgap until we have the client, which would have the functionality of the plain text converter incorporated in

you would still need the client in order to create the composite doc

<EfraimDF> basically, to generate text repositories and encodings that work, but are probably semantically incomplete or wrong

<EfraimDF> The demo tool we have now could do that

<EfraimDF> (Unless I'm misunderstanding something)

I thought the demo tool just displayed a book from the Tanach? Is there another demo?

<EfraimDF> the demo tool could display *any* JLPTEI

<EfraimDF> all it does is call the db and ask "what files are available"?

<EfraimDF> and then call the XSLT transform to paste them together and display them

<Aharon> we just don't have any other JLPTEI formatted texts besides the TaNaKh, correct

<EfraimDF> correct

<EfraimDF> which is why a simple converter might work

<EfraimDF> we could take either Gabriel's text (or AviChai's, with the character encoding conversion)

<EfraimDF> do some simple manipulations of it involving pressing Enter a few times and saving it as text

<EfraimDF> and that would make a first-pass encoding

<EfraimDF> (which would likely be semantically wrong, but would work)

<Aharon> the output of such a script would be stored where?

<EfraimDF> as of now?

<EfraimDF> you save a text file containing the UTF8 encoded hebrew, pass it through the converter, it saves an XML file

<EfraimDF> you upload the XML file to the db

<EfraimDF> same way the tanach got in the database

<Aharon> if it's a matter of converting new lines and paragraphs to jlptei, then that can be done without a custom script

<EfraimDF> ?

Does the Keter-YG font look a bit "odd" to anyone else except me (on a Mac)? For me, the KeterAramTsova font displays non-consonants much better.

<EfraimDF> looks ok to me

same as KeterAramTsova?

<EfraimDF> it looks different

<EfraimDF> but positioning looks the same

yes, but degeshim are positioned ok?

<EfraimDF> i thought so

must just be me then

<EfraimDF> mac's handling of opentype?

yes, that's what I meant

I've emailed to you 2 examples - presumably, Keter-YG displays the degesh better than this on your linux pc?

lunch time - bye all

<--| zeev has left #jewishliturgy ("ERC Version 5.2 (IRC client for Emacs)")

-->| realazthat_ (~realaztha@pool-68-237-99-249.ny325.east.verizon.net) has joined #jewishliturgy


 * <-- azriel_fasten has left freenode (Ping timeout: 258 seconds)

-->| realazthat__ (~realaztha@pool-68-237-99-249.ny325.east.verizon.net) has joined #jewishliturgy


 * <-- realazthat_ has left freenode (Ping timeout: 245 seconds)