Tuesday 13 February 2018

Getting Started with Textual Communities

Welcome to the temporary home of Version 2 of Textual Communities ("TC"), at textcomtest.usask.ca. This address will change when we are ready to go fully public with TC. Until then, this is a sandbox version, and all data may disappear at any point.
If you just want to see what TC can do: choose a community from "Public Communities" and click "View".

Sample files

You can get the sample files used in this documentation at www.sd-editions.com/tc. You can download all the files in this directory in a single zipfile at www.sd-editions.com/tc/tcstart.zip

Logging in

Here is what you see:

Press the inviting "Start" button, and you will be asked to log in through a social media account, or to create a log-in using your email address. If you do the latter, a confirmation email will be sent to that address so you can complete your registration. (Note: TC uses email addresses to uniquely identify each user.)

Creating or joining a community

When you first log in as a new user, the Start button has changed:

The "Create Community" button brings you to this screen:

The two compulsory fields, "Name" and "Abbreviation", are marked with *. Note the accessibility options: you can hide your community from everyone, or allow anyone to do anything, and many options in between.

Your first document: an XML file

Once you have a community, you need documents! The "Start" button at the centre of the screen has changed again:

Choose "Add Document" and you are offered two choices:

This time, select the "XML file" option. TC likes TEI! Here is a very simple example of a TEI/XML file, optimized for TC use:

<?xml version="1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Fairfax</title></titleStmt>
      <publicationStmt>
        <p>Draft for Textual Communities site (spelling modernized)</p>
      </publicationStmt>
      <sourceDesc><p>Murray McGillivray</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <pb n="130r" facs="FF130R.JPG"/>
      <div n="Book of the Duchess">
        <lb/><head n="Title">The book of the Duchesse</head>
        <lb/><l n="1">I Have great wonder/ be this light</l>
        <lb/><l n="2">How that I live/ for day nor night</l>
        <lb/><l n="3">I may nat slepe/ wel nigh nought</l>
        <lb/><l n="4">I have so many/ an idel thought</l>
        <lb/><l n="5">Purely/ for default of sleep</l>
        <lb/><l n="6">That by my truthe/ I take no keep</l>
        <lb/><l n="7">Of no thing/ how it cometh or goth</l>
        <lb/><l n="8">Ne me is no thing/ leief nor loth</l>
        <lb/><l n="9">Al is y like good / to me</l>
        <lb/><l n="10">Joy or sorrow / where so it be</l>
      </div>
    </body>
  </text>
</TEI>

There are a few things to note about this file:
  • "Content" elements with "n" attributes (<l n="1">) are especially important to TC. TC uses these to identify all content sections. Thus the first line is labelled by TC as "div=Book of the Duchess:l=1", and TC then uses this identifier to locate all versions of the first line in every document.
  • Note the explicit use of <lb/> elements to mark each new line in the document. TC uses the implicit hierarchy of page, column and line breaks (<pb/> <cb/> <lb/>) to construct a "text-tree" for each document, alongside the "text-tree" it creates for the hierarchy of <div> and <l> elements.
TC's understanding, that every text is composed of two distinct text-trees, one for the document (<pb/> <lb/> etc) and one for the act of communication represented in the document (<div>, <l> etc), is what separates TC from other systems for creating scholarly editions.
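
To make the two labelling schemes concrete, here is a minimal Python sketch that walks a cut-down copy of the sample file and records, for each <l>, both its place in the entity tree and its place in the document tree. This is an illustration only, not TC's own code: the function name, the dictionary keys and the "pb=...:lb=..." label format on the document side are assumptions made for this example. Only the "div=Book of the Duchess:l=1" form comes from the description above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Cut-down copy of the sample file: one page, a title and two lines.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
<text><body>
<pb n="130r" facs="FF130R.JPG"/>
<div n="Book of the Duchess">
<lb/><head n="Title">The book of the Duchesse</head>
<lb/><l n="1">I Have great wonder/ be this light</l>
<lb/><l n="2">How that I live/ for day nor night</l>
</div>
</body></text>
</TEI>"""


def walk(elem, div_name, state, out):
    """Visit elements in document order, tracking both hierarchies."""
    tag = elem.tag.replace(TEI_NS, "")
    if tag == "pb":
        # A page break starts a new page and resets the line counter.
        state["page"], state["line"] = elem.get("n"), 0
    elif tag == "lb":
        state["line"] += 1
    elif tag == "div":
        div_name = elem.get("n")
    elif tag == "l" and div_name is not None:
        out.append({
            # Entity-tree label, in the form quoted in the text above.
            "entity": "div=%s:l=%s" % (div_name, elem.get("n")),
            # Document-tree label: hypothetical format for this sketch.
            "document": "pb=%s:lb=%d" % (state["page"], state["line"]),
            "text": "".join(elem.itertext()).strip(),
        })
    for child in elem:
        walk(child, div_name, state, out)


def label_lines(xml_text):
    """Return one record per <l>, labelled on both trees."""
    out = []
    walk(ET.fromstring(xml_text), None, {"page": None, "line": 0}, out)
    return out


for row in label_lines(SAMPLE):
    print(row["entity"], "|", row["document"], "|", row["text"])
```

Note that the first verse line sits on the second document line of the page, because the title occupies the first: the two trees number the same text differently.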

Adding more documents, adding images

After selecting "XML file" you will get this dialogue:

Choose the file "Fairfax.xml" from the sample files (see above), give it the name "Ff" (or similar), and press "Load".
You will receive various encouraging messages, and the window should change to show you the sigil for this manuscript in the left-hand pane:
Click on the arrow beside Ff to see the pages in Ff, and then click on the first page. Its transcription will now appear in the bottom-right pane:

Now, you can add an image to the page. You can do this in several ways:
  • Click on the "Add Image" button in the top-right pane, or the camera icon beside the page number "130r". You will get a box inviting you to choose an image file or drop it onto the dialogue. Choose FF130R.JPG from the sample files.
  • You can load multiple images by putting them all in a folder, zipping the folder, and then clicking on the ZIP icon next to the manuscript name. Choose FairfaxImages.zip from the sample files.
In either case, you will see the image appear in the top-right pane. The red camera icon beside each page that now has an image will turn black. If you have all the images for the manuscript, the multiple image icon (two cameras above one another) will also turn black:
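
If you would rather build the zip file with a script than by hand, the following Python sketch shows one way to do it. It is only an illustration: the function name and the list of image suffixes are assumptions, and it simply follows the "folder, then zip the folder" recipe above by storing each image under the folder's name inside the archive. Whether TC needs that exact layout is not something this sketch can confirm.

```python
import zipfile
from pathlib import Path

# Assumed list of image suffixes; adjust to match your own files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def zip_image_folder(folder, zip_path):
    """Zip every image file in `folder`, keeping the folder name as a prefix.

    Returns the list of names written to the archive.
    """
    folder = Path(folder)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image in sorted(folder.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
                arcname = f"{folder.name}/{image.name}"
                archive.write(image, arcname)
                names.append(arcname)
    return names
```

For example, zip_image_folder("FairfaxImages", "FairfaxImages.zip") would gather the page images from a local FairfaxImages folder into a single archive ready to upload.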

Play around with the other icons on this page. Try pressing the "Save", "Preview" and "Commit" buttons to see what happens. (Note: "Commit" will write the page to the underlying database.)
Add another document by clicking on the + icon in the left-hand pane. Again, choose the "XML file" option; this time, add "Bodley.xml" from the sample files, with the name "Bd".

Collation

The power of Textual Communities may be seen in the Collation system. At the top of the left panel, click the "Collation" tab:

In TC terms, an "entity" is a discrete segment of an act of communication: a line of poetry, a paragraph of prose. Click on the arrow beside "Book of the Duchess" to open up the entities (lines of poetry) within it:
(The order of these may vary.) Now, click on one of these lines. You will get this advice:
So, go to that menu:
Choose a base text (it does not matter which). Now, go back and click on line 1 in the collation. The right-hand panel will change to present the wonderful Collation Editor (developed originally for the Greek New Testament editing projects at Münster and Birmingham):
(You may need to make the window larger to see the menu at the bottom of the pane.) Spend some time playing with this. You can regularize variants (e.g. remove the variant wonder/wondir) by dropping one word on another:
After choosing "Save", you will see that both manuscripts now have the reading "wonder":
Play with the settings menu. You can change how the collation works from this menu:
You will see how the collation changes as these selections change.
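
The effect of regularization described above can be pictured with a small Python sketch. It is not the Collation Editor's algorithm: it compares witnesses word by word at the same position, which only works when the lines have the same number of words, and real collation tools align texts far more carefully. The function names are invented for this example, and the Bd reading is made up to show the wonder/wondir variant rather than copied from Bodley.xml.

```python
def tokenize(line):
    """Lowercase a line and split it into words, dropping "/" marks."""
    return line.replace("/", " ").lower().split()


def variants(witnesses, regularize=None):
    """Return (position, readings) for each position where witnesses differ.

    `regularize` maps a variant spelling to the form it should count as.
    Naive positional comparison: stops at the shortest witness.
    """
    regularize = regularize or {}
    tokens = {
        sigil: [regularize.get(word, word) for word in tokenize(text)]
        for sigil, text in witnesses.items()
    }
    shortest = min(len(words) for words in tokens.values())
    found = []
    for position in range(shortest):
        readings = {sigil: words[position] for sigil, words in tokens.items()}
        if len(set(readings.values())) > 1:
            found.append((position, readings))
    return found


witnesses = {
    "Ff": "I Have great wonder/ be this light",
    "Bd": "I Have great wondir/ be this light",  # invented reading
}

print(variants(witnesses))
print(variants(witnesses, regularize={"wondir": "wonder"}))
```

Before regularization the sketch reports one variant at the fourth word. Once "wondir" is treated as "wonder", the two witnesses agree and nothing is reported.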
This brief introduction gives only a glimpse of the power of the Collation Editor. Try the following, for example:

  1. Go back to one of the documents, change line 1, commit the change (this writes it to the database used by the collation), and return to the collation. You will see your change there.
  2. Now, for fun: go to the second page of Ff (130v), make line 38 continue from the previous page onto this page, and add some text to it. Hint: change the "From previous page" value:

Then, commit this change and return to the collation. You will see that line 38 now includes this extra text, across the page break. You can view the XML for this page by clicking on the XML icon beside the manuscript name, to confirm that the line indeed continues across the page break:

Other facilities

There is a great deal more in TC than this sketch shows. It is particularly rich in community management features, as follows:
  1. You can invite other people to become members of your community: click on the "Members" link when you have chosen your community, or on the "Member profile" item on the log-in menu, and follow the "Invite" link.
  2. You can change the status of any member, assign them pages to transcribe, check the progress of their transcription, and assign someone to approve their transcripts (use the "Members" link for each community you lead).
  3. You can permit other people to join your community without need of your approval, or require that anyone who wants to join must be approved by you (use "Member profile" on the log-in menu).
Further, you can permit anyone to access pages, whole documents, or any part of the text of any document, and import it to their own website.

Copyright, etc.

We encourage anyone contributing materials to TC to make these available under the Creative Commons Attribution (CC-A) license. That is: no share-alike and no "non-commercial" restrictions. This means there should be no restrictions at all, except the requirement that all subsequent users of the material acknowledge your part in making it.
For the time being, TC will accept materials which do have restrictions on them. However, it is likely that TC will in future require that all materials held on TC servers be free of all restrictions (CC-A or similar). This is because TC uses University of Saskatchewan and Compute Canada servers. As both are publicly funded, hosting materials with any kind of copyright restrictions raises legal and ethical issues.
If this is a problem for you, you should not use TC.

Some interesting features of TC

Here, in no particular order, are some aspects of TC which make it unusual, even unique:
  • TC is built on an explicit ontology of texts, documents and works. Several of my publications describe this ontology (see https://www.academia.edu/12297061/Some_principles_for_the_making_of_collaborative_scholarly_editions_in_digital_form; https://www.academia.edu/9575974/The_Concept_of_the_Work_in_the_Digital_Age_published_version_; https://www.academia.edu/3233227/Towards_a_Theory_of_Digital_Editions). Briefly: TC sees text as a collection of leaves, with all leaves present on two distinct trees, each of which conforms precisely to the "OHCO" (ordered hierarchy of content objects) model. One of the trees represents the document (codex/quires/pages/columns/lines). The other tree represents the act of communication ("entity") inscribed in the document: as Play/Acts/Scenes/Lines, or Poem/Stanzas/Lines, etc. Note that this is not simply a matter of "overlapping hierarchies", as usually characterized. It is actually two quite distinct trees: distinct to the point that branches and their leaves might appear in quite different orders on the two trees (as in the case of notes or alterations spanning the margins of multiple pages, etc.). Broadly, TC uses the 'document' tree to display the document page by page, line by line, and uses the 'entity' tree to locate units of text across multiple documents for collation.
  • XML and all the tools associated with it famously support "one text, one tree". (Long ago, XML's predecessor SGML did attempt to enable multiple trees in any one text through the CONCUR feature. I never did discover a useful implementation of CONCUR.) Over some twenty-five years, I have tried to manipulate the two hierarchies using a variety of tools (most prominently, the Anastasia publishing system). One difficulty was that for a long time I thought the problem was simply "overlapping hierarchies", and not the more demanding scenario of two distinct trees. Another was the inefficiency of XML tools. Accordingly, while TC uses XML as its standard input format, it creates the two distinct trees from the XML and then stores them not as XML but as a series of JSON documents in a MongoDB backend. In essence, the text is a collection of leaves stored in JSON fields, with each leaf also stored in distinct JSON documents representing the two trees. Over the last decade I have attempted to express this model with three different database systems: first, XML in the form of XML-DB; then SQL in a relational database (underlying the first version of TC, still to be seen at www.textualcommunities.usask.ca); and finally JSON. JSON wins. A key reason for the success of JSON was the requirement that we be able to edit pages in real time: that is, take out a chunk of each tree, rebuild both trees as needed and then reattach the leaves of text to each rebuilt tree, all while the editor watches. Doing this in real time is like gathering leaves in a howling gale. As a bonus, JSON (much more than XML) is the native language of web content, with an immense range of JavaScript/HTML tools available to process it.
  • Technically: TC is built in pure JavaScript, using node.js and npm tools (https://nodejs.org/en/ and https://www.npmjs.com/), for both server and browser components. This makes maintenance, etc., far easier. TC also uses the Angular framework to provide all interface components (https://angularjs.org/; drawing on the Bootstrap and jQuery libraries). This architecture was designed by Xiaohan Zhang between 2012 (when we realized that the SQL solution would not work) and 2015. All code is freely available on GitHub, at https://github.com/DigitalResearchCentre/tc.
  • Theoretically: there is no limit to the number of trees structuring every text. TC supports two. Best of British luck to whoever wants to deal with more than two.
  • TC uses a IIIF server and viewer software (http://iiif.io/). In the future, we want to broaden our support for IIIF, to import full IIIF documents, etc.
  • We would like to be obsolete very very soon. Someone please do this better than we did.
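
The "leaves on two trees" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not TC's actual MongoDB schema: the field names ("name", "children", "leaves"), the leaf ids and the placeholder text are all invented for illustration. It shows one verse line whose text is split across a page break: the entity tree holds both leaves under a single line, while the document tree files them under different pages.

```python
import json

# Leaves of text, stored once and referenced by id from both trees.
# The wording is placeholder text, not a real transcription.
LEAVES = {
    "a": "first part of line 38, ",
    "b": "continued overleaf",
}

# Document tree: manuscript > pages. Each page lists the leaves written on it.
document_tree = {"name": "Ff", "children": [
    {"name": "pb=130r", "leaves": ["a"]},
    {"name": "pb=130v", "leaves": ["b"]},
]}

# Entity tree: poem > lines. Line 38 owns both leaves.
entity_tree = {"name": "div=Book of the Duchess", "children": [
    {"name": "l=38", "leaves": ["a", "b"]},
]}


def leaf_ids(node):
    """Return the ids of all leaves under a node, in tree order."""
    ids = list(node.get("leaves", []))
    for child in node.get("children", []):
        ids.extend(leaf_ids(child))
    return ids


def find(node, name):
    """Return the first node with the given name, or None."""
    if node["name"] == name:
        return node
    for child in node.get("children", []):
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def text_of(node, leaves):
    """Reassemble the text under a node from the shared leaves."""
    return "".join(leaves[i] for i in leaf_ids(node))


print(text_of(find(entity_tree, "l=38"), LEAVES))       # the whole line
print(text_of(find(document_tree, "pb=130v"), LEAVES))  # only what is on 130v
print(json.dumps(entity_tree))                          # each tree is plain JSON
```

Because each tree only refers to leaves by id, either tree can be cut apart and rebuilt without touching the text itself, which is the property the real-time editing described above depends on.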







