How To Publish a Gateway Language ULB on WACS

lversaw
(@lversaw)

This article describes how to publish a Gateway Language ULB project, for use as a source text for translating into other languages. This article deals with ULB projects created in tS or BTT-Writer (BTTW). Briefly, the resource must be approved for publishing, and the files must be compiled together into USFM files, one for each Bible book, and wrapped into a “resource container.”

This article addresses the documentation and approval steps, and other administrative steps done in PORT. But it focuses mainly on the technical steps.

Prerequisites

  • The books have been translated and checked to level 3.
  • Each of the books has been uploaded to the Door43 or WACS server.
  • The reader has access to PORT Publishing.

Create a Publishing Request in PORT

To initiate the publishing process, somebody creates a Publishing Request record in PORT, sometimes called an RPP. The request includes the URLs of all the material to be published. The request also includes the required documentation, including copies of signed agreements and license files.

Approve the Request

Whoever is authorized to approve publishing requests checks the request in PORT, including the list of URLs and the signed agreements. That person approves the request when all the documentation is in order.

Compile to USFM files

Once the request to publish is approved, proceed to the “Convert and Merge” step. In this step, a content technician merges all the little text files from the repos and converts them into true USFM files, one per book.

First copy the entire collection of uploaded files to a computer for processing.
I do this by using git clone commands. For example,

git clone https://git.door43.org/Ariel/tl_gen_text_ulb_L3.git

git clone https://git.door43.org/Ariel/tl_exo_text_ulb_L3.git

and so on…
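When there are many books, the clone commands can be generated rather than typed. The following is a minimal sketch, not part of the published tools. The helper names clone_url and clone_all are hypothetical, and the URL pattern is only inferred from the two example commands above; real repository names may not follow it, so check the URLs listed in the publishing request.

```python
import subprocess


def clone_url(owner: str, lang: str, book: str) -> str:
    """Build a repo URL, assuming the naming pattern of the examples above."""
    return f"https://git.door43.org/{owner}/{lang}_{book}_text_ulb_L3.git"


def clone_all(owner: str, lang: str, books: list[str], dest: str = ".") -> None:
    """Run `git clone` once per book, inside the dest folder."""
    for book in books:
        # check=True stops at the first clone that fails
        subprocess.run(["git", "clone", clone_url(owner, lang, book)],
                       cwd=dest, check=True)
```

For example, clone_all("Ariel", "tl", ["gen", "exo"]) would run the two commands shown above.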

The download will include some 13,000 text files, organized into folders which represent the chapters of each book. There are several kinds of text files.

  • Each repo should contain one manifest.json file at the top level. It identifies the target language, source language resource and version, translators, and other information.
  • Each repo should contain one 00/title.txt or front/title.txt file. This file contains the translated book title.
  • Each chapter folder may contain a title.txt file, which is the translated chapter title. These files are not always present.
  • The vast majority of text files are USFM snippets containing one to four verses each, representing one “chunk” of Scripture. These are named according to the first verse in the chunk: 01.txt, 04.txt, etc.
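The file kinds listed above can be told apart by path alone. Here is a small illustrative sketch; the function name classify and its return labels are made up for this example and are not taken from the conversion scripts.

```python
import re
from pathlib import PurePosixPath

# Chunk files are named after their first verse: 01.txt, 04.txt, ...
CHUNK_RE = re.compile(r"^\d+\.txt$")


def classify(relpath: str) -> str:
    """Classify a path given relative to the top of one book repo."""
    parts = PurePosixPath(relpath).parts
    if parts == ("manifest.json",):
        return "manifest"
    if len(parts) == 2:
        folder, name = parts
        if name == "title.txt":
            # 00/ and front/ hold the book title; other folders are chapters
            return "book_title" if folder in ("00", "front") else "chapter_title"
        if CHUNK_RE.match(name):
            return "chunk"
    return "other"
```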

The conversion process combines all the text files for each book into a corresponding .usfm file. The txt2USFM-RC.py program does this conversion, generating a set of correctly named .usfm files. To use this script, or any other script found in https://github.com/unfoldingWord-dev/tools/tree/develop/usfm , you must have a working Python environment set up on your computer. Set a few variables at the top of the script, and let it run.
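To give a feel for what the merge involves, here is a heavily simplified sketch. It is not txt2USFM-RC.py and does none of that script's error fixing or header generation; it only joins the chunk files of one repo in chapter and verse order, putting an \s5 marker before each chunk. The function name merge_chunks is hypothetical.

```python
from pathlib import Path


def _number(name: str) -> int:
    """Return the integer value of an all-digit name, or -1 otherwise."""
    return int(name) if name.isascii() and name.isdigit() else -1


def merge_chunks(repo_dir: str) -> str:
    """Join all chunk files of one book repo, in chapter and verse order."""
    repo = Path(repo_dir)
    # Chapter folders are numeric; 00 and front hold the book title, so skip them
    chapters = sorted(
        (d for d in repo.iterdir() if d.is_dir() and _number(d.name) > 0),
        key=lambda d: _number(d.name),
    )
    parts = []
    for chapter in chapters:
        # Only numerically named files are chunks; title.txt is skipped
        chunks = sorted(
            (f for f in chapter.glob("*.txt") if _number(f.stem) >= 0),
            key=lambda f: _number(f.stem),
        )
        for chunk in chunks:
            parts.append("\\s5")
            parts.append(chunk.read_text(encoding="utf-8").strip())
    return "\n".join(parts) + "\n"
```

Sorting by numeric value rather than by name keeps chapter 10 after chapter 2 even if the folder names are not zero-padded.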

Verify the .usfm files, and make corrections

The txt2USFM-RC.py converter fixes many errors as it runs. But invariably, many issues remain. Here is a partial list of issues to detect and resolve.

  • Valid USFM. No unrecognized markers.
  • No non-numeric or otherwise invalid chapter or verse numbers.
  • All chapters present and in correct order.
  • All verses present in each chapter. Exception: allow verses that are not in certain manuscripts (footnoted) to be missing.
  • All verse numbers in increasing order.
  • Verse bridges appropriately marked.
  • No extra or duplicate chapters or verses.
  • No empty chapters or verses.
  • No verse fragments.
  • USFM header has the required fields with correct information: \id, \ide, \h, \toc1, \toc2, \toc3, \mt1.
  • Paragraph marker after each chapter marker before verse 1.
  • Paragraph markers and \q# markers should occur after \s# markers.
  • \s5 marker to mark the beginning of each chunk.
  • Footnotes properly marked. Don't let them remain inline with the words of Scripture.
  • Resolve unresolved conflicts, typically marked by “HEAD”.
  • Remove spaces where necessary to resolve floating quotation marks and other characters, that is, quotation marks and other punctuation with spaces on both sides. When a space separates a quotation mark from the phrase it quotes, the mark can end up alone at the end of a line, which makes the text harder to read.
  • Fix other punctuation that is clearly wrong.
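Several of the numbering checks in this list are mechanical. The sketch below shows the idea for chapter order, verse order, missing verses, and verse bridges. It is an illustration only, not verifyUSFM.py, and the name check_numbering is hypothetical. Note that it would also flag verses that are legitimately absent from certain manuscripts, so its output still needs human review.

```python
import re

# Matches \c and \v markers followed by their number (or verse bridge)
MARKER_RE = re.compile(r"\\(c|v)\s+(\S+)")
VERSE_RE = re.compile(r"(\d+)(?:-(\d+))?")


def check_numbering(usfm: str) -> list[str]:
    """Return a list of chapter and verse numbering problems."""
    problems = []
    chapter = 0
    last_verse = 0
    for kind, value in MARKER_RE.findall(usfm):
        if kind == "c":
            if not value.isdigit():
                problems.append(f"invalid chapter number: {value}")
                continue
            number = int(value)
            if number != chapter + 1:
                problems.append(f"chapter {number} follows chapter {chapter}")
            chapter, last_verse = number, 0
        else:
            match = VERSE_RE.fullmatch(value)
            if not match:
                problems.append(f"invalid verse number in chapter {chapter}: {value}")
                continue
            first = int(match.group(1))
            # For a bridge such as 1-2, the second number ends the range
            last = int(match.group(2) or match.group(1))
            if first <= last_verse:
                problems.append(f"{chapter}:{first} is duplicated or out of order")
            elif first != last_verse + 1:
                problems.append(f"verse(s) missing before {chapter}:{first}")
            last_verse = max(last_verse, last)
    return problems
```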

This is an imposing list. Finding the issues requires a linter tool like verifyUSFM.py. I use four levels of correction:

  1. usfm_cleanup.py to fix issues that are common to most translations
  2. Custom scripts, when I find a lot of issues common to a particular translation
  3. Manual editing of the .usfm files
  4. Request fixes from the translators in the field.
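As an example of the kind of small custom script meant in level 2, the following sketch lists lines that contain git conflict markers or floating quotation marks, so they can be reviewed before any automatic fix is attempted. It is a hypothetical illustration and is not one of the scripts named above. It only detects; deciding which side a floating quotation mark belongs to is left to the editor.

```python
import re

# A straight or curly quotation mark with whitespace on both sides
FLOATING_RE = re.compile(r'(?<=\s)["\u201c\u201d\u2018\u2019](?=\s)')
# Git writes conflict markers as runs of seven characters at line start
CONFLICT_RE = re.compile(r"^(<{7}|={7}|>{7})")


def find_issues(text: str) -> list[tuple[int, str]]:
    """Return (line number, description) pairs for suspicious lines."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if CONFLICT_RE.match(line):
            issues.append((lineno, "conflict marker"))
        if FLOATING_RE.search(line):
            issues.append((lineno, "floating quotation mark"))
    return issues
```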

Create a manifest.yaml file for the ULB project

At this point, you are almost done creating a Resource Container (RC). Resource Containers are documented at https://resource-container.readthedocs.io/en/latest/ . The last file in the Resource Container for a Bible is the manifest.yaml file, whose purpose is to describe the resource. Here are the steps and important points about making the manifest file.

  • Borrow a known good manifest.yaml file from another project as a template, but review every line in it.
  • Follow the specifications in https://resource-container.readthedocs.io/en/latest/manifest.html .
  • Must use UTF-8 character encoding, with no BOM.
  • Copy contributor names from the manifest.json files, and the names of translators provided by the field, and the names of those who signed agreements, and any other source of names that you have. (Note that txt2USFM-RC.py produces a list of contributors that it culls from the manifest.json files. The list is output to a contributors.txt file.)
  • Ensure quotes around version number strings.
  • Update the issued and modified dates when any content changes.
  • Modify only the modified date if just metadata (manifest) changes. If it is just a cosmetic change of no value to end users, do not even modify the modified date.
  • Increment the version string whenever the issued date changes.
  • The language|title field should be localized if possible.
  • The subject field must say “Bible”.
  • The projects section should list each book, in correct order, with its vernacular title. The example below shows a Swahili entry for 1 Chronicles. (Note that txt2USFM-RC.py builds a list of correctly ordered and formatted projects entries for you, in a file named projects.yaml.)

    -
      title: '1 Mambo ya Nyakati '
      versification: ufw
      identifier: '1ch'
      sort: 13
      path: ./13-1CH.usfm
      categories: [ 'bible-ot' ]

  • Check manifest.yaml with the verifyManifest.py script, which reports errors and potential errors. If the manifest is not valid YAML, expect the script to crash.
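Two of the points above, the encoding rule and the quoted version strings, are easy to check on the raw file before running the full validator. The sketch below is a hypothetical pre-check, not verifyManifest.py. It works on plain text rather than parsing YAML, so it is only a heuristic; for instance, a trailing comment on a version line would be misreported. The quoting matters because YAML reads an unquoted 3.10 as the number 3.1.

```python
import codecs
import re

VERSION_LINE_RE = re.compile(r"^\s*version:\s*(\S.*)$")


def check_manifest_bytes(raw: bytes) -> list[str]:
    """Check encoding and version quoting of a manifest.yaml's raw bytes."""
    problems = []
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return problems + [f"not valid UTF-8: {err}"]
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = VERSION_LINE_RE.match(line)
        if not match:
            continue
        value = match.group(1).strip()
        # Quoted means: same quote character at both ends
        quoted = len(value) >= 2 and value[0] == value[-1] and value[0] in "'\""
        if not quoted:
            problems.append(f"line {lineno}: version value is not quoted")
    return problems
```

To use it on a file, pass in the result of reading the file in binary mode, for example open("manifest.yaml", "rb").read().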

Upload to WACS

Now that you have a validated set of USFM files and a validated manifest.yaml file, you have a publishable Resource Container.

Create a repository in the RPP organization on WACS, or fork one from WA-Catalog if possible. The repository name should include the language code, an underscore, and “ulb” or “udb”. It should contain a README.md file and a LICENSE file. Upload your validated files to this repository.

As a final verification step, check the rendering on WACS by using the See in Reader button on the RPP repository. Check the table of contents, the overall appearance, and read the warnings. Address whatever doesn't look right and whatever warnings you can fix.

If the RPP repository is a fork of the WA-Catalog repo, create a Pull Request to merge the contents into WA-Catalog.

Final Steps

Mark the “Convert and Merge” step complete in PORT. After this, someone needs to merge the RPP repository into the WA-Catalog organization. After that, an overnight process will do the final processing necessary to publish.

The following day, verify that the newly published resource is available to download and use in BTT-Writer.


   