
Solutions for chunking issue in translationStudio desktop

(@ben2018)
New Member
Joined: 6 years ago
Posts: 1
Topic starter  

Background:

The desktop version of translationStudio is an Electron app that currently depends on the door43-client module to download new source text from a link to an example source text. Once this source text file is downloaded, it is loaded into memory as a JSON object. Finally, this JSON object is used to create files representing that source text inside C:\Users\<username>\AppData\Local\translationstudio\library\resource_containers (on Windows). A directory structure is created for groups of *.usx files, where each folder "##" (e.g., 00 for chapter 1, 01 for chapter 2, ...) represents a chapter, and each *.usx file within a folder represents a chunk. When a user loads a source text, these *.usx files are read back into memory to create another JSON object, which determines the Polymer frames - the chunks that the user sees.
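For reference, here is a minimal sketch of what reading that directory layout back into a JSON object could look like. This is not the actual door43-client implementation; the function name and the shape of the returned object are assumptions for illustration only.

const fs = require('fs');
const path = require('path');

// Rebuild a chapter/chunk JSON object from the on-disk layout described above.
function loadSourceChunks(containerDir) {
  const chapters = {};
  for (const chapterDir of fs.readdirSync(containerDir).sort()) {
    const chapterPath = path.join(containerDir, chapterDir);
    if (!fs.statSync(chapterPath).isDirectory()) continue; // skip non-chapter entries
    chapters[chapterDir] = {};
    for (const file of fs.readdirSync(chapterPath).sort()) {
      if (path.extname(file) !== '.usx') continue;
      const chunkId = path.basename(file, '.usx');
      chapters[chapterDir][chunkId] = fs.readFileSync(path.join(chapterPath, file), 'utf8');
    }
  }
  // e.g. { "01": { "01": "<usx chunk markup>", "05": "<usx chunk markup>", ... }, ... }
  return chapters;
}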

Also, the app can load chunks from *.tsrc archives.  For example, the ULB and UDB source texts are pre-chunked archives that come packaged with the app, in src/index/resource_containers.
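As a rough illustration of where those packaged archives live, the sketch below just enumerates them with the Node standard library; it does not unpack the *.tsrc format, and the example file names are hypothetical.

const fs = require('fs');
const path = require('path');

// List the pre-chunked archives that ship with the app.
function listPackagedArchives(appRoot) {
  const dir = path.join(appRoot, 'src', 'index', 'resource_containers');
  return fs.readdirSync(dir).filter(f => f.endsWith('.tsrc'));
}

// Hypothetical example output: ['en_ulb.tsrc', 'en_udb.tsrc']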

There are a few key node modules that deal with downloading source text for a translation, loading it into memory as JSON data, and loading that JSON into the user interface. Source code files involved in these operations:

<project path>\ts-desktop\node_modules\door43-client\lib\main.js

<project path>\ts-desktop\node_modules\door43-client\node_modules\resource-container\lib\main.js

Problem:

The current file system and data structures representing source text presuppose a particular chunking - even the files downloaded from the API are pre-chunked on the server. The app does not let the translator decide how to chunk the source text.

Solutions to Explore:

1. Keep using translationStudio and modify the code: insert code into door43-client that keeps the current chunking data structures but modifies the chunk data before it is loaded into the Polymer frames. The back-end data could be chunked verse-by-verse so that it is chunked consistently across translations, allowing them to be compared and conflicts resolved. A layer of functionality could then be implemented between the back-end data and the UI where the translator groups verses into chunks as he or she pleases (viewed as chunks, but saved on the back-end as one *.usx file per verse, using the existing door43-client code). A sketch of this grouping layer follows the list.

2. Modify source.json files on the server to ...

3. Modify the current data structure.

4. Take useful code from translationStudio and implement many of its features in Autographa.
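Here is a minimal sketch of the grouping layer mentioned in option 1. The data shapes and function names are hypothetical, not existing translationStudio code: the back-end is assumed to store one entry per verse, and a translator-defined mapping groups verses into the chunks shown in the UI.

// Back-end: verse-level data, e.g. { "01": { "01": "In the beginning...", "02": "..." } }
// Chunk map: translator-defined grouping, e.g. { "01": [["01", "02", "03"], ["04", "05"]] }

function buildUiChunks(verses, chunkMap) {
  const uiChunks = [];
  for (const [chapter, groups] of Object.entries(chunkMap)) {
    for (const group of groups) {
      uiChunks.push({
        chapter,
        firstVerse: group[0],
        lastVerse: group[group.length - 1],
        // Concatenate verse-level content into one chunk for the Polymer frame.
        content: group.map(v => verses[chapter][v]).join('\n')
      });
    }
  }
  return uiChunks;
}

// Saving would work in reverse: each UI chunk is split back into its verses and
// each verse is written as its own *.usx file via the existing door43-client code.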

(More details to come when I edit this post later...)


(@mondele)
Member Admin
Joined: 6 years ago
Posts: 58
 

Hi Ben,

I'm curious if you've come up with any solutions based upon these ideas?

Thanks!

