BasicCAT Documentation¶
Introduction¶
BasicCAT is an open source computer-aided translation (CAT) tool that aims to be simple and useful for translators. The name reflects both its simplicity and its programming language, Basic. Basic is easy to learn, so anyone can build a CAT tool suited to their needs on top of BasicCAT’s source code.
It has the following features:
- Translation Memory
- Terminology Management
- Language Check
- Select words to get meanings
- Quickfill
- AutoCorrect
- Interactive Machine Translation
- Export Word for external review
- Export bilingual paragraph files
- Export to markdown with notes
- Merge and split segments freely
- Online Dictionaries Integration
- Support Machine Translation Services Provided by Google, Microsoft, etc.
- Pretranslate based on translation memory and machine translation
- Support common file formats: TXT, IDML, XLIFF, gettext PO
- Support three translation standards: TMX, TBX and SRX
- Share translation memory and terms online
- Version Control and Collaboration Using Git
Getting Started¶
New Project¶
Click “File->New”, then choose “en to zh” to create an English-to-Chinese project or “zh to en” to create a Chinese-to-English project. Choose the other language pair option to specify the source and target languages of the project yourself. A translation memory and a term base are created at the same time.

You can also enter any language code that complies with the ISO 639 standard. See the detailed info here.

Save the project before further operations.
Add File¶
When a project is opened, a list of items appears in the left area. From there you can manage project files, translation memory and the term base, view statistics and preview the text.

Right click “Project Files” to add files or add folders.

Click filenames to open files. The interface will look like this when a file is opened. Every function area is marked out in the picture. Input your translation in the right textarea. After one segment’s translation is done, press “Enter” to go to the next one.

Translation Memory¶
After you press “Enter” to finish one segment, the translation is added to the translation memory. When you later translate a similar segment, the stored match appears in the lower area. Click that match to fill the translation into the textarea.

The translation memory’s match rate can be set in Project Settings. The rate should be between 0.5 and 1.0.
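As a rough illustration of what a match rate means, the sketch below scores a new segment against stored source segments and keeps those at or above a threshold. It uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure. The scoring method BasicCAT actually uses is not described here, and `find_matches` and the sample memory are hypothetical.

```python
from difflib import SequenceMatcher

def find_matches(segment, memory, min_rate=0.5):
    """Return (rate, source, target) tuples whose similarity is >= min_rate."""
    if not 0.5 <= min_rate <= 1.0:
        raise ValueError("match rate should be between 0.5 and 1.0")
    matches = []
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge
        rate = SequenceMatcher(None, segment, source).ratio()
        if rate >= min_rate:
            matches.append((rate, source, target))
    return sorted(matches, reverse=True)

# Hypothetical memory: source segment -> stored translation
memory = {"Open the file.": "打开文件。", "Save the project.": "保存项目。"}
results = find_matches("Open the files.", memory, min_rate=0.8)
```

A higher threshold returns fewer but closer matches; a lower one returns more candidates that need more editing.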

Add External Translation Memory¶
There are two types of translation memory in BasicCAT: project memory and external memory. Project memory stores the translations created while translating the project’s files, while external memory holds imported translation memories.
Click “Project->Project Settings”, and in the “TM” page, you can manage external translation memory.
You can import TMX files or tab-delimited txt files. For txt files, the source should be in the first column and the target in the second. A preview window will appear when you add a new file.
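A tab-delimited file of this kind can be read with a few lines of Python. This is only a sketch of the file layout, assuming each line holds one source/target pair separated by a tab; `read_tab_delimited_tm` is a hypothetical helper, not part of BasicCAT.

```python
import csv
import io

def read_tab_delimited_tm(text):
    """Parse tab-delimited text into (source, target) pairs, skipping incomplete rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

# Two sample entries: source text, a tab, then the target text
sample = "Hello world.\t你好，世界。\nSave the project.\t保存项目。\n"
pairs = read_tab_delimited_tm(sample)
```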


When a segment from external translation memory is matched, a filename will show to indicate where that match comes from. The source text in the translation memory, the source text of the current segment and the target text in the translation memory will show in the differences display area.

Manage Translation Memory¶
Click “Translation Memory” in the Project area to open TM Manager. You can search, export, edit or remove translation memories.

Terminology Management¶
When a sentence contains terms, you can select the corresponding text in the source and in the translation to add a term. BasicCAT uses OpenNLP to lemmatize words, so if you add a term in its plural form, BasicCAT can detect its singular form in another segment.

Right click on the term item to view more info and its history.

Attention
Because an external term database may contain thousands of entries, BasicCAT uses hash map lookups to match terms. Only the source text is lemmatized; terms in external termbases are not. So when adding a term, it is better to add it in its base (dictionary) form.
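The following sketch shows why the base form matters when lookups go through a hash map. It is a simplified illustration, not BasicCAT's code: `toy_lemmatize` is a crude stand-in for a real lemmatizer such as OpenNLP, only single-word terms are handled, and the termbase contents are made up.

```python
def toy_lemmatize(word):
    """Stand-in for a real lemmatizer: lowercases and strips a plural 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def find_terms(source_text, termbase):
    """Look up each lemmatized source word in a dict (hash map) of terms."""
    found = {}
    for token in source_text.split():
        lemma = toy_lemmatize(token.strip(".,;:!?"))
        if lemma in termbase:  # constant-time lookup, even for large termbases
            found[lemma] = termbase[lemma]
    return found

# Hypothetical termbase whose keys are stored in base form
termbase = {"segment": "句段", "termbase": "术语库"}
hits = find_terms("Merge the two segments.", termbase)
```

Here "segments" in the source is reduced to "segment" and found. Had the termbase key been stored as "segments", the lookup would have missed it, because the keys themselves are not lemmatized.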
Importing terms is much the same as importing translation memory. TBX and tab-delimited txt files are supported.
The term manager is similar to the TM manager, except that tags and notes can be added to terms.

Segments Manipulation¶
BasicCAT uses the SRX segmentation standard to segment the text. A segment can be a sentence or a phrase.
Merge and Split Segments¶
If you come across a wrongly segmented name like below, you can move the cursor to the end and press “Delete” to merge the two segments.

Segments that belong to different files or translation units cannot be merged. This applies, for example, to different paragraphs in Word and different stories in InDesign.

BasicCAT hides format tags when possible. If the segments contain hidden tags, a message box like the one below appears. You can choose to continue, but the merged source text may contain complex tags.

When you need to split, like at the semicolon below, move the cursor to the semicolon and press “Enter”.

Neglect Segment¶
When doing English to Chinese translation, it is common that the first segment and the second one have similar meanings. You can mark the first one as neglected and only translate the second one. When generating target files, these segments will be omitted. Use Menu Edit->Mark the current segment as neglected to do this.
The textarea of a neglected segment is grayed out and not editable.

Add notes¶
If you come across difficult sentences, you can make notes on how you get the translation done. Use Menu Edit->Show/Edit notes of the current segment to view or edit notes.

The textarea of a segment that has notes is shown with a gray border.

View segment history¶
BasicCAT will record segment history. Click the menu Edit->Show segment history to view the history, where the user name is the user name added in the version control settings.

Statistics¶
Click “Statistics” in the project area to see statistics such as word count and percentage completed.

Preview¶
Click “Preview” in the project area to preview the text. Source text that has been translated is replaced by its translation.

Generate target files¶
When translation is done, use menu “File->Generate target files” to create translated files in the target folder.

Advanced Features¶
Machine Translation¶
BasicCAT has built-in support for 12 machine translation services:
- Baidu
- Microsoft Bing
- Niutrans
- Youdao
- Yandex
- MyMemory
- Sogou
- Sogou DeepI
- Tencent Cloud
- IBM Watson
- Amazon
The result will show in the lower area.

You have to apply for API keys to use these translation services. Links for applying are included in the MT list above. MyMemory does not require an API key; it only needs an email address. You need to set up at least one machine translation service to use the select-words-to-get-meanings and autocomplete functions.
Use Menu “Options->Preferences” to enter the setup interface.


These translation services have usage limits. Baidu and Bing each have a limit of 2 million per month; MyMemory has a limit of 10,000 words per day; Yandex has a limit of 1 million words per day and 10 million words per month. The other services are paid. Niutrans and Youdao give away 1 million words upon registration.
Pre-translate¶
You can use translation memory or machine translation to pre-translate the whole text. Use Menu “Project->Pre-translate” to open the dialog and choose TM or MT to do this. You can set the lowest match rate and which machine translation service to use.


Select words to get meanings¶
BasicCAT can show the meanings of selected text based on machine translation and online dictionary. You need to check this function in settings and enable at least one machine translation service.

You can fill the result to the translation textarea by clicking the item in the dropdown list.
Online Dictionary¶
BasicCAT integrates online dictionaries as offline dictionaries have copyright issues.
Select a word in the source text and click menu “Edit->Show online dictionary dropdown” or use the shortcut key CTRL+D. A dropdown list of online dictionary names will appear. Choose a dictionary and a browser window will open.



Use “Add Selected” Button to fill selected text in the browser to the translation textarea. Press “Open in browser” to open the page in local browsers.
You can add other online dictionaries by modifying the dictList.txt in the project’s config folder.
Language Check¶
BasicCAT uses LanguageTool to check language errors. LanguageTool is an open source spelling and grammar checker.
When you finish a segment and press “Enter” to move to the next one, BasicCAT checks the segment you just left. Errors are shown in the lower area, and a dropdown list of possible corrections appears below the translation textarea. Click a replacement in the lower area or in the dropdown list to replace the mistake with the correction.

To use language check, you can either use the API provided by LanguageTool directly (default address: https://languagetool.org/api/v2/check), or download LanguageTool and run it locally.
Unzip the downloaded archive, open BasicCAT and click menu “Tools->Server Launcher”. There you can set the path to the folder containing languagetool-server.jar. Press “Start LanguageTool Server” to run the server locally.

You can visit http://127.0.0.1:8081/v2/check?text=a%20example&language=en-US to check whether the server is running.
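The same check can be scripted. The sketch below builds the request URL for a local server and summarizes a JSON reply. The field names (`matches`, `offset`, `length`, `message`, `replacements`) follow LanguageTool's HTTP API; the helper names and the default port are assumptions taken from the example address above, and the network call itself is left as a comment so the snippet runs without a server.

```python
import json
from urllib.parse import urlencode

def build_check_url(text, language, base="http://127.0.0.1:8081/v2/check"):
    """Build the GET URL for a LanguageTool check request."""
    return base + "?" + urlencode({"text": text, "language": language})

def summarize_matches(response_body):
    """Extract (offset, length, message, suggestions) from a /v2/check JSON reply."""
    data = json.loads(response_body)
    return [
        (m["offset"], m["length"], m["message"],
         [r["value"] for r in m.get("replacements", [])])
        for m in data.get("matches", [])
    ]

url = build_check_url("a example", "en-US")
# With a server running, the reply could be fetched with:
#   from urllib.request import urlopen
#   body = urlopen(url).read().decode("utf-8")
#   print(summarize_matches(body))
```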
In addition, you need to check the “enable language tool” checkbox in settings.

You can also set the maximum dropdown suggestions to avoid showing too many suggestions.
Autocomplete (Interactive Machine Translation)¶
Here is how autocomplete works. The source text is tokenized and parsed using Stanford NLP tools. Words and phrases are then extracted and machine translated, and the results are kept in memory. When the user types a word that matches the beginning of one of these results, a dropdown list of matching translation suggestions appears.
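The last step, matching what the user types against the stored results, can be pictured as simple prefix matching. This is a minimal sketch under that assumption; `suggest`, the candidate list and the case-insensitive comparison are illustrative choices, not BasicCAT's implementation.

```python
def suggest(typed, candidates, max_suggestions=5):
    """Return stored translations that start with the text typed so far."""
    prefix = typed.lower()
    if not prefix:
        return []
    hits = [c for c in candidates if c.lower().startswith(prefix)]
    return hits[:max_suggestions]

# Hypothetical machine-translated words and phrases for one source segment
candidates = ["translation memory", "translator", "term base", "segment"]
suggestions = suggest("trans", candidates)
```

The `max_suggestions` parameter plays the role of the maximum dropdown suggestions setting.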

As for Chinese to English translation, it can also be used to input English words quickly.

To use autocomplete, Stanford CoreNLP is needed.
Stanford CoreNLP Homepage: https://stanfordnlp.github.io/CoreNLP/index.html
BaiduNetdisk (backup download): https://pan.baidu.com/s/1LNW4IDw8Viz6RURmzqxI9Q
CoreNLP and the Chinese model jar are needed. Unzip CoreNLP and put the Chinese model jar in the same folder. Use Server Launcher to start the server.
Visit http://127.0.0.1:9000 to see whether the server is running.
As with LanguageTool, you also need to check the “Enable Autocomplete” checkbox to use autocomplete.
To use a remote server instead, change the server link.
You can also set the maximum dropdown suggestions to avoid showing too many suggestions.
Quickfill¶
We often have to input special characters or the same text many times, so BasicCAT supports quickfill. Use the shortcut key CTRL+Q or click the menu “Edit->Show quick fill dropdown” to display a dropdown list of quickfill items. Matched terms can also be included.

Click menu “Project->Project Settings” to set up quickfill.

AutoCorrect¶
AutoCorrect is inspired by the Microsoft Word feature of the same name. It watches the text you type and corrects spelling errors. For example, Chinese punctuation is needed in English-to-Chinese translation, and AutoCorrect can replace English punctuation marks with Chinese ones. It can also be used to enter content quickly. For example, “rst” is an abbreviation of “restructuredText”. Once AutoCorrect is set up, typing “rst” produces “restructuredText”.
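Conceptually, AutoCorrect is a lookup table applied to the word just typed. The sketch below illustrates the idea only; when BasicCAT triggers a replacement and how it stores rules are not described here, and `autocorrect` and the rule table are hypothetical.

```python
def autocorrect(text, rules):
    """Replace the last whitespace-delimited word if it matches a rule."""
    head, sep, last = text.rpartition(" ")
    return head + sep + rules.get(last, last)

# Hypothetical rule table: typed text -> replacement
rules = {"rst": "restructuredText", "teh": "the"}
result = autocorrect("This page is written in rst", rules)
```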

Click menu “Project->Project Settings” to set up AutoCorrect.

Export Word for review¶
BasicCAT can export translated content to docx files so that it can be reviewed in Word.
Right click on the file and click “Export to docx for review” to export the docx file.


When the review is done, you can import the result back. Right click on the file and click “Import from review”.

You can check the imported result one by one or replace the original translation with the review directly.

Export to markdown with notes¶
As with the previous export, right click the filename and click “export to->markdown with notes” to get a markdown file.
Markdown files can be converted to Word files later using Pandoc.
Search and Replace¶
Click menu “Edit->Search and Replace” to open the Search and Replace Dialog. You can search in the source or the target text. Regular expressions are supported.
Below is an example of replacing English quotation marks with Chinese quotation marks.
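The same kind of replacement can be tried outside BasicCAT. The pattern below is an illustrative one written for Python's `re` module; the regular expression flavor used by the Search and Replace dialog may differ in details such as how groups are referenced in the replacement.

```python
import re

def to_chinese_quotes(text):
    """Replace each pair of straight double quotes with Chinese quotation marks."""
    # "([^"]*)" captures everything between two straight quotes as group 1
    return re.sub(r'"([^"]*)"', r'“\1”', text)

converted = to_chinese_quotes('他说"你好"，然后说"再见"。')
```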

You can learn more about regular expressions at this website.
Translating All Kinds of Files¶
TXT Files¶
TXT files are plain text files. There are no extra tags, and all segments can be merged and split.
IDML Files¶
IDML is an XML based file format used by Adobe InDesign. Documents made by the latest version of InDesign have to be converted to IDML for use in old versions of InDesign. InDesign’s default file format is indd, which is a proprietary format. It cannot be opened by third-party applications. In order to translate InDesign documents, Indd files have to be converted to IDML files.
An IDML file is essentially a ZIP archive. Its structure is as follows.
.
├── META-INF
│ ├── container.xml
│ └── metadata.xml
├── MasterSpreads
│ └── MasterSpread_udd.xml
├── Resources
│ ├── Fonts.xml
│ ├── Graphic.xml
│ ├── Preferences.xml
│ └── Styles.xml
├── Spreads
│ ├── Spread_uc8.xml
│ ├── Spread_uce.xml
│ └── Spread_ucf.xml
├── Stories
│ ├── Story_u106.xml
│ ├── Story_u11d.xml
│ ├── Story_u134.xml
│ ├── Story_u151.xml
│ └── Story_ued.xml
├── XML
│ ├── BackingStory.xml
│ └── Tags.xml
├── designmap.xml
└── mimetype
The parts relevant to us are designmap.xml and the Spreads, Stories and Resources folders.
designmap.xml defines the document’s basic structure. A spread file includes the structure of one page or facing pages. A story file contains the text shown in a textframe. Fonts.xml stores font info and Styles.xml stores styles info.
BasicCAT reads designmap.xml and spreads files to get the order of stories shown in the document and extracts texts from story files.
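Because an IDML package is a ZIP archive, its story files can be listed with standard tools. The sketch below is illustrative only: it builds a tiny in-memory package for demonstration and does not reproduce the story ordering that BasicCAT derives from designmap.xml and the spread files. `list_story_files` is a hypothetical helper.

```python
import io
import zipfile

def list_story_files(idml_bytes):
    """Return the story XML paths stored inside an IDML package."""
    with zipfile.ZipFile(io.BytesIO(idml_bytes)) as package:
        return sorted(
            name for name in package.namelist()
            if name.startswith("Stories/") and name.endswith(".xml")
        )

# Build a tiny in-memory package for demonstration (not a valid InDesign document)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("mimetype", "application/vnd.adobe.indesign-idml-package")
    demo.writestr("designmap.xml", "<Document/>")
    demo.writestr("Stories/Story_u106.xml", "<Story/>")
stories = list_story_files(buffer.getvalue())
```

For a real file, the bytes could be read with `open(path, "rb").read()`.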
Below is what a story file looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<idPkg:Story xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging" DOMVersion="13.0">
<Story Self="u19caf" AppliedTOCStyle="n" UserText="true" IsEndnoteStory="false" TrackChanges="false" StoryTitle="$ID/" AppliedNamedGrid="n">
<StoryPreference OpticalMarginAlignment="false" OpticalMarginSize="12" FrameType="TextFrameType" StoryOrientation="Horizontal" StoryDirection="LeftToRightDirection" />
<InCopyExportOption IncludeGraphicProxies="true" IncludeAllResources="false" />
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Intro Copy">
<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
<Content>“No capes!” That’s what Edna says in the first film when Mr. Incredible wants a cape on his new Supersuit. She knows that capes can be dangerous for Supers. A cape caused one Super to get pulled into a jet turbine, and another was sucked into a spinning </Content>
</CharacterStyleRange>
<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Intro Copy Bold">
<Content>vortex</Content>
</CharacterStyleRange>
<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
<Content>. Capes could cause other problems, too. Let’s see what they are.</Content>
</CharacterStyleRange>
<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]" FillColor="Color/Black" FontStyle="300" PointSize="12">
<Properties>
<Leading type="unit">20</Leading>
</Properties>
<Br />
</CharacterStyleRange>
</ParagraphStyleRange>
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Body Copy">
<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]" />
</ParagraphStyleRange>
</Story>
</idPkg:Story>
Paragraph info is stored in the ParagraphStyleRange tag. One ParagraphStyleRange tag can include many CharacterStyleRange tags which include texts. There are two types of style in IDML. One is global style and the other is local style. Global styles are stored in Styles.xml. Story files use AppliedParagraphStyle and AppliedCharacterStyle attributes to mark global styles. Local styles are defined in the attributes and the properties element in story files, like PointSize and FontStyle.
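To make the nesting concrete, the sketch below walks a shortened story file and collects the text of each paragraph together with its global paragraph style. It is a simplified illustration using Python's standard XML parser, not the extraction code BasicCAT uses, and it ignores local styles and line breaks.

```python
import xml.etree.ElementTree as ET

def extract_paragraphs(story_xml):
    """Return (paragraph_style, text) for each ParagraphStyleRange."""
    root = ET.fromstring(story_xml)
    paragraphs = []
    for para in root.iter("ParagraphStyleRange"):
        # Concatenate every Content element inside this paragraph
        text = "".join(c.text or "" for c in para.iter("Content"))
        paragraphs.append((para.get("AppliedParagraphStyle"), text))
    return paragraphs

# Shortened version of the story file shown above
sample = """<idPkg:Story xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging">
<Story Self="u1">
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Intro Copy">
<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
<Content>A spinning </Content>
</CharacterStyleRange>
<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Intro Copy Bold">
<Content>vortex</Content>
</CharacterStyleRange>
</ParagraphStyleRange>
</Story>
</idPkg:Story>"""
paragraphs = extract_paragraphs(sample)
```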
BasicCAT will convert the story file above into the tagged text below.
<p16><c0>“No capes!” That’s what Edna says in the first film when Mr. Incredible wants a cape on his new Supersuit. She knows that capes can be dangerous for Supers. A cape caused one Super to get pulled into a jet turbine, and another was sucked into a spinning </c0>
<c3>vortex</c3>
<c0>. Capes could cause other problems, too. Let’s see what they are.</c0>
<c0 id="3">
</c0>
</p16>
<p3><c0></c0>
</p3>
The numbers in p16 and c0 correspond to the rank of a global style in Styles.xml. The id attribute of <c0 id="3"> corresponds to the rank of the CharacterStyleRange in the story file. This number is used to read local styles when generating target files. c0 is the default style, which has no special formatting, so BasicCAT hides it in the displayed source text. Paragraph tags and tags without text can also be hidden.
When the example story is opened by BasicCAT, the result looks like this:

The c3 tag applies a bold font style to the text. English fonts often have many font weights, such as Extra Light, Light, Normal and Heavy, while Chinese fonts often have only one regular weight. In Word, bold Chinese characters are synthesized by a software algorithm, but IDML relies on the font’s own weights. So, to make sure the tags take effect in English-to-Chinese projects, BasicCAT uses Source Han Serif as the Chinese font for IDML. Source Han Serif has 7 font weights. Font weight names of English fonts are converted to Source Han Serif names as follows.
- 100->ExtraLight
- 200->ExtraLight
- 300->Light
- 400->Regular
- 500->Medium
- 600->SemiBold
- 700->Bold
- 800->Heavy
- 900->Heavy
- Normal->Regular
- Black->Heavy
So, you need to install Source Han Serif to display Chinese in InDesign.
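The conversion list above amounts to a lookup table. A minimal sketch follows; the table entries come from the list, while the function name and the fallback to "Regular" for unlisted values are assumptions.

```python
# Source font weight name -> Source Han Serif weight name (from the list above)
WEIGHT_TO_SOURCE_HAN_SERIF = {
    "100": "ExtraLight", "200": "ExtraLight", "300": "Light",
    "400": "Regular", "500": "Medium", "600": "SemiBold",
    "700": "Bold", "800": "Heavy", "900": "Heavy",
    "Normal": "Regular", "Black": "Heavy",
}

def map_font_style(font_style, default="Regular"):
    """Map a source FontStyle value to a Source Han Serif weight name."""
    # Falling back to a default for unlisted values is an assumption
    return WEIGHT_TO_SOURCE_HAN_SERIF.get(font_style, default)
```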
Tags in IDML do not have to be carried over into the translation. If the target text has no corresponding tags, c0 tags are added so that the translation is not omitted.
XLIFF Files¶
XLIFF is an XML format for processing extracted text to be translated. CAT tools often use this format to store extracted text from files like docx, html and idml. When translation is done, CAT tools will generate target files based on XLIFF files.
XLIFF is a standard formulated by the OASIS organization and can be used as an intermediate format between different translation software.
XLIFF uses abstract placeholder tags inherited from OpenTag and encapsulating tags inherited from TMX to represent special formats.
For example, there are two ways to represent “This is bold.” in XLIFF.
Abstract placeholder:
<trans-unit id="1">
<source>This is <g id="1">bold</g>.</source>
</trans-unit>
Encapsulation:
<trans-unit id="1">
<source>This is <bpt id="1">\b</bpt>bold<ept id="1">\b0</ept>.</source>
</trans-unit>
The placeholder approach abstracts native formatting codes into preset placeholder tags. To indicate bold, HTML uses <b> and RTF uses \b. Both become <g> in XLIFF. This shows less tag information, but after the abstraction we can no longer tell what function a tag has.
When translating XLIFF files, there will be tags in the source text. If tags are not put in the target text, BasicCAT will fill them at the end of target text. But this may result in incomplete content.
Below is the XLIFF extracted by Okapi from the example story file in the IDML section.
<?xml version="1.0" encoding="UTF-8"?>
<xliff version='1.2'
xmlns='urn:oasis:names:tc:xliff:document:1.2'>
<file original="Stories/Story_u19caf.xml" source-language="en-US" target-language="zh-CN" datatype="xml">
<body>
<trans-unit id="NB085C0FA-tu1" xml:space="preserve">
<source xml:lang="en-US"><g id="1">“No capes!” That’s what Edna says in the first film when Mr. Incredible wants a cape on his new Supersuit. She knows that capes can be dangerous for Supers. A cape caused one Super to get pulled into a jet turbine, and another was sucked into a spinning </g><g id="2">vortex</g><g id="3">. Capes could cause other problems, too. Let’s see what they are.</g></source>
<target xml:lang="zh-CN"><g id="1">“No capes!” That’s what Edna says in the first film when Mr. Incredible wants a cape on his new Supersuit. She knows that capes can be dangerous for Supers. A cape caused one Super to get pulled into a jet turbine, and another was sucked into a spinning </g><g id="2">vortex</g><g id="3">. Capes could cause other problems, too. Let’s see what they are.</g></target>
</trans-unit>
</body>
</file>
</xliff>
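An XLIFF 1.2 file like the one above can be read with a standard XML parser. The sketch below collects the source text of each trans-unit and flattens inline tags such as <g>. It is an illustration on a shortened sample, not how BasicCAT parses XLIFF; `read_sources` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 elements live in this namespace
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def read_sources(xliff_text):
    """Return {trans-unit id: plain source text} with inline tags flattened."""
    root = ET.fromstring(xliff_text)
    units = {}
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        # itertext() yields the text inside and between inline tags
        units[unit.get("id")] = "".join(source.itertext())
    return units

sample = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file original="demo.xml" source-language="en-US" target-language="zh-CN" datatype="xml">
<body>
<trans-unit id="tu1">
<source>This is <g id="1">bold</g>.</source>
</trans-unit>
</body>
</file>
</xliff>"""
units = read_sources(sample)
```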
When the XLIFF file is opened in BasicCAT, the result looks like the following. Single tags and paired tags at either end of a segment are hidden.

PO Files¶
PO is a format similar to XLIFF. It was originally designed for localizing C programs.
Below is the PO content extracted by Okapi from the example IDML story file.
msgctxt "okpCtx:sd=197:tu=NB085C0FA-tu1"
msgid "<g1>“No capes!” That’s what Edna says in the first film when Mr. Incredible wants a cape on his new Supersuit. She knows that capes can be dangerous for Supers. A cape caused one Super to get pulled into a jet turbine, and another was sucked into a spinning </g1><g2>vortex</g2><g3>. Capes could cause other problems, too. Let’s see what they are.</g3>"
msgstr ""
msgctxt stores context info, msgid stores the source text and msgstr stores the target text. PO files created by Okapi also use tags to represent formats.
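The three fields can be pulled out of a simple entry with a short script. This is a deliberately minimal sketch: it only handles single-line strings and does not process escape sequences or multi-line values that real PO files may contain. `parse_po_entry` is a hypothetical helper.

```python
import re

def parse_po_entry(entry_text):
    """Read single-line msgctxt, msgid and msgstr fields from one PO entry."""
    fields = {}
    for line in entry_text.splitlines():
        match = re.match(r'^(msgctxt|msgid|msgstr)\s+"(.*)"$', line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

# Shortened version of the entry shown above
entry = 'msgctxt "okpCtx:sd=197:tu=NB085C0FA-tu1"\nmsgid "<g1>vortex</g1>"\nmsgstr ""'
fields = parse_po_entry(entry)
```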
Below is how the PO file looks when translated in BasicCAT. As with XLIFF, single tags and paired tags at either end of a segment are hidden.

PDF Files¶
PDF is a difficult format to process. It can be converted to docx by tools such as Word, ABBYY and Solid Document Converter, but the original formatting is not well preserved. Adobe Acrobat can be used to modify text, but it has many limitations.
PDF files are usually generated from files in another format, such as docx or idml. It is better to work on the source file directly. If you want to preserve the formatting but do not have the source file, you have to translate the text first and then redo the desktop publishing from scratch.
When translating InDesign documents, publishing houses often give the PDF document to the translator. The translator types the translation into a Word document and then gives it to the typesetter, who replaces the source text with the target text in InDesign. In this case, all the translator needs is to extract the text from the PDF.
BasicCAT has a PDF to Text tool. It can be accessed by menu “Tools->PDF2TXT”. If the PDF contains extractable text, press the “strip” button to extract it directly. If not, the open source OCR software Tesseract will be used.
You can add page numbers to the extracted text, including facing-page numbers such as Page 4-5.

Text in a PDF carries no paragraph information; it is just positioned at fixed points on a page. As a result, the extracted text may have a newline at the end of each line. PDF2TXT provides a “reflow” function to remove the extra newlines.
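One simple way to picture reflowing is to join consecutive lines and treat blank lines as paragraph breaks. The sketch below does exactly that; the rules PDF2TXT actually applies are not described here, and joining with a space suits English text (Chinese text would be joined without spaces).

```python
def reflow(text):
    """Join hard-wrapped lines; treat blank lines as paragraph breaks."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the paragraph collected so far
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs)

extracted = "Capes can be\ndangerous for\nSupers.\n\nLet’s see why."
reflowed = reflow(extracted)
```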
To install Tesseract, Windows users can download a copy from here. PDF2TXT will ask for the path to tesseract.exe.
Linux and macOS (with Homebrew) users can install tesseract-ocr directly. You may need to download extra language models.
Using Okapi to Translate Files in Other Formats¶
Okapi is a set of translation components. It includes Checkmate for checking translation quality, Ratel for editing segmentation rules and Rainbow for all kinds of translation and localization tasks.
To translate files in other formats, we need to use Rainbow.
- Create XLIFF or PO files from source files
Open Rainbow and drag source files into the window.

Set the source language, target languages, file encoding, etc.

Click menu “Utilities->Translation Kit Creation”. You can choose to generate XLIFF, PO or other intermediate formats. The default output path is the source files’ folder.

After the execution is done, there will be a folder named pack1 in the output path. The generated files are in its work folder.
- Generate target files from the previously created okapi project
When the translation is done, put the translated files back into the work folder. Open Rainbow and drag manifest.rkm into it.

Click menu “Utilities->Translation Kit Post-processing” to generate target files.
Visit here to see what formats Okapi supports.
Starting from version 1.5, BasicCAT has integrated Okapi Tikal. It can automatically convert other format files into XLIFF files, and automatically generate target files from translated XLIFF files. However, if you need to modify the parameters that Okapi uses to process a format, you still need to use Rainbow.
Collaboration¶
Sharing translation memory and term¶
BasicCAT’s server program is needed to set up a service for sharing translation memory and terms. The program is available on BasicCAT’s download page.
Relevant settings are needed on the client side.

How to run the server program (requires Java 8+):
$ nohup java -jar CloudKVS_Server.jar &
You can modify key.txt in the program’s folder to set up an access key. The key can be any one-line text.
Using Git to collaborate¶
BasicCAT can build a local git repository and upload it to a remote repository once the remote URI and account info are set. When you request a git push, the program first fetches the latest changes and updates the local files. Then the new changes in the local repo are uploaded. BasicCAT automatically resolves conflicts according to the creation time of each segment’s translation, so you should rarely encounter conflicts.
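Time-based conflict resolution can be sketched as follows. This is a hypothetical illustration, not BasicCAT's code: it assumes each segment carries a creation timestamp and that the more recently created translation wins, which the text above does not state explicitly. The data layout and `merge_segments` are made up for the example.

```python
def merge_segments(local, remote):
    """For each segment id, keep whichever translation was created later."""
    merged = dict(local)
    for seg_id, theirs in remote.items():
        ours = merged.get(seg_id)
        # Assumption: the later creation time wins; ties keep the local version
        if ours is None or theirs["created"] > ours["created"]:
            merged[seg_id] = theirs
    return merged

# Hypothetical segment tables: id -> translation and creation time (Unix seconds)
local = {1: {"target": "你好", "created": 1700000000},
         2: {"target": "世界", "created": 1700000500}}
remote = {1: {"target": "您好", "created": 1700000300},
          3: {"target": "再见", "created": 1700000100}}
merged = merge_segments(local, remote)
```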
You need to set the remote URI in the project settings and choose whether to upload changes to the remote every time you save. You need to set up git accounts in the preference settings first.

You can also use the menu to manually do git actions.

Collaboration using git will not sync translation memory and terms. It only syncs work files.
Using GitHub¶
GitHub is a popular Git hosting platform. We can collaborate upon it.
First, create an empty repository.


You will see this settings page:

Use BasicCAT to open a project and set the remote repository.

Then, you can package the project and send it to other translators.
Other translators need their own GitHub accounts and must be granted push permission to the repository.
In the settings of the repository, click Collaborators on the left and add the other translators’ GitHub accounts.

See the GitHub Help to learn more.
Other¶
Appearance Setting¶
Click menu “Options->Preferences” to enter the Preferences Window. Click “Appearance” on the left.
For now, only the font of the editing area can be set.

Auto Backup¶
BasicCAT supports auto backup. You can set up its time interval. The backup files will be stored in the project’s bak folder.

Version Control Using Git¶
BasicCAT comes with git and can execute git add and git commit every time the project is saved.
You can set git commit’s user information in Preferences. If you need to sync with remote repositories, you can set a password.

You can use git show to see what the latest changes are. You have to install Git on your computer to do this. You can use git reset to return to a previous version.

More functions can be realized by installing a git desktop client.
Managing Plugins¶
BasicCAT currently supports two kinds of plugins: machine translation plugins and filter plugins. You can set the plugin folder and add or remove plugins in the Preferences.

How to Contribute¶
BasicCAT is an open source project licensed under GPLv2. Contributions to help BasicCAT grow are appreciated.
If you are a developer, you can develop plugins or make a pull request to improve BasicCAT’s code.
If you are a user and come across issues or have suggestions, you can open an issue in BasicCAT’s GitHub repository.
If you want to translate BasicCAT’s website, documentation or its interface, you can also open an issue to apply.