Welcome to indictrans’s documentation!

The project aims to provide a state-of-the-art transliteration module for cross-transliteration among all Indian languages, including English and Urdu.

travis-ci build status coveralls.io coverage status CircleCI Documentation Status


The module currently supports the following languages:

  • Hindi
  • Bengali
  • Gujarati
  • Punjabi
  • Malayalam
  • Kannada
  • Tamil
  • Telugu
  • Oriya
  • Marathi
  • Assamese
  • Konkani
  • Bodo
  • Nepali
  • Urdu
  • English

Module Reference

indictrans.transliterator — Transliterator

class indictrans.Transliterator(source='hin', target='eng', decode='viterbi', build_lookup=False, rb=True)

Transliterator for Indic scripts including English and Urdu.

Parameters:
source : str, default: hin

Source Language (3 letter ISO-639 code)

target : str, default: eng

Target Language (3 letter ISO-639 code)

decode : str, default: viterbi

Decoding algorithm, either viterbi or beamsearch.

build_lookup : bool, default: False

Flag to build a lookup table. Speeds up transliteration if the input text contains repeated words.

rb : bool, default: True

Decides whether to use the rule-based system or the ML system for transliteration. This choice applies only to Indic-to-Indic transliteration. If True, the rule-based system is used.

Examples

>>> from indictrans import Transliterator
>>> trn = Transliterator(source='hin', target='eng', build_lookup=True)
>>> hin = '''कांग्रेस पार्टी अध्यक्ष सोनिया गांधी, तमिलनाडु की मुख्यमंत्री
... जयललिता और रिज़र्व बैंक के गवर्नर रघुराम राजन के बीच एक
... समानता है. ये सभी अलग-अलग कारणों से भारतीय जनता पार्टी के
... राज्यसभा सांसद सुब्रमण्यम स्वामी के निशाने पर हैं. उनके
... जयललिता और सोनिया गांधी के पीछे पड़ने का कारण कथित
... भ्रष्टाचार है.'''
>>> eng = trn.transform(hin)
>>> print(eng)
congress party adhyaksh sonia gandhi, tamilnadu kii mukhyamantri
jayalalita our reserve baink ke governor raghuram rajan ke beech ek
samanta hai. ye sabi alag-alag carnon se bharatiya janata party ke
rajyasabha saansad subramanyam swami ke nishane par hain. unke
jayalalita our sonia gandhi ke peeche padane ka kaaran kathith
bhrashtachar hai.

Methods

convert  

indictrans.base.BaseTransliterator — BaseTransliterator

class indictrans.base.BaseTransliterator(source, target, decoder, build_lookup=False)

Base class for transliterator.

Attributes:
vectorizer_ : instance

OneHotEncoder instance for converting categorical features to one-hot features.

classes_ : dict

Dictionary mapping unique ids to tags ({id: tag}).

coef_ : array

HMM coefficient array

intercept_init_ : array

HMM intercept array for first layer of trellis.

intercept_trans_ : array

HMM intercept/transition array for middle layers of trellis.

intercept_final_ : array

HMM intercept array for last layer of trellis.

wx_process : method

wx2utf/utf2wx method of WX instance

nu : instance

UrduNormalizer instance for normalizing Urdu scripts.

Methods

convert_to_wx(text) Converts Indic scripts to WX.
load_models() Loads transliteration models.
predict(word[, k_best]) Given encoded word matrix and HMM parameters, predicts output sequence (target word)
top_n_trans(text[, k_best]) Returns k-best transliterations using beamsearch decoding.
transliterate(text[, k_best]) Single best transliteration using viterbi decoding.
base_fit  
load_mappings  
convert_to_wx(text)

Converts Indic scripts to WX.

load_models()

Loads transliteration models.

predict(word, k_best=5)

Given encoded word matrix and HMM parameters, predicts output sequence (target word)

top_n_trans(text, k_best=5)

Returns k-best transliterations using beamsearch decoding.

Parameters:
k_best : int, default: 5, optional

Used by Beamsearch decoder to return k-best transliterations.

transliterate(text, k_best=None)

Single best transliteration using viterbi decoding.
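As an illustration of what Viterbi decoding over such a trellis involves, here is a minimal pure-Python sketch. It is not the library's implementation: the function name `viterbi` and its arguments (`scores`, `init`, `trans`, `final`) are hypothetical and only loosely mirror the roles of coef_, intercept_init_, intercept_trans_ and intercept_final_ described above.

```python
def viterbi(scores, init, trans, final):
    """Return the highest-scoring tag path through a trellis.

    scores[t][j] : score of tag j at position t (hypothetical input)
    init[j]      : score for starting in tag j
    trans[i][j]  : score for moving from tag i to tag j
    final[j]     : score for ending in tag j
    """
    n_tags = len(init)
    # best[j] = best score of any path that ends in tag j at the current position
    best = [init[j] + scores[0][j] for j in range(n_tags)]
    back = []  # back[t-1][j] = best previous tag for tag j at position t
    for t in range(1, len(scores)):
        row, ptr = [], []
        for j in range(n_tags):
            i_best = max(range(n_tags), key=lambda i: best[i] + trans[i][j])
            row.append(best[i_best] + trans[i_best][j] + scores[t][j])
            ptr.append(i_best)
        best = row
        back.append(ptr)
    # Pick the best final tag, then follow the back-pointers.
    last = max(range(n_tags), key=lambda j: best[j] + final[j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


scores = [[1, 0], [0, 1], [1, 0]]
free = [[0, 0], [0, 0]]        # switching tags costs nothing
sticky = [[0, -5], [-5, 0]]    # switching tags is penalized
print(viterbi(scores, [0, 0], free, [0, 0]))
print(viterbi(scores, [0, 0], sticky, [0, 0]))
```

With free transitions the path follows the best score at each position; with penalized transitions the decoder prefers to stay in one tag.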

indictrans._utils.WX — WXConverter

class indictrans._utils.WX(order=u'utf2wx', lang=u'hin')

WX-converter for UTF to WX conversion of Indic scripts and vice-versa.

Parameters:
lang : str, default: hin

Input script

order : str, default: utf2wx

Order of conversion

Examples

>>> from indictrans import WX
>>> wxc = WX(lang='hin', order='utf2wx')
>>> hin_utf = u'''बीजेपी के सांसद सुब्रमण्यम स्वामी ने कुछ ही दिन पहले
... अपनी ही सरकार को कठघरे में खड़ा करते हुए जीडीपी आंकड़ों पर
... सवाल उठाए हैं.'''
>>> hin_wx = wxc.utf2wx(hin_utf)
>>> print(hin_wx)
bIjepI ke sAMsaxa subramaNyama svAmI ne kuCa hI xina pahale
apanI hI sarakAra ko kaTaGare meM KadZA karawe hue jIdIpI AMkadZoM para
savAla uTAe hEM.
>>> wxc = WX(lang='hin', order='wx2utf')
>>> hin_utf_ = wxc.wx2utf(hin_wx)
>>> print(hin_utf_)
बीजेपी के सांसद सुब्रमण्यम स्वामी ने कुछ ही दिन पहले
अपनी ही सरकार को कठघरे में खड़ा करते हुए जीडीपी आंकड़ों पर
सवाल उठाए हैं.
>>> wxc = WX(lang='mal', order='utf2wx')
>>> mal_utf = u'''വിപണിയിലെ ശുഭാപ്തിവിശ്വാസക്കാരായ കാളകള്‍ക്ക് അനുകൂലമായ
... രീതിയിലാണ് ബി എസ് ഇയില്‍ വ്യാപാരം നടക്കുന്നത്.'''
>>> mal_wx = wxc.utf2wx(mal_utf)
>>> print(mal_wx)
vipaNiyileV SuBApwiviSvAsakkArAya kAlYakalYkk anukUlamAya
rIwiyilAN bi eVs iyil vyApAraM natakkunnaw.
>>> wxc = WX(lang='mal', order='wx2utf')
>>> mal_utf_ = wxc.wx2utf(mal_wx)
>>> print(mal_utf_)
വിപണിയിലെ ശുഭാപ്തിവിശ്വാസക്കാരായ കാളകള്ക്ക് അനുകൂലമായ
രീതിയിലാണ് ബി എസ് ഇയില് വ്യാപാരം നടക്കുന്നത്.

Methods

iscii2unicode(iscii) Convert ISCII to Unicode
iscii2wx(my_string) Convert ISCII to WX
normalize(text) Performs some common normalization, such as removing byte order marks, word joiners and zero-width joiners.
unicode2iscii(unicode_) Convert Unicode to ISCII
utf2wx(unicode_) Convert UTF string to WX-Roman
wx2iscii(my_string) Convert WX to ISCII
wx2utf(wx) Convert WX-Roman to UTF
fit  
initialize_utf2wx_hash  
initialize_wx2utf_hash  
iscii2unicode_ben  
iscii2unicode_guj  
iscii2unicode_hin  
iscii2unicode_kan  
iscii2unicode_mal  
iscii2unicode_ori  
iscii2unicode_pan  
iscii2unicode_tam  
iscii2unicode_tel  
map_EY  
map_EY2  
map_OY  
map_OY2  
map_Z  
map_ZeV  
map_ZoV  
map_a  
map_eV  
map_eV2  
map_lY  
map_lYY  
map_nY  
map_oV  
map_oV2  
map_q  
map_rY  
unicode2iscii_ben  
unicode2iscii_guj  
unicode2iscii_hin  
unicode2iscii_kan  
unicode2iscii_mal  
unicode2iscii_ori  
unicode2iscii_pan  
unicode2iscii_tam  
unicode2iscii_tel  
iscii2unicode(iscii)

Convert ISCII to Unicode

iscii2wx(my_string)

Convert ISCII to WX

normalize(text)

Performs some common normalization, which includes:

  • Byte order mark, word joiner, etc. removal
  • ZERO_WIDTH_NON_JOINER and ZERO_WIDTH_JOINER removal
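The kind of clean-up described here can be sketched with the standard library alone. This is an assumption about the general technique, not the code behind WX.normalize; the names `INVISIBLES` and `strip_invisibles` are hypothetical, and the real method may handle more characters.

```python
# Code points named in the description above (mapping to None deletes them).
INVISIBLES = {
    0xFEFF: None,  # BYTE ORDER MARK
    0x2060: None,  # WORD JOINER
    0x200C: None,  # ZERO WIDTH NON-JOINER
    0x200D: None,  # ZERO WIDTH JOINER
}


def strip_invisibles(text):
    """Remove the invisible formatting characters listed in INVISIBLES."""
    return text.translate(INVISIBLES)


raw = '\ufeffक\u200dष\u200c'
clean = strip_invisibles(raw)
print(len(raw), len(clean))
```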

unicode2iscii(unicode_)

Convert Unicode to ISCII

utf2wx(unicode_)

Convert UTF string to WX-Roman

wx2iscii(my_string)

Convert WX to ISCII

wx2utf(wx)

Convert WX-Roman to UTF

indictrans._utils.OneHotEncoder — OneHotEncoder

class indictrans._utils.OneHotEncoder

Transforms categorical features to continuous numeric features.

Examples

>>> from one_hot_encoder import OneHotEncoder
>>> enc = OneHotEncoder()
>>> sequences = [list('bat'), list('cat'), list('rat')]
>>> enc.fit(sequences)
<one_hot_encoder.OneHotEncoder instance at 0x7f346d71c200>
>>> enc.transform(sequences, sparse=False).astype(int)
array([[0, 1, 0, 1, 1],
       [1, 0, 0, 1, 1],
       [0, 0, 1, 1, 1]])
>>> enc.transform(list('cat'), sparse=False).astype(int)
array([[1, 0, 0, 1, 1]])
>>> enc.transform(list('bat'), sparse=True)
<1x5 sparse matrix of type '<type 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>

Methods

fit(X) Fit OneHotEncoder to X.
transform(X[, sparse]) Transform X using one-hot encoding.
fit(X)

Fit OneHotEncoder to X.

Parameters:
X : array-like, shape [n_samples, n_features]

Input array of type int.

Returns:
self
transform(X, sparse=True)

Transform X using one-hot encoding.

Parameters:
X : array-like, shape [n_samples, n_features]

Input array of categorical features.

sparse : bool, default: True

Return sparse matrix if set True else return an array.

Returns:
X_out : sparse matrix if sparse=True else a 2-d array, dtype=int

Transformed input.
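To make the idea of one-hot encoding categorical sequences concrete, here is a small dependency-free sketch. It assumes that each (position, value) pair gets its own column; the class name `PositionalOneHot` is hypothetical, it returns plain lists rather than sparse matrices, and its column order may differ from that of indictrans._utils.OneHotEncoder.

```python
class PositionalOneHot(object):
    """Toy one-hot encoder: one output column per (position, value) pair."""

    def fit(self, sequences):
        # Collect every (position, value) pair seen in the training data.
        features = sorted({(i, v) for seq in sequences for i, v in enumerate(seq)})
        self.index_ = {feat: col for col, feat in enumerate(features)}
        return self

    def transform(self, sequences):
        rows = []
        for seq in sequences:
            row = [0] * len(self.index_)
            for i, v in enumerate(seq):
                col = self.index_.get((i, v))
                if col is not None:  # unseen values are silently ignored
                    row[col] = 1
            rows.append(row)
        return rows


enc = PositionalOneHot().fit([list('bat'), list('cat'), list('rat')])
for row in enc.transform([list('bat'), list('cat'), list('rat')]):
    print(row)
```

Each row has exactly one active column per position, so the three words differ only in the columns for their first letter.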

indictrans._utils.UrduNormalizer — UrduNormalizer

class indictrans._utils.UrduNormalizer

Normalizer for Urdu scripts. Normalizes different Unicode canonical equivalences to a single Unicode code point.

Examples

>>> from indictrans import UrduNormalizer
>>> text = u'''ﺎﻧ کﻭ ﻍیﺮﻗﺎﻧﻮﻧی ﺝگہ کﺱ ﻥے ﺩی؟
... ﻝﻭگﻭں کﻭ ﻖﺘﻟ کیﺍ ﺝﺍﺭ ہﺍ ہے ۔
... ﺏڑے ﻡﺎﻣﻭں ﺎﻧ ﺪﻧﻭں ﻢﺤﻟہ ﺥﺩﺍﺩﺍﺩ ﻡیں ﺭہﺕے ﺕھے۔
... ﻉﻭﺎﻣی یﺍ ﻑﻼﺣی ﺥﺪﻣﺎﺗ ﺍیک ﺎﻟگ ﺩﺎﺋﺭہ ﻊﻤﻟ ہے۔'''
>>> nu = UrduNormalizer()
>>> print(nu.normalize(text))
ان کو غیرقانونی جگہ کس نے دی؟
لوگوں کو قتل کیا جار ہا ہے ۔
بڑے ماموں ان دنوں محلہ خداداد میں رہتے تھے۔
عوامی یا فلاحی خدمات ایک الگ دائرہ عمل ہے۔

Methods

cnorm(text) Normalize NO_BREAK_SPACE, SOFT_HYPHEN, WORD_JOINER, H_SPACE, ZERO_WIDTH[SPACE, NON_JOINER, JOINER], MARK[LEFT_TO_RIGHT, RIGHT_TO_LEFT, BYTE_ORDER, BYTE_ORDER_2]
normalize(text) normalize text
cnorm(text)

Normalize NO_BREAK_SPACE, SOFT_HYPHEN, WORD_JOINER, H_SPACE, ZERO_WIDTH[SPACE, NON_JOINER, JOINER], MARK[LEFT_TO_RIGHT, RIGHT_TO_LEFT, BYTE_ORDER, BYTE_ORDER_2]

normalize(text)

normalize text
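One way to see what mapping equivalent forms to a single code point means is Unicode compatibility normalization from the standard library. This is a stand-in for illustration only: UrduNormalizer may use its own mapping tables, and `fold_presentation_forms` is a hypothetical helper name.

```python
import unicodedata


def fold_presentation_forms(text):
    """Fold Arabic presentation forms to their base letters using NFKC."""
    return unicodedata.normalize('NFKC', text)


final_alef = '\ufe8e'  # ARABIC LETTER ALEF FINAL FORM
base_alef = '\u0627'   # ARABIC LETTER ALEF
print(fold_presentation_forms(final_alef) == base_alef)
```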

indictrans.trunk.StructuredPerceptron — StructuredPerceptron

class indictrans.trunk.StructuredPerceptron(lr_exp=0.1, n_iter=15, random_state=None, verbose=0)

Structured perceptron for sequence classification.

The implementation is based on the averaged structured perceptron algorithm of M. Collins.

Parameters:
lr_exp : float, default: 0.1

The exponent used for inverse scaling of the learning rate. Given iteration number t, the effective learning rate is 1. / (t ** lr_exp)

n_iter : int, default: 15

Maximum number of epochs of the structured perceptron algorithm

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

verbose : int, default: 0 (quiet mode)

Verbosity mode.
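The inverse-scaling formula given for lr_exp can be evaluated directly. The helper name `effective_lr` is hypothetical; only the formula 1. / (t ** lr_exp) comes from the parameter description above.

```python
def effective_lr(t, lr_exp=0.1):
    """Effective learning rate at iteration t (t starts at 1)."""
    return 1.0 / (t ** lr_exp)


# The rate starts at 1.0 and decays slowly for small exponents.
for t in (1, 2, 5, 15):
    print(t, round(effective_lr(t), 4))
```

A larger lr_exp makes the rate decay faster, while lr_exp=0 keeps it constant at 1.0.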

References

M. Collins (2002). Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. EMNLP.

Methods

fit(X, y) Fit the model to the given set of sequences.
predict(X) Predict output sequences for input sequences in X.
fit(X, y)

Fit the model to the given set of sequences.

Parameters:
X : {array-like, sparse matrix}, shape (n_sequences, sequence_length, n_features)

Feature matrix of train sequences.

y : list of arrays, shape (n_sequences, sequence_length)

Target labels.

Returns:
self : object

Returns self.

predict(X)

Predict output sequences for input sequences in X.

Parameters:
X : {array-like, sparse matrix}, shape (n_sequences, sequence_length, n_features)

Feature matrix of test sequences.

Returns:
y : array, shape (n_sequences, sequence_length)

Labels per sequence in X.

User Guide

Installation

Dependencies

indictrans requires Cython and SciPy.

Clone & Install

Clone the repository:
    git clone https://github.com/libindic/indictrans.git
or:
    git clone https://github.com/irshadbhat/indictrans.git

Change to the cloned directory and install:
    cd indictrans
    pip install -r requirements.txt
    python setup.py install

Model Setup & Training

Train and Test

Assuming your data is in tnt format, you can encode the data and train an indictrans.trunk.StructuredPerceptron classifier.

>>> from indictrans import trunk
>>> #load training data
... X, y = trunk.load_data('indictrans/trunk/tests/hin2rom.tnt')
>>> #build ngram-context
... X = trunk.build_context(X, ngram=4)
>>> #fit encoder
... enc, X = trunk.fit_encoder(X)
>>> #train structured-perceptron model
... clf = trunk.train_sp(X, y, n_iter=5, verbose=2)
Iteration 1 ...
Train-set error = 1.5490
Iteration 2 ...
Train-set error = 1.0040
Iteration 3 ...
Train-set error = 0.8030
Iteration 4 ...
Train-set error = 0.6900
Iteration 5 ...

This will train the perceptron for 5 epochs (specified via the n_iter parameter).

Then you can use the trained classifier as follows:

>>> #load testing data
... X_test, y = trunk.load_data('indictrans/trunk/tests/hin2rom.tnt')
>>> #build ngram-context for testing data
... X_test = trunk.build_context(X_test, ngram=4) # ngram value should be same as for train-set
>>> #encode test-set
... X_test = [enc.transform(x) for x in X_test]
>>> #predict output sequences
... y_ = clf.predict(X_test)
>>> y[10]  # True
[u'c', u'l', u'a', u'ne', u'_']
>>> y_[10]  # Predicted
[u'c', u'l', u'a', u'n', u'_']
>>> y[100]  # True
[u'p', u'a', u'r', u'aa', u'n', u'd', u'e']
>>> y_[100]  # Predicted
[u'p', u'a', u'r', u'aa', u'n', u'd', u'e']

Note that you must build the context with the same ngram value that was used for the training data, and encode the test data with the encoder enc that was fitted on the training data.
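Why the ngram value has to match can be seen from a toy context builder. This sketch assumes that a context is a window of neighbouring characters on each side of a position; `context_windows` and its padding symbol are hypothetical and do not reproduce the actual output of trunk.build_context.

```python
def context_windows(chars, ngram=4, pad='_'):
    """Return, for each position, the characters within `ngram` steps of it."""
    padded = [pad] * ngram + list(chars) + [pad] * ngram
    width = 2 * ngram + 1
    return [padded[i:i + width] for i in range(len(chars))]


train_like = context_windows('cat', ngram=1)
test_like = context_windows('cat', ngram=2)
print(len(train_like[0]), len(test_like[0]))
```

Under this assumption the number of features per position depends on ngram, so an encoder fitted on one window size would not line up with data built using another.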

Train directly from Console

indictrans-trunk provides a much easier way to train, test and save models directly from console.

user@indic-trans$ indictrans-trunk --help

-d , --data-file      training data-file: set of sequences
-o , --output-dir     output directory to dump trained models
-n , --ngrams         ngram context for feature extraction: default 4
-e , --lr-exp         The Exponent used for inverse scaling of learning rate:
                      default 0.1
-m , --max-iter       Maximum number of iterations for training: default 15
-r , --random-state   Random seed for shuffling sequences within each
                      iteration.
-l , --verbosity      Verbosity level: default 0 (quiet mode)
-t , --test-file      testing data-file: optional: stores output sequences
                      in `test_file.out`

user@indic-trans$ indictrans-trunk -d hin2rom.tnt -o /tmp/rom-ind/ -n 4 -e 0.1 -m 5 -l 3 -t hin2rom.tnt
Iteration 1 ...
First sequence comparision: 0-27 0-95 0-30 0-10 ... loss: 4
Train-set error = 1.8090
Iteration 2 ...
First sequence comparision: 120-46 86-86 63-63 120-120 95-95 123-123 10-10 ... loss: 1
Train-set error = 0.6560
Iteration 3 ...
First sequence comparision: 123-123 110-110 40-40 46-46 ... loss: 0
Train-set error = 0.3820
Iteration 4 ...
First sequence comparision: 2-2 95-95 86-86 77-77 64-64 31-31 120-120 80-80 10-10 ... loss: 0
Train-set error = 0.2240
Iteration 5 ...
First sequence comparision: 40-40 120-120 31-31 120-120 125-125 120-120 123-123 117-117 31-31 120-120 ... loss: 0
Train-set error = 0.1540

Testing ...

Assuming hin2rom.tnt was given as test-file, the output file will be generated with the name hin2rom.tnt.out.

Transliteration

Transliterate

To transliterate raw text, use indictrans.Transliterator, which relies on already trained models. If the input text contains repeated words, as raw text generally does, set the build_lookup flag to True. As the name indicates, this builds a lookup table of transliterated words and thus avoids transliterating the same word more than once. This saves a lot of time when the input corpus is large.

>>> from indictrans import Transliterator
>>> trn = Transliterator(source='hin', target='eng', build_lookup=True)
>>> hin = """कांग्रेस पार्टी अध्यक्ष सोनिया गांधी, तमिलनाडु की मुख्यमंत्री
... जयललिता और रिज़र्व बैंक के गवर्नर रघुराम राजन के बीच एक समानता
... है. ये सभी अलग-अलग कारणों से भारतीय जनता पार्टी के राज्यसभा सांसद
... सुब्रमण्यम स्वामी के निशाने पर हैं. उनके जयललिता और सोनिया गांधी के
... पीछे पड़ने का कारण कथित भ्रष्टाचार है."""
>>> eng = trn.transform(hin)
>>> print(eng)
congress party adhyaksh sonia gandhi, tamilnadu kii mukhyamantri
jayalalita our reserve baink ke governor raghuram rajan ke beech ek samanta
hai. ye sabi alag-alag carnon se bharatiya janata party ke rajyasabha saansad
subramanyam swami ke nishane par hain. unke jayalalita our sonia gandhi ke
peeche padane ka kaaran kathith bhrashtachar hai.
>>> trn = Transliterator(source='eng', target='hin')
>>> hin_ = trn.transform(eng)
>>> print(hin_)
कांग्रेस पार्टी अध्यक्ष सोनिया गांधी, तमिलनाडु की मुख्यमांत्री
जयललिता और रिज़र्व बैंक के गवर्नर रघुराम राजन के बीच एक समानता
है. ये सभी अलग-अलग कार्नों से भारतीय जनता पार्टी के राज्यसभा संसद
सुब्रमण्यम स्वामी के निशाने पर हैं. उनके जयललिता और सोनिया गांधी के
पीछे पड़ने का कारण कथित भ्रष्टाचार है.
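The effect of a lookup table can be sketched as plain memoization. This is an illustration under assumptions, not the library's internals: `transliterate_with_lookup` is a hypothetical helper, and `slow_word_model` is a dummy stand-in for a per-word transliteration model.

```python
def transliterate_with_lookup(text, transliterate_word):
    """Transliterate whitespace-separated words, computing each distinct word once."""
    lookup = {}
    out = []
    for word in text.split():
        if word not in lookup:
            lookup[word] = transliterate_word(word)  # expensive call happens here
        out.append(lookup[word])
    return ' '.join(out)


calls = []


def slow_word_model(word):
    """Dummy word-level model that records how often it is called."""
    calls.append(word)
    return word.upper()


print(transliterate_with_lookup('ke beech ke ke par', slow_word_model))
print(len(calls))
```

The model is called once per distinct word rather than once per token, which is where the time saving on large corpora comes from.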

K-Best Transliterations

You can generate k-best outputs for a given sequence by changing the decoder from the default viterbi to beamsearch and then setting the k_best parameter to the desired value.

>>> from indictrans import Transliterator
>>> r2i = Transliterator(source='eng', target='mal', decode='beamsearch')
>>> words = '''sereleskar morocco calendar bhagyalakshmi bhoolokanathan medical
...         ernakulam kilometer vitamin management university naukuchiatal'''.split()
>>> for word in words:
...     print('%s -> %s' % (word, '  '.join(r2i.transform(word, k_best=5))))
sereleskar -> സേറെലേസ്കാര്  സെറെലേസ്കാര്  സേറെലേസ്കാര  സെറെലേസ്കാര  സേറെലേസ്കര്
morocco -> മൊറോക്കോ  മൊറോക്ഡോ  മൊരോക്കോ  മോറോക്കോ  മൊറോക്കൂ
calendar -> കേലെന്ദര  കേലെന്ഡര  കേലെന്ദ്ര  കേലെന്ദാര  കേലെന്ഡ്ര
bhagyalakshmi -> ഭാഗ്യലക്ഷ്മീ  ഭാഗ്യലക്ഷ്മി  ഭഗ്യലക്ഷ്മീ  ഭാഗ്യാലക്ഷ്മീ  ഭഗ്യലക്ഷ്മി
bhoolokanathan -> ഭൂലോകനാഥന  ഭൂലോകാനാഥന  ഭൂലോക്കനാഥന  ബൂലോകനാഥന  ഭൂലോകനാതന
medical -> മെഡിക്കല്  മെഡിക്കലും  മെഡിക്കില്  മ്മഎഡിക്കല്  മേഡിക്കല്
ernakulam -> എറണാകുളം  ഈറണാകുളം  എറണാകുലം  എറണാകുളഅം  എറണാകുളാം
kilometer -> കിലോമീറ്റര്  കിലോഈറ്റര്  കിലോമീറ്റ്ര്  കിലോമീറ്ററ്  കിലോമീടര്
vitamin -> വിറ്റാമിന്  വിറ്റമിന്  വൈറ്റാമിന്  വിതാമിന്  വിതആമിന്
management -> മാനേജ്മെന്റ്  മാനേജ്ഞ്മെന്റ്  മാനേഗ്മെന്റ്  മാംനേജ്മെന്റ്  മാനേജ്മെതുറ്
university -> യൂണിവേഴ്സിറ്റി  യൂണിവേര്സിറ്റി  യുണിവേഴ്സിറ്റി  യൂനിവേഴ്സിറ്റി  യൂണിവേഴ്സിറ്റീ
naukuchiatal -> നകുചിയാറ്റാള്  നകുചിയാറ്റാല്  നകുചിയാറ്റാല  നകുചിയാറ്റള്  നകുചിയറ്റാള്
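How a beam search can produce several ranked candidates instead of a single best path is sketched below. This is a generic illustration, not the decoder shipped with indictrans: the function `beam_search` and its score tables (`scores`, `init`, `trans`, `final`) are hypothetical.

```python
import heapq


def beam_search(scores, init, trans, final, k_best=5):
    """Return up to k_best tag paths, best first, keeping k_best partial paths per step."""
    n_tags = len(init)

    def by_score(item):
        return item[0]

    # Each beam entry is (score so far, path so far).
    beams = [(init[j] + scores[0][j], [j]) for j in range(n_tags)]
    beams = heapq.nlargest(k_best, beams, key=by_score)
    for t in range(1, len(scores)):
        candidates = [
            (s + trans[path[-1]][j] + scores[t][j], path + [j])
            for s, path in beams
            for j in range(n_tags)
        ]
        beams = heapq.nlargest(k_best, candidates, key=by_score)
    # Add the end-of-sequence score and rank the surviving paths.
    finished = [(s + final[path[-1]], path) for s, path in beams]
    finished.sort(key=by_score, reverse=True)
    return [path for _, path in finished]


scores = [[1, 0], [0, 1], [1, 0]]
zeros = [[0, 0], [0, 0]]
for path in beam_search(scores, [0, 0], zeros, [0, 0], k_best=2):
    print(path)
```

A wider beam keeps more partial hypotheses alive, which costs more time but makes it less likely that a good candidate is pruned early.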

ML and Rule-Based systems for Indic Scripts

For Indic scripts other than Urdu, you can use either the rule-based or the machine learning (ML) system for transliteration. Rule-based systems are much faster than ML systems and generally seem more accurate, but for some language pairs the ML systems generate better results.

>>> from indictrans import Transliterator
>>> rom_text = 'indictrans libindic hyderabad university bhagyalakshmi bharat morocco'.split()
>>> r2h = Transliterator(source='eng', target='hin')
>>> hin_text = list(map(r2h.transform, rom_text))
>>> hin_text
['इंडिक्ट्रांस', 'लिबिंदिक', 'हैदराबाद', 'यूनिवर्सिटी', 'भाग्यालक्ष्मी', 'भारत', 'मोरोक्को']
>>> h2t_rb = Transliterator(source='hin', target='tel', rb=True) # Rule-Based
>>> h2m_rb = Transliterator(source='hin', target='mal', rb=True) # Rule-Based
>>> h2ta_rb = Transliterator(source='hin', target='tam', rb=True) # Rule-Based
>>> h2t_ml = Transliterator(source='hin', target='tel', rb=False) # ML
>>> h2m_ml = Transliterator(source='hin', target='mal', rb=False) # ML
>>> h2ta_ml = Transliterator(source='hin', target='tam', rb=False) # ML
>>> list(map(h2t_ml.transform, hin_text))
['ఇండిక్ట్రాంస్', 'లిబిందిక', 'హైదరాబాద్', 'యూనివర్శిటీ', 'భాగ్యాలక్ష్మి', 'భారత్', 'మోరోక్కో']
>>> list(map(h2t_rb.transform, hin_text))
['ఇండిక్ట్రాంస', 'లిబిందిక', 'హైదరాబాద', 'యూనివర్సిటీ', 'భాగ్యాలక్ష్మీ', 'భారత', 'మోరోక్కో']
>>> list(map(h2ta_rb.transform, hin_text))
['இங்டிக்ட்ராங்ஸ', 'லிபிங்திக', 'ஹைதராபாத', 'யூநிவர்ஸிடீ', 'பாக்யாலக்ஷ்மீ', 'பாரத', 'மோரோக்கோ']
>>> list(map(h2ta_ml.transform, hin_text))
['இண்டிக்ட்ராங்ஸ்', 'லிபிந்திக்', 'ஹைதராபாத்', 'யூனிவர்சிடி', 'பாக்யாலக்ஷ்மி', 'பாரதப்', 'மோரோக்கோ']
>>> list(map(h2m_rb.transform, hin_text))
['ഇംഡിക്ട്രാംസ', 'ലിബിംദിക', 'ഹൈദരാബാദ', 'യൂനിവര്സിടീ', 'ഭാഗ്യാലക്ഷ്മീ', 'ഭാരത', 'മോരോക്കോ']
>>> list(map(h2m_ml.transform, hin_text))
['ഇന്ഡിക്ട്രാംസ്', 'ലിബിന്ദിക', 'ഹൈദരാബാദ്', 'യൂനിവര്സിടി', 'ഭാഗ്യാലക്ഷ്മി', 'ഭാരത', 'മോരോക്കോ']

Transliterate from Console

You can transliterate text files directly using the console shortcut indictrans.

$ indictrans --h

-h, --help          show this help message and exit
-v, --version       show program's version number and exit
-s, --source        select language (3 letter ISO-639 code) {hin, guj, pan,
                    ben, mal, kan, tam, tel, ori, eng, mar, nep, bod, kok,
                    asm, urd}
-t, --target        select language (3 letter ISO-639 code) {hin, guj, pan,
                    ben, mal, kan, tam, tel, ori, eng, mar, nep, bod, kok,
                    asm, urd}
-b, --build-lookup  build lookup to fasten transliteration
-i, --input         <input-file>
-o, --output        <output-file>


$ indictrans < hindi.txt --s hin --t eng --build-lookup > hindi-rom.txt
$ indictrans < roman.txt --s eng --t hin --build-lookup > roman-hin.txt

$ echo 'indictrans libindic hyderabad university bhagyalakshmi bharat morocco' |\
 indictrans -s eng -t hin | indictrans -s hin -t tel -r # RULE-BASED
ఇండిక్ట్రాంస లిబిందిక హైదరాబాద యూనివర్సిటీ భాగ్యాలక్ష్మీ భారత మోరోక్కో
