wikt2pron’s documentation
Wiktionary pronunciation collector
A Python toolkit that converts pronunciations in the enwiktionary XML dump to cmudict format. IPA and X-SAMPA are supported at present.
This project was developed during GSoC 2017 with CMU Sphinx.
Blog posts about this project can be found on my Blogspot.
Collected pronunciation dictionaries and related example models can be downloaded from Dropbox.
Introduction
wikt2pron is a Python toolkit that converts pronunciations in the enwiktionary XML dump to cmudict format. IPA and X-SAMPA are supported at present.
Features
- Extract pronunciations from a Wiktionary XML dump.
- Look up the pronunciation of a word in Wiktionary.
- IPA -> X-SAMPA conversion.
Installation
# download the latest version
$ git clone https://github.com/abuccts/wikt2pron.git
$ cd wikt2pron
# install and run the tests
$ python setup.py install
$ python setup.py -q test
# build the documentation
$ make -C docs html
Usage
Extract pronunciation from Wiktionary XML dump
First, create an instance of the Wiktionary class:
>>> from pywiktionary import Wiktionary
>>> wikt = Wiktionary(XSAMPA=True)
Use the example XML dump in pywiktionary/data:
>>> dump_file = "pywiktionary/data/enwiktionary-test-pages-articles-multistream.xml"
>>> pron = wikt.extract_IPA(dump_file)
Here’s the extracted result:
>>> from pprint import pprint
>>> pprint(pron)
[{'id': 16,
  'pronunciation': {'English': [{'IPA': '/ˈdɪkʃ(ə)n(ə)ɹɪ/',
                                 'X-SAMPA': '/"dIkS(@)n(@)r\\I/',
                                 'lang': 'en'},
                                {'IPA': '/ˈdɪkʃənɛɹi/',
                                 'X-SAMPA': '/"dIkS@nEr\\i/',
                                 'lang': 'en'}]},
  'title': 'dictionary'},
 {'id': 65195,
  'pronunciation': {'English': 'IPA not found.'},
  'title': 'battleship'},
 {'id': 39478,
  'pronunciation': {'English': [{'IPA': '/ˈmɜːdə(ɹ)/',
                                 'X-SAMPA': '/"m3:d@(r\\)/',
                                 'lang': 'en'},
                                {'IPA': '/ˈmɝ.dɚ/',
                                 'X-SAMPA': '/"m3`.d@`/',
                                 'lang': 'en'}]},
  'title': 'murder'},
 {'id': 80141,
  'pronunciation': {'English': [{'IPA': '/ˈdæzəl/',
                                 'X-SAMPA': '/"d{z@l/',
                                 'lang': 'en'}]},
  'title': 'dazzle'}]
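The result shown above mixes lists of transcriptions with the plain string 'IPA not found.', so downstream code has to tell the two apart. The following is a minimal sketch of one way to flatten it into (title, transcription) pairs. The helper name `flatten` and its parameters are made up for this example and are not part of pywiktionary; the sample data is copied from the output above.

```python
# Sample shaped like the extract_IPA output shown above (values copied from it).
pron = [
    {"id": 65195, "title": "battleship",
     "pronunciation": {"English": "IPA not found."}},
    {"id": 80141, "title": "dazzle",
     "pronunciation": {"English": [
         {"IPA": "/ˈdæzəl/", "X-SAMPA": '/"d{z@l/', "lang": "en"}]}},
]


def flatten(entries, language="English", key="X-SAMPA"):
    """Yield (title, transcription) pairs, skipping entries without IPA.

    Hypothetical helper for illustration; not a pywiktionary function.
    """
    for entry in entries:
        result = entry["pronunciation"].get(language)
        # A plain string such as 'IPA not found.' marks a missing pronunciation.
        if not isinstance(result, list):
            continue
        for item in result:
            yield entry["title"], item[key]


pairs = list(flatten(pron))
print(pairs)
```

Passing key="IPA" instead would collect the IPA transcriptions from the same structure.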
Look up pronunciation for a word in Wiktionary
First, create an instance of the Wiktionary class:
>>> from pywiktionary import Wiktionary
>>> wikt = Wiktionary(XSAMPA=True)
Look up a word using the lookup method:
>>> word = wikt.lookup("present")
The entry for the word “present” is at https://en.wiktionary.org/wiki/present, and here is the lookup result:
>>> from pprint import pprint
>>> pprint(word)
{'Catalan': 'IPA not found.',
 'Danish': [{'IPA': '/prɛsanɡ/', 'X-SAMPA': '/prEsang/', 'lang': 'da'},
            {'IPA': '[pʰʁ̥ɛˈsɑŋ]', 'X-SAMPA': '[p_hR_0E"sAN]', 'lang': 'da'}],
 'English': [{'IPA': '/ˈpɹɛzənt/', 'X-SAMPA': '/"pr\\Ez@nt/', 'lang': 'en'},
             {'IPA': '/pɹɪˈzɛnt/', 'X-SAMPA': '/pr\\I"zEnt/', 'lang': 'en'},
             {'IPA': '/pɹəˈzɛnt/', 'X-SAMPA': '/pr\\@"zEnt/', 'lang': 'en'}],
 'Ladin': 'IPA not found.',
 'Middle French': 'IPA not found.',
 'Old French': 'IPA not found.',
 'Swedish': [{'IPA': '/preˈsent/', 'X-SAMPA': '/pre"sent/', 'lang': 'sv'}]}
To look up a word in a certain language, specify the lang parameter:
>>> wikt = Wiktionary(lang="English", XSAMPA=True)
>>> word = wikt.lookup("read")
>>> pprint(word)
[{'IPA': '/ɹiːd/', 'X-SAMPA': '/r\\i:d/', 'lang': 'en'},
 {'IPA': '/ɹɛd/', 'X-SAMPA': '/r\\Ed/', 'lang': 'en'}]
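As the two results above show, a lookup without lang yields a dict keyed by language name, while a lookup with lang yields a plain list. The sketch below normalizes both shapes to one mapping. `as_language_map` is a hypothetical helper written for this example, and the sample values are abbreviated from the output above.

```python
def as_language_map(result, lang_name="(requested language)"):
    """Normalize either lookup() return shape to {language: [entries]}.

    Hypothetical helper for illustration; not a pywiktionary function.
    """
    if isinstance(result, list):
        # Single-language lookup: the list itself is the payload.
        return {lang_name: result}
    # All-language lookup: drop languages whose value is a message string.
    return {name: items for name, items in result.items()
            if isinstance(items, list)}


# Abbreviated copies of the two result shapes shown above.
all_langs = {
    "Catalan": "IPA not found.",
    "Swedish": [{"IPA": "/preˈsent/", "X-SAMPA": '/pre"sent/', "lang": "sv"}],
}
one_lang = [{"IPA": "/ɹiːd/", "X-SAMPA": "/r\\i:d/", "lang": "en"}]

print(sorted(as_language_map(all_langs)))
print(list(as_language_map(one_lang, "English")))
```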
IPA -> X-SAMPA conversion
>>> from pywiktionary import IPA
>>> IPA_text = "/t͡ʃeɪnd͡ʒ/"  # en: [[change]]
>>> XSAMPA_text = IPA.IPA_to_XSAMPA(IPA_text)
>>> XSAMPA_text
"/t__SeInd__Z/"
Using the collected dictionaries
To use the collected dictionaries for training G2P models or acoustic models, please refer to the project's blog posts for details.
pywiktionary API
The library provides classes that can be used by third-party tools.
Wiktionary Class
class pywiktionary.Wiktionary(lang=None, XSAMPA=False)
Wiktionary class for IPA extraction from an XML dump or the MediaWiki API.
To extract IPA for a certain language, specify the lang parameter; by default, IPA is extracted for all available languages. To convert IPA text to X-SAMPA text, use the XSAMPA parameter.
Parameters:
- lang (string) – String of language type.
- XSAMPA (boolean) – Option for IPA to X-SAMPA conversion.
extract_IPA(dump_file)
Extract an IPA list from a Wiktionary XML dump.
Parameters: dump_file (string) – Path of Wiktionary XML dump file.
Returns: List of extracted IPA results in {"id": "", "title": "", "pronunciation": ""} format.
Return type: list
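To illustrate the kind of input extract_IPA reads, here is a minimal sketch of streaming pages out of a MediaWiki XML export using only the standard library. It is not the library's implementation: the sample document, `local`, and `iter_pages` are made up for this example, and a real dump carries many more fields per page.

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for a MediaWiki XML export (assumed layout for illustration).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>dazzle</title>
    <id>80141</id>
    <revision><id>1</id><text>==English==</text></revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield {'id', 'title', 'text'} per <page>, clearing each page when done."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        page = {"id": None, "title": None, "text": ""}
        for child in elem:  # direct children only: title, id, revision
            name = local(child.tag)
            if name == "title":
                page["title"] = child.text
            elif name == "id":
                page["id"] = int(child.text)
            elif name == "revision":
                for sub in child:
                    if local(sub.tag) == "text":
                        page["text"] = sub.text or ""
        yield page
        elem.clear()  # drop the processed subtree to keep memory flat


pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

Streaming with iterparse matters for full dumps, which are far too large to load as a single tree.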
get_entry_pronunciation(wiki_text, title=None)
Extract IPA for an entry in a Wiktionary XML dump.
Parameters:
- wiki_text (string) – String of XML entry wiki text.
- title (string) – String of wiki entry title.
Returns: Dict of word’s IPA results. Key: language name; Value: list of IPA text.
Return type: dict
lookup(word)
Look up IPA of word through Wiktionary API.
Parameters: word (string) – String of a word to be looked up.
Returns: Dict of word’s IPA results. Key: language name; Value: list of IPA text.
Return type: dict
Parser Class
class pywiktionary.Parser(lang=None, XSAMPA=False)
Wiktionary parser to extract IPA text from the pronunciation section.
To extract IPA for a certain language, specify the lang parameter; by default, IPA is extracted for all available languages. To convert IPA text to X-SAMPA text, use the XSAMPA parameter.
Parameters:
- lang (string) – String of language type.
- XSAMPA (boolean) – Option for IPA to X-SAMPA conversion.
expand_template(text)
Expand IPA Template through Wiktionary API.
Used to expand the {{*-IPA}} template in the parser and return an IPA list.
Parameters: text (string) – String of template text inside “{{” and “}}”.
Returns: List of expanded IPA text.
Return type: list of string
Examples
>>> parser = Parser()
>>> template = "{{la-IPA|eccl=yes|thēsaurus}}"
>>> parser.expand_template(template)
['/tʰeːˈsau̯.rus/', '[tʰeːˈsau̯.rʊs]', '/teˈsau̯.rus/']
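For readers unfamiliar with the template syntax in the example above, the sketch below splits a template string into its name, positional arguments, and named arguments. This only illustrates the structure of the text; expand_template itself is documented as calling the Wiktionary API. `split_template` is a hypothetical helper that does not handle nested templates or wiki links.

```python
def split_template(template):
    """Split '{{name|a|k=v}}' into (name, positional args, named args).

    Hypothetical helper for illustration; nested templates are not handled.
    """
    inner = template.strip()
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    name, *parts = inner.split("|")
    positional, named = [], {}
    for part in parts:
        key, sep, value = part.partition("=")
        if sep:
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    return name.strip(), positional, named


name, positional, named = split_template("{{la-IPA|eccl=yes|thēsaurus}}")
print(name, positional, named)
```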
parse(wiki_text, title=None)
Parse Wiktionary wiki text.
Split Wiktionary wiki text into different languages and return the parsed IPA result.
Parameters:
- wiki_text (string) – String of Wiktionary wiki text, from XML dump or Wiktionary API.
- title (string) – String of wiki entry title.
Returns: Dict of parsed IPA results. Key: language name; Value: list of IPA text.
Return type: dict
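Wiktionary entries mark each language with a level-2 heading such as ==English==, with subsections like ===Pronunciation=== nested below it. The following is a minimal sketch of the splitting step described above, not the parser's actual code. `split_languages`, `LANG_HEADING`, and the sample wiki text (including the argument order of the {{IPA}} template) are assumptions made for this example.

```python
import re

# Made-up sample entry; the {{IPA}} argument layout is an assumption.
WIKI_TEXT = """==English==
===Pronunciation===
* {{IPA|/ˈdæzəl/|lang=en}}
===Verb===
# (definition text)

==French==
===Noun===
# placeholder
"""

# Level-2 headings look like '==English=='; deeper ones have more '=' signs.
LANG_HEADING = re.compile(r"^==([^=]+)==\s*$", re.MULTILINE)


def split_languages(wiki_text):
    """Return {language name: section text} keyed by level-2 heading."""
    matches = list(LANG_HEADING.finditer(wiki_text))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next language heading or the end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wiki_text)
        sections[match.group(1).strip()] = wiki_text[match.end():end].strip()
    return sections


sections = split_languages(WIKI_TEXT)
print(list(sections))
```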
parse_detail(wiki_text, depth=3)
Parse the section of a certain language in wiki text.
Recursively parse the pronunciation section of that language.
Parameters:
- wiki_text (string) – String of wiki text in a language section.
- depth (int) – Integer indicating the depth of the pronunciation section.
Returns: List of extracted IPA text in {"IPA": "", "X-SAMPA": "", "lang": ""} format.
Return type: list of dict
parse_pronunciation(wiki_text)
Parse pronunciation section in wiki text.
Parse IPA text from pronunciation section and convert to X-SAMPA.
Parameters: wiki_text (string) – String of pronunciation section in wiki text.
Returns: List of extracted IPA text in {"IPA": "", "X-SAMPA": "", "lang": ""} format.
Return type: list of dict
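As a rough illustration of what parsing a pronunciation section involves, the sketch below pulls transcriptions out of {{IPA|...}} templates with a regular expression. It is a simplification under stated assumptions, not the library's parser: the sample section, the lang= argument layout, `IPA_TEMPLATE`, and `extract_ipa` are all made up for this example, and the X-SAMPA conversion step is left out.

```python
import re

# Made-up pronunciation section; the template argument layout is an assumption.
SECTION = (
    "* {{a|UK}} {{IPA|/ˈmɜːdə(ɹ)/|lang=en}}\n"
    "* {{a|US}} {{IPA|/ˈmɝ.dɚ/|lang=en}}"
)

# Matches '{{IPA|...}}' without nested braces and captures the argument list.
IPA_TEMPLATE = re.compile(r"\{\{IPA\|([^{}]*)\}\}")


def extract_ipa(section):
    """Return [{'IPA': ..., 'lang': ...}] for every {{IPA}} template found."""
    results = []
    for match in IPA_TEMPLATE.finditer(section):
        lang = ""
        transcriptions = []
        for arg in match.group(1).split("|"):
            if arg.startswith("lang="):
                lang = arg[len("lang="):]
            elif arg.startswith(("/", "[")):
                # Phonemic (/.../) or phonetic ([...]) transcription.
                transcriptions.append(arg)
        results.extend({"IPA": text, "lang": lang} for text in transcriptions)
    return results


results = extract_ipa(SECTION)
print(results)
```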
Utilities
IPA and X-SAMPA related variables and functions, partially modified from the Wiktionary Lua module https://en.wiktionary.org/wiki/Module:IPA.
IPA.IPA.IPA_to_CMUBET(text)
Convert IPA to CMUBET for US English.
Uses the IPA symbol set used in Wiktionary and the CMUBET symbol set used in CMUDict.
Parameters: text (string) – String of IPA text parsed from Wiktionary.
Returns: Converted CMUBET text.
Return type: string
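The idea behind this conversion can be sketched as a table lookup from IPA symbols to CMUBET (ARPAbet) phones. The table below is a deliberately tiny, hypothetical subset without stress digits, and `to_phones` and `dict_line` are made-up helpers; the mapping actually used by IPA_to_CMUBET and its output format may differ.

```python
# Hypothetical, deliberately tiny IPA -> ARPAbet table (no stress digits).
IPA_TO_ARPABET = {
    "d": "D", "æ": "AE", "z": "Z", "ə": "AH", "l": "L",
    "m": "M", "n": "N", "t": "T", "ɪ": "IH", "ɛ": "EH",
}
# Delimiters, stress marks, and syllable breaks carry no phone of their own.
IGNORED = set("/[]ˈˌ.")


def to_phones(ipa):
    """Map each IPA character to a phone; raises KeyError on unmapped symbols."""
    phones = []
    for char in ipa:
        if char in IGNORED:
            continue
        phones.append(IPA_TO_ARPABET[char])
    return phones


def dict_line(word, ipa):
    """Format one 'word PHONE PHONE ...' line, as in a pronunciation dictionary."""
    return "{} {}".format(word, " ".join(to_phones(ipa)))


line = dict_line("dazzle", "/ˈdæzəl/")
print(line)  # dazzle D AE Z AH L
```

A per-character lookup like this cannot handle diphthongs, affricates, or optional segments in parentheses, which a complete converter has to treat as units.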
IPA.IPA.IPA_to_XSAMPA(text)
Convert IPA to X-SAMPA.
Use IPA and X-SAMPA symbol sets used in Wiktionary.
Parameters: text (string) – String of IPA text parsed from Wiktionary.
Returns: Converted X-SAMPA text.
Return type: string
Notes
- Use _j for palatalized instead of '
- Use = for syllabic instead of _=
- Use ~ for nasalization instead of _~
- Please refer to IPA <-> X-SAMPA Symbol Set for more details.
Examples
>>> IPA_text = "/t͡ʃeɪnd͡ʒ/"  # en: [[change]]
>>> XSAMPA_text = IPA_to_XSAMPA(IPA_text)
>>> XSAMPA_text
"/t__SeInd__Z/"
The following functions convert spelling text in {{*-IPA}} templates to IPA pronunciation.
Most are modified from Wiktionary Lua modules.
IPA.fr_pron.to_IPA(text, pos='')
Generates French IPA from spelling.
Implements template {{fr-IPA}}.
Parameters:
- text (string) – String of fr-IPA text parsed in {{fr-IPA}} from Wiktionary.
- pos (string) – String of |pos= parameter parsed in {{fr-IPA}}.
Returns: Converted French IPA.
Return type: string
Notes
- Partially modified from the Wiktionary fr-pron Lua module.
- The Lua module was originally written by Kc kennylau and rewritten by Benwing.
- Testcases are modified from Wiktionary fr-pron/testcases.
Examples
>>> fr_text = "hæmorrhagie"  # fr: [[hæmorrhagie]]
>>> fr_IPA = fr_pron.to_IPA(fr_text)
>>> fr_IPA
"e.mɔ.ʁa.ʒi"
IPA.ru_pron.to_IPA(text, adj='', gem='', bracket='', pos='')
Generates Russian IPA from spelling.
Implements template {{ru-IPA}}.
Parameters:
- text (string) – String of ru-IPA text parsed in {{ru-IPA}} from Wiktionary.
- adj (string) – String of |noadj= parameter parsed in {{ru-IPA}}.
- gem (string) – String of |gem= parameter parsed in {{ru-IPA}}.
- bracket (string) – String of |bracket= parameter parsed in {{ru-IPA}}.
- pos (string) – String of |pos= parameter parsed in {{ru-IPA}}.
Returns: Converted Russian IPA.
Return type: string
Notes
- Partially modified from the Wiktionary ru-pron Lua module.
- The Lua module was originally written by Wyang and rewritten by Benwing, with additional contributions from Atitarev and a bit from others.
- Testcases are modified from Wiktionary ru-pron/testcases.
Examples
>>> ru_text = "счастли́вый"  # ru: [[счастли́вый]]
>>> ru_IPA = ru_pron.to_IPA(ru_text)
>>> ru_IPA
"ɕːɪs⁽ʲ⁾ˈlʲivɨj"
IPA.hi_pron.to_IPA(text)
Generates Hindi IPA from spelling.
Implements template {{hi-IPA}}.
Parameters: text (string) – String of hi-IPA text parsed in {{hi-IPA}} from Wiktionary.
Returns: Converted Hindi IPA.
Return type: string
Notes
- Partially modified from the Wiktionary hi-IPA Lua module.
- Testcases are modified from Wiktionary hi-IPA/testcases.
Examples
>>> hi_text = "मैं"  # hi: [[मैं]]
>>> hi_IPA = hi_pron.to_IPA(hi_text)
>>> hi_IPA
"mɛ̃ː"
IPA.es_pron.to_IPA(word, LatinAmerica=False, phonetic=True)
Generates Spanish IPA from spelling.
Implements template {{es-IPA}}.
Parameters:
- word (string) – String of es-IPA text parsed in {{es-IPA}} from Wiktionary.
- LatinAmerica (bool) – Value of |LatinAmerica= parameter parsed in {{es-IPA}}.
- phonetic (bool) – Value of |phonetic= parameter parsed in {{es-IPA}}.
Returns: Converted Spanish IPA.
Return type: string
Notes
- Partially modified from the Wiktionary es-pronunc Lua module.
- Testcases are modified from Wiktionary es-pronunc/testcases.
Examples
>>> es_text = "baca"  # es: [[baca]]
>>> es_IPA = es_pron.to_IPA(es_text)
>>> es_IPA
"ˈbaka"
IPA.cmn_pron.to_IPA(text, IPA_tone=True)
Generates Mandarin IPA from Pinyin.
Implements the |m= parameter for template {{zh-pron}}.
Parameters:
- text (string) – String of |m= parameter parsed in {{zh-pron}} from Wiktionary.
- IPA_tone (bool) – Whether to add IPA tones to the result.
Returns: Converted Mandarin IPA.
Return type: string
Notes
- Partially modified from the Wiktionary cmn-pron Lua module.
Examples
>>> cmn_text = "pīnyīn"  # zh: [[拼音]]
>>> cmn_IPA = cmn_pron.to_IPA(cmn_text)
>>> cmn_IPA
"pʰin⁵⁵ in⁵⁵"
IPA <-> X-SAMPA Symbol Set
# X-SAMPA symbols
data = {
# not in official X-SAMPA; from http://www.kneequickie.com/kq/Z-SAMPA
"b\\": {
"IPA_symbol": "ⱱ",
},
"b_<": {
"IPA_symbol": "ɓ",
},
"d`": {
"IPA_symbol": "ɖ",
"has_descender": True,
},
"d_<": {
"IPA_symbol": "ɗ",
},
# not in official X-SAMPA; Wikipedia-specific
"d`_<": {
"IPA_symbol": "ᶑ",
"has_descender": True,
},
"g": {
"IPA_symbol": "ɡ",
"has_descender": True,
},
"g_<": {
"IPA_symbol": "ɠ",
"has_descender": True,
},
"h\\": {
"IPA_symbol": "ɦ",
},
"j\\": {
"IPA_symbol": "ʝ",
"has_descender": True,
},
"l`": {
"IPA_symbol": "ɭ",
"has_descender": True,
},
"l\\": {
"IPA_symbol": "ɺ",
},
"n`": {
"IPA_symbol": "ɳ",
"has_descender": True,
},
"p\\": {
"IPA_symbol": "ɸ",
"has_descender": True,
},
"r`": {
"IPA_symbol": "ɽ",
"has_descender": True,
},
"r\\": {
"IPA_symbol": "ɹ",
},
"r\\`": {
"IPA_symbol": "ɻ",
"has_descender": True,
},
"s`": {
"IPA_symbol": "ʂ",
"has_descender": True,
},
"s\\": {
"IPA_symbol": "ɕ",
},
"t`": {
"IPA_symbol": "ʈ",
},
"v\\": {
"IPA_symbol": "ʋ",
},
"x\\": {
"IPA_symbol": "ɧ",
"has_descender": True,
},
"z`": {
"IPA_symbol": "ʐ",
"has_descender": True,
},
"z\\": {
"IPA_symbol": "ʑ",
},
"A": {
"IPA_symbol": "ɑ",
},
"B": {
"IPA_symbol": "β",
"has_descender": True,
},
"B\\": {
"IPA_symbol": "ʙ",
},
"C": {
"IPA_symbol": "ç",
"has_descender": True,
},
"D": {
"IPA_symbol": "ð",
},
"E": {
"IPA_symbol": "ɛ",
},
"F": {
"IPA_symbol": "ɱ",
"has_descender": True,
},
"G": {
"IPA_symbol": "ɣ",
"has_descender": True,
},
"G\\": {
"IPA_symbol": "ɢ",
},
"G\\_<": {
"IPA_symbol": "ʛ",
},
"H": {
"IPA_symbol": "ɥ",
"has_descender": True,
},
"H\\": {
"IPA_symbol": "ʜ",
},
"I": {
"IPA_symbol": "ɪ",
},
"I\\": {
"IPA_symbol": "ɪ̈",
},
"J": {
"IPA_symbol": "ɲ",
"has_descender": True,
},
"J\\": {
"IPA_symbol": "ɟ",
},
"J\\_<": {
"IPA_symbol": "ʄ",
"has_descender": True,
},
"K": {
"IPA_symbol": "ɬ",
},
"K\\": {
"IPA_symbol": "ɮ",
"has_descender": True,
},
"L": {
"IPA_symbol": "ʎ",
},
"L\\": {
"IPA_symbol": "ʟ",
},
"M": {
"IPA_symbol": "ɯ",
},
"M\\": {
"IPA_symbol": "ɰ",
"has_descender": True,
},
"N": {
"IPA_symbol": "ŋ",
"has_descender": True,
},
"N\\": {
"IPA_symbol": "ɴ",
},
"O": {
"IPA_symbol": "ɔ",
},
"O\\": {
"IPA_symbol": "ʘ",
},
"P": {
"IPA_symbol": "ʋ",
},
"Q": {
"IPA_symbol": "ɒ",
},
"R": {
"IPA_symbol": "ʁ",
},
"R\\": {
"IPA_symbol": "ʀ",
},
"S": {
"IPA_symbol": "ʃ",
"has_descender": True,
},
"T": {
"IPA_symbol": "θ",
},
"U": {
"IPA_symbol": "ʊ",
},
"U\\": {
"IPA_symbol": "ʊ̈",
},
"V": {
"IPA_symbol": "ʌ",
},
"W": {
"IPA_symbol": "ʍ",
},
"X": {
"IPA_symbol": "χ",
"has_descender": True,
},
"X\\": {
"IPA_symbol": "ħ",
},
"Y": {
"IPA_symbol": "ʏ",
},
"Z": {
"IPA_symbol": "ʒ",
"has_descender": True,
},
"\"": {
"IPA_symbol": "ˈ",
},
"%": {
"IPA_symbol": "ˌ",
},
# not in official X-SAMPA; from http://www.kneequickie.com/kq/Z-SAMPA
"%\\": {
"IPA_symbol": "ᴙ",
},
"'": {
"IPA_symbol": "ʲ",
"is_diacritic": True,
},
":": {
"IPA_symbol": "ː",
"is_diacritic": True,
},
":\\": {
"IPA_symbol": "ˑ",
"is_diacritic": True,
},
"@": {
"IPA_symbol": "ə",
},
"@`": {
"IPA_symbol": "ɚ",
},
"@\\": {
"IPA_symbol": "ɘ",
},
"{": {
"IPA_symbol": "æ",
},
"}": {
"IPA_symbol": "ʉ",
},
"1": {
"IPA_symbol": "ɨ",
},
"2": {
"IPA_symbol": "ø",
},
"3": {
"IPA_symbol": "ɜ",
},
"3`": {
"IPA_symbol": "ɝ",
},
"3\\": {
"IPA_symbol": "ɞ",
},
"4": {
"IPA_symbol": "ɾ",
},
"5": {
"IPA_symbol": "ɫ",
},
"6": {
"IPA_symbol": "ɐ",
},
"7": {
"IPA_symbol": "ɤ",
},
"8": {
"IPA_symbol": "ɵ",
},
"9": {
"IPA_symbol": "œ",
},
"&": {
"IPA_symbol": "ɶ",
},
"?": {
"IPA_symbol": "ʔ",
},
"?\\": {
"IPA_symbol": "ʕ",
},
"<\\": {
"IPA_symbol": "ʢ",
},
">\\": {
"IPA_symbol": "ʡ",
},
"^": {
"IPA_symbol": "ꜛ",
},
"!": {
"IPA_symbol": "ꜜ",
},
# not in official X-SAMPA
"!!": {
"IPA_symbol": "‼",
},
"!\\": {
"IPA_symbol": "ǃ",
},
"|\\": {
"IPA_symbol": "ǀ",
"has_descender": True,
},
"||": {
"IPA_symbol": "‖",
"has_descender": True,
},
"|\\|\\": {
"IPA_symbol": "ǁ",
"has_descender": True,
},
"=\\": {
"IPA_symbol": "ǂ",
"has_descender": True,
},
# linking mark, liaison
"-\\": {
"IPA_symbol": "‿",
"is_diacritic": True,
},
# coarticulated; not in official X-SAMPA
"__": {
"IPA_symbol": u"\u0361",
},
# fortis, strong articulation; not in official X-SAMPA
"_:": {
"IPA_symbol": u"\u0348",
},
"_\"": {
"IPA_symbol": u"\u0308",
"is_diacritic": True,
},
# advanced
"_+": {
"IPA_symbol": u"\u031F",
"with_descender": "˖",
"is_diacritic": True,
},
# retracted
"_-": {
"IPA_symbol": u"\u0320",
"with_descender": "˗",
"is_diacritic": True,
},
# rising tone
"_/": {
"IPA_symbol": u"\u030C",
"is_diacritic": True,
},
# voiceless
"_0": {
"IPA_symbol": u"\u0325",
"with_descender": u"\u030A",
"is_diacritic": True,
},
# syllabic
"=": {
"IPA_symbol": u"\u0329",
"with_descender": u"\u030D",
"is_diacritic": True,
},
# syllabic (both are OK according to https://en.wikipedia.org/wiki/X-SAMPA)
"_=": {
"IPA_symbol": u"\u0329",
"with_descender": u"\u030D",
"is_diacritic": True,
},
# strident: not in official X-SAMPA;
# from http://www.kneequickie.com/kq/Z-SAMPA
"_%\\": {
"IPA_symbol": u"\u1DFD",
},
# ejective
"_>": {
"IPA_symbol": "ʼ",
"is_diacritic": True,
},
# pharyngealized
"_?\\": {
"IPA_symbol": "ˤ",
"is_diacritic": True,
},
# falling tone
"_\\": {
"IPA_symbol": u"\u0302",
"is_diacritic": True,
},
# non-syllabic
"_^": {
"IPA_symbol": u"\u032F",
"with_descender": u"\u0311",
"is_diacritic": True,
},
# no audible release
"_}": {
"IPA_symbol": u"\u031A",
"is_diacritic": True,
},
# r-coloring (colouring), rhotacization
"`": {
"IPA_symbol": u"\u02DE",
"is_diacritic": True,
},
# nasalization
"~": {
"IPA_symbol": u"\u0303",
"is_diacritic": True,
},
# advanced tongue root
"_A": {
"IPA_symbol": u"\u0318",
"is_diacritic": True,
},
# apical
"_a": {
"IPA_symbol": u"\u033A",
"is_diacritic": True,
},
# extra-low tone
"_B": {
"IPA_symbol": u"\u030F",
"is_diacritic": True,
},
# low rising tone
"_B_L": {
"IPA_symbol": u"\u1DC5",
"is_diacritic": True,
},
# less rounded
"_c": {
"IPA_symbol": u"\u031C",
"is_diacritic": True,
},
# dental
"_d": {
"IPA_symbol": u"\u032A",
"is_diacritic": True,
},
# velarized or pharyngealized (dark)
"_e": {
"IPA_symbol": u"\u0334",
"is_diacritic": True,
},
# downstep
"<F>": {
"IPA_symbol": "↘",
},
# falling tone
"_F": {
"IPA_symbol": u"\u0302",
"is_diacritic": True,
},
# velarized
"_G": {
"IPA_symbol": "ˠ",
"is_diacritic": True,
},
# high tone
"_H": {
"IPA_symbol": u"\u0301",
"is_diacritic": True,
},
# high rising tone
"_H_T": {
"IPA_symbol": u"\u1DC4",
"is_diacritic": True,
},
# aspiration
"_h": {
"IPA_symbol": "ʰ",
"is_diacritic": True,
},
# palatalization
"_j": {
"IPA_symbol": "ʲ",
"is_diacritic": True,
},
# creaky voice, laryngealization, vocal fry
"_k": {
"IPA_symbol": u"\u0330",
"is_diacritic": True,
},
# low tone
"_L": {
"IPA_symbol": u"\u0300",
"is_diacritic": True,
},
# lateral release
"_l": {
"IPA_symbol": "ˡ",
"is_diacritic": True,
},
# mid tone
"_M": {
"IPA_symbol": u"\u0304",
"is_diacritic": True,
},
# laminal
"_m": {
"IPA_symbol": u"\u033B",
"is_diacritic": True,
},
# linguolabial
"_N": {
"IPA_symbol": u"\u033C",
"is_diacritic": True,
},
# nasal release
"_n": {
"IPA_symbol": "ⁿ",
"is_diacritic": True,
},
# more rounded
"_O": {
"IPA_symbol": u"\u0339",
"is_diacritic": True,
},
# lowered
"_o": {
"IPA_symbol": u"\u031E",
"with_descender": "˕",
"is_diacritic": True,
},
# retracted tongue root
"_q": {
"IPA_symbol": u"\u0319",
"is_diacritic": True,
},
# global rise
"<R>": {
"IPA_symbol": "↗",
},
# rising tone
"_R": {
"IPA_symbol": u"\u030C",
"is_diacritic": True,
},
# rising falling tone
"_R_F": {
"IPA_symbol": u"\u1DC8",
"is_diacritic": True,
},
# raised
"_r": {
"IPA_symbol": u"\u031D",
"is_diacritic": True,
},
# extra-high tone
"_T": {
"IPA_symbol": u"\u030B",
"is_diacritic": True,
},
# breathy voice, murmured voice, murmur, whispery voice
"_t": {
"IPA_symbol": u"\u0324",
"is_diacritic": True,
},
# voiced
"_v": {
"IPA_symbol": u"\u032C",
"is_diacritic": True,
},
# labialized
"_w": {
"IPA_symbol": "ʷ",
"is_diacritic": True,
},
# extra-short
"_X": {
"IPA_symbol": u"\u0306",
"is_diacritic": True,
},
# mid-centralized
"_x": {
"IPA_symbol": u"\u033D",
"is_diacritic": True,
},
"__T": {
"IPA_symbol": "˥",
},
"__H": {
"IPA_symbol": "˦",
},
"__M": {
"IPA_symbol": "˧",
},
"__L": {
"IPA_symbol": "˨",
},
"__B": {
"IPA_symbol": "˩",
},
# not X-SAMPA; for convenience
# dotted circle
"0": {
"IPA_symbol": "◌",
},
}
# symbols that are the same in IPA and X-SAMPA
identical = "acehklmnorstuvwxz"
for char in identical:
    data[char] = {"IPA_symbol": char}
identical_with_descender = "jpqy"
for char in identical_with_descender:
    data[char] = {"IPA_symbol": char, "has_descender": True}
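Because the table above is keyed by X-SAMPA symbol, converting from IPA needs the inverse mapping, and some IPA symbols have more than one X-SAMPA spelling (for example ' and _j for palatalization). The following is a minimal sketch of that inversion over a small excerpt of the table, with an explicit preference for the duplicated symbol. `invert`, `convert`, and `PREFERRED` are hypothetical names for this example, and the real IPA_to_XSAMPA may work differently.

```python
# A small excerpt of the table above; the full `data` dict has the same shape.
data = {
    "S": {"IPA_symbol": "ʃ"},
    "Z": {"IPA_symbol": "ʒ"},
    "I": {"IPA_symbol": "ɪ"},
    "__": {"IPA_symbol": "\u0361"},
    "'": {"IPA_symbol": "ʲ", "is_diacritic": True},
    "_j": {"IPA_symbol": "ʲ", "is_diacritic": True},
}

# Preferred spelling where several X-SAMPA symbols share one IPA symbol
# (the notes for IPA_to_XSAMPA prefer _j over ').
PREFERRED = {"ʲ": "_j"}


def invert(table, preferred):
    """Build an IPA -> X-SAMPA mapping; `preferred` settles duplicates."""
    inverse = {}
    for xsampa, info in table.items():
        inverse.setdefault(info["IPA_symbol"], xsampa)
    inverse.update(preferred)
    return inverse


def convert(text, mapping):
    """Convert character by character; unmapped characters pass through."""
    return "".join(mapping.get(char, char) for char in text)


TIE = "\u0361"  # combining double inverted breve (tie bar)
mapping = invert(data, PREFERRED)
result = convert("/t" + TIE + "ʃeɪnd" + TIE + "ʒ/", mapping)
print(result)  # /t__SeInd__Z/
```

A character-by-character lookup only covers IPA symbols that are a single code point; entries such as "I\\" map to a base letter plus a combining mark and would need longest-match handling.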
Authors
- Yifan Xiong – https://github.com/abuccts