unihan-db
unihan-db provides SQLAlchemy database models for UNIHAN. It is part of the cihai project and is powered by unihan-etl. See also: libUnihan.
By default, unihan-db creates a SQLite database in an XDG data directory. You can specify a custom database destination by passing a database URL to get_session.
Example usage:
#!/usr/bin/env python
# -*- coding: utf8 -*-
from __future__ import unicode_literals
import pprint
from sqlalchemy.sql.expression import func
from unihan_db import bootstrap
from unihan_db.tables import Unhn
session = bootstrap.get_session()
bootstrap.bootstrap_unihan(session)
random_row = session.query(Unhn).order_by(
    func.random()
).limit(1).first()
pp = pprint.PrettyPrinter(indent=0)
pp.pprint(random_row.to_dict())
Run:
$ ./examples/01_bootstrap.py
Output:
{'char': '鎷',
'kCantonese': [{'char_id': '鎷', 'definition': 'maa5', 'id': 24035}],
'kDefinition': [],
'kHanYu': [{'char_id': '鎷',
'id': 24014,
'locations': [{'character': 5,
'generic_indice_id': 24014,
'generic_reading_id': None,
'id': 42170,
'page': 4237,
'virtual': 0,
'volume': 6}],
'type': 'kHanYu'}],
'kHanyuPinyin': [{'char_id': '鎷',
'id': 18090,
'locations': [{'character': 5,
'generic_indice_id': None,
'generic_reading_id': 18090,
'id': 42169,
'page': 4237,
'virtual': 0,
'volume': 6}],
'readings': [{'generic_reading_id': 18090,
'id': 26695,
'reading': 'mǎ'}],
'type': 'kHanyuPinyin'}],
'kMandarin': [{'char_id': '鎷', 'hans': 'mǎ', 'hant': 'mǎ', 'id': 23486}],
'ucn': 'U+93B7'}
API
unihan_db.bootstrap.add_to_dict(b)
Add to_dict() method to SQLAlchemy Base object.
Parameters
b (declarative_base()) – SQLAlchemy Base class
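As a rough illustration of the idea, a helper can attach a to_dict() method to a base class so that every subclass inherits it. The sketch below uses a plain Python class rather than a SQLAlchemy declarative base, and attach_to_dict, Base and Character are hypothetical names; the library's own implementation may differ.

```python
def attach_to_dict(base):
    """Attach a to_dict() method to ``base`` and return it (hypothetical helper)."""

    def to_dict(self):
        # Export public instance attributes only; a real model would walk
        # its mapped columns and relationships instead.
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    base.to_dict = to_dict
    return base


class Base:
    """Stand-in for a SQLAlchemy declarative base (assumption)."""


class Character(attach_to_dict(Base)):
    """Hypothetical model; it inherits to_dict() from the patched base."""

    def __init__(self, char, ucn):
        self.char = char
        self.ucn = ucn


row = Character(char="鎷", ucn="U+93B7")
print(row.to_dict())
```

Patching the base once keeps the export logic in one place instead of repeating it on every model class.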
unihan_db.bootstrap.bootstrap_unihan(session, options={})
Download, extract and import unihan to database.
unihan_db.bootstrap.get_session(engine_url='sqlite:///{user_data_dir}/unihan_db.db')
Return new SQLAlchemy session object from engine string.
engine_url accepts a string template variable for {user_data_dir}, which is replaced with the XDG data directory for the user running the script process. This variable is only useful for SQLite, where file paths are used for the engine_url.
Parameters
engine_url (str) – SQLAlchemy engine string
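To make the {user_data_dir} template concrete, here is a minimal sketch of how such a placeholder could be expanded before the URL reaches SQLAlchemy. expand_engine_url and the directory path are hypothetical and only mimic the documented behaviour with str.format; the library may resolve the directory differently.

```python
DEFAULT_URL = "sqlite:///{user_data_dir}/unihan_db.db"


def expand_engine_url(engine_url, user_data_dir):
    """Fill in the {user_data_dir} placeholder, if present (hypothetical helper)."""
    return engine_url.format(user_data_dir=user_data_dir)


# Hypothetical data directory; the real one depends on the platform and user.
data_dir = "/home/me/.local/share/unihan_db"

sqlite_url = expand_engine_url(DEFAULT_URL, data_dir)

# A server URL has no placeholder, so it passes through unchanged.
server_url = expand_engine_url("postgresql://user@localhost/unihan", data_dir)

print(sqlite_url)
print(server_url)

# With unihan-db installed, a custom destination would then be passed as:
#     session = bootstrap.get_session(engine_url="sqlite:////tmp/unihan_db.db")
```

Note the four slashes in an absolute SQLite path: three belong to the URL scheme and the fourth starts the file path.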
unihan_db.bootstrap.is_bootstrapped(metadata)
Return True if cihai is correctly bootstrapped.
unihan_db.bootstrap.setup_logger(logger=None, level='INFO')
Set up logging for CLI use.
Parameters
logger (logging.Logger) – instance of logger
level (str) – logging level, e.g. 'INFO'
unihan_db.bootstrap.to_dict(obj, found=None)
Return dictionary of an SQLAlchemy Query result.
Supports recursive relationships.
Parameters
obj (sqlalchemy.orm.query.Query result object) – SQLAlchemy Query result
found (set) – recursive parameters
Returns
dictionary representation of a SQLAlchemy query
Return type
dict
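The role of a found set in a recursive export can be shown with plain objects. The sketch below is an assumption about the general technique (tracking visited objects so that back-references do not recurse forever), not the library's implementation; to_plain_dict and the SimpleNamespace records are hypothetical stand-ins for ORM rows.

```python
from types import SimpleNamespace


def to_plain_dict(obj, found=None):
    """Convert a record and its related records to nested dicts.

    ``found`` collects the ids of records already visited, so a
    relationship pointing back to a parent is skipped instead of
    being followed again.
    """
    if found is None:
        found = set()
    found.add(id(obj))

    result = {}
    for key, value in vars(obj).items():
        if isinstance(value, list):
            # Assumes lists hold related records, as relationships would.
            result[key] = [
                to_plain_dict(item, found)
                for item in value
                if id(item) not in found
            ]
        elif isinstance(value, SimpleNamespace):
            if id(value) not in found:
                result[key] = to_plain_dict(value, found)
        else:
            result[key] = value
    return result


# A parent record and a child that refers back to it (a cycle).
char = SimpleNamespace(char="鎷", ucn="U+93B7")
reading = SimpleNamespace(reading="mǎ", character=char)
char.readings = [reading]

print(to_plain_dict(char))
```

Without the found set, following character from the reading would lead back to char and recurse without end.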
unihan_db table design
Tables are split into general categories, similar to how UNIHAN db’s files are:
Unhn_DictionaryIndices
Unhn_DictionaryLikeData
Unhn_IRGSources
Unhn_NumericValues
Unhn_OtherMappings
Unhn_RadicalStrokeCounts
Unhn_Readings
Unhn_Variants
Tables are prefixed Unhn_, with no vowels.
Those root tables include the base data for all 90 UNIHAN fields. Specialized values are branched off into field-specialized tables through polymorphic joins.
class unihan_db.tables.GenericRadicalStrokes(**kwargs)
char_id
id
radical
simplified
strokes
type
class unihan_db.tables.Unhn(**kwargs)
char
kCCCII
kCantonese
kCheungBauer
kCheungBauerIndex
kCihaiT
kDaeJaweon
kDefinition
kFenn
kFennIndex
kGSR
kHDZRadBreak
kHanYu
kHanyuPinlu
kHanyuPinyin
kIICore
kIRGDaeJaweon
kIRGHanyuDaZidian
kIRGKangXi
kIRG_GSource
kIRG_HSource
kIRG_JSource
kIRG_KPSource
kIRG_KSource
kIRG_MSource
kIRG_TSource
kIRG_USource
kIRG_VSource
kMandarin
kRSAdobe_Japan1_6
kRSJapanese
kRSKanWa
kRSKangXi
kRSKorean
kRSUnicode
kSBGY
kTotalStrokes
kXHC1983
ucn
class unihan_db.tables.UnhnLocation(**kwargs)
character
generic_indice_id
generic_reading_id
id
page
virtual
volume
class unihan_db.tables.UnhnLocationkXHC1983(**kwargs)
character
entry
generic_indice_id
generic_reading_id
id
page
substituted
class unihan_db.tables.kCheungBauer(**kwargs)
cangjie
char_id
id
locations
radical
readings
strokes
type
History
Move from Pipfile to Poetry (https://github.com/cihai/unihan-db/pull/261)
Speed up importing initial data
Support for more fields
Support for appdirs (XDG directory specification)
Zero-config sqlite default
Bump unihan-etl to 0.9.5
Add project_urls to setup.py
Use collections import that's compatible with Python 2 and 3
Loosen version constraints
unihan-db 0.1.0 (2017-05-29)
Initial commit