CZ URN:NBN API

Python API for the Czech URN:NBN resolver (documentation).

What is URN:NBN

URN:NBN is a system for registering and assigning special codes to electronic publications (a code may look like this: urn:nbn:cz:edep-00000s). A code can then be resolved (translated) to metadata about the publication and/or to a pointer to the actual digital instance of the publication (= a file).

The system works like the magnet links used by popular torrent programs and websites. Once you have the URN, you can query independent resolvers (torrent trackers), which then point you to libraries (users) that store a copy of the document (file).
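To make the shape of such a code concrete, here is a minimal sketch that splits a URN:NBN string into its parts. It assumes the layout urn:nbn:<country>:<registrar>-<document>, which is inferred only from the example code above; the function and the UrnParts container are hypothetical and are not part of cz-urnnbn-api.

```python
from collections import namedtuple

# Hypothetical container; the field names mirror attributes of the URN_NBN
# structure documented later, but this is not the package's class.
UrnParts = namedtuple("UrnParts", "country_code registrar_code document_code")


def split_urn_nbn(urn):
    """Split 'urn:nbn:<country>:<registrar>-<document>' into its parts."""
    tokens = urn.strip().split(":")
    if len(tokens) != 4 or [t.lower() for t in tokens[:2]] != ["urn", "nbn"]:
        raise ValueError("Not a URN:NBN string: %r" % urn)

    country_code, rest = tokens[2], tokens[3]
    # Assumed layout: registrar code and document code are joined by a hyphen.
    registrar_code, separator, document_code = rest.partition("-")
    if not separator or not registrar_code or not document_code:
        raise ValueError("Missing registrar or document code: %r" % urn)

    return UrnParts(country_code, registrar_code, document_code)


parts = split_urn_nbn("urn:nbn:cz:edep-00000s")
print(parts.registrar_code)  # edep
```

For real lookups the resolver remains the authority; api.get_urn_nbn_info() returns the same kind of information as reported by the server.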

cz-urnnbn-api

cz-urnnbn-api is a Python package for working with the Czech URN:NBN resolver. It allows you to register documents, add new digital instances and resolve URN:NBN strings back to the URLs of the systems where the document is stored.

Warning:
The package is not 100% complete, both because of the complex nature of the API and because it was created for the E-deposit project, which doesn’t require 100% of the functionality. The package is open source, and pull requests are welcome.

Package structure

The package itself is split into multiple files.

File relations

Import relations of files:

_images/relations.png

Class relations

Relations of the classes:

_images/class_relations.png

API

cz_urnnbn_api package:

api submodule

Functions for interaction with URN:NBN API.

cz_urnnbn_api.api.is_valid_reg_code(reg_code='edep')

Check whether reg_code is a valid registrar code.

Parameters:reg_code (str) – Producer’s registration code.
Returns:True, if the reg_code is valid.
Return type:bool
cz_urnnbn_api.api.iter_registrars()

Iterate over all registrars.

Yields:obj – Registrar instance with basic information.
cz_urnnbn_api.api.get_registrar_info(reg_code)

Get detailed information about the registrar with reg_code.

Parameters:reg_code (str) – Code identifier of the registrar.
Returns:Registrar instance with all information.
Return type:obj
cz_urnnbn_api.api.register_document_obj(xml_composer, reg_code='edep')

Register a document in BY_RESOLVER mode, i.e. let the resolver assign the URN:NBN code.

Parameters:
  • xml_composer (obj) – Instance of the MonographComposer.
  • reg_code (str, default settings.REG_CODE) – Registrar’s code.
Returns:

Instance of URN_NBN which contains the assigned URN:NBN code.

Return type:

obj

cz_urnnbn_api.api.register_document(xml, reg_code='edep')

Register a document in BY_RESOLVER mode, i.e. let the resolver assign the URN:NBN code.

Parameters:
  • xml (str) – XML, which will be used for registration. See xml_composer for details.
  • reg_code (str, default settings.REG_CODE) – Registrar’s code.
Returns:

Instance of URN_NBN which contains the assigned URN:NBN code.

Return type:

obj

cz_urnnbn_api.api.register_digital_instance_obj(urn_nbn, digital_instance)

Register the digital_instance object as a new digital instance of the document for the given urn_nbn.

Parameters:
  • urn_nbn (str) – URN:NBN identifier of registered document.
  • digital_instance (obj) – DigitalInstance instance.
Returns:

DigitalInstance with more information.

Return type:

obj

cz_urnnbn_api.api.register_digital_instance(urn_nbn, url, digital_library_id, format=None, accessibility=None)

Compose and register new digital instance of document.

Parameters:
  • urn_nbn (str) – URN:NBN identifier of registered document.
  • url (str) – URL of the digital instance.
  • digital_library_id (str) – ID of the digital library.
  • format (str, def. None) – Format of the instance - pdf for example.
  • accessibility (str, def. None) – Is registration necessary to access this instance?
Returns:

DigitalInstance with more information.

Return type:

obj

cz_urnnbn_api.api.get_digital_instances(urn_nbn)

Get the list of DigitalInstance objects for the given urn_nbn.

DigitalInstances are pointers to the DigitalLibrary where the instance of the document is stored. The url property should always contain a link to the document.

Returns:DigitalInstance objects or an empty list.
Return type:list
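As a sketch of how the returned list might be consumed, the following collects the URLs of instances that are not explicitly inactive. FakeInstance is a hypothetical stand-in that models only the url and active attributes of DigitalInstance; with the package installed, the list would come from get_digital_instances(). Treating active=None as "unknown, keep it" is an assumption of this example.

```python
from collections import namedtuple

# Stand-in for DigitalInstance; only the attributes used here are modelled.
FakeInstance = namedtuple("FakeInstance", "url active")


def usable_urls(instances):
    """Return URLs of instances that are not explicitly marked inactive."""
    # `active` defaults to None in the documented structure, so None is kept.
    return [inst.url for inst in instances if inst.active is not False]


instances = [
    FakeInstance("http://example.com/a.pdf", True),
    FakeInstance("http://example.com/b.pdf", False),
    FakeInstance("http://example.com/c.pdf", None),
]
print(usable_urls(instances))
```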
cz_urnnbn_api.api.get_urn_nbn_info(urn_nbn)

For given urn_nbn string, return parsed URN_NBN object.

Parameters:urn_nbn (str) – String.
Returns:URN_NBN object with additional info in properties.
Return type:obj
cz_urnnbn_api.api.get_full_urn_nbn_record(urn_nbn)

Return full record for given urn_nbn.

Warning

This function doesn’t yet return ORM data, just a plain string.

Parameters:urn_nbn (str) – String.
Returns:XML with full information about the URN:NBN record.
Return type:str

xml_composer submodule

This module contains composers to allow creation of XML for URN:NBN, so you don’t have to create XML by hand.

class cz_urnnbn_api.xml_composer.MonographComposer(**kwargs)

Bases: kwargs_obj.kwargs_obj.KwargsObj

Composition class for Monograph publications.

title

str, required – Title of the publication.

subtitle

str – Subtitle of the publication.

ccnb

str – CCNB number.

isbn

str – ISBN string. You should validate this first.

other_id

str – Useful for UUID and so on.

document_type

str – Electronic? Scan?

digital_born

bool – Was the publication digitally born, or is it a scan?

author

str – Author of the publication.

publisher

str – Publisher’s name.

place

str – Place where the publication was published (usually city).

year

str – Year when the publication was published.

format

str, required – PDF? EPUB?

to_xml_dict()

Compose hierarchical structure from ordered dicts, which will hold the XML.

Returns:Structure from ordered dicts.
Return type:odict
to_xml()

Convert itself to XML string.

Returns:XML.
Return type:str
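The isbn attribute above is documented as something you should validate first. A minimal sketch of such a check, using the standard ISBN-10 and ISBN-13 checksum rules, could look like this. The function is hypothetical and is not provided by cz-urnnbn-api.

```python
def _all_digits(text):
    """True if every character is an ASCII digit."""
    return all(char in "0123456789" for char in text)


def is_valid_isbn(isbn):
    """Check the ISBN-10 or ISBN-13 checksum, ignoring hyphens and spaces."""
    chars = isbn.replace("-", "").replace(" ", "").upper()

    if len(chars) == 10:
        last = chars[9]
        if not _all_digits(chars[:9]) or not (_all_digits(last) or last == "X"):
            return False
        digits = [int(c) for c in chars[:9]] + [10 if last == "X" else int(last)]
        # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
        weighted = sum(w * d for w, d in zip(range(10, 0, -1), digits))
        return weighted % 11 == 0

    if len(chars) == 13:
        if not _all_digits(chars):
            return False
        # Weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
        weighted = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(chars))
        return weighted % 10 == 0

    return False


print(is_valid_isbn("978-3-16-148410-0"))  # True
```

Running the check before building a MonographComposer keeps obviously malformed identifiers out of the registration request.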
class cz_urnnbn_api.xml_composer.MultiMonoComposer(**kwargs)

Bases: cz_urnnbn_api.xml_composer.MonographComposer

Composition class for Multi monograph XMLs for URN:NBN.

title

str, required – Title of the publication.

volume_title

str, required – Title of the whole volume.

subtitle

str – Subtitle of the publication.

ccnb

str – CCNB number.

isbn

str – ISBN string. You should validate this first.

other_id

str – Useful for UUID and so on.

document_type

str – Electronic? Scan?

digital_born

bool – Was the publication digitally born, or is it a scan?

author

str – Author of the publication.

publisher

str – Publisher’s name.

place

str – Place where the publication was published (usually city).

year

str – Year when the publication was published.

format

str, required – PDF? EPUB?

to_xml_dict()

Convert itself to XML string.

Returns:XML.
Return type:str

xml_convertor submodule

This module contains convertors for converting MODS to the XML required by the URN:NBN project.

cz_urnnbn_api.xml_convertor.pick_only_text(fn)
class cz_urnnbn_api.xml_convertor.MonographPublication(mods_xml)

Bases: object

This class accepts MODS monographic data, which it can then convert to XML for URN:NBN.

get_title(*args, **kwargs)
get_subtitle(*args, **kwargs)
get_author()
Returns:Author’s name.
Return type:str
get_form()
Returns:Form of the book, e.g. electronic source.
Return type:str
get_place()
Returns:Place where the book was released.
Return type:str
get_publisher(*args, **kwargs)
get_year(*args, **kwargs)
get_identifier(*args, **kwargs)
get_ccnb(*args, **kwargs)
get_isbn(*args, **kwargs)
get_uuid(*args, **kwargs)
compose()

Convert self to nested ordered dicts, which may be serialized to XML using xmltodict module.

Returns:XML parsed to ordered dicts.
Return type:OrderedDict
add_format(file_format)

Add information about file_format to the internal XML dict.

Parameters:file_format (str) – PDF, jpeg, etc.
to_xml()

Convert itself to XML unicode string.

Returns:XML.
Return type:unicode
class cz_urnnbn_api.xml_convertor.MonographVolume(mods_xml)

Bases: cz_urnnbn_api.xml_convertor.MonographPublication

Conversion of Multi-monograph data to XML required by URN:NBN.

get_volume_title(*args, **kwargs)
compose()

Convert self to nested ordered dicts, which may be serialized to XML using xmltodict module.

Returns:XML parsed to ordered dicts.
Return type:OrderedDict
cz_urnnbn_api.xml_convertor.convert_mono_xml(mods_xml, file_format)

Convert MODS monograph record to XML, which is required by URN:NBN resolver.

Parameters:mods_xml (str) – MODS volume XML.
Returns:XML for URN:NBN resolver.
Return type:str
Raises:ValueError – If the required data (author, title) can’t be found in MODS.
cz_urnnbn_api.xml_convertor.convert_mono_volume_xml(mods_volume_xml, file_format)

Convert MODS monograph, multi-volume record to XML, which is required by URN:NBN resolver.

Parameters:mods_volume_xml (str) – MODS volume XML.
Returns:XML for URN:NBN resolver.
Return type:str
Raises:ValueError – If the required data (author, title) can’t be found in MODS.

settings submodule

This module contains all the global variables necessary for the package.

Module also has the ability to read user-defined data from two paths:

  • $HOME/_SETTINGS_PATH
  • /etc/_SETTINGS_PATH

See _SETTINGS_PATH for details.

Note

If the first path is found, the other is ignored.

Example of the configuration file ($HOME/edeposit/urnnbn.json):

{
    "EXPORT_DIR": "/somedir/somewhere"
}
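The lookup order described above (user directory first, then /etc, with the first hit winning) can be sketched as follows. This is an illustration, not the module's actual loader: load_first_config is a hypothetical helper, and the relative path is an assumption taken from the example file name above.

```python
import json
import os


def load_first_config(relative_path, roots):
    """Return the parsed JSON from the first root that has the file, else {}."""
    for root in roots:
        candidate = os.path.join(root, relative_path)
        if os.path.isfile(candidate):
            with open(candidate) as config_file:
                return json.load(config_file)
    # No configuration file found; the built-in defaults stay in effect.
    return {}


# Assumed value of the settings path, taken from the example file name above.
settings_path = os.path.join("edeposit", "urnnbn.json")
config = load_first_config(settings_path, [os.path.expanduser("~"), "/etc"])
print(sorted(config))
```

Returning an empty dict when nothing is found keeps the caller simple, because it can apply the result unconditionally.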

Attributes

cz_urnnbn_api.settings.USERNAME = ''

Username for the URN:NBN resolver website

cz_urnnbn_api.settings.PASSWORD = ''

Password for the URN:NBN resolver website

cz_urnnbn_api.settings.REG_CODE = 'edep'

Registration code (used in API)

cz_urnnbn_api.settings.URL = 'https://resolver-test.nkp.cz/api/v3/'

URL of the URN:NBN resolver

cz_urnnbn_api.settings.REG_URL = 'https://resolver-test.nkp.cz/api/v3/registrars/'
cz_urnnbn_api.settings.get_all_constants()

Get the list of all uppercase, non-private globals (those that don’t start with _).

Returns:Uppercase names defined in globals() (variables from this module).
Return type:list
cz_urnnbn_api.settings.substitute_globals(config_dict)

Set global variables to values defined in config_dict.

Parameters:config_dict (dict) – dictionary with data, which are used to set globals.

Note

config_dict has to be a dictionary, or it is ignored. Also, all variables that are not already in globals, are not of the types defined in _ALLOWED (str, int, float), or start with _ are silently ignored.
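The filtering rules in the note can be illustrated with a small sketch that works on plain dictionaries instead of module globals. filter_config and its known_names parameter are hypothetical, and the sketch assumes the type check applies to the new value; the actual module may implement the rules differently.

```python
_ALLOWED = (str, int, float)  # the types named in the note above


def filter_config(config_dict, known_names):
    """Return only the items that the documented rules would accept."""
    if not isinstance(config_dict, dict):
        return {}  # non-dictionaries are ignored entirely

    return {
        key: value
        for key, value in config_dict.items()
        if key in known_names            # must already exist as a global
        and not key.startswith("_")      # private names are skipped
        and isinstance(value, _ALLOWED)  # assumed: the new value's type counts
    }


known = {"USERNAME", "PASSWORD", "REG_CODE", "URL", "REG_URL"}
accepted = filter_config(
    {"USERNAME": "me", "REG_CODE": ["edep"], "_SECRET": "x", "OTHER": "y"},
    known,
)
print(accepted)  # {'USERNAME': 'me'}
```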

api_structures subpackage:

Catalog structure

class cz_urnnbn_api.api_structures.catalog.Catalog

Bases: cz_urnnbn_api.api_structures.catalog.Catalog

Class used for representing information about the catalogs to which the URN:NBN points.

uid

str – ID of the catalog.

name

str – Name of the catalog.

created

str – ISO 8601 date string.

url_prefix

str – ..

DigitalInstance structure

class cz_urnnbn_api.api_structures.digital_instance.DigitalInstance(url, digital_library_id, **kwargs)

Bases: kwargs_obj.kwargs_obj.KwargsObj

Container used to hold information about instances of documents in a digital library; it is a pointer to the document in the digital library.

uid

str – ID of the library.

url

str – URL of the library.

digital_library_id

str – ID of the digital library.

active

bool, def. None – Is the record active?

created

str, def. None – ISO 8601 string with date.

deactivated

str, def. None – ISO 8601 string representation of date.

format

str, def. None – Format of the book. jpg;pdf for example.

accessibility

str, def. None – Free description of accessibility.

to_xml()

Convert self to XML, which can be sent to the API to register a new digital instance.

Returns:UTF-8 encoded string with XML representation.
Return type:str
Raises:AssertionError – If url or digital_library_id is not set.
static instance_from_xmldict(dict_tag)

Create DigitalInstance from nested dicts (result of xmltodict).

Parameters:dict_tag (dict) – Nested dicts.
Returns:DigitalInstance object.
Return type:obj
static from_xml(xml)

Parse the xml string and return the DigitalInstances it contains.

Parameters:xml (str) – Unicode/utf-8 XML.
Returns:List of DigitalInstance objects.
Return type:list

DigitalLibrary structure

Structure for representing information about digital libraries.

class cz_urnnbn_api.api_structures.digital_library.DigitalLibrary(uid, name, description=None, url=None, created=None)

Bases: object

Container used to hold information about the digital library in which the document that the URN:NBN points to is stored.

url

str – URL of the library.

uid

str – ID of the library.

name

str – Name of the digital library.

created

str – ISO 8601 string.

description

str – Free text description of the library.

Modes structure

class cz_urnnbn_api.api_structures.modes.Modes(by_resolver=False, by_registrar=False, by_reservation=False)

Bases: object

Container holding information about the modes which a registrar may use to register documents.

by_resolver

bool – True if the mode can be used.

by_registrar

bool – True if the mode can be used.

by_reservation

bool – True if the mode can be used.

static from_xmldict(modes_tag)

Parse Modes information from XML.

Parameters:modes_tag (obj) – OrderedDict <modes> tag returned from xmltodict.
Returns:Modes instance.
Return type:obj

Registrar structure

class cz_urnnbn_api.api_structures.registrar.Registrar(code, uid, name=None, description=None, created=None, modified=None, modes=None)

Bases: object

Class holding information about a Registrar.

uid

str – Id of the registrar in URN:NBN system.

code

str – Code of the registrar. Each organization has its own.

name

str – Full name of the registrar.

created

str – ISO 8601 date string.

modified

str – ISO 8601 date string.

description

str – Description of the registrar.

modes

obj – Modes instance with information about allowed modes.

catalogs

list – List of Catalog instances with information about catalogs used by this registrar.

digital_libraries

list – List of DigitalLibrary instances with information about digital libraries used by the registrar.

static from_xmldict(reg_tag)

Parse basic information about registrar.

Parameters:reg_tag (obj) – OrderedDict returned from xmltodict.
Returns:Registrar instance with basic information.
Return type:obj

URN_NBN structure

class cz_urnnbn_api.api_structures.urn_nbn.URN_NBN(value, status=None, country_code=None, registrar_code=None, document_code=None, digital_document_id=None, registered=None)

Bases: str

Class used to hold the URN:NBN string and also other information returned from the server.

Note

This class subclasses str, so URN:NBN string can be obtained painlessly.

value

str – Whole URN:NBN string.

status

str – ACTIVE for example.

registered

str – ISO 8601 date string.

country_code

str – Code of the country (cz for URN:NBN).

document_code

str – Part of the URN:NBN holding the code.

registrar_code

str – Identification of registrar.

digital_document_id

str – ID of the document.

static from_xmldict(xdom, tag_picker=<lambda>)

Parse itself from xmldict structure.
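The note about subclassing str can be illustrated with a short sketch of the general technique. TaggedUrn is a hypothetical class written for this example and models only two of the documented attributes; it is not the package's URN_NBN implementation.

```python
class TaggedUrn(str):
    """Hypothetical str subclass that carries extra attributes."""

    def __new__(cls, value, status=None, registrar_code=None):
        # str is immutable, so the text itself has to be set in __new__.
        obj = super(TaggedUrn, cls).__new__(cls, value)
        obj.status = status
        obj.registrar_code = registrar_code
        return obj


urn = TaggedUrn("urn:nbn:cz:edep-00000s", status="ACTIVE", registrar_code="edep")
print(urn.startswith("urn:nbn:"))  # string methods work directly
print(urn.status)                  # extra metadata stays reachable
```

The benefit of this design is that such an object can be passed anywhere a plain string is expected, for example as the urn_nbn argument of the API functions.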

tools submodule

Useful functions used by multiple structures.

cz_urnnbn_api.api_structures.tools.both_set_and_different(first, second)

If either of the arguments is unset (= None), return False. Otherwise return the result of the inequality comparison.

Returns:True if both arguments are set and different.
Return type:bool
cz_urnnbn_api.api_structures.tools.to_list(tag)

Put tag into a list if it isn’t a list/tuple already.

Parameters:tag (obj) – Anything.
Returns:Tag.
Return type:list
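Both helpers are small enough to sketch from their descriptions. The following is a reimplementation written from the text above, not a copy of the package source. A helper like to_list is typically useful with xmltodict, which returns a single item for one child element and a list for repeated ones.

```python
def both_set_and_different(first, second):
    """True only when neither argument is None and they are not equal."""
    if first is None or second is None:
        return False
    return first != second


def to_list(tag):
    """Wrap tag in a list unless it already is a list or tuple."""
    if isinstance(tag, (list, tuple)):
        return tag
    return [tag]


print(both_set_and_different("pdf", "epub"))  # True
print(both_set_and_different(None, "epub"))   # False
print(to_list({"@id": "1"}))                  # [{'@id': '1'}]
```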

Usage example

Example

To register a new document, you can compose the XML manually:

from cz_urnnbn_api import api, xml_composer

# MonographComposer is documented under the xml_composer submodule.
urn_nbn = api.register_document_obj(
    xml_composer.MonographComposer(
        title="Title of the book",
        author="Name of the author",
        format="pdf"
    )
)

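or use MODS metadata, if you have it:

from cz_urnnbn_api import api, xml_convertor

# Read the MODS record and convert it to the XML expected by the resolver.
with open("mods_metadata.xml") as mods_file:
    mods_xml = mods_file.read()

urn_nbn = api.register_document(
    xml_convertor.convert_mono_xml(mods_xml, "pdf")
)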

Then you can add digital instances to your urn_nbn identifiers:

api.register_digital_instance(
    urn_nbn=urn_nbn,
    url="someurl - lets say kramerius",
    digital_library_id="to get this, look at get_registrar_info()",
    format="epub",
    accessibility="public"
)

For the list of allowed digital libraries, call get_registrar_info(), which returns a Registrar whose digital_libraries property holds DigitalLibrary instances.
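Picking the digital_library_id out of that list might look like the sketch below. FakeLibrary and FakeRegistrar are hypothetical stand-ins that model only the documented uid, name and digital_libraries attributes, and the sample values are invented for illustration; with the package installed, the registrar object would come from get_registrar_info().

```python
from collections import namedtuple

# Stand-ins that model only the documented attributes used below.
FakeLibrary = namedtuple("FakeLibrary", "uid name")
FakeRegistrar = namedtuple("FakeRegistrar", "code digital_libraries")


def library_id_by_name(registrar, name):
    """Return the uid of the registrar's library called `name`, or None."""
    for library in registrar.digital_libraries:
        if library.name == name:
            return library.uid
    return None


# Invented sample data, not real identifiers from the resolver.
registrar = FakeRegistrar(
    "edep",
    [FakeLibrary("37", "Kramerius"), FakeLibrary("38", "Other library")],
)
print(library_id_by_name(registrar, "Kramerius"))  # 37
```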

Installation

The module is hosted on PyPI and can be easily installed using pip:

sudo pip install cz-urnnbn-api

Source code

The project is released under the MIT license. The source code can be found on GitHub.

Unittests

Almost every feature of the project is covered by unit tests. You can run the tests using the provided run_tests.sh script, which can be found in the root of the project.

The run_tests.sh script can run unit tests (-u switch), which don’t actively work with the online API, and integration tests (-i switch), which work only with the online API. Both kinds of tests can be run using the -a switch.

If you have any trouble, add the --pdb switch at the end of your run_tests.sh command like this: ./run_tests.sh -a --pdb. This will drop you into the PDB shell.

Requirements

This script expects the packages pytest and fake-factory to be installed. In case you don’t have them yet, they can be easily installed using the following command:

pip install --user pytest fake-factory

or for all users:

sudo pip install pytest fake-factory

Example

$ ./run_tests.sh -a
============================= test session starts ==============================
platform linux2 -- Python 2.7.6 -- py-1.4.26 -- pytest-2.6.4
plugins: cov
collected 29 items

tests/integration/test_api.py ....
tests/unit/test_rest.py ....
tests/unit/test_xml_composer.py .........
tests/unit/test_xml_convertor.py ..
tests/unit/api_structures/test_digital_instance.py .....
tests/unit/api_structures/test_modes.py ...
tests/unit/api_structures/test_tools.py ..

========================== 29 passed in 0.75 seconds ===========================
