edeposit.amqp.storage
Long term storage subsystem for the E-deposit project.
This project allows you to store and retrieve publications over AMQP and, optionally, to access public publications via HTTP using a built-in webserver written in bottle.py.
Package structure
File relations

API
archive_storage wrapper
This module provides a frontend API for storing DBArchive objects in, and retrieving them from, the universal object database.
storage.archive_storage.save_archive(archive)
Save archive into database and into proper indexes.
Parameters: archive (obj) – Instance of the DBArchive.
Returns: DBArchive without data.
Return type: obj
Raises:
- InvalidType – When the archive is not an instance of DBArchive.
- UnindexablePublication – When there is no index (property) which can be used to index the archive in the database.
storage.archive_storage.search_archives(query)
Return a list of DBArchive objects which match all properties that are set (not None), combining them with the AND operator.
Example:
    result = archive_storage.search_archives(
        DBArchive(isbn="azgabash")
    )
Parameters: query (obj) – DBArchive with some of the properties set.
Returns: List of matching DBArchive objects, or [] if no match was found.
Return type: list
Raises: InvalidType – When the query is not an instance of DBArchive.
publication_storage wrapper
This module provides a frontend API for storing DBPublication objects in, and retrieving them from, the universal object database.
storage.publication_storage.save_publication(pub)
Save pub into database and into proper indexes.
Parameters: pub (obj) – Instance of the DBPublication.
Returns: DBPublication without data.
Return type: obj
Raises:
- InvalidType – When the pub is not an instance of DBPublication.
- UnindexablePublication – When there is no index (property) which can be used to index the pub in the database.
storage.publication_storage.search_pubs_by_uuid(uuid)
Search publications by uuid.
Parameters: uuid (str) – UUID of publication.
Returns: List of matching DBPublication objects, or [] if no match was found.
Return type: list
storage.publication_storage.search_publications(query)
Return a list of DBPublication objects which match all properties that are set (not None), combining them with the AND operator.
Example:
    result = storage_handler.search_publications(
        DBPublication(isbn="azgabash")
    )
Parameters: query (obj) – DBPublication with some of the properties set.
Returns: List of matching DBPublication objects, or [] if no match was found.
Return type: list
Raises: InvalidType – When the query is not an instance of DBPublication.
tree_handler module
This module provides a database for Tree instances.
class storage.tree_handler.TreeHandler(conf_path='/home/docs/checkouts/readthedocs.org/user_builds/edeposit-amqp-storage/checkouts/stable/src/edeposit/amqp/storage/zconf/zeo_client.conf', project_key='tree_storage')
This class is used as a database handler for Tree instances.
- name_db (dict) – Database handler dict for name.
- aleph_id_db_key (str) – Key for the aleph_id_db.
- aleph_id_db (dict) – Database handler dict for aleph_id.
- issn_db (dict) – Database handler dict for issn.
- path_db (dict) – Database handler dict for path.
- parent_db (dict) – Database handler dict for parent.
Constructor.
Parameters:
- conf_path (str) – Path to the ZEO configuration file. Default ZEO_CLIENT_PATH.
- project_key (str) – Project key, which is used for lookups into ZEO. Default TREE_PROJECT_KEY.
add_tree(*args, **kwargs)
Add tree into database.
Parameters:
- tree (obj) – Tree instance.
- parent (ref, default None) – Reference to parent tree. This is used for all sub-trees in recursive call.
remove_tree_by_path(path)
Remove the tree from database by given path.
Parameters: path (str) – Path of the tree.
remove_tree(tree)
Remove the tree from database using the tree object to identify the path.
Parameters: tree (obj) – Tree instance.
trees_by_issn(*args, **kwargs)
Search trees by issn.
Parameters: issn (str) – Tree.issn property of Tree.
Returns: Set of matching Tree instances.
Return type: set
trees_by_path(*args, **kwargs)
Search trees by path.
Parameters: path (str) – Tree.path property of Tree.
Returns: Set of matching Tree instances.
Return type: set
storage.tree_handler.tree_handler(*args, **kwargs)
Singleton TreeHandler generator. Any arguments are given to TreeHandler when it is first created.
Returns: TreeHandler instance.
Return type: obj
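The singleton behaviour described for tree_handler() can be illustrated with a small stand-alone sketch. The names FakeTreeHandler and get_handler are hypothetical stand-ins, not the package's implementation; the sketch only shows the documented idea that arguments are used when the handler is first created and have no effect afterwards.

```python
class FakeTreeHandler(object):
    """Hypothetical stand-in for TreeHandler; it only records its arguments."""

    def __init__(self, conf_path="zeo_client.conf", project_key="tree_storage"):
        self.conf_path = conf_path
        self.project_key = project_key


_handler = None  # module-level cache holding the single instance


def get_handler(*args, **kwargs):
    """Return the shared handler, creating it on the first call only."""
    global _handler
    if _handler is None:
        _handler = FakeTreeHandler(*args, **kwargs)
    return _handler


first = get_handler(project_key="custom_key")
second = get_handler(project_key="ignored")  # arguments have no effect now
```

Both names refer to the same object, so the second call keeps the project key given to the first one.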
storage_handler module
This module handles the database, maintains indexes and provides a search function over these indexes.
exception storage.storage_handler.InvalidType
Raised when the object you are trying to store doesn't have the required interface.
exception storage.storage_handler.UnindexableObject
Raised when the object doesn't have at least one attribute set.
class storage.storage_handler.StorageHandler(project_key, conf_path='/home/docs/checkouts/readthedocs.org/user_builds/edeposit-amqp-storage/checkouts/stable/src/edeposit/amqp/storage/zconf/zeo_client.conf')
Object database with indexing by the object attributes.
Each stored object is required to have the following properties:
- indexes (list of strings)
- project_key (string)
For example:
    from persistent import Persistent

    class Person(Persistent):
        def __init__(self, name, surname):
            self.name = name
            self.surname = surname

        @property
        def indexes(self):
            # names of the attributes used for indexing
            return [
                "name",
                "surname",
            ]

        @property
        def project_key(self):
            # PROJECT_KEY is expected to be defined by your project
            return PROJECT_KEY
Note
Using properties is recommended, because that way the values are not stored in the database, but constructed on request by the property methods.
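The note above can be checked with plain Python. This is a minimal sketch that uses an ordinary object instead of Persistent, so that it runs without ZODB installed; the assumption is that only the instance state (roughly the instance dictionary) is what gets stored, which is why a property does not end up in it.

```python
class Person(object):
    """Plain-object variant of the example above (no ZODB needed)."""

    def __init__(self, name, surname):
        self.name = name
        self.surname = surname

    @property
    def indexes(self):
        # Computed on every access; never written into the instance.
        return ["name", "surname"]


person = Person("Jan", "Novak")
stored_state = vars(person)  # the per-instance attributes only
```

Here stored_state contains name and surname, while indexes is still available through the property.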
Constructor.
store_object(obj)
Save obj into database and into proper indexes.
Parameters: obj (obj) – Indexable object.
Raises:
- InvalidType – When the obj doesn't have the right properties.
- UnindexableObject – When there are no indexes defined.
search_objects(query)
Return a list of objects which match all properties that are set (not None), combining them with the AND operator.
Example:
    result = storage_handler.search_objects(
        DBPublication(isbn="azgabash")
    )
Parameters: query (obj) – Object implementing the proper interface with some of the properties set.
Returns: List of matching objects, or [] if no match was found.
Return type: list
Raises: InvalidType – When the query doesn't implement the required properties.
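The AND semantics of the search can be sketched with an in-memory stand-in. Record and ToyStore are hypothetical names invented for this illustration and do not reflect how StorageHandler is implemented on top of ZODB; the sketch only shows that properties left as None do not restrict the result, while every set property must match.

```python
class Record(object):
    """Hypothetical indexable object; unset properties stay None."""

    indexes = ["isbn", "uuid", "title"]

    def __init__(self, isbn=None, uuid=None, title=None):
        self.isbn = isbn
        self.uuid = uuid
        self.title = title


class ToyStore(object):
    """In-memory stand-in for the database: one dict per indexed property."""

    def __init__(self):
        self._indexes = {}  # property name -> {value -> set of objects}

    def store_object(self, obj):
        set_keys = [key for key in obj.indexes if getattr(obj, key) is not None]
        if not set_keys:
            raise ValueError("object has no indexable property set")
        for key in set_keys:
            by_value = self._indexes.setdefault(key, {})
            by_value.setdefault(getattr(obj, key), set()).add(obj)

    def search_objects(self, query):
        matches = None
        for key in query.indexes:
            value = getattr(query, key)
            if value is None:
                continue  # unset properties do not restrict the search
            found = self._indexes.get(key, {}).get(value, set())
            # Intersect with previous results: this is the AND operator.
            matches = found if matches is None else matches & found
        return list(matches or [])


store = ToyStore()
first = Record(isbn="azgabash", title="First")
second = Record(isbn="azgabash", title="Second")
store.store_object(first)
store.store_object(second)

by_isbn = store.search_objects(Record(isbn="azgabash"))
by_both = store.search_objects(Record(isbn="azgabash", title="Second"))
no_match = store.search_objects(Record(isbn="missing"))
```

Searching by ISBN alone returns both records, adding the title narrows the result to one, and an unknown value yields an empty list.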
web_tools submodule
Functions shared by the server script and also by the backend.
exception storage.web_tools.PrivatePublicationError
Bases: exceptions.UserWarning
Indication that publication is private.
storage.web_tools.compose_path(pub, uuid_url=False)
Compose absolute path for given pub.
Parameters:
- pub (obj) – DBPublication instance.
- uuid_url (bool, default False) – Compose URL using UUID.
Returns: Absolute url-path of the publication, without server's address and protocol.
Raises: PrivatePublicationError – When the pub is a private publication.
storage.web_tools.compose_tree_path(tree, issn=False)
Compose absolute path for given tree.
Returns: Absolute path of the tree, without server's address and protocol.
storage.web_tools.compose_full_url(pub, uuid_url=False)
Compose full url for given pub, with protocol, server's address and port.
Parameters:
- pub (obj) – DBPublication instance.
- uuid_url (bool, default False) – Compose URL using UUID.
Returns: Absolute url of the publication.
Raises: PrivatePublicationError – When the pub is a private publication.
settings submodule
This module contains all necessary global variables for the package.
The module can also read user-defined data from the following paths:
- SETTINGS_PATH env variable pointing to a .json file.
- $HOME/_SETTINGS_PATH
- /etc/_SETTINGS_PATH
See _SETTINGS_PATH for details.
Note
When the first path is found, the others are ignored.
Example of the configuration file ($HOME/edeposit/storage.json):
{
"PRIVATE_INDEX_USERNAME": "username",
"PRIVATE_INDEX_PASSWORD": "password"
}
Example of starting the program with the env variable:
export SETTINGS_PATH="/tmp/conf.json"; bin/edeposit_storage_server.py
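The lookup order described above can be sketched as follows. This is an illustration under stated assumptions, not the code of the settings module: the function names, the reduced DEFAULTS dictionary and the rule that only already-known uppercase names are overridden are all choices made for this sketch, and the environment variable name follows the SETTINGS_PATH entry in the path list.

```python
import json
import os
import tempfile

# Reduced set of defaults that a configuration file may override.
DEFAULTS = {"WEB_ADDR": "localhost", "WEB_PORT": 8080, "PRIVATE_INDEX_PASSWORD": ""}
SETTINGS_SUFFIX = "edeposit/storage.json"  # same value as _SETTINGS_PATH


def candidate_paths(environ):
    """Return the search paths in priority order, skipping an unset env var."""
    paths = [
        environ.get("SETTINGS_PATH"),
        os.path.join(environ.get("HOME", "/home"), SETTINGS_SUFFIX),
        os.path.join("/etc", SETTINGS_SUFFIX),
    ]
    return [path for path in paths if path]


def load_settings(environ):
    """Apply the first configuration file that exists; ignore the rest."""
    settings = dict(DEFAULTS)
    for path in candidate_paths(environ):
        if not os.path.exists(path):
            continue
        with open(path) as conf_file:
            user_conf = json.load(conf_file)
        for key, value in user_conf.items():
            if key.isupper() and key in settings:  # only known uppercase names
                settings[key] = value
        break  # the first path found wins
    return settings


# Usage: write a throw-away config file and point SETTINGS_PATH at it.
tmp_dir = tempfile.mkdtemp()
conf_path = os.path.join(tmp_dir, "conf.json")
with open(conf_path, "w") as conf_file:
    json.dump({"WEB_PORT": 9090, "unknown_option": 1}, conf_file)

loaded = load_settings({"SETTINGS_PATH": conf_path, "HOME": tmp_dir})
```

The port is taken from the file, the address keeps its default, and the unknown lowercase option is ignored.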
Attributes
storage.settings.ZCONF_PATH = '/home/docs/checkouts/readthedocs.org/user_builds/edeposit-amqp-storage/checkouts/stable/src/edeposit/amqp/storage/zconf'
Path to the directory with zeo.conf and zeo_client.conf.
storage.settings.ZEO_SERVER_PATH = '/home/docs/checkouts/readthedocs.org/user_builds/edeposit-amqp-storage/checkouts/stable/src/edeposit/amqp/storage/zconf/zeo.conf'
storage.settings.ZEO_CLIENT_PATH = '/home/docs/checkouts/readthedocs.org/user_builds/edeposit-amqp-storage/checkouts/stable/src/edeposit/amqp/storage/zconf/zeo_client.conf'
storage.settings.PUB_PROJECT_KEY = 'pub_storage'
This is used in ZODB. DON'T CHANGE THIS.
storage.settings.ARCH_PROJECT_KEY = 'archive_storage'
This is used in ZODB. DON'T CHANGE THIS.
storage.settings.TREE_PROJECT_KEY = 'tree_storage'
This is used in ZODB. DON'T CHANGE THIS.
storage.settings.PRIVATE_INDEX = False
Should the index be private?
storage.settings.PRIVATE_INDEX_USERNAME = 'edeposit'
Username for private index.
storage.settings.PRIVATE_INDEX_PASSWORD = ''
Password for private index. You HAVE TO set it.
storage.settings.PUBLIC_DIR = ''
Path to the directory for public publications.
storage.settings.PRIVATE_DIR = ''
Path to the private directory, for non-downloadable pubs.
storage.settings.ARCHIVE_DIR = ''
Path to the directory where the archives will be stored.
storage.settings.HNAS_INDICATOR = ''
Path to the file saved on HNAS, which is used to indicate that HNAS is mounted.
storage.settings.HNAS_IND_ALLOWED = False
Should the HNAS indicator be used or not?
storage.settings.WEB_ADDR = 'localhost'
Address where the webserver should listen.
storage.settings.WEB_PORT = 8080
Port for the webserver.
storage.settings.WEB_SERVER = 'wsgiref'
Use paste for threading.
storage.settings.WEB_DB_TIMEOUT = 30
How often the web should refresh its connection to the DB.
storage.settings.DOWNLOAD_KEY = 'download'
Used as part of the url. Don't change this later.
storage.settings.UUID_DOWNLOAD_KEY = 'UUID'
Used as part of the url. Don't change this.
storage.settings.PATH_DOWNLOAD_KEY = 'tree_by_path'
Key used for URL composition for trees.
storage.settings.ISSN_DOWNLOAD_KEY = 'tree_by_issn'
Key used for URL composition for trees.
storage.settings._SETTINGS_PATH = 'edeposit/storage.json'
Appended to default search paths.
storage.settings._ALLOWED = [<type 'str'>, <type 'unicode'>, <type 'int'>, <type 'float'>, <type 'long'>, <type 'bool'>]
Allowed types.
Structures
AMQP:
responses submodule
Structures used for AMQP responses.
class storage.structures.comm.responses.SearchResult
Response to SearchRequest.
- records (list) – List of matching Publication objects.
Create new instance of SearchResult(records,)
requests submodule
Structures used for AMQP communication requests.
class storage.structures.comm.requests.SearchRequest(query, light_request=False)
Retrieve a publication from the archive using query, an instance of Publication or Archive. Any property of the query is used to retrieve data.
- query (obj) – Instance of Publication or Archive.
- light_request (bool, default False) – If true, don't return the data. This is used when you need just the metadata info.
class storage.structures.comm.requests.SaveRequest(record)
Save record to the storage.
- record (obj) – Instance of the Publication or Archive.
Create new instance of SaveRequest(record,)
Publication structure
Communication structure used by AMQP.
class storage.structures.comm.publication.Publication(*args, **kwargs)
Bases: storage.structures.comm.publication.Publication
Communication structure used to send data to the storage subsystem over AMQP.
- title (str) – Title of the publication.
- author (str) – Name of the author.
- pub_year (str) – Year when the publication was released.
- isbn (str) – ISBN for the publication.
- uuid (str) – UUID string to pair the publication with edeposit.
- aleph_id (str) – ID used in aleph.
- producent_id (str) – ID used for producent.
- is_public (bool) – Is the file public?
- filename (str) – Original filename.
- is_periodical (bool) – Is the publication periodical?
- path (str) – Path in the tree (used for periodicals).
- b64_data (str) – Base64 encoded data of the ebook file.
- url (str) – URL in case that publication is public.
- file_pointer (str) – Pointer to the file on the file server.
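Since b64_data carries the ebook file as Base64 text, a sender has to encode the raw bytes before building the structure. The sketch below uses a hypothetical, reduced namedtuple (PublicationSketch) and hypothetical helper names instead of the real Publication class, so it runs with the standard library only.

```python
import base64
from collections import namedtuple

# Hypothetical, reduced stand-in for the Publication structure.
PublicationSketch = namedtuple(
    "PublicationSketch",
    ["title", "filename", "is_public", "b64_data"],
)


def make_publication(title, filename, raw_bytes, is_public=False):
    """Pack raw file content into a Base64 text field."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return PublicationSketch(title, filename, is_public, encoded)


def read_data(pub):
    """Decode the Base64 field back into the original bytes."""
    return base64.b64decode(pub.b64_data)


pub = make_publication("Example title", "book.epub", b"ebook bytes")
```

Decoding the field with read_data(pub) gives back the original bytes.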
Archive structure
Communication structure used by AMQP.
class storage.structures.comm.archive.Archive(*args, **kwargs)
Bases: storage.structures.comm.archive.Archive
Communication structure used to send data to the storage subsystem over AMQP.
- isbn (str) – ISBN for the archive.
- uuid (str) – UUID string to pair the archive with edeposit.
- aleph_id (str) – ID used in aleph.
- b64_data (str) – Base64 encoded data of the ebook file.
- dir_pointer (str) – Pointer to the directory on the file server.
Tree structure
Communication structure used by AMQP.
class storage.structures.comm.tree.Tree(*args, **kwargs)
Bases: storage.structures.comm.tree.Tree
Communication structure used to send data to the storage subsystem over AMQP.
- name (str) – Name of the periodical.
- sub_trees (list) – List of other trees.
- sub_publications (list) – List of sub-publication UUIDs.
- aleph_id (str) – ID used in aleph.
- issn (str) – ISSN given to the periodical.
- is_public (bool) – Is the tree public?
- path (str, default "") – Path in the periodical structures.
Constructor.
Raises: ValueError – In case that name is not set, or sub_trees or sub_publications is not a list/tuple.
indexes
Return list of property names, which may be used for indexing in DB.
Returns: List of strings.
Return type: list
collect_publications()
Recursively collect list of all publications referenced in this tree and all sub-trees.
Returns: List of UUID strings.
Return type: list
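The recursive collection can be sketched with a reduced stand-in class. TreeSketch is a hypothetical name, and the ordering of the result (own publications first, then sub-trees in order) is an assumption of this sketch rather than documented behaviour of Tree.

```python
class TreeSketch(object):
    """Hypothetical stand-in for Tree with only the fields needed here."""

    def __init__(self, name, sub_trees=None, sub_publications=None):
        if not name:
            raise ValueError("name must be set")
        self.name = name
        self.sub_trees = list(sub_trees or [])
        self.sub_publications = list(sub_publications or [])

    def collect_publications(self):
        """Return UUIDs from this tree followed by those of all sub-trees."""
        uuids = list(self.sub_publications)
        for sub_tree in self.sub_trees:
            uuids.extend(sub_tree.collect_publications())
        return uuids


issue_1 = TreeSketch("2015/1", sub_publications=["uuid-a", "uuid-b"])
issue_2 = TreeSketch("2015/2", sub_publications=["uuid-c"])
periodical = TreeSketch(
    "Periodical",
    sub_trees=[issue_1, issue_2],
    sub_publications=["uuid-root"],
)
all_uuids = periodical.collect_publications()
```

The call on the top-level tree gathers the UUIDs of both issues as well as its own.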
Database:
DBPublication structure
Structure used in ZODB (database) for storing publications.
class storage.structures.db.db_publication.DBPublication(**kwargs)
Bases: persistent.Persistent, kwargs_obj.kwargs_obj.KwargsObj
Database structure used to store basic metadata about Publications.
- title (str) – Title of the publication.
- author (str) – Name of the author.
- pub_year (str) – Year when the publication was released.
- isbn (str) – ISBN for the publication.
- uuid (str) – UUID string to pair the publication with edeposit.
- aleph_id (str) – ID used in aleph.
- producent_id (str) – ID used for producent.
- is_public (bool) – Is the file public?
- filename (str) – Original filename.
- is_periodical (bool) – Is the publication periodical?
- path (str) – Path in the tree (used for periodicals).
- file_pointer (str) – Pointer to the file on the file server.
classmethod from_comm(pub)
Convert communication namedtuple to this class.
Parameters: pub (obj) – Publication instance which will be converted.
Returns: DBPublication instance.
Return type: obj
url
indexes
Returns: List of strings, which may be used as indexes in DB.
Return type: list
project_key
to_comm(light_request=False)
Convert self to Publication.
Returns: Publication instance.
Return type: obj
DBArchive structure
Structure used in ZODB (database) for storing ZIP archives / unpacked directories.
class storage.structures.db.db_archive.DBArchive(**kwargs)
Bases: persistent.Persistent, kwargs_obj.kwargs_obj.KwargsObj
Database structure used to store basic metadata about Archives.
- isbn (str) – ISBN for the archive.
- uuid (str) – UUID string to pair the archive with edeposit.
- aleph_id (str) – ID used in aleph.
- dir_pointer (str) – Pointer to the directory on the file server.
classmethod from_comm(pub)
Convert communication namedtuple to this class.
Parameters: pub (obj) – Archive instance which will be converted.
Returns: DBArchive instance.
Return type: obj
indexes
Returns: List of strings, which may be used as indexes in DB.
Return type: list
project_key
Generators:
structures_generator script
This script is used to generate Publication, DBPublication, Archive and DBArchive structures.
Installation
Installation of this project is a little bit more complicated. Please read the installation notes:
Installation notes
The module itself can be installed using pip:
sudo pip install edeposit.amqp.storage
Configuration
After the installation, some configuration is required. Configuration is done using the settings.py script, which reads data from the configuration path ~/edeposit/storage.json.
Each uppercase attribute defined in settings can be reconfigured using the storage.json configuration file.
Required configuration options are:
Highly recommended options:
You should definitely change WEB_SERVER to paste. By default, the wsgiref backend is used, but that is only a single-threaded server. Paste allows multithreaded access of users to your server.
To change the default database paths, you will also need to update ZCONF_PATH to the path with the ZEO configuration.
Example of the configuration
/etc/edeposit/storage.json:
{
"PUBLIC_DIR": "/var/storage/public",
"PRIVATE_DIR": "/var/storage/private",
"ZCONF_PATH": "/var/storage/zconf",
"PRIVATE_INDEX": true,
"PRIVATE_INDEX_PASSWORD": "secret password",
"WEB_SERVER": "paste"
}
Example of the ZEO configuration
/var/storage/zconf/zeo_client.conf:
<zeoclient>
  server localhost:8090
</zeoclient>
/var/storage/zconf/zeo.conf:
<zeo>
  address localhost:8090
</zeo>
<filestorage>
  path /var/storage/zodb/storage.fs
</filestorage>
<eventlog>
  level INFO
  <logfile>
    path /var/storage/zodb/zeo.log
    format %(asctime)s %(message)s
  </logfile>
</eventlog>
How to run the server
There are three scripts which you have to start in order to get full functionality:
- edeposit_storage_runzeo.sh (database)
- edeposit_storage_server.py (webserver)
- edeposit_amqp_storaged.py (amqp handler)
The webserver and the AMQP handler are optional, but the database script is mandatory.
Supervisord
To run the scripts, you can use supervisord:
[program:storagedaemon]
command = /usr/bin/edeposit_amqp_storaged.py start --foreground
process_name = storagedaemon
directory = /usr/bin
priority = 10
redirect_stderr = true
user = edeposit
[program:storageweb]
command = /usr/bin/edeposit_storage_server.py
process_name = storageweb
directory = /usr/bin
priority = 10
redirect_stderr = true
user = root
[program:storagezeo]
command = /usr/bin/edeposit_storage_runzeo.sh
process_name = storagezeo
directory = /usr/bin
priority = 10
redirect_stderr = true
user = edeposit
For storageweb, the user must be root only if you wish to run the web on port 80.
AMQP protocol
Here is the list of Request -> Response pairs describing responses to AMQP communication:
- SaveRequest.Archive -> Archive
- SaveRequest.Publication -> Publication
- SaveRequest.Tree -> TreeInfo
- SearchRequest -> SearchResult
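The request/response pairing can be sketched as a small dispatcher. Everything below is a hypothetical illustration: the namedtuples are reduced stand-ins for the real structures, the dictionary replaces the database, react_to_request is an invented name, and only the Publication and SearchRequest pairs are covered. The assumption that a saved record is returned without its data follows the description of the save functions in the API section.

```python
from collections import namedtuple

# Hypothetical, reduced stand-ins for the AMQP structures.
Publication = namedtuple("Publication", ["uuid", "b64_data"])
SaveRequest = namedtuple("SaveRequest", ["record"])
SearchRequest = namedtuple("SearchRequest", ["query", "light_request"])
SearchResult = namedtuple("SearchResult", ["records"])

_saved = {}  # toy storage: uuid -> Publication


def react_to_request(request):
    """Map each request type to the response type from the list above."""
    if isinstance(request, SaveRequest):
        _saved[request.record.uuid] = request.record
        # The saved record is echoed back without its data.
        return request.record._replace(b64_data=None)
    if isinstance(request, SearchRequest):
        found = [pub for pub in _saved.values() if pub.uuid == request.query.uuid]
        if request.light_request:
            # Light requests return metadata only.
            found = [pub._replace(b64_data=None) for pub in found]
        return SearchResult(records=found)
    raise ValueError("Unknown request type: %r" % (request,))


saved = react_to_request(SaveRequest(Publication("uuid-1", "ZGF0YQ==")))
full = react_to_request(SearchRequest(Publication("uuid-1", None), False))
light = react_to_request(SearchRequest(Publication("uuid-1", None), True))
```

A light search returns the same records as a full one, only with the data field emptied.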
Source code
The project is released under the MIT license. The source code can be found at GitHub:
Unittests
Almost every feature of the project is tested by unittests. You can run those tests using the provided run_tests.sh script, which can be found in the root of the project.
If you have any trouble, just add the --pdb switch at the end of your run_tests.sh command like this: ./run_tests.sh --pdb. This will drop you into a PDB shell.
Requirements
This script expects that the packages pytest, fake-factory and sh are installed. In case you don't have them yet, they can be easily installed using the following command:
pip install --user pytest fake-factory sh
or for all users:
sudo pip install pytest fake-factory sh
Example
./run_tests.sh
============================= test session starts ==============================
platform linux2 -- Python 2.7.6, pytest-2.8.2, py-1.4.30, pluggy-0.3.1
rootdir: /home/bystrousak/Plocha/Dropbox/c0d3z/prace/edeposit.amqp.storage, inifile:
plugins: cov-1.8.1
collected 35 items
tests/test_amqp_chain.py .....
tests/test_publication_storage.py ........
tests/test_tree_handler.py ........
tests/structures/test_db_archive.py ...
tests/structures/test_db_publication.py ....
tests/structures/test_publication.py .
tests/structures/test_requests.py .
tests/structures/test_tree.py .....
========================== 35 passed in 11.02 seconds ==========================