Welcome to Pacifica Archive Interface’s documentation!

The Pacifica Archive Interface service provides a REST API to a hierarchical storage management (HSM) system. It gives Pacifica the ability to store files for the long term in a scalable, low-cost storage solution.

Installation

The Pacifica software is available through PyPI, so the instructions below install it into a virtual environment. Please keep in mind compatibility with the Pacifica Core services.

Installation in Virtual Environment

These installation instructions are intended to work on Windows, Linux, and Mac platforms. Please keep that in mind when following the instructions.

Please install the appropriate tested version of Python for maximum chance of success.

Linux and Mac Installation

mkdir ~/.virtualenvs
python -m virtualenv ~/.virtualenvs/pacifica
. ~/.virtualenvs/pacifica/bin/activate
pip install pacifica-archiveinterface

Windows Installation

This is done using PowerShell. Please do not use the Command Prompt (cmd.exe).

mkdir "$Env:LOCALAPPDATA\virtualenvs"
python.exe -m virtualenv "$Env:LOCALAPPDATA\virtualenvs\pacifica"
& "$Env:LOCALAPPDATA\virtualenvs\pacifica\Scripts\activate.ps1"
pip install pacifica-archiveinterface

Configuration

The Archive Interface service is configured through two files. The REST API uses CherryPy, and a review of the CherryPy configuration documentation is recommended. The service configuration file is an INI-formatted file containing the settings for accessing the HSM backend.

CherryPy Configuration File

An example of Archive Interface server CherryPy configuration:

[global]
log.screen: True
log.access_file: 'access.log'
log.error_file: 'error.log'
server.socket_host: '0.0.0.0'
server.socket_port: 8080

[/]
request.dispatch: cherrypy.dispatch.MethodDispatcher()
tools.response_headers.on: True
tools.response_headers.headers: [('Content-Type', 'application/json')]

Service Configuration File

The service configuration is an INI file and an example is as follows:

[posix]
; use id2filename method when accessing a posix endpoint
use_id2filename = false

[hpss]
; IBM HPSS HSM settings
auth = /var/hpss/etc/hpss.unix.keytab
sitename = example.com
user = hpss.unix

[hsm_sideband]
; Oracle HSM Sideband database settings (MySQL)
sam_qfs_prefix = /tmp/path
schema = schema_name
host = host
user = user
password = pass
port = 3306
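
As a rough illustration of how a file like this can be read, the sketch below parses a shortened copy of the example with Python's standard configparser module. It is not the service's own loader (that is pacifica.archiveinterface.config.get_config()); the SAMPLE text and the variable names are made up for the demonstration.

```python
import configparser

# Shortened copy of the example service configuration (illustration only).
SAMPLE = """
[posix]
; use id2filename method when accessing a posix endpoint
use_id2filename = false

[hsm_sideband]
sam_qfs_prefix = /tmp/path
port = 3306
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Typed getters convert the raw strings stored in the INI file.
use_id2filename = parser.getboolean('posix', 'use_id2filename')
port = parser.getint('hsm_sideband', 'port')
print(use_id2filename, port)
```

Lines starting with ; are treated as comments by configparser, which matches the style used in the example file.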

ID Mapping to File Names

The Pacifica software depends on a flat ID space for indexing files. These IDs need to map to filenames on the backend storage in a predictable way. To limit the number of files in a single directory (or the number of directories in a directory) we use the algorithm in archiveinterface.id2filename. This takes a number and breaks it into bytes. Each byte is then represented in hex and used to build the directory tree.

For example, id2filename(12345) becomes /39/3039 on the backend file system: 12345 is 3039 in hex, so the low-order byte 39 names the directory and the full hex value 3039 names the file.
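
The mapping can be sketched in a few lines of Python. This is a hypothetical re-creation that reproduces the documented example, not the code in archiveinterface.id2filename; in particular, how the real function treats IDs that fit in a single byte is not covered here, and the name sketch_id2filename is invented for the illustration.

```python
def sketch_id2filename(fileid):
    """Hypothetical re-creation of the ID-to-path mapping (not the library code)."""
    hex_id = '{0:x}'.format(int(fileid))
    remaining = hex_id
    directories = ''
    # Peel off the low-order byte (two hex digits) until at most one byte is left.
    while len(remaining) > 2:
        directories = '{0}/{1}'.format(directories, remaining[-2:])
        remaining = remaining[:-2]
    # Assumption: the file itself is named with the full hex ID.
    return '{0}/{1}'.format(directories, hex_id)


print(sketch_id2filename(12345))  # prints /39/3039
```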

Running It

There are two ways of running the archive interface: with the CherryPy built-in server or with uWSGI.

CherryPy does provide a server and this works for some workloads.

pacifica-archiveinterface -t posix -p 8080 -a 127.0.0.1 --prefix /path

uWSGI is recommended for this service as it performs a lot better for higher throughput workloads.

export PAI_BACKEND_TYPE=posix
export PAI_PREFIX=/path
uwsgi --http-socket :8080 --master -p 8 --module pacifica.archiveinterface.wsgi

Post Deployment Testing

Inside the post_deployment_tests directory there is a file called deployment_test.py. This file runs a series of tests against a deployed archive interface. The tests are ordered so that they post, stage, status, and get files properly. There are a few global variables at the top of the file that need to be adjusted for each deployment.

Variables

export ARCHIVE_URL='http://127.0.0.1:8080/'
export LARGE_FILE_SIZE=$(( 1024 * 1024 * 1024))
export MANY_FILES_TEST_COUNT=1000
  • ARCHIVE_URL is the URL to the newly deployed archive interface
  • LARGE_FILE_SIZE is the size of the large file to test with (default 1 GiB)
  • MANY_FILES_TEST_COUNT is the number of small files to spam (default 1000)
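
A minimal sketch of how a test script might pick these settings up from the environment is shown below. It is not taken from deployment_test.py: the function name load_test_settings is hypothetical, and using the local example URL as the fallback for ARCHIVE_URL is an assumption (only the other two defaults are documented above).

```python
import os


def load_test_settings(environ=os.environ):
    """Read the three deployment-test settings, falling back to defaults."""
    return {
        # Assumption: fall back to the local URL used in the example above.
        'archive_url': environ.get('ARCHIVE_URL', 'http://127.0.0.1:8080/'),
        # Documented default: 1 GiB.
        'large_file_size': int(environ.get('LARGE_FILE_SIZE', 1024 * 1024 * 1024)),
        # Documented default: 1000 small files.
        'many_files_test_count': int(environ.get('MANY_FILES_TEST_COUNT', 1000)),
    }


# Passing a plain dictionary makes the behaviour easy to try without exporting anything.
settings = load_test_settings({'MANY_FILES_TEST_COUNT': '10'})
print(settings['many_files_test_count'], settings['large_file_size'])
```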

Running

pytest -v post_deployment_tests/deployment_test.py

The output is the status of each test run against the archive interface.

Example Usage

Verify working

To verify the system is working do a GET against the system with no id specified.

curl -X GET http://127.0.0.1:8080

Sample output:

{
    "message": "Pacifica Archive Interface Up and Running"
}

Put a File

The path in the URL should be only an integer specifying a unique file in the archive. Sending a different file to the same URL will overwrite the contents of the previous file. Setting the Last-Modified header sets the mtime of the file in the archive and is required.

curl -X PUT -H 'Last-Modified: Sun, 06 Nov 1994 08:49:37 GMT' --upload-file /tmp/foo.txt http://127.0.0.1:8080/12345

Sample output:

{
    "message": "File added to archive",
    "total_bytes": "24"
}
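
The same request can be prepared from Python with only the standard library. The sketch below builds the PUT request without sending it, mainly to show how a Unix timestamp becomes a valid Last-Modified value; the helper name build_put_request is hypothetical and the URL is the local example address used above.

```python
from email.utils import formatdate
from urllib.request import Request


def build_put_request(base_url, file_id, data, mtime):
    """Build (but do not send) a PUT request for the given integer file ID."""
    return Request(
        '{0}/{1}'.format(base_url.rstrip('/'), int(file_id)),
        data=data,
        method='PUT',
        # HTTP dates are always expressed in GMT.
        headers={'Last-Modified': formatdate(mtime, usegmt=True)},
    )


request = build_put_request('http://127.0.0.1:8080', 12345, b'Document Contents', 784111777)
print(request.get_method(), request.full_url)
```

Passing the request object to urllib.request.urlopen() would send it to a running archive interface.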

Get a File

The HTTP GET method is used to get the contents of the specified file.

curl -o /tmp/foo.txt http://127.0.0.1:8080/12345

Sample output (without -o option): “Document Contents”

Status a File

The HTTP HEAD method is used to get the status of the file. The status includes, but is not limited to, the size, mtime, ctime, and whether it is on disk or tape. The values are returned in the response headers.

curl -I -X HEAD http://127.0.0.1:8080/12345

Sample output:

HTTP/1.0 204 No Content
Date: Fri, 07 Oct 2016 19:51:37 GMT
Server: WSGIServer/0.1 Python/2.7.5
X-Pacifica-Messsage: File was found
X-Pacifica-File: /tmp/12345
Content-Length: 18
Last-Modified: 1473806059.29
X-Pacifica-Ctime: 1473806059.29
X-Pacifica-Bytes-Per-Level: (18L,)
X-Pacifica-File-Storage-Media: disk
Content-Type: application/json
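
Because the status arrives as headers, a client has to convert the strings itself. The sketch below does this for the headers shown in the sample output; the function name parse_status_headers and the keys of the returned dictionary are illustrative choices, not part of the service.

```python
def parse_status_headers(headers):
    """Turn the status headers of a HEAD response into a plain dictionary."""
    return {
        'filesize': int(headers['Content-Length']),
        # In the sample output both times are Unix timestamps, not HTTP dates.
        'mtime': float(headers['Last-Modified']),
        'ctime': float(headers['X-Pacifica-Ctime']),
        'media': headers['X-Pacifica-File-Storage-Media'],
        'filepath': headers['X-Pacifica-File'],
    }


# Header values copied from the sample output above.
sample_headers = {
    'Content-Length': '18',
    'Last-Modified': '1473806059.29',
    'X-Pacifica-Ctime': '1473806059.29',
    'X-Pacifica-File-Storage-Media': 'disk',
    'X-Pacifica-File': '/tmp/12345',
}
status = parse_status_headers(sample_headers)
print(status['media'], status['filesize'])
```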

Stage a File

The HTTP POST method is used to stage a file for use. For the posix backend this is a no-op; for hpss it stages the file to disk.

curl -X POST -d '' http://127.0.0.1:8080/12345

Sample Output:

{
    "file": "/12345",
    "message": "File was staged"
}

Move a File

The HTTP PATCH method is used to move a file. The request body contains the path to the current file in the archive. The ID at the end of the URL is where the file will be moved to.

curl -X PATCH -H 'Content-Type: application/json' http://127.0.0.1:8080/123456 -d'{
  "path": "/tmp/12345"
}'

Sample Output:

{
    "message": "File Moved Successfully"
}
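
For clients written in Python, the JSON body of the move request can be produced with the json module. The sketch below only constructs the request; build_move_request is a hypothetical helper and the URL is the local example address.

```python
import json
from urllib.request import Request


def build_move_request(base_url, new_id, old_path):
    """Build (but do not send) a PATCH request that moves old_path to new_id."""
    body = json.dumps({'path': old_path}).encode('utf-8')
    return Request(
        '{0}/{1}'.format(base_url.rstrip('/'), int(new_id)),
        data=body,
        method='PATCH',
        headers={'Content-Type': 'application/json'},
    )


move_request = build_move_request('http://127.0.0.1:8080', 123456, '/tmp/12345')
print(move_request.get_method(), move_request.full_url)
```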

Extending Supported Backends

Create a backend directory

Under pacifica/archiveinterface/backends add a directory for the new backend type

Create Classes that Implement the Abstract Backend Class methods

Abstract backend classes can be found under pacifica/archiveinterface/backends/abstract. Descriptions of all the methods that need to be implemented are in the comments above the class.

Update Backend Factory

Update the archive backend factory found in pacifica/archiveinterface/backends/factory.py. This file contains a load_backend_archive() method whose logic needs to be extended to support the new backend type. This also entails importing the appropriate modules for the new backend.

Update Interface Server

Update the main() method to support the new backend choice. It is located in pacifica/archiveinterface/__main__.py. In this file the type argument is defined with its supported types; extend that list to include the new backend type.

Archive Interface Python Module

Backends Python Module

Abstract Python Module

Module for the Abstract Classes.

Abstract Backend Module.

Module that has the Abstract class for Archive Backends. Any new backends need to inherit from this class and implement its methods. If the methods are not implemented in the child, the child object will not be able to be instantiated.
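
The instantiation rule is the standard behaviour of Python abstract base classes. The sketch below shows it with a small stand-in class; SketchAbstractBackend and its two subclasses are hypothetical and carry only two of the methods that AbstractBackendArchive declares.

```python
import abc


class SketchAbstractBackend(abc.ABC):
    """Hypothetical stand-in with two of the abstract methods."""

    @abc.abstractmethod
    def open(self, filepath, mode):
        """Open a file and return a file-like object."""

    @abc.abstractmethod
    def close(self):
        """Close the open file."""


class IncompleteBackend(SketchAbstractBackend):
    """Implements open() but forgets close()."""

    def open(self, filepath, mode):
        return self


class CompleteBackend(IncompleteBackend):
    """Adds the missing method, so instances can be created."""

    def close(self):
        return None


try:
    IncompleteBackend()
    instantiated = True
except TypeError:
    # Python refuses to create objects that still have abstract methods.
    instantiated = False

print(instantiated)
backend = CompleteBackend()
```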

class pacifica.archiveinterface.backends.abstract.archive.AbstractBackendArchive(prefix)[source]

Abstract Base Class for Archive Backends.

__init__(prefix)[source]

Constructor to build backend archive.

close()[source]

Close File.

Method that closes an open file for the backend archive that implements this class.

open(filepath, mode)[source]

Open File.

Method that opens a file for the backend archive that implements this class. Should return a file-like object, most likely self. This method is also responsible for making sure the dirname of the filepath exists before trying to open.

patch(file_id, old_path)[source]

Move a file.

read(blocksize)[source]

Read File.

Method that reads an open file for the backend archive that implements this class and returns the contents.

remove()[source]

Remove a file.

set_file_permissions()[source]

Set permissions for File.

Method that sets a file's permissions for the backend archive that implements this class.

set_mod_time(mod_time)[source]

Set Modification Time for File.

Method that sets a file's mod time for the backend archive that implements this class.

stage()[source]

Stage File.

Method that stages a file for the backend that implements this class. Stage moves a file to an appropriate location to be downloaded.

status()[source]

Return status of the file.

Method that gets the status of a file in the archive. Needs to return an implemented object of the abstract_status_class. The abstract_status_class should be implemented for each backend type.

write(buf)[source]

Write File.

Method that writes an open file for the backend archive that implements this class.

Module that has the abstract class for a file’s status.

class pacifica.archiveinterface.backends.abstract.status.AbstractStatus(mtime, ctime, bytes_per_level, filesize)[source]

Abstract Base Class for Status.

Child backend objects need to implement the following methods to allow file status to function.

__init__(mtime, ctime, bytes_per_level, filesize)[source]

Constructor for AbstractClass.

Implemented versions of this class need to set all the attributes defined in this abstract class.

bytes_per_level = None
ctime = None
define_levels()[source]

Defined list of levels.

Method that defines the storage levels in the archive backend. For example, a backend archive with disk, tape, and error levels returns ["disk", "tape", "error"].

defined_levels = None
file_storage_media = None
filepath = None
filesize = None
find_file_storage_media()[source]

Find File Storage Media.

Method that finds the media the file in the archive backend is stored on. Usually disk or tape.
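
One plausible way to relate bytes_per_level to the defined levels is sketched below. This is an assumption made for illustration, not the logic of any shipped backend: it reports the first level that holds every byte of the file and falls back to the last level otherwise. The function name sketch_find_media is hypothetical.

```python
def sketch_find_media(bytes_per_level, filesize, levels=('disk', 'tape', 'error')):
    """Hypothetical rule: the first level holding the whole file wins."""
    for level, stored in zip(levels, bytes_per_level):
        if stored == filesize:
            return level
    # Assumption: anything else is reported as the last (error) level.
    return levels[-1]


# The sample HEAD output reports 18 bytes in a single level for an 18 byte file.
print(sketch_find_media((18,), 18))
print(sketch_find_media((0, 18), 18))
```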

mtime = None
set_filepath(filepath)[source]

Set File Path.

Method that sets the filepath class attribute. Used to return the correct status of a file.

HPSS Python Module

HPSS Backend Module.

Module that implements the Abstract backend archive for an hpss backend.

class pacifica.archiveinterface.backends.hpss.archive.HpssBackendArchive(prefix)[source]

The HPSS implementation of the backend archive.

__init__(prefix)[source]

Constructor for the HPSS Backend Archive.

static _check_rcode(rcode, msg)[source]

Check if rcode is < 0 and raise error.

authenticate()[source]

Authenticate the user with the hpss system.

close()[source]

Close an HPSS File.

open(filepath, mode)[source]

Open an hpss file.

patch(file_id, old_path)[source]

Move a hpss file.

read(blocksize)[source]

Read a file from the hpss archive.

remove()[source]

Remove the file for an HPSS file.

set_file_permissions()[source]

Set the file permissions for an hpss archive file.

set_mod_time(mod_time)[source]

Set the mod time for an hpss archive file.

stage()[source]

Stage an hpss file to the top level drive.

status()[source]

Get the status of a file in the hpss archive.

write(buf)[source]

Write a file to the hpss archive.

pacifica.archiveinterface.backends.hpss.archive.path_info_munge(filepath)[source]

Munge the path for this filetype.

HPSS Extended File Module.

Module that holds the class to the interface for the hpss c extensions.

class pacifica.archiveinterface.backends.hpss.extended.HpssExtended(filepath)[source]

Provide the interface for the hpss ctypes.

__init__(filepath)[source]

Constructor for the HPSS Extended File type.

makedirs()[source]

Recursively make the directories for the filepath.

parse_latency(latency_tuple)[source]

Parse the latency tuple.

Parse the tuple returned by the c extension into the correct latency.

ping_core(sitename)[source]

Ping the Core server to see if it is still active.

set_mod_time(mod_time)[source]

Use extensions to set the mod time on an hpss file.

stage()[source]

Stage an hpss file.

Do this to move the file to disk. It does not need to return anything, but it will throw an exception on error.

status()[source]

Get the status of a file.

Reports whether the file is on tape or disk. The documentation for this is in the HPSS Programmer's Reference, section 2.3.6.2.8, “Get Extended Attributes”.

HPSS Status Module.

Module that implements the Abstract Status class for the hpss archive backend type.

class pacifica.archiveinterface.backends.hpss.status.HpssStatus(mtime, ctime, bytes_per_level, filesize)[source]

HPSS Status Class.

Class for handling hpss status pieces. Needs mtime, ctime, and a bytes per level array.

__init__(mtime, ctime, bytes_per_level, filesize)[source]

HPSS Status Constructor.

_disk = 'disk'
_error = 'error'
_tape = 'tape'
define_levels()[source]

Set up what each level definition means.

find_file_storage_media()[source]

Set if file is on disk or tape.

set_filepath(filepath)[source]

Set the filepath that the status is for.

Oracle HSM Sideband Python Module

Oracle HSM Side Band Database Module.

HSM Sideband Backend Archive Module.

Module that implements the abstract_backend_archive class for a HSM Sideband backend.

class pacifica.archiveinterface.backends.oracle_hsm_sideband.archive.HsmSidebandBackendArchive(prefix)[source]

HSM Sideband Backend Archive Class.

Class that implements the abstract base class for the hsm sideband archive interface backend.

__init__(prefix)[source]

Constructor for HSM Sideband Backend Archive.

close()[source]

Close a HSM Sideband file.

open(filepath, mode)[source]

Open a hsm sideband file.

patch(file_id, old_path)[source]

Move a hsm file.

read(blocksize)[source]

Read a HSM Sideband file.

remove()[source]

Remove the file for a posix file.

set_file_permissions()[source]

Set the file permissions for a posix file.

set_mod_time(mod_time)[source]

Set the mod time on a HSM file.

stage()[source]

Stage a HSM Sideband file.

status()[source]

Get the status of a HSM Sideband file.

write(buf)[source]

Write a HSM Sideband file to the archive.

pacifica.archiveinterface.backends.oracle_hsm_sideband.archive.path_info_munge(filepath)[source]

Munge the path for this filetype.

Module that allows for the extension of the hsm sideband archive.

pacifica.archiveinterface.backends.oracle_hsm_sideband.extended_file_factory.extended_hsmsideband_factory(filepath, mode, sam_qfs_path)[source]

Return appropriate binary io object with additional methods.

ORM for the sideband database.

class pacifica.archiveinterface.backends.oracle_hsm_sideband.orm.BaseModel(*args, **kwargs)[source]

Base class models inherit from.

Has Connection pieces.

DoesNotExist

alias of BaseModelDoesNotExist

_meta = <peewee.Metadata object>
_schema = <peewee.SchemaManager object>
classmethod database_close()[source]

Close the database connection.

classmethod database_connect()[source]

Make sure database is connected.

Do not reopen an existing connection.

id = <AutoField: BaseModel.id>
reload()[source]

Reload my current state from the DB.

class pacifica.archiveinterface.backends.oracle_hsm_sideband.orm.SamArchive(*args, **kwargs)[source]

Model for sam_archive table in the sideband database.

DoesNotExist

alias of SamArchiveDoesNotExist

_meta = <peewee.Metadata object>
_schema = <peewee.SchemaManager object>
copy = <IntegerField: SamArchive.copy>
create_time = <IntegerField: SamArchive.create_time>
gen = <IntegerField: SamArchive.gen>
ino = <IntegerField: SamArchive.ino>
media_type = <CharField: SamArchive.media_type>
offset = <IntegerField: SamArchive.offset>
position = <BigIntegerField: SamArchive.position>
seq = <IntegerField: SamArchive.seq>
size = <BigIntegerField: SamArchive.size>
stale = <IntegerField: SamArchive.stale>
vsn = <CharField: SamArchive.vsn>
class pacifica.archiveinterface.backends.oracle_hsm_sideband.orm.SamFile(*args, **kwargs)[source]

Model for sam_file table in the sideband database.

DoesNotExist

alias of SamFileDoesNotExist

_meta = <peewee.Metadata object>
_schema = <peewee.SchemaManager object>
gen = <IntegerField: SamFile.gen>
ino = <IntegerField: SamFile.ino>
name = <CharField: SamFile.name>
name_hash = <IntegerField: SamFile.name_hash>
p_gen = <IntegerField: SamFile.p_gen>
p_ino = <IntegerField: SamFile.p_ino>
class pacifica.archiveinterface.backends.oracle_hsm_sideband.orm.SamInode(*args, **kwargs)[source]

Model for sam_inode table in the sideband database.

DoesNotExist

alias of SamInodeDoesNotExist

_meta = <peewee.Metadata object>
_schema = <peewee.SchemaManager object>
create_time = <IntegerField: SamInode.create_time>
csum = <CharField: SamInode.csum>
gen = <IntegerField: SamInode.gen>
gid = <IntegerField: SamInode.gid>
ino = <IntegerField: SamInode.ino>
modify_time = <IntegerField: SamInode.modify_time>
online = <IntegerField: SamInode.online>
size = <BigIntegerField: SamInode.size>
type = <IntegerField: SamInode.type>
uid = <IntegerField: SamInode.uid>
class pacifica.archiveinterface.backends.oracle_hsm_sideband.orm.SamPath(*args, **kwargs)[source]

Model for sam_path table in the sideband database.

DoesNotExist

alias of SamPathDoesNotExist

_meta = <peewee.Metadata object>
_schema = <peewee.SchemaManager object>
gen = <IntegerField: SamPath.gen>
ino = <IntegerField: SamPath.ino>
path = <CharField: SamPath.path>
class pacifica.archiveinterface.backends.oracle_hsm_sideband.orm.SamVersion(*args, **kwargs)[source]

Model for sam_version table in the sideband database.

DoesNotExist

alias of SamVersionDoesNotExist

_meta = <peewee.Metadata object>
_schema = <peewee.SchemaManager object>
id = <IntegerField: SamVersion.id>
version = <FloatField: SamVersion.version>

Module that implements the Abstract Status class.

For the oracle hsm sideband archive backend type.

class pacifica.archiveinterface.backends.oracle_hsm_sideband.status.HsmSidebandStatus(mtime, ctime, bytes_per_level, filesize)[source]

Class for handling hsmSideband status pieces.

Needs mtime, ctime, and a bytes per level array.

__init__(mtime, ctime, bytes_per_level, filesize)[source]

Constructor to build the object.

_disk = 'disk'
_tape = 'tape'
define_levels()[source]

Set up what each level definition means.

find_file_storage_media()[source]

Get the file storage media.

Should always be disk for hsmSideband.

set_filepath(filepath)[source]

Set the filepath that the status is for.

Posix Python Module

Posix Module with ExtendedFile object.

Posix Backend Archive Module.

Module that implements the abstract_backend_archive class for a posix backend.

class pacifica.archiveinterface.backends.posix.archive.PosixBackendArchive(prefix)[source]

Posix Backend Archive Class.

Class that implements the abstract base class for the posix archive interface backend.

__init__(prefix)[source]

Constructor for Posix Backend Archive.

close()[source]

Close a posix file.

open(filepath, mode)[source]

Open a posix file.

patch(file_id, old_path)[source]

Move a posix file.

read(blocksize)[source]

Read a posix file.

remove()[source]

Remove a posix file.

set_file_permissions()[source]

Set the file permissions for a posix file.

set_mod_time(mod_time)[source]

Set the mod time on a posix file.

stage()[source]

Stage a posix file (essentially a no-op).

status()[source]

Get the status of a posix file.

write(buf)[source]

Write a posix file to the archive.

Extended File Object Module.

Module that Extends the functionality of the base file object

to import:

>>> from pacifica.archiveinterface.backends.posix.extendedfile import extended_file_factory
>>> extended_file_factory(filepath, mode)
pacifica.archiveinterface.backends.posix.extendedfile.extended_file_factory(filepath, mode)[source]

Return appropriate binary io object with additional methods.

Posix Status Module.

Module that implements the Abstract Status class for the posix archive backend type.

class pacifica.archiveinterface.backends.posix.status.PosixStatus(mtime, ctime, bytes_per_level, filesize)[source]

Posix Status Class.

Class for handling posix status pieces. Needs mtime, ctime, and a bytes per level array.

__init__(mtime, ctime, bytes_per_level, filesize)[source]

Constructor for posix status class.

_disk = 'disk'
define_levels()[source]

Set up what each level definition means.

find_file_storage_media()[source]

Get the file storage media. Should always be disk for posix.

set_filepath(filepath)[source]

Set the filepath that the status is for.

Factory Python Module

Factory for returning an Archive backend.

New Backends must be added to the __share_classes list and that class needs to be imported in

Call the factory like the following:

FACTORY = ArchiveBackendFactory()
BACKEND = FACTORY.get_backend_archive(type, prefix)

class pacifica.archiveinterface.backends.factory.ArchiveBackendFactory[source]

Factory Class for Archive Backends.

get_backend_archive(name, prefix)[source]

Method for creating an instance of the backend archive.

load_backend_archive(name)[source]

Method for loading in the correct backend type.

Only want to load backend type being used.

share_classes = {}
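
The idea of loading only the backend that is actually requested can be sketched with importlib. The class below is a hypothetical illustration, not ArchiveBackendFactory itself: its registry points at io.BytesIO as a stand-in so the example runs without Pacifica installed, and the names SketchBackendFactory, known_backends, load_backend, and get_backend are invented.

```python
import importlib


class SketchBackendFactory:
    """Hypothetical factory that imports a backend class only on first use."""

    # Stand-in registry: a real one would point at backend archive classes.
    known_backends = {'memory': ('io', 'BytesIO')}

    def __init__(self):
        # Cache of classes that have already been imported.
        self.share_classes = {}

    def load_backend(self, name):
        """Import the class for name unless it is already cached."""
        if name not in self.share_classes:
            module_name, class_name = self.known_backends[name]
            module = importlib.import_module(module_name)
            self.share_classes[name] = getattr(module, class_name)
        return self.share_classes[name]

    def get_backend(self, name, prefix):
        """Create an instance of the requested backend."""
        return self.load_backend(name)(prefix)


factory = SketchBackendFactory()
backend = factory.get_backend('memory', b'/path')
print(type(backend).__name__)
```

Keeping the import inside load_backend means a deployment that only uses one backend never needs the dependencies of the others.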

Backends module contains archive interface backends specifics.

Configuration Python Module

Configuration reading and validation module.

pacifica.archiveinterface.config.get_config()[source]

Return the ConfigParser object with defaults set.

Exception Python Module

Module with the Archive Interface Error class.

exception pacifica.archiveinterface.exception.ArchiveInterfaceError[source]

ArchiveInterfaceError.

Basic exception class for this module. It is used to raise exceptions up to the top level of the application.

Globals Python Module

Global configuration options expressed in environment variables.

ID to Filename Python Module

Module to Convert an integer id to a filepath for storage system.

pacifica.archiveinterface.id2filename.id2dirandfilename(fileid)[source]

Algorithm for getting filepath from an integer id.

pacifica.archiveinterface.id2filename.id2filename(fileid)[source]

Will return the filepath associated to passed fileid.

Rest Python Module

Class for the archive interface.

Allows API to file interactions for passed in archive backends.

class pacifica.archiveinterface.rest_generator.ArchiveInterfaceGenerator(archive)[source]

Archive Interface Generator.

Defines the methods that can be used on files for request types.

DELETE(filepath)[source]

Delete a file from WSGI request.

Delete the file specified in the request from the archive.

GET(*args)[source]

Get a file from WSGI request.

Gets a file specified in the request and writes back the data.

HEAD(filepath)[source]

Get the file status from WSGI request.

Gets the status of a file specified in the request.

PATCH(filepath)[source]

Move a file from the original path to the new one specified.

POST(filepath)[source]

Stage a file from WSGI request.

Stage the file specified in the request to disk.

PUT(filepath)[source]

Write a file from WSGI requests.

Writes a file passed in the request to the archive.

__init__(archive)[source]

Create an archive interface generator.

exposed = True
pacifica.archiveinterface.rest_generator.error_page_default(**kwargs)[source]

The default error page should always enforce json.

Utils Python Module

Group of utility functions.

Used in various parts of the archive interface.

pacifica.archiveinterface.archive_utils.bytes_type(unicode_obj)[source]

Convert the unicode object into bytes.

pacifica.archiveinterface.archive_utils.file_status(status, response)[source]

Response for when file is on the hpss system.

pacifica.archiveinterface.archive_utils.get_http_modified_time(env)[source]

Get the modified time from the request in unix timestamp.

Returns current time if no time was passed.
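
The conversion described here can be sketched with the standard library. This is an illustration of the behaviour, not the code of get_http_modified_time: the function name sketch_modified_time is hypothetical and it takes a plain dictionary of headers rather than the env argument of the real function.

```python
import time
from email.utils import mktime_tz, parsedate_tz


def sketch_modified_time(headers):
    """Return the Last-Modified header as a Unix timestamp, or the current time."""
    parsed = parsedate_tz(headers.get('Last-Modified', ''))
    if parsed is None:
        # No usable header was sent, so fall back to the current time.
        return time.time()
    return mktime_tz(parsed)


print(sketch_modified_time({'Last-Modified': 'Sun, 06 Nov 1994 08:49:37 GMT'}))
```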

pacifica.archiveinterface.archive_utils.un_abs_path(path_name)[source]

Remove absolute path piece.

WSGI Python Module

Pacifica Archive Interface.

This is the main program that starts the WSGI server.

The core of the server code is in archive_interface.py.

Any new backends added need to have the type argument extended to support the new backend archive type.

Pacifica ArchiveInterface module.
