Welcome to arlib’s documentation!¶
Introduction¶
arlib: Common interface for archive manipulation (Tar, Zip, etc)
**Table of Contents**
- [Installation](#installation)
  - [Install from PyPI](#install-from-pypi)
  - [Install from Anaconda](#install-from-anaconda)
- [Simple example](#simple-example)
  - [Open archive](#open-archive)
  - [List member names](#list-member-names)
  - [Open a member](#open-a-member)
- [License](#license)
- [Documentation](#documentation)
Rationale¶
Sometimes we need to deal with different archive files. There are several packages/modules for archive file manipulation, e.g., zipfile for “.zip” files, tarfile for “.tar.gz” or “.tar.bz2” files, etc. If we want to support different archive types in our project, we probably need to do the following:
import tarfile
import zipfile

if zipfile.is_zipfile(file):
    ar = zipfile.ZipFile(file)
    f = ar.open('member-name')
    # some processing
elif tarfile.is_tarfile(file):
    ar = tarfile.open(file)
    f = ar.extractfile('member-name')
    # some processing
else:
    # other stuff
    pass
The problems with the above approach are:
- We need to repeat the above code everywhere we want to support different archive types.
- Different archive manipulation modules (e.g. zipfile and tarfile) may have different API conventions.
arlib is designed to solve the above problems. It abstracts the logic of archive manipulation and provides a single high-level interface for users.
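To make the idea concrete, the branching shown earlier can be wrapped once behind a single helper using only the standard library. This is a minimal sketch of the motivation, not arlib's implementation; the name `read_member` is hypothetical and chosen only for this illustration.

```python
import tarfile
import zipfile


def read_member(archive_path, member_name):
    """Return the bytes of one member from a zip or tar archive.

    Hypothetical helper for illustration only; it is not part of arlib.
    """
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as ar:
            with ar.open(member_name) as f:
                return f.read()
    if tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path) as ar:
            f = ar.extractfile(member_name)
            if f is None:
                # extractfile() returns None for members that are not
                # regular files (e.g. directories)
                raise ValueError(f"{member_name} is not a regular file")
            with f:
                return f.read()
    raise ValueError(f"{archive_path} is not a supported archive")
```

Callers then use one function regardless of the archive type, which is the same kind of uniformity the library aims to provide through its own interface.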
Installation¶
Install from PyPI¶
pip install arlib
Install from Anaconda¶
conda install -c liyugong arlib
Simple example¶
The abstract class arlib.Archive defines the common interface for handling different archive types, such as a tar file, a zip file or a directory. Three concrete classes, arlib.TarArchive, arlib.ZipArchive and arlib.DirArchive, implement the interface for these types respectively.
Open archive¶
The simplest way to open an archive is to use the arlib.open function:
ar = arlib.open('abc.tar.gz', 'r')
This will determine the type of the archive automatically and return a corresponding object of one of the three engine classes. If we don’t want the automatic engine determination mechanism, we can also specify the class via the engine argument, e.g.
ar = arlib.open('abc.tar.gz', 'r', engine=arlib.TarArchive)
or we can simply construct an object of the engine class directly:
ar = arlib.TarArchive('abc.tar.gz', 'r')
List member names¶
The property member_names will return a list of the names of members contained in the archive, e.g.,
print(ar.member_names)
Check member¶
Use the methods member_is_dir and member_is_file to check whether a member is a directory or a regular file:
ar = arlib.open('abc.tar', 'r')
ar.member_is_dir('member_name')
ar.member_is_file('member_name')
Open a member¶
Use the method open_member to open a member in the archive as a file object:
with ar.open_member('a.txt', 'r') as f:
    data = f.read()  # use f as an opened file object
Extract members to a location¶
Use the method extract to extract members to a specified location
ar = arlib.open('abc.tar', 'r')
ar.extract() # extract all the members to the current working directory
ar.extract(path='d:/hello', members=['abc.txt', 'dir1/'])
License¶
The arlib package is released under the MIT License
Documentation¶
User’s Guide¶
Overview¶
arlib is designed using the bridge pattern. The abstract archive manipulation functionality, e.g. opening the archive file, querying member names and opening a member, is defined in Archive, an abstract base class called the “engine”. Core functionalities are defined by the corresponding abstract methods and properties:
- Archive.member_names: Return a list of names of member files in the archive
- Archive.open_member(): Open a member as a file object
The core functionalities are implemented in derived classes, which we call concrete engines. Other functionalities may be overridden by concrete engines, but that is not required. Currently, three concrete engines are implemented in the library:
- TarArchive: Manipulates tar files
- ZipArchive: Manipulates zip files
- DirArchive: Treats a directory as an archive and the files inside as members
Since Archive is an abstract base class (ABC), which cannot be instantiated, the function open() can be used as a factory to create concrete engines. The type of the concrete engine is automatically determined from the properties of the archive file and the mode argument used to open the archive.
Automatic engine selection¶
Automatic engine selection is achieved by auto_engine(), which is called in the constructor of Archive. Users rarely need to call auto_engine() directly; calling the constructor of Archive implicitly calls auto_engine().
auto_engine() calls an ordered list of engine determination functions (EDFs) to decide the appropriate engine type. The signature of an EDF must be edf_func(path: path-like, mode: str), where path is the path to the archive and mode is the mode string used to open the archive. The function should return a concrete engine type if it can be determined, or None otherwise.
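As a minimal sketch of what an EDF following the `edf_func(path, mode)` contract might look like, the function below decides by file extension. `MyArchive` is a hypothetical stand-in for a concrete engine class and the `.myar` extension is invented for this example; a real EDF would return an actual `Archive` subclass. The sketch ignores `mode` and declines to decide for inputs that are not path-like.

```python
import os


class MyArchive:
    """Hypothetical stand-in for a concrete engine class (illustration only)."""


def myar_edf(path, mode):
    """Return MyArchive for paths ending in '.myar', otherwise None.

    Follows the edf_func(path, mode) contract: return an engine type
    if it can be determined, or None so that other EDFs can decide.
    """
    try:
        name = os.fspath(path)
    except TypeError:
        # Not path-like (e.g. an opened file object): cannot decide here
        return None
    if isinstance(name, bytes):
        name = os.fsdecode(name)
    if name.lower().endswith('.myar'):
        return MyArchive
    return None
```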
The EDF list already contains several EDFs. Users can extend the list by registering new EDFs:
arlib.register_auto_engine(func)
A priority value can also be specified:
arlib.register_auto_engine(func, priority)
The value of priority defines the ordering of the registered EDFs. The smaller the priority value, the higher the priority. EDFs with higher priority will be called before EDFs with lower priority. The default priority value is 50.
A third, bool-type argument prepend can also be specified for register_auto_engine(). When prepend is true, the EDF will be put before (i.e. at higher priority than) other registered EDFs with the same priority value. Otherwise, it will be put after them.
Since register_auto_engine() returns the input function object func, it can also be used as a non-parameterized decorator:
@arlib.register_auto_engine
def func(path, mode):
    ...  # function definition
The function register_auto_engine() also supports another calling signature, arlib.register_auto_engine(priority, prepend), which returns a wrapped decorator with arguments. The typical usage is:
@arlib.register_auto_engine(priority=50, prepend=False)
def func(path, mode):
    ...  # function definition
Obtain list of member names¶
The abstract property Archive.member_names returns a list of str, which represents the names of the members in the archive:
ar = arlib.Archive('a.zip', 'r')
members = ar.member_names
Concrete engines such as TarArchive and ZipArchive implement the property using the underlying tarfile and zipfile modules. Archive.member_names provides a uniform interface to the corresponding underlying functions.
Check member properties¶
The methods Archive.member_is_dir() and Archive.member_is_file() check whether the specified member is a directory or a regular file.
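For context, the standard-library modules expose this check through differently named methods, which is the kind of difference a uniform interface hides. The helper below is a hypothetical illustration built directly on zipfile and tarfile; it is not how arlib implements these methods.

```python
import tarfile
import zipfile


def is_dir_member(archive_path, name):
    """Hypothetical helper: True if the named member is a directory."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as ar:
            # zipfile spells the check ZipInfo.is_dir()
            return ar.getinfo(name).is_dir()
    with tarfile.open(archive_path) as ar:
        # tarfile spells the check TarInfo.isdir()
        return ar.getmember(name).isdir()
```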
Open member as a file object¶
The abstract method Archive.open_member() provides a uniform interface for opening a member file as a file object. The signature of the method is open_member(name, mode, **kwargs), where name is the name of the member file and mode is the mode argument, the same as in the built-in open() function. kwargs are keyword arguments that will be passed to the underlying methods in zipfile, tarfile, etc.
Extract members to a location¶
The method Archive.extract() provides a uniform interface for extracting members to a location. Two optional arguments can be specified: path for the location of the destination, and members for a list of members to extract.
with arlib.open('abc.tar') as ar:
    ar.extract('c:/', ['a.txt', 'dir2/'])
Context manager¶
The Archive class also defines the context manager functionality. Specifically, Archive.__enter__() returns the archive itself, and Archive.__exit__() calls self.close() and then returns True.
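The protocol described above can be sketched with a stand-in class. `DemoArchive` is hypothetical and only mirrors the described behavior; it is not arlib's code. One consequence worth noting: in Python, an `__exit__` that returns a true value suppresses any exception raised inside the `with` block, so if `__exit__` returns True as described, errors raised in the block would not propagate to the caller.

```python
class DemoArchive:
    """Hypothetical stand-in mirroring the described context-manager behavior."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        # Return the archive itself so it can be bound with "as"
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        # A true return value tells Python to swallow the exception, if any
        return True


with DemoArchive() as ar:
    raise RuntimeError("raised inside the with block")

# Execution continues here: the archive was closed and the error suppressed
print(ar.closed)
```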
Extend the library¶
The architecture of the library is flexible enough to add more archive types. Adding a new archive type includes the following steps:
- Derive a new class and implement the core functionalities
    class AnotherArchive(Archive):
        def __init__(self, path, mode, **kwargs):
            ...  # definition
        @property
        def member_names(self):
            ...  # definition
        def open_member(self, name, mode='r', **kwargs):
            ...  # definition
- (optional) Override the methods Archive.close(), Archive.__enter__(), Archive.__exit__(), etc.
- (optional) Define and register a new EDF which can automatically determine the new archive type
    @register_auto_engine
    def another_auto_engine(path, mode):
        ...  # definition
- (optional) Override the method Archive.extract(). The default implementation in Archive uses shutil.copyfileobj to copy the corresponding members to the destination. Using the corresponding archive implementation may be more efficient.
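The steps above can be sketched end to end with a toy engine. Everything here is hypothetical: `BaseArchive` stands in for `Archive` so that the snippet runs without arlib installed, and `DictArchive` keeps its members in a dict instead of reading a file. The default `extract` copies each member stream with `shutil.copyfileobj`, as described above; directory members and path sanitization are ignored for brevity.

```python
import abc
import io
import os
import shutil


class BaseArchive(abc.ABC):
    """Hypothetical stand-in for arlib.Archive (illustration only)."""

    @property
    @abc.abstractmethod
    def member_names(self):
        """Return a list of member names."""

    @abc.abstractmethod
    def open_member(self, name, mode='r', **kwargs):
        """Open a member as a file object."""

    def extract(self, path=None, members=None):
        """Default extraction: copy each member stream to a file under path."""
        dest = os.path.abspath(os.getcwd() if path is None else path)
        names = self.member_names if members is None else members
        for name in names:
            target = os.path.join(dest, name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with self.open_member(name, 'rb') as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst)


class DictArchive(BaseArchive):
    """Toy engine whose members live in a dict of name -> bytes."""

    def __init__(self, members, mode='r', **kwargs):
        # A real engine would take a path here; a mapping keeps the toy simple
        self._members = dict(members)

    @property
    def member_names(self):
        return list(self._members)

    def open_member(self, name, mode='r', **kwargs):
        return io.BytesIO(self._members[name])
```

Only the two core members are implemented by the toy engine; extraction comes for free from the base class, which is the division of labor the steps above describe.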
API Reference¶
class arlib.Archive¶
    Common interface for manipulating different types of archive files
    Parameters:
    - path (path-like, file-like) – Path of the archive to read or write
    - mode (str) – The mode to open the archive, same as in open(). Default to ‘r’.
    - engine (type) – Class object of a specific subclass of Archive which implements the logic of processing a specific type of archive. Provided implementations:
        - ZipArchive: zip file archive using the zipfile module
        - TarArchive: tar file archive using the tarfile module
        - DirArchive: directory as an archive using the pathlib module
        - None: Automatically determine engines by file properties and mode
    - kwargs – Additional keyword arguments passed to the underlying engine constructor
    Note
    The constructor of a concrete engine should take at least one positional argument path and one optional argument mode with default value 'r'.
    extract(path=None, members=None)¶
        Extract members to a location
        Parameters:
        - path (path-like) – Location of the extracted files.
        - members (Seq[str]) – Members to extract, specified by a list of names.
    member_is_dir(name)¶
        Check if a specific member is a directory
        Parameters: name (str) – Member name.
        Returns: bool: True if the member is a directory, False otherwise.
    member_is_file(name)¶
        Check if a specific member is a regular file
        Parameters: name (str) – Member name.
        Returns: bool: True if the member is a regular file, False otherwise.
    member_names¶
        Get list of names of the members (i.e. files contained in the archive)
        Returns: list of member names
        Return type: list[str]
class arlib.DirArchive(path, mode='r')¶
    Archive engine that treats a directory as an archive using the pathlib module
    extract(path=None, members=None)¶
        Extract members to a location
        Parameters:
        - path (path-like) – Location of the extracted files.
        - members (Seq[str]) – Members to extract, specified by a list of names.
    member_names¶
        Get list of names of the members (i.e. files contained in the archive)
        Returns: list of member names
        Return type: list[str]
class arlib.TarArchive(path, mode='r', **kwargs)¶
    Archive engine for tar files using the tarfile module
    extract(path=None, members=None)¶
        Extract members to a location
        Parameters:
        - path (path-like) – Location of the extracted files.
        - members (Seq[str]) – Members to extract, specified by a list of names.
    member_names¶
        Get list of names of the members (i.e. files contained in the archive)
        Returns: list of member names
        Return type: list[str]
class arlib.ZipArchive(path, *args, **kwargs)¶
    Archive engine for zip files using the zipfile module
    extract(path=None, members=None)¶
        Extract members to a location
        Parameters:
        - path (path-like) – Location of the extracted files.
        - members (Seq[str]) – Members to extract, specified by a list of names.
    member_names¶
        Get list of names of the members (i.e. files contained in the archive)
        Returns: list of member names
        Return type: list[str]
arlib.assert_is_archive(path, mode)¶
    Assert that path can be opened as a valid archive with mode
    Parameters:
    - path (file-like, path-like) – Opened file object or path to the archive file.
    - mode (str) – Mode str to open the file. Default to “r”.
    Examples
    >>> assert_is_archive('a.tar.gz', 'w')
    >>> assert_is_archive('a.txt', 'w')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: a.txt cannot be opened as a valid archive with w
arlib.auto_engine(path, mode='r')¶
    Automatically determine the engine type from file properties and file mode using the registered determining functions
    Parameters:
    - path (file-like, path-like) – Opened file object or path to the archive file
    - mode (str) – Mode str to open the file. Default to “r”.
    Returns: a subclass of Archive if an engine is successfully found, otherwise None
    Return type: type, NoneType
arlib.is_archive(path, mode='r')¶
    Determine if the file specified by path is a valid archive when opened with mode
    Basically, the function checks the result of auto_engine(), and returns True if the result is not None, and returns False otherwise.
    Parameters:
    - path (file-like, path-like) – Opened file object or path to the archive file.
    - mode (str) – Mode str to open the file. Default to “r”.
    Returns: True if the path is a valid archive, False otherwise.
    Examples
    >>> is_archive('a.tar.gz', 'w')
    True
    >>> is_archive('a.tar.bz2', 'w')
    True
    >>> is_archive('a.txt', 'w')
    False
arlib.open(path, mode='r', engine=None, *args, **kwargs)¶
    Open an archive file
    Parameters:
    - path (path-like, file-like) – Path of the archive to read or write
    - mode (str) – The mode to open the archive, same as in open(). Default to ‘r’.
    - engine (type) – Class object of a specific subclass of Archive which implements the logic of processing a specific type of archive. Provided implementations:
        - ZipArchive: zip file archive using the zipfile module
        - TarArchive: tar file archive using the tarfile module
        - DirArchive: directory as an archive using the pathlib module
        - None: Automatically determine engines by file properties and mode
    - kwargs – Additional keyword arguments passed to the underlying engine constructor
arlib.register_auto_engine(func, priority=50, prepend=False)¶
    Register an automatic engine determining function
    Two possible signatures:
    - register_auto_engine(func, priority=50, prepend=False)
    - register_auto_engine(priority=50, prepend=False)
    The first one can be used as a regular function as well as a decorator. The second one is a decorator with arguments.
    Parameters:
    - func (callable) – A callable which determines the archive engine from file properties and open mode. The signature should be func(path, mode), where path is a file-like or path-like object and mode is the mode str to open the file.
    - priority (int, float) – Priority of the func; a smaller number means higher priority. When multiple functions are registered by multiple calls of register_auto_engine, the functions will be used in an order determined by their priorities. Default to 50.
    - prepend (bool) – If there is already a function registered with the same priority, insert to the left (before) or right (after) of it. Default to False.
    Returns: The first version of the signature returns the input callable func, therefore it can be used as a decorator (without arguments). The second version returns a decorator wrapper.
ChangeLog¶
0.0.4¶
- Add arlib.open() as a shortcut of the Archive constructor (Issue #1, PR #2).
- Add is_archive() to determine if a file could be opened as a valid archive (Issue #3, PR #4).
- Add assert_is_archive() (PR #5).
- Reimplement auto_engine mechanism using decoutils package
- Add functionality to check whether a member is a directory or regular file (PR #9).
- Add functionality to extract members (PR #10).
0.0.3¶
- Support tar, zip files and folder
- Automatic archive type deduction
- Support member names listing
- Support opening members as file streams