Welcome to arlib’s documentation!

Introduction

arlib: Common interface for archive manipulation (Tar, Zip, etc)



Rationale

Sometimes, we need to deal with different archive files. There are several packages/modules for archive file manipulation, e.g., zipfile for “.zip” files, tarfile for “.tar.gz” or “.tar.bz2” files, etc. If we want to support different archive types in our project, we probably need to do the following:

import tarfile
import zipfile

if zipfile.is_zipfile(file):
    ar = zipfile.ZipFile(file)
    f = ar.open('member-name')
    # some processing
elif tarfile.is_tarfile(file):
    ar = tarfile.open(file)
    f = ar.extractfile('member-name')
    # some processing
else:
    ...  # handle other archive types

The problems of the above approach are:

  • We need to repeat the above code everywhere we want to support different archive types.
  • Different archive manipulation modules (e.g. zipfile and tarfile) may have different API conventions.

arlib is designed to solve the above problems. It abstracts the logic of archive manipulation and provides a single high-level interface for users.
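For comparison, the abstraction can be sketched with the standard library alone. The helper below, read_member, is hypothetical and not part of arlib's API; it only shows how a single entry point can hide the zipfile/tarfile differences listed above.

```python
import tarfile
import zipfile


def read_member(path, name):
    """Return the bytes of member `name`, whatever the archive type.

    Hypothetical helper illustrating the kind of abstraction arlib provides.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as ar:
            return ar.read(name)
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as ar:
            f = ar.extractfile(name)  # None for directories and special members
            if f is None:
                raise IsADirectoryError(name)
            return f.read()
    raise ValueError(f'{path} is not a supported archive')
```

Every caller now goes through one function, which is the repetition problem arlib addresses with its Archive interface.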

Installation

Install from PyPI

pip install arlib

Install from Anaconda

conda install -c liyugong arlib

Simple example

The abstract class arlib.Archive defines the common interface to handle different archive types, such as a tar file, a zip file or a directory. Three concrete classes, arlib.TarArchive, arlib.ZipArchive and arlib.DirArchive, implement the interface for the corresponding types.

Open archive

The simplest way to open an archive is to use the arlib.open function:

ar = arlib.open('abc.tar.gz', 'r')

This will determine the type of the archive automatically and return a corresponding object of one of the three engine classes. If we don’t want the automatic engine determination mechanism, we can also specify the class via the engine argument, e.g.

ar = arlib.open('abc.tar.gz', 'r', engine=arlib.TarArchive)

or we can simply construct an object of the engine class

ar = arlib.TarArchive('abc.tar.gz', 'r')

List member names

The property member_names will return a list of the names of members contained in the archive, e.g.,

print(ar.member_names)

Check member

Use the methods member_is_dir and member_is_file to check whether a member is a directory or a regular file:

ar = arlib.open('abc.tar', 'r')
ar.member_is_dir('member_name')
ar.member_is_file('member_name')

Open a member

Use the method open_member to open a member in the archive as a file object:

with ar.open_member('a.txt', 'r') as f:
    content = f.read()  # use f as an opened file object

Extract members to a location

Use the method extract to extract members to a specified location

ar = arlib.open('abc.tar', 'r')
ar.extract() # extract all the members to the current working directory
ar.extract(path='d:/hello', members=['abc.txt', 'dir1/'])

License

The arlib package is released under the MIT License

User’s Guide

Overview

arlib is designed using the bridge pattern. The abstract archive manipulation functionality, e.g. opening the archive file, querying member names and opening a member, is defined in Archive, an abstract base class called “engine”. Core functionalities are defined by the corresponding abstract methods and properties:

  • Archive.member_names
  • Archive.open_member()

The core functionalities are implemented in derived classes, which we call concrete engines. Other functionalities may be overridden by concrete engines, but that is not required. Currently, three concrete engines are implemented in the library:

  • TarArchive
  • ZipArchive
  • DirArchive

Since Archive is an abstract base class, which cannot be instantiated directly, the function open() can be used as a factory to create concrete engines. The type of concrete engine is automatically determined by the properties of the archive file and the mode argument used to open it.
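A minimal sketch of this split, using the standard abc module. The class names SketchArchive and SketchDirArchive are hypothetical and this is not arlib's source; it only illustrates an abstract engine whose core members are abstract, and a concrete engine that implements them.

```python
import abc
import pathlib


class SketchArchive(abc.ABC):
    """Hypothetical abstract engine mirroring the core interface."""

    @property
    @abc.abstractmethod
    def member_names(self):
        """Names of the members in the archive."""

    @abc.abstractmethod
    def open_member(self, name, mode='r', **kwargs):
        """Open a member as a file object."""

    def close(self):
        """Non-core functionality: concrete engines may override it."""


class SketchDirArchive(SketchArchive):
    """Hypothetical concrete engine treating a directory as an archive."""

    def __init__(self, path, mode='r'):
        self._root = pathlib.Path(path)
        self._mode = mode

    @property
    def member_names(self):
        # Relative POSIX-style paths of everything below the root directory
        return sorted(p.relative_to(self._root).as_posix()
                      for p in self._root.rglob('*'))

    def open_member(self, name, mode='r', **kwargs):
        return (self._root / name).open(mode, **kwargs)
```

Instantiating SketchArchive itself raises TypeError because its abstract members are not implemented, which is why a factory that picks a concrete engine is useful.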

Automatic engine selection

Automatic engine selection is achieved by auto_engine(), which is called in the constructor of Archive. Users rarely need to call auto_engine() directly; calling the constructor of Archive implicitly calls auto_engine().

auto_engine() calls an ordered list of engine determination functions (EDFs) to decide the appropriate engine type. The signature of an EDF must be edf_func(path: path-like, mode: str), where path is the path to the archive, and mode is the mode string to open the archive. The function should return a concrete engine type if it can be determined, or None otherwise.
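As an illustration, here are two hypothetical EDFs following the edf_func(path, mode) signature, plus a loop that calls them in order, the way auto_engine() is described to work. ZipEngine and TarEngine are placeholders for concrete engines, and neither EDF is taken from arlib.

```python
import tarfile
import zipfile


class ZipEngine:
    """Placeholder for a concrete engine such as arlib.ZipArchive."""


class TarEngine:
    """Placeholder for a concrete engine such as arlib.TarArchive."""


def edf_by_content(path, mode):
    """Hypothetical EDF: inspect the bytes of an existing file."""
    if 'r' not in mode:
        return None  # a file being created has no content to inspect yet
    # Assumes the file exists when reading (tarfile.is_tarfile raises otherwise)
    if zipfile.is_zipfile(path):
        return ZipEngine
    if tarfile.is_tarfile(path):
        return TarEngine
    return None


def edf_by_suffix(path, mode):
    """Hypothetical EDF: fall back to the file name extension."""
    name = str(path).lower()
    if name.endswith('.zip'):
        return ZipEngine
    if name.endswith(('.tar', '.tar.gz', '.tgz', '.tar.bz2', '.tar.xz')):
        return TarEngine
    return None


def pick_engine(path, mode='r', edfs=(edf_by_content, edf_by_suffix)):
    """Call the EDFs in order and return the first non-None answer."""
    for edf in edfs:
        engine = edf(path, mode)
        if engine is not None:
            return engine
    return None
```

Because the content-based EDF runs first, a tar file misleadingly named fake.zip is still recognised as a tar archive; in write mode there is no content yet, so the suffix-based EDF decides.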

The EDF list already contains several EDFs. Users can extend the list by registering new EDFs:

arlib.register_auto_engine(func)

A priority value can also be specified:

arlib.register_auto_engine(func, priority)

The value of priority defines the ordering of the registered EDFs: the smaller the priority value, the higher the priority. EDFs with higher priority are called before EDFs with lower priority. The default priority value is 50.

A third bool argument, prepend, can also be specified for register_auto_engine(). When prepend is True, the EDF is put before (i.e. given higher priority than) other registered EDFs with the same priority value. Otherwise, it is put after them.

Since register_auto_engine() returns the input function object func, it can also be used as a non-parameterized decorator:

@arlib.register_auto_engine
def func(path, mode):
    ...  # function definition: return an engine type or None

The function register_auto_engine() also supports another calling signature, arlib.register_auto_engine(priority, prepend), which returns a decorator with arguments. The typical usage is:

@arlib.register_auto_engine(priority=50, prepend=False)
def func(path, mode):
    ...  # function definition: return an engine type or None
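The ordering rules for priority and prepend, and the two calling forms, can be modelled with a small sorted registry. This is a sketch under assumptions, not arlib's implementation: register_edf and registered_names are hypothetical names, and in this sketch the decorator form is accepted only with keyword arguments.

```python
import bisect
import itertools

# Hypothetical registry of (priority, tie_breaker, func), kept sorted so that
# iterating over it yields the EDFs in calling order.
_registry = []
_ticket = itertools.count(1)


def register_edf(func=None, priority=50, prepend=False):
    """Sketch of a register_auto_engine()-style function.

    register_edf(func, priority=50, prepend=False) registers func and returns it;
    register_edf(priority=..., prepend=...) returns a decorator.
    """
    if func is None:
        # Second form: bind the arguments and wait for the function
        return lambda f: register_edf(f, priority, prepend)
    n = next(_ticket)
    # Within one priority, prepended EDFs get ever smaller tie-breakers (so they
    # go before all earlier ones) and appended EDFs ever larger ones.
    bisect.insort(_registry, (priority, -n if prepend else n, func))
    return func  # returning func lets the first form double as a decorator


def registered_names():
    """Names of the registered EDFs in the order they would be called."""
    return [f.__name__ for _, _, f in _registry]
```

Registering a (default priority), then b with priority=10, then c with prepend=True, then d yields the calling order b, c, a, d.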

Obtain list of member names

The abstract property Archive.member_names returns a list of str representing the names of the members in the archive:

ar = arlib.Archive('a.zip', 'r')
members = ar.member_names

Concrete engines such as TarArchive and ZipArchive implement the property using the underlying tarfile and zipfile modules. Archive.member_names provides a uniform interface to the corresponding underlying functions.
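For reference, the standard-library calls such an implementation would wrap are ZipFile.namelist() and TarFile.getnames(). The helper names below are hypothetical. One difference a uniform interface has to cope with: zip directory entries keep their trailing '/', while tarfile reports directory names without it.

```python
import tarfile
import zipfile


def zip_member_names(path):
    # zipfile exposes member names through ZipFile.namelist()
    with zipfile.ZipFile(path) as ar:
        return ar.namelist()


def tar_member_names(path):
    # tarfile exposes them through TarFile.getnames()
    with tarfile.open(path) as ar:
        return ar.getnames()
```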

Check member properties

The methods Archive.member_is_dir() and Archive.member_is_file() check whether the specified member is a directory or a regular file, respectively.
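With the standard library, these checks map onto ZipInfo.is_dir() and TarInfo.isdir()/TarInfo.isfile(). The helpers below are hypothetical and assume an already opened ZipFile or TarFile; treating every non-directory zip entry as a regular file is a simplification.

```python
import tarfile
import zipfile


def zip_member_is_dir(ar, name):
    # ZipInfo.is_dir() (Python 3.6+) checks for the trailing '/' in the name
    return ar.getinfo(name).is_dir()


def zip_member_is_file(ar, name):
    # Simplification: every non-directory zip entry counts as a regular file
    return not zip_member_is_dir(ar, name)


def tar_member_is_dir(ar, name):
    return ar.getmember(name).isdir()


def tar_member_is_file(ar, name):
    # TarInfo.isfile() is True only for regular files (not links or devices)
    return ar.getmember(name).isfile()
```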

Open member as a file object

The abstract method Archive.open_member() provides a uniform interface for opening a member file as a file object. The signature of the method is open_member(name, mode, **kwargs), where name is the name of the member file, and mode is the mode argument, the same as in the built-in open() function. kwargs are keyword arguments that are passed to the underlying methods in zipfile, tarfile, etc.
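Both ZipFile.open() and TarFile.extractfile() return binary streams, so honouring text modes as the built-in open() does requires wrapping them in io.TextIOWrapper. The sketch below shows one way to do that; the helper names and the UTF-8 default encoding are assumptions, not arlib's documented behaviour.

```python
import io
import tarfile
import zipfile


def open_zip_member(ar, name, mode='r', encoding='utf-8'):
    """Hypothetical helper: builtin-open()-like modes on top of ZipFile.open()."""
    raw = ar.open(name, 'w' if 'w' in mode else 'r')
    # ZipFile.open() always yields a binary stream; wrap it for text modes
    return raw if 'b' in mode else io.TextIOWrapper(raw, encoding=encoding)


def open_tar_member(ar, name, mode='r', encoding='utf-8'):
    """Hypothetical helper: read-only access on top of TarFile.extractfile()."""
    if 'r' not in mode:
        raise ValueError('tar members cannot be opened in write mode')
    raw = ar.extractfile(name)  # None for directories and special members
    if raw is None:
        raise IsADirectoryError(name)
    return raw if 'b' in mode else io.TextIOWrapper(raw, encoding=encoding)
```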

Extract members to a location

The method Archive.extract() provides a uniform interface for extracting members to a location. Two optional arguments can be specified: path, the destination location, and members, a list of members to extract.

with arlib.open('abc.tar') as ar:
    ar.extract('c:/', ['a.txt','dir2/'])

Context manager

The Archive class also defines the context manager functionality. Specifically, Archive.__enter__() returns the archive itself, and Archive.__exit__() calls self.close() and then returns True.
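Note that a true return value from __exit__() tells Python to suppress any exception raised inside the with block. The stand-alone sketch below (a hypothetical class, not arlib code) reproduces the described __exit__() behaviour to show the effect.

```python
class SuppressingArchive:
    """Hypothetical stand-in whose __exit__ mirrors the behaviour described above."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return True  # a true value tells Python to swallow the exception


with SuppressingArchive() as ar:
    raise RuntimeError('never propagates')

print(ar.closed)  # True: close() ran and execution continued
```

If exceptions should propagate to the caller, __exit__() would return None or False instead.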

Extend the library

The architecture of the library is flexible enough to support more archive types. Adding a new archive type involves the following steps:

  1. Derive a new class and implement the core functionalities

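    from arlib import Archive

    class AnotherArchive(Archive):
        def __init__(self, path, mode='r', **kwargs):
            ...  # open the underlying archive
    
        @property
        def member_names(self):
            ...  # return the list of member names
    
        def open_member(self, name, mode='r', **kwargs):
            ...  # return a file object for the member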
    
  2. (optional) override methods such as Archive.close(), Archive.__enter__() and Archive.__exit__()

  3. (optional) define and register a new EDF which can automatically determine the new archive type

    from arlib import register_auto_engine

    @register_auto_engine
    def another_auto_engine(path, mode):
        ...  # return AnotherArchive if applicable, otherwise None
    
  4. (optional) override Archive.extract(). The default implementation in Archive uses shutil.copyfileobj to copy the corresponding members to the destination. Using the extraction routine of the underlying archive module may be more efficient.
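A copy loop of the kind described in step 4 might look like the following. It is a sketch, not arlib's default implementation: extract_by_copy is a hypothetical name, and it assumes only the interface described in this guide (member_names, member_is_dir() and open_member() accepting a binary mode).

```python
import os
import shutil


def extract_by_copy(ar, path=None, members=None):
    """Copy members out of `ar` with shutil.copyfileobj.

    `ar` is any object offering member_names, member_is_dir(name) and
    open_member(name, mode).
    """
    dest = os.fspath(path) if path is not None else os.getcwd()
    for name in (ar.member_names if members is None else members):
        # Member names use '/' separators; map them onto the local file system.
        # No protection against '..' components: validate untrusted archives.
        target = os.path.join(dest, *name.strip('/').split('/'))
        if ar.member_is_dir(name):
            os.makedirs(target, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with ar.open_member(name, 'rb') as src, open(target, 'wb') as dst:
            shutil.copyfileobj(src, dst)
```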

API Reference

class arlib.Archive

Common interface for manipulating different types of archive files

Parameters:
  • path (path-like, file-like) – Path of the archive to read or write
  • mode (str) – The mode to open the archive, same as in open(). Defaults to ‘r’.
  • engine (type) –

    Class object of a specific subclass of Archive, which implements the logic of processing a specific type of archive. Provided implementations:

    • ZipArchive: zip file archive using the zipfile module
    • TarArchive: tar file archive using the tarfile module
    • DirArchive: directory as an archive using the pathlib module
    • None: Automatically determine engines by file properties and mode
  • kwargs – Additional keyword arguments passed to the underlying engine constructor

Note

The constructor of a concrete engine should take at least one positional argument path and one optional argument mode with default value ‘r’.

close()

Release resources, e.g. close open files

extract(path=None, members=None)

Extract members to a location

Parameters:
  • path (path-like) – Location of the extracted files.
  • members (Seq[str]) – Members to extract, specified by a list of names.

member_is_dir(name)

Check if a specific member is a directory

Parameters: name (str) – Member name.

Returns:

bool: True if the member is a directory, False otherwise.

member_is_file(name)

Check if a specific member is a regular file

Parameters: name (str) – Member name.

Returns:

bool: True if the member is a regular file, False otherwise.

member_names

Get list of names of the members (i.e. files contained in the archive)

Returns: list of member names
Return type: list[str]

open_member(name, mode='r', **kwargs)

Open a member file contained in the archive

Parameters:
  • name (str) – name of the member file to open
  • mode (str) – The mode to open the member, same as in open(). Defaults to ‘r’.
  • kwargs – Additional keyword arguments that will be passed to the underlying function.
Returns:

An opened file object associated with the member file

Return type:

file-like

class arlib.DirArchive(path, mode='r')

Archive engine that treats a directory as an archive using the pathlib module

extract(path=None, members=None)

Extract members to a location

Parameters:
  • path (path-like) – Location of the extracted files.
  • members (Seq[str]) – Members to extract, specified by a list of names.

member_names

Get list of names of the members (i.e. files contained in the archive)

Returns: list of member names
Return type: list[str]

open_member(name, mode='r', **kwargs)

Open a member in the directory

Parameters:
  • name (str) – Name of the member file
  • mode (str) – The mode argument to open. Same as in open().
  • kwargs – Additional keyword arguments that will be passed to open()
Returns:

The opened file object associated with the member file.

Return type:

file-like

class arlib.TarArchive(path, mode='r', **kwargs)

Archive engine for tar files using the tarfile module

Parameters:
  • path (path-like) – Path to the archive
  • mode (str) – The mode to open the archive, same as in open().
  • kwargs – Other keyword arguments that will be passed to the underlying function.

close()

Release resources, e.g. close open files

extract(path=None, members=None)

Extract members to a location

Parameters:
  • path (path-like) – Location of the extracted files.
  • members (Seq[str]) – Members to extract, specified by a list of names.

member_names

Get list of names of the members (i.e. files contained in the archive)

Returns: list of member names
Return type: list[str]

open_member(name, mode='r')

Open member file contained in the tar archive

Parameters:
  • name (str) – Name of the member to open
  • mode (str) – The mode argument to open. Same as in open().
Returns:

The opened file object associated with the member file.

Return type:

file-like

Note

Members of tar archive cannot be opened in write mode.

class arlib.ZipArchive(path, *args, **kwargs)

Archive engine for zip files using the zipfile module

close()

Release resources, e.g. close open files

extract(path=None, members=None)

Extract members to a location

Parameters:
  • path (path-like) – Location of the extracted files.
  • members (Seq[str]) – Members to extract, specified by a list of names.

member_names

Get list of names of the members (i.e. files contained in the archive)

Returns: list of member names
Return type: list[str]

open_member(name, mode='r', **kwargs)

Open a member file in the zip archive

Parameters:
  • name (str) – Name of the member file
  • mode (str) – The mode argument to open. Same as in open().
  • kwargs – Additional keyword arguments that will be passed to zipfile.ZipFile.open()
Returns:

The opened file object associated with the member file.

Return type:

file-like

arlib.assert_is_archive(path, mode)

Assert that path can be opened as a valid archive with mode

Parameters:
  • path (file-like, path-like) – Opened file object or path to the archive file.
  • mode (str) – Mode str to open the file. Defaults to “r”.

Examples

>>> assert_is_archive('a.tar.gz', 'w')
>>> assert_is_archive('a.txt', 'w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: a.txt cannot be opened as a valid archive with w

See also

is_archive()

arlib.auto_engine(path, mode='r')

Automatically determine engine type from file properties and file mode using the registered determining functions

Parameters:
  • path (file-like, path-like) – Opened file object or path to the archive file
  • mode (str) – Mode str to open the file. Defaults to “r”.
Returns:

A subclass of Archive if an engine is successfully found, otherwise None

Return type:

type, NoneType

See also

is_archive()

arlib.is_archive(path, mode='r')

Determine if the file specified by path is a valid archive when opened with mode

The function checks the result of auto_engine() and returns True if the result is not None, and False otherwise.

Parameters:
  • path (file-like, path-like) – Opened file object or path to the archive file.
  • mode (str) – Mode str to open the file. Defaults to “r”.
Returns:

True if the path is a valid archive, False otherwise.

Return type:

bool

Examples

>>> is_archive('a.tar.gz', 'w')
True
>>> is_archive('a.tar.bz2', 'w')
True
>>> is_archive('a.txt', 'w')
False

See also

auto_engine()
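Since is_archive() is described as a test on the result of auto_engine(), and assert_is_archive() as its raising counterpart, the relationship can be sketched as follows. auto_engine_stub only checks file suffixes and, like the other names here, is hypothetical; the error message mirrors the assert_is_archive() example.

```python
class PlaceholderEngine:
    """Stands in for a concrete engine class returned by auto_engine()."""


def auto_engine_stub(path, mode='r'):
    """Hypothetical stand-in for arlib.auto_engine(): suffix-based only."""
    suffixes = ('.zip', '.tar', '.tar.gz', '.tgz', '.tar.bz2')
    return PlaceholderEngine if str(path).lower().endswith(suffixes) else None


def is_archive_sketch(path, mode='r'):
    # A path is an archive exactly when some engine can be determined for it
    return auto_engine_stub(path, mode) is not None


def assert_is_archive_sketch(path, mode='r'):
    if not is_archive_sketch(path, mode):
        raise ValueError(f'{path} cannot be opened as a valid archive with {mode}')
```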

arlib.open(path, mode='r', engine=None, *args, **kwargs)

Open an archive file

Parameters:
  • path (path-like, file-like) – Path of the archive to read or write
  • mode (str) – The mode to open the archive, same as in open(). Defaults to ‘r’.
  • engine (type) –

    Class object of a specific subclass of Archive, which implements the logic of processing a specific type of archive. Provided implementations:

    • ZipArchive: zip file archive using the zipfile module
    • TarArchive: tar file archive using the tarfile module
    • DirArchive: directory as an archive using the pathlib module
    • None: Automatically determine engines by file properties and mode
  • kwargs – Additional keyword arguments passed to the underlying engine constructor

arlib.register_auto_engine(func, priority=50, prepend=False)

Register an automatic engine determining function

Two possible signatures:

  • register_auto_engine(func, priority=50, prepend=False)
  • register_auto_engine(priority=50, prepend=False)

The first one can be used as a regular function as well as a decorator. The second one is a decorator with arguments.

Parameters:
  • func (callable) – A callable which determines the archive engine from file properties and open mode. The signature should be func(path, mode), where path is a file-like or path-like object, and mode is the str used to open the file.
  • priority (int, float) – Priority of func; a smaller number means higher priority. When multiple functions are registered by multiple calls of register_auto_engine, they are used in the order determined by their priorities. Defaults to 50.
  • prepend (bool) – If True, insert func before (to the left of) already registered functions with the same priority; otherwise insert it after (to the right of) them. Defaults to False.
Returns:

The first version of the signature returns the input callable func, so it can be used as a decorator (without arguments). The second version returns a decorator.

ChangeLog

0.0.4

0.0.3

  • Support tar, zip files and folder
  • Automatic archive type deduction
  • Support member names listing
  • Support opening members as file streams
