Welcome to compfile’s documentation!

Introduction

compfile: common interfaces for manipulating compressed files (lzma, gzip, etc.)

Rationale

Sometimes we need to deal with several kinds of compressed files. There are several packages/modules for compressed file manipulation, e.g., the gzip module for “.gz” files, the lzma module for “.lzma” and “.xz” files, etc. If we want to support different types of compressed files in our project, we probably need to do something like the following:

import bz2
import fnmatch
import gzip

if fnmatch.fnmatch(fname, "*.gz"):
    f = gzip.open(fname, 'rb')
    # do something with f
elif fnmatch.fnmatch(fname, "*.bz2"):
    f = bz2.open(fname, 'rb')
    # do something with f
else:
    pass  # other stuff

The problems with the above approach are:

  • We need to repeat the compression type inference logic everywhere we want to support different compression types.
  • Different compression type manipulation modules may have different API conventions.

compfile is designed to solve the above problems. It abstracts the logic of compressed file manipulations and provides a single high level interface for users.

Installation

Install from PyPI

pip install compfile

Install from Anaconda

conda install -c liyugong compfile

Install from GitHub

pip install git+https://github.com/gongliyu/compfile.git@master

Simple example

Using compfile is pretty simple. Just construct a compfile.CompFile object or call compfile.open:

import compfile

with compfile.open(fname, 'r') as f:
    text = f.read()  # do something with f

The object returned is a file object, so we can do ordinary file processing with it.

License

The compfile package is released under the MIT License.

User’s Guide

Automatic engine selection

Automatic engine selection is achieved by auto_engine(), which is called by compfile.open(). Users rarely need to call auto_engine() directly.

auto_engine() calls an ordered list of engine determination functions (EDFs) to decide the appropriate engine type. The signature of an EDF must be edf_func(path: path-like), where path is the path to the compressed file. The function should return a callable object if the engine can be determined, or None otherwise. The actual engine (i.e. the callable object returned by the EDF) can open a specific type of compressed file. Typical engines are bz2.open(), gzip.open(), etc.
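The dispatch described above can be sketched roughly as follows. This is an illustration only, not compfile's source: the names edf_gzip, edf_bz2, _EDFS and auto_engine_sketch are hypothetical, and the sketch assumes the first EDF that returns a non-None value wins.

```python
import bz2
import gzip


# Hypothetical EDFs: each returns an opener (the "engine") or None.
def edf_gzip(path):
    return gzip.open if str(path).endswith(".gz") else None


def edf_bz2(path):
    return bz2.open if str(path).endswith(".bz2") else None


# Ordered list of EDFs; earlier entries are consulted first.
_EDFS = [edf_gzip, edf_bz2]


def auto_engine_sketch(path):
    """Return the first engine proposed by an EDF, or None if none matches."""
    for edf in _EDFS:
        engine = edf(path)
        if engine is not None:
            return engine
    return None
```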

The EDF list already contains several EDFs. Users can extend the list by registering new EDFs:

compfile.register_auto_engine(func)

A priority value can also be specified:

compfile.register_auto_engine(func, priority)

The value of priority defines the ordering of the registered EDFs. The smaller the priority value, the higher the priority. EDFs with higher priority are called before EDFs with lower priority. The default priority value is 50.

A third argument, prepend (bool), can also be passed to register_auto_engine(). When prepend is true, the EDF is placed before (i.e. at higher priority than) other registered EDFs with the same priority value. Otherwise, it is placed after them.

register_auto_engine() can also be used as a decorator:

import compfile

@compfile.register_auto_engine
def func(path):
    ...  # function definition


@compfile.register_auto_engine(priority=50, prepend=False)
def func2(path):
    ...  # function definition

Current implementation

Currently, the following engines are registered:

Extend the library

The architecture of the library is flexible enough to support more compressed file types. Adding a new compressed file type simply involves registering an EDF:

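from compfile import register_auto_engine

def open_abc(fpath, mode):
    # open the *.abc compressed file (format-specific code goes here);
    # uncompressed_file stands for the file object that code produces
    return uncompressed_file

@register_auto_engine
def edf_abc(fpath):
    if str(fpath).endswith('.abc'):
        return open_abc
    else:
        return None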
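An EDF does not have to rely on the file name. As a hedged illustration, the sketch below inspects the first two bytes of the file for the gzip magic number (0x1f 0x8b). The function name edf_gzip_magic is hypothetical, and the sketch assumes the EDF may be called for a path that does not exist yet (for example when writing), in which case it returns None. It is shown without registration so that it runs with the standard library alone.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def edf_gzip_magic(fpath):
    """Return gzip.open if the file starts with the gzip magic bytes, else None."""
    try:
        with open(fpath, "rb") as f:
            head = f.read(len(GZIP_MAGIC))
    except OSError:
        return None  # unreadable or missing file: let other EDFs decide
    return gzip.open if head == GZIP_MAGIC else None


# Demonstration: gzip content stored under a misleading file name.
demo_dir = tempfile.mkdtemp()
gz_path = os.path.join(demo_dir, "data.bin")
with gzip.open(gz_path, "wb") as f:
    f.write(b"payload")
txt_path = os.path.join(demo_dir, "notes.txt")
with open(txt_path, "w") as f:
    f.write("plain text")
```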

API Reference

compfile.auto_engine(path)

Automatically determine engine type from file properties and file mode using the registered determining functions.

Parameters:
  • path (path-like) – Path to the compressed file.
Returns:

A subclass of CompFile if an engine is successfully found, otherwise None.

Return type:

type, NoneType

compfile.is_compressed_file(path)

Infer from the file name whether the file is a compressed file.

Parameters:
  • path (path-like) – Path to the file.
Returns:

Whether the file is a compressed file.

Return type:

bool

Example

>>> is_compressed_file('a.txt.bz2')
True
>>> is_compressed_file('a.txt.gz')
True
>>> is_compressed_file('a.txt')
False

compfile.open(fpath, mode, *args, **kwargs)

Open a compressed file as an uncompressed file stream

Parameters:
  • fpath (str) – Path to the compressed file.
  • mode (str) – Mode arguments used to open the file. Same as open().
Returns:

An uncompressed file stream

Return type:

file-object

Note

We follow the convention of the built-in function open() for the argument mode rather than the conventions of underlying modules such as bz2. That is, we treat “r” as “rt” rather than “rb”.
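The difference between the two conventions can be seen with the standard gzip module, where mode "r" means binary. The sketch below shows one possible way to map an open()-style mode onto an explicit one; normalize_mode is a hypothetical helper, not part of compfile's documented API.

```python
import gzip
import os
import tempfile


def normalize_mode(mode):
    """Make text mode explicit when neither 'b' nor 't' is given."""
    if "b" not in mode and "t" not in mode:
        return mode + "t"
    return mode


# Demonstration with gzip: write a small file, then read it back both ways.
path = os.path.join(tempfile.mkdtemp(), "demo.txt.gz")
with gzip.open(path, "wb") as f:
    f.write(b"hello\n")

with gzip.open(path, "r") as f:                  # gzip's own convention: binary
    raw = f.read()                               # bytes
with gzip.open(path, normalize_mode("r")) as f:  # open()-style convention: text
    text = f.read()                              # str
```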

compfile.register_auto_engine(func, priority=50, prepend=False)

Register an automatic engine determining function.

Two possible signatures:

  • register_auto_engine(func, priority=50, prepend=False)
  • register_auto_engine(priority=50, prepend=False)

The first one can be used as a regular function as well as a decorator. The second one is a decorator with arguments.

Parameters:
  • func (callable) – A callable which determines the engine from file properties and open mode. The signature should be: func(path, mode), where path is a file-like or path-like object and mode is the mode string used to open the file.
  • priority (int, float) – Priority of func; a smaller number means higher priority. When multiple functions are registered by multiple calls to register_auto_engine, they are used in an order determined by their priorities. Defaults to 50.
  • prepend (bool) – If a function with the same priority is already registered, insert the new one to the left (before) of it when true, or to the right (after) when false. Defaults to False.
Returns:

The first signature returns the input callable func, so it can be used as a decorator (without arguments). The second signature returns a decorator wrapper.

ChangeLog

0.0.2.1

  • Fix requirements.txt

0.0.2

0.0.1

  • Support gz, xz, lzma, bz2 files.
  • Automatic compression type deduction.
  • Support open file as uncompressed streams (text and binary mode).
