Welcome to compfile’s documentation!¶
Introduction¶
compfile: common interfaces for manipulating compressed files (lzma, gzip, etc.).
Rationale¶
Sometimes we need to deal with several kinds of compressed files. There are separate packages/modules for each kind, e.g., the gzip module for “.gz” files, the lzma module for “.lzma” and “.xz” files, etc. To support different types of compressed files in a project, we would probably need to write something like the following:
import bz2
import fnmatch
import gzip

if fnmatch.fnmatch(fname, "*.gz"):
    f = gzip.open(fname, 'rb')
    # do something with f
elif fnmatch.fnmatch(fname, "*.bz2"):
    f = bz2.open(fname, 'rb')
    # do something with f
else:
    # other stuff
    pass
The problems with the above approach are:
- We need to repeat the compression type inference logic everywhere we want to support different compression types.
- Different compression modules may follow different API conventions.
compfile is designed to solve these problems. It abstracts the logic of compressed file manipulation and provides a single high-level interface for users.
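To make the idea concrete, here is a minimal sketch of a single entry point that hides the per-format dispatch, built only on the standard library. It is not compfile's actual implementation: the names open_any and _OPENERS are hypothetical, and choosing the opener from the file suffix is an assumption made for illustration.

```python
import bz2
import gzip
import lzma
import os

# Hypothetical suffix-to-opener table; compfile's real logic may differ.
_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
    ".lzma": lzma.open,
}


def open_any(fname, mode="rb"):
    """Open fname with the opener matching its suffix, else builtin open."""
    suffix = os.path.splitext(os.fspath(fname))[1].lower()
    # Unknown suffixes fall back to the builtin open().
    opener = _OPENERS.get(suffix, open)
    return opener(fname, mode)
```

With such a helper, the calling code no longer branches on the file type: open_any("data.txt.gz") and open_any("data.txt.bz2") are used in exactly the same way.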
Installation¶
Install from PyPI¶
pip install compfile
Install from Anaconda¶
conda install -c liyugong compfile
Install from GitHub¶
pip install git+https://github.com/gongliyu/compfile.git@master
Simple example¶
Using compfile is pretty simple. Just call compfile.open:
with compfile.open(fname, 'r') as f:
    # do something with f
    data = f.read()
The object returned is a file object, so we can do ordinary file processing with it.
License¶
The compfile package is released under the MIT License.
Documentation¶
User’s Guide¶
Automatic engine selection¶
Automatic engine selection is achieved by auto_engine(), which is called by compfile.open(). Users rarely need to call auto_engine() directly.
auto_engine() calls an ordered list of engine determination functions (EDFs) to decide the appropriate engine type. The signature of an EDF must be edf_func(path: path-like), where path is the path to the compressed file. The function should return a callable object if the engine can be determined, or None otherwise. The actual engine (i.e. the callable object returned by the EDF) can open a specific type of compressed file. Typical engines are bz2.open(), gzip.open(), etc.
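The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the library's source: EDFS, edf_gzip, edf_bz2 and auto_engine_sketch are hypothetical names, and the two EDFs decide purely by file suffix.

```python
import bz2
import gzip


def edf_gzip(path):
    """Hypothetical EDF: propose gzip.open for '*.gz' paths, else None."""
    return gzip.open if str(path).endswith(".gz") else None


def edf_bz2(path):
    """Hypothetical EDF: propose bz2.open for '*.bz2' paths, else None."""
    return bz2.open if str(path).endswith(".bz2") else None


# Ordered list of EDFs; earlier entries are consulted first.
EDFS = [edf_gzip, edf_bz2]


def auto_engine_sketch(path):
    """Return the first engine proposed by an EDF, or None if none match."""
    for edf in EDFS:
        engine = edf(path)
        if engine is not None:
            return engine
    return None
```

Because the first non-None answer wins, the position of an EDF in the list decides which engine is used when several EDFs could handle the same file.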
The EDF list already contains several EDFs. Users can extend the list by registering new EDFs:
compfile.register_auto_engine(func)
A priority value can also be specified:
compfile.register_auto_engine(func, priority)
The value of priority defines the ordering of the registered EDFs. The smaller the priority value, the higher the priority. EDFs with higher priority are called before EDFs with lower priority. The default priority value is 50.
A third argument, prepend (a bool), can also be specified for register_auto_engine(). When prepend is true, the EDF is placed before (i.e. at higher priority than) other registered EDFs with the same priority value. Otherwise, it is placed after them.
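The ordering rules for priority and prepend can be sketched with a sorted list and the standard bisect module. The names _registry, register_sketch and ordered_names are hypothetical, and this is only one way to obtain the documented behavior, not necessarily how compfile stores its EDFs.

```python
import bisect

# List of (priority, func) pairs, kept sorted by ascending priority value.
_registry = []


def register_sketch(func, priority=50, prepend=False):
    """Insert func so that smaller priority values come first."""
    priorities = [p for p, _ in _registry]
    if prepend:
        # Before any existing entries with the same priority value.
        index = bisect.bisect_left(priorities, priority)
    else:
        # After any existing entries with the same priority value.
        index = bisect.bisect_right(priorities, priority)
    _registry.insert(index, (priority, func))
    return func


def ordered_names():
    """Names of the registered functions in the order they would be called."""
    return [func.__name__ for _, func in _registry]
```

Registering three functions with the default priority, the third one with prepend=True, would place the third function first among them, while a function registered with priority 10 would come before all three.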
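register_auto_engine() can also be used as a decorator, with or without arguments:
@compfile.register_auto_engine
def func(path):
    # function definition: return an engine callable, or None
    return None

@compfile.register_auto_engine(priority=50, prepend=False)
def func2(path):
    # function definition: return an engine callable, or None
    return None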
Current implementation¶
Currently, the following engines are registered:
- bz2.open()
- gzip.open()
- lzma.open() (for Python 3, and for Python 2 with lzma installed)
Extend the library¶
The architecture of the library is flexible enough to support more compressed file types. Adding a new compressed file type simply involves registering an EDF:
from compfile import register_auto_engine

def open_abc(fpath, mode):
    # open the *.abc compressed file
    return uncompressed_file

@register_auto_engine
def edf_abc(fpath):
    # select the engine by file extension
    if fpath.endswith('.abc'):
        return open_abc
    else:
        return None
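The example above keys on the file extension, but an EDF is free to apply any test to the path it receives. As a hedged illustration, the sketch below picks an engine from the leading "magic" bytes of the file instead. The names edf_magic and _MAGIC are hypothetical, the openers are the standard library ones, and registering the function with compfile is left out so that the block stays self-contained.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes of each format, paired with a stdlib opener.
_MAGIC = [
    (b"\x1f\x8b", gzip.open),       # gzip
    (b"BZh", bz2.open),             # bzip2
    (b"\xfd7zXZ\x00", lzma.open),   # xz
]


def edf_magic(fpath):
    """Hypothetical EDF: choose an engine from the file's first bytes."""
    try:
        with open(fpath, "rb") as f:
            head = f.read(6)
    except OSError:
        # Missing or unreadable file: decline and let other EDFs decide.
        return None
    for magic, engine in _MAGIC:
        if head.startswith(magic):
            return engine
    return None
```

Assuming register_auto_engine() behaves as described in this guide, a function like this could be registered in the same way as edf_abc. Note that it cannot decide anything for a file that does not exist yet, for example one about to be created for writing.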
API Reference¶
compfile.auto_engine(path)¶
Automatically determine the engine type from file properties and file mode using the registered determining functions.
Parameters: path (path-like) – Path to the compressed file.
Returns: a subclass of CompFile if an engine is successfully found, otherwise None.
Return type: type, NoneType
compfile.is_compressed_file(path)¶
Infer from the file name (path-like) whether the file is a compressed file.
Parameters: path (path-like) – Path to the file.
Returns: Whether the file is a compressed file.
Return type: bool
Example
>>> is_compressed_file('a.txt.bz2')
True
>>> is_compressed_file('a.txt.gz')
True
>>> is_compressed_file('a.txt')
False
compfile.open(fpath, mode, *args, **kwargs)¶
Open a compressed file as an uncompressed file stream.
Returns: An uncompressed file stream.
Return type: file-object
compfile.register_auto_engine(func, priority=50, prepend=False)¶
Register an automatic engine determining function.
Two possible signatures:
register_auto_engine(func, priority=50, prepend=False)
register_auto_engine(priority=50, prepend=False)
The first one can be used as a regular function as well as a decorator. The second one is a decorator with arguments.
Parameters:
- func (callable) – A callable which determines the engine from file properties and open mode. The signature should be func(path, mode), where path is a file-like or path-like object and mode is the mode string used to open the file.
- priority (int, float) – Priority of func; a smaller number means higher priority. When multiple functions are registered by multiple calls to register_auto_engine, they are used in an order determined by their priorities. Defaults to 50.
- prepend (bool) – If a function with the same priority is already registered, whether to insert func before it (True) or after it (False). Defaults to False.
Returns: The first signature returns the input callable func, so it can be used as a decorator (without arguments). The second signature returns a decorator wrapper.
ChangeLog¶
0.0.2.1¶
- Fix requirements.txt
0.0.2¶
- Redesign: remove classes, keep compfile.open() as the public API (Issue #3, PR #4).
- Add support for uncompressed files, which are opened using the builtin open() (PR #8).
0.0.1¶
- Support gz, xz, lzma, bz2 files.
- Automatic compression type deduction.
- Support opening files as uncompressed streams (text and binary mode).