Welcome to Table Enforcer’s documentation!¶
Table Enforcer¶
Demo Usage¶
Have a look at this Demo Notebook
Description¶
A Python package to facilitate the iterative process of developing and using schema-like representations of DataFrames in pandas for recoding and validating instances of these data.
This is a very young attempt at solving a recurrent problem many people have. So far I have looked at multiple solutions, but none really did it for me.
I need to load, recode, and validate tables all day, every day. Sometimes it's simple: you can call pandas.read_table() and all is good. But sometimes you have a 400-column REDCap data dump that is complicated af and you need to develop your recoding logic through an iterative process.
This is an attempt to apply a sort of “test driven development” approach to data cleaning.
Basic Workflow¶
For each column that you care about in your source table:
- Define a Column object that represents the ideal state of your data by passing a list of small, independent, reusable validator functions and some descriptive information.
- Use this object to validate the column data from your source table.
  - It WILL fail.
- Add small, composable, reusable recoding functions to the column object and iterate until your validations pass.
- Define an Enforcer object by passing it a list of your column representation objects.
- This enforcer can be used to recode or validate recoded tables of the same kind as your source table wherever your applications use that type of data.
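The validate, fail, recode, re-validate loop described above can be sketched in plain pandas. This is a minimal illustration of the idea only, not the Table Enforcer API: the helper names (`is_known_sex`, `standardize_sex`, `all_pass`) and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical source column with inconsistent codes.
raw = pd.Series(["M", "f", "male", "F"], name="sex")


def not_null(series: pd.Series) -> pd.Series:
    """Validator: True where the item is present."""
    return series.notnull()


def is_known_sex(series: pd.Series) -> pd.Series:
    """Validator: True where the item is one of the accepted codes."""
    return series.isin(["M", "F"])


def standardize_sex(series: pd.Series) -> pd.Series:
    """Recoder: map common spellings onto the accepted codes."""
    mapping = {"m": "M", "male": "M", "f": "F", "female": "F"}
    return series.str.lower().map(mapping)


def all_pass(series: pd.Series, validators) -> dict:
    """Map each validator name to whether every item passed."""
    return {v.__name__: bool(v(series).all()) for v in validators}


validators = [not_null, is_known_sex]

# First pass: the raw data fails one of the checks, as expected.
before = all_pass(raw, validators)

# Add a recoder, then validate the recoded data again.
recoded = standardize_sex(raw)
after = all_pass(recoded, validators)
```

Keeping validators and recoders as small functions of one Series makes each of them easy to test and reuse across columns.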
Note
Soon, I want to add more kinds of Column objects that implement one-to-many and many-to-one recoding logic, because sometimes a column tries to do too much and should really be multiple columns, and sometimes the reverse is true.
Please take a look and offer thoughts/advice.
- Free software: MIT license
- Web site: https://github.com/xguse/table_enforcer
- Documentation: https://table-enforcer.readthedocs.io.
Features¶
- Enforcer and Column classes to define what columns should look like in a table.
- Small but growing cadre of built-in validator functions and decorators.
- Decorators for use in defining parameterized validators like between_4_and_60().
- Declaration syntax for Enforcer is loosely based on SQLAlchemy’s Table pattern.
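As a rough illustration of what a parameterized validator such as between_4_and_60() could look like, here is a small factory written in plain pandas. The `between` helper is hypothetical and is not one of the package's decorators; it only shows the general pattern of fixing the bounds once and getting back a Series-in, Series-out validator.

```python
import pandas as pd


def between(low, high):
    """Build a validator that checks low <= x <= high for every item."""

    def validator(series: pd.Series) -> pd.Series:
        return (series >= low) & (series <= high)

    # Give the generated function a descriptive name for reports.
    validator.__name__ = f"between_{low}_and_{high}"
    return validator


between_4_and_60 = between(4, 60)

ages = pd.Series([3, 4, 35, 60, 61])
result = between_4_and_60(ages)
```

Here `result` is a boolean Series aligned with `ages`, with both bounds treated as inclusive.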
Credits¶
This package was created with Cookiecutter and the xguse/cookiecutter-pypackage project template which is based on audreyr/cookiecutter-pypackage.
Installation¶
From sources¶
The sources for Table Enforcer can be downloaded from the GitHub repo.
You can either clone the public repository:
$ git clone https://github.com/xguse/table_enforcer
Or download the tarball:
$ curl -OL https://github.com/xguse/table_enforcer/tarball/master
Once you have a copy of the source, you can install it with the following steps:
- Navigate to the main repository directory.
- Activate the virtual environment that you want the package installed into.
- Run the following command.
$ pip install .
Usage¶
Have a look at this Demo Notebook
Source Code Documentation¶
table_enforcer package¶
Subpackages¶
table_enforcer.recode package¶
Submodules¶
table_enforcer.recode.funcs module¶
Provide builtin recoder functions for common use cases.
Like validators, recoders take a single pandas.Series object as input and return a pandas.Series of the same shape and indexes as the original series object. However, instead of returning a series of True/False values, a recoder performs some operation on the data that brings the column closer to how you want it to look during analysis.
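A recoder that follows this contract might look like the sketch below. The function `strip_and_upper` is a hypothetical example, not one of the package's builtin recoders; the point is that the output keeps the shape and index of the input.

```python
import pandas as pd


def strip_and_upper(series: pd.Series) -> pd.Series:
    """Recoder: trim surrounding whitespace and upper-case each string."""
    return series.str.strip().str.upper()


# A non-default index makes it easy to see that the index is preserved.
codes = pd.Series([" a1", "b2 ", "C3"], index=[10, 11, 12])
recoded = strip_and_upper(codes)
```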
Module contents¶
table_enforcer.validate package¶
Submodules¶
table_enforcer.validate.decorators module¶
Provide decoration functions to augment the behavior of validator functions.
- table_enforcer.validate.decorators.bounded_length(low, high=None)
  Test that the length of each data item falls within the range low <= x <= high.
  If high is None, treat low as the exact length.
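The length rule described here can be sketched as a standalone check in plain pandas. The function `length_in_bounds` is hypothetical and does not reproduce how the bounded_length decorator is actually applied; it only illustrates the stated semantics, including the exact-length case when high is None.

```python
import pandas as pd


def length_in_bounds(series: pd.Series, low: int, high=None) -> pd.Series:
    """Return True where low <= len(item) <= high.

    If high is None, the item length must equal low exactly.
    """
    lengths = series.astype(str).str.len()
    if high is None:
        return lengths == low
    return (lengths >= low) & (lengths <= high)


ids = pd.Series(["A1", "AB12", "ABC123"])
exact = length_in_bounds(ids, 4)      # only items of length 4 pass
ranged = length_in_bounds(ids, 2, 4)  # items of length 2 through 4 pass
```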
table_enforcer.validate.funcs module¶
Provide builtin validator functions for common use cases.
In general, validators take a single pandas.Series object as input and return a pandas.Series of the same shape and indexes containing True or False relative to which items passed the validation logic.
- table_enforcer.validate.funcs.negative(series: pandas.core.series.Series) → pandas.core.series.Series
  Test that the data items are negative.
- table_enforcer.validate.funcs.not_null(series: pandas.core.series.Series) → pandas.core.series.Series
  Return Series with True/False bools based on which items pass.
- table_enforcer.validate.funcs.positive(series: pandas.core.series.Series) → pandas.core.series.Series
  Test that the data items are positive.
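To make the validator contract concrete, the following stand-ins show one plausible way such checks can be written in plain pandas. They are not the package's source code, and the treatment of zero (neither positive nor negative) is an assumption of this sketch.

```python
import pandas as pd


def is_positive(series: pd.Series) -> pd.Series:
    """True where the item is strictly greater than zero."""
    return series > 0


def is_negative(series: pd.Series) -> pd.Series:
    """True where the item is strictly less than zero."""
    return series < 0


def is_not_null(series: pd.Series) -> pd.Series:
    """True where the item is not missing."""
    return series.notnull()


values = pd.Series([-2.0, 0.0, 3.5, None])

pos = is_positive(values)
neg = is_negative(values)
present = is_not_null(values)
```

Comparisons against a missing value evaluate to False in pandas, so a null item fails both sign checks as well as the null check.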
Module contents¶
Submodules¶
table_enforcer.errors module¶
Provide error classes.
- exception table_enforcer.errors.NotImplementedYet(msg=None)
  Bases: NotImplementedError, table_enforcer.errors.TableEnforcerError
  Raise when a section of code that has been left for another time is asked to execute.
- exception table_enforcer.errors.ValidationError
  Bases: table_enforcer.errors.TableEnforcerError
  Raise when a validation/sanity check comes back with unexpected value.
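The hierarchy listed above can be mirrored in a few lines to show how a caller might catch every package error through the shared base class. This is a sketch under assumptions: the base of TableEnforcerError is not shown in the listing, so Exception is assumed here, and the default message and the `check_all_passed` helper are hypothetical.

```python
class TableEnforcerError(Exception):
    """Assumed common base class (its own base is an assumption)."""


class ValidationError(TableEnforcerError):
    """A validation/sanity check came back with an unexpected value."""


class NotImplementedYet(NotImplementedError, TableEnforcerError):
    """Code that has been left for another time was asked to execute."""

    def __init__(self, msg=None):
        # Hypothetical default message for illustration.
        if msg is None:
            msg = "This code path has not been implemented yet."
        super().__init__(msg)


def check_all_passed(results):
    """Hypothetical helper: raise if any validation result is False."""
    if not all(results):
        raise ValidationError("At least one validation failed.")


try:
    check_all_passed([True, False])
except TableEnforcerError as err:
    # Catching the base class also catches ValidationError.
    caught = type(err).__name__
```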
table_enforcer.main_classes module¶
Main module.
- class table_enforcer.main_classes.Enforcer(columns)
  Bases: object
  Class to define table definitions.
  - make_validations(table: pandas.core.frame.DataFrame) → munch.Munch
    Return a dict-like object containing dataframes of which tests passed/failed for each column.
- class table_enforcer.main_classes.Column(name: str, dtype: type, unique: bool, validators: typing.List[typing.Callable[[pandas.core.series.Series], pandas.core.frame.DataFrame]], recoders: typing.List[typing.Callable[[pandas.core.series.Series], pandas.core.series.Series]]) → None
  Bases: object
  Class representing a single table column.
  - __init__(name: str, dtype: type, unique: bool, validators: typing.List[typing.Callable[[pandas.core.series.Series], pandas.core.frame.DataFrame]], recoders: typing.List[typing.Callable[[pandas.core.series.Series], pandas.core.series.Series]]) → None
    Construct a new Column object.
  - _dict_of_funcs(funcs: list) → pandas.core.series.Series
    Return a pd.Series of functions with index derived from the function name.
  - _validate_series_dtype(series: pandas.core.series.Series) → pandas.core.series.Series
    Validate that the series data is the correct dtype.
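The two ideas in this listing, a Series of functions indexed by name and a per-column table of pass/fail results, can be approximated in plain pandas as follows. This is a sketch, not the package's implementation: `series_of_funcs` and `validation_table` are hypothetical helpers, and a plain dict stands in for the munch.Munch return value.

```python
import pandas as pd


def not_null(series: pd.Series) -> pd.Series:
    return series.notnull()


def positive(series: pd.Series) -> pd.Series:
    return series > 0


def series_of_funcs(funcs) -> pd.Series:
    """Index a list of functions by each function's __name__."""
    names = [f.__name__ for f in funcs]
    return pd.Series(list(funcs), index=names, dtype=object)


def validation_table(series: pd.Series, funcs) -> pd.DataFrame:
    """Build one boolean column per validator, aligned to the input."""
    named = series_of_funcs(funcs)
    return pd.DataFrame({name: func(series) for name, func in named.items()})


table = pd.DataFrame({"weight": [70.5, None, -1.0]})

# One DataFrame of results per column, keyed by column name.
report = {
    col: validation_table(table[col], [not_null, positive])
    for col in table.columns
}
```

Each DataFrame in `report` shows, row by row, which validators passed, which is useful for locating the specific items that still need recoding.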
Module contents¶
Top-level package for Table Enforcer.
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Types of Contributions¶
Report Bugs¶
Report bugs at https://github.com/xguse/table_enforcer/issues.
If you are reporting a bug, please include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.
Fix Bugs¶
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Implement Features¶
Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.
Write Documentation¶
Table Enforcer could always use more documentation, whether as part of the official Table Enforcer docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback¶
The best way to send feedback is to file an issue at https://github.com/xguse/table_enforcer/issues.
If you are proposing a feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!¶
Ready to contribute? Here’s how to set up table_enforcer for local development.
Fork the table_enforcer repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/table_enforcer.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv table_enforcer
$ cd table_enforcer/
$ python setup.py develop
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:
$ flake8 table_enforcer tests
$ python setup.py test  # or: py.test
$ tox
To get flake8 and tox, just pip install them into your virtualenv.
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines¶
Before you submit a pull request, check that it meets these guidelines:
- The pull request should include tests.
- If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
- The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check https://travis-ci.org/xguse/table_enforcer/pull_requests and make sure that the tests pass for all supported Python versions.
Credits¶
Development Lead¶
- Gus Dunn <w.gus.dunn@gmail.com>
Contributors¶
None yet. Why not be the first?
History¶
v0.1.5 / 2018-02-01¶
- Added tests for imports and more Class behavior
- main_classes: calling recode with validate is now preferred
v0.1.4 / 2018-01-26¶
- main_classes.py: removed faulty imports
v0.1.3 / 2018-01-26¶
- corrected Usage_Demo.ipynb
- formatting and typing
- table_enforcer.py -> main_classes.py
v0.1.2 / 2017-11-17¶
- flake8
- set up basic testing
- changed travis build settings
- updated usage demo and readme
v0.1.1 / 2017-11-16¶
- Added usage notebook link to docs.
- reorganized import strategy of Enforcer/Column objs
- added more builtin validators/recoders/decorators
- updated reqs
- initialized travis integration
- updated docs
- Added usage demo notebook for docs
- updated ignore patterns
- validators.py: renamed
v0.1.0 / 2017-11-15¶
- first minimally functional package
- Enforcer and Column classes defined and operational
- small cadre of built-in validator functions and decorators
- ignore jupyter stuff
- linter setups
v0.0.1 / 2017-11-14¶
- First commit