Welcome to Table Enforcer’s documentation!

Table Enforcer

Demo Usage

Have a look at this Demo Notebook

Description

A Python package to facilitate the iterative process of developing and using schema-like representations of pandas DataFrames for recoding and validating instances of these data.

This is a very young attempt at solving a recurrent problem many people have. So far I have looked at multiple solutions, but none really did it for me.

I need to load, recode, and validate tables all day, every day. Sometimes it's simple; you can pandas.read_table() and all is good. But sometimes you have a 400-column REDCap data dump that is complicated af and you need to develop your recoding logic through an iterative process.

This is an attempt to apply a sort of “test driven development” approach to data cleaning.

Basic Workflow

  1. For each column that you care about in your source table:

    1. Define a Column object that represents the ideal state of your data by passing a list of small, independent, reusable validator functions and some descriptive information.

    2. Use this object to validate the column data from your source table.

      • It WILL fail.

    3. Add small, composable, reusable recoding functions to the column object and iterate until your validations pass.

  2. Define an Enforcer object by passing it a list of your column representation objects.

  3. This enforcer can be used to recode or validate recoded tables of the same kind as your source table wherever your applications use that type of data.
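The loop in step 1 can be sketched with plain pandas. This is only an illustration of the validate, recode, validate-again cycle, not the package's own Column or Enforcer classes; every function name and the sample data below are hypothetical.

```python
import pandas as pd


# Validators: take a Series, return a boolean Series with the same index.
def is_upper(series):
    return series.map(lambda x: isinstance(x, str) and x.isupper())


def no_outer_whitespace(series):
    return series.map(lambda x: isinstance(x, str) and x == x.strip())


# Recoders: take a Series, return a transformed Series with the same index.
def strip(series):
    return series.str.strip()


def upper(series):
    return series.str.upper()


def run_validators(series, validators):
    """Return a DataFrame with one boolean column per validator."""
    return pd.DataFrame({v.__name__: v(series) for v in validators})


def run_recoders(series, recoders):
    """Pass the series through each recoder in order."""
    for recoder in recoders:
        series = recoder(series)
    return series


raw = pd.Series(["ab1", "Cd2", " EF3 "], name="sample_id")
validators = [is_upper, no_outer_whitespace]

before = run_validators(raw, validators)     # step 1.2: some checks fail
recoded = run_recoders(raw, [strip, upper])  # step 1.3: add recoders
after = run_validators(recoded, validators)  # every check now passes
```

Inspecting `before` shows which rows fail which check, which is what guides the choice of the next recoder to add.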

Note

Soon, I want to add more kinds of Column objects that implement one-to-many and many-to-one recoding logic. Sometimes a column tries to do too much and should really be split into multiple columns, and sometimes several columns should be merged into one.

Please take a look and offer thoughts/advice.

Features

  • Enforcer and Column classes to define what columns should look like in a table.
  • Small but growing cadre of built-in validator functions and decorators.
  • Decorators for use in defining parameterized validators like between_4_and_60().
  • Declaration syntax for Enforcer is loosely based on SQLAlchemy’s Table pattern.

Credits

This package was created with Cookiecutter and the xguse/cookiecutter-pypackage project template which is based on audreyr/cookiecutter-pypackage.

Installation

From sources

The sources for Table Enforcer can be downloaded from the GitHub repo.

You can either clone the public repository:

$ git clone https://github.com/xguse/table_enforcer

Or download the tarball:

$ curl  -OL https://github.com/xguse/table_enforcer/tarball/master

Once you have a copy of the source, you can install it with the following steps:

  1. Navigate to the main repository directory.
  2. Activate the virtual environment in which you want the package installed.
  3. Run the following command:

$ pip install .

Usage

Have a look at this Demo Notebook

Source Code Documentation

table_enforcer package

Subpackages

table_enforcer.recode package
Submodules
table_enforcer.recode.funcs module

Provide builtin recoder functions for common use cases.

Like validators, recoders take a single pandas.Series object as input and return a pandas.Series of the same shape and index as the original. However, instead of returning a series of True/False values, a recoder performs some operation on the data that brings the column closer to how you want it to look during analysis.
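A custom recoder that follows this contract might look like the sketch below. `collapse_whitespace` is a hypothetical function, not part of the package; it only illustrates the "Series in, Series with the same index out" shape.

```python
import pandas as pd


def collapse_whitespace(series: pd.Series) -> pd.Series:
    """Hypothetical recoder: trim the ends and squeeze inner whitespace runs."""
    return series.str.strip().str.replace(r"\s+", " ", regex=True)


raw = pd.Series(["  a  b ", "c\td"], index=["row1", "row2"])
recoded = collapse_whitespace(raw)  # the index labels are carried over unchanged
```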

table_enforcer.recode.funcs.lower(series)[source]

Transform all text to lowercase.

table_enforcer.recode.funcs.upper(series)[source]

Transform all text to uppercase.

Module contents
table_enforcer.validate package
Submodules
table_enforcer.validate.decorators module

Provide decoration functions to augment the behavior of validator functions.

table_enforcer.validate.decorators.bounded_length(low, high=None)[source]

Test that the length of each data item falls within the range: low <= x <= high.

If high is None, treat as exact length.
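The length rule, including the exact-length case when high is None, can be sketched as a plain function. This is not the decorator itself; `length_between` is a hypothetical name used only to show the comparison logic on string data.

```python
import pandas as pd


def length_between(series, low, high=None):
    """Hypothetical check: True where len(item) is within [low, high]."""
    lengths = series.str.len()
    if high is None:
        # No upper bound given: require an exact length of `low`.
        return lengths == low
    return (low <= lengths) & (lengths <= high)


codes = pd.Series(["ab", "abc", "abcd"])
exact_three = length_between(codes, 3)
two_to_three = length_between(codes, 2, 3)
```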

table_enforcer.validate.decorators.choice(choices)[source]

Test that the data items are members of the set choices.

table_enforcer.validate.decorators.minmax(low, high)[source]

Test that the data items fall within range: low <= x <= high.
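One common way to build a parameterized validator from such a decorator is a decorator factory, sketched below. The `in_range` decorator is a hypothetical stand-in written for this example; it is not the package's implementation of minmax, and how the real decorator treats the wrapped function is an assumption here.

```python
import functools

import pandas as pd


def in_range(low, high):
    """Hypothetical decorator factory: low <= x <= high for each item."""
    def decorator(func):
        @functools.wraps(func)  # keep the validator's own name and docstring
        def wrapper(series):
            values = func(series)  # the wrapped function may pre-process the data
            return (low <= values) & (values <= high)
        return wrapper
    return decorator


@in_range(low=4, high=60)
def between_4_and_60(series):
    """Test that the data items are between 4 and 60 inclusive."""
    return series


result = between_4_and_60(pd.Series([3, 4, 60, 61]))
```

Using `functools.wraps` keeps the name `between_4_and_60` on the decorated function, which matters if results are later labelled by function name.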

table_enforcer.validate.funcs module

Provide builtin validator functions for common use cases.

In general, validators take a single pandas.Series object as input and return a pandas.Series of the same shape and indexes containing True or False relative to which items passed the validation logic.
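The validator contract can be illustrated with two small stand-alone functions. Both are hypothetical examples written for this sketch rather than the package's built-ins; in particular, marking every occurrence of a repeated value as failing is an assumption.

```python
import pandas as pd


def is_positive(series: pd.Series) -> pd.Series:
    """True where the item is greater than zero."""
    return series > 0


def no_repeats(series: pd.Series) -> pd.Series:
    """True where the item occurs exactly once in the series."""
    return ~series.duplicated(keep=False)


data = pd.Series([5, -1, 5, 2], index=list("abcd"))
positive_mask = is_positive(data)
unique_mask = no_repeats(data)
```

Each result has the same index as `data`, so the masks can be used directly to select the failing rows, for example `data[~positive_mask]`.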

table_enforcer.validate.funcs.lower(series)[source]

Test that the data items are all lowercase.

table_enforcer.validate.funcs.negative(series: pandas.core.series.Series) → pandas.core.series.Series[source]

Test that the data items are negative.

table_enforcer.validate.funcs.not_null(series: pandas.core.series.Series) → pandas.core.series.Series[source]

Test that the data items are not null.

table_enforcer.validate.funcs.positive(series: pandas.core.series.Series) → pandas.core.series.Series[source]

Test that the data items are positive.

table_enforcer.validate.funcs.unique(series: pandas.core.series.Series) → pandas.core.series.Series[source]

Test that the data items do not repeat.

table_enforcer.validate.funcs.upper(series)[source]

Test that the data items are all uppercase.

Module contents

Submodules

table_enforcer.errors module

Provide error classes.

exception table_enforcer.errors.NotImplementedYet(msg=None)[source]

Bases: NotImplementedError, table_enforcer.errors.TableEnforcerError

Raise when a section of code that has been left for another time is asked to execute.

__init__(msg=None)[source]

Set up the Exception.

exception table_enforcer.errors.TableEnforcerError[source]

Bases: Exception

Base error class.

exception table_enforcer.errors.ValidationError[source]

Bases: table_enforcer.errors.TableEnforcerError

Raise when a validation/sanity check comes back with unexpected value.
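Because every error derives from TableEnforcerError, calling code can catch the base class to handle any package-specific failure. The sketch below re-creates the documented hierarchy with stand-in classes so that it runs on its own; the default message in `NotImplementedYet` is an assumption, not the package's wording.

```python
class TableEnforcerError(Exception):
    """Stand-in for the documented base error class."""


class ValidationError(TableEnforcerError):
    """Stand-in: a validation check came back with an unexpected value."""


class NotImplementedYet(NotImplementedError, TableEnforcerError):
    """Stand-in: code left for another time was asked to execute."""

    def __init__(self, msg=None):
        if msg is None:
            msg = "This feature is not implemented yet."  # assumed default text
        super().__init__(msg)


def caught_by_base(exc):
    """Raise exc and report the class name if the base class catches it."""
    try:
        raise exc
    except TableEnforcerError as err:
        return type(err).__name__


names = [caught_by_base(ValidationError("bad value")),
         caught_by_base(NotImplementedYet())]
```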

table_enforcer.main_classes module

Main module.

class table_enforcer.main_classes.Enforcer(columns)[source]

Bases: object

Class to define table definitions.

__init__(columns)[source]

Initialize an enforcer instance.

make_validations(table: pandas.core.frame.DataFrame) → munch.Munch[source]

Return a dict-like object containing dataframes of which tests passed/failed for each column.

recode(table: pandas.core.frame.DataFrame, validate=False) → pandas.core.frame.DataFrame[source]

Return a fully recoded dataframe.

If validate: raise ValidationError if validation fails.

validate(table: pandas.core.frame.DataFrame) → bool[source]

Return True if all validation tests pass; False otherwise.
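The "recode, then optionally validate and raise" behaviour can be sketched with plain pandas. This is an independent illustration, not the Enforcer implementation: the dict-of-lists layout, the function names, and the stand-in ValidationError class are all assumptions made for the example.

```python
import pandas as pd


class ValidationError(Exception):
    """Stand-in for table_enforcer.errors.ValidationError."""


def recode_table(table, recoders, validators, validate=False):
    """Recode each listed column; if validate, raise on any failed check."""
    recoded = pd.DataFrame(index=table.index)
    for name, funcs in recoders.items():
        series = table[name]
        for func in funcs:
            series = func(series)
        recoded[name] = series
    if validate:
        for name, checks in validators.items():
            for check in checks:
                if not check(recoded[name]).all():
                    raise ValidationError(f"{name} failed {check.__name__}")
    return recoded


def to_number(series):
    return pd.to_numeric(series)


def at_most_60(series):
    return series <= 60


table = pd.DataFrame({"age": ["4", "61"]})
recoders = {"age": [to_number]}
validators = {"age": [at_most_60]}

recoded = recode_table(table, recoders, validators)  # no validation requested
```

Calling `recode_table(table, recoders, validators, validate=True)` on the same data raises ValidationError, because 61 fails `at_most_60`.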

class table_enforcer.main_classes.Column(name: str, dtype: type, unique: bool, validators: typing.List[typing.Callable[[pandas.core.series.Series], pandas.core.frame.DataFrame]], recoders: typing.List[typing.Callable[[pandas.core.series.Series], pandas.core.series.Series]]) → None[source]

Bases: object

Class representing a single table column.

__init__(name: str, dtype: type, unique: bool, validators: typing.List[typing.Callable[[pandas.core.series.Series], pandas.core.frame.DataFrame]], recoders: typing.List[typing.Callable[[pandas.core.series.Series], pandas.core.series.Series]]) → None[source]

Construct a new Column object.

_dict_of_funcs(funcs: list) → pandas.core.series.Series[source]

Return a pd.Series of functions with index derived from the function name.

_validate_series_dtype(series: pandas.core.series.Series) → pandas.core.series.Series[source]

Validate that the series data is the correct dtype.

recode(table: pandas.core.frame.DataFrame, validate=False) → pandas.core.series.Series[source]

Pass the appropriate column data in table through each recoder function in series and return the final result.

If validate: raise ValidationError if validation fails.

validate(table: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Return a dataframe of validation results for the correct column in table vs the vector of validators.
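A rough picture of what a "dataframe of validation results" looks like is given below: validators are held in a Series indexed by function name, and each one contributes a boolean column. The helper names are hypothetical and the code is a sketch of the idea, not the Column class itself.

```python
import pandas as pd


def funcs_by_name(funcs):
    """Return a Series of functions indexed by each function's __name__."""
    return pd.Series(list(funcs), index=[f.__name__ for f in funcs], dtype=object)


def validate_column(table, column_name, validators):
    """Apply every validator to one column; one result column per validator."""
    series = table[column_name]
    checks = funcs_by_name(validators)
    return pd.DataFrame({name: check(series) for name, check in checks.items()})


def is_positive(series):
    return series > 0


def no_repeats(series):
    return ~series.duplicated(keep=False)


table = pd.DataFrame({"n": [1, -2, 1]})
results = validate_column(table, "n", [is_positive, no_repeats])
```

Rows of `results` line up with rows of `table`, so `results.all(axis=1)` gives a per-row pass/fail summary.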

Module contents

Top-level package for Table Enforcer.

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions

Report Bugs

Report bugs at https://github.com/xguse/table_enforcer/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.
  • Any details about your local setup that might be helpful in troubleshooting.
  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation

Table Enforcer could always use more documentation, whether as part of the official Table Enforcer docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/xguse/table_enforcer/issues.

If you are proposing a feature:

  • Explain in detail how it would work.
  • Keep the scope as narrow as possible, to make it easier to implement.
  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up table_enforcer for local development.

  1. Fork the table_enforcer repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/table_enforcer.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv table_enforcer
    $ cd table_enforcer/
    $ python setup.py develop
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ flake8 table_enforcer tests
    $ python setup.py test  # or: py.test
    $ tox
    

    To get flake8 and tox, just pip install them into your virtualenv.

  6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    
  7. Submit a pull request through the GitHub website.

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests.
  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
  3. The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check https://travis-ci.org/xguse/table_enforcer/pull_requests and make sure that the tests pass for all supported Python versions.

Tips

To run a subset of tests:

$ py.test tests.test_table_enforcer

Credits

Development Lead

Contributors

None yet. Why not be the first?

History

v0.1.5 / 2018-02-01

  • Added tests for imports and more Class behavior
  • main_classes: calling recode with validate is now preferred

v0.1.4 / 2018-01-26

  • main_classes.py: removed faulty imports

v0.1.3 / 2018-01-26

  • corrected Usage_Demo.ipynb
  • formatting and typing
  • table_enforcer.py -> main_classes.py

v0.1.2 / 2017-11-17

  • flake8
  • set up basic testing
  • changed travis build settings
  • updated usage demo and readme

v0.1.1 / 2017-11-16

  • Added usage notebook link to docs.
  • reorganized import strategy of Enforcer/Column objs
  • added more builtin validators/recoders/decorators
  • updated reqs
  • initialized travis integration
  • updated docs
  • Added usage demo notebook for docs
  • updated ignore patterns
  • validators.py: renamed

v0.1.0 / 2017-11-15

  • first minimally functional package
  • Enforcer and Column classes defined and operational
  • small cadre of built-in validator functions and decorators
  • ignore jupyter stuff
  • linter setups

v0.0.1 / 2017-11-14

  • First commit
