laforge: a low-key build system for data work.

Overview

laforge is a low-key build system designed to let Python, SQL, and Stata data work interoperate. It was originally developed internally for IRIS, the Institute for Research on Innovation and Science at the University of Michigan’s Institute for Social Research.

Features:

  • Interoperable: Read, write, and execute Python, SQL, and Stata scripts/data.

  • Straightforward: Simple build INI files designed for a one-click build.

  • No lock-in: Maintain scripts independent from laforge.

An Example Build

Directory Organization

Because laforge needs to find the scripts, and because scripts will likely interact with output from other scripts, I tend to keep related project/sub-project files together:

project
├───input
│   ├───a.csv
│   ├───b.xlsx
│   └───c.csv
├───output
│   ├───results_d.csv
│   ├───results_e.csv
│   └───results_f.csv
├───.env
├───build.ini
├───g.py
├───h.sql
├───i.py
└───j.py

.env

distro = mssql
server = MSSQL
database = testdb
schema = laforge
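
As an illustration of how sectionless key = value settings like these can be read, here is a minimal sketch using the standard library's configparser. It is not laforge's own loader, which may work differently; the `parse_env` helper and the `env` section name are hypothetical.

```python
import configparser

# The same settings shown in the .env example above.
ENV_TEXT = """\
distro = mssql
server = MSSQL
database = testdb
schema = laforge
"""


def parse_env(text, section="env"):
    """Parse sectionless 'key = value' text into a plain dict.

    configparser requires a section header, so a hypothetical one is
    prepended before parsing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(f"[{section}]\n{text}")
    return dict(parser[section])


settings = parse_env(ENV_TEXT)
print(settings["distro"], settings["schema"])
```

Note that configparser lowercases keys by default but leaves values such as MSSQL untouched.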

build.ini

[DEFAULT]

Module Documentation

builder

Builder reads and executes tasks and lists of tasks.

laforge.builder.show_env(path=None)

Show the calculated generic section environment

laforge.builder.run_build(script_path, *, log, debug=False, dry_run=False)

laforge’s core build command

class laforge.builder.Verb

An enumeration.

class laforge.builder.Target

An enumeration.

class laforge.builder.FileCall(method, kwargs)
property kwargs

Alias for field number 1

property method

Alias for field number 0
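
The "Alias for field number" descriptions are the default docstrings that collections.namedtuple generates, which suggests FileCall is a two-field named tuple. The sketch below rebuilds an equivalent type to show how such a value behaves; the method name and keyword arguments are made-up illustration values, not laforge defaults.

```python
from collections import namedtuple

# A stand-in with the same two fields as the documented FileCall.
FileCall = namedtuple("FileCall", ["method", "kwargs"])

# Hypothetical values: a reader method name plus the keyword arguments for it.
call = FileCall(method="read_csv", kwargs={"sep": ","})

# Fields are reachable by name or by position.
same_by_index = call[0] == call.method and call[1] is call.kwargs

# Like any tuple, it unpacks in field order.
method, kwargs = call
print(method, kwargs, same_by_index)
```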

exception laforge.builder.TaskConstructionError

exception laforge.builder.TaskExecutionError

class laforge.builder.TaskList(from_string, location='.')

Todo

Implement cache_results=False

load_section_config(section='DEFAULT')

Put together config from env, TaskList config, section config
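
One way to picture this layering is with collections.ChainMap. The sketch below is not laforge's implementation. In particular, the precedence (section settings override TaskList settings, which override the environment) is an assumption, and all three dictionaries hold hypothetical values.

```python
from collections import ChainMap

# Hypothetical layers, from most general to most specific.
env_config = {"distro": "mssql", "server": "MSSQL", "database": "testdb", "schema": "laforge"}
tasklist_config = {"schema": "staging"}
section_config = {"database": "analytics"}

# ChainMap returns the value from the first mapping that has the key,
# so the most specific layer is listed first (assumed precedence).
merged = dict(ChainMap(section_config, tasklist_config, env_config))
print(merged)
```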

execute()

Execute each task in the list.

Todo

Restore quiet?

dry_run()

List each task in the list.

class laforge.builder.BaseTask(*, identifier, verb, target, content, config)

Create a task to (verb) (something)

Todo

if ":" in self.content:

previous_result_key, actual_path_content = self.content.split(":")

property path

For handlers where dir[verb] + content = path

class laforge.builder.FileReader(*, identifier, verb, target, content, config)

class laforge.builder.InternalPythonExecutor(*, identifier, verb, target, content, config)

Execute (without importing) Python script by path

This will set __main__ = 'laforge'

Todo

Allow implicit/explicit return of results.
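
Executing a script by path without importing it can be done with the standard library's runpy.run_path. The sketch below shows that technique only; whether laforge uses runpy is not stated here, the `run_script` helper is hypothetical, and it assumes the note above means the script sees the name 'laforge' in `__name__`.

```python
import runpy
import tempfile
from pathlib import Path


def run_script(path, run_name="laforge"):
    """Execute a Python file by path without importing it as a module.

    Returns the script's resulting global namespace as a dict.
    """
    return runpy.run_path(str(path), run_name=run_name)


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "g.py"
    # A tiny stand-in script that records the name it was run under.
    script.write_text("seen_name = __name__\nanswer = 6 * 7\n")
    namespace = run_script(script)

print(namespace["seen_name"], namespace["answer"])
```

Because the run name is not '__main__', any `if __name__ == "__main__":` block in the script would be skipped under this approach.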
class laforge.builder.ShellExecutor(*, identifier, verb, target, content, config)

class laforge.builder.StataExecutor(*, identifier, verb, target, content, config)

class laforge.builder.SQLQueryReader(*, identifier, verb, target, content, config)

class laforge.builder.SQLExecutor(*, identifier, verb, target, content, config)

class laforge.builder.SQLReaderWriter(*, identifier, verb, target, content, config)

class laforge.builder.FileWriter(*, identifier, verb, target, content, config)

Handles all tasks writing to file.

class laforge.builder.ExistenceChecker(*, identifier, verb, target, content, config)

command

Command-line interface for laforge.

distros

exception laforge.distros.SQLDistroNotFound

sql

SQL utilities for mid-level interaction. Inspired by pathlib; powered by SQLAlchemy.

Note

Supported: MSSQL, MariaDB/MySQL, PostgreSQL, SQLite. Supportable: Firebird, Oracle, Sybase.

exception laforge.sql.SQLTableNotFound

exception laforge.sql.SQLChannelNotFound

exception laforge.sql.SQLIdentifierProblem

class laforge.sql.Channel(distro, *, server=None, database=None, schema=None, **engine_kwargs)

Abstraction from Engine, other static details.

execute_statement(statement, fetch=False)

Execute SQL (core method)

Todo

De-messify

laforge.sql.execute(statement, fetch=False, channel=None)

Convenience method, autofetches Channel if possible

class laforge.sql.Script(query, channel=None)

SQL query string, parsable by ‘go’ separation and execute()able.

execute(statements=None)

Execute itsel(f|ves)

to_table()

Executes all and tries to return a DataFrame for the result of the final query.

This is one of two ways that laforge retrieves tables.

Warning

This is limited by the capacity of Pandas to retrieve only the final result. For Microsoft SQL Server, if a lengthy set of queries is desired, the most reliable approach appears to be a single final query after a ‘go’ as a batch terminator.

Warning

This will rename columns that do not conform to naming standards.
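
To illustrate what 'go' separation means for a script, here is a minimal sketch that splits SQL text into batches wherever a line contains only the word go. It is not laforge's parser: the `split_batches` helper and the exact matching rule (case-insensitive, surrounding spaces or tabs allowed) are assumptions.

```python
import re

# Assumed rule: a batch separator is a line holding only "go", in any case.
GO_LINE = re.compile(r"^[ \t]*go[ \t]*$", flags=re.IGNORECASE | re.MULTILINE)


def split_batches(query):
    """Split SQL text at standalone 'go' lines, dropping empty batches."""
    parts = GO_LINE.split(query)
    return [part.strip() for part in parts if part.strip()]


sql = """
create table t (x int);
insert into t values (1);
GO
select x from t;
"""

batches = split_batches(sql)
final_query = batches[-1]  # the batch whose result a to_table-style call would return
print(len(batches), final_query)
```

Keeping the final query in its own batch after a 'go' line matches the approach recommended in the first warning above.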

class laforge.sql.Table(name, channel=None, **kwargs)

Represents a SQL table, featuring methods to read/write DataFrames.

Todo

Factor out to superclass to allow views

write(df, if_exists='replace')

From DataFrame, create a new table and fill it with values

read()

Return the full table as a DataFrame

drop(ignore_existence=False)

Delete the table within SQL

class laforge.sql.Scalar(prox)

Little helper to produce clearly typed single (upper left) ResultProxy result.
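
The "upper left" result is the first column of the first row. The sketch below shows that idea with the standard library's sqlite3 rather than an SQLAlchemy ResultProxy, so it is an analogy and not laforge's Scalar class; the `upper_left` helper and the table are hypothetical.

```python
import sqlite3


def upper_left(cursor):
    """Return the first column of the first row, or None if there are no rows."""
    row = cursor.fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute("create table t (x integer, y text)")
conn.executemany("insert into t values (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# Only the first value of the first row is kept; the second column is ignored.
row_count = upper_left(conn.execute("select count(*), 'ignored' from t"))
nothing = upper_left(conn.execute("select x from t where x > 99"))
conn.close()
print(row_count, nothing)
```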

class laforge.sql.Identifier(user_input, extra=None)

Single standardized variable/database/schema/table/column/anything identifier.

Todo

class InvalidIdentifierError relay_id_problem(identifier, action, reason=None, replacement=None)

tech

laforge.tech.make_first_upper(s)

Uppercase the first letter of s, leaving the rest alone.
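
The difference from the built-in str.capitalize() is that the rest of the string is left untouched. A minimal sketch of one way to write such a function follows; it is a reimplementation for illustration under the hypothetical name `first_upper`, not the laforge source.

```python
def first_upper(s):
    """Uppercase the first character of s and leave the rest unchanged."""
    # Slicing avoids an IndexError on the empty string.
    return s[:1].upper() + s[1:]


kept = first_upper("hello World")       # rest of the string preserved
lowered = "hello World".capitalize()    # built-in also lowercases the rest
print(kept, "|", lowered)
```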

toolbox

Handful of utility functions

Note

These intentionally only depend on builtins.

Note

Some copyright information within this file is identified per-block below.

laforge.toolbox.round_up(n, nearest=1)

Round n up to the nearest multiple of nearest.

Parameters
  • n

  • nearest – (Default value = 1)
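
A minimal sketch of rounding up to a chosen step, assuming "nearest" means the step size whose multiples are the allowed results. This is an illustration built on math.ceil, not the laforge source, and it is exercised here only with integers to sidestep floating-point surprises.

```python
import math


def round_up(n, nearest=1):
    """Round n up to the next multiple of nearest (assumed semantics)."""
    return math.ceil(n / nearest) * nearest


examples = [round_up(7, 5), round_up(10, 5), round_up(101, 100), round_up(2.1)]
print(examples)
```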

laforge.toolbox.prepare_to_access(path)

Make directory exist and verify that file would be writable

laforge.toolbox.verify_file_is_writable(path, retry_attempts=3, retry_seconds=5)

Check for locked file (e.g. Excel has CSV open)
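
The two helpers above can be sketched roughly as follows. This is an illustration under stated assumptions, not the laforge source: returning True or False (rather than raising), treating a missing file as writable, and catching only PermissionError are all choices made for the example.

```python
import tempfile
import time
from pathlib import Path


def verify_file_is_writable(path, retry_attempts=3, retry_seconds=5):
    """Return True once the file can be opened for writing, else False."""
    path = Path(path)
    if not path.exists():
        return True  # nothing exists yet, so nothing can hold a lock on it
    for attempt in range(1, retry_attempts + 1):
        try:
            with path.open("a"):  # append mode leaves existing content intact
                return True
        except PermissionError:
            if attempt < retry_attempts:
                time.sleep(retry_seconds)  # give the other program time to close it
    return False


def prepare_to_access(path):
    """Create the parent directory if needed, then check writability."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return verify_file_is_writable(path)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output" / "results_d.csv"
    ready = prepare_to_access(target)
    parent_created = target.parent.is_dir()

print(ready, parent_created)
```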

laforge.toolbox.flatten(foo)

Take any set of nests in an iterator and reduce it into one generator.

‘Nests’ include any iterable except strings.

Parameters

foo

Note

flatten() was authored by Amber Yust at https://stackoverflow.com/a/5286571. This function is not claimed under the laforge license.
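
For readers unfamiliar with the pattern, here is a minimal recursive-generator sketch of the behavior described above. It is written independently for illustration and is not the credited Stack Overflow code or the laforge source; the name `flatten_sketch` is hypothetical.

```python
from collections.abc import Iterable


def flatten_sketch(nested):
    """Yield the leaves of arbitrarily nested iterables, treating strings as leaves."""
    for item in nested:
        if isinstance(item, Iterable) and not isinstance(item, str):
            # Recurse into lists, tuples, sets, generators, and so on.
            yield from flatten_sketch(item)
        else:
            yield item


flat = list(flatten_sketch([1, [2, (3, "ab")], [[4], []], "cd"]))
print(flat)
```

Strings must be excluded because iterating a one-character string yields that same string again, which would recurse without end.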

Contributing to Development

laforge supports Python 3.6+.

Process

Tool

Documentation

Automation

Nox

https://nox.readthedocs.io/

Test

pytest

https://docs.pytest.org/

Test coverage

pytest-cov

https://pytest-cov.readthedocs.io/

Format

Black

https://black.readthedocs.io/

Lint

Flake8

http://flake8.pycqa.org/

List more

Pylint

https://pylint.readthedocs.io/en/latest/

Document

Sphinx

https://www.sphinx-doc.org/

Suggested Environment

# Create virtual environment
python -m venv .venv

# Activate virtual environment with shell-specific script:
. .venv/bin/activate.fish             # fish
# source ./.venv/bin/activate         # bash
# source ./.venv/bin/activate.csh     # csh
# Note that Python for Windows creates .venv\Scripts\ rather than .venv/bin/
# .\.venv\Scripts\Activate.ps1        # PowerShell
# .venv\Scripts\activate.bat          # cmd

# Install packages
python -m pip install -r requirements.txt

# Optional [packages] to include Excel and/or non-SQLite databases
python -m pip install -e ".[mysql]"

# Run tests
python -m pytest

# Run the gauntlet
python -m nox

Embedded TODOs

Todo

Implement cache_results=False

(original entry: laforge.builder.TaskList)

Todo

Restore quiet?

(original entry: laforge.builder.TaskList.execute)

Todo

if ":" in self.content:

previous_result_key, actual_path_content = self.content.split(":")

(original entry: laforge.builder.BaseTask)

Todo

De-messify

(original entry: laforge.sql.Channel.execute_statement)

Todo

Factor out to superclass to allow views

(original entry: laforge.sql.Table)

Todo

class InvalidIdentifierError relay_id_problem(identifier, action, reason=None, replacement=None)

(original entry: laforge.sql.Identifier)

Docstring Gaps

Undocumented Python objects
===========================
laforge.builder
---------------
Functions:
 * find_build_config_in_directory
 * get_package_logger
 * get_verb
 * is_verb
 * run_cmd
 * seconds_since

Classes:
 * BaseTask -- missing methods:

   - implement
   - validate_results
 * ExistenceChecker -- missing methods:

   - implement
 * FileReader -- missing methods:

   - implement
 * FileWriter -- missing methods:

   - implement
   - write
 * InternalPythonExecutor -- missing methods:

   - implement
 * SQLExecutor -- missing methods:

   - implement
 * SQLQueryReader -- missing methods:

   - implement
 * SQLReaderWriter -- missing methods:

   - implement
 * ShellExecutor -- missing methods:

   - implement
 * StataExecutor -- missing methods:

   - implement
 * Task
 * TaskList -- missing methods:

   - load_tasks
   - template_content

laforge.command
---------------
Functions:
 * user_confirms_cleartext

laforge.distros
---------------
Classes:
 * Distro
 * MSSQL
 * MySQL
 * PostgresQL
 * SQLite

laforge.sql
-----------
Functions:
 * fix_bad_columns

Classes:
 * Channel -- missing methods:

   - clean_up_statement
   - find
   - grab
   - retrieve_engine
   - save_engine
 * Identifier -- missing methods:

   - check
 * Script -- missing methods:

   - read
 * Table -- missing methods:

   - exists
   - resolve

laforge.tech
------------
Functions:
 * capitalize_sentences
 * nobabble

Classes:
 * ModifiableVerb
 * Technobabbler

laforge.toolbox
---------------
Functions:
 * is_reserved_word

Free Software License

Copyright 2019 Matt VanEseltine.

laforge is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

laforge is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details. Copies are attached with this documentation and available online at https://www.gnu.org/licenses/agpl.html.