Overview

pgbedrock is an application for managing the roles, memberships, ownerships, and most importantly the permissions for tables, sequences, and schemas in a Postgres database.

Given the parameters to connect to a Postgres database (i.e. host, port, etc.) and a YAML file (a “spec”) representing the desired database configuration, pgbedrock makes sure that the configuration of that database matches the spec. If there are differences, it will alter the database to make it match the spec.

It can be run as a docker container (via docker run quay.io/squarespace/pgbedrock) or as a local command-line utility (via pip install pgbedrock).

Example

As an example, the definition for the jdoe role in the spec might look like this:

jdoe:
    can_login: yes
    is_superuser: no
    attributes:
        - PASSWORD "{{ env['JDOE_PASSWORD'] }}"
    member_of:
        - analyst
    owns:
        schemas:
            - finance_reports
        tables:
            - finance_reports.Q2_revenue
            - finance_reports.Q2_margin
    privileges:
        schemas:
            read:
                - finance
                - marketing
            write:
                - reports
        tables:
            read:
                - finance.*
                - marketing.ad_spend
                - marketing.impressions
            write:
                - reports.*
        sequences:
            write:
                - reports.*

When pgbedrock is run, it would make sure that:

  • The role jdoe exists
  • jdoe can log in
  • jdoe is not a superuser
  • jdoe’s password is the same as what is in the $JDOE_PASSWORD environment variable
  • All other role attributes for jdoe are the Postgres defaults (as defined by pg_authid).
  • jdoe is a member of the analyst role
  • jdoe is a member of no other roles
  • jdoe owns the finance_reports schema
  • jdoe owns the finance_reports.Q2_revenue and finance_reports.Q2_margin tables
  • jdoe has read-level schema access (in Postgres terms: USAGE) for the finance and marketing schemas
  • jdoe has write-level schema access (CREATE) for the reports schema
  • jdoe has read-level access (SELECT) to all tables in the finance schema and to the marketing.ad_spend and marketing.impressions tables
  • jdoe has default privileges to read from all future tables created in the finance schema
  • jdoe has write-level access (SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, and TRIGGER) to all tables in the reports schema
  • jdoe has default privileges to write to all future tables created in the reports schema
  • jdoe has write-level access (SELECT, USAGE, UPDATE) to all sequences in the reports schema
  • jdoe has default privileges to write to all future sequences created in the reports schema
  • jdoe does not have any access other than that listed above (except whatever it inherits from the analyst role that jdoe is a member of)

Quickstart

Using pgbedrock requires three steps: generating a spec for a database, reviewing that spec, and configuring the database using that spec. Below we will do this using the pgbedrock docker image, but these steps can also be done with the pip-installed version of the tool.

  1. Generate a spec for a database. Specify the connection parameters below (host, port, database, username, and user password) as well as the place to output the tentative spec. Note that the user passed with -U must be a superuser.

    docker run -it \
        quay.io/squarespace/pgbedrock generate \
        -h myhost.mynetwork.net \
        -p 5432 \
        -d mydatabase \
        -U mysuperuser \
        -w supersecret > path/to/spec.yml
    
  2. Review the spec. pgbedrock is not quite as flexible as Postgres’s permissioning, and as a result the generated spec may differ slightly from the current state of your database. For more information on these potential simplifications, see Notable Functionality And Caveats. As a result, it is recommended to run pgbedrock configure in check mode the first time you use it to see what changes it would introduce to your current setup. This looks similar to the command above, but requires us to also pass in the passwords for any roles whose passwords are managed within Postgres itself. These can be identified in the spec file as roles with a line that looks like PASSWORD "{{ env['MYROLE_PASSWORD'] }}" (if you forget to pass in these passwords pgbedrock will just throw an error and refuse to run). Note that you must run pgbedrock configure against the Postgres primary. To run pgbedrock in check mode we do the following:

    docker run -it \
        -e "JDOE_PASSWORD=${JDOE_PASSWORD}" \
        -e "JSMITH_PASSWORD=${JSMITH_PASSWORD}" \
        -v /path/to/spec.yml:/opt/spec.yml \
        quay.io/squarespace/pgbedrock configure spec.yml \
        -h myhost.mynetwork.net \
        -p 5432 \
        -d mydatabase \
        -U mysuperuser \
        -w supersecret \
        --check
    

    Note that --check is actually the default behavior, so we could also omit that.

  3. Configure the database using the spec. Once you feel comfortable with the changes pgbedrock would introduce, run the above command again using --live instead of --check. Changes will now be made real. To make future changes, modify the spec file and run the above command.

Documentation Contents

Project Goals

pgbedrock was created with several goals in mind:

  1. Simplify permission complexity.
    pgbedrock simplifies object access down to read vs. write. As a result, an administrator doesn’t need to know that within Postgres ‘read’ access is really SELECT for tables but USAGE for schemas, or that write access for schemas means CREATE but for tables it is a combination of INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, and TRIGGER.
  2. Co-locate all config.
    Within Postgres itself, role, role membership, ownership, and permission information is distributed across a variety of locations: pg_authid, pg_class, pg_namespace, pg_default_acl, and so on. As a result, it is hard to get a high-level “lay of the land”. pgbedrock puts all this config into one YAML file so it’s easy to stay on top of how the database is configured.
  3. Assert that config matches reality.
    Because information is so distributed in a normal Postgres cluster, it is easy for things to get out of sync. pgbedrock checks the YAML spec against the provided database and asserts that the two match. If they do not, it makes changes to the database to make them match, transparently reporting all of the queries that it ran to make those changes.
  4. Provide an auditable log of changes.
    By using a YAML spec, our config can be put into source control, allowing us to see who had access at any given time. In addition, each time pgbedrock runs it will output the set of SQL queries that it ran to bring the cluster in line with the spec. By storing those outputs an administrator will have an audit trail of when each change occurred.

As a knock-on benefit, by having pgbedrock run on a schedule one can enforce that config changes be put into code and through a PR process: changes made live to a cluster will be revoked the next time the tool runs, helping dissuade administrators from continually making live, unaudited changes.

Generating A Spec

The pgbedrock generate command creates a spec given a database’s current state, printing its results to STDOUT. As a result, one can create a spec with:

docker run -it \
    quay.io/squarespace/pgbedrock generate \
    -h myhost.mynetwork.net \
    -p 5432 \
    -d mydatabase \
    -U mysuperuser \
    -w supersecret > path/to/spec.yml

Alternatively, if you’d prefer to use the Python command-line interface instead, pip install pgbedrock and run the above command starting from pgbedrock generate. The rest of the command is identical.

Note that a generated spec may differ from reality due to simplifications that pgbedrock makes. For an example, see the “pgbedrock simplifies permissions down to read vs. write” bullet in the Notable Functionality And Caveats. As a result, after generating a spec it is recommended to run pgbedrock configure against it right away in check mode to see what differences exist.

In addition to roles being granted various missing write privileges, another common change seen after running pgbedrock generate is various default privilege grants occurring. If within the database there is currently a default privilege granted to a role within a schema, pgbedrock assumes that the grantee is intended to have this default privilege regardless of who creates the future object. To do this in Postgres correctly, pgbedrock needs to grant that default privileges from all roles that could create new objects (see the “Default privileges are granted for permissions like myschema.*” bullet in the Notable Functionality And Caveats section for more details).

Making Configuration Changes

The pgbedrock configure command takes a set of parameters for connecting to a database and a path to a YAML spec and modifies the database so that it matches the spec file. In check mode these changes will be reported but not committed.

One can configure a database with:

docker run -it \
    -e "JDOE_PASSWORD=${JDOE_PASSWORD}" \
    -e "JSMITH_PASSWORD=${JSMITH_PASSWORD}" \
    -v /path/to/spec.yml:/opt/spec.yml \
    quay.io/squarespace/pgbedrock configure spec.yml \
    -h myhost.mynetwork.net \
    -p 5432 \
    -d mydatabase \
    -U mysuperuser \
    --prompt \
    --check \
    --attributes \
    --memberships \
    --no-ownerships \
    --no-privileges

A few notes on the above:

  • We use -it here because we are not providing a password as an input variable. Instead, we will bring up an interactive password prompt (via --prompt). If we instead just passed in a password with -w then we would not need to use -it.
  • We use -t so docker allocates a pseudo-tty for us, which allows us to see the progress bars as pgbedrock works. This isn’t strictly necessary if you don’t want to see the progress bars.
  • Because our spec.yml has templated passwords for the jdoe and jsmith roles, we pass in the environment variables for those passwords to our docker container (note that here we’re passing them from environment variables in our own environment; obviously you could just hard-code them in if you wanted, i.e. -e "JDOE_PASSWORD=rumplestiltskin").
  • The role we provide with -U must be a superuser since they will need the ability to modify roles, memberships, schema ownership, and privileges.
  • We use --prompt to have an interactive prompt come up for us to put in our password.
  • We use --check to be sure that our changes will run in check mode, meaning that we will see what pgbedrock would change, but it will not actually commit those changes to our database cluster at the end of execution. Note that check mode is the default, so we would not have to provide this flag, but it is still a good idea to do so to be safe and explicit. If we wanted the changes pgbedrock makes to be committed we would instead use the --live flag.
  • We choose to run only the attributes and memberships submodules here. In general it is a good idea to run all of the submodules (which is the default), but it can be useful to only use a subset if you are just tweaking a spec and checking what would change.

Further details on the meanings of parameters that pgbedrock accepts can be found by running docker run quay.io/squarespace/pgbedrock configure --help.

Also note that above we are running pgbedrock through a docker container, but if you’d prefer to use the Python command-line interface instead, pip install pgbedrock and run the above command starting from pgbedrock configure. The rest of the command is identical. Note that any environment variables that you have templated into your spec.yml file must be set within your shell.

Spec Overview

At A Glance

The spec.yml file is a YAML document that holds all information about roles, role memberships, object ownerships, and privileges for a given database. It is best generated programmatically with pgbedrock generate.

The spec.yml is comprised of a number of role definitions. An example role definition within this file may look something like the below:

jdoe:
    can_login: yes
    is_superuser: no
    attributes:
        - PASSWORD "{{ env['JDOE_PASSWORD'] }}"
    member_of:
        - analyst
    owns:
        schemas:
            - finance_reports
        tables:
            - finance_reports.Q2_revenue
            - finance_reports.Q2_margin
    privileges:
        schemas:
            read:
                - finance
                - marketing
            write:
                - reports
        tables:
            read:
                - finance.*
                - marketing.ad_spend
                - marketing.impressions
            write:
                - reports.*
        sequences:
            write:
                - reports.*

All items other than the role name itself are optional. As a result, if you wanted to create a role foo with all defaults you could do so with just:

foo:

A role definition can include any of the keywords listed below.

Keywords

attributes
Type Default
list Empty

Items in the list may be any of the following attributes accepted by Postgres’s CREATE ROLE statement. Most attributes can be preceeded by ‘NO’ to negate them:

Keyword Default
BYPASSRLS NOBYPASSRLS
CONNECTION LIMIT <int> -1
CREATEDB NOCREATEDB
CREATEROLE NOCREATEROLE
INHERIT INHERIT
PASSWORD <password> None
REPLICATION NOREPLICATION
VALID UNTIL <date string> ‘infinity’
can_login
Type Default
bool False
has_personal_schema
Type Default
bool False

Whether the role should have a personal schema as defined in the “personal_schemas are supported” bullet in Notable Functionality And Caveats.

is_superuser
Type Default
bool False
member_of
Type Default
list Empty

The roles that this role is a member of. Within Postgres, this means that if roleA is a member of roleB, then roleA will inherit all privileges that roleB has.

owns
Type Default
dict Empty

The objects that this role owns. At present pgbedrock manages schema, table, and sequence ownership. Each of these objects is provided as a keyword followed by a list of the objects of that kind that is owned by this role. For example:

analyst:
    owns:
        schemas:
            - finance
        sequences:
            - finance.*
        tables:
            - finance.*
            - marketing.ad_spend
privileges
Type Default
dict Empty

The privileges section may be easiest to explain with an example:

analyst:
    can_login: no
    privileges:
        schemas:
            read:
                - finance
                - marketing
            write:
                - reports
        tables:
            read:
                - finance.*
                - marketing.*
            write:
                - reports.*

Here we have a role analyst that will be used as a group role (i.e. it has no login access, but we will grant it to each of our analyst employees so that they inherit its permissions). We have given this analyst role read access on the finance and marketing schemas and to all tables in them, as well as write access to the reports schema and to all tables in it.

The above example shows the general structure of the privileges section: the first keys within it are the object types. pgbedrock currently supports schemas, sequences, and tables as object types, each of which is optional to include. Within each object type, we have keys for read and write, also both optional. Under each of these entries we have a list of the items to grant to.

Note that the foo.* syntax is not a regex expression but rather a shorthand for listing everything in the schema. As a result, putting foo.bar* (to get tables foo.barn or foo.barbados) won’t work; only foo.* will work.

Password Management

Password management deserves some additional clarification. Since passwords shouldn’t be stored in plain text in version control, pgbedrock takes user-provided environment variables to fill in passwords. For example, one could have a role defined as:

myrole:
    attributes:
        - PASSWORD "{{ env['MYROLE_PASSWORD'] }}"

Note that the environment variable can be named whatever you would like. As long as that variable exists in the environment, pgbedrock will use it. If a variable is declared in the spec template but does not exist in the environment, pgbedrock will refuse to run and will report the name of the missing environment variable in its error message.

Note that if you are running pgbedrock through docker you will need to pass these environment variables into the docker container. This can be done using the -e flag for docker run as shown in the example for the Making Configuration Changes section above.

Notable Functionality And Caveats

  • Only Postgres 9.5, 9.6, and 10 are currently supported
    Support for older Postgres versions is unlikely to be prioritized.
  • pgbedrock will not delete or alter any objects
    pgbedrock is explicitly written to not do anything destructive toward the objects in the database. A revoked permission can simply be re-granted, but a table/schema/sequence that has been deleted is gone for good (unless you have backups). As a result, pgbedrock will not delete any objects, including roles, schemas, tables, and sequences. pgbedrock will configure these objects, but if they need to be deleted you will have to do that manually. If one of these objects is not listed in the spec.yml file then pgbedrock will refuse to run, alerting the user of the discrepancy and asking them to manually take action (i.e. delete the role / schema / table / sequence or add it to the spec).
  • Ownership and privilege management currently supports only schemas, tables, and sequences
    Support for managing ownership and privileges of other objects (for example: functions, foreign data wrappers, foreign servers, etc.) is not avaiable but may be added in the future.
  • Roles and memberships are cluster-wide in Postgres
    This means that if you have multiple databases within one Postgres instance, all of those databases share the same roles and role memberships. The consequence of this is that if you use pgbedrock to manage all of those databases, then you will need to list the roles and role memberships in each database’s spec file.
  • pgbedrock simplifies permissions down to read vs. write
    In our experience, this is easier to reason about, easier to remember, and is a sufficient level of granularity for most use cases. However, a consequence of this is that if you do use more fine-grained controls in your database then you will need to be more permissive or restrictive in your permissions in order to use pgbedrock (or, even better, put in a pull request to add support for finer-grained controls to pgbedrock!). As a concrete example, if roleA currently has INSERT permission to a table, then to use pgbedrock you will have to decide whether they will get read access (and thus lose that INSERT permission) or write access (and thus get UPDATE, DELETE, etc. permissions as well). If the spec is created with pgbedrock generate, pgbedrock will take the latter approach (i.e. granting additional write-level access), so make sure to check the initial spec after generating it to verify that any changes it introduces are acceptable.
  • Default privileges are granted for permissions like myschema.*
    When a permission grant looks like myschema.*, pgbedrock interprets that to mean “grant this permission for all existing tables and for all future tables too” (i.e. a default privilege). However, default privileges in Postgres are only applied to new tables created by the role that granted the privilege, meaning that if roleA grants default SELECT privileges on tables to roleB, then those default privileges will apply if and only if roleA is the one who creates a subsequent table. If instead roleC creates a table then the default privileges won’t happen. To deal with this, when pgbedrock sees myschema.* it will identify all roles that have the ability to create objects in that schema and grant default privileges from each of these roles to the role that should have the default privileges.
  • personal_schemas are supported
    It is common to give users a “sandbox” where they can create objects, modify them, delete them, etc. A typical way to do this is to create a schema with the same name as the role and let them own it, i.e. the role jdoe would own the schema jdoe. Every object in the jdoe schema should thus be owned by jdoe. pgbedrock supports this concept in a few ways. First, by specifying has_personal_schema: yes for a role, a personal schema will be created if it does not exist. If the schema already exists, pgbedrock will make sure that the schema and all objects in it that pgbedrock manages are owned by this role, making changes to ownership to make this true. Finally, personal_schemas can be used as a special term in privilege grants. For example, a role can be given read-level table privileges to personal_schemas.*, which will let that role read all tables in all personal schemas in the database. To be a personal schema, the schema must be owned by a role with the same name as the schema and that role must be able to login.

Development

Several functionalities for testing and debugging are described below.

Debugging With Verbose Mode

To see all queries executed by pgbedrock as it runs, run pgbedrock with the --verbose flag. Note that this will likely produce a lot of output, so you may want to tee it into a log file.

Getting Set Up For Local Development

First, get your Python environment set up:

mkvirtualenv pgbedrock3 --python python3
pip3 install -e . -r requirements-dev.txt -r requirements-publish.txt

Note that if the pip install step fails on psycopg2 you may have to do the following:

  • brew install postgresql openssl
  • xcode-select --install, followed by a restart of your machine
  • If you still get an error about a library for -lssl not found, then you have two options: brew reinstall python to get Python to use brew’s OpenSSL, or explicitly tell pip to use Brew’s OpenSSL via LDFLAGS="-L$(brew --prefix openssl)/lib" pip3 install psycopg2.

Testing Functionality

Various testing functionality exists:

  • make test - Run tests for both Python 2 and 3 (via docker containers) against all supported Postgres versions
  • pytest - Run tests for whichever Python version is in your virtualenv. This requires running make start_postgres first to start up a local dockerized Postgres. Also, if you’ve previously run the test suite with docker that you will need to run make clean first to clear out pytest’s cache or else pytest will error out.
  • make coverage - Check package coverage and test coverage

Releasing A New Version

If you make a PR that gets merged into master, a new version of pgbedrock can be created as follows.

  1. Increment the __version__ in the pgbedrock/__init__.py file and commit that change.

  2. Update the CHANGELOG file calling out changes to code regarding added features, new behaviors that could introduce breaking changes, and credits.

  3. Update CONTRIBUTORS, adding new contributors alphabetically according to git log –format=%an | sort | uniq, excluding duplicates and correcting author names as requested by contributors.

  4. Push a new git tag to the repo by doing:

    • Write the tag message in a dummy file called tag_message. We do this to allow multi-line tag messages
    • git tag x.x.x -F tag_message
    • git push --tags origin master
  1. Run make release-pypi.
  2. Run make release-quay. This may require doing a docker login to quay first.