Overview¶
pgbedrock is an application for managing the roles, memberships, ownerships, and most importantly the permissions for tables, sequences, and schemas in a Postgres database.
Given the parameters to connect to a Postgres database (i.e. host, port, etc.) and a YAML file (a “spec”) representing the desired database configuration, pgbedrock makes sure that the configuration of that database matches the spec. If there are differences, it will alter the database to make it match the spec.
It can be run as a docker container (via docker run quay.io/squarespace/pgbedrock
) or
as a local command-line utility (via pip install pgbedrock
).
Example¶
As an example, the definition for the jdoe
role in the spec might look like this:
jdoe:
can_login: yes
is_superuser: no
attributes:
- PASSWORD "{{ env['JDOE_PASSWORD'] }}"
member_of:
- analyst
owns:
schemas:
- finance_reports
tables:
- finance_reports.Q2_revenue
- finance_reports.Q2_margin
privileges:
schemas:
read:
- finance
- marketing
write:
- reports
tables:
read:
- finance.*
- marketing.ad_spend
- marketing.impressions
write:
- reports.*
sequences:
write:
- reports.*
When pgbedrock is run, it would make sure that:
- The role
jdoe
existsjdoe
can log injdoe
is not a superuserjdoe
’s password is the same as what is in the$JDOE_PASSWORD
environment variable- All other role attributes for
jdoe
are the Postgres defaults (as defined by pg_authid).jdoe
is a member of theanalyst
rolejdoe
is a member of no other rolesjdoe
owns thefinance_reports
schemajdoe
owns thefinance_reports.Q2_revenue
andfinance_reports.Q2_margin
tablesjdoe
has read-level schema access (in Postgres terms:USAGE
) for thefinance
andmarketing
schemasjdoe
has write-level schema access (CREATE
) for thereports
schemajdoe
has read-level access (SELECT
) to all tables in thefinance
schema and to themarketing.ad_spend
andmarketing.impressions
tablesjdoe
has default privileges to read from all future tables created in thefinance
schemajdoe
has write-level access (SELECT
,INSERT
,UPDATE
,DELETE
,TRUNCATE
,REFERENCES
, andTRIGGER
) to all tables in thereports
schemajdoe
has default privileges to write to all future tables created in thereports
schemajdoe
has write-level access (SELECT
,USAGE
,UPDATE
) to all sequences in thereports
schemajdoe
has default privileges to write to all future sequences created in thereports
schemajdoe
does not have any access other than that listed above (except whatever it inherits from theanalyst
role thatjdoe
is a member of)
Quickstart¶
Using pgbedrock requires three steps: generating a spec for a database, reviewing that spec, and configuring the database using that spec. Below we will do this using the pgbedrock docker image, but these steps can also be done with the pip-installed version of the tool.
Generate a spec for a database. Specify the connection parameters below (host, port, database, username, and user password) as well as the place to output the tentative spec. Note that the user passed with
-U
must be a superuser.docker run -it \ quay.io/squarespace/pgbedrock generate \ -h myhost.mynetwork.net \ -p 5432 \ -d mydatabase \ -U mysuperuser \ -w supersecret > path/to/spec.yml
Review the spec. pgbedrock is not quite as flexible as Postgres’s permissioning, and as a result the generated spec may differ slightly from the current state of your database. For more information on these potential simplifications, see Notable Functionality And Caveats. As a result, it is recommended to run
pgbedrock configure
in check mode the first time you use it to see what changes it would introduce to your current setup. This looks similar to the command above, but requires us to also pass in the passwords for any roles whose passwords are managed within Postgres itself. These can be identified in the spec file as roles with a line that looks likePASSWORD "{{ env['MYROLE_PASSWORD'] }}"
(if you forget to pass in these passwords pgbedrock will just throw an error and refuse to run). Note that you must runpgbedrock configure
against the Postgres primary. To run pgbedrock in check mode we do the following:docker run -it \ -e "JDOE_PASSWORD=${JDOE_PASSWORD}" \ -e "JSMITH_PASSWORD=${JSMITH_PASSWORD}" \ -v /path/to/spec.yml:/opt/spec.yml \ quay.io/squarespace/pgbedrock configure spec.yml \ -h myhost.mynetwork.net \ -p 5432 \ -d mydatabase \ -U mysuperuser \ -w supersecret \ --check
Note that
--check
is actually the default behavior, so we could also omit that.Configure the database using the spec. Once you feel comfortable with the changes pgbedrock would introduce, run the above command again using
--live
instead of--check
. Changes will now be made real. To make future changes, modify the spec file and run the above command.
Documentation Contents¶
Project Goals¶
pgbedrock was created with several goals in mind:
- Simplify permission complexity.
- pgbedrock simplifies object access down to read vs. write. As a result, an administrator
doesn’t need to know that within Postgres ‘read’ access is really
SELECT
for tables butUSAGE
for schemas, or that write access for schemas meansCREATE
but for tables it is a combination ofINSERT
,UPDATE
,DELETE
,TRUNCATE
,REFERENCES
, andTRIGGER
.
- Co-locate all config.
- Within Postgres itself, role, role membership, ownership, and permission information is
distributed across a variety of locations:
pg_authid
,pg_class
,pg_namespace
,pg_default_acl
, and so on. As a result, it is hard to get a high-level “lay of the land”. pgbedrock puts all this config into one YAML file so it’s easy to stay on top of how the database is configured.
- Assert that config matches reality.
- Because information is so distributed in a normal Postgres cluster, it is easy for things to get out of sync. pgbedrock checks the YAML spec against the provided database and asserts that the two match. If they do not, it makes changes to the database to make them match, transparently reporting all of the queries that it ran to make those changes.
- Provide an auditable log of changes.
- By using a YAML spec, our config can be put into source control, allowing us to see who had access at any given time. In addition, each time pgbedrock runs it will output the set of SQL queries that it ran to bring the cluster in line with the spec. By storing those outputs an administrator will have an audit trail of when each change occurred.
As a knock-on benefit, by having pgbedrock run on a schedule one can enforce that config changes be put into code and through a PR process: changes made live to a cluster will be revoked the next time the tool runs, helping dissuade administrators from continually making live, unaudited changes.
Generating A Spec¶
The pgbedrock generate
command creates a spec given a database’s current state, printing its
results to STDOUT
. As a result, one can create a spec with:
docker run -it \ quay.io/squarespace/pgbedrock generate \ -h myhost.mynetwork.net \ -p 5432 \ -d mydatabase \ -U mysuperuser \ -w supersecret > path/to/spec.yml
Alternatively, if you’d prefer to use the Python command-line interface instead, pip install
pgbedrock and run the above command starting from pgbedrock generate
. The rest of the command
is identical.
Note that a generated spec may differ from reality due to simplifications that pgbedrock makes. For
an example, see the “pgbedrock simplifies permissions down to read vs. write” bullet in the
Notable Functionality And Caveats. As a result, after generating a spec it is recommended
to run pgbedrock configure
against it right away in check mode to see what differences exist.
In addition to roles being granted various missing write privileges, another common change seen
after running pgbedrock generate
is various default privilege grants occurring. If within the
database there is currently a default privilege granted to a role within a schema, pgbedrock assumes
that the grantee is intended to have this default privilege regardless of who creates the future
object. To do this in Postgres correctly, pgbedrock needs to grant that default privileges from all
roles that could create new objects (see the “Default privileges are granted for permissions like
myschema.*
” bullet in the Notable Functionality And Caveats section for more details).
Making Configuration Changes¶
The pgbedrock configure
command takes a set of parameters for connecting to a database and a
path to a YAML spec and modifies the database so that it matches the spec file. In check mode these
changes will be reported but not committed.
One can configure a database with:
docker run -it \ -e "JDOE_PASSWORD=${JDOE_PASSWORD}" \ -e "JSMITH_PASSWORD=${JSMITH_PASSWORD}" \ -v /path/to/spec.yml:/opt/spec.yml \ quay.io/squarespace/pgbedrock configure spec.yml \ -h myhost.mynetwork.net \ -p 5432 \ -d mydatabase \ -U mysuperuser \ --prompt \ --check \ --attributes \ --memberships \ --no-ownerships \ --no-privileges
A few notes on the above:
- We use
-it
here because we are not providing a password as an input variable. Instead, we will bring up an interactive password prompt (via--prompt
). If we instead just passed in a password with-w
then we would not need to use-it
.- We use
-t
so docker allocates a pseudo-tty for us, which allows us to see the progress bars as pgbedrock works. This isn’t strictly necessary if you don’t want to see the progress bars.- Because our spec.yml has templated passwords for the jdoe and jsmith roles, we pass in the environment variables for those passwords to our docker container (note that here we’re passing them from environment variables in our own environment; obviously you could just hard-code them in if you wanted, i.e.
-e "JDOE_PASSWORD=rumplestiltskin"
).- The role we provide with
-U
must be a superuser since they will need the ability to modify roles, memberships, schema ownership, and privileges.- We use
--prompt
to have an interactive prompt come up for us to put in our password.- We use
--check
to be sure that our changes will run in check mode, meaning that we will see what pgbedrock would change, but it will not actually commit those changes to our database cluster at the end of execution. Note that check mode is the default, so we would not have to provide this flag, but it is still a good idea to do so to be safe and explicit. If we wanted the changes pgbedrock makes to be committed we would instead use the--live
flag.- We choose to run only the attributes and memberships submodules here. In general it is a good idea to run all of the submodules (which is the default), but it can be useful to only use a subset if you are just tweaking a spec and checking what would change.
Further details on the meanings of parameters that pgbedrock accepts can be found by running
docker run quay.io/squarespace/pgbedrock configure --help
.
Also note that above we are running pgbedrock through a docker container, but if you’d prefer to
use the Python command-line interface instead, pip install pgbedrock and run the above command
starting from pgbedrock configure
. The rest of the command is identical. Note that any
environment variables that you have templated into your spec.yml file must be set within your shell.
Spec Overview¶
At A Glance¶
The spec.yml file is a YAML document that holds all information about roles, role memberships,
object ownerships, and privileges for a given database. It is best generated programmatically with
pgbedrock generate
.
The spec.yml is comprised of a number of role definitions. An example role definition within this file may look something like the below:
jdoe: can_login: yes is_superuser: no attributes: - PASSWORD "{{ env['JDOE_PASSWORD'] }}" member_of: - analyst owns: schemas: - finance_reports tables: - finance_reports.Q2_revenue - finance_reports.Q2_margin privileges: schemas: read: - finance - marketing write: - reports tables: read: - finance.* - marketing.ad_spend - marketing.impressions write: - reports.* sequences: write: - reports.*
All items other than the role name itself are optional. As a result, if you wanted to create a role
foo
with all defaults you could do so with just:
foo:
A role definition can include any of the keywords listed below.
Keywords¶
attributes¶
Type Default list Empty Items in the list may be any of the following attributes accepted by Postgres’s CREATE ROLE statement. Most attributes can be preceeded by ‘NO’ to negate them:
Keyword Default BYPASSRLS NOBYPASSRLS CONNECTION LIMIT <int> -1 CREATEDB NOCREATEDB CREATEROLE NOCREATEROLE INHERIT INHERIT PASSWORD <password> None REPLICATION NOREPLICATION VALID UNTIL <date string> ‘infinity’
can_login¶
Type Default bool False
has_personal_schema¶
Type Default bool False Whether the role should have a personal schema as defined in the “personal_schemas are supported” bullet in Notable Functionality And Caveats.
is_superuser¶
Type Default bool False
member_of¶
Type Default list Empty The roles that this role is a member of. Within Postgres, this means that if
roleA
is a member ofroleB
, thenroleA
will inherit all privileges thatroleB
has.
owns¶
Type Default dict Empty The objects that this role owns. At present pgbedrock manages schema, table, and sequence ownership. Each of these objects is provided as a keyword followed by a list of the objects of that kind that is owned by this role. For example:
analyst: owns: schemas: - finance sequences: - finance.* tables: - finance.* - marketing.ad_spend
privileges¶
Type Default dict Empty The privileges section may be easiest to explain with an example:
analyst: can_login: no privileges: schemas: read: - finance - marketing write: - reports tables: read: - finance.* - marketing.* write: - reports.*Here we have a role
analyst
that will be used as a group role (i.e. it has no login access, but we will grant it to each of our analyst employees so that they inherit its permissions). We have given this analyst role read access on the finance and marketing schemas and to all tables in them, as well as write access to the reports schema and to all tables in it.The above example shows the general structure of the privileges section: the first keys within it are the object types. pgbedrock currently supports schemas, sequences, and tables as object types, each of which is optional to include. Within each object type, we have keys for read and write, also both optional. Under each of these entries we have a list of the items to grant to.
Note that the
foo.*
syntax is not a regex expression but rather a shorthand for listing everything in the schema. As a result, puttingfoo.bar*
(to get tablesfoo.barn
orfoo.barbados
) won’t work; onlyfoo.*
will work.
Password Management¶
Password management deserves some additional clarification. Since passwords shouldn’t be stored in plain text in version control, pgbedrock takes user-provided environment variables to fill in passwords. For example, one could have a role defined as:
myrole:
attributes:
- PASSWORD "{{ env['MYROLE_PASSWORD'] }}"
Note that the environment variable can be named whatever you would like. As long as that variable exists in the environment, pgbedrock will use it. If a variable is declared in the spec template but does not exist in the environment, pgbedrock will refuse to run and will report the name of the missing environment variable in its error message.
Note that if you are running pgbedrock through docker you will need to pass these environment
variables into the docker container. This can be done using the -e
flag for docker run as shown
in the example for the Making Configuration Changes section above.
Notable Functionality And Caveats¶
- Only Postgres 9.5, 9.6, and 10 are currently supported
- Support for older Postgres versions is unlikely to be prioritized.
- pgbedrock will not delete or alter any objects
- pgbedrock is explicitly written to not do anything destructive toward the objects in the database. A revoked permission can simply be re-granted, but a table/schema/sequence that has been deleted is gone for good (unless you have backups). As a result, pgbedrock will not delete any objects, including roles, schemas, tables, and sequences. pgbedrock will configure these objects, but if they need to be deleted you will have to do that manually. If one of these objects is not listed in the spec.yml file then pgbedrock will refuse to run, alerting the user of the discrepancy and asking them to manually take action (i.e. delete the role / schema / table / sequence or add it to the spec).
- Ownership and privilege management currently supports only schemas, tables, and sequences
- Support for managing ownership and privileges of other objects (for example: functions, foreign data wrappers, foreign servers, etc.) is not avaiable but may be added in the future.
- Roles and memberships are cluster-wide in Postgres
- This means that if you have multiple databases within one Postgres instance, all of those databases share the same roles and role memberships. The consequence of this is that if you use pgbedrock to manage all of those databases, then you will need to list the roles and role memberships in each database’s spec file.
- pgbedrock simplifies permissions down to read vs. write
- In our experience, this is easier to reason about, easier to remember, and is a sufficient level
of granularity for most use cases. However, a consequence of this is that if you do use more
fine-grained controls in your database then you will need to be more permissive or restrictive
in your permissions in order to use pgbedrock (or, even better, put in a pull request to add
support for finer-grained controls to pgbedrock!). As a concrete example, if roleA currently has
INSERT
permission to a table, then to use pgbedrock you will have to decide whether they will get read access (and thus lose thatINSERT
permission) or write access (and thus getUPDATE
,DELETE
, etc. permissions as well). If the spec is created withpgbedrock generate
, pgbedrock will take the latter approach (i.e. granting additional write-level access), so make sure to check the initial spec after generating it to verify that any changes it introduces are acceptable.
- Default privileges are granted for permissions like
myschema.*
- When a permission grant looks like
myschema.*
, pgbedrock interprets that to mean “grant this permission for all existing tables and for all future tables too” (i.e. a default privilege). However, default privileges in Postgres are only applied to new tables created by the role that granted the privilege, meaning that if roleA grants defaultSELECT
privileges on tables to roleB, then those default privileges will apply if and only if roleA is the one who creates a subsequent table. If instead roleC creates a table then the default privileges won’t happen. To deal with this, when pgbedrock seesmyschema.*
it will identify all roles that have the ability to create objects in that schema and grant default privileges from each of these roles to the role that should have the default privileges.
- Default privileges are granted for permissions like
- personal_schemas are supported
- It is common to give users a “sandbox” where they can create objects, modify them, delete them,
etc. A typical way to do this is to create a schema with the same name as the role and let them
own it, i.e. the role
jdoe
would own the schemajdoe
. Every object in thejdoe
schema should thus be owned byjdoe
. pgbedrock supports this concept in a few ways. First, by specifyinghas_personal_schema: yes
for a role, a personal schema will be created if it does not exist. If the schema already exists, pgbedrock will make sure that the schema and all objects in it that pgbedrock manages are owned by this role, making changes to ownership to make this true. Finally,personal_schemas
can be used as a special term in privilege grants. For example, a role can be given read-level table privileges topersonal_schemas.*
, which will let that role read all tables in all personal schemas in the database. To be a personal schema, the schema must be owned by a role with the same name as the schema and that role must be able to login.
Development¶
Several functionalities for testing and debugging are described below.
Debugging With Verbose Mode¶
To see all queries executed by pgbedrock as it runs, run pgbedrock with the --verbose
flag.
Note that this will likely produce a lot of output, so you may want to tee it into a log file.
Getting Set Up For Local Development¶
First, get your Python environment set up:
mkvirtualenv pgbedrock3 --python python3
pip3 install -e . -r requirements-dev.txt -r requirements-publish.txt
Note that if the pip install step fails on psycopg2 you may have to do the following:
brew install postgresql openssl
xcode-select --install
, followed by a restart of your machine- If you still get an error about a library for -lssl not found, then you have two options:
brew reinstall python
to get Python to use brew’s OpenSSL, or explicitly tell pip to use Brew’s OpenSSL viaLDFLAGS="-L$(brew --prefix openssl)/lib" pip3 install psycopg2
.
Testing Functionality¶
Various testing functionality exists:
make test
- Run tests for both Python 2 and 3 (via docker containers) against all supported Postgres versionspytest
- Run tests for whichever Python version is in your virtualenv. This requires runningmake start_postgres
first to start up a local dockerized Postgres. Also, if you’ve previously run the test suite with docker that you will need to runmake clean
first to clear out pytest’s cache or else pytest will error out.make coverage
- Check package coverage and test coverage
Releasing A New Version¶
If you make a PR that gets merged into master, a new version of pgbedrock can be created as follows.
Increment the
__version__
in thepgbedrock/__init__.py
file and commit that change.Update the CHANGELOG file calling out changes to code regarding added features, new behaviors that could introduce breaking changes, and credits.
Update CONTRIBUTORS, adding new contributors alphabetically according to git log –format=%an | sort | uniq, excluding duplicates and correcting author names as requested by contributors.
Push a new git tag to the repo by doing:
- Write the tag message in a dummy file called
tag_message
. We do this to allow multi-line tag messages git tag x.x.x -F tag_message
git push --tags origin master
- Write the tag message in a dummy file called
- Run
make release-pypi
. - Run
make release-quay
. This may require doing a docker login to quay first.