Welcome to Django Postgres Stats’ documentation!

Django Postgres Stats exposes statistical and datetime functions specific to Postgres to Django without making the user write raw SQL. The plan is to expand the library over time to cover many Postgres-specific functions. For now, only functions personally used by the authors have been added.

Contents:

Installation instructions

Download and installation

Installation with pip

You can use pip to install django-postgres-stats:

$ pip install django-postgres-stats

Using in Django

You will need to add the postgres_stats application to the INSTALLED_APPS setting of your Django project settings.py file.:

INSTALLED_APPS = (
    ...
    'postgres_stats',
)

Version control

Django Postgres Stats is hosted on GitHub:

https://github.com/rtidatascience/django-postgres-stats

Postgres functions

Database functions

You can use these like the standard Django database functions.

class postgres_stats.functions.DateTrunc(expression, precision, **extra)[source]

Accepts a single timestamp field or expression and returns that timestamp truncated to the specified precision. This is useful for investigating time series.

The precision named parameter can take:

  • microseconds
  • milliseconds
  • second
  • minute
  • hour
  • day
  • week
  • month
  • quarter
  • year
  • decade
  • century
  • millennium

Usage example:

checkin = Checkin.objects.
    annotate(day=DateTrunc('logged_at', 'day'),
             hour=DateTrunc('logged_at', 'hour')).
    get(pk=1)

assert checkin.logged_at == datetime(2015, 11, 1, 10, 45, 0)
assert checkin.day == datetime(2015, 11, 1, 0, 0, 0)
assert checkin.hour == datetime(2015, 11, 1, 10, 0, 0)
class postgres_stats.functions.Extract(expression, subfield, **extra)[source]

Accepts a single timestamp or interval field or expression and returns the specified subfield of that expression. This is useful for grouping data.

The subfield named parameter can take:

  • century
  • day
  • decade
  • dow (day of week)
  • doy (day of year)
  • epoch (seconds since 1970-01-01 00:00:00 UTC)
  • hour
  • isodow
  • isodoy
  • isoyear
  • microseconds
  • millennium
  • milliseconds
  • minute
  • month
  • quarter
  • second
  • timezone
  • timezone_hour
  • timezone_minute
  • week
  • year

See the Postgres documentation for details about the subfields.

Usage example:

checkin = Checkin.objects.
    annotate(day=Extract('logged_at', 'day'),
             minute=Extract('logged_at', 'minute'),
             quarter=Extract('logged_at', 'quarter')).
    get(pk=1)

assert checkin.logged_at == datetime(2015, 11, 1, 10, 45, 0)
assert checkin.day == 1
assert checkin.minute == 45
assert checkin.quarter == 4

Aggregations

You can use these like the standard Django aggregations.

class postgres_stats.aggregates.Percentile(expression, percentiles, continuous=True, **extra)[source]

Accepts a numerical field or expression and a list of fractions and returns values for each fraction given corresponding to that fraction in that expression.

If continuous is True (the default), the value will be interpolated between adjacent values if needed. Otherwise, the value will be the first input value whose position in the ordering equals or exceeds the specified fraction.

You will likely have to declare the output_field for your results. Django cannot guess what type of value will be returned.

Usage example:

from django.contrib.postgres.fields import ArrayField

numbers = [31, 83, 237, 250, 305, 314, 439, 500, 520, 526, 527, 533,
           540, 612, 831, 854, 857, 904, 928, 973]
for n in numbers:
    Number.objects.create(n=n)

results = Number.objects.all().aggregate(
    median=Percentile('n', 0.5, output_field=models.FloatField()))
assert results['median'] == 526.5

results = Number.objects.all().aggregate(
    quartiles=Percentile('n', [0.25, 0.5, 0.75],
                         output_field=ArrayField(models.FloatField())))
assert results['quartiles'] == [311.75, 526.5, 836.75]

results = Number.objects.all().aggregate(
    quartiles=Percentile('n', [0.25, 0.5, 0.75],
                         continuous=False,
                         output_field=ArrayField(models.FloatField())))
assert results['quartiles'] == [305, 526, 831]

Roadmap

In general, we want to create the ability to use Postgres functions that will give us better statistical methods. If a function can be easily accessed using Func, we do not need to implement it.

Next functions

After we have the set of functions, we can begin to introduce other parts of Postgres that are not yet available in Django.

Indices and tables