Overview

Buildhub aims to provide a public database of comprehensive information about Mozilla product releases and builds.

Concretely, it is a JSON API (Kinto) where you can query a collection of records.

Quickstart

Browse the database

Basic JSON API

Buildhub is just a collection of records on a Kinto server.

A set of filters and pagination options can be used to query the collection. See the dedicated section.

Elasticsearch API

An ElasticSearch endpoint is also available for more powerful queries:

More information in the Elasticsearch documentation

API

The Buildhub API is just a Kinto instance with a collection of records, coupled with ElasticSearch. A series of jobs is in charge of keeping those records up to date when new releases are published.

Clients

Since it is an HTTP API and the records are public, you can simply use any HTTP client, like curl or HTTPie.

But for more convenience, especially regarding pagination and error handling, some dedicated libraries are also available:

Data

A single record has the following fields (see below for more details):

{
    "id": "firefox_beta_50-0b11_macosx_el",
    "schema": 1497453926485,
    "last_modified": 1498140377629,
    "target": {
        "platform": "macosx",
        "os": "mac",
        "version": "50.0b11",
        "channel": "beta",
        "locale": "el"
    },
    "build": {
        "id": "20161027110534",
        "date": "2016-10-27T11:05:34Z",
        "number": 2,
        "as": "$(CC)",
        "ld": "ldd",
        "cc": "/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc -std=gnu99",
        "cxx": "/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++ -std=gnu++11",
        "host": "x86_64-pc-linux-gnu",
        "target": "x86_64-pc-linux-gnu",
        "target": "x86_64-pc-linux-gnu"
    },
    "source": {
        "tree": "releases/mozilla-beta",
        "product": "firefox",
        "revision": "829a3f99f2606759305e3db204185242566a4ca6",
        "repository": "https://hg.mozilla.org/releases/mozilla-beta"
    },
    "download": {
        "mimetype": "application/x-apple-diskimage",
        "date": "2016-10-28T00:56:42Z",
        "size": 86180614,
        "url": "https://archive.mozilla.org/pub/firefox/releases/50.0b11/mac/el/Firefox 50.0b11.dmg"
    }
}
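
As a rough illustration of how these fields nest, the following Python sketch builds a one-line summary from a record. The summarize helper is hypothetical (it is not part of Buildhub), and the record is a trimmed copy of the example above.

```python
def summarize(record):
    """Return a short human-readable summary of a Buildhub record."""
    source = record["source"]
    target = record["target"]
    # build.id is documented as optional, so fall back to a placeholder.
    build_id = record.get("build", {}).get("id", "unknown")
    return "{} {} ({}, {}, {}) build {}".format(
        source["product"],
        target["version"],
        target["channel"],
        target["platform"],
        target["locale"],
        build_id,
    )


# Trimmed copy of the example record shown above.
record = {
    "id": "firefox_beta_50-0b11_macosx_el",
    "target": {
        "platform": "macosx",
        "os": "mac",
        "version": "50.0b11",
        "channel": "beta",
        "locale": "el",
    },
    "build": {"id": "20161027110534", "date": "2016-10-27T11:05:34Z"},
    "source": {"product": "firefox", "tree": "releases/mozilla-beta"},
}

print(summarize(record))
```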

Listing records

Elasticsearch API

An ElasticSearch endpoint is available for faster and more powerful queries. It powers the online catalog.

Simple queries can be done with the QueryString:

Or via a POST request:

echo '{
  "query": {
    "bool": {
      "filter": [{
        "term": {
          "source.product": "devedition"
        }
      }]
    }
  },
  "size": 100
}' | http POST $SERVER/buckets/build-hub/collections/releases/search

Note

For aggregations (i.e. distinct values) there is no need to retrieve the whole set of results. For example:

echo '{
  "aggs": {
    "platforms": {
      "terms": {
        "field": "target.platform",
        "size": 100
      }
    }
  },
  "size": 0
}' | http POST $SERVER/buckets/build-hub/collections/releases/search

More information in the Elasticsearch documentation
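
The two request bodies above follow the same pattern, so they can be generated from a small helper. This is only a sketch: build_search_body and the "values" aggregation name are hypothetical, and the resulting dictionary still has to be sent as JSON to the search endpoint with the HTTP client of your choice.

```python
import json


def build_search_body(filters, aggregate_field=None, size=100):
    """Build an Elasticsearch request body from exact-match (term) filters.

    When aggregate_field is given, the body asks for the distinct values of
    that field and sets "size" to 0 so that no individual hit is returned.
    """
    body = {"size": size}
    if filters:
        body["query"] = {
            "bool": {
                "filter": [
                    {"term": {field: value}}
                    for field, value in sorted(filters.items())
                ]
            }
        }
    if aggregate_field is not None:
        body["aggs"] = {
            "values": {"terms": {"field": aggregate_field, "size": 100}}
        }
        body["size"] = 0
    return body


# Same body as the POST example above.
print(json.dumps(build_search_body({"source.product": "devedition"}), indent=2))
# Same idea as the aggregation example above.
print(json.dumps(build_search_body({}, aggregate_field="target.platform"), indent=2))
```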

Basic Kinto search API

A set of filters and pagination options can be used to query the list. The most notable features are:

  • querystring filters (with ?field=value or dedicated operators like ?min_field=value or ?has_field=true)
  • paginated list of records (follow the URL in the Next-Page response header)
  • fields selection (with ?_fields=)
  • polling for changes (with ?_since=timestamp filter or ETags in request headers)

More information in the Kinto documentation.
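
The features listed above can be combined in a few lines of Python. This is a sketch under stated assumptions: records_url and iter_records are hypothetical helpers, and fetch stands for any function you provide that performs the HTTP GET and returns the decoded JSON body together with the response headers.

```python
from urllib.parse import urlencode

RECORDS_PATH = "/buckets/build-hub/collections/releases/records"


def records_url(server, filters=None, fields=None, limit=None):
    """Build a records URL with querystring filters and field selection."""
    params = dict(filters or {})
    if fields:
        params["_fields"] = ",".join(fields)
    if limit is not None:
        params["_limit"] = limit
    query = urlencode(sorted(params.items()))
    return server.rstrip("/") + RECORDS_PATH + ("?" + query if query else "")


def iter_records(url, fetch):
    """Yield records page by page, following the Next-Page response header.

    fetch(url) must return a (body, headers) pair, where body is the decoded
    JSON response and headers is a dict of response headers.
    """
    while url:
        body, headers = fetch(url)
        yield from body["data"]
        url = headers.get("Next-Page")


print(records_url(
    "https://example.test/v1",
    filters={"target.version": "56.0b12"},
    fields=["build.id"],
    limit=10,
))
```

Passing the fetch function as a parameter keeps the pagination logic independent of the HTTP library in use.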

Example queries

Is this an official build id?

In order to check that a build id exists, we’ll just check that it is mentioned in at least one record.

curl -s "$SERVER/buckets/build-hub/collections/releases/search?q=build.id:20170713200529" | \
    jq -r '.hits.total'

Or using the Kinto records endpoint, with the JavaScript client:

import KintoClient from "kinto-http";
const client = new KintoClient(SERVER);
const collection = client.bucket("build-hub").collection("releases");
const {data: records} = await collection.listRecords({limit: 1, filters: {"build.id": "20110110192031"}});
console.log(records.length >= 1);

Or the Python client:

import kinto_http

client = kinto_http.Client("https://buildhub.prod.mozaws.net/v1")
records = client.get_records(**{"build.id": "20110110192031", "_limit": 1, "pages": 1},
                             bucket="build-hub", collection="releases")
print(len(records) >= 1)

What is the Mercurial commit ID of a build ID?

import kinto_http

client = kinto_http.Client("https://buildhub.prod.mozaws.net/v1")
records = client.get_records(**{"build.id": "20110110192031", "_limit": 1, "pages": 1},
                             bucket="build-hub", collection="releases")
try:
    revision = records[0]["source"]["revision"]
except IndexError:
    raise ValueError("Unknown build id")
except KeyError:
    raise ValueError("Unknown revision")

What locales are available for a certain version?

Using the ElasticSearch endpoint, with HTTPie and jq:

$ echo '{
  "aggs": {
    "locales": {
      "terms": {
        "field": "target.locale",
        "size": 1000,
        "order": {
          "_term": "asc"
        }
      }
    }
  },
  "query": {
    "bool": {
      "filter": [{
        "term": {
          "target.version": "57.0b9"
        }
      }, {
        "term": {
          "source.product": "firefox"
        }
      }]
    }
  },
  "size": 0
}' | http POST $SERVER/buckets/build-hub/collections/releases/search | \
jq -r '.aggregations.locales.buckets[] | .key'

ach
af
an
ar
bn-BD
bn-IN
...

Using the Kinto records endpoint, with the Kinto JavaScript client:

import KintoClient from "kinto-http";

const client = new KintoClient("https://buildhub.prod.mozaws.net/v1");
const collection = client.bucket("build-hub").collection("releases");
const {data: records} = await collection.listRecords({filters: {"target.version": "53.0b9"}});
const locales = new Set(records.map(r => r.target.locale));

What are the available build ids of a specific version?

Using the ElasticSearch endpoint, with Python aiohttp:

import json

# SERVER_URL must be defined elsewhere and hold the full URL of the search endpoint.
async def fetch_build_ids(session, product, version):
    query = {
      "aggs": {
        "build_ids": {
          "terms": {
            "field": "build.id",
            "size": 100000,
            "order": {
              "_term": "desc"
            }
          }
        }
      },
      "query": {
        "bool": {
          "filter": [{
            "term": {
              "target.version": version
            }
          }, {
            "term": {
              "source.product": product
            }
          }]
        }
      },
      "size": 0,
    }
    async with session.post(SERVER_URL, data=json.dumps(query)) as response:
        data = await response.json()

    aggs = data['aggregations']['build_ids']['buckets']
    buildids = [r['key'] for r in aggs]
    return buildids

Using the Kinto records endpoint, with curl and jq:

$ curl -s "${SERVER}/buckets/build-hub/collections/releases/records?target.version=56.0b12" | \
    jq -r '.data[] | .build.id' | \
    sort -u

20170914024831

More about the data schema

Field              Description
id                 A unique ID for a build (see details).
schema             The schema version when the record was added to the database.
last_modified      The timestamp incremented when the record was created/modified.
source             Information about the source code version used to build the release.
source.product     One of firefox, thunderbird, fennec or devedition
source.revision    Optional Mercurial changeset
source.repository  Optional Mercurial repository
source.tree        Optional Mercurial tree
target             Major information about the release.
target.version     Public version number
target.locale      Locale name
target.channel     AUS update channel name
target.os          Operating system
target.platform    OS and CPU architecture
build              Information about the build itself.
build.id           Optional Build identifier.
build.date         Optional Build date time.
build.number       Optional Release candidate number.
build.as           Optional Assembler executable
build.ld           Optional Linker executable
build.cc           Optional C compiler command
build.cxx          Optional C++ compiler command
build.host         Optional Compiler host alias (cpu)-(vendor)-(os)
build.target       Optional Target host alias (cpu)-(vendor)-(os)
download           Information about the resulting downloadable archive.
download.url       Public archive URL
download.size      In Bytes
download.mimetype  File type
download.date      Publication date

The complete JSON schema is available in the collection metadata:

The records added to the collection will be validated against that schema.
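
A minimal sketch of reading that schema, assuming the collection metadata response exposes it under data.schema (as Kinto does for collection attributes). The helper names and the sample response are hypothetical, and the required-field check is only a toy illustration, not real JSON schema validation.

```python
def collection_url(server, bucket="build-hub", collection="releases"):
    """Return the URL of the collection metadata endpoint."""
    return "{}/buckets/{}/collections/{}".format(
        server.rstrip("/"), bucket, collection
    )


def extract_schema(collection_response):
    """Return the JSON schema stored in the collection metadata, if any."""
    return collection_response.get("data", {}).get("schema")


def missing_required(record, schema):
    """List the top-level fields marked as required that the record lacks."""
    return [name for name in schema.get("required", []) if name not in record]


# Hypothetical, heavily shortened metadata response.
response = {
    "data": {
        "id": "releases",
        "schema": {"type": "object", "required": ["source", "target", "download"]},
    }
}

schema = extract_schema(response)
print(collection_url("https://example.test/v1"))
print(missing_required({"source": {}, "target": {}}, schema))
```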

More about the release record ID

If you already know some details about a release, you can compute its ID and fetch the individual record directly.

The unique ID of a release has the following format:

{PRODUCT_NAME}_{CHANNEL}_{VERSION}_{PLATFORM}_{LOCALE}
  • {PRODUCT_NAME}: One of firefox, fennec or thunderbird.
  • {CHANNEL}: One of aurora, beta, nightly or nightly-old-id. The channel is not part of the ID for release and esr builds.
  • {VERSION}: The full version of the build, with dots replaced by dashes, e.g. 55-0-1, 55-1b2. For nightly builds, the date and time of the build are used as a version prefix, e.g. 2017-06-01-10-02-05_55-0a1.
  • {PLATFORM}: The target platform, e.g. macosx, android-arm, android-api-15, win32, win64, linux-i386.
  • {LOCALE}: The locale code, e.g. fr-fr, en-us.

All dots are replaced with dashes and all strings are lowercase.

Here are some examples of release IDs:

  • firefox_nightly_2017-05-03-03-02-12_55-0a1_win64_en-us
  • thunderbird_52-0-1_linux-x86_64_en-us
  • firefox_aurora_54-0a2_macosx_en-us
  • firefox_beta_52-0b6_linux-x86_64_en-us
  • firefox_50-0rc1_linux-x86_64_fr
  • firefox_52-0esr_linux-x86_64_en-us
  • fennec_nightly-old-id_2017-05-30-10-01-27_55-0a1_android-api-15_multi
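
The rules above can be sketched in Python as follows. The release_id function and its parameter names are hypothetical, and the nightly date-time prefix is assumed to be passed in already formatted.

```python
def release_id(product, version, platform, locale, channel=None,
               nightly_datetime=None):
    """Build a release record ID from its components.

    The channel is left out for release and esr builds, and nightly builds
    get the build date and time as a prefix of the version.
    """
    parts = [product]
    if channel and channel not in ("release", "esr"):
        parts.append(channel)
    if nightly_datetime:
        parts.append(nightly_datetime)
    parts.extend([version, platform, locale])
    # Dots become dashes and everything is lowercased.
    return "_".join(parts).replace(".", "-").lower()


print(release_id("firefox", "52.0b6", "linux-x86_64", "en-US", channel="beta"))
print(release_id("thunderbird", "52.0.1", "linux-x86_64", "en-US", channel="release"))
print(release_id("firefox", "55.0a1", "win64", "en-US", channel="nightly",
                 nightly_datetime="2017-05-03-03-02-12"))
```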

Jobs

A script will aggregate all build information from Mozilla archives, and another is in charge of keeping it up to date.

Everything can be executed from the command line, but we use Amazon Lambda in production.

Currently we use Kinto as a generic database service. It allows us to leverage its simple API for storing and querying records. It also comes with a set of client libraries for JavaScript, Python etc.

Initialization

Note

The user:pass in the command-line examples is the Basic auth for Kinto.

The following is not mandatory but recommended. Kinto can use the JSON schema to validate the records. The following setting should be set to true in the server configuration file:

kinto.experimental_collection_schema_validation = true

Load latest S3 inventory

This command initializes the remote Kinto server, downloads the latest S3 manifests (which describe all available files on archive.mozilla.org), and sends that information as Buildhub records to the remote Kinto server.

latest-inventory-to-kinto

The command will go through the list of files, pick release files, and deduce their metadata. It is meant to be executed on an empty server, or periodically to catch up with recent releases in case the other event-based lambda has failed.

Its configuration is read from environment variables:

  • SERVER_URL (default: http://localhost:8888/v1)
  • BUCKET (default: build-hub)
  • COLLECTION (default: releases)
  • AUTH (default: user:pass)
  • CACHE_FOLDER (default: .)
  • NB_RETRY_REQUEST (default: 3)
  • BATCH_MAX_REQUESTS (default: taken from server)
  • TIMEOUT_SECONDS (default: 300)
  • INITIALIZE_SERVER (default: true): whether to initialize the destination bucket/collection.
  • SENTRY_DSN (default: empty/disabled. Example: https://<key>:<secret>@sentry.io/buildhub)
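
For illustration, reading these variables with their documented defaults could look like the sketch below. It is not the actual Buildhub code: load_config is a hypothetical name, and the way booleans and integers are parsed here is an assumption.

```python
import os

# Documented defaults, kept as strings like real environment values.
DEFAULTS = {
    "SERVER_URL": "http://localhost:8888/v1",
    "BUCKET": "build-hub",
    "COLLECTION": "releases",
    "AUTH": "user:pass",
    "CACHE_FOLDER": ".",
    "NB_RETRY_REQUEST": "3",
    "TIMEOUT_SECONDS": "300",
    "INITIALIZE_SERVER": "true",
}


def load_config(environ=None):
    """Read the job configuration from environment variables."""
    environ = os.environ if environ is None else environ
    config = {name: environ.get(name, default) for name, default in DEFAULTS.items()}
    config["NB_RETRY_REQUEST"] = int(config["NB_RETRY_REQUEST"])
    config["TIMEOUT_SECONDS"] = int(config["TIMEOUT_SECONDS"])
    # Assumption: any value other than "true" (case-insensitive) disables it.
    config["INITIALIZE_SERVER"] = config["INITIALIZE_SERVER"].lower() == "true"
    # None means "use the value advertised by the server".
    batch = environ.get("BATCH_MAX_REQUESTS")
    config["BATCH_MAX_REQUESTS"] = int(batch) if batch else None
    # An empty or missing DSN disables Sentry reporting.
    config["SENTRY_DSN"] = environ.get("SENTRY_DSN") or None
    return config


print(load_config({"TIMEOUT_SECONDS": "60"}))
```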

To use this script as an Amazon Lambda function, use the entry point:

  • buildhub.lambda_s3_inventory.lambda_handler

S3 Event lambda

This Amazon Lambda function is in charge of keeping the database up to date. It cannot be executed from the command line.

When releases are published on S3, an S3 Event is triggered and the lambda is invoked.

Use the following entry point:

  • buildhub.lambda_s3_event.lambda_handler
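
To give an idea of what such a handler receives, the sketch below extracts the object keys from an S3 event notification payload. It is not the Buildhub handler: object_keys is a hypothetical helper and the sample event is shortened to the fields used here. Keys arrive URL-encoded in S3 notifications, hence the decoding step.

```python
from urllib.parse import unquote_plus


def object_keys(event):
    """Return the decoded S3 object keys mentioned in an S3 event payload."""
    keys = []
    for record in event.get("Records", []):
        key = record.get("s3", {}).get("object", {}).get("key")
        if key is not None:
            # Spaces are encoded as "+" and special characters as %XX.
            keys.append(unquote_plus(key))
    return keys


# Shortened sample payload; the bucket name is made up.
event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {
                    "key": "pub/firefox/releases/50.0b11/mac/el/Firefox+50.0b11.dmg"
                },
            }
        }
    ]
}

print(object_keys(event))
```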

Note

Since release records contain information from JSON metadata files, we handle the case when the JSON metadata file is published before the actual archive, and vice-versa.

The lambda accepts the following configuration (from environment variables):

  • SERVER_URL (default: http://localhost:8888/v1)
  • BUCKET (default: build-hub)
  • COLLECTION (default: releases)
  • CACHE_FOLDER (default: .)
  • AUTH (default: user:pass)
  • NB_RETRY_REQUEST (default: 3)
  • TIMEOUT_SECONDS (default: 300)
  • SENTRY_DSN (default: empty/disabled. Example: https://<key>:<secret>@sentry.io/buildhub)

Setup and configure Amazon Lambda

In order to build the AWS Lambda Zip archive in an isolated environment, we use Docker:

  • make lambda.zip

(...or most likely sudo make lambda.zip)

This will produce a zip file that has to be uploaded in the AWS Lambda configuration panel.

Using Docker

Some commands are exposed in the container entry-point command (docker run).

The exhaustive list of available commands and description is available using:

docker run -t mozilla/buildhub

For example, run tests:

docker run -t mozilla/buildhub test

Or load the latest S3 inventory:

docker run -e "SERVER_URL=https://buildhub.prod.mozaws.net/v1" -e "AUTH=user:pass" -t mozilla/buildhub latest-inventory-to-kinto

Load S3 inventory manually

In order to fetch inventories from S3, install the dedicated Amazon Services client:

sudo apt-get install awscli

We are interested in two listings: firefox and archive (thunderbird, mobile).

export LISTING=archive

List available manifests in the inventories folder:

aws --no-sign-request --region us-east-1 s3 ls "s3://net-mozaws-prod-delivery-inventory-us-east-1/public/inventories/net-mozaws-prod-delivery-$LISTING/delivery-$LISTING/"

Download the latest manifest:

aws --no-sign-request --region us-east-1 s3 cp "s3://net-mozaws-prod-delivery-inventory-us-east-1/public/inventories/net-mozaws-prod-delivery-$LISTING/delivery-$LISTING/2017-08-02T00-11Z/manifest.json" .

Download the associated files (using jq):

files=$(jq -r '.files[] | .key' < manifest.json)
for file in $files; do
    aws --no-sign-request --region us-east-1 s3 cp "s3://net-mozaws-prod-delivery-inventory-us-east-1/public/$file" .
done
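
The jq step above can also be written in Python. This is a sketch only: manifest_keys is a hypothetical helper, and the sample manifest is shortened to the files list, with made-up keys.

```python
import json


def manifest_keys(manifest_text):
    """Return the keys of the inventory files listed in a manifest."""
    manifest = json.loads(manifest_text)
    return [entry["key"] for entry in manifest.get("files", [])]


# Shortened, made-up manifest content.
sample = json.dumps({
    "files": [
        {"key": "inventories/example/data/part-1.csv.gz", "size": 1024},
        {"key": "inventories/example/data/part-2.csv.gz", "size": 2048},
    ]
})

for key in manifest_keys(sample):
    print(key)
```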

Initialize the remote server from a manifest that will define the buckets, collection, records schema, and related permissions. This command is idempotent, and will only modify existing objects if something was changed.

kinto-wizard load --server https://kinto/ --auth user:pass jobs/buildhub/initialization.yml

Parse S3 inventory, fetch metadata, and print records as JSON in stdout:

zcat *.csv.gz | inventory-to-records > records.data

Load records into Kinto:

cat records.data | to-kinto --server https://kinto/ --bucket build-hub --collection releases --auth user:pass

Repeat with LISTING=firefox.

Note

All three commands can be piped together with their respective parameters:

zcat *.csv.gz | inventory-to-records | to-kinto

Support

Do not hesitate to ask questions, start conversations, or report issues on the Buildhub GitHub repo.