Buildhub2¶
Buildhub2 is an index of build information for Firefox, Firefox Dev Edition, Thunderbird, and Fennec.
License: MPLv2
Community Participation Guidelines: https://github.com/mozilla-services/tecken/blob/master/CODE_OF_CONDUCT.md
Code style: Black
Production server: https://buildhub.moz.tools/
Overview¶
Every time Taskcluster builds a version of Firefox, Fennec, etc., the built files are put into an S3 bucket. One of the files that always accompanies a build is a file called buildhub.json, which we download, validate, and index into a PostgreSQL database as well as Elasticsearch.
When files are saved to the S3 bucket, the filenames get added to an SQS queue which is consumed by the daemon. The daemon looks at the filenames and indexes the buildhub.json ones into Buildhub2.
Buildhub2 has a webapp which is a single-page-app that helps you make Elasticsearch queries and displays the results.
Buildhub2 has an API which you can use to query the data.
For more on these, see the user docs.
First Principles¶
Buildhub2 reflects data on archive.mozilla.org.
Buildhub2 will never modify, create, or remove build data from the buildhub.json files that are discovered and indexed. If the data is wrong, it needs to be fixed on archive.mozilla.org.
Buildhub2 records are immutable.
If a buildhub.json file is created, its primary key becomes a hash of its content. If the buildhub.json under the same URL is later modified, that leads to a new record in Buildhub2.
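For illustration, deriving a record's identity from the file content could look roughly like this (a minimal sketch; the exact hashing and any normalization Buildhub2 applies may differ):
import hashlib
import json


def record_hash(buildhub_json_text):
    """Return a stable hash of a buildhub.json payload.

    Because the primary key is derived from the content, identical
    content always maps to the same record, and changed content under
    the same URL produces a new record.
    """
    # Normalizing the JSON (sorted keys) is an illustration choice so
    # that formatting-only differences don't change the hash.
    normalized = json.dumps(json.loads(buildhub_json_text), sort_keys=True)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()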
Using Buildhub2¶
Products supported by Buildhub2¶
Buildhub2 indexes build information that exists on archive.mozilla.org.
If you want build information for your product in Buildhub2, you’ll need to change the release process to additionally add files to archive.mozilla.org in the same way that Firefox does.
Fields in Buildhub2¶
Buildhub2 records have the same structure as buildhub.json files on archive.mozilla.org.
{
"build": {
"as": "/builds/worker/workspace/build/src/clang/bin/clang -std=gnu99",
"cc": "/builds/worker/workspace/build/src/clang/bin/clang -std=gnu99",
"cxx": "/builds/worker/workspace/build/src/clang/bin/clang++",
"date": "2019-06-03T18:14:08Z",
"host": "x86_64-pc-linux-gnu",
"id": "20190603181408",
"target": "x86_64-pc-linux-gnu"
},
"download": {
"date": "2019-06-03T20:49:46.559307+00:00",
"mimetype": "application/octet-stream",
"size": 63655677,
"url": "https://archive.mozilla.org/pub/firefox/candidates/68.0b7-candidates/build1/linux-x86_64/en-US/firefox-68.0b7.tar.bz2"
},
"source": {
"product": "firefox",
"repository": "https://hg.mozilla.org/releases/mozilla-beta",
"revision": "ed47966f79228df65b6326979609fbee94731ef0",
"tree": "mozilla-beta"
},
"target": {
"channel": "beta",
"locale": "en-US",
"os": "linux",
"platform": "linux-x86_64",
"version": "68.0b7"
}
}
If you want different fields, the Taskcluster task will need to be changed to include the new information. Additionally, Buildhub2 will need to adjust the schema. Please open up an issue with your request.
Website¶
You can query build information using the website at https://buildhub.moz.tools/.
The search box uses Elasticsearch querystring syntax.
See also
Elasticsearch querystring syntax: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/query-dsl-query-string-query.html#query-string-syntax
API¶
The API endpoint is at: https://buildhub.moz.tools/api/search
You can query it by passing in Elasticsearch search queries as HTTP POST payloads.
See also
Links to Elasticsearch 6.7 search documentation:
Request body search: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-request-body.html
Query: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-request-query.html
Query DSL: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/query-dsl.html
Aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-aggregations.html
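The same API can be called from Python. A minimal sketch using the requests library (not part of Buildhub2 itself; any HTTP client works):
import requests

API_URL = "https://buildhub.moz.tools/api/search"

# Any Elasticsearch request-body search works as the POST payload.
query = {
    "size": 1,
    "query": {"term": {"build.id": "20170713200529"}},
}

response = requests.post(API_URL, json=query)
response.raise_for_status()

for hit in response.json()["hits"]["hits"]:
    print(hit["_source"]["source"]["revision"])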
Example: Is this an official build id?¶
Is 20170713200529 an official build id?
We can query for records where build.id has that value, limit the size to 0 so we're not getting records back, and then check the total.
$ curl -s -X POST https://buildhub.moz.tools/api/search \
-d '{"size": 0, "query": {"term": {"build.id": "20170713200529"}}}' | \
jq .hits.total
Example: What is the Mercurial commit ID for a given build id?¶
What is the Mercurial commit ID for a given build id?
Query for the build id and only return 1 record. Extract the specific value using jq.
$ curl -s -X POST https://buildhub.moz.tools/api/search \
-d '{"size": 1, "query": {"term": {"build.id": "20170713200529"}}}' | \
jq '.hits.hits[] | ._source.source.revision'
Example: What platforms are available for a given build id?¶
What platforms are available for a given build id?
To get this, we do an aggregation on target.platform. We set the size to 0 so the response contains only the aggregations and not the individual results of the query.
$ curl -s -X POST https://buildhub.moz.tools/api/search \
-d '{"size": 0, "query": {"term": {"build.id": "20170713200529"}}, "aggs": {"platforms": {"terms": {"field": "target.platform"}}}}' | \
jq '.aggregations.platforms.buckets'
Architecture and Overview¶
High-level¶
Mozilla builds versions of Firefox, Fennec, etc., and the built files are uploaded to an S3 bucket. With each build a buildhub.json file is created that has all the possible information we intend to store and make searchable.
When a new file is added (or edited) in S3 it triggers an event notification that goes to an AWS SQS queue.
The daemon consumes the SQS queue, looking for filenames that match exactly. Since the SQS message only contains the name of the S3 object, the daemon downloads that file, validates its content, and stores it in PostgreSQL as well as Elasticsearch.
The four parts of Buildhub2 are:
The Django web server
The SQS consumer daemon script
PostgreSQL and Elasticsearch, which make it possible to search
A create-react-app based React app for the UI, which essentially runs SearchKit
Flow¶
TaskCluster builds a, for example,
Firefox-79-installer.exe
and abuildhub.json
TaskCluster uploads these files into S3.
An S3 configuration triggers an SQS event that puts this S3-write into the queue.
Buildhub2 processor daemon polls the SQS queue and gets the file creation event.
Buildhub2 processor daemon downloads the
buildhub.json
file from S3 using Pythonboto3
.Buildhub2 processor daemon reads its payload and checks the JSON Schema validation.
Buildhub2 processor daemon inserts the JSON into PostgreSQL using the Django ORM.
The JSON is then inserted into Elasticsearch.
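A heavily simplified sketch of that loop using boto3 (the queue URL is a placeholder and the validate/insert step is elided; the real daemon does considerably more error handling and metrics):
import json

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/example-queue"  # placeholder


def process_queue_once():
    response = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=10)
    for message in response.get("Messages", []):
        body = json.loads(message["Body"])
        # S3 event notifications carry one or more "Records".
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            if key.endswith("buildhub.json"):
                obj = s3.get_object(Bucket=bucket, Key=key)
                build = json.loads(obj["Body"].read())
                # ...validate against the JSON Schema and insert into
                # PostgreSQL and Elasticsearch here...
        sqs.delete_message(
            QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"]
        )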
Validation¶
The validation step before storing anything is to check that the data in the buildhub.json file matches the schema.yaml file. Since TaskCluster builds the buildhub.json file and this service picks it up asynchronously and with a delay, there is at the moment no easy way to know that an invalid buildhub.json file was built.
If you want to change schema.yaml, make sure it matches the schema used inside mozilla-central when the buildhub.json files are created.
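In essence the check is plain JSON Schema validation. A minimal sketch, assuming schema.yaml deserializes directly into a JSON Schema document (the actual loading code in Buildhub2 may differ):
import json

import jsonschema
import yaml

with open("schema.yaml") as f:
    schema = yaml.safe_load(f)

with open("buildhub.json") as f:
    build = json.load(f)

# Raises jsonschema.exceptions.ValidationError if the payload doesn't
# match the schema, in which case the record is not stored.
jsonschema.validate(instance=build, schema=schema)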
Keys¶
The following metrics keys are tracked in the code, each with a different purpose. Generally, every key starts with a “context” keyword followed by an underscore, for example sqs_. That prefix makes it possible to trace a key back to the source code and also acts as a form of namespace.
sqs_process_buildhub_json_key¶
Timer.
How long it takes to consider a buildhub.json S3 key. This involves both downloading it from S3 and attempting to insert it into our database. “Attempting to insert” means the hash is calculated and looked up, and depending on whether it was found, the record is inserted or nothing happens.
sqs_inserted¶
Incr.
Count of inserts that were new and actually inserted into the database coming from the SQS queue.
sqs_not_inserted¶
Incr.
Count of buildhub.json records that we attempted to insert but rejected because they were already in the database.
sqs_messages¶
Incr.
This is a count of messages received by consuming the SQS queue. Assume this to be equal to the number of messages deleted from the SQS queue. Fewer messages can be deleted than received in the unexpected case where a message triggers a Python exception (caught in Sentry).
Note! The total of sqs_inserted + sqs_not_inserted is not equal to sqs_messages, because some files do not match what we're looking to process.
sqs_key_matched¶
Incr.
Incremented every time an S3 record is received whose S3 key we match. Expect this number to equal sqs_inserted + sqs_not_inserted.
sqs_not_key_matched¶
Incr.
Every message received (see sqs_messages) can contain multiple types of records. We only look into the S3 records, and of those, some S3 keys we can quickly ignore as not matched. That is what this increment counts. Roughly, this number is sqs_messages minus sqs_inserted minus sqs_not_inserted.
api_search¶
Timer.
Timer of how long it takes to fulfill every /api/search request. This time includes the Django request/response overhead as well as the time it takes to send the query to Elasticsearch and receive the result.
api_search_records¶
Gauge.
A count of the number of builds found by Elasticsearch in each /api/search request.
api_search_requests¶
Incr.
Measurement of the number of requests received to be proxied to Elasticsearch.
Note that every incr is accompanied by a tag of the form method:$METHOD, for example method:POST.
backfill_inserted¶
Incr.
When a build is inserted from the backfill job that we did not already have. If this number goes up it means the SQS consumption is failing.
backfill_not_inserted¶
Incr.
When running the backfill, we iterate through all keys in the S3 bucket. To avoid having to download every single matched key, we keep each key's full path and ETag in the database to make lookups faster. If a key and ETag are not recognized, we attempt to download and insert the file; when that turns out not to be needed, this increment goes up. Expect this number to stay very near zero in a healthy environment.
backfill_listed¶
Incr.
When running the backfill, this is a count of the number of S3 objects we download per page. To get an insight into the number of S3 objects considered, in total, use this number but over a window of time.
backfill_matched¶
Incr.
When running the backfill, we quickly filter all keys, per batch, down to the ones that we consider. This is a count of that, incremented per batch. Similar to backfill_listed, to get an insight into the total, look at this count over a window of time.
backfill¶
Timer.
How long it takes to run the whole backfill job. This includes iterating over every single S3 key.
kinto_migrated¶
Incr.
When we run the migration from Kinto, a count of the number of messages (per batch) that we received from batch fetching from the legacy Kinto database.
kinto_inserted¶
Incr.
A count of the number of builds that are inserted from the Kinto migration. A useful consequence is that you can run the Kinto migration repeatedly until this number stops incrementing.
Development environment¶
You can set up a Buildhub2 development environment that runs on your local machine for development and testing.
Setting up¶
To set up a dev environment, install the following:
Docker
make
git
bash
Clone the repo from GitHub at https://github.com/mozilla-services/buildhub2.
Then do the following:
# Build the Docker images
$ make build
# Wipe and initialize services
$ make setup
Once you’ve done that, you can run Buildhub2.
Configuration¶
The Django settings depend on there being an environment variable called DJANGO_CONFIGURATION.
# If production
DJANGO_CONFIGURATION=Prod
# If stage
DJANGO_CONFIGURATION=Stage
You need to set a random DJANGO_SECRET_KEY. It should be predictably random and a decent length:
DJANGO_SECRET_KEY=sSJ19WAj06QtvwunmZKh8yEzDdTxC2IPUXfea5FkrVGNoM4iOp
The ALLOWED_HOSTS needs to be a list of valid domains that will be used from the outside to reach the service. If there is only one single domain, it doesn't need to list any others. For example:
DJANGO_ALLOWED_HOSTS=buildhub.mozilla.org
For Sentry the key is SENTRY_DSN, which is sensitive. For the front-end (which hasn't been built yet at the time of writing) we also need the public key called SENTRY_PUBLIC_DSN. For example:
SENTRY_DSN=https://bb4e266xxx:d1c1eyyy@sentry.prod.mozaws.net/001
SENTRY_PUBLIC_DSN=https://bb4e266xxx@sentry.prod.mozaws.net/001
Content Security Policy (CSP) headers are on by default. To change the URL where violations are sent, set DJANGO_CSP_REPORT_URI. By default it's set to '', meaning that unless it is set, it won't be included as a header. See the MDN documentation on report-uri for more info.
To configure writing to BigQuery, the following variables will need to be set:
DJANGO_BQ_ENABLED=True
DJANGO_BQ_PROJECT_ID=...
DJANGO_BQ_DATASET_ID=...
DJANGO_BQ_TABLE_ID=...
The project and dataset will need to be provisioned before running the server with this functionality enabled. Additionally, credentials will need to be passed to the server. If it is running in Google Compute Engine, this is configured through the default service account. To run this via docker-compose, the following lines in docker-compose.yml will need to be un-commented:
volumes:
...
# - ${GOOGLE_APPLICATION_CREDENTIALS}:/tmp/credentials
In addition, set the following variable after downloading the service account credentials from IAM & admin > Service accounts in the Google Cloud Platform console for the project.
GOOGLE_APPLICATION_CREDENTIALS=/path/to/keyfile.json
Run make test and check that none of the tests are skipped.
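To sanity-check that the credentials and dataset are wired up, a small standalone script like this can help (a sketch using the google-cloud-bigquery client and the environment variables above; not part of Buildhub2):
import os

from google.cloud import bigquery

# Authentication comes from GOOGLE_APPLICATION_CREDENTIALS or, on
# Google Compute Engine, the default service account.
project_id = os.environ["DJANGO_BQ_PROJECT_ID"]
dataset_id = os.environ["DJANGO_BQ_DATASET_ID"]

client = bigquery.Client(project=project_id)
dataset = client.get_dataset(f"{project_id}.{dataset_id}")
print("Found dataset:", dataset.full_dataset_id)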
Adding data¶
FIXME: How to add data to your local instance?
Running the webapp¶
The webapp consists of a part that runs on the server powered by Django and a part that runs in the browser powered by React.
To run all the required services, plus the server and a service that builds the static assets needed by the browser ui, do:
$ make run
This will start the server on port 8000 and the web ui on port 3000.
You can use http://localhost:3000 with your browser to use the web interface, and curl/requests/whatever to use the API.
Running the daemon¶
Buildhub2 has a daemon that polls SQS for events and processes new files on archive.mozilla.org.
You can run the daemon with:
$ make daemon
You can quit it with Ctrl-C.
Development conventions and howto¶
Conventions¶
License preamble¶
All code files need to start with the MPLv2 header:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
Linting¶
We use flake8 for linting Python code. See https://github.com/mozilla-services/buildhub2/blob/master/.flake8 for rules.
We use black for fixing Python code formatting. The specific version is in requirements.txt, so use that one in your IDE.
We use eslint for linting and fixing JS code.
CI will fail if linting raises any errors.
To run linting tasks on Python and JS files:
$ make lintcheck
To run lint-based fixing tasks on Python and JS files:
$ make lintfix
Documentation¶
We use Sphinx to generate documentation. Documentation is written using restructured text.
To build the docs, do:
$ make docs
You can view the docs by opening docs/_build/html/index.html in your browser.
Documentation is published at https://buildhub2.readthedocs.io/ every time changes land in the master branch.
Backend (webapp server and daemon)¶
The backend is written in Python using Django. This covers both the backend webserver as well as the daemon.
Maintaining dependencies¶
All Python requirements needed for development and production need to be listed in requirements.txt with sha256 hashes.
The most convenient way to modify this is to run hashin. For example:
$ pip install hashin
$ hashin Django==1.10.99
$ hashin other-new-package
This will automatically update your requirements.txt but it won't install the new packages. To do that, you need to exit the shell and run:
$ make build
To check which Python packages are outdated, use piprot in a shell:
$ make shell
root@...:/app# pip install piprot
root@...:/app# piprot -o requirements.txt
The -o flag means it only lists requirements that are out of date.
Note
A good idea is to install hashin and piprot globally on your computer instead. It doesn't require a virtual environment if you use pipx.
Frontend (ui)¶
The ui is a React single-page-app. It makes API calls to the backend to retrieve data.
All source code is in the ./ui directory, more specifically in ./ui/src, which contains the files you're most likely to edit to change the front-end.
All CSS is loaded with yarn, either from .css files installed in the node_modules directory or from imported .css files inside the ./ui/src directory.
The project is based on create-react-app, so the main rendering engine is React. There is no server-side rendering. The idea is that all requests (unless explicitly routed in Nginx) that don't immediately find a static file should fall back on ./ui/build/index.html. For example, loading https://buildhub.moz.tools/uploads/browse will actually load ./ui/build/index.html, which renders the .js bundle, which loads react-router, which, in turn, figures out which component to render and display based on the path ("/uploads/browse" for example).
Handling dependencies¶
A “primitive” way of changing dependencies is to edit the list of dependencies in ui/package.json and run docker-compose build ui. This is not recommended.
A much better way to change dependencies is to use yarn. Use the yarn installed in the Docker ui container. For example:
$ docker-compose run ui bash
> yarn outdated # will display which packages can be upgraded today
> yarn upgrade date-fns --latest # example of upgrading an existing package
> yarn add new-hotness # adds a new package
When you’re done, you have to rebuild the ui Docker container:
$ docker-compose build ui
Your change should result in changes to ui/package.json and ui/yarn.lock, which both need to be checked in and committed.
Tools¶
Elasticsearch¶
To access Elasticsearch, you can use the Elasticsearch API against http://localhost:9200.
Deployment¶
Buildhub2 has two server environments: stage and prod.
Buildhub2 images are located on Docker Hub.
Notifications for deployment status are in #buildhub on Slack.
Deploy to Stage¶
Stage is at: https://stage.buildhub2.nonprod.cloudops.mozgcp.net/
To deploy to stage, tag the master branch and push the tag:
$ make tag
Deploy to Prod¶
Prod is at: https://buildhub.moz.tools/
To deploy to prod, ask ops to promote the tag on stage.
Backfilling¶
There's a ./manage.py backfill command that uses the S3 API to iterate over every single key in an S3 bucket, pick out those called *buildhub.json, and then check whether we already have those records.
The script takes FOREVER to run. The Mozilla production S3 bucket used for all builds holds over 60 million objects, and when listing them you can only read 1,000 keys at a time.
When iterating over all S3 keys, it first filters down to the *buildhub.json ones, compares the S3 keys and ETags with what is in the database, and inserts/updates accordingly.
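The core of that iteration is a paginated S3 listing. A rough sketch of the idea using boto3 (not the actual management command):
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")


def iter_buildhub_keys(bucket):
    """Yield (key, etag) pairs for every *buildhub.json object.

    Listing returns at most 1,000 keys per page, which is why a full
    backfill over tens of millions of objects takes so long.
    """
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("buildhub.json"):
                yield obj["Key"], obj["ETag"]

# Each (key, etag) pair is then compared with what is already in the
# database; only unknown or changed objects get downloaded and inserted.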
Configuration¶
The S3 bucket it uses is called net-mozaws-prod-delivery-inventory-us-east-1 in us-east-1. That is the default in the configuration. If you need to override it, set, for example:
DJANGO_S3_BUCKET_URL=https://s3-us-west-2.amazonaws.com/buildhub-sqs-test
If you know, in advance, what the S3 bucket that is mentioned in the SQS payloads is, you can set that up with:
DJANGO_SQS_S3_BUCKET_URL=https://s3-us-west-2.amazonaws.com/mothership
If either of these are set, they are tested during startup to make sure you have relevant read access.
Reading the S3 bucket is public and doesn't require AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, but to read the SQS queue these need to be set up:
AWS_ACCESS_KEY_ID=AKI....H6A
AWS_SECRET_ACCESS_KEY=....
Note
The access key ID and secret access key are not prefixed with DJANGO_.
How to run it¶
Get ops to run:
$ ./manage.py backfill
This uses settings.S3_BUCKET_URL, which is the DJANGO_S3_BUCKET_URL environment variable.
The script will dump information about files it has seen into a .json file on disk (see settings.RESUME_DISK_LOG_FILE, aka DJANGO_RESUME_DISK_LOG_FILE, which is /tmp/backfill-last-successful-key.json by default). With this file, it's possible to resume the backfill from where it last finished. This is useful if the backfill breaks due to an operational error, or even if you Ctrl-C the command the first time. To make it resume, you have to set the --resume flag:
$ ./manage.py backfill --resume
You can set this flag from the very beginning too. If there's no file on disk to get resume information from, it will just start from scratch.
Migrating from Kinto (over HTTP)¶
Note
This can be removed after Buildhub has been decommissioned.
If you intend to migrate from the old Buildhub’s Kinto database you need to run:
$ ./manage.py kinto-migration http://localhost:8888
That URL obviously depends on where the Kinto server is hosted. If the old Kinto database contains legacy records that don't conform, you might get errors like:
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: ['c:/builds/moz2_slave/m-rel-w64-00000000000000000000/build/', 'src/vs2015u3/VC/bin/amd64/link.exe'] is not of type 'string'
Failed validating 'type' in schema['properties']['build']['properties']['ld']:
{'description': 'Executable', 'title': 'Linker', 'type': 'string'}
On instance['build']['ld']:
['c:/builds/moz2_slave/m-rel-w64-00000000000000000000/build/',
'src/vs2015u3/VC/bin/amd64/link.exe']
Then simply run:
$ ./manage.py kinto-migration http://localhost:8888 --skip-validation
Note: during an early period where the old Kinto database is still getting populated, you can run this command repeatedly and it will continue where it left off.
Note
If you have populated a previously empty PostgreSQL from records from the Kinto database, you have to run ./manage.py reindex-elasticsearch again.
Migrating from Kinto (by PostgreSQL)¶
Note
This can be removed after Buildhub has been decommissioned.
A much faster way to migrate from Kinto (legacy Buildhub) is to have a dedicated PostgreSQL connection.
Once that’s configured you simply run:
$ ./manage.py kinto-database-migration
This will validate every single record and crash if any single record is invalid. If you're confident that all the records about to be migrated are valid, you can run:
$ ./manage.py kinto-database-migration --skip-validation
Another option is to run the migration and run validation on each record, but instead of crashing, simply skip the invalid ones. In fact, this is the recommended way to migrate:
$ ./manage.py kinto-database-migration --skip-invalid
Keep an eye on the log output about the number of invalid records skipped.
It will migrate every single record in one sweep (but broken up into batches of 10,000 rows at a time). If it fails, you can most likely just try again.
Also, see the note above about the need to run ./manage.py reindex-elasticsearch afterwards.
Configuration¶
When doing the migration from Kinto, you can either rely on HTTP or connect directly to a Kinto database. The way this works is that it optionally sets up a separate PostgreSQL connection. The kinto-migration script will then be able to talk directly to this database. It's disabled by default.
To enable it, the same "rules" apply as for DATABASE_URL, except it's called KINTO_DATABASE_URL. E.g.:
KINTO_DATABASE_URL="postgres://username:password@hostname/kinto"
Testing¶
Unit tests¶
Buildhub2 has a suite of unit tests for Python. We use pytest to run them.
$ make test
If you need to run specific tests or pass in different arguments, you can run bash in the base container and then run pytest with whatever args you want.
For example:
$ make shell
root@...:/app# pytest
SQS Functional testing¶
By default, for local development you can consume the SQS queue set up for Dev. For this you need AWS credentials. You need to set up your AWS IAM Dev credentials in ~/.aws/credentials (under default) or in .env.
The best tool for putting objects into S3 and populating the Dev SQS queue is s3-file-maker. To do that, run on your host:
cd "$GOPATH/src"
git clone https://github.com/mostlygeek/s3-file-maker.git
cd s3-file-maker
dep ensure
go build main.go
./main [--help]
Note
This SQS queue can only be consumed by one person at a time.