Skyline documentation¶
Contents:
Overview¶
A brief history¶
Skyline was originally open sourced by Etsy as a real-time anomaly detection system. It was originally built to enable passive monitoring of hundreds of thousands of metrics, without the need to configure a model/thresholds for each one, as you might do with Nagios. It was designed to be used wherever there are a large quantity of high-resolution timeseries which need constant monitoring. Once a metric stream is set up (from statsd, Graphite or other sources), additional metrics are automatically added to Skyline for analysis, anomaly detection, alerting and brief publication in the Webapp frontend. github/etsy stopped actively maintaining Skyline in 2014.
Skyline - as a work in progress¶
Etsy found the “one size fits all approach” to anomaly detection wasn’t actually proving all that useful to them.
There is some truth in that, in terms of the one-size-fits-all methodology that Skyline was framed around. With hundreds of thousands of metrics this does make Skyline fairly hard to tame: keeping it useful rather than just noisy requires tuning, and that tuning removes the “without the need to configure a model/thresholds” element somewhat.
So why continue developing Skyline?
The architecture/pipeline works very well at doing what it does. It is solid and battle tested. This project extends Skyline to overcome some of its limitations.
And...
Skyline is FAST!!! Fast enough to handle 10s of 1000s of timeseries in near real-time. In the world of Python, data analysis, R and most machine learning, Skyline is FAST. It processes and analyzes 1000s and 1000s of constantly changing timeseries, every minute of every day, and it can do it at multiple resolutions, on a fairly low end commodity server.
Skyline learns using a timeseries similarity comparison method based on feature extraction and comparison using the tsfresh package.
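For a sense of what that feature extraction step looks like, here is a minimal, hypothetical illustration using the tsfresh package directly; the metric name, data and column names are fabricated and this is not Skyline's exact usage or feature set:
import pandas as pd
from tsfresh import extract_features

# A single hypothetical timeseries in the long format tsfresh expects
df = pd.DataFrame({
    'id': ['metric.1'] * 5,
    'time': [0, 60, 120, 180, 240],
    'value': [1.0, 1.2, 1.1, 5.0, 1.3],
})
# Extract a feature vector describing the timeseries, which can then be
# compared against the features of other timeseries
features = extract_features(df, column_id='id', column_sort='time', column_value='value')
print(features.shape)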
The new look of Skyline apps¶
- Horizon - feed metrics to Redis via a pickle input
- Analyzer - analyze metrics
- Mirage - analyze specific metrics at a custom time range
- Boundary - analyze specific timeseries for specific conditions
- Crucible - store anomalous timeseries resources and ad-hoc analysis of any timeseries
- Panorama - anomalies database and historical views
- Webapp - frontend to view current and historical anomalies and browse Redis with rebrow
- Ionosphere - timeseries fingerprinting and learning
Skyline is still a near real-time anomaly detection system, however it has various modes of operation that are modular and self contained, so that only the desired apps need to be enabled.
Skyline can now be fed, queried and used to analyze timeseries on an ad-hoc basis, on the fly. This means Skyline can now be used to analyze and process static data too; it is no longer just a machine/app metric fed system.
What’s new¶
See whats-new for a comprehensive overview and description of the latest version/s of Skyline.
What’s old¶
It must be stated that the original core of Skyline has not been altered in any way, other than some fairly minimal Pythonic performance improvements, a bit of optimization in terms of the logic used to reach settings.CONSENSUS and a package restructure. In terms of the original Skyline Analyzer, it does the same things, just a little differently, hopefully better and a bit more.
There is little point in trying to improve something as simple and elegant in methodology and design as Skyline, which has worked so outstandingly well to date. This is a testament to a number of things, in fact the sum of all its parts: Etsy, Abe and co. did a great job in the conceptual design, methodology and actual implementation of Skyline and they did it with very good building blocks from the scientific community.
The architecture in a nutshell¶
Skyline uses the following technologies and libraries at its core:
- Python - the main Skyline application language
- Redis - Redis an in-memory data structure store
- numpy - NumPy is the fundamental package for scientific computing with Python
- scipy - SciPy Library - Fundamental library for scientific computing
- pandas - pandas - Python Data Analysis Library
- mysql/mariadb - a database - MySQL or MariaDB
- rebrow - Skyline uses a modified port of Marian Steinbach’s excellent rebrow
- tsfresh - tsfresh - Automatic extraction of relevant features from time series
- memcached - memcached - memory object caching system
- pymemcache - pymemcache - A comprehensive, fast, pure-Python memcached client
Requirements¶
The requirements are:
- Linux (and probably any environment that supports Python virtualenv and bash)
- virtualenv
- Python-2.7.12 (running in an isolated virtualenv)
- Redis
- MySQL or mariadb [optional - required for Panorama]
- A Graphite implementation sending data would help :)
requirements.txt
¶
The requirements.txt file lists the required packages and the last versions verified as working with Python-2.7 within a virtualenv.
Recent changes in the pip environment¶
Note
If you use pip or virtualenv in any other context, please note the following and assess if and/or how it pertains to your environment.
The new pip, setuptools, distribute and virtualenv methodology that the pypa community has adopted is in a bit of a state of flux. The setuptools version and the virtualenv implementation of making --distribute and --setuptools legacy parameters has caused pip 8.1.1, setuptools 20.9.0 and virtualenv 15.0.1 to complain if distribute is one of the requirements of packages that pip tries to install. However, distribute is needed for Mirage if you are trying to run/upgrade on Python-2.6.
pandas¶
Updated: 20161120 - can no longer be supported as of Ionosphere (probably)
The PANDAS_VERSION variable was added to settings.py to handle backwards compatibility with any Skyline instances that run older pandas versions and perhaps cannot upgrade to a later version due to mainstream packaging restrictions. PANDAS_VERSION is required to run the applicable pandas functions dependent on the version that is in use.
- pandas 0.17.0 deprecated pandas.Series.iget in favour of .iloc[i] or .iat[i]
- pandas 0.18.0 introduced a change in the Exponentially-weighted moving average function used in a number of algorithms
# PANDAS_VERSION determines which pandas API to call, for example:
if PANDAS_VERSION < '0.18.0':
    expAverage = pandas.stats.moments.ewma(series, com=15)
else:
    expAverage = pandas.Series.ewm(series, ignore_na=False, min_periods=0, adjust=True, com=15).mean()

if PANDAS_VERSION < '0.17.0':
    return abs(series.iget(-1)) > 3 * stdDev
else:
    return abs(series.iat[-1]) > 3 * stdDev
Skyline should be able to run on pandas versions 0.12.0 - 0.18.0 (or later)
Python-2.6 - may work, but not supported¶
Skyline can still run on Python-2.6. However, deploying Skyline on Python-2.6 requires jumping through some hoops, because the dependencies and pip packages move a lot. At some point some older pip package will no longer be available and a Python-2.6 deployment will no longer be possible, unless you are packaging the pip packages into your own packages, e.g. with fpm or such.
Updated: 20161120 - can no longer be supported as of Ionosphere (probably)
RedHat family 6.x¶
If you are locked into mainstream versions of things and cannot run Skyline in an isolated virtualenv, but have to use the system Python and pip, the following pip installs and versions are known to work, with the caveat that scipy needs to be installed via yum NOT pip and you need to do the following:
Updated: 20161120 - this information is now too old to be applicable really.
pip install numpy==1.8.0
yum install scipy
pip install --force-reinstall --ignore-installed numpy==1.8.0
Known working scipy rpm - scipy-0.7.2-8.el6.x86_64
pip packages (these are the requirements and their dependencies): it is possible that some of these pip packages no longer exist or may not exist in the future; this is documented here for info ONLY.
argparse (1.2.1)
backports.ssl-match-hostname (3.4.0.2)
cycler (0.9.0)
distribute (0.7.3)
Flask (0.9)
hiredis (0.1.1)
iniparse (0.3.1)
iotop (0.3.2)
Jinja2 (2.7.2)
lockfile (0.9.1)
MarkupSafe (0.23)
matplotlib (1.5.0)
mock (1.0.1)
msgpack-python (0.4.2)
nose (0.10.4)
numpy (1.7.0)
ordereddict (1.2)
pandas (0.12.0)
patsy (0.2.1)
pip (1.5.4)
pycurl (7.19.0)
pygerduty (0.29.1)
pygpgme (0.1)
pyparsing (1.5.6)
python-daemon (1.6)
python-dateutil (2.3)
python-simple-hipchat (0.3.3)
pytz (2014.4)
redis (2.7.2)
requests (1.1.0)
scipy (0.7.2)
setuptools (11.3.1)
simplejson (2.0.9)
six (1.6.1)
statsmodels (0.5.0)
tornado (2.2.1)
unittest2 (0.5.1)
urlgrabber (3.9.1)
Werkzeug (0.9.4)
yum-metadata-parser (1.1.2)
Getting started¶
At some point hopefully it will be pip install skyline, but for now see the Installation page after reviewing the below.
Realistic expectations¶
Anomaly detection is not easy. Skyline is not easy to set up; it has a number of moving parts that need to be orchestrated. Further to this, configuring and training Skyline so that it starts learning takes a lot of time.
Take time to read through the documentation and review settings.py
at the
same time.
Anomaly detection is a journey not an app¶
Anomaly detection is partly about automated anomaly detection and partly about knowing your metrics and timeseries patterns. Not all timeseries are created equally.
It helps to think of anomaly detection as an ongoing journey. Although ideally it would be great to be able to computationally detect anomalies with a high degree of certainty, there is no getting away from the fact that the more you learn, “help” and tune your anomaly detection, the better it will become.
The fact that Skyline does computationally detect anomalies with a high degree of certainty can be a problem in itself. But it is not Skyline’s fault that:
- a lot of your metrics are anomalous
- feeding all your metrics to Skyline and setting alerting on all your metric namespaces is too noisy to be generally considered useful
Enabling metrics incrementally¶
Skyline was originally pitched to automatically monitor #allthethings, all your metrics. It can, but...
Skyline should have been pitched to monitor your KEY metrics.
To begin with, decide what your 10 most important Graphite metrics are and only configure settings.ALERTS and settings.SMTP_OPTS on those, and get to know what Skyline does with them (see the illustration below). Add more key metric groups as you get the hang of it.
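For illustration, a minimal sketch of what such a starting settings.ALERTS might look like; the metric namespaces here are hypothetical and the third element is the alert expiration in seconds (see the examples and docstrings in settings.py for the full format):
ALERTS = (
    # ('metric_namespace', 'alerter', EXPIRATION_TIME in seconds)
    ('stats.web.requests', 'smtp', 3600),    # hypothetical key metric namespace
    ('stats.db.connections', 'smtp', 3600),  # hypothetical key metric namespace
)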
You cannot rush timeseries.
Enabling Skyline modules incrementally¶
Skyline’s modules do different things differently and understanding the process and pipeline helps to tune each Skyline module appropriately for your data.
Each analysis based module (Analyzer, Mirage, Boundary, Ionosphere and Crucible) has its own specific configuration. These configurations are not extremely complex, but they are not obvious or trivial either when you are starting out. Bringing Skyline modules online incrementally over time helps you to understand the processes and their different configuration settings more easily than trying to get the whole stack up and running straight off.
Start with Horizon, Analyzer, Webapp and Panorama¶
It is advisable to only start the original Horizon, Analyzer and Webapp daemons
and Panorama initially and take time to understand what Skyline is doing. Take
some time to tune Analyzer’s ALERTS
and learn the patterns in your IMPORTANT
metrics:
- which metrics trigger anomalies?
- when do the metrics trigger anomalies?
- why/what known events are triggering anomalies?
- are there seasonalities/periodicities in the anomalies of some metrics?
- which metrics are critical and which metrics are just “normal”/expected noise?
Panorama will help you view what things are triggering as anomalous.
Once you have got an idea of what you want to anomaly detect on and more importantly, on what and when you want to alert, you can start to define the settings for other Skyline modules such as Mirage, Boundary and Ionosphere and bring them online too.
Add Mirage parameters to the ALERTS
¶
Once you have an overview of metrics that have seasonalities that are greater
than the settings.FULL_DURATION
, you can add their Mirage parameters to
the settings.ALERTS
tuples and start the Mirage daemon.
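As a hedged sketch only, assuming the Mirage seasonality is declared as an additional fourth element in the alert tuple giving the period in hours (verify against the settings.ALERTS docstring in settings.py for the authoritative format):
ALERTS = (
    # ('metric_namespace', 'alerter', EXPIRATION_TIME, SECOND_ORDER_RESOLUTION_HOURS)
    # The fourth element tells Analyzer to hand the metric to Mirage for
    # analysis at that longer resolution (168 hours = 7 days)
    ('stats.web.requests', 'smtp', 3600, 168),  # hypothetical namespace
)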
Add Boundary settings¶
You will know what your key metrics are and you can define their acceptable
boundaries and alerting channels in the settings.BOUNDARY_METRICS
tuples
and start the Boundary daemon.
Train Ionosphere¶
Via the alert emails or in the Skyline Ionosphere UI, train Ionosphere on what is NOT anomalous.
Ignore Crucible¶
EXPERIMENTAL
By default Crucible is enabled in the settings.py
however, for other Skyline
modules to send Crucible data, Crucible has to be enabled via the appropriate
settings.py
variable for each module.
Crucible has 2 roles:
- Store resources (timeseries json and graph pngs) for triggered anomalies - note this can consume a lot of disk space if enabled.
- Run ad-hoc analysis on any timeseries and create matplotlib plots for the run algorithms.
It is not advisable to enable Crucible on any of the other modules unless you really want to “see” anomalies in great depth. Crucible is enabled as there is a Crucible frontend view on the roadmap that will allow the user to test any timeseries of any metric directly through the UI.
Running Skyline in a Python virtualenv¶
Running Skyline in a Python virtualenv is the recommended way to run Skyline. This allows the Skyline apps and all dependencies to be isolated from the system Python version and packages, and allows Skyline to be run with a specific version of Python and, as importantly, with specific versions of the dependency packages.
The possible overhead of adding Python virtualenv functionality to any configuration management is probably worth the effort in the long run.
sudo
¶
Use sudo
appropriately for your environment wherever necessary.
HOWTO Python virtualenv Skyline¶
Dependencies¶
Building Python versions from the Python sources in Python virtualenv requires the following system dependencies:
- RedHat family
yum -y install epel-release
yum -y install autoconf zlib-devel openssl-devel sqlite-devel bzip2-devel \
gcc gcc-c++ readline-devel ncurses-devel gdbm-devel compat-readline5 \
freetype-devel libpng-devel python-pip wget tar git
- Debian family
apt-get -y install build-essential
apt-get -y install autoconf zlib1g-dev libssl-dev libsqlite3-dev lib64bz2-dev \
libreadline6-dev libgdbm-dev libncurses5 libncurses5-dev libncursesw5 \
libfreetype6-dev libxft-dev python-pip wget tar git
virtualenv¶
Regardless of your OS, as long as you have pip installed you can install virtualenv. NOTE: if you are already using a version of Python virtualenv, this may not suit your setup.
virtualenv must be >= 15.0.1 due to some recent changes in pip and setuptools, see the Recent changes in the pip environment section in Requirements for more details.
This is using your system pip at this point only to install virtualenv.
pip install 'virtualenv>=15.0.1'
Python version¶
Below we use the path /opt/python_virtualenv
, which you can substitute
with any path you choose. We are going to use the Python-2.7.12 source and
build and install an isolated Python-2.7.12, this has no effect on your system
Python:
PYTHON_VERSION="2.7.12"
PYTHON_MAJOR_VERSION="2.7"
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
mkdir -p "${PYTHON_VIRTUALENV_DIR}/versions/${PYTHON_VERSION}"
mkdir -p "${PYTHON_VIRTUALENV_DIR}/projects"
cd "${PYTHON_VIRTUALENV_DIR}/versions/${PYTHON_VERSION}"
wget -q "https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz"
tar -zxvf "Python-${PYTHON_VERSION}.tgz"
cd ${PYTHON_VIRTUALENV_DIR}/versions/${PYTHON_VERSION}/Python-${PYTHON_VERSION}
./configure --prefix=${PYTHON_VIRTUALENV_DIR}/versions/${PYTHON_VERSION}
make
# Optionally here if you have the time or interest you can run
# make test
make altinstall
You will now have a Python-2.7.12 environment with the Python
executable: /opt/python_virtualenv/versions/2.7.12/bin/python2.7
Create a Skyline Python virtualenv¶
A word of warning relating to pip, setuptools and distribute if you have opted not to use the above because you already have Python virtualenv. As of virtualenv 15.0.1 the pip community adopted the new pkg_resources _markerlib package structure, which means the following in the virtualenv context:
- distribute cannot be installed
- pip must be >=8.1.0
- setuptools must be >=20.2.2
Once again using Python-2.7.12:
PYTHON_VERSION="2.7.12"
PYTHON_MAJOR_VERSION="2.7"
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
PROJECT="skyline-py2712"
cd "${PYTHON_VIRTUALENV_DIR}/projects"
virtualenv --python="${PYTHON_VIRTUALENV_DIR}/versions/${PYTHON_VERSION}/bin/python${PYTHON_MAJOR_VERSION}" "$PROJECT"
Make sure to add the /etc/skyline/skyline.conf
file - see
Installation
Installation¶
Intended audience¶
Skyline is not really a localhost application; it needs lots of data, unless you have a localhost Graphite or you pickle Graphite data to your localhost.
Given the specific nature of Skyline, it is assumed that the audience will have a certain level of technical knowledge, e.g. it is assumed that the user will be familiar with the installation, configuration, operation and security practices and considerations relating to the following components:
- Graphite
- Redis
- MySQL
- Apache
- memcached
This installation document is specifically related to the required installs and configurations of things that are directly related to Skyline. For notes regarding automation and configuration management see the section at the end of this page.
What the components do¶
- Graphite - sends metric data to Skyline Horizon via a pickle
- Redis - stores settings.FULL_DURATION seconds (usefully 24 hours worth) of timeseries data
- MySQL - stores data about anomalies and timeseries features fingerprints, used for learning things that are not anomalous.
- Apache - serves the Skyline webapp via gunicorn and handles basic http auth
- memcached - caches Ionosphere MySQL data
sudo
¶
Use sudo
appropriately for your environment wherever necessary.
Steps¶
Note
All the documentation and testing is based on running Skyline in a Python-2.7.12 virtualenv, if you choose to deploy Skyline another way, you are on your own. Although it is possible to run Skyline in a different type of environment, it does not lend itself to repeatability or a common known state.
- Create a python-2.7.12 virtualenv for Skyline to run in see Running in Python virtualenv
- Setup firewall rules to restrict access to the following:
- settings.WEBAPP_IP - default is 127.0.0.1
- settings.WEBAPP_PORT - default is 1500
- The IP address and port being used to reverse proxy the Webapp (if implementing) e.g. <YOUR_SERVER_IP_ADDRESS>:8080
- The IP address and port being used by MySQL (if implementing)
- The IP address and ports 2024 and 2025
- The IP address and port being used by Redis
- Install Redis - see Redis.io
- Ensure Redis has socket enabled with the following permissions in your redis.conf
unixsocket /tmp/redis.sock
unixsocketperm 777
Note
The unixsocket on the apt redis-server package is
/var/run/redis/redis.sock
if you use this path ensure you change
settings.REDIS_SOCKET_PATH
to this path
- Start Redis
- Install memcached and start memcached see memcached.org
- Make the required directories
mkdir /var/log/skyline
mkdir /var/run/skyline
mkdir /var/dump
mkdir -p /opt/skyline/panorama/check
mkdir -p /opt/skyline/mirage/check
mkdir -p /opt/skyline/crucible/check
mkdir -p /opt/skyline/crucible/data
mkdir /etc/skyline
mkdir /tmp/skyline
- git clone Skyline (git should have been installed in the Running in Python virtualenv section)
mkdir -p /opt/skyline/github
cd /opt/skyline/github
git clone https://github.com/earthgecko/skyline.git
- Once again using the Python-2.7.12 virtualenv, install the requirements using the virtualenv pip, this can take a long time, the pandas install takes quite a while.
Warning
When working with virtualenv Python versions you must always remember to use the activate and deactivate commands to ensure you are using the correct version of Python. Although running a virtualenv does not affect the system Python, not using activate can result in the user making errors that MAY affect the system Python and packages. For example, if a user does not use activate and just uses pip rather than bin/pip2.7, pip will install packages against the system Python. Get into the habit of always using explicit bin/pip2.7 and bin/python2.7 commands to ensure that it is harder for you to err.
PYTHON_MAJOR_VERSION="2.7"
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
PROJECT="skyline-py2712"
cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}"
source bin/activate
# Install the mysql-connector-python package first on its own, as it has to
# be downloaded and installed from MySQL; if it is not installed first, an
# install -r will fail as pip cannot find mysql-connector-python
bin/"pip${PYTHON_MAJOR_VERSION}" install http://cdn.mysql.com/Downloads/Connector-Python/mysql-connector-python-1.2.3.zip#md5=6d42998cfec6e85b902d4ffa5a35ce86
# The MySQL download source line can now be filtered out of requirements.txt
cat /opt/skyline/github/skyline/requirements.txt | grep -v "cdn.mysql.com/Downloads\|mysql-connector" > /tmp/requirements.txt
# This can take lots and lots of minutes...
bin/"pip${PYTHON_MAJOR_VERSION}" install -r /tmp/requirements.txt
# NOW wait at least 7 minutes (on a Linode 4 vCPU, 4GB RAM, SSD cloud node anyway)
# and once completed, deactivate the virtualenv
deactivate
- Copy the skyline.conf and edit the USE_PYTHON as appropriate to your setup if it is not using the PATH /opt/python_virtualenv/projects/skyline-py2712/bin/python2.7
cp /opt/skyline/github/skyline/etc/skyline.conf /etc/skyline/skyline.conf
vi /etc/skyline/skyline.conf # Set USE_PYTHON as appropriate to your setup
- OPTIONAL but recommended, serving the Webapp via gunicorn with an Apache
reverse proxy.
- Setup Apache (httpd) and see the example configuration file in your cloned directory /opt/skyline/github/skyline/etc/skyline.httpd.conf.d.example - modify all the <YOUR_ variables as appropriate for your environment - see Apache and gunicorn
- Add a user and password for HTTP authentication, e.g.
htpasswd -c /etc/httpd/conf.d/.skyline_htpasswd admin
Note
Ensure that the user and password for Apache match the user and
password that you provide in settings.py for
settings.WEBAPP_AUTH_USER
and settings.WEBAPP_AUTH_USER_PASSWORD
- Deploy your Skyline Apache configuration file and restart httpd.
- Create the Skyline MySQL database for Panorama (see Panorama) and Ionosphere.
- Edit the settings.py file and enter your appropriate settings, specifically ensure you set the following variables to the correct setting for your environment, see the documentation links and docstrings in the settings.py file for the full descriptions of each variable:
- settings.GRAPHITE_HOST
- settings.GRAPHITE_PROTOCOL
- settings.GRAPHITE_PORT
- settings.SERVER_METRICS_NAME
- settings.CANARY_METRIC
- settings.ALERTS
- settings.SMTP_OPTS
- settings.HIPCHAT_OPTS and settings.PAGERDUTY_OPTS if to be used, if so ensure that settings.HIPCHAT_ENABLED and settings.PAGERDUTY_ENABLED are set to True
- If you are deploying with a Skyline MySQL Panorama DB straight away ensure that settings.PANORAMA_ENABLED is set to True and set all the other Panorama related variables as appropriate.
- settings.WEBAPP_AUTH_USER
- settings.WEBAPP_AUTH_USER_PASSWORD
- settings.WEBAPP_ALLOWED_IPS
- settings.SERVER_PYTZ_TIMEZONE
- For later implementing and working with Ionosphere and setting up learning (see Ionosphere) after you have the other Skyline apps up and running.
cd /opt/skyline/github/skyline/skyline
vi settings.py
- If you are upgrading, at this point return to the Upgrading page.
- Before you test Skyline by seeding Redis with some test data, ensure that you have configured the firewall/iptables with the appropriate restricted access.
- Start the Skyline apps
/opt/skyline/github/skyline/bin/horizon.d start
/opt/skyline/github/skyline/bin/analyzer.d start
/opt/skyline/github/skyline/bin/webapp.d start
# And Panorama if you have setup in the DB at this stage
/opt/skyline/github/skyline/bin/panorama.d start
/opt/skyline/github/skyline/bin/ionosphere.d start
- Check the log files to ensure things started OK and are running and there are no errors.
Note
When checking a log make sure you check the log for the appropriate time. Skyline can log a lot very quickly, so short tails may miss some event you expect between the restart and the tail.
# Check what the logs reported when the apps started
head -n 20 /var/log/skyline/*.log
# How are they running
tail -n 20 /var/log/skyline/*.log
# Any errors - each app
find /var/log/skyline -type f -name "*.log" | while read skyline_logfile
do
echo "#####
# Checking for errors in $skyline_logfile"
cat "$skyline_logfile" | grep -B2 -A10 -i "error ::\|traceback" | tail -n 60
echo ""
echo ""
done
- Seed Redis with some test data.
Note
if you are UPGRADING and you are using an already populated Redis store, you can skip seeding data.
cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}"
source bin/activate
bin/python2.7 /opt/skyline/github/skyline/utils/seed_data.py
deactivate
- Check the Skyline Webapp frontend on the Skyline machine’s IP address and the appropriate port depending on whether you are serving it proxied or direct, e.g. http://YOUR_SKYLINE_IP:8080 or http://YOUR_SKYLINE_IP:1500. The horizon.test.udp metric anomaly should be in the dashboard after the seed_data.py run is complete. If Panorama is set up you will be able to see that in the /panorama view and in the rebrow view as well.
- Check the log files again to ensure things are running and there are no errors.
- This will ensure that the Horizon service is properly set up and can receive data. For real data, you have some options relating to getting a data pickle from Graphite see Getting data into Skyline
- Once you have your settings.ALERTS configured, to test them see Alert testing
- If you have opted to not setup Panorama now, see setup Panorama later
- For Mirage setup see Mirage
- For Boundary setup see Boundary
- For Ionosphere setup see Ionosphere
Automation and configuration management notes¶
The installation of packages in the requirements.txt
can take a long time,
specifically the pandas build. This will usually take longer than the default
timeouts in most configuration management.
That said, requirements.txt
can be run in an idempotent manner, however
a few things need to be highlighted:
- A first time execution of
bin/"pip${PYTHON_MAJOR_VERSION}" install -r /opt/skyline/github/skyline/requirements.txt
will time out under configuration management. Therefore consider running this manually first. Once pip has installed all the packages, the requirements.txt will run idempotently with no issue and can be used to upgrade via a configuration management run when the requirements.txt is updated with any new versions of packages (with the possible exception of pandas). It is obviously possible to provision each requirement individually directly in configuration management and not use pip to install -r the requirements.txt, however remember that the virtualenv pip needs to be used and pandas needs a LONG timeout value, which not all package classes provide; if you use an exec of any sort, ensure the pandas install has a long timeout.
, however remember the the virtualenv pip needs to be used and pandas needs a LONG timeout value, which not all package classes provide, if you use an exec of any sort, ensure the pandas install has a long timeout. - The mysql-connector-python package is pulled directly from MySQL as no pip
version exists. Therefore during the build process it is recommended to pip
install the MySQL source package first and then the line out comment in
requirements.txt
. Themysql-connector-python==1.2.3
line then ensures the dependency is fulfilled.
Upgrading¶
Upgrading - Crucible to Ionosphere¶
This section covers the steps involved in upgrading an existing Skyline implementation that is running on a Crucible branch version of Skyline (>= 1.0.0)
New settings.py variables¶
There is a new dedicated Ionosphere settings section with the addition of all the new Ionosphere variables.
Modified settings.py docstrings¶
There are some changes in the ALERTS docstrings that cover pattern matching that should be reviewed.
Clean up Skyline permissions¶
After restarting all your Skyline apps and verifying all is working, please consider cleaning up any incorrect permissions that were set on the data directories due to an octal bug that was introduced with the Crucible branch.
Warning
The below bash snippet needs the path to your Skyline data directory and is based on all the app directories being subdirectories of this parent directory; if your setup uses different directory paths for different apps, please modify the snippet below as appropriate for your setup.
# For example - YOUR_SKYLINE_DIR="/opt/skyline"
YOUR_SKYLINE_DIR="<YOUR_SKYLINE_DIR>"
# Fix up the permissions on the Skyline app data directories
ls -1 "$YOUR_SKYLINE_DIR"/ | grep "crucible\|ionosphere\|mirage\|panorama" | while read i_dir
do
chmod 0755 "${YOUR_SKYLINE_DIR}/${i_dir}"
find "${YOUR_SKYLINE_DIR}/${i_dir}" -type d -exec chmod 0755 {} \;
find "${YOUR_SKYLINE_DIR}/${i_dir}" -type f -exec chmod 0644 {} \;
done
Update your MySQL Skyline database¶
- Backup your Skyline MySQL DB.
- Review and run the updates/sql/crucible_to_ionosphere.sql script against your database again. There are a few ALTERs and a number of new tables.
Upgrading - Etsy to Ionosphere¶
First review Upgrading from the Etsy version of Skyline section.
Upgrade steps¶
These steps describe an in-situ manual upgrade process (use sudo where appropriate for your environment).
Due to virtualenv now being the default and recommended way to run Skyline, most of the installation steps in the Installation documentation are still appropriate for an upgrade.
- Setup a Python virtualenv for Skyline - see Running in Python virtualenv - HOWTO
- Go through the installation process, ignoring the Redis install part and breaking out of installation where instructed to and returning to here.
- Stop all your original Skyline apps, ensuring that something like monit or supervisord, etc does not start them again.
- At this point you can either move your current Skyline directory and replace it with the new Skyline directory or just use the new path and just start the apps from their new location.
- If you want to move it, this is the PATH to your current Skyline, which contains:
skyline/bin
skyline/src
# etc
- We are moving your skyline/ directory here.
SKYLINE_DIR=<YOUR_SKYLINE_DIR>
mv "$SKYLINE_DIR" "${SKYLINE_DIR}.etsy.master"
mv /opt/skyline/github/skyline "$SKYLINE_DIR"
- Copy or move your existing Skyline whisper files to their new namespaces on your Graphite server/s (as mentioned above.) Or use Graphite’s whisper-fill.py to populate the new metric namespaces from the old ones after starting the new Skyline apps.
- Start the Skyline apps, either from your path if moved or from the new location, whichever you used
"$SKYLINE_DIR"/bin/horizon.d start
"$SKYLINE_DIR"/bin/analyzer.d start
"$SKYLINE_DIR"/bin/mirage.d start
"$SKYLINE_DIR"/bin/boundary.d start
"$SKYLINE_DIR"/bin/ionosphere.d start
"$SKYLINE_DIR"/bin/pnorama.d start # if you have the MySQL DB set up
"$SKYLINE_DIR"/bin/webapp.d start
- Check the log files to ensure things started OK and are running and there are no errors.
Note
When checking a log make sure you check the log for the appropriate time. Skyline can log a lot very quickly, so short tails may miss some event you expect between the restart and the tail.
# Check what the logs reported when the apps started
head -n 20 /var/log/skyline/*.log
# How are they running
tail -n 20 /var/log/skyline/*.log
# Any errors - each app
find /var/log/skyline -type f -name "*.log" | while read skyline_logfile
do
echo "#####
# Checking for errors in $skyline_logfile"
cat "$skyline_logfile" | grep -B2 -A10 -i "error ::\|traceback" | tail -n 60
echo ""
echo ""
done
- If you added the new skyline_test.alerters.test alerts tuples to your settings.py you can test them now, see Alert testing
- Look at implementing the other new features at your leisure
- Panorama is probably the quickest win if you opted to not install it
- Boundary, Mirage and Ionosphere will take a little assessment over time to see what metrics you want to configure them to monitor. You cannot rush timeseries.
Upgrading - Etsy to Crucible¶
First review Upgrading from the Etsy version of Skyline section.
Upgrade steps¶
These steps describe an in-situ manual upgrade process (use sudo where appropriate for your environment).
Due to virtualenv now being the default and recommended way to run Skyline, most of the installation steps in the Installation documentation are still appropriate for an upgrade.
- Setup a Python virtualenv for Skyline - see Running in Python virtualenv - HOWTO
- Go through the installation process, ignoring the Redis install part and breaking out of installation where instructed to and returning to here.
- Stop all your original Skyline apps, ensuring that something like monit or supervisord, etc does not start them again.
- At this point you can either move your current Skyline directory and replace it with the new Skyline directory or just use the new path and just start the apps from their new location.
- If you want to move it, this is the PATH to your current Skyline, which contains:
skyline/bin
skyline/src
# etc
- We are moving your skyline/ directory here.
SKYLINE_DIR=<YOUR_SKYLINE_DIR>
mv "$SKYLINE_DIR" "${SKYLINE_DIR}.etsy.master"
mv /opt/skyline/github/skyline "$SKYLINE_DIR"
- Copy or move your existing Skyline whisper files to their new namespaces on your Graphite server/s (as mentioned above.) Or use Graphite’s whisper-fill.py to populate the new metric namespaces from the old ones after starting the new Skyline apps.
- Start the Skyline apps, either from your path if moved or from the new location, whichever you used
"$SKYLINE_DIR"/bin/horizon.d start
"$SKYLINE_DIR"/bin/analyzer.d start
"$SKYLINE_DIR"/bin/webapp.d start
"$SKYLINE_DIR"/bin/pnorama.d start # if you have the MySQL DB set up
- Check the log files to ensure things started OK and are running and there are no errors.
Note
When checking a log make sure you check the log for the appropriate time. Skyline can log a lot very quickly, so short tails may miss some event you expect between the restart and the tail.
# Check what the logs reported when the apps started
head -n 20 /var/log/skyline/*.log
# How are they running
tail -n 20 /var/log/skyline/*.log
# Any errors - each app
find /var/log/skyline -type f -name "*.log" | while read skyline_logfile
do
echo "#####
# Checking for errors in $skyline_logfile"
cat "$skyline_logfile" | grep -B2 -A10 -i "error ::\|traceback" | tail -n 60
echo ""
echo ""
done
- If you added the new skyline_test.alerters.test alerts tuples to your settings.py you can test them now, see Alert testing
- Look at implementing the other new features at your leisure
- Panorama is probably the quickest win if you opted to not install it
- Boundary and Mirage will take a little assessment to see what metrics you want to configure them for.
Backup your Redis!¶
Do it.
Upgrading from the Etsy version of Skyline¶
This section covers the steps involved in upgrading an existing Skyline implementation. For the sake of fullness this describes the changes from the last github/etsy/skyline commit to date.
Do a new install¶
If you are upgrading from an Etsy version consider doing a new install; all you need is your Redis data (but BACK IT UP FIRST).
However, if you have some other monitoring or custom inits, etc set up on your Skyline, then the below is a best effort guide.
Things to note¶
This new version of Skyline sees a lot of changes, however although certain things have changed and much has been added, whether these changes are backwards incompatible in terms of functionality is debatable. That said, there is a lot of change. Please review the following key changes relating to upgrading.
Directory structure changes¶
In order to bring Skyline in line with a more standard Python package structure the directory structure has had to change to accommodate sphinx autodoc and the setuptools framework.
settings.py¶
The settings.py
has had a face lift too. This will probably be the
largest initial change that any users upgrading from Skyline < 1.0.0
will find.
The format is much less friendly to the eye as it now has all the comments in Python docstrings rather than just plain “#”ed lines. Apologies for this, but for complete autodoc coverage and referencing it is a necessary change.
Analyzer optimizations¶
There has been some fairly substantial performance improvements that may affect
your Analyzer settings. The optimizations should benefit any deployment,
however they will benefit smaller Skyline deployments more than very large
Skyline deployments. Finding the optimum settings for your Analyzer deployment
will require some evaluation of your Analyzer run_time, total_metrics and
settings.ANALYZER_PROCESSES
.
See Analyzer Optimizations and regardless of
whether your deployment is small or large the new
settings.RUN_OPTIMIZED_WORKFLOW
timeseries analysis will improve
performance and benefit all deployments.
Skyline graphite namespace for metrics¶
The original metrics namespace for skyline was skyline.analyzer
,
skyline.horizon
, etc. Skyline now shards metrics by the Skyline
server host name. So before you start the new Skyline apps when
referenced below in the Upgrade steps, ensure you move/copy your whisper
files related to Skyline to their new namespaces, which are
skyline.<hostname>.analyzer
, skyline.<hostname>.horizon
, etc. Or
use Graphite’s whisper-fill.py to populate the new metric namespaces
from the old ones.
New required components¶
MySQL and memcached.
Clone and have a look¶
If you are quite familiar with your Skyline setup it would be useful to clone the new Skyline and first have a look at the new structure and assess how this impacts any of your deployment patterns, init, supervisord, etc. Diff the new settings.py and your existing settings.py to see the additions and all the changes you need to make.
Getting data into Skyline¶
You currently have two options to get data into Skyline, via the Horizon service:
A note on time sync¶
Although it may seem obvious, it is important to note that any metrics coming into Graphite and Skyline should come from synchronised sources. If there is more than 60 seconds of drift (or more than the resolution of your highest resolution metric), certain things in Skyline will start to become less predictable, in terms of the functioning of certain algorithms which expect very recent datapoints. Time drift does decrease the accuracy and effectiveness of some algorithms. In terms of machine related metrics, normal production grade time synchronisation will suffice.
TCP pickles¶
Horizon was designed to support a stream of pickles from the Graphite carbon-relay service, over port 2024 by default. Carbon relay is a feature of Graphite that immediately forwards all incoming metrics to another Graphite instance, for redundancy. In order to access this stream, you simply need to point the carbon relay service to the box where Horizon is running. In this way, Carbon-relay just thinks it’s relaying to another Graphite instance. In reality, it’s relaying to Skyline.
Here are example Carbon configuration snippets:
relay-rules.conf:
[all]
pattern = .*
destinations = 127.0.0.1:2014, <YOUR_SKYLINE_HOST>:2024
[default]
default = true
destinations = 127.0.0.1:2014:a, <YOUR_SKYLINE_HOST>:2024:a
carbon.conf:
[relay]
RELAY_METHOD = rules
DESTINATIONS = 127.0.0.1:2014, <YOUR_SKYLINE_HOST>:2024
USE_FLOW_CONTROL = False
MAX_QUEUE_SIZE = 5000
A quick note about the carbon agents: Carbon-relay is meant to be the primary metrics listener. The 127.0.0.1 destinations in the settings tell it to relay all metrics locally, to a carbon-cache instance that is presumably running. If you are currently running carbon-cache as your primary listener, you will need to switch it so that carbon-relay is the primary listener.
Note the small MAX_QUEUE_SIZE - in older versions of Graphite, issues can arise when a relayed host goes down. The queue will fill up, and then when the relayed host starts listening again, Carbon will attempt to flush the entire queue. This can block the event loop and crash Carbon. A small queue size prevents this behavior.
See the docs for a primer on Carbon relay.
Of course, you don’t need Graphite to use this listener - as long as you pack and pickle your data correctly (you’ll need to look at the source code for the exact protocol), you’ll be able to stream to this listener.
UDP messagepack¶
Horizon also accepts metrics in the form of messagepack encoded strings
over UDP, on port 2025. The format is
[<metric name>, [<timestamp>, <value>]]
. Simply encode your metrics
as messagepack and send them on their way.
However, a quick note on the transport of any metrics data over UDP... sorry if you did not get that.
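As an illustration, a minimal Python sketch of sending a single datapoint to Horizon’s UDP listener; it assumes Horizon is listening on the default port 2025 on localhost, that the msgpack-python package is installed, and the metric name is hypothetical:
import socket
import time

import msgpack

# Horizon's UDP listener expects a messagepack encoded
# [<metric name>, [<timestamp>, <value>]]
datapoint = ['test.udp.metric', [int(time.time()), 1.0]]
packed = msgpack.packb(datapoint)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(packed, ('127.0.0.1', 2025))
sock.close()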
Adding a Listener¶
If neither of these listeners are acceptable, it’s easy enough to extend them. Add a method in listen.py and add a line in the horizon-agent that points to your new listener.
settings.FULL_DURATION
¶
Once you get real data flowing through your system, the Analyzer will be able to start analyzing for anomalies.
Note
Do not expect to see anomalies or anything in the Webapp immediately
after starting the Skyline services. Realistically settings.FULL_DURATION
should have been passed, before you begin to assess any triggered anomalies,
after all settings.FULL_DURATION
is the baseline. Although not all
algorithms utilize all the settings.FULL_DURATION
data points, some do
and some use only 1 hour’s worth. However the Analyzer log should still report
values in the exception stats, reporting how many metrics were boring, too
short, etc as soon as it is getting data for metrics that Horizon is populating
into Redis.
Alert testing¶
The settings.py has a default non-existent metric namespace for testing your
alerters, in the skyline_test.alerters.test
settings.ALERTS
tuple.
If you want to test that your email, Hipchat room and Pagerduty OPTS are correctly configured and working, do the following, once again using the Python-2.7.12 virtualenv and documentation PATHs as an example:
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
PROJECT="skyline-py2712"
cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}"
source bin/activate
python /opt/skyline/github/skyline/utils/verify_alerts.py --trigger True --metric 'skyline_test.alerters.test'
deactivate
You should get an alert on each alerter that you have enabled. Do note that the graphs sent with the alerts will have no data in them.
Strictly increasing monotonicity¶
Skyline specifically handles positive, strictly increasing monotonic timeseries. These are metrics that are integrals (incrementing counts) in Graphite; Skyline converts their values to their derivative products.
In terms of visualisation Graphite has the nonNegativeDerivative function, which converts an integral or incrementing count to a derivative by calculating the delta between subsequent datapoints. The function ignores datapoints that trend down. With Skyline’s new functionality this conversion is a requirement for metrics that only increase over time and then at some point reset to 0 (e.g. interface packet counts, varnish or haproxy backend connections, etc).
By default, Skyline now identifies and converts monotonically increasing timeseries to their derivative products through most of the Skyline analysis apps (Analyzer, Mirage and Ionosphere).
Unless you tell Skyline not to. You may have some monotonically increasing metric that you do not want to be forced to convert to its derivative product just because it is more convenient for Skyline to analyse, be taught and learn the derivatives. You may have a metric that does not reset and want it analysed the way it used to be analysed in the old days, as you feel it worked; that is OK, you can, using settings.NON_DERIVATIVE_MONOTONIC_METRICS
However, Skyline will automatically classify metrics into derivative_metrics (strictly positive monotonic metrics) and non_derivative_metrics. This process has been automated so that new metrics of this nature will be identified and processed via their derivatives, without the requirement for each namespace pattern to be declared in the settings, should a new namespace of this nature be added.
Further, metrics identified as derivative_metrics continue to be identified as such even if their data changes from a strictly increasing monotonic nature, e.g. gets reset to zero. Skyline sets a Redis z.derivative_metric key with an expiry time of settings.FULL_DURATION to handle counter resets.
However there are some edge cases that will not become strictly increasing again until, say, a mysql.commands.create_table is run. So on very slowly changing metrics a derivative metric may be analyzed as a non_derivative_metric until such a point as it reverts to a strictly increasing monotonic nature again. Seeing as this would just revert for a period back to how these types of metrics were handled in Skyline for years, which was not wrong, just not conducive to profiling and learning, it is deemed a caveat rather than a bug, as any metrics of this extreme nature can be handled via SKIP or using settings.NON_DERIVATIVE_MONOTONIC_METRICS
Really why is this important?¶
If you are still reading...
With the original Analyzer/Mirage, single time resolution and 3-sigma based algorithms, these types of metrics with strictly increasing monotonicity that reset always triggered 3-sigma and alerted (false positives). That is not to say that they were not effective: once the metric had settings.FULL_DURATION seconds of timeseries data, while it was sending the incrementing data 3-sigma would work, until it reset :) So it worked most of the time and falsely some of the time. Not ideal.
Widening the range of the time resolutions that Skyline operates in with Mirage and Ionosphere requires that this class of metrics be converted, otherwise they are limited in terms of the amount of seasonal analysis, profiling and learning that can be done on this type of timeseries data.
So Skyline now analyses all monotonic metrics via their derivatives, as these work better in terms of more historical analysis, features calculations and comparison, and learning patterns, due to the step nature of the data and the variability of the steps; in terms of profiling features these types of data are almost like snowflakes. The derivatives of these strictly increasing metrics are the kind of patterns that lend themselves to Skyline’s timeseries comparison and analysis methods.
The Skyline nonNegativeDerivative is based on part of the Graphite render function nonNegativeDerivative at: https://github.com/graphite-project/graphite-web/blob/1e5cf9f659f5d4cc0fa53127f756a1916e62eb47/webapp/graphite/render/functions.py#L1627
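To illustrate the idea only, a simplified Python sketch of a non negative derivative conversion (this is not Skyline’s actual implementation):
def nonnegative_derivative(timeseries):
    # timeseries is a list of [timestamp, value] datapoints from an
    # incrementing counter; return the delta between consecutive datapoints,
    # skipping any datapoint where the counter trends down (e.g. a reset to 0)
    derivative = []
    last = None
    for timestamp, value in timeseries:
        if last is not None and value >= last:
            derivative.append([timestamp, value - last])
        last = value
    return derivative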
Horizon¶
The Horizon service is responsible for collecting, cleaning, and formatting
incoming data. It consists of Listeners, Workers, and Roombas. Horizon can be
started, stopped, and restarted with the bin/horizon.d
script.
Listeners¶
Listeners are responsible for listening to incoming data. There are currently two types: a TCP-pickle listener on port 2024, and a UDP-messagepack listener on port 2025. The Listeners are easily extendible. Once they read a metric from their respective sockets, they put it on a shared queue that the Workers read from. For more on the Listeners, see Getting Data Into Skyline.
Workers¶
The workers are responsible for processing metrics off the queue and inserting them into Redis. They work by popping metrics off of the queue, encoding them into Messagepack, and appending them onto the respective Redis key of the metric.
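A conceptual sketch of that idea follows; it assumes the redis and msgpack Python packages, the Redis socket path used in this documentation and an illustrative key naming scheme, not Skyline’s exact one:
import time

import msgpack
import redis

r = redis.StrictRedis(unix_socket_path='/tmp/redis.sock')

def add_datapoint(metric_name, timestamp, value):
    # Encode the datapoint as messagepack and append it onto the
    # metric's Redis key
    packed = msgpack.packb([timestamp, value])
    r.append('metrics.' + metric_name, packed)

add_datapoint('horizon.test.udp', int(time.time()), 1.0)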
Roombas¶
The Roombas are responsible for trimming and cleaning the data in Redis. You
will only have a finite amount of memory on your server, and so if you just let
the Workers append all day long, you would run out of memory. The Roomba cycles
through each metric in Redis and cuts it down so it is as long as
settings.FULL_DURATION
. It also dedupes and purges old metrics.
Analyzer¶
The Analyzer service is responsible for analyzing collected data. It has
a very simple divide-and-conquer strategy. It first checks Redis to get
the total number of metrics stored, and then it fires up a number of
processes equal to settings.ANALYZER_PROCESSES
, assigning each
process a number of metrics. Analyzing a metric is a very
CPU-intensive process, because each timeseries must be decoded from
Messagepack and then run through the algorithms. Analyzer also routes
metric checks to other services (Mirage, Panorama and Ionosphere) for
further analysis or for recording an anomaly event, as appropriate.
Due to Analyzer being the most CPU-intensive Skyline process, it is advisable to
set settings.ANALYZER_PROCESSES
to about the number of cores you have -
leaving a few for the Skyline services and Redis.
The original documentation and settings for skyline were based on:
a flow of about 5k metrics coming in every second on average (with 250k distinct metrics). We use a 32 core Sandy Bridge box, with 64 gb of memory. We experience bursts of up to 70k TPS on Redis
Skyline runs OK on much less. It can handle ~45000 metrics per minute on
a 4 vCore, 4GB RAM cloud SSD server, even before the introduction of the
settings.RUN_OPTIMIZED_WORKFLOW
methodology.
Do read the notes in settings.py related to the settings.ANALYZER_PROCESSES and settings.ANALYZER_OPTIMUM_RUN_DURATION settings. If you are only processing a few 1000 metrics with a data point every minute, then the optimum settings will most likely be something similar to:
ANALYZER_PROCESSES = 1
ANALYZER_OPTIMUM_RUN_DURATION = 60
Python multiprocessing is not very efficient if it is not needed; in fact the overall overhead of the spawned processes ends up greater than the overhead of processing with a single process.
See Optimizations results and Analyzer Optimizations
Algorithms¶
Skyline Analyzer was designed to handle a very large number of metrics, for which picking models by hand would prove infeasible. As such, Skyline Analyzer relies upon the consensus of an ensemble of a few different algorithms. If the majority of algorithms agree that any given metric is anomalous, the metric will be classified as anomalous. It may then be surfaced to the Webapp or pushed to Mirage, if Mirage is enabled and configured for the namespace of the anomalous metric.
Currently, Skyline does not come with very many algorithmic batteries included. This is by design. Included are a few algorithms to get you started, but you are not obligated to use them and are encouraged to extend them to accommodate your particular data. Indeed, you are ultimately responsible for using the proper statistical tools the correct way with respect to your data.
Of course, we welcome all pull requests containing additional algorithms
to make this tool as robust as possible. To this end, the algorithms
were designed to be very easy to extend and modify. All algorithms are
located in algorithms.py. To add an algorithm to the ensemble, simply
define your algorithm and add its name to settings.ALGORITHMS.
Make sure your algorithm returns either True
, False
or None
, and be
sure to update the settings.CONSENSUS
setting appropriately.
Algorithm philosophy¶
The basic algorithm is based on 3-sigma, derived from Shewhart’s statistical process control. However, you are not limited to 3-sigma based algorithms if you do not want to use them - as long as you return a boolean, you can add any sort of algorithm you like to run on timeseries and vote.
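As an illustration, a minimal sketch of a 3-sigma style algorithm; it assumes, as with the bundled algorithms, that the timeseries argument is a list of [timestamp, value] datapoints, and it is not one of the shipped algorithms:
def last_datapoint_outside_3sigma(timeseries):
    # Vote True if the last datapoint is more than three standard deviations
    # from the mean of the series, False if not, or None if the timeseries
    # cannot be assessed
    values = [value for timestamp, value in timeseries]
    if len(values) < 3:
        return None
    mean = sum(values) / float(len(values))
    variance = sum((v - mean) ** 2 for v in values) / float(len(values))
    stddev = variance ** 0.5
    if stddev == 0:
        return False
    return abs(values[-1] - mean) > 3 * stddev

If you added such an algorithm you would add 'last_datapoint_outside_3sigma' to settings.ALGORITHMS and review settings.CONSENSUS accordingly.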
Explanation of Exceptions¶
- TooShort: The timeseries was too short, as defined in settings.MIN_TOLERABLE_LENGTH
- Incomplete: The timeseries was less than settings.FULL_DURATION seconds long
- Stale: The timeseries has not received a new metric in more than settings.STALE_PERIOD seconds
- Boring: The timeseries has been the same value for the past settings.MAX_TOLERABLE_BOREDOM seconds
- Other: There’s probably an error in the code, if you’ve been making changes or we have.
Metrics monotonicity¶
Analyzer is used to identify which metric timeseries are strictly increasing monotonically, metrics that have an incrementing count, so that these timeseries can be handled via their derivative products where appropriate. For full details see Monotonic metrics.
Push to Mirage¶
Analyzer can push anomalous metrics that have a seasonality /
periodicity that is greater than settings.FULL_DURATION
to the Mirage
service, see Mirage.
Analyzer settings.ALERTS
¶
Order Matters¶
In terms of the settings.ALERTS
order matters in Analyzer and in the
Mirage context as well.
Warning
It is important to note that Analyzer uses the first alert tuple that matches.
So for example, with some annotation. Let us say we have a set of metrics related to how many requests are made per customer. We have two very important customers with whom we have tight SLAs and we want to know very quickly if there are ANY anomalies in the number of requests they are making, as it has an immediate effect on our revenue. We have other customers too; we want to know when there is a problem, but we do not want to be nagged, just reminded about them every hour if there are anomalous changes.
ALERTS = (
('skyline', 'smtp', 3600),
('stats.requests.bigcheese_customer', 'smtp', 600), # --> alert every 10 mins
('stats.requests.biggercheese_customer', 'smtp', 600), # --> alert every 10 mins
('stats.requests\..*', 'smtp', 3600), # --> alert every 60 mins
)
The above would ensure if Analyzer found bigcheese_customer or biggercheese_customer metrics anomalous, they would fire off an alert every 10 minutes, but for all other metrics in the namespace, Analyzer would only fire off an alert every hour if they were found to be anomalous.
The below would NOT have the desired effect of alerting on the bigcheese_customer and biggercheese_customer metrics every 10 minutes:
ALERTS = (
('skyline', 'smtp', 3600),
('stats.requests\..*', 'smtp', 3600), # --> alert every 60 mins
('stats.requests.bigcheese_customer', 'smtp', 600), # --> NEVER REACHED
('stats.requests.biggercheese_customer', 'smtp', 600), # --> NEVER REACHED
)
Hopefully it is clear that Analyzer would not reach the bigcheese_customer and
biggercheese_customer alert tuples as in the above example the
stats.requests\..*
tuple would match BEFORE the specific tuples were
evaluated and the bigcheese metrics would be alerted on every 60 mins instead of
the desired every 10 minutes.
Please refer to Mirage - Order Matters section for a similar example of how order matters in the Mirage context.
Analyzer SMTP alert graphs¶
Analyzer by default now sends 2 graphs in any SMTP alert. The original Graphite graph is sent and an additional graph image is sent that is plotted using the actual Redis timeseries data for the metric.
The Redis data graph has been added to make it specifically clear as to the data that Analyzer is alerting on. Often your metrics are aggregated in Graphite and a Graphite graph is not the exact representation of the timeseries data that triggered the alert, so having both is clearer.
The Redis data graph also adds the mean and the 3-sigma boundaries to the plot, which is useful for brain training. This goes against the “less is more (effective)” data visualization philosophy, however if the human neocortex is presented with 3-sigma boundaries enough times, it will probably eventually be able to calculate 3-sigma boundaries in any timeseries, reasonably well.
Bearing in mind that when we view anomalous timeseries in the UI we are presented with a red line depicting the anomalous range, this graph just does something similar in the alert context.
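For illustration only, a rough sketch of producing a plot with the mean and 3-sigma boundaries overlaid; this is not Skyline’s plotting code and the timeseries here is fabricated:
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render to a file, no display required
import matplotlib.pyplot as plt

# A fabricated timeseries of [timestamp, value] datapoints
timeseries = [[1480000000 + i * 60, 10.0 + (i % 7)] for i in range(120)]
values = np.array([value for timestamp, value in timeseries])
mean = values.mean()
sigma = values.std()

plt.figure()
plt.plot(values, label='metric')
plt.axhline(mean, linestyle='--', label='mean')
plt.axhline(mean + 3 * sigma, linestyle=':', label='+3 sigma')
plt.axhline(mean - 3 * sigma, linestyle=':', label='-3 sigma')
plt.legend()
plt.savefig('/tmp/redis_data_graph.png')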
Should you wish to disable the Redis data graph and simply have the Graphite
graph, simply set settings.PLOT_REDIS_DATA
to False
.
Example alert¶

Example of the Redis data graph in the alert
Note
The Redis data graphs do make the alerter use a little more CPU when matplotlib plots the alerts, and make the alert email a little larger in size.
What Analyzer does¶
- Analyzer determines all unique metrics in Redis and divides them between settings.ANALYZER_PROCESSES to be analysed by spin_process processes (see the sketch after this list).
- The spawned spin_process processes pull all the timeseries for the assigned_metrics they have been assigned from Redis and iterate through each metric, analyzing the timeseries against the settings.ALGORITHMS declared in settings.py.
- The spin_process will add any metric that it finds anomalous (triggers settings.CONSENSUS number of algorithms) to a list of anomalous_metrics.
- The parent Analyzer process will then check every metric in the anomalous_metrics list to see if:
  - The metric matches a settings.ALERTS tuple in settings.py.
  - If a Mirage parameter is set in the tuple, Analyzer does not alert, but hands the metric off to Mirage by adding a Mirage check file.
  - If the metric is an Ionosphere enabled metric, Analyzer does not alert, but hands the metric off to Ionosphere by adding an Ionosphere check file.
  - If ENABLE_CRUCIBLE is True, Analyzer adds the timeseries as a json file and a Crucible check file.
  - If no Mirage parameter is set, but the metric matches a settings.ALERTS tuple namespace, Analyzer then checks if an Analyzer alert key exists for the metric by querying the metric's Analyzer alert key in Redis.
  - If no alert key exists, Analyzer sends alert/s to the configured alerters and sets the metric's Analyzer alert key in Redis for settings.EXPIRATION_TIME seconds.
  - Analyzer will alert for an Analyzer metric that has been returned from Ionosphere as anomalous, having not matched any known features profile or layers.
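A minimal sketch of how the division of metrics between processes could work; this is an assumption for illustration, not the actual Analyzer implementation:
ANALYZER_PROCESSES = 2
unique_metrics = ['metric.%s' % i for i in range(10)]  # hypothetical metrics

def assigned_metrics(process_number, metrics, processes=ANALYZER_PROCESSES):
    # Return the slice of metrics assigned to a given process (1-indexed)
    keys_per_processor = int(len(metrics) / processes)
    start = (process_number - 1) * keys_per_processor
    end = start + keys_per_processor if process_number < processes else len(metrics)
    return metrics[start:end]

for p in range(1, ANALYZER_PROCESSES + 1):
    print(p, assigned_metrics(p, unique_metrics))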
Analyzer Optimizations¶
The original implementation of Skyline has worked for years, almost flawlessly it must be said. However the original implementation of Skyline was patterned and somewhat even hard coded to run in a very large and powerful setup in terms of the volume of metrics it was handling and the server specs it was running on.
This setup worked and ran OK at all metric volumes. However, if Skyline was set up in a smaller environment with a few 1000 metrics, the CPU graphs and load_avg stats of your Skyline server suggested that Skyline needed a LOT of processing power. This is no longer true.
This was due to one single line of code at the end of the Analyzer module, which only slept if the runtime of an analysis was less than 5 seconds. This undoubtedly resulted in a lot of Skyline implementations seeing constantly high CPU usage and load_avg when running Skyline, as Analyzer would run in say 19 seconds and then immediately spawn again and again, etc.
https://github.com/etsy/skyline/blob/master/src/analyzer/analyzer.py#L242
# Sleep if it went too fast
if time() - now < 5:
    logger.info('sleeping due to low run time...')
    sleep(10)
Number of Analyzer processors¶
A number of optimizations have changed the required number of processors
to assign to settings.ANALYZER_PROCESSES
, quite dramatically in some
cases. Specifically in cases where the number of metrics being analyzed is not
in the 10s of 1000s.
Python multiprocessing is not very efficient if it is not needed; in fact the overall overhead of the spawned processes ends up greater than the overhead of processing with a single process. For example, if we have a few 1000 metrics, we have 4 processors assigned to Analyzer and the process duration is 19 seconds, in the original Analyzer we would have seen 4 CPUs running at 100% constantly (due to the above "Sleep if it went too fast"). Even if we leave the "Sleep if it went too fast" in and we change settings.ANALYZER_PROCESSES to 1, we will find that:
- we now see 1 CPU running at 100%
- our duration has probably increased to about 27 seconds
- we use a little more memory on the single process
When we optimize the sleep to match the environment with settings.ANALYZER_OPTIMUM_RUN_DURATION, in this case say 60 instead of 5, we will find that (see the sketch after this list):
- we now see 1 CPU running at a few percent, only spiking up for 27 seconds
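A hedged sketch of the sleep optimization described above, with a hypothetical run_analysis() standing in for one full Analyzer run and settings.ANALYZER_OPTIMUM_RUN_DURATION assumed to be 60:
from time import time, sleep

ANALYZER_OPTIMUM_RUN_DURATION = 60

def run_analysis():
    # Hypothetical placeholder for one full Analyzer run
    sleep(19)  # pretend the analysis took 19 seconds

while True:
    now = time()
    run_analysis()
    run_time = time() - now
    # Only sleep for the remainder of the optimum run duration, rather than
    # immediately spawning the next run
    if run_time < ANALYZER_OPTIMUM_RUN_DURATION:
        sleep(ANALYZER_OPTIMUM_RUN_DURATION - run_time)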
When we further optimize and use the settings.RUN_OPTIMIZED_WORKFLOW
we
will find that:
- we now see 1 CPU running at a few percent, only spiking up for 14 seconds
- our duration has probably decreased to about 50% of what it was
See Optimizations results at the end of this page.
Analyzer work rate¶
The original Analyzer analyzed all timeseries against all algorithms
which is the maximum possible work. In terms of the settings.CONSENSUS
based model, this is not the most efficient work rate.
Performance tuning¶
A special thanks to Snakeviz for a very useful Python profiling tool which enabled some minimal changes in the code and substantial improvements in the performance, along with cprofilev and vmprof.
Using anomaly_breakdown metrics graphs to tune the Analyzer workflow¶
anomaly_breakdown metrics were added to Skyline on 10 Jun 2014, yet never merged into the main Etsy fork. However, in terms of performance tuning and profiling Skyline they are quite useful. They provide us with the ability to optimize the analysis of timeseries data based on 2 simple criteria:
- Determine the algorithms that are triggered most frequently
- Determine the computational expense of each algorithm (a development addition that adds the algorithm_breakdown.*.timing.times_run and algorithm_breakdown.*.timing.total_time metrics to the skyline.analyzer.hostname Graphite metric namespaces).
We can use these data to determine the efficiency of the algorithms
and when this is applied to the Analyzer settings.CONSENSUS
model we can
optimize algorithms.py
to run in the most efficient manner possible.
Originally algorithms.py simply analyzed every timeseries against every
algorithm in ALGORITHMS
and only checked the settings.CONSENSUS
threshold at the end. However a very small but effective optimization is to use
the above data to run the following optimizations.
- Run the most frequently triggered and least expensive settings.CONSENSUS number of algorithms first and then determine if settings.CONSENSUS can be achieved (see the sketch after this list). Currently there are 9 algorithms that Analyzer uses, however the same optimization is valid if more algorithms were added.
- If our settings.CONSENSUS was 6 and Analyzer has not been able to trigger any of the most frequently triggered and least expensive 5 algorithms, then there is no need to analyze the timeseries against the remaining algorithms. This surprisingly reduces the work of Analyzer by ~xx% on average (a lot).
- The cost of this optimization is that we lose the original algorithm_breakdown.* metrics which this was evaluated and patterned against. However, two additional factors somewhat mitigate this, although it is definitely still skewed. The mitigations being that:
  - When a timeseries is anomalous, more than one algorithm triggers anyway.
  - When an algorithm is triggered, more algorithms are run. Seeing as we have optimized to have the least frequently triggered algorithms run later in the workflow, it stands to reason that a lot of the time they would not have triggered even if they were run. However it is still skewed.
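The early exit check referred to in the list above can be expressed very simply; the following is illustrative only and not the algorithms.py code:
CONSENSUS = 6

def consensus_still_possible(triggered_so_far, algorithms_remaining, consensus=CONSENSUS):
    # If the algorithms still to be run cannot take the triggered count to
    # CONSENSUS, there is no point running them
    return (triggered_so_far + algorithms_remaining) >= consensus

# 9 algorithms, the first 5 run and none triggered - only 4 remain, so a
# CONSENSUS of 6 can no longer be reached
print(consensus_still_possible(triggered_so_far=0, algorithms_remaining=4))  # False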
These optimizations are now the default in settings.py, however they have been implemented with backwards compatibility so that Analyzer can be run without the algorithm optimizations, to ensure that the algorithms can be benchmarked again should any further algorithms ever be added to Analyzer or any existing algorithms be modified in any way.
algorithm benchmarks¶
analyzer_dev can be/was used as a benchmarking module to determine the execution times of algorithms.
Considerations - approximation of timings¶
The algorithm benchmark timings are simply approximations of the real (float) execution time of each algorithm.
tmpfs vs multiprocessing Value¶
To record the algorithm counts and timings without using a multiprocessing Value and the associated overhead of locks, etc., /tmp was opted for instead and the variable settings.SKYLINE_TMP_DIR was added. In most cases /tmp is tmpfs, which is memory anyway, so all the heavy lifting in terms of locking etc. is offloaded to the OS and the modules do not have to incur the additional complexity in Python. A simple yet effective win. Same same but different. There may be some valid reasons to use a multiprocessing Value or Manager().list().
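A hedged sketch of the idea - recording a per algorithm count in simple files under a tmpfs backed directory rather than in a multiprocessing Value. The file naming here is purely illustrative, not how Analyzer actually names its files:
import os

SKYLINE_TMP_DIR = '/tmp/skyline'

def incr_algorithm_count(algorithm):
    # Increment a simple per algorithm counter file
    if not os.path.exists(SKYLINE_TMP_DIR):
        os.makedirs(SKYLINE_TMP_DIR)
    count_file = os.path.join(SKYLINE_TMP_DIR, '%s.count' % algorithm)
    count = 0
    if os.path.isfile(count_file):
        with open(count_file) as fh:
            count = int(fh.read() or 0)
    with open(count_file, 'w') as fh:
        fh.write(str(count + 1))

incr_algorithm_count('ks_test')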
Algorithms ranked by triggered count¶
Using the anomaly_breakdown metrics data from a plethora of machine and application related metrics, we can determine the most triggered algorithms by rank:
- stddev_from_average
- mean_subtraction_cumulation
- first_hour_average
- histogram_bins
- least_squares
- grubbs
- stddev_from_moving_average
- median_absolute_deviation
- ks_test
Algorithms ranked by execution time¶
Using the algorithm_breakdown
metrics data we can determine the most
“expensive” algorithms by total time to run:
- least_squares (avg: 0.563052576667)
- stddev_from_moving_average (avg: 0.48511087)
- mean_subtraction_cumulation (avg: 0.453279348333)
- median_absolute_deviation (avg: 0.25222528)
- stddev_from_average (avg: 0.173473198333)
- first_hour_average (avg: 0.151071298333)
- grubbs (avg: 0.147807641667)
- histogram_bins (avg: 0.101075738333)
- ks_test (avg: 0.0979568116667)
Performance weighting¶
If we change the order in which the timeseries are run through the algorithms in Analyzer, we can improve the overall performance by running the most expensive computational algorithms later in the analysis.
+-----------------------------+----------------+---------------------+
| Algorithm | Triggered rank | Execution time rank |
+=============================+================+=====================+
| histogram_bins | 4 | 8 |
+-----------------------------+----------------+---------------------+
| first_hour_average | 3 | 6 |
+-----------------------------+----------------+---------------------+
| stddev_from_average | 1 | 5 |
+-----------------------------+----------------+---------------------+
| grubbs | 6 | 7 |
+-----------------------------+----------------+---------------------+
| ks_test | 9 | 9 |
+-----------------------------+----------------+---------------------+
| mean_subtraction_cumulation | 2 | 3 |
+-----------------------------+----------------+---------------------+
| median_absolute_deviation | 8 | 4 |
+-----------------------------+----------------+---------------------+
| stddev_from_moving_average | 7 | 2 |
+-----------------------------+----------------+---------------------+
| least_squares | 5 | 1 |
+-----------------------------+----------------+---------------------+
settings.RUN_OPTIMIZED_WORKFLOW
¶
The original version of Analyzer ran all timeseries through all
ALGORITHMS
like so:
ensemble = [globals()[algorithm](timeseries) for algorithm in ALGORITHMS]
After running all the algorithms, it then determined whether the last datapoint for timeseries was anomalous.
The optimized workflow uses the above triggered / execution time ranking matrix to run as efficiently as possible and achieve the same results (see caveat below), but up to ~50% quicker and with fewer CPU cycles. This is done by iterating through the algorithms in order based on their respective matrix rankings and evaluating whether settings.CONSENSUS can be achieved or not. The least_squares algorithm, which is the most computationally expensive, now only runs if settings.CONSENSUS can still be achieved.
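A simplified, illustrative sketch of this optimized loop (not the actual algorithms.py implementation) might look like the following, where algorithm_funcs is a hypothetical mapping of algorithm names to their functions:
CONSENSUS = 6
ALGORITHMS = [
    'histogram_bins', 'first_hour_average', 'stddev_from_average', 'grubbs',
    'ks_test', 'mean_subtraction_cumulation', 'median_absolute_deviation',
    'stddev_from_moving_average', 'least_squares',
]

def run_optimized_ensemble(timeseries, algorithm_funcs, consensus=CONSENSUS):
    # Run the algorithms in their ranked order, stopping as soon as consensus
    # is reached or can no longer be reached
    ensemble = []
    for index, algorithm in enumerate(ALGORITHMS):
        ensemble.append(algorithm_funcs[algorithm](timeseries))
        triggered = ensemble.count(True)
        remaining = len(ALGORITHMS) - (index + 1)
        if triggered >= consensus:
            break  # anomalous - no need to run the remaining algorithms
        if triggered + remaining < consensus:
            break  # consensus can no longer be reached, least_squares is skipped
    return ensemble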
The caveat to this is that this skews the anomaly_breakdown metrics. However, seeing as the anomaly_breakdown metrics were not part of the original Analyzer, this is a moot point. That said, the performance tuning and optimizations were made possible by these data, therefore it remains possible to implement the original configuration and also time all algorithms (see Development modes if you are interested). A word of warning: if you have set up a Skyline implementation after settings.RUN_OPTIMIZED_WORKFLOW was introduced and you have > 1000 metrics, running the unoptimized workflow with the original 5 second sleep may send the load_avg through the roof.
The original Analyzer settings.ALGORITHMS
setting was:
ALGORITHMS = [
'first_hour_average',
'mean_subtraction_cumulation',
'stddev_from_average',
'stddev_from_moving_average',
'least_squares',
'grubbs',
'histogram_bins',
'median_absolute_deviation',
'ks_test',
]
The new optimized Analyzer settings.ALGORITHMS
setting based on the above
performance weighing matrix is:
ALGORITHMS = [
'histogram_bins',
'first_hour_average',
'stddev_from_average',
'grubbs',
'ks_test',
'mean_subtraction_cumulation',
'median_absolute_deviation',
'stddev_from_moving_average',
'least_squares',
]
Optimizations results¶
These server graphs show the pre and post crucible update metrics related to CPU and loadavg for a dedicated Skyline server running on a 4 vCPU, 4GB RAM, SSD cloud server. The server is handling ~3000 metrics and is solely dedicated to running Skyline. It was upgraded from the Boundary branch version to Crucible.
It was running Analyzer, Mirage and Boundary, however these graphs clearly show the impact that the Analyzer optimizations have on the overall workload. Interestingly, after deployment the server is also running MySQL and the Panorama daemon in addition to what it was running before.
The server was running the skyline.analyzer.metrics branch Analyzer which was only a few steps away from Etsy master and those steps were related to Analyzer sending a few more Skyline metric namespaces to Graphite, in terms of Analyzer and the algorithms logic pre update were identical to Etsy master.






Mirage¶
The Mirage service is responsible for analyzing selected timeseries at custom
time ranges when a timeseries seasonality does not fit within
settings.FULL_DURATION
. Mirage allows for testing of real time data
and algorithms in parallel to Analyzer. Mirage was inspired by Abe Stanway’s
Crucible and the desire to extend the temporal data pools available to Skyline
in an attempt to handle seasonality better, reduce noise and increase signal,
specifically on seasonal metrics.
An overview of Mirage¶
- Mirage is fed specific user defined metrics by Analyzer.
- Mirage gets timeseries data for metrics from Graphite.
- Mirage does not have its own ALERTS settings; it uses settings.ALERTS just like Analyzer does.
- Mirage also sends anomaly details to Panorama, like Analyzer does.

Fullsize overview image for a clearer picture.
Why Mirage?¶
Analyzer’s settings.FULL_DURATION
somewhat limits Analyzer’s usefulness
for metrics that have a seasonality / periodicity that is greater than
settings.FULL_DURATION
. This means Analyzer is not great in terms of
“seeing the bigger picture” when it comes to metrics that have a weekly pattern
as well as a daily pattern, for example.
Increasing settings.FULL_DURATION
to anything above 24 hours (86400) is
not necessarily realistic or useful, because the greater the
settings.FULL_DURATION
, the greater the memory required for Redis and the
longer Analyzer will take to run.
What Mirage can and cannot do¶
It is important to know that Mirage is not necessarily suited to making highly variable metrics less noisy e.g. spikey metrics.
Mirage is more useful on fairly constant rate metrics which contain known or expected seasonalities. For example take a metric such as office.energy.consumation.per.hour, this type of metric would most likely have 2 common seasonalities.
As an example we can use the Department of Education Sanctuary Buildings energy consumption public data set to demonstrate how Mirage and Analyzer views are different. The energy consumption in an office building is a good example of a multi-seasonal data set.
- Office hour peaks
- Out of hour troughs
- Weekend troughs
- Holidays
- There could be summer and winter seasonality too
For now let us just consider the daily and weekly seasonality.
The difference between the Analyzer and Mirage views of a timeseries¶
(Source code, png, hires.png, pdf)

Fullsize image for a clearer picture.
As we can see above, on a Saturday morning the energy consumption does not increase as it normally does during the week days. If settings.FULL_DURATION was set to 86400 (24 hours), Analyzer would probably find the metric to be anomalous; the Saturday morning would seem anomalous.
However, if the metric’s alert tuple was set up with a
SECOND_ORDER_RESOLUTION_HOURS
of 168, Mirage would analyze the data point
against a week’s worth of data points and the Saturday and Sunday daytime data
points would have less probability of triggering as anomalous. The above
image is plotted as if the Mirage SECOND_ORDER_RESOLUTION_HOURS
was set to
172 hours just so that the trailing edges can be seen.
A real world example with tenfold.com¶
blak3r2: Our app logs phone calls for businesses and I want to be able to detect when VIP phone systems go down or act funny and begin flooding us with events. Our work load is very noisy from 9-5pm... where 9-5 is different for each customer depending on their workload so thresholding and modeling isn't good.
earthgecko: Yes, Mirage is great at user defined seasonality, in your case weekday 9-5 peaks, evening drop offs, early morning and weekend lows - multi seasonal, Mirage is the ticket. Your best bet would be to try 7 days (168) as your SECOND_ORDER_RESOLUTION_HOURS value for those app log metrics, however, you may get away with a 3 day window, it depends on the metrics really, but it may not be noisy at 3 days resolution, even at the weekends.
Mirage “normalizes”¶
Mirage is a "tuning" tool for seasonal metrics and it is important to understand that Mirage is probably using aggregated data (unless your Graphite is not using retentions and aggregations). Due to this, Mirage will lose some resolution, resulting in it being less sensitive to anomalies than Analyzer is.
So Mirage does some "normalizing" if you have aggregations in Graphite (e.g. retentions). However, it is analyzing the timeseries at the aggregated resolution, so it is "normalised" in that the data point that Analyzer triggered on is ALSO aggregated in the timeseries resolution that Mirage is analyzing. Intuitively one may think it may miss it in the aggregation then. This is true to an extent, but Analyzer will likely trigger multiple times if the metric IS anomalous, so when Analyzer pushes to Mirage again, each aggregation is more likely to trigger as anomalous, IF the metric is anomalous at the user defined full duration. A little flattened maybe, a little lag maybe, but less noise, more signal.
Setting up and enabling Mirage¶
By default Mirage is disabled, various Mirage options can be configured in the
settings.py
file and Analyzer and Mirage can be configured as appropriate
for your environment.
Mirage requires some directories as defined in settings.py (these require absolute paths):
mkdir -p $MIRAGE_CHECK_PATH
mkdir -p $MIRAGE_DATA_FOLDER
Configure settings.py
with some settings.ALERTS
alert tuples that
have the SECOND_ORDER_RESOLUTION_HOURS
defined. For example below is an
Analyzer only settings.ALERTS
tuple that does not have Mirage enabled as
it has no SECOND_ORDER_RESOLUTION_HOURS
defined:
ALERTS = (
('stats_counts.http.rpm.publishers.*', 'smtp', 300), # --> Analyzer sends to alerter
)
To enable Analyzer to send the metric to Mirage we append the metric alert tuple
in settings.ALERTS
with the SECOND_ORDER_RESOLUTION_HOURS
value.
Below we have used 168 hours to get Mirage to analyze any anomalous metric
in the 'stats_counts.http.rpm.publishers.*' namespace using 7 days worth
of timeseries data from Graphite:
ALERTS = (
# ('stats_counts.http.rpm.publishers.*', 'smtp', 300), # --> Analyzer sends to alerter
('stats_counts.http.rpm.publishers.*', 'smtp', 300, 168), # --> Analyzer sends to Mirage
)
Order Matters¶
Warning
It is important to note that Mirage enabled metric namespaces must be defined before non Mirage enabled metric namespace tuples as Analyzer uses the first alert tuple that matches.
So for example, with some annotation
ALERTS = (
('skyline', 'smtp', 1800),
('stats_counts.http.rpm.publishers.seasonal_pub1', 'smtp', 300, 168), # --> To Mirage
('stats_counts.http.rpm.publishers.seasonal_pub_freddy', 'smtp', 300, 168), # --> To Mirage
('stats_counts.http.rpm.publishers.*', 'smtp', 300), # --> To alerter
)
The above would ensure that if Analyzer found seasonal_pub1 or seasonal_pub_freddy anomalous, instead of firing an alert as it does for all other stats_counts.http.rpm.publishers.* metrics, Analyzer would send the metric to Mirage, because they have 168 defined.
The below would NOT have the desired effect of analysing the metrics seasonal_pub1 and seasonal_pub_freddy with Mirage
ALERTS = (
('skyline', 'smtp', 1800),
('stats_counts.http.rpm.publishers.*', 'smtp', 300), # --> To alerter
('stats_counts.http.rpm.publishers.seasonal_pub1', 'smtp', 300, 168), # --> NEVER gets reached
('stats_counts.http.rpm.publishers.seasonal_pub_freddy', 'smtp', 300, 168), # --> NEVER gets reached
)
Hopefully it is clear that the first stats_counts.http.rpm.publishers.*
alert tuple would route ALL to alerter and seasonal_pub1 and seasonal_pub_freddy
would never get sent to Mirage to be analyzed.
Enabling¶
And ensure that settings.py
has Mirage options enabled, specifically the
basic ones:
ENABLE_MIRAGE = True
ENABLE_FULL_DURATION_ALERTS = False
MIRAGE_ENABLE_ALERTS = True
Start Mirage and restart Analyzer:
cd skyline/bin
./mirage.d start
./analyzer.d restart
Rate limited¶
Mirage is rate limited to analyze 30 metrics per minute; this is by design and desired. Surfacing data from Graphite and analyzing ~1000 data points in a timeseries takes less than 1 second and is much less CPU intensive than Analyzer in general, but 30 calls to Graphite per minute is probably sufficient. If a large number of metrics went anomalous, even with Mirage discarding checks older than settings.MIRAGE_STALE_SECONDS due to the processing limit, signals would still be sent.
What Mirage does¶
- If Analyzer finds a metric to be anomalous at settings.FULL_DURATION, the metric alert tuple has SECOND_ORDER_RESOLUTION_HOURS set and settings.ENABLE_MIRAGE is True, Analyzer will push the metric variables to a Mirage check file.
- Mirage watches for added check files.
- When a check is found, Mirage determines what the configured SECOND_ORDER_RESOLUTION_HOURS is for the metric from the tuple in settings.ALERTS.
- Mirage queries Graphite to surface the json data for the metric timeseries at SECOND_ORDER_RESOLUTION_HOURS (see the sketch after this list).
- Mirage then analyses the retrieved metric timeseries against the configured settings.MIRAGE_ALGORITHMS.
- If the metric is an Ionosphere enabled metric, then Mirage does not alert, but hands the metric off to Ionosphere by adding an Ionosphere check file.
- If the metric is anomalous over SECOND_ORDER_RESOLUTION_HOURS, Mirage alerts via the configured alerters for the matching settings.ALERTS tuple and sets the metric alert key for EXPIRATION_TIME seconds.
- Mirage will alert for a Mirage metric that has been returned from Ionosphere as anomalous, having not matched any known features profile or layers.
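A hedged sketch of surfacing the json data from Graphite at SECOND_ORDER_RESOLUTION_HOURS, as per the list above; the Graphite host here is an assumption for illustration only:
import requests

GRAPHITE_URL = 'http://graphite.example.com'  # hypothetical Graphite host
SECOND_ORDER_RESOLUTION_HOURS = 168

def surface_graphite_json(metric, hours=SECOND_ORDER_RESOLUTION_HOURS):
    # Fetch a metric timeseries from the Graphite render API as json
    url = '%s/render?target=%s&from=-%dhours&format=json' % (GRAPHITE_URL, metric, hours)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # Graphite returns [{'target': ..., 'datapoints': [[value, timestamp], ...]}]
    return response.json()[0]['datapoints']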
Boundary¶
Boundary is an extension of Skyline that enables very specific analysis of specified metrics with specified algorithms, with specified alerts.
Boundary was added to allow for threshold-like monitoring to the Skyline
model, it was specifically added to enable the detect_drop_off_cliff
algorithm which could not be bolted nicely into Analyzer (although it
was attempted, it was ugly). While Analyzer allows for the passive
analysis of 1000s of metrics, its algorithms are not perfect. Boundary
allows for the use of the Skyline data and model as a scalpel, not just a
sword. Just like Analyzer, Boundary has its own algorithms and
importantly, Boundary is not CONSENSUS
based. This means that you
can match key metrics on “thresholds/limits” and somewhat dynamically
too.
The Boundary concept is quite like Skyline backwards, enilyks. This is because where Analyzer is almost all to one configuration, Boundary is more one configuration to one or many. Where Analyzer is all metrics through all algorithms, Boundary is each metric through one algorithm. Analyzer uses a large range of the timeseries data, Boundary uses the most recent (the now) portion of the timeseries data.
Boundary currently has 3 defined algorithms:
- detect_drop_off_cliff
- less_than
- greater_than
Boundary is run as a separate process just like Analyzer, Horizon and Mirage. It was not envisaged to analyze all your metrics, but rather your key metrics in an additional dimension/s. If it was run across all of your metrics it would probably be:
- VERY noisy
- CPU intensive
If deployed against only key metrics it has a very low footprint (9 seconds on 150 metrics with 2 processes assigned) and a high return. If deployed as intended it should easily coexist with an existing Skyline Analyzer/Mirage setup, adding minimal load. This also allows one to implement Boundary independently, without changing, modifying or impacting a running Analyzer.
Boundary alerting is similar to Analyzer alerting, but a bit more featureful and introduces the ability to rate limit alerts per alerter channel, as it is not beyond the realms of possibility that at some point all your key metrics may drop off a cliff, but maybe 15 pagerduty alerts every 30 minutes is sufficient, so alert rates are configurable.
Configuration and running Boundary¶
settings.py has an independent Boundary settings block and has detailed information on each setting in its docstring; the main difference from Analyzer being the number of variables that have to be declared in the alert tuples, e.g.:
BOUNDARY_METRICS = (
# ('metric', 'algorithm', EXPIRATION_TIME, MIN_AVERAGE, MIN_AVERAGE_SECONDS, TRIGGER_VALUE, ALERT_THRESHOLD, 'ALERT_VIAS'),
('metric1', 'detect_drop_off_cliff', 1800, 500, 3600, 0, 2, 'smtp'),
('metric2.either', 'less_than', 3600, 0, 0, 15, 2, 'smtp|hipchat'),
('nometric.other', 'greater_than', 3600, 0, 0, 100000, 1, 'smtp'),
)
Once settings.py
has all the Boundary configuration done, start
Boundary:
/opt/skyline/github/skyline/bin/boundary.d start
detect_drop_off_cliff algorithm¶
The detect_drop_off_cliff algorithm provides a method for analysing a timeseries to determine if the timeseries has "dropped off a cliff". The standard Skyline Analyzer algorithms do not detect the drop off cliff pattern very well at all, as testing with Crucible has proven. Further to this, the CONSENSUS methodology used to determine whether a timeseries is deemed anomalous or not means that even if one or two algorithms did detect a drop off cliff type event in a timeseries, it would not be flagged as anomalous if the CONSENSUS threshold was not breached.
The detect_drop_off_cliff algorithm does just what it says on the tin. Although this may seem like setting and matching a threshold, it is more effective than a threshold as the trigger is dynamically set depending on the data range.
Some things to note about analyzing a timeseries with the algorithm are:
- This algorithm is most suited to (accurate on) timeseries where there is a large range and most datapoints are > 100 (e.g. high rate). Arbitrary trigger values in the algorithm do filter peaky low rate timeseries, but they can become more noisy with lower value data points, as significant cliff drops are from a lower height. However, it still generally matches drops off cliffs on low range metrics.
- The trigger tuning based on the timeseries sample range is fairly arbitrary, but has been tested and does filter peaky noise in low range timeseries, which filters most/lots of noise.
- The algorithm is more suited to data sets which come from multiple sources, e.g. an aggregation of a count from all servers, rather than from individual sources, e.g. a single server's metric. The many are less likely to experience false positive cliff drops, whereas the individual is more likely to experience true cliff drops.
- ONLY WORKS WITH:
- Positive, whole number timeseries data
- Does not currently work with negative integers in the timeseries values (although it will not break, will just skip if a negative integer is encountered)
For more info see: detect_drop_off_cliff
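As a heavily simplified illustration of the idea only - this is NOT the detect_drop_off_cliff implementation - a dynamically set trigger based on the recent data could be sketched as:
def dropped_off_cliff(timeseries, trigger=10):
    # Flag when the last datapoint collapses relative to the recent average
    values = [v for _, v in timeseries[-10:]]
    if min(values) < 0:
        return False  # negative values are not handled
    recent_average = sum(values[:-1]) / float(len(values) - 1)
    if recent_average < trigger:
        return False  # too low a rate to be meaningful
    return values[-1] < (recent_average / trigger)

series = [(t, 1000) for t in range(100)] + [(100, 5)]
print(dropped_off_cliff(series))  # True - the last datapoint fell off a cliff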
What Boundary does¶
Boundary is very similar in work flow to Analyzer in terms of how it surfaces and analyzes metrics.
- Boundary determines all unique metrics in Redis.
- Boundary determines what metrics should be analyzed from the BOUNDARY_METRICS tuples in settings.py, matching the defined namespaces against the unique_metrics list from Redis.
- These are divided between the BOUNDARY_PROCESSES to be analyzed.
- Boundary's spawned processes pull all the timeseries for the metrics they are assigned from Redis and iterate through each metric, analyzing its timeseries against the algorithm declared for the metric in the matching BOUNDARY_METRICS tuple/s in settings.py.
- The Boundary process will add any metric that it finds anomalous to a list of anomalous_metrics.
- The parent Boundary process will then check every metric in the anomalous_metrics list to see if:
  - An alert has been triggered for the metric within the last EXPIRATION_TIME seconds, by querying the metric's Boundary alert key in Redis.
  - If no alert key is set, Boundary sends alert/s to the configured alerters and sets the metric's Boundary alert key in Redis for EXPIRATION_TIME seconds.
  - If no alert key is set and settings.PANORAMA_ENABLED is True, the anomalous metric's details will be inserted into the database.
Crucible¶
Crucible is an extension of Skyline based on Abe Stanway’s Crucible testing suite. Crucible has been integrated into Skyline as a module and daemon to allow for the following:
- Allowing for the ad-hoc analysis of a timeseries and generation of resultant resources (e.g. plot images, etc).
- Allowing for the ad-hoc analysis of a timeseries by a specific or ad-hoc defined algorithm and generation of resultant resources (e.g. plot images, etc).
- For storing the individual timeseries data for triggered anomalies.
Be forewarned that Crucible can generate a substantial amount of data in the timeseries json archives, especially if it is enabled in any of the following contexts:
settings.ANALYZER_CRUCIBLE_ENABLED
settings.MIRAGE_CRUCIBLE_ENABLED
settings.BOUNDARY_CRUCIBLE_ENABLED
Usage¶
Crucible is not necessarily meant for generic Analyzer, Mirage or Boundary inputs, although it does work and was patterned using data fed to it from Analyzer. Crucible is more aimed at having the ability to add miscellaneous timeseries and algorithms in an ad-hoc manner, enabling the use of data sources other than Graphite and Redis and the testing of algorithms in an ad-hoc manner too.
Crucible is enabled by default in its own settings block in settings.py, but it is disabled by default in each of the apps' own settings.
Why use .txt check files¶
The rationale behind offloading checks.
A number of the Skyline daemons create txt check and metadata files, and json timeseries files.
For example Analyzer creates txt check files for Mirage, Crucible and Panorama. These txt files are created in logical directory structures that mirror Graphite’s whisper storage directories for stats, etc. Often other timeseries data sets that are not Graphite, machine or app related metrics, are also structured in a directory or tree structure and follow similar naming convention, which allows for this tree and txt files design to work with a large number of other timeseries data sets too.
Although there is an argument that checks and their related metadata could also be queued through Redis or another queue application, using the local filesystem is the simplest method to pass data between the Skyline modules, without introducing additional dependencies, while ensuring that Redis is queried as minimally as required to do analysis. The shuffling of data and querying of "queues" is offloaded to the filesystem, resulting in each module being somewhat autonomous in terms of managing its own work, decoupled from the other modules.
Today’s filesystems are more than capable of handling this load. The use of txt files also provides an event history, which transient Redis data does not.
Importantly in terms of Crucible ad-hoc testing, txt and json timeseries files provide a simple and standardised method to push ad-hoc checks into Crucible.
What Crucible does¶
Crucible has 3 roles:
- Store resources (timeseries json and graph pngs) for triggered anomalies.
- Run ad-hoc analysis on any timeseries and create matplotlib plots for the run algorithms.
- To update the Panorama database (tbd for future Panorama branch)
Crucible can be used to analyse any triggered anomaly on an ad-hoc basis. The timeseries is stored in gzipped json for triggered anomalies so that retrospective full analysis can be carried out on a snapshot of the timeseries as it was when the trigger/s fired without the timeseries being changed by aggregation and retention operations.
Crucible can create a large number of data files and can require significant disk space.
Panorama¶
The Panorama service is responsible for recording the metadata for each anomaly.
It is important to remember that the Skyline analysis apps only alert on metrics
that have alert tuples set. Panorama records samples of all metrics that are
flagged as anomalous. Sampling at settings.PANORAMA_EXPIRY_TIME
, the
default is 900 seconds.
The settings.PANORAMA_CHECK_MAX_AGE
ensures that Panorama only processes
checks that are not older than this value. This mitigates against Panorama
stampeding against the MySQL database, if either Panorama or MySQL were stopped
and there are a lot of Panorama check files queued to process. If this is set
to 0, Panorama will process all checks, regardless of age.
There is a Panorama view in the Skyline Webapp frontend UI to allow you to search and view historical anomalies.
Create a MySQL database¶
You can install and run MySQL on the Skyline server or use any existing MySQL
server to create the database on. The Skyline server just has to be able to
access the database with the user and password you configure in settings.py
settings.PANORAMA_DBUSER
and settings.PANORAMA_DBUSERPASS
Note
It is recommended, if possible that MySQL is configured to use a single
file per InnoDB table with the MySQL config option - innodb_file_per_table=1
This is due to the fact that the anomalies and Ionosphere related MySQL tables
are InnoDB tables and all the other core Skyline DB tables are MyISAM.
If you are adding the Skyline DB to an existing MySQL database server please
consider the ramifications to your setup. It is not a requirement, just a
tidier and more efficient way to run MySQL InnoDB tables in terms of
managing space allocations with InnoDB and it segregates databases from each
other in the context on the .ibd file spaces, making for easier management of
each individual database in terms of ibd file space. However that was really
only an additional caution. Retrospectively, it is unlikely that the
anomalies table will ever really be a major problem in terms of the page space
requirements any time soon. It is hoped anyway that, in time, really big data sets may invalidate this in the future - Gaia DR1 in MySQL, say :)
- See
skyline.sql
in your cloned Skyline repo for the schema creation script - Enable Panorama and set the other Panorama settings in
settings.py
- Start Panorama (use your appropriate PATH) - or go back to Installation and continue with the installation steps and Panorama will be started later in the installation process.
/opt/skyline/github/skyline/bin/panorama.d start
Webapp¶
The Webapp service is a simple gunicorn/Flask based frontend that provides a web UI to:
- Visualise the latest anomalies that have been triggered.
- Visualise historic anomalies that have been recorded by Panorama.
- An api to query Redis for a metric timeseries, which returns the timeseries in json, e.g. /api?metric=skyline.analyzer.hostname.total_metrics
- An api to query Graphite for a metric timeseries, which returns the timeseries in json and takes the following parameters (see the example after this list):
  - graphite_metric - metric name
  - from_timestamp - unix timestamp
  - until_timestamp - unix timestamp
  - e.g. /api?graphite_metric=skyline.analyzer.hostname.total_metrics&from_timestamp=1370975198&until_timestamp=1403204156
- Publish the Skyline docs.
- Browse the Redis DB with a port of Marian Steinbach’s excellent rebrow https://github.com/marians/rebrow
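A hedged example of calling the documented Graphite api endpoint above; the Webapp host and credentials here are assumptions for illustration:
import requests

WEBAPP_URL = 'http://127.0.0.1:8080'  # or your reverse proxied Webapp URL
AUTH = ('admin', 'password')  # settings.WEBAPP_AUTH_USER / settings.WEBAPP_AUTH_USER_PASSWORD

params = {
    'graphite_metric': 'skyline.analyzer.hostname.total_metrics',
    'from_timestamp': 1370975198,
    'until_timestamp': 1403204156,
}
response = requests.get('%s/api' % WEBAPP_URL, params=params, auth=AUTH)
timeseries = response.json()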
A basic overview of the Webapp¶

Deploying the Webapp¶
Originally the Webapp was deployed behind the simple Flask development server, however for numerous reasons, this is less than ideal. Although the Webapp can still be run with Flask only, the recommended way to run the Webapp is via gunicorn, which can be HTTP proxied by Apache or nginx, etc. The gunicorn Webapp can be exposed just like the Flask Webapp, but it is recommended to run it HTTP proxied.
Using a production grade HTTP application¶
It must be noted and stated that you should consider running the Skyline Webapp behind a production grade HTTP application, regardless of the implemented basic security measures. Something like Apache or nginx serving the Webapp via gunicorn.
This may seem like overkill, however there are a number of valid reasons for this.
Production infrastructure¶
It is highly probable that Skyline will often be run on cloud based infrastructure which is public and should therefore really be considered production.
Flask has a Deploying Options sections that covers running Flask apps in production environments. See http://flask.pocoo.org/docs/0.11/deploying/
In addition to that, considering that the Webapp now has MySQL in the mix, this element adds further reason to properly secure the environment.
There is potential for XSS and SQL injection via the Webapp, ensure TRUSTED access only.
Apache and gunicorn¶
Although there are a number of options to run a production grade wsgi frontend,
the example here will document serving gunicorn via Apache reverse proxy with
authentication. Although Apache mod_wsgi may seem like the natural fit here, in
terms of virtualenv and Python make_altinstall
, gunicorn has far fewer external dependencies. gunicorn can be easily installed and run in any
virtualenv, therefore it keeps it within the Skyline Python environment, rather
than offloading very complex Python and mod_wsgi compiles to the user,
orchestration and package management.
Apache is a common enough pattern and gunicorn can be handled within the Skyline package and requirements.txt
See etc/skyline.httpd.conf.d.example
for an example of an Apache conf.d
configuration file to serve the Webapp via gunicorn and reverse proxy on port
8080 with basic HTTP authentication and restricted IP access. Note that your
username and password must match in both the Apache htpasswd and the
settings.WEBAPP_AUTH_USER
/settings.WEBAPP_AUTH_USER_PASSWORD
contexts as Apache will authenticate the user and forward on the authentication
details to gunicorn for the Webapp to also authenticate the user.
Authentication is enabled by default in settings.py
.
Feel free to use nginx, lighttpd, et al.
Securing the Webapp¶
Firewall rules¶
The Webapp should be secured with proper firewall rules that restrict access
to the settings.WEBAPP_IP
and settings.WEBAPP_PORT
(and/or just
the reverse proxy port for gunicorn if being used) from trusted IP
addresses only.
Basic security¶
There are some simple and basic security measures implemented with the Webapp.
IP restricted access¶
The default settings.WEBAPP_ALLOWED_IPS only allows access from 127.0.0.1; add your desired allowed IPs.
pseudo basic HTTP auth¶
There is a single user that can access the web UI and all access must be authenticated.
Restricted by default¶
These simple measures are an attempt to ensure that the Skyline web UI is not totally open by default, but rather totally restricted by default. This adds a bit of defense in depth and hopefully will mitigate against unauthorized access in the event that some day, someone may have their firewall misconfigured in some way, either through error or accident.
These basic restrictions DO NOT replace the need for proper firewall rules or a production grade HTTP application.
Logging¶
Flask's development server is based on werkzeug, whose WSGIRequestHandler is, in turn, based on the BaseHTTPServer from the standard lib. This means that WSGIRequestHandler overrides the logging methods, log_request, log_error and log_message, to use its own logging.Logger. So there is no access logging in the Skyline Webapp log. It is possible to hack around this a bit, but this means application error logging would get shifted from the Webapp log to the access log, which is not ideal.
Panorama web UI¶
Basic function¶
The Panorama web UI allows you to search the anomalies recorded by Panorama in
the database. It currently allows you to search through the anomaly records by
various filters, which are converted into MySQL SELECT
queries which
return the details regarding the anomalies found from the search criteria. The
Webapp then returns these to the browser and the client side javascript then
passes the relevant metric details to the Webapp api endpoint to surface the
metric timeseries from Graphite and the api returns the timeseries json to the
browser to graph the timeseries.
Closest approximations¶
The Panorama anomaly records only hold the details regarding the anomaly, not the data. The Panorama UI takes the returned anomalies from a search and retrieves the timeseries for the time period relevant to the anomaly from Graphite on demand. The UI graphs the timeseries to visualise the context of the anomaly, as best possible. Due to the fact that Panorama is storing anomaly details in real time and the Panorama web UI is surfacing timeseries historically, any Graphite aggregations in timeseries can result in the specific anomalous datapoint not being present in the related timeseries. In these instances the Panorama graphs will indicate this and visually present a closest approximation of where the anomalous line would be, using a thicker, orange horizontal line as the indicator, rather than the thinner, normal red horizontal line.

Time zones¶
It must be noted that the Panorama view graphs can be rendered differently depending on the browser and server time zone. The original dygraph renders used the javascript Date function to generate the datetime ticker, etc. If your Graphite server happens to be in a different time zone to the user's browser, this would display clock skews where the Panorama reported anomaly details do not match the graph times displayed. The Webapp in the Panorama view allows you to either use the browser time zone or use a fixed timezone, so that all rendered graphs are the same no matter where in the world they are viewed from.
By default, the browser time zone setting is used, as per the original Skyline
UI, you can use settings.WEBAPP_USER_TIMEZONE
and
settings.WEBAPP_FIXED_TIMEZONE
to modify this behavior if required.
rebrow¶
Skyline uses a modified port of Marian Steinbach’s excellent rebrow Flask Redis browser - rebrow. A modified port was used for a number of reasons:
- rebrow does not handle msg-pack encoded keys.
- The pubsub functionality was unneeded.
- Serving it in an iframe was bothersome.
- Having an additional dependency, app and service for another Flask app seemed to be a bit of overkill.
- Having it native in the Skyline Webapp UI was neater and prettier.
Please do clone https://github.com/marians/rebrow, just so Marian gets some clones.
With the addition of a number of Panorama and other app related keys,
rebrow adds a window into Redis, to allow for the verification of
key creation and providing a view of *last_alert.*
and
panorama.mysql_ids.*
keys.
Ionosphere¶
Ionosphere is a story about timeseries.
Dedicated to my father, Derek, a man of numbers if there ever was one.
What Ionosphere is for - humble beginnings¶
YOU can teach a system.
YOU can help a system to LEARN.
You want to monitor metrics on small VPSs that do not do a great deal, meaning there is no high rate, constant work or 3sigma consistency in the metrics. There are 0 to 7 apache.sending a day, not 7000 a minute. Or there are some peaks on the metric stats_counts.statsd.bad_lines_seen

This is NOT ANOMALOUS
Even in massive, workload intensive, dynamic systems there are always some metrics that are low rate and relatively low range. And at 3sigma... that means noisy.
Ionosphere's goal is to allow us to train Skyline on what is not anomalous and thereby complement statistical 3sigma methods with some new methodologies of trying to "teach" the system. Of course teaching statistics algorithms something new is unfortunately not possible, but luckily teaching the system is possible. Giving the system better contextual data to work with is the key factor.
Overview¶
- Ionosphere gives the operator an input into Skyline to allow them to train it and complement the existing statistical 3sigma anomaly detection methods.
- Ionosphere then starts to learn, retrospectively.
- Is Ionosphere the panacea for anomaly detection? No.
- Is Ionosphere immediately useful for web scale anomaly detection? No. Unless you are already doing web scale anomaly detection with Skyline, then still no. However over time yes. You cannot rush timeseries.
- Is Ionosphere a lot of work to train? Yes.
- Ionosphere uses a timeseries similarity comparison method to compare two different timeseries for a metric.
- This method is based on summing the features that are calculated for the timeseries using tsfresh to “fingerprint” the data set.
- This allows the operator to fingerprint and profile what is not anomalous, but normal and to be expected, even if 3sigma will always detect it as anomalous.
- Ionosphere also allows for the creation of layers rules which allow the operator to define ad-hoc boundaries and conditions that can be considered as not anomalous for each features profile that is created.
How well does it work?¶
This is a graph of the impact that Ionosphere has had on the anomaly count of ionosphere_enabled metrics before and after the deployment of Ionosphere.

It is clear that when the features profiles and layers are introduced and Skyline starts to learn, the anomaly counts for metrics that have been trained drop significantly, but it is not perfect.
Significant events which happen infrequently and change machine metrics significantly are still difficult to handle. Things like the occasional importing of a DB, reboots, zipping up a bunch of things, however layers do allow one to “describe” these events on metrics as normal should they so wish.
And Ionosphere does not make handling downtime and missing data points in metrics any easier, in fact it can make it harder.
“Do I want to train Skyline to think that the hole in this metric for 10 hours yesterday while OVH had 2 x 20 Kv lines down and generator problems in France, is normal? Maybe I should wait a week before training Skyline on any metrics that have been OVH’d...”
Deploying and setting Ionosphere up¶
Ionosphere has a specific section of variables in settings.py that are documented in the settings specific documentation; see the settings starting at:
settings.IONOSPHERE_CHECK_PATH - http://earthgecko-skyline.readthedocs.io/en/latest/skyline.html#settings.IONOSPHERE_CHECK_PATH
The Ionosphere settings follow normal Skyline settings conventions in terms of being able to paint all metrics with certain defaults and allowing for specific settings for specific or wildcard namespaces. If you know Skyline then these should all be fairly intuitive.
If you are new to Skyline, then by the time you have all the other Skyline apps set up and running, you should have a more intuitive feel for these too, after having spent some time looking over your running Skyline. You cannot rush timeseries :)
Ionosphere static demo pages¶
The demo pages show examples of how Ionosphere presents the operator with the metric's details for the specific point anomaly and all the Graphite metric graphs at multiple resolutions, so that the operator can evaluate the anomaly in the full context of the metric's history.
Graphs demo page - multiple resolution Graphite graphs for context¶
The demo page uses an edge case anomaly, one which is not easy to decide whether it is or is not an anomaly.
Would YOU consider it anomalous or not? Would you want an alert?
Note that this demo page is from an older version of Ionosphere, before learn was introduced. So it does not give the operator the 2 options:
- Create features profile and DO NOT LEARN
- Create features profile and LEARN
It only gives the one option. However, this demo page is about the multiple resolution Graphite graphs giving the operator context, so do not worry about the buttons, look at the graphs, see the Graphite :: graphs NOW section on the below demo page.
See Ionosphere static multiple resolution graphs demo page for a clearer picture (opens in a new browser tab). Is it anomalous or not anomalous?
Features profile demo page with matched graphs¶
This demo page shows an existing features profile with all the graph resources that the profile was created with, see the Graphite :: graphs WHEN created :: at 7h, 24h, 7d and 30d section in the below demo page.
A series of matched graphs, showing the instances where Ionosphere has analyzed the Analyzer detected anomalous timeseries and found it to be not anomalous, because the calculated features_sum of the Analyzer anomalous timeseries was within 1% difference of the features_sum :: 73931.7673978000 that was calculated for features profile 269, see the Graphite :: graphs MATCHED section in the below demo page.
See Ionosphere static features profile demo page with matched graphs for a clearer picture.
Features profile search demo page with generational information¶
See Ionosphere static search features profiles demo page with generation info for a clearer picture with generational, parent_id, number of times checked, number of times matched information.
Things to consider¶
Contextual anomalies - Earthquakes and Earth tremors¶
A point anomaly is only as contextual as the timeframe in which it is considered to be anomalous.
The following metaphor can be used to describe this concept, as it is important to understand in terms of Ionosphere and Mirage.
Let us take Bob as an example, Bob lived in the UK and decided to move to San Francisco because he landed a good job at a cool San Jose data analytics upstart. In the first week he is there, the Bay Area has a few Earth tremors, to Bob from the UK this is ANOMALOUS!!! Luckily Bob has a friendly co-worker called Alice and she directs him to http://earthquaketrack.com/us-ca-san-jose/recent and shows him it is not that anomalous, it is quite normal. Alice shows Bob how to consider the context of these events at a fuller duration.
- 5 earthquakes in the past 24 hours
- 5 earthquakes in the past 7 days
- 28 earthquakes in the past 30 days
- 381 earthquakes in the past 365 days
See also
Bob’s Hampshire earthquake data, if there is any...
And the penny drops for Bob that HERE in the Bay Area this is obviously quite normal.
Bob then wonders to himself why he did not think about this before leaving his stable shire in the UK. He consoles himself by thinking "Well all the VCs and players are here... so it obviously cannot be a big single point of failure."
bob = 'Skyline'
alice = 'You'
Skyline does not know all the contexts to the data, you do. Ionosphere lets us teach Bob that it is not an earthquake!!! and enables Bob to look and ask, "Did Alice say this was not an earthquake? Let me look".
“Create” or “Create and LEARN”¶
With Ionosphere, you have the option to allow it to learn these things for itself, as long as you tell it what it is ALLOWED to learn at the fuller duration.
So Ionosphere gives you 2 options:

Only make a features profile based on the settings.FULL_DURATION
data.

This is not an anomaly now or then or in the foreseeable future; if it looks anything like the settings.FULL_DURATION or any of the multiple resolution Graphite graphs, LEARN it at the learn_full_duration.
This means you do not have to ship that earthquake that happened 17 days ago into
Ionosphere’s features profiles and teach it BAD, badly. You can just tell it
to see the relevant Analyzer settings.FULL_DURATION or Mirage SECOND_ORDER_RESOLUTION_HOURS data as not anomalous and not learn at the fuller duration of the metric's learn_full_duration.
You can teach Ionosphere badly, but to unteach it is just a SQL update.
How Ionosphere works - as simple overview as possible¶
Firstly one needs to understand there is a chicken and egg aspect to Ionosphere. Which if you have read up to this point, hopefully you have already got that point.
Ionosphere has a number of roles that are centered on feature extractions, feature calculations and comparisons and a role centered on learning.
The features role¶
Ionosphere only analyses SMTP alerter enabled metrics.
Once Ionosphere is enabled, if Analyzer or Mirage detect an anomaly on a metric they:
- Save the training data set and the anomaly details
- If the metric is not an ionosphere_enabled metric but is an SMTP alert enabled metric, an alert is triggered and all the created alert images are saved in the training data directory as well.
- If the metric is an ionosphere_enabled metric, Analyzer and Mirage defer the timeseries to Ionosphere, via a check file, for Ionosphere to make a decision on. More on that below.
Ionosphere serves the training data set for each triggered anomaly, ready for a human to come along in the Webapp Ionosphere UI and say, “that is not anomalous” (if it is not).
- At the point the operator makes a features profile, all the features values that are created for the not anomalous timeseries are entered into the database and the metric becomes an ionosphere_enabled metric, if it was not one already. All the anomaly resources are then copied to the specific features profile directory that is created for the features profile.
- Once a metric is ionosphere_enabled, both Analyzer and Mirage will refer any anomalies found for the metric to Ionosphere instead of just alerting.
- When a 3sigma anomalous timeseries is sent to Ionosphere, it calculates the features with tsfresh for the 3sigma anomalous timeseries and then compares the common features sums with those of previously recorded features profiles. If the percent difference between the two values is less than settings.IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN, Ionosphere will deem the timeseries as not anomalous and update the related training data as MATCHED (see the sketch after this list).
- If Ionosphere does not find a match, it analyses the timeseries against any defined layers, if there are any, and if a match is found Ionosphere will deem the timeseries as not anomalous and update the related training data as MATCHED.
- If Ionosphere does not find a match, it tells the originating app (Analyzer or Mirage) to send out the anomaly alert with a [Skyline alert] - Ionosphere ALERT subject field.
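A minimal sketch of the features sum comparison referred to in the list above; the 1 (percent) value here mirrors the 1% difference mentioned earlier and is assumed for illustration only, the real comparison is driven by settings.IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN:
IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN = 1.0  # assumed for illustration

def features_sum_matches(profile_features_sum, calculated_features_sum,
                         max_percent_diff=IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN):
    # Return True if the two features sums are within the allowed percent difference
    percent_diff = abs((calculated_features_sum - profile_features_sum) / profile_features_sum) * 100
    return percent_diff <= max_percent_diff

# e.g. compared against the features_sum of 73931.7673978 from the demo page above
print(features_sum_matches(73931.7673978, 74100.0))  # within 1% -> True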
The learning role¶
- Once a features profile has been made for a metric with the LEARN option, for every unmatched anomaly that training_data is created for, once learn_valid_ts_older_than seconds have been reached (so that the anomalous event and any subsequent aggregation has had time to occur), Ionosphere will attempt to "learn" whether the timeseries features at learn_full_duration seconds match any features profiles that were created for the metric at the learn_full_duration.
- If Ionosphere finds a match to the features calculated from the metric timeseries that it surfaces from Graphite at learn_full_duration, it will use the anomaly training data to create a features profile for the metric at the metric's settings.FULL_DURATION or SECOND_ORDER_RESOLUTION_HOURS (whichever is applicable) and it will also create a features profile with the learn_full_duration data that matched, as long as the difference in the settings.FULL_DURATION or SECOND_ORDER_RESOLUTION_HOURS features sum is within the settings.IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN or the metric specific max_percent_diff_from_origin.
Input¶
When an anomaly alert is sent out via email, a link to the Ionosphere training data is included in the alert. This link opens the Ionosphere UI with all the training data for the specific anomaly, where the user can submit the metric timeseries as not anomalous and have Skyline generate a features profile with tsfresh (and optionally some additional layers, which are covered further down on this page).
features profiles¶
When a training data set is submitted as not anomalous for a metric, a features profile is extracted from the timeseries using tsfresh. This features profile contains the values of about 216 features (currently, as of tsfresh-0.4.0), such as median, mean, variance, etc. For a full list of known features that are calculated see tsfresh_feature_names.TSFRESH_FEATURES.
This features profile is then stored in the Skyline MySQL database in the following manner. For every metric that has a features profile that is created, 2 MySQL InnoDB tables are created for the metric.
- The features profile details are inserted into the ionosphere table and the features profile gets a unique id.
- z_fp_<metric_id> - features profile metric table which contains the features profile id, feature name id and the calculated value of the feature.
- z_ts_<metric_id> - the timeseries data for the metric on which a features profile was calculated.
These tables are prefixed with z_ so that they are all listed after all the core Skyline database tables. Once a metric has a z_fp_<metric_id> and a z_ts_<metric_id> table, these tables are updated with any future features profiles and timeseries data. So there are 2 tables per metric, not 2 tables per features profile.
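The per metric table layout can be sketched with SQLAlchemy (which Skyline uses for its DB definitions). The column names below are illustrative assumptions, not the exact schema; see the SQLAlchemy definitions in the Skyline codebase for the real ones:

from sqlalchemy import Column, Float, Integer, MetaData, Table

metadata = MetaData()
metric_id = 42  # hypothetical id from the metrics table

# z_fp_<metric_id> - one row per (features profile, feature) pair
z_fp = Table(
    'z_fp_%s' % metric_id, metadata,
    Column('fp_id', Integer, nullable=False),       # features profile id (ionosphere table)
    Column('feature_id', Integer, nullable=False),  # feature name id (TSFRESH_FEATURES)
    Column('value', Float, nullable=False),         # calculated feature value
    mysql_engine='InnoDB',
)

# z_ts_<metric_id> - the timeseries the features profile was calculated on
z_ts = Table(
    'z_ts_%s' % metric_id, metadata,
    Column('fp_id', Integer, nullable=False),
    Column('timestamp', Integer, nullable=False),
    Column('value', Float, nullable=False),
    mysql_engine='InnoDB',
)

# metadata.create_all(engine) would create both tables on a MySQL engine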
How does Ionosphere "learn"?¶
Ionosphere may have had humble beginnings, but adding this seemingly trivial function was anything but humble, simple or easy. So to solve the seemingly simple problem, something completely new had to be pieced together.
Ionosphere "learns" timeseries and makes decisions based on a timeseries similarities comparison method built on the tsfresh package.
This "learning" is based upon determining the similarities in timeseries, which could best be described as attempting to determine how similar 2 timeseries are in terms of the amount of "power/energy", range and "movement" there is within the timeseries data set. A fingerprint or signature if you like, but understand that neither is perfect. This timeseries similarities comparison method is not perfect in the dynamic, operational arena, but it achieves the goal of being useful. However, it must be stated that it can be almost perfect: a tsfresh features profile sum is about as perfect as you can get at 0% difference (there may be edge cases). However, using it with 100% matching is not useful for learning and for trying to profile something like Active Brownian Motion (for want of a better way of explaining it). Lots of dynamic metrics/systems exhibit a tendency to try and achieve Active Brownian Motion, not all but many, and definitely at differing and sometimes multiple seasonalities.
For a very good overview of Active Brownian Motion please see the @kempa-liehr description linked below.
See also
https://github.com/blue-yonder/tsfresh/pull/143#issuecomment-272314801 - “Dynamic systems have stable and unstable fixed points. Without noise these systems would be driven to one of their stable fixed points and stay there for ever. So, fixed points resemble an equilibrium state”
Ionosphere enables us to try and profile something similar to Active Brownian Motion as the norm, again for want of a better way of trying to explain it.
However, contextually, neither Ionosphere nor the tsfresh implemented method will ever be perfect, unless 2 timeseries have identical data, consistently, without change. But how challenging would that be? :)
Also, it may be possible that an identical timeseries reversed gives the same (or the negative of) a features sum, and a mirror image timeseries can have very similar calculated feature sums.
Anyway, it is not perfect, by design. Evolution tends to not achieve perfection, attaining a working, functional state is usually the norm in evolution it seems.
Evolutionary learning - generations¶
Ionosphere uses an evolutionary learning model that records (and limits) the generations of trained and learnt features profiles per metric. The limits can be set in settings.py and played around with. For veterans of Skyline, these tend to be much like settings.CONSENSUS: what is the correct CONSENSUS?
They are tweakable and tunable. Keep them low and you give Ionosphere less leverage to learn; bump them up and it can learn more and better.
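A minimal sketch of the generation limiting idea, with an assumed limit value and an assumed comparison (the real behaviour is governed by the max_generations value for the metric):

max_generations = 5  # per metric limit, tweakable much like CONSENSUS

def allowed_to_learn(parent_generation):
    # a learnt features profile is one generation on from the profile it matched
    child_generation = parent_generation + 1
    return child_generation < max_generations

print(allowed_to_learn(2))  # True - Ionosphere may create the learnt profile
print(allowed_to_learn(5))  # False - the generation limit has been reached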
Although this documentation may seem overly chatty and verbose, all things have stories. And why should documentation be overly dull, try explaining Skyline to someone, good luck. You should see me at parties. Anyway not many people read this, so it does not really matter :)
If you want to understand Skyline more, look at the code. But be gentle :)
Or better yet, set it up. Being able to teach a system and see it learn is pretty cool, just look at https://github.com/xviniette/FlappyLearning and NeuroEvolution.js (thanks for the inspiration from @nylar and @xviniette)
Lots of Skyline concepts are easy enough to get, some are not so easy and when they are all tied together with a splash of seasonality and a dash of similarities, it gets quite complicated.
However, all Skyline pieces, individually, are relatively simple. Seeing them work helps or hinders depending on your outlook... “Shit lots of stuff is anomalous” can often lead to lots of work, debugging, fine tuning and making better or polishing a turd or diamante.
Layers¶
Ionosphere allows the operator to train Skyline on a not anomalous timeseries by generating a features profile to be compared to anomalies in the future; however, Ionosphere also allows the operator to define "layers" rules at the time of features profile creation.
Layers rules allow us to train Skyline on boundaries as well, on the fly via the
UI at the time of features profile creation, which puts all the work for the
operator in the one place. Think of them as metric AND feature profile specific
algorithms. A layer should only ever be used to describe the features profile
settings.FULL_DURATION
timeseries. The operator should limit their
layers values to within acceptable bounds of the range within the features
profile. The operator should not try to use a single layer to describe the entire range they "think" the metric can go to; a layer is meant to match with a features profile, not a metric. If this methodology is followed, layers and features profiles "retire" around the same time: as metrics change over time, an old features profile that no longer describes the current active motion state will no longer ever be matched, and neither should its layers be. One of the
things some way down the road on the Ionosphere roadmap is
Feature #1888: Ionosphere learn - evolutionary maturity forget
Layers were added to reduce the number of features profiles one has to train Skyline on. They were introduced for humans and to make it easier and more useful. However they come at a cost. Every layer created reduces Ionosphere’s opportunities to be trained and learn. It is a compromise to save on the amount of monkeys you have to have or need to be to train Skyline properly. Unfortunately someone has to be the monkey, but for every features profile/layer you create, you create a Skyline monkey to watch that. A monkey with fairly simple instructions.
A layer consists of a series of simple algorithms that are run against a timeseries after the Analyzer/Mirage and Ionosphere features comparisons. The layers are defined as:
D layer [required] if last_datapoint [<, >, ==, !=, <=, >=] x
DISCARD - not_anomalous False
The D layer can be used as the upper or lower limit, e.g. if value > x (probably or certainly anomalous). It can also be used if the metric operates in the negative range, or if you do not want it to discard on 0 because you want to match 0; set it to -1 or > 0.1 or > 1. On a high constant rate metric the D layer can be used to discard if < x, so that the layer does not silence a drop. This layer can be complemented by the optional D1 layer below. Remember a match here disables any of the other layers below from being checked.
D1 layer [optional] if datapoint [<, >, ==, !=, <=, >=] x in the last y values in the timeseries
DISCARD - not_anomalous False
The D1 layer can be used as an upper or lower limit, so that the D layer does not silence a drop. Remember a match here disables any of the other layer conditions below from being checked.
E layer [required] if datapoint [<, >, ==, !=, <=, >=] x in the last y values in the timeseries
not anomalous
The Es, F1 and F2 layers shall not be discussed here as they are NOT IMPLEMENTED YET.
An example layer
For instance, say occasionally we can expect to see a spike of 404 status codes on a web app due to bots or your own scanning. With layers we can tell Ionosphere that a timeseries was not anomalous if the datapoint is less than 120 and there is a value less than 5 in the last 3 datapoints. This allows for a somewhat moving window and an alert that would be delayed by say 3 minutes, but it is a signal, rather than noise. Let us describe that layer as gt_120-5_in_3.
To demonstrate how the above layer would work, an example of 404 counts per minute:
D layer :: if value > 120 :: [do not check] :: ['ACTIVE']
D1 layer :: if value none none in last none values :: [do not check] :: ['NOT ACTIVE - D1 layer not created']
E layer :: if value < 5 in last 3 values :: [not_anomalous, if active Es, F1 and F2 layers match] :: ['ACTIVE']
Es layer :: if day None None :: [not_anomalous, if active F1 and F2 layers match] :: ['NOT ACTIVE - Es layer not created']
F1 layer :: if from_time > None :: [not_anomalous, if active F2 layer matchs] :: ['NOT ACTIVE - F1 layer not created']
F2 layer :: if until_time < None :: [not_anomalous] :: ['NOT ACTIVE - F2 layer not created']
Apply against
13:10:11 2
13:11:11 0
13:12:11 8
13:13:11 60
13:14:11 0
With the above described layer, this would be classified as not anomalous, however if the data was:
13:10:11 2
13:11:11 0
13:12:11 800
The layer would never report the timeseries as not anomalous, as the 800 exceeds the gt_120 limit, so the rest of the layer definition would not be evaluated.
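A minimal, illustrative sketch of how the gt_120-5_in_3 example could be evaluated against a timeseries of [timestamp, value] pairs; the function name and the exact matching semantics are assumptions, not Skyline's actual layers implementation:

def evaluate_example_layer(timeseries, d_limit=120, e_limit=5, e_in_last=3):
    values = [value for _, value in timeseries]
    # D layer (required): if the last datapoint exceeds the D limit, DISCARD,
    # not_anomalous is False and no further layer conditions are checked
    if values[-1] > d_limit:
        return False
    # E layer (required): deem not anomalous if a datapoint below the E limit
    # is present within the last e_in_last values
    if any(value < e_limit for value in values[-e_in_last:]):
        return True
    # no layer matched, the anomaly stands
    return False

print(evaluate_example_layer([[1, 2], [2, 0], [3, 8], [4, 60], [5, 0]]))  # True - E layer matches
print(evaluate_example_layer([[1, 2], [2, 0], [3, 800]]))                 # False - D layer discards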
Warning
Layers may seem simple, but they must be thought about carefully, as it is possible for a metric to have multiple layers created on multiple features profiles that could silence any anomalies on the metric. This applies specifically to the D layer; the D1 layer was added to remove this possibility, if the layers are properly implemented. The D1 layer is optional (and is backwards compatible with any existing layers that were created prior to 1.1.3-beta) and is there to let the operator set upper and lower bounds where necessary.
Be careful that you do not create another layer later that silences bad events, e.g. a drop to 0. The above example is not a good illustration of this as we generally want and expect 0 on the 404 metric, but if it was the status code 200 metric, we would not want any layers silencing a drop to 0. Please try and use the D1 layer wisely where required.
DISABLED features profiles¶
Ionosphere learning is not perfect, sometimes it will get it wrong as far as a human is concerned. Luckily that does not happen often, but it will happen.
Ionosphere lets the operator disable any features profile that they deem as anomalous. This can be due to a features profile having been LEARNT and the operator thinking it anomalous, or an operator could create a features profile that they later decide was in error; this can especially be true when, re-evaluating after creating with the "and LEARN" option enabled, they look at the 30 day data and think... "hmmm, actually I do not really want it to learn that spike from 2 weeks ago".
If a features profile is DISABLED, all its progeny features profiles are disabled as well. This ensures that every features profile that was LEARNT from the profile, and any that were LEARNT from any of those, are disabled too, so that the pattern is removed from evaluation during future analysis of the metric.
No machine learning¶
Ionosphere brings no machine learning to Skyline per se. It is merely making programmatic decisions based on the data it is provided with, things a human operator tells it are not anomalous. Ionosphere is an attempt to give Skyline an Apollo Program refit. Enabling the pilots to take control, have inputs.
For Humans¶
If Ionosphere achieves the sentiments expressed in Brian L. Troutwine's (@bltroutwine) seminal Belgium 2014 devopsdays presentation, then it has achieved a goal.
- Automation with Humans in Mind: Making Complex Systems Predictable, Reliable and Humane - https://legacy.devopsdays.org/events/2014-belgium/proposals/automation-with-humans-in-mind/
- video - http://www.ustream.tv/recorded/54703629
Ionosphere first and foremost was created to give this dimension of human piloting where necessary. Giving Skyline the ability to accept human input in some form to "teach" it what is not anomalous comes with a number of additional benefits, like giving Skyline the information needed to learn how to make decisions based on the input data it is provided with.
The initial goal has been achieved, but it comes at a price. Everything has a cost and here the cost is the operator's time: train_ionosphere_learn == time_in_seconds  # about 12 seconds
Ionosphere can only be activated by input from a human neocortex telling it what is not anomalous. Some brain CPU cycles: opening emails, a few clicks to assess, 1 or 2 more clicks. It is not easy, however that said, it is effective at what it set out to achieve.
Current state¶
It appears that Ionosphere is better at doing what it was intended for than doing what it was not intended for. Not all timeseries are created equal.
Ionosphere does low range, low rate metrics very well.
It does them better than high rate, highly variable metrics, at least as things stood when it first saw light. This is not to say that it does not do high rate, highly variable metrics; it just needs a lot more features profiles for the metric describing what is not anomalous. However it is possible that a larger
settings.IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN
or metric
specific max_percent_diff_from_origin
may work quite well on large volume
and high variability metrics, time will tell.
Over the fullness of time and data, these learning efficiency metrics will be available via the database data for analysis.
tsfresh¶
The tsfresh package and its feature extraction functions enabled features calculation on a wholesale scale, without having to design lots of algorithms to calculate the timeseries features. The tsfresh package enabled Ionosphere to happen much FASTER; it calculates all the features that are required to make this method viable and work. They said:
"Spend less time on feature engineering"
They were not wrong. Skyline has added a lot of “checks” to ensure consistency in the tsfresh calculated features so that a features profile is not affected by any changes that may be implemented in the tsfresh package. All of this has been pushed back into tsfresh and may be one of the reasons why the actual development of Ionosphere took so long, but you cannot rush timeseries.
This overview of Ionosphere could not be complete without a special thanks to the tsfresh people @MaxBenChrist, @nils-braun and @jneuff who are some of the nicest people in open source, on par with @astanway :)
Thanks to blue-yonder for supporting the open sourcing of tsfresh.
memcached¶
Ionosphere uses memcached and pymemcache (see https://github.com/pinterest/pymemcache) to cache some DB data. This optimises DB usage and ensures that any large anomalous event does not result in Ionosphere hammering the DB so hard that all the DB metrics become anomalous too :)
The architectural decision to introduce memcached when Redis is already available was made to keep Redis for timeseries data (and alert keys) while memcached isolates DB data caching. The memcached data is truly transient, whereas the Redis data is more persistent, and memcached is mature, easy to use and well documented.
Cached data¶
Ionosphere caches the following data:
- features profile features values from the z_fp_<metric_id> table - no expiry
- metrics table metric record - expire=3600
- metric feature profile ids - expire=3600
Note
Due to caching, a metric and a features profile can take up to 1 hour to become live.
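A minimal sketch of the pymemcache pattern described above; the key names and cached payloads are illustrative assumptions, see the Ionosphere code for the actual keys and objects used:

from pymemcache.client.base import Client

memcache_client = Client(('127.0.0.1', 11211))
metric_id = 42  # hypothetical metric id

# features profile feature values from z_fp_<metric_id> - cached with no expiry
memcache_client.set('z_fp_%s' % metric_id, b'fp_values_payload', expire=0)

# metrics table record - cached with a 1 hour expiry
memcache_client.set('metrics.%s' % metric_id, b'metric_record_payload', expire=3600)

cached = memcache_client.get('metrics.%s' % metric_id)
if cached is None:
    # cache miss - fall back to querying the MySQL DB and repopulate the cache
    pass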
Operational considerations¶
No UI data update method¶
There is no method to modify the DB data via the UI. If you want to make any changes, they must be made directly against the DB. This includes disabling a features profile, deleting features profiles and changing any of the per metric values once set, e.g. learn_full_duration, learn_valid_ts_older_than, max_generations or max_percent_diff_from_origin.
Backup¶
- Backup the MySQL DB to another machine or better slave it and backup the slave.
- rsync backup /opt/skyline/ionosphere/features_profiles to another machine, frequently (for the time being, until autobuild is available; however autobuild will not be able to recreate all the resources, but most).
MySQL configuration¶
There could be a lot of tables. DEFINITELY implement innodb_file_per_table
in MySQL.
Ionosphere - autobuild features_profiles dir¶
Warning
autobuild - TBD at some point in the future, for now see the Backup section above.
The number of features_profiles dirs that Ionosphere learn could spawn, and the amount of data storage that would result, is unknown. It is possible the operator is going to need to prune this data, a lot of which will probably never be looked at. Or a Skyline node is going to fail, not have the features_profiles dirs backed up, and all the data is going to be lost or deleted. So it should be possible for Ionosphere to recreate all the resources for a features profile under a best effort methodology. Although the original Redis graph image would not be available, nor the Graphite graphs in the resolution at which the features profile was created, the fp_ts is available, so the Redis plot could be remade and all the Graphite graphs could be made as best effort with whatever resolution is available for that time period.
This would allow the operator to delete/prune features profile dirs, possibly by least matched, by age, etc, or all of them, and still be able to surface the available features profile page data on demand.
Note
expire features profiles older than? Ionosphere forget.
tsfresh¶
EXPERIMENTAL
The Ionosphere branch introduced tsfresh to the Skyline stack to enable the creation of feature profiles for timeseries that the user deems to be not anomalous.
https://github.com/blue-yonder/tsfresh/
See Development - Ionosphere for the long trail that lead to tsfresh.
tsfresh and Graphite integration¶
Skyline needs to tie Graphite, Redis and tsfresh together. This is fairly straightforward really, but to save others having to reverse engineer the process, the skyline/tsfresh_features/scripts are written in a generic way that does not require downloading Skyline; they should run standalone so that others can use them if they want some simple Graphite -> tsfresh feature extraction capabilities.
See:
- skyline/tsfresh_features/scripts/tsfresh_graphite_csv.py
- skyline/tsfresh_features/scripts/tsfresh_graphite_csv.requirements.txt
skyline/tsfresh_features/scripts/tsfresh_graphite_csv.py¶
Assign a Graphite single timeseries metric csv file to tsfresh to process and calculate the features for.
param path_to_your_graphite_csv: the full path and filename to your Graphite single metric timeseries file saved from a Graphite request with &format=csv
type path_to_your_graphite_csv: str
param pytz_tz: [OPTIONAL] defaults to UTC or pass as your pytz timezone string. For a list of all pytz timezone strings see https://github.com/earthgecko/skyline/blob/ionosphere/docs/development/pytz.rst and find yours.
type pytz_tz: str
Run the script as follows; a virtualenv example is shown but you can run it with just Python 2.7 from wherever you save the script:
cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}"
source bin/activate
bin/python2.7 tsfresh_features/scripts/tsfresh_graphite_csv path_to_your_graphite_csv [pytz_timezone]
deactivate
Where path_to_your_graphite_csv.csv is a single metric timeseries that has been retrieved from Graphite with the &format=csv request parameter and saved to a file.
The single metric timeseries could be the result of a graphite function on multiple timeseries, as long as it is a single timeseries. This does not handle multiple timeseries data, meaning a Graphite csv with more than one data set will not be suitable for this script.
This will output 2 files:
- path_to_your_graphite_csv.features.csv (default tsfresh column wise format)
- path_to_your_graphite_csv.features.transposed.csv (human friendly row wise format) - this is the csv you look at :)
Your timeseries features.
Warning
Please note that if your timeseries are recorded in a daylight savings timezone, the handling of DST changes has not been tested.
Redis integration¶
Part of the Horizon service’s job is to input data into Redis.
How are timeseries stored in Redis?¶
Skyline uses MessagePack to store data in Redis. When a data point comes in, a Horizon worker will pack the datapoint with the schema [timestamp, value] into a MessagePack-encoded binary string and make a redis.append() call to append this string to the appropriate metric key. This way, we can very easily make many updates to Redis at once, and this is how Skyline is able to support a very large firehose of metrics.
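A minimal sketch of this storage scheme with msgpack and redis-py; the metric key name here is an assumption for illustration:

import msgpack
import redis

r = redis.StrictRedis(host='127.0.0.1', port=6379)
metric = 'stats.statsd.example.requests'

# pack the [timestamp, value] datapoint and append it to the metric key
r.append(metric, msgpack.packb([1483228800, 42.0]))

# to read the timeseries back, feed the whole string to an Unpacker
unpacker = msgpack.Unpacker()
unpacker.feed(r.get(metric))
timeseries = list(unpacker)  # [[1483228800, 42.0], ...]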
One downside to this scheme is that once timeseries in Redis start to get very long, Redis’ performance suffers. Redis was not designed to contain very large strings. We may one day switch to an alternative storage design as proposed by Antirez - see https://github.com/antirez/redis-timeseries - but for now, this is how it is.
In addition to all the metrics, there are two specialized Redis sets: ‘metrics.unique_metrics’ and ‘mini.unique_metrics’. These contain every key that exists in Redis, and they are used so that Analyzer knows what to mget from Redis (as opposed to doing a very expensive keys * query).
What’s with the ‘mini’ and ‘metrics’ namespaces?¶
The ‘mini’ namespace currently has two uses: it helps performance-wise for
Oculus, if you choose to use it, and it also is used by the webapp to display a
settings.MINI_DURATION
seconds long view.
Skyline and friends¶
Skyline has some close relationships to a number of metric pipelining things.
Graphite - a close relationship¶
Anyone having used Skyline may have wondered in the past why Skyline sent metrics to Graphite. One may have also wondered why there was never a Statsd option, why just Graphite?
It seems natural that Etsy might have had Skyline feed its metrics to Statsd as an option at least. However, there never was a STATSD_HOST setting and this is quite fortunate.
The relationship between Graphite and Skyline is very close, as in they can monitor each other through a direct feedback loop or interaction. If Statsd was ever an option, it would add a degree of separation between the 2 which is not required or desirable, although it would work.
Feedback loop¶
Skyline’s own metrics really are an important aspect of Skyline’s operations over time, in terms of:
- monitoring Skyline
- monitoring performance in terms of:
- Skyline’s own running times, load, algorithm performance, etc
- being able to monitor the overall performance of your “things” over time
- To the new user these may seem like uninteresting metrics, probably never to be looked at much; however, over time they will describe your ups and downs, your highs and lows, and hopefully add to your understanding of your “things”
BuckyServer¶
Before TCP transport was added to Statsd, there was BuckyServer for long haul TCP transport of your metrics to a local Statsd -> Graphite.
Sensu¶
Sensu can feed Graphite - Sensu on github
Riemann¶
Riemann.io can feed Graphite - Riemann on github
Logstash¶
Logstash can feed Graphite - Logstash on github
Many more¶
There are a great deal of apps that can feed Skyline; this is just to mention a few.
Logging¶
A few considerations to note about logging.
- Logging Python multiprocessing threads is not easy.
- Rotating Python TimedRotatingFileHandler logs is not easy.
- logrotate on Python multiprocessing threads is not easy.
- Logging something that eats through as much data and does as many things as Skyline does is not easy.
Skyline logging can be viewed as having 2 contexts:
- Logging the Skyline application operations - Skyline app log
- Logging the details of anomalous timeseries - syslog
Any long time users of Skyline will have undoubtedly run into a logging pain with Skyline at some point, whether that be in terms of logs being overwritten, or no errors being logged while a Skyline module is broken or hung. Although logging is very mature, especially in an ecosystem as mature as Python, and one may be led to believe it should be easy, in reality it is non-trivial.
A modified logging methodology¶
The logging implementation in Skyline is quite complicated due to the above-mentioned reasons. It is important to note that some optimisations have had to be added to the logging process which could be described as unintuitive and, simply, un-log like.
Therefore it is important to explain why it is necessary to use a modified logging methodology, so that the user understands conceptually what is being logged and what is not being logged.
In terms of Analyzer, if errors are encountered in any algorithms, the errors are sampled. This achieves a balance between usefully reporting errors and not using lots of I/O and disk space if something goes wrong. Skyline has the potential to produce a runaway log, but not reporting errors at all is not useful when you are trying to pinpoint what is wrong.
However, it is right that algorithms should just return True or False and not impact the performance or operation of Skyline in any way.
Skyline app logs¶
Each Skyline app has its own log. These logs are important from a Skyline perspective in terms of monitoring the Skyline processes. See Monitoring Skyline
It is recommended NOT to stream the Skyline app logs through any logging pipeline e.g. rsyslog, logstash, elasticsearch, etc. The syslog alert trigger was added for this purpose. If you wish to parse and rate anomalies, do so via syslog (see below).
Log rotation is handled by the Python TimedRotatingFileHandler and by default keeps the 5 last logs:
handler = logging.handlers.TimedRotatingFileHandler(
settings.LOG_PATH + '/' + skyline_app + '.log',
when="midnight",
interval=1,
backupCount=5)
The Skyline app logs and the rotation is relatively cheap on disk space even on production machines handling 10s of 1000s of metrics each.
===============================================================
HOST:skyline-prod-4-96g-luk1 EXITCODE:0 STDOUT:
===============================================================
38M /var/log/skyline
===============================================================
===============================================================
HOST:skyline-prod-3-40g-ruk2 EXITCODE:0 STDOUT:
===============================================================
22M /var/log/skyline
===============================================================
===============================================================
HOST:skyline-prod-2-96g-luk1 EXITCODE:0 STDOUT:
===============================================================
52M /var/log/skyline
===============================================================
Skyline app log preservation¶
It should be noted that the bin/ bash scripts are used to ensure logs are preserved and not overwritten by any Python multiprocessing processes or by the lack of mode='a' in the Python TimedRotatingFileHandler. It is for this reason that the Skyline app logs should not be streamed through a logging pipeline such as logstash, et al, as doing so in a logging pipeline with say rsyslog can result in the log being pushed multiple times due to the following scenario:
- Skyline app bin is called to start
- Skyline app bin makes a last log from the Skyline app log
- Skyline Python app is started, creates log with
mode=w
- Skyline Python app pauses and the app bin script concatenates the last log and the new log file
- Skyline bin script exits and Skyline Python app continues writing to the log
In terms of rsyslog pipelining this would result in the log being fully submitted again on every restart.
syslog¶
With this in mind, a syslog alert trigger was added to Skyline apps to handle logging the details of anomalous timeseries to syslog; grokking the syslog for Skyline anomalies and rates etc is the way to go. Bear in mind that only anomalies from metrics with alert tuples set are logged.
Tuning tips¶
Okay, so you’ve got everything all set up, data is flowing through, and...what? You can’t consume everything on time? Allow me to help:
- Try increasing
settings.CHUNK_SIZE
- this increases the size of a chunk of metrics that gets added onto the queue. Bigger chunks == smaller network traffic. - Try increasing
settings.WORKER_PROCESSES
- this will add more workers to consume metrics off the queue and insert them into Redis. - Try decreasing
settings.ANALYZER_PROCESSES
- this all runs on one box (for now), so share the resources! - Still can’t fix the performance? Try reducing your
settings.FULL_DURATION
. If this is set to be too long, Redis will buckle under the pressure. - Is your analyzer taking too long? Maybe you need to make your algorithms faster, or use fewer algorithms in your ensemble.
- Reduce your metrics! If you’re using StatsD, it will spit out lots of
variations for each metric (sum, median, lower, upper, etc). These are largely
identical, so it might be worth it to put them in
settings.SKIP_LIST
- Disable Oculus - if you set settings.OCULUS_HOST to ‘’, Skyline will not write metrics into the mini. namespace - this should result in dramatic speed improvements. At Etsy, they had a flow of about 5k metrics coming in every second on average (with 250k distinct metrics). They used a 32 core Sandy Bridge box, with 64 gb of memory. They experienced bursts of up to 70k TPS on Redis. Here are their relevant settings:
CHUNK_SIZE: 7000
WORKER_PROCESSES: 2
ANALYZER_PROCESSES: 25
FULL_DURATION: 86400
Smaller deployments¶
Skyline runs OK on much less. It can handle ~45000 metrics per minute on a 4 vCore, 4GB RAM cloud SSD server where the metric resolution is 1 datapoint per 60 seconds, it will run loaded, but OK.
Do take note of the notes in settings.py related to settings.ANALYZER_PROCESSES and settings.ANALYZER_OPTIMUM_RUN_DURATION. If you are only processing a few 1000 metrics with a datapoint every minute, then the optimum settings will most likely be something similar to:
ANALYZER_PROCESSES = 1
ANALYZER_OPTIMUM_RUN_DURATION = 60
Reliability¶
Skyline has been verified running in production against a 6.5 million requests per minute (peak) internet advertising infrastructure, since 20131016.
However it should be noted that something should monitor the Skyline processes. Over a long enough timeline Python or Redis or some I/O issue is going to lock up and the Python process is just going to hang. This means that it is not really sufficient to just monitor the process with something like monit.
Skyline has made some progress in monitoring its own process threads and keeping itself running sanely; however, not every possible issue can be mitigated against, therefore some additional monitoring external to Python can help.
Monitoring Skyline¶
As noted above, something should monitor the Skyline processes, because over a long enough timeline Python, Redis or some I/O issue is going to lock up and the Python process is just going to hang, and monitoring the process alone with something like monit is not sufficient.
KISS¶
Skyline logs¶
Each Skyline app is expected to log at certain intervals, by design, for this very purpose. So if a Skyline app has not written to its log in 120 seconds, it has hung. Some simple bash scripts can therefore be crond to restart the application if the log has not been written to in 120 seconds (a sketch of the check follows below).
This very simple mechanism solves a very difficult problem to solve within Python itself. It works.
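The equivalent check logic, expressed here in Python for illustration (the log path, script path and restart command are assumptions; a crond bash script doing the same is all that is needed):

import os
import subprocess
import time

SKYLINE_LOG = '/var/log/skyline/analyzer.log'
MAX_LOG_AGE = 120  # seconds without a log write before the app is deemed hung

log_age = time.time() - os.path.getmtime(SKYLINE_LOG)
if log_age > MAX_LOG_AGE:
    # restart the hung app via its bin script (path and script name are assumptions)
    subprocess.call(['/opt/skyline/github/skyline/bin/analyzer.d', 'restart'])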
The Graphite pickle¶
The same can be said for the pickle from Graphite. The same simple methodology can be employed on Graphite too: a simple script to determine the number of carbon fullQueueDrops on the destinations metric for your Skyline node/s, e.g.
carbon.relays.skyline-host-a.destinations.123_234_213_123:2024:None.fullQueueDrops
If the number of drops is greater than x, restart carbon-cache, or just the carbon-relay if you have carbon-relay enabled, which you should have :)
A further advantage of having carbon-relay enabled independently of carbon-cache
is that implementing a monitor script on your carbon.relays.*.fullQueueDrops
metrics means that if carbon-cache itself has a pickle issue, it should be
resolved by a carbon-relay restart, just like Skyline Horizon is fixed by its
monitor.
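A minimal sketch of the fullQueueDrops check using the Graphite render API; the Graphite host, threshold and target pattern are assumptions for illustration:

import requests

GRAPHITE_URL = 'http://graphite.example.com/render'
TARGET = 'carbon.relays.skyline-host-a.destinations.*.fullQueueDrops'
DROP_THRESHOLD = 0

response = requests.get(
    GRAPHITE_URL,
    params={'target': TARGET, 'from': '-10minutes', 'format': 'json'})

drops = 0
for series in response.json():
    drops += sum(value for value, _ in series['datapoints'] if value)

if drops > DROP_THRESHOLD:
    print('fullQueueDrops detected (%s) - restart carbon-relay' % drops)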
Debian and Vagrant Installation Tips¶
Please note that this info is old and has not been updated since 25 Sep 2013
Get a Wheezy box¶
From this useful tutorial
The previous version, Debian Squeeze, struggled with an easy installation of Redis, although you can get it working with back-ports.
vagrant box add wheezy64 http://dl.dropbox.com/u/937870/VMs/wheezy64.box
mkdir skyline
cd skyline
vagrant init wheezy64
Extra Vagrant configuration¶
Forward the port required for the web application
config.vm.network :forwarded_port, guest: 1500, host: 1500
Python requirements¶
You’ll need pip to be able to add some of the requirements for python, as well as python-dev without which some of the packages used with pip will not work!
sudo apt-get install python-dev
sudo apt-get install python-pip
After that we can clone the project’s repository
git clone https://github.com/etsy/skyline.git
Once we have the project we can change into the folder and start going through all the python dependencies.
cd skyline
Skyline dependencies¶
sudo pip install -r requirements.txt
sudo apt-get install python-numpy python-scipy python-scikits.statsmodels
sudo pip install patsy msgpack_python
Redis 2.6 on Debian Wheezy¶
Version 2.6 is available from the Wheezy back-ports, which means it will be in the next Debian stable version and has been back ported to Wheezy.
sudo vim /etc/apt/sources.list
Add the back-port, replace $YOUR_CONFIGURATION according to your local configuration.
deb http://**$YOUR_CONFIGURATION**.debian.org/debian/ wheezy-backports main
Update and install Redis-Server, don’t forget to kill the process as it will be started with the default config right after the installation.
sudo apt-get update
sudo apt-get -t wheezy-backports install redis-server
sudo pkill redis
Skyline configuration¶
With all dependencies installed you can now copy an example for the settings file. This will be the place to add your Graphite URL and other details but for now we only need to update the IP bound to the web application so the Vagrant host will display it.
cp src/settings.py.example src/settings.py
Edit file for bind.
vim src/settings.py
And change the IP to listen on host’s requests.
WEBAPP_IP = '0.0.0.0'
Release Notes¶
1.1.11 - the ionosphere branch¶
v1.1.11-ionosphere - Mar 29, 2018
Changes¶
- Bumped to v1.1.11
- Adds a helper script to create any missing z_fp_ and z_ts_ tables that are missing if MySQL was updated to 5.7, related to #45 MySQL key_block_size (https://github.com/earthgecko/skyline/issues/45) (Bug #2340) - skyline/tsfresh_features/autobuild_features_profile_tables.py
Script usage¶
Change the Python version and directories as appropriate to your set up.
# BACKUP the skyline MySQL database FIRST
PYTHON_MAJOR_VERSION="2.7"
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
PROJECT="skyline-py2712"
cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}"
source bin/activate
cd /opt/skyline/github/skyline
python${PYTHON_MAJOR_VERSION} skyline/tsfresh_features/autobuild_features_profile_tables.py
deactivate
1.1.10 - the ionosphere branch¶
v1.1.10-ionosphere - Mar 24, 2018
Changes¶
- Bumped to v1.1.10
- Release to fix #45 MySQL key_block_size (Bug #2340) https://github.com/earthgecko/skyline/issues/45
- Minor corrections to a log line in ionosphere and panorama
1.1.9 - the ionosphere branch¶
v1.1.9-ionosphere - Dec 28, 2017
Changes¶
- Bumped to v1.1.9
- Change Boundary to only send to Panorama on alert (Task #2236) - fixed some minor bugs related to the tmp_panaroma_anomaly_file and add missing move_file argument in boundary.py. Added algorithm as it is required if the metric has multiple rules covering a number of algorithms.
1.1.8 - the ionosphere branch¶
v1.1.8-ionosphere - Dec 16, 2017
Changes¶
- Bumped to v1.1.8
- Expiry boundary last_seen keys appropriately (Bug #2232)
- Fix panorama metric_vars value check (Bug #2234)
- Change Boundary to only send to Panorama on alert (Task #2236)
1.1.7 - the ionosphere branch¶
v1.1.7-ionosphere - Nov 8, 2017
Changes¶
- Bumped to v1.1.7
- features profiles by performance (Feature #2184). Added fp efficiency and layer efficiency calculations to the search features results table. This is a simple percentage calculation of matched over number of time checked.
- existing features profiles link (Feature #2210). Add a link to features profiles for a metric so that the operator can open up the features profile search page with all the features profiles for the metric to evaluate efficiency and count of features profiles to determine how effectively Ionosphere is working on a metric. This could be a table insert like Existing layers, but big tables may hinder easy navigation in the training page, so a link and new tab is ok.
- Fixed variable typo which resulted in layer last checked - ionosphere_backend.py
- D1 layer issue (Bug #2208). Corrected copy paste typo in layers
- fix alert Redis derivative graphs. (Bug #2168). Stop nonNegativeDerivative being applied twice
- panorama incorrect mysql_id cache keys (Bug #2166). There are SOME cache key msgpack values that DO == string.printable for example [73] msgpacks to I panorama.mysql_ids will always be msgpack so if ‘panorama.mysql_ids’ in key name set test_string = False
- Ionosphere - matches page (Feature #1996). Included the features_profiles_matches.html Create the ionosphere_summary_memcache_object. Added get_fp_matches and get_matched_id_resources, get_fp_matches, get_matched_id_resources, fp_search_or_matches_req and fp_matches. Handle both fp_search and fp_matches. Added ‘matched_fp_id’, ‘matched_layer_id’ request arguments. Added matched_graph_image_file to show the graph for the matched event in the features_profile MATCHED Evaluation page. Added the Matches Ionosphere page and menu tabs
- panorama incorrect mysql_id cache keys (Bug #2166). Wrap in except around mysql_select
- webapp Redis metric check, existing but sparsely represented metrics. If this is an fp_view=true, it means that either the metric is sparsely represented or no longer exists, but an fp exists so continue and do not 404 (Bug #2158). Note about unused TSFRESH_VERSION in skyline/features_profile.py
- ionosphere - mismatching timestamp metadata (Bug #2162 and Feature #1872) Iterate back a few seconds as the features profile dir and file resources may have a slight offset timestamp from the created_timestamp which is based on MySQL CURRENT_TIMESTAMP
1.1.6 - the ionosphere branch¶
v1.1.6-ionosphere - Sep 13, 2017
Changes¶
- Bumped to v1.1.6
- disabled_features_profiles (Feature #2056)
- Populated the enabled_list
- Added enabled_list to display DISABLED in search_features_profiles page results.
- Added enabled query modifier to display enabled or disabled profiles in the search_features_profiles page results.
- Updated docs with memcached stuff and DISABLED features profiles
- Added the functionality in Webapp to disable features profiles and their progeny features profiles
- Tidy up and use settings memcache variables in ionosphere.py
- Optimise Ionosphere DB usage with memcached - (Task #2132)
- Infrequent missing new_ Redis keys (Bug #2154)
- Update derivative_metrics if the analyzer.derivative_metrics_expiry key is going to expire within 60s
- Increased analyzer.derivative_metrics_expiry ttl from 120 to 300
- Updated deps (Task #2138)
- 7bit SMTP encoding breaking long urls (Bug #2142)
- Broke body into body and more_body to workaround the 990 character limit per line for SMTP, using mixed MIMEMultipart
1.1.5 - the ionosphere branch¶
v1.1.5-ionosphere - Aug 4, 2017
Minor bug fixes
Changes¶
- Update training_data page with some additional info relating to FULL_DURATION and the D1 layer
- Added Order Matters section to Analyzer doc
- pep8 changes in boundary.py
- Added new context parameter to verify_alerts.py
- Make fp_generation_created default to 0 for backwards compatible with older features profiles (IssueID #1854: Ionosphere learn - generations)
- Align webapp backend anomaly_count and limit_count
- Support #2072: Make Boundary hipchat alerts show fixed timeframe
- Bug #2050: analyse_derivatives - change in monotonicity - Removed unused ionosphere_alerts_returned variable - Removed a copy paste last_alert that was not deleted when changed to the last_derivative_metric_key
- Bug fix - if the nonNegativeDerivative has been calculated we need to reset the x and y as nonNegativeDerivative has to discard the first value as it has no delta and resulted a matplotlib/axes/_base.py error in _xy_from_xy raise ValueError(“x and y must have same first dimension”) (Bug #2068: Analyzer smtp alert error on Redis plot with derivative metrics)
- Bug fix to handle MySQL Aborted_clients where duplicate engines were being accquired and engines were not being disposed of (Bug #2136: Analyzer stalling on no metrics)
- Bug fix - All del methods have been wrapped with an except logger.error method to prevent any stalling on the variable/object not existing (Bug #2136: Analyzer stalling on no metrics)
- Show a human date in alerts. Added alerted_at, long overdue.
- Also wrapped the mirage_alerters.alert_smtp body creation in try except
1.1.4 - the ionosphere branch¶
v1.1.4-beta-ionosphere - Jun 19, 2017
Upgrade NOTES¶
Any features profiles that were LEARNT prior to v1.1.4-beta-ionosphere had their generation details added to the DB incorrectly; this needs to be fixed in-situ against your Skyline database. The below pattern can be used, replacing the DB user, password, etc to match your environment.
BACKUP_DIR="/tmp" # Where you want to backup the DB to
MYSQL_USER="<YOUR_MYSQL_USER>"
MYSQL_PASS="<YOUR_MYSQL_PASS>"
MYSQL_HOST="127.0.0.1" # Your MySQL IP
MYSQL_DB="skyline" # Your MySQL Skyline DB name
mkdir -p $BACKUP_DIR
mysqldump -u$MYSQL_USER -p$MYSQL_PASS $MYSQL_DB > $BACKUP_DIR/pre.corrected.generations.dump.$MYSQL_DB.sql
mysql -u$MYSQL_USER -p$MYSQL_PASS -h $MYSQL_HOST $MYSQL_DB -sss -e "SELECT id, parent_id, generation FROM ionosphere WHERE generation > 1" | tr -s '\t' ',' > $BACKUP_DIR/skyline.original.fp_ids.generations.txt
for i_line in $(cat $BACKUP_DIR/skyline.original.fp_ids.generations.txt)
do
FP_ID=$(echo "$i_line" | cut -d',' -f1)
PARENT_ID=$(echo "$i_line" | cut -d',' -f2)
GENERATION=$(echo "$i_line" | cut -d',' -f3)
if [ $GENERATION -lt 2 ]; then
continue
fi
PARENT_GENERATION=$(mysql -u$MYSQL_USER -p$MYSQL_PASS -h $MYSQL_HOST $MYSQL_DB -sss -e "SELECT generation FROM ionosphere WHERE id=$PARENT_ID")
GEN_DIFF=$((GENERATION-PARENT_GENERATION))
if [ $GEN_DIFF -ne 1 ]; then
CORRECT_GENERATION=$((PARENT_GENERATION+1))
echo "INCORRECT- fp_id $FP_ID generation is incorrect $GENERATION and parent generation is $PARENT_GENERATION, corrected to $CORRECT_GENERATION"
mysql -u$MYSQL_USER -p$MYSQL_PASS -h $MYSQL_HOST $MYSQL_DB -sss -e "UPDATE ionosphere SET generation=$CORRECT_GENERATION WHERE id=$FP_ID"
fi
done
Changes¶
- Some debug choices for daylight savings countdown issue (Feature 1876: Ionosphere - training_data learn countdown)
- Handle change in monotonicity with a check if a metric has its own Redis z.derivative_metric key that has not expired. (Bug 2050: analyse_derivatives - change in monotonicity)
- Added D1 layer to features_profiles.html
- Added D1 layer is required to ensure that layers do not inadvertently silence true anomalies due the D layer being a singular value, a D1 layer is required to allow the operator to mitigate against this, by setting a min and max threshold to skip the layer. (Feature 2048: D1 ionosphere layer)
- Allow the operator to save a training_data set with save_training_data_dir in ionosphere_backend, etc (Feature 2054: ionosphere.save.training_data)
- Updated docs with layers and misc modifications
- Ionosphere - rate limiting profile learning.Rate limit Ionosphere to only learn one profile per resolution per metric per hour. This is to stop Ionosphere LEARNT from iteratively learning bad if something becomes anomalous and Skyline creates lots of training data sets for a metric. Ionosphere learn could come along and start to incrementally increase the range in LEARNT features profiles, with each iteration increasing a bit and going to the next training data set another profile to match. A spawn.c incident where Ionosphere learnt bad and badly. (Feature 2010: Ionosphere learn - rate limiting profile learning)
- Ionosphere - analyse_derivatives (Feature 2034: analyse_derivatives). Add functionality that enables Skyline apps to convert a timeseries to the nonNegativeDerivative too properly analyse incremented count metrics. This will allow for Ionosphere to learn these types of metrics too, as previously it could only learn step changes at static rates as the features profiles change at every step. NON_DERIVATIVE_MONOTONIC_METRICS setting.
- Added matched_greater_than query to the search features profile page. At the moment this only covers features profiles matches not layers matches. (Feature 1996: Ionosphere - matches page)
- validated features profiles. Added a validated component to the features profiles (Feature 2000: Ionosphere - validated)
1.1.3 - the ionosphere branch¶
v1.1.3-beta-ionosphere - Apr 24, 2017
- Added a validated component to the features profiles
- keying training data. The addition of each training_dir and data set is now Redis keyed to allow for an increase efficiency in terms of disk I/O for ionosphere.py and making keyed data available for each training_dir data set so that transient matched data can be surfaced for the webapp along with directory paths, etc
1.1.2 - the ionosphere branch¶
v1.1.2-beta-ionosphere - Apr 2, 2017
- Added ionosphere_layers table to DB related resources
- Added preliminary Ionosphere layers functionality
- v1.1.2-alpha
- Allow for the declaration of a DO_NOT_SKIP_LIST in the worker context
- Allow Ionosphere to send Panorama checks for ionosphere_metrics, not Mirage
- Display the original anomalous datapoint value in the Redis plot alert image
1.1.1 - the ionosphere branch¶
v1.1.1-beta-ionosphere - Feb 25, 2017
Added learning via features extracted using https://github.com/blue-yonder/tsfresh/releases/tag/v0.4.0 Dedicated to my father, Derek, a man of numbers if there ever was one.
- Corrected sum_common_values column name
- Remove needs_sphinx for RTD
- needs_sphinx = ‘1.3.5’ for RTD
- 0 should be None as this was causing an error if the full_duration is not in the features_profile_details_file, which it was not for some reason.
- Added metric_check_file and ts_full_duration is needed to be determined and added the to features_profile_details_file as it was not added here on the 20170104 when it was added the webapp and ionosphere - so added to features_profile.py
- Added all_fp_ids and lists of ionosphere_smtp_alerter_metrics and ionosphere_non_smtp_alerter_metrics to ionosphere.py so to only process ionosphere_smtp_alerter_metrics
- Added lists of smtp_alerter_metrics and non_smtp_alerter_metrics to analyzer.py so to only process smtp_alerter_metrics
- Restored the previous redis_img_tag method as some smtp alerts were coming without a Redis graph, not all but some and for some reason, I am pretty certain retrospectively that it was done that way from testing I just wanted to try and be cleaner.
- Added all_calc_features_sum, all_calc_features_count, sum_calc_values, common_features_count, tsfresh_version to SQL
- Added details of match anomalies for verification
- Added all_calc_features_sum, all_calc_features_count, sum_calc_values, common_features_count, tsfresh_version to ionosphere.py and SQL
- Added graphite_matched_images|length to features_prfile.html
- More to do in webapp context to pass verfication values
- Added graphite_matched_images context and matched.fp_id to skyline_functions.get_graphite_metric
- Added graphite_matched_images context and db operations to ionosphere_backend.py (geez that is a not of not DRY on the DB stuff)
- Added graphite_matched_images gmimages to webapp and features_profile.html and ported the training_data.html image order method to features_profile.html
- Done to the The White Stripes Live - Under Nova Scotian Lights Full https://www.youtube.com/watch?v=fpG8-P_BpcQ I was tried of Flowjob https://soundcloud.com/search/sounds?q=flowjob&filter.duration=epic which a lot of Ionosphere has been patterned to.
- Fixed typo in database.py
- Added ionosphere_matched update to ionosphere.py
- Un/fortunately there is no simple method by which to annotate these Graphite NOW graphs at the anomaly timestamp, if these were from Grafana, yes but we cannot add Grafana as a dep :) It would be possible to add these using the dygraph js method ala now, then and Panorama, but that is BEYOND the scope of js I want to have to deal with. I think we can leave this up to the operator’s neocortex to do the processing. Which may be a valid point as sticking a single red line vertical line in the graphs ala Etsy deployments https://codeascraft.com/2010/12/08/track-every-release/ or how @andymckay does it https://blog.mozilla.org/webdev/2012/04/05/tracking-deployments-in-graphite/ would arguably introduce a bias in this context. The neocortex should be able to handle this timeshifting fairly simply with a little practice.
- Added human_date to the Graphite graphs NOW block for the above
- Exclude graph resolution if matches TARGET_HOURS - unique only
- Added ionosphere_matched_table_meta
- Standardised all comments in SQL
- Added the ionosphere_matched DB table
- After debugging calls on the readthedocs build, adding to readthedocs.requirements.txt should solve this
- After debugging calls on the readthedocs build, adding to requirements.txt should solve this
- Try use setup.cfg install_requires
- use os.path.isfile although I am not sure on the path
- readthedocs build is failing as they are Running Sphinx v1.3.5 and returns Sphinx version error: This project needs at least Sphinx v1.4.8 and therefore cannot be built with this version.
- Added returns to skyline_functions.get_graphite_metric and specific webapp Ionosphere URL parameters for the Graphite NOW graphs
- Removed unused get_graphite_metric from skyline/webapp/backend.py
- Added get_graphite_metric, the graphite_now png context and retrieving the graphite_now_images at TARGET_HOURS, 24h, 7d and 30d
- Removed the reference to #14 from webapp Ionosphere templates
- Added order and graphite_now_images block to training_data.html
- Added sorted_images for order and graphite_now_images to webapp
- Added Grumpy testing to the roadmap
- bug on graphite in image match - stats.statsd.graphiteStats.flush_time, but graphite could be in the namespace. So match on .redis.plot and removed the unknown image context as it should not really ever happen...?
- Enabled utilities as the files were added
- Added the full_duration parameter so that the appropriate graphs can be embedded for the user in the training data page in the webapp context
- Added utilities TODO
- Removed the buggy skyline_functions.load_metric_vars and replaced with the new_load_metric_vars(self, metric_vars_file) function
- Fixes #14
- Changed ionosphere_backend.py to the new function
- Removed the buggy skyline_functions.load_metric_vars and replaced with the new_load_metric_vars(self, metric_vars_file) function
- Fixes #14
- Clarified Mirage alert matching in docs
- Removed unused things
- Added ‘app’ and ‘source’ to string_keys
- Added anomalous_value to correct calc_value overwriting value
- Added an app alert key ionosphere.%s.alert app alert key
- Remove check file is an alert key exists
- Added TODO: ionosphere.analyzer.unique_metrics (at FULL_DURATION)
- ionosphere.mirage.unique_metrics (NOT at FULL_DURATION)
- Get ionosphere.features_calculation_time Graphite metric working
- Correct auth for requests should webapp be called
- Cache fp ids for 300 seconds?
- Send Ionosphere metrics to Graphite and Reset lists
- Minor log bug fix in features_profile.py
- Correct rate limitining Ionosphere on last_alert cache key
- Added SQLAlchemy engine dispose
- Differentiate between Mirage and Ionosphere context in logging not alerting
- Do not alert on Mirage metrics, surfaced again as Ionosphere was introduced
- Added new new_load_metric_vars to ionosphere to get rid of the skyline_functions load_metric_vars as imp is deprecated in py3 anyway and this should fix #24 as well.
- Added THE FIRST to docs
- Clarifed log message
- Handle refreshing mirage.unique_metrics and ionosphere.unique_metrics
- Refactoring some things in ionosphere.py
- THE FIRST prod match is at this commit
- Added last_checked and checked_count to features profile details to Ionosphere features profile page, feature profile details page.
- Added ionosphere last_checked checked_count columns to record the number of times a features profile is checked and when last checked
- Ionosphere update checked count and timestamp
- int timestamp in the alerters
- Set default as 0 on ionosphere matched_count and last_matched
- Set default as NULL on ionosphere matched_count and last_matched
- Added context in Analyzer
- Added in Analyzer added a self.all_anomalous_metrics to join any Ionosphere Redis alert keys with self.anomalous_metrics
- Refresh mirage.unique_metrics Redis set
- Refresh ionosphere.unique_metrics Redis set
- Pushing alerts back to the apps from Ionosphere
- Added update SQL
- PoCing pushing alerts back to the apps from Ionosphere
- Only match features profiles that have the same full_duration
- Added the full_duration context to the send_anomalous_metric to Ionosphere in Analyzer, Mirage, skyline_functions and the database which needs to be recorded to allow Mirage metrics to be profiled on Redis timeseries data at FULL_DURATION
- Added IONOSPHERE_FEATURES_PERCENT_SIMILAR to validate_settings
- Bringing Ionosphere ionosphere_enabled and ionosphere.unique_metrics online
- Some flake8 linting
- Enable ionosphere metrics in DB if features profile is created
- Added more ionosphere functionality, now checking DB features profiles
- Determine relevant common features
- Calculate percent difference in sums
- Added IONOSPHERE_FEATURES_PERCENT_SIMILAR to settings
- Use str in alerters Graphite URL for full_duration_in_hours
- Added the baselines from tsfresh, they diff the same as 0.3.1
- Update to tsfresh-0.4.0 to make use of the new ReasonableFeatureExtractionSettings that was introduced to exclude the computationally high cost of extracting features from very static timeseries that has little to no variation is the values, which results in features taking up to almost 600 seconds to calculate on a timeseries of length 10075 exclude high_comp_cost features.
- Added the baselines from tsfresh, diff the same
- Use str not int for full_duration_in_hours context in alerts, etc
- Updated tsfresh to 0.3.1 not to 0.4.0 as tqdm may not be so great in the mix for now.
- Added new v0.3.1 baselines from tsfresh 0.4.0 that were committed at https://github.com/blue-yonder/tsfresh/commit/46eb72c60f35de225a962a4149e4a4e25dd02aa0 They test fine.
- Update deps
- Send Graphite metrics from mirage and ionosphere
- Added stop process after terminate, if any still exist as any terminate called on extract_features calls terminate, but the pid remains, debugging
- Added __future__ division to all scopes using division (pep-0238)
- Changed to InnoDB from MyISAM as no files open issues and MyISAM clean up, there can be LOTS of file_per_table z_fp_ tables/files without the MyISAM issues. z_fp_ and z_ts_ tables are mostly read and will be shuffled in the table cache as required.
- Added some additional logging in ionosphere on the slow hunt to determine why tsfresh extract_features is sometimes just hanging and needs to be killed
- Removed p.join from all p.terminate blocks as it hangs there
- Added missing app context to features_profile.py
- mirage send anomaly timestamp not last timestamp from SECOND_ORDER_RESOLUTION timeseries
- Always pass the f_calc time to webapp
- Aligned the alerters HTML format
- Added the ionosphere context to features_profile.py
- Changed to InnoDB from MyISAM as no files open issues and MyISAM clean up, there can be LOTS of file_per_table z_fp_ tables/files without the MyISAM issues. z_fp_ tables are mostly read and will be shuffled in the table cache as required.
- Enabled feature calculation on all anomaly checks in ionosphere
- Moved features_profile.py to the skyline dir some can be used by the webapp and ionosphere
- Decouple ionosphere from the webapp
- In ionosphere calculate the features for every anomaly check file, removing the extract features calculation and wait time from the user.
- Corrected assert in test.
- Added some notes
- Use ast.literal_eval instead of eval
- Added tsfresh feature names with ids in list as of tsfresh-0.3.0
- Deleted skyline/tsfresh/scripts as is now moved to skyline/tsfresh_features/scripts to stop tsfresh namespace pollution
- Moved the tsfresh test resources into tests/baseline to match tsfresh tests methodology.
- Added note to ionosphere.py about numpy.testing.assert_almost_equal
- Reordered TSFRESH_FEATURES based on the new method.
- Adding the meat to Ionosphere bones on the service side, incomplete.
- Updated the upgrading doc
- Added development tsfresh docs on howto upgrade, etc
- Added the creation of tsfresh resources to building-documentation.rst
- Updated development/ionosphere.rst to reflect tsfresh is not slow
- Draft text for ionosphere.rst
- Added Ionosphere reference to panorama.rst
- Updated path change in tsfresh.rst
- Modifications for testing
- Added Ionosphere requirements for tsfresh and SQLAlchemy
- Added a tsfresh tests
- Added a tsfresh-0.3.0 baseline features upon which tsfresh is tested
- Added a tsfresh-0.1.2 baseline features upon which tsfresh is tested
- Added a baseline timeseries upon which tsfresh is tested
- Added IONOSPHERE_PROFILES_FOLDER test
- Added tsfresh feature names with ids in list as of tsfresh-0.3.0
- Some minor refactors in skyline_functions.py
- Added RepresentsInt
- Added the ionosphere tables
- Moved the Ionosphere settings all to the end in settings.py to make it easier to upgrade
- Added some additional Ionosphere settings
- Updated to tsfresh-0.3.0
- Added the SQLAlchemy definitions
- Added app.errorhandler(500) to webapp.py traceback rendered nicely by Jinja2 as per https://gist.github.com/probonopd/8616a8ff05c8a75e4601
- Added the ionosphere app.route to webapp.py
- Minor addition to backend.py log and panorama_anomaly_id request arg
- Added Ionosphere to the html templates
- Added webapp training_data.html
- Added webapp traceback.html which provides the user with real traceback
- Added webapp features_profiles.html
- Added webapp ionosphere_backend.py which works with training data and features profiles data
- Added webapp functions to create features profiles
- Moved skyline/tsfresh to skyline/tsfresh_features so that it does not pollute the tsfresh namespace
- Added generate_tsfresh_features script
- Added missing string in panorama.py in the logger.error context
- mirage.py refactored quite a bit of the conditional blocks workflow to cater for Ionosphere
- Added metric_timestamp to the trigger_alert metric alert tuple and added sent_to_ metrics (analyzer.py and mirage.py) - metric[2]: anomaly timestamp
- Added Ionosphere training data timeseries json, redis plot png and Ionosphere training data link to the Analyzer and Mirage alerters.
- Mirage alerter creates a Redis timeseries json too - tbd allow user to build features profile on either full Mirage timeseries or on the Redis FULL_DURATION timeseries.
- analyzer.py use skyline_functions.send_anomalous_metric_to (self function removed) and some new Ionosphere refactoring
- Modified bin scripts to not match the app name in the virtualenv path should the path happen to have the app name string in the path
- Corrected webapp bin service string match
- Bifurcate os.chmod mode for Python 2 and 3
- Fixes https://github.com/earthgecko/skyline/issues/27 - return False if stdDev is 0
- Also readded IONOSPHERE_CHECK_MAX_AGE from settings.py as it will be required
- Mirage changes include a change to panorama style skyline_functions load_metric_vars and fail_check
- Handle Panorama stampede on restart after not running #26 Added to settings and Panorama to allow to discard any checks older than PANORAMA_CHECK_MAX_AGE to prevent a stampede if desired, not ideal but solves the stampede problem for now - https://github.com/earthgecko/skyline/issues/26
- Added the original Skyline UI back as a then tab, for nostalgic and historical reasons.
- Bumped to version 1.0.8
1.1.0 - the ionosphere branch¶
v1.1.0-ionosphere-alpha
- Added alpha Ionosphere functionality using @blue-yonder / tsfresh feature extraction see docs/development/ionosphere.rst
- This also added a fairly generic script to extract_features from a &format=csv Graphite @graphite-project / graphite-web rendered csv for a single timeseries skyline/tsfresh/scripts/tsfresh_graphite_csv.py
1.0.8 - the crucible branch¶
- Mirage changes include a change to panorama style skyline_functions load_metric_vars and fail_check
- Handle Panorama stampede on restart after not running #26 Added to settings and Panorama to allow to discard any checks older than PANORAMA_CHECK_MAX_AGE to prevent a stampede if desired, not ideal but solves the stampede problem for now - https://github.com/earthgecko/skyline/issues/26
- Added the original Skyline UI back as a then tab, for nostalgic and historical reasons.
- Bumped to version 1.0.8
Added: skyline/webapp/static/css/skyline.css skyline/webapp/templates/then.html
Modified: docs/panorama.rst skyline/mirage/mirage.py skyline/panorama/panorama.py skyline/settings.py skyline/webapp/webapp.py skyline/webapp/templates/layout.html skyline/tests/test_imports.py skyline/skyline_version.py
1.0.7 - the crucible branch¶
Issue #23 Test dependency updates - https://github.com/earthgecko/skyline/issues/23
An update to dependencies and a minor addition to matplotlib implementation in alerts.
All current requirements.txt versions verified as of 20160820
Changes include:
- Updated the following dependencies: scipy==0.17.1 to scipy==0.18.0 pytz==2016.4 to pytz==2016.6.1 pyparsing==2.1.5 to pyparsing==2.1.8 matplotlib==1.5.1 to matplotlib==1.5.2 msgpack-python==0.4.7 to msgpack-python==0.4.8 requests==2.10.0 to requests==2.11.1
- In alerters matplotlib==1.5.2 requires an additional configuration defined of matplotlib.use(‘Agg’) for the generation of a plot in a non windowed environment
- Bumped to 1.0.7
Modified: requirements.txt skyline/analyzer/alerters.py skyline/mirage/mirage_alerters.py
1.0.6 - the crucible branch¶
A very quick release to fix the immediately preceding release, which was missing a settings.py variable related to ENABLE_WEBAPP_DEBUG in v1.0.5-crucible-beta. The variable had been included in config management but not the codebase, hence it tested fine.
Changes include:
- The addition of a missing settings.py variable in v1.0.5-crucible-beta ENABLE_WEBAPP_DEBUG
- Added a check to validate_settings.py for ENABLE_WEBAPP_DEBUG
- Minor modification to webapp.py and backend.py to handle it not being defined gracefully
- Fixed a minor bug in the webapp/backend.py to not 500 if no metric_like row ids are found in the MySQL result
- Bumped version to 1.0.6
1.0.5 - the crucible branch¶
Fixes two Analyzer memory leaks and improves the overall memory footprint a bit.
Changes include:
- Issue #21 Memory leak in Analyzer - self.mirage_metrics.append(metric) and plt.savefig memory leaks were fixed. Removed unused but appended mirage list. Moved trigger_alert to a multiprocessing process when the alert is smtp. This is used by smtp alerters so that matplotlib objects are cleared down and the alerter cannot create a memory leak in this manner as plt.savefig keeps the object in memory until the process terminates. Seeing as data is being surfaced and processed in the alert_smtp context, multiprocessing the alert creation and handling prevents any memory leaks in the parent. This fixes Issue #21 Memory leak in Analyzer - https://github.com/earthgecko/skyline/issues/21 internal ref #1558: Memory leak in Analyzer
- In agent del the algorithm test objects to free the memory
- Applied in both Analyzer and Mirage
- Some pyflakes linting done
- Reintroduced the original alert substring matching AFTER wildcard matching, to allow more flexibility
- In Analyzer streamlined the Mirage metrics a bit
- Added some settings validation in the agents
- Issue #22 - Analyzer also alerting on Mirage metrics now This fixes #22 by introducing a mirage.metrics Redis key namespace for any Mirage metric. This acts as a dynamic SKIP_LIST for Analyzer so to speak and allows for wildcard and substring matching and prevents Analyzer alerting on Mirage metrics.
- Added determine_array_median - readded the method to determine_median of an array which was changed to a timeseries from an array in https://github.com/earthgecko/skyline/commit/9dcf8ffbf6da0820ec5d2f93d3d7079abed3f5a7 erroneously, as it was assumed to not be being used, however it was being used by algorithm timings
- Some pyflakes linting done with unused imports and unused variables
- Added validation of settings with validate_settings.py, although this does not cover all settings, it tests the critical ones per app that have no except handling - in the agent.py
- Some pyflakes linting done with unused imports and unused variables
- Added del of test objects to reduced memory to all agents that test algorithms
- Added validate_settings to webapp
- Added default recipient which acts as a catchall for alert tuples that do not have a matching namespace defined in recipients in SMTP_OPTS and BOUNDARY_SMTP_OPTS
- Added analyzer_debug to crucible branch, the analyzer_debug app which has a number of memory debugging tools and options that can be turned on and off, etc
- Bumped version to 1.0.5
1.0.4 - the crucible branch¶
Some documentation updates and setup.py things
Changes include:
- Corrected boundary anomaly_seen log info context
- In the skyline_functions get_graphite_metric split the fetching of the Graphite json into 2 try: blocks - the actual Graphite request and the reading of datapoints from the json - and added the graphite_json_fetched variable to test the condition
- Padded out skyline_functions docstrings with type definitions for each param
- Escape : ( and ) in metric name to Graphite URI for Unescaped Graphite target https://github.com/earthgecko/skyline/issues/20 in mirage and skyline_functions
- Removed old sys.path requirements for the old import settings method.
- Added some notes to the development doc regarding ongoing refactoring work
- Added validation on all Panorama GET parameters to mitigate as much XSS and SQL injection as I can at the moment, arachni is happier now.
- Sanitize request.args
- Added missing settings. to CRUCIBLE_PROCESSES thanks @blak3r2, this stops it doing nothing. This branch really should have been called panorama, but it started last year as crucible, so crucible was not fully tested in the new structure, apologies.
- Misc docs changes
- Adding additional exception handling to Analyzer - Issue #19 - task1544 https://github.com/earthgecko/skyline/issues/19
- This is a start but not complete, other issues took precedence and these are the changes to date.
- Bumped version to 1.0.4
1.0.3 - the crucible branch¶
Some documentation updates and setup.py things
Changes include:
- Added epel-release to the RedHat family install as python-pip is not in the main repos, and added gcc
- Added wget and tar in case someone is using a minimal OS install
- Added build-essentials as Ubuntu-14.04.4 has to have build-essentials first or zlib, et al are not available even if zlib1g-dev et al are installed
- Added libreadline6-dev and readline-devel and ncurses
- Added basic overview of the Webapp uml and png to docs as suggested by @blak3r2
- Wrapped MySQL debug logging in the proper if ENABLE_DEBUG conditionals as these were being generally logged for Panorama patterning
- Added setup.py and setup.cfg for pypi
- Minor docs changes and additions
- Added Known bugs divs to now and Panorama views in the Webapp
- Bumped up to version 1.0.3
1.0.2 - the crucible branch¶
This update fixes a few bugs in the Webapp Panorama and dygraphs implementation related to rendering the graphs in the applicable time zone, whether that be rendering using the browser’s time zone or fixing graphs to render using a fixed and specified time zone.
d3js and/or metricsgraphics.js was considered to fix this maybe, however the moment.js elegant hack suffices for now.
Changes include:
- Using Eddified’s elegant hack to overwrite Dygraph’s DateAccessorsUTC to return values according to the currently selected timezone as per: http://stackoverflow.com/a/24196184
- Add the WEBAPP timezone related variables to settings.py
- Added moment.js/2.14.1 and moment-timezone/0.5.5 to be pulled from cdnjs.cloudflare.com
- Fixed type error in a skyline_functions.py except
- Bumped up to version 1.0.2
1.0.1 - the crucible branch¶
This is a minor update to the crucible branch which fixes a few bugs and some documentation corrections and additions. It also adds a Redis data graph to the Analyzer alerter, see Analyzer SMTP alert graphs
Changes include:
- blak3r2 patch-5 https://github.com/earthgecko/skyline/pull/12 related to use skyline;
- Fixed a determine_median error, related to an error that crept in during testing; although it is not currently used, it was included incorrectly as in patterning the value range of the timeseries had already been panda.Series'ed into an array.
- Corrected path as per blak3r2 blak3r2:patch-3 as per https://github.com/earthgecko/skyline/pull/11/files
- Added matplotlib Redis data graph to alerters - this adds a matplotlib plot image to the smtp alert with additional details showing the 3sigma upper (and if applicable lower) bounds and the mean.
- Added settings.PLOT_REDIS_DATA to settings.py
- Added a note about expected pip installation times
- Updated Python version references in docs from 2.7.11 to 2.7.12
- Added docs/skyline.simplified.workflow.uml and docs/skyline.uml for a workflow diagram with some annotations - things clustered by role
- Bumped up to version 1.0.1
1.0.0 - the crucible branch¶
Including sphinx, setuptools and panorama branches
This release contains a fair amount of changes and additions:
- The addition of the crucible module into Skyline
- The addition of sphinx-apidoc documentation
- A new package layout to accommodate python-setuptools and sphinx-apidoc
- Docs, Panorama and rebrow added to the Webapp
These changes are backwards compatible insomuch as an existing Skyline directory and file structure could be replaced by this release and, with a valid settings.py deployed (whether in the old format or the new format), the services for which the settings.py is configured will start. In theory. However there is so much change in this branch it probably cannot be stated with 100% certainty that something may not work as expected. Therefore, to be safe and transparent, let’s say it is backwards incompatible. See Whats new for a full rundown since the last Etsy commit.
Crucible¶
The release adds the integration of crucible to allow for the storing of anomalous timeseries and related metadata, with the goal of being able to produce anomaly analysis resources on the fly and add ad-hoc timeseries for crucible analysis.
See Crucible
sphinx documentation¶
The code has been documented with sphinx-apidoc, which was dependent on changes to the package structure, see below.
The docs are served via the webapp at http://:/static/docs or from your local repo in your browser at file:///docs/_build/html/index.html
setuptools¶
A package restructure was undertaken to make skyline a more traditional python package. Many of these changes were based on merging some of the @languitar changes for setuptools and a more pythonic structure that were submitted as:
Provide a setuptools-based build infrastructure #93 - etsy/#91
Webapp¶
The docs are now served by the Webapp and a menu referencing the different available endpoints in Webapp was added.
- Basic security restrictions added.
- gunicorn
- New Panorama endpoint
- New rebrow endpoint
Performance tuning¶
Analyzer optimizations¶
algorithm_breakdown metrics¶
Known issues in this release¶
Currently sphinx-apidoc is being used to document the code, however to get working docs, a small modification is required to the files outputted by sphinx-apidoc.
This is related to the package restructure undertaken to make the Skyline code and layout more in line with a normal python package. Although this seems to have been achieved, the small hack to the sphinx-apidoc output suggests that this is not 100% correct. Further evidence of this is in terms of importing from settings.py, the path needs to be appended in the code, which really should not be required. However, it is working and in the future this should be figured out and fixed.
What’s new¶
See Release Notes¶
For v.1.1.1 and later see Release Notes
Skyline v1.1.0-beta the ionosphere branch¶
See Ionosphere <ionosphere.html> and Development - Ionosphere <development/ionosphere.html>
Skyline v1.0.4-beta - v1.0.8-beta the crucible branch¶
Misc changes and bug fixes.
Skyline v1.0.3-beta - the crucible branch¶
Some documentation updates and setup.py things
Skyline v1.0.2-beta - the crucible branch¶
Custom time zone settings for the rendering of Webapp Panorama dygraph graphs see Webapp - Time zones
Skyline v1.0.1-beta - the crucible branch¶
Analyzer alerts with a graph plotted from Redis data, not just the Graphite graph see Analyzer SMTP alert graphs
Skyline v1.0.0-beta - the crucible branch¶
The crucible branch had an issue open called Bug #982: Too much in crucible branch
Too much in crucible branch
I have added some pep20, sphinx docs and python package restructuring from @languitar etsy/skyline #93 - https://github.com/languitar/skyline/tree/setuptools - which turns skyline into a python package
The reality was that it was too difficult to reverse engineer all the changes into separate branches, so it continued unabated...
This version of Skyline sees enough changes to be worthy of a major version change. That said the changes are/should be backwards compatible with older versions, mostly (best efforts applied), with a caveat on the skyline graphite metrics namespace and running Skyline under python virtualenv.
Conceptual changes¶
The FULL_DURATION concept is a variable and it is a variable in more ways than the settings.py context. Conceptually now FULL_DURATION, full_duration, full_duration_seconds and full_duration_in_hours are variables in different contexts. The FULL_DURATION concept was important in the Analyzer context, but the concept of the full duration of a timeseries has become somewhat more variable and is different within different scopes or apps within Skyline. It is no longer really a single static variable, it is handled quite dynamically in a number of contexts now.
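As a purely illustrative sketch (the values below are assumptions, not Skyline defaults), the same concept now surfaces in several guises:
# Illustrative only - how the full duration concept varies by context
FULL_DURATION = 86400              # settings.py - the Redis window Analyzer analyses, in seconds
full_duration = FULL_DURATION      # a per-check variable, in seconds, passed between apps
full_duration_in_hours = 168       # e.g. a Mirage resolution for one specific metric
full_duration_seconds = full_duration_in_hours * 3600  # what Mirage would actually surface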
Etsy to time()¶
This what’s new will cover all the new things that the crucible branch introduces since the last Etsy commit on master of etsy/skyline. Although not strictly accurate, for the purposes of generality it shall be assumed that no one is running the new Skyline features that have not been merged to the Etsy Skyline.
anomaly_breakdown metrics¶
sphinx documentation¶
Performance tuning¶
Process management¶
A number of the apps have had better process management handling added and the parent process now spawns processes and terminates them if they have not completed within MAX_ANALYZER_PROCESS_RUNTIME, ROOMBA_TIMEOUT, etc. Other apps have this specified too, either via their own setting or by using MAX_ANALYZER_PROCESS_RUNTIME as a hardcoded value where appropriate. This handles a very limited number of edge cases where something that is host machine related causes the Python process to hang.
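A minimal sketch of the pattern (not the actual app code), assuming a spin_process style worker and a maximum runtime taken from settings:
from multiprocessing import Process
from time import sleep, time

def spin_process(assigned_metrics):
    # ... analyse the assigned metrics ...
    pass

MAX_RUNTIME = 180  # e.g. settings.MAX_ANALYZER_PROCESS_RUNTIME

pids = []
for i in range(2):
    p = Process(target=spin_process, args=([],))
    pids.append(p)
    p.start()

# Poll rather than blocking on join() indefinitely and terminate any
# process still running after the allowed runtime
started = time()
while time() - started < MAX_RUNTIME and any(p.is_alive() for p in pids):
    sleep(1)
for p in pids:
    if p.is_alive():
        p.terminate()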
Analyzer optimisations¶
algorithm_breakdown metrics¶
Webapp¶
Some simple and basic security was added to the Webapp now that it can be enabled to access a MySQL database in the Panorama context.
- Only allow IP addresses in WEBAPP_ALLOWED_IPS
- There is now a single HTTP auth user, WEBAPP_AUTH_USER and WEBAPP_AUTH_USER_PASSWORD (illustrative values are shown below)
- The Webapp can now be served via gunicorn and Apache (or any other HTTP reverse proxy).
See Webapp
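For illustration only (these values are placeholders, not defaults), the relevant settings.py variables look something like:
WEBAPP_ALLOWED_IPS = ['127.0.0.1', '10.0.0.10']
WEBAPP_AUTH_USER = 'admin'
WEBAPP_AUTH_USER_PASSWORD = 'set-a-strong-password-here'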
Development¶
Building documentation¶
Currently Sphinx and sphinx-apidoc are being used to document the code and project, however to get working docs, a small modification is required to the files outputted by sphinx-apidoc.
This is related to the package restructure undertaken to make the Skyline code and layout more in line with a normal python package. Although this seems to have been achieved, the small hack to the sphinx-apidoc output suggests that this is not 100% correct. Further evidence of this is in terms of importing from settings.py, the path needs to be appended in the code, which really should not be required. However, it is working and in the future this should be figured out and fixed. Perhaps the below edit of the auto generated .rst files could be achieved with a sphinx conf.py setting, if anyone knows please do let us know :)
Part of the build¶
The below documentation script not only builds the documentation, it also auto generates some documentation and some Python code when updates are required, such as the compilation of tsfresh_features.py
For now...
Install docs-requirements.txt¶
In your Python virtualenv first pip install the required modules in docs-requirements.txt (as per documented Running in Python virtualenv)
PYTHON_MAJOR_VERSION="2.7"
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
PROJECT="skyline-py2712"
cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}"
source bin/activate
bin/"pip${PYTHON_MAJOR_VERSION}" install -r /opt/skyline/github/skyline/docs-requirements.txt
deactivate
Your Python interpreter¶
The docs/conf.py self interpolates the Python path if you are running in a virtualenv in the documented manner. If you are not, you may need to change the following to your python interpreter site-packages path by setting python_site_packages_path in docs/conf.py, e.g.
# sys.path.insert(0, os.path.abspath('/opt/python_virtualenv/projects/skyline-py2712/lib/python2.7/site-packages'))
sys.path.insert(0, os.path.abspath(python_site_packages_path))
.rst and .md wtf?¶
The documentation is written in both .md and .rst format, because it can be thanks to the awesomeness of Sphinx. The original Skyline documentation was written in .md for github. The documentation is being ported over to .rst to allow for the full functionality of Sphinx documentation.
Build¶
PYTHON_MAJOR_VERSION="2.7"
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
PROJECT="skyline-py2712"
cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}"
source bin/activate
function build_docs() {
# Arguments:
# APP_DIR - path to your Skyline dir, e.g.
#           build_docs ~/github/earthgecko/skyline/develop/skyline
# pyflakes - run pyflakes on all .py files if passed as the second argument
if [ -n "$1" ]; then
APPDIR=$1
fi
if [ -z "$APPDIR" ]; then
echo "error: could not determine APPDIR - $APPDIR"
return 1
fi
if [ ! -d "$APPDIR/docs" ]; then
echo "error: directory not found - $APPDIR/docs"
return 1
fi
# @added 20161119 - Branch #922: ionosphere
# Task #1718: review.tsfresh
# Build the pytz.rst page to generate the pytz timezone list for Skyline
# Ionosphere and tsfresh, creates "$APPDIR/docs/development/pytz.rst"
python${PYTHON_MAJOR_VERSION} "$APPDIR/skyline/tsfresh_features/scripts/make-pytz.all_timezones-rst.py"
# Run tests
ORIGINAL_DIR=$(pwd)
cd "$APPDIR"
python${PYTHON_MAJOR_VERSION} -m pytest tests/
if [ $? -ne 0 ]; then
echo "Tests failed not building documentation"
return 1
fi
# @added 20170308 - Task #1966: Add pyflakes tests to build_docs
# Feature #1960: ionosphere_layers
if [ -n "$2" ]; then
find "$APPDIR" -type f -name "*.py" | while read i_file
do
pyflakes "$i_file"
done
fi
# @added 20170913 - Task #2160: Test skyline with bandit
# For static analysis - https://github.com/openstack/bandit
bandit -r "$APPDIR" -x "${APPDIR}/skyline/settings.py"
cd "$APPDIR/docs"
echo "Building Skyline documentation - in $APPDIR/docs"
sphinx-apidoc --force -o "${APPDIR}/docs" "${APPDIR}/skyline" skyline
# Inline edit all apidoc generated .rst files in docs/skyline.*rst
for i in $(find "${APPDIR}/docs" -type f -name "skyline.*rst")
do
cat "$i" > "${i}.org"
cat "${i}.org" | sed -e '/package/!s/automodule:: skyline\./automodule:: /g' > "$i"
rm -f "${i}.org"
done
cd "$APPDIR/docs"
make clean
rm -rf _build/*
make html
for i in $(find "$APPDIR" -type f -name "*.pyc")
do
rm -f "$i"
done
for i in $(find "$APPDIR" -type d -name "__pycache__")
do
rm -rf "$i"
done
cd "$ORIGINAL_DIR"
}
# Usage: build_docs <app_dir>
# e.g.
# cd /opt/python_virtualenv/projects/skyline-ionosphere-py2712/
# build_docs /home/gary/sandbox/of/github/earthgecko/skyline/ionosphere/skyline
Auto generating .rst files¶
This may be a little unconventional but it probably beats trying to do it via Sphinx custom extensions, without using generates or includes or Jinja templating, which may or may not work with readthedocs.
The script skyline/tsfresh_features/scripts/make-pytz.all_timezones-rst.py introduces a novel way to automatically generate the docs/development/pytz.rst during the local build process to provide a list of all pytz timezones at the current version.
This pattern could be reused fairly easily.
Building workflow diagrams with UML¶
This can be quite handy for making simple diagrams, if a little finicky. The PlantUML.com server is a good resource for making workflow diagrams without having to create and edit SVGs.
The docs/skyline.simplified.workflow.uml rendered by the PlantUML server: Simplified Skyline workflow with PlantUML server
The UML source to the above image is:
@startuml
title <font color=#6698FF>Sky</font><font color=#dd3023>line</font> Webapp - basic overview
actor You << Human >>
node "Graphite"
node "Redis"
node "MySQL"
node "webapp" {
package "now" as now
package "Panorama" as Panorama
package "rebrow" as rebrow
now <.. Redis : timeseries data
Panorama <.. MySQL : anomaly details
Panorama <.. Graphite : timeseries data for anomaly
rebrow <.. Redis : keys
}
You <.. webapp : View UI via browser
right footer \nSource https://github.com/earthgecko/skyline/tree/v1.1.0-beta-ionosphere/docs/building-documentation.html\nGenerated by http://plantuml.com/plantuml
@enduml
Development - Webapp¶
Flask¶
The Skyline Webapp has arguably grown to the point where Flask may no longer necessarily be the best choice for the frontend UI any more, for a number of reasons, such as:
- The frontend UI functionality is going to grow, with the addition of other things requiring more visualizations.
- A high-level Python Web framework like Django may be more appropriate in the long run.
The reasons for sticking with Flask at this point are:
- Because Flask is pretty cool.
- It is a microframework not a full blown web framework.
- It is probably simpler.
- Therefore, it keeps more focus on the “doing stuff” with Skyline and Python side of the equation for now, rather than on writing a new web UI with Django and porting the current stuff to Django.
- A fair bit of time has been spent adding new things with Flask.
- With gunicorn, Flask can handle serving the Webapp.
- For now it is more than good enough.
- web development, one drop at a time
Development - Debugging¶
A large number of tools and techniques have been used to get to grips with the Skyline code base, its debugging, profiling, performance tuning, memory leaks and so forth. These tools and techniques give “some” insight into the number of objects, calls, reference counts, etc - a myriad maze of Python’s intestinal details, where it is possible to get lost and tangled up fairly quickly. Python being such a vast and mature ecosystem, the choices are vast. The following is for future reference on what works, where, how and how well. Due to the number of objects and calls that Skyline makes in a run, some of these tools perform better than others. It must be noted that there may be some tools that are very good, but do not necessarily give output that is useful in terms of making sense of it or knowing where to find the needle in the haystack - that is, if you know it is a needle you are looking for and not a nail.
Code profiling¶
The following is a list and notes on some of the tools used, with examples if useful for future reference.
TODO - doc using vizsnake, cProfile, et al
Memory debugging¶
As per http://www.bdnyc.org/2013/03/memory-leaks-in-python/ and http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks
There are only a few reasons for memory leaks in Python.
- There’s some object you are opening (repeatedly) but not closing - plt.savefig
- You are repeatedly adding data to a list/dict and not removing the old data (for instance, if you’re appending data to an array instead of replacing the array) - self.mirage_metrics.append(metric)
- There’s a memory leak in one of the libraries you’re calling. (unlikely)
This is True, the hard part is finding 1 and 2 if you do not know what you are looking for or how to find it.
analyzer_debug¶
See examples of the implementation of the below in analyzer_debug/analyzer.py at commit https://github.com/earthgecko/skyline/commit/c637ae1bf43126459fe82e21c280d457cafb88aa
multiprocessing¶
A note on multiprocessing. multiprocessing has the advantage of a parent process not inheriting any memory leaks from a spawned process. When the process ends or is terminated by the parent, all memory is returned and there is no opportunity for the parent to inherit any objects from the spawned process other than what is returned by the process itself. This should probably be caveated with there may be a possibility that traceback objects may leak under certain circumstances.
The overall cost in terms of the memory consumed for the spawned process may be fairly high, currently in Analyzer around 80 MB, but it is known, whereas memory leaks are very undesirable.
A move has been made always use multiprocessing spawned processes to deal with any function or operation that involves surfacing and analyzing or processing any timeseries data, so that when done, there are no possibilities of incurring any memory leaks in the parent, e.g. triggering an alert_smtp is now a multiprocessing process, yes 80 MB of memory to send an email :)
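A minimal sketch of that pattern (not the actual alerter code), assuming an alert_smtp function that surfaces the Redis data, plots and emails:
from multiprocessing import Process

def alert_smtp(alert, metric):
    # surface the Redis data, plot with matplotlib and send the email
    pass

def trigger_alert(alert, metric):
    # alert is assumed to be a (namespace, alerter, expiration) style tuple
    if alert[1] == 'smtp':
        # run the matplotlib/smtp heavy work in a short lived child process so
        # any memory it allocates is returned to the OS when the process exits
        p = Process(target=alert_smtp, args=(alert, metric))
        p.start()
        p.join(60)  # illustrative timeout
        if p.is_alive():
            p.terminate()
    else:
        # other alerters are lightweight and can run in the parent as before
        pass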
resource¶
To be honest, in terms of identifying at which point memory leaks were occurring after trying all of the below listed tools, the one which ended up pinpointing them was literally stepping through the code operation by operation and wrapping each operation in analyzer.py and alerters.py in:
import resource
logger.info('debug :: Memory usage before blah: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
# do blah
logger.info('debug :: Memory usage after blah: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
This was done iteratively at selected points where memory leaking was suspected to be occurring. This enabled identifying exactly what the before and after memory usage was when functions were executed, and where after the process completed the memory usage had incremented up and the increment remained.
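A small helper along the same lines could reduce the repetition; this is an assumption for illustration, not part of Skyline:
import resource
from contextlib import contextmanager

@contextmanager
def log_rss(logger, label):
    # log the max RSS (kb on Linux) before and after the wrapped block
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    yield
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logger.info('debug :: %s :: ru_maxrss before: %s kb, after: %s kb, delta: %s kb' % (
        label, before, after, after - before))

# usage:
# with log_rss(logger, 'alert_smtp plot'):
#     do_the_suspect_operation()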
Days were spent trying to glean this exact information with all the below mentioned tools and techniques, but in terms of debugging and understanding Analyzer’s memory usage and allocation none provided exact enough actionable information. In all likelihood this was a result of a PEBKAC type of problem and possibly not wielding the tools to their full potential. That said, they definitely do provide some insight in other ways.
pmap¶
Some simple use of pmap can highlight a memory leak if there are a large number of anon allocations and also gives a good overview of what the process is using in terms of memory and by what.
From the command line query the app parent pid to determine all the allocations using more than 1000 kb of memory e.g.
pmap 4306 | sed 's/\]//' | awk '{print $2 " " $NF}' | sed 's/K / /' | sort -n | awk '$1 > 1000'
The output is similar to
1012 /opt/python_virtualenv/projects/skyline2711/lib/python2.7/site-packages/pandas/lib.so
1392 /opt/python_virtualenv/projects/skyline2711/lib/python2.7/site-packages/scipy/special/_ufuncs.so
1504 /opt/python_virtualenv/projects/skyline2711/lib/python2.7/site-packages/pandas/tslib.so
1516 /opt/python_virtualenv/projects/skyline2711/bin/python2.7
...
...
35920 /opt/python_virtualenv/projects/skyline2711/lib/python2.7/site-packages/numpy/.libs/libopenblasp-r0-39a31c03.2.18.so
63524 anon
96844 /usr/lib/locale/locale-archive
798720 798720K
If you pmap a spawned process rather than the parent process fairly variable results are expected.
In terms of catching python memory leaks, you can grep out the anon allocations at intervals between app runs
pmap 4306 | sed 's/\]//' | awk '{print $2 " " $NF}' | sed 's/K / /' | sort -n | awk '$1 > 1000' | grep anon | tee /tmp/pmap.4306.anon.1
# Some minutes later
pmap 4306 | sed 's/\]//' | awk '{print $2 " " $NF}' | sed 's/K / /' | sort -n | awk '$1 > 1000' | grep anon | tee /tmp/pmap.4306.anon.2
diff /tmp/pmap.4306.anon.1 /tmp/pmap.4306.anon.2
An anon allocation with incrementing memory usage is a likely candidate. More and more added anon objects is likely something/s with a reference cycle or some object not closed, e.g. a matplotlib savefig which does not have fig.clf() and plt.close() on it (although that did not help).
gc¶
Forcing Python garbage collection can be useful in terms of determining object types and reference counts
from collections import defaultdict
from gc import get_objects
# Debug with garbage collection - http://code.activestate.com/recipes/65333/
import gc
#
# Before something
before = defaultdict(int)
after = defaultdict(int)
for i in get_objects():
    before[type(i)] += 1
# Something or lots of things
for i in get_objects():
    after[type(i)] += 1
# Count any new objects by type
gc_result = [(k, after[k] - before[k]) for k in after if after[k] - before[k]]
for i in gc_result:
    logger.info('debug :: %s' % str(i))
In the relevant log there will be a ton of output similar to this
tail -n 1000 /var/log/skyline/analyzer.log | grep " (<" | grep "14:21:4"
2016-08-06 14:21:44 :: 1349 :: (<class '_ast.Eq'>, 36)
2016-08-06 14:21:44 :: 1349 :: (<class '_ast.AugLoad'>, 36)
2016-08-06 14:21:44 :: 1349 :: (<class 'six.Module_six_moves_urllib'>, 36)
2016-08-06 14:21:44 :: 1349 :: (<class 'scipy.stats._continuous_distns.lognorm_gen'>, 36)
2016-08-06 14:21:44 :: 1349 :: (<class '_weakrefset.WeakSet'>, 6480)
2016-08-06 14:21:44 :: 1349 :: (<class 'scipy.stats._continuous_distns.halfnorm_gen'>, 36)
2016-08-06 14:21:44 :: 1349 :: (<class 'matplotlib.markers.MarkerStyle'>, 4318)
...
...
Forcing gc.collect() every run did seem to ease a memory leak initially, but it did not solve it and gc.collect() over the period of 12 hours took increasingly long to run. Preferably gc should only ever be used in Skyline for debugging and development.
pyflakes¶
pyflakes is useful for finding imported and defined things that are not used and do not need to be imported, every little helps.
(skyline2711) earthgecko@localhost:/opt/python_virtualenv/projects/skyline2711$ bin/python2.7 -m flake8 --ignore=E501 /home/earthgecko/github/skyline/develop/skyline/skyline/analyzer/analyzer.py
/home/earthgecko/github/skyline/develop/skyline/skyline/analyzer/analyzer.py:11:1: F401 'msgpack.unpackb' imported but unused
/home/earthgecko/github/skyline/develop/skyline/skyline/analyzer/analyzer.py:13:1: F401 'os.system' imported but unused
/home/earthgecko/github/skyline/develop/skyline/skyline/analyzer/analyzer.py:17:1: F401 'socket' imported but unused
/home/earthgecko/github/skyline/develop/skyline/skyline/analyzer/analyzer.py:22:1: F401 'sys' imported but unused
/home/earthgecko/github/skyline/develop/skyline/skyline/analyzer/analyzer.py:28:1: F403 'from algorithm_exceptions import *' used; unable to detect undefined names
/home/earthgecko/github/skyline/develop/skyline/skyline/analyzer/analyzer.py:54:13: F401 'mem_top.mem_top' imported but unused
pympler¶
Does not work with Pandas and 2.7
mem_top¶
Some info but moved on to other tools - see analyzer_debug app
pytracker¶
pytracker - did not get too far with pytracker, could only get it running in agent.py and could not get any Trackable outputs in analyzer.py, all None?
See analyzer_debug app.
objgraph¶
Some useful info, see analyzer_debug app.
guppy and heapy¶
heapy or hpy can give some useful insight similar to resource.
Development - Ionosphere¶
What Ionosphere is for¶
I want to monitor metrics on small VPSs that do not do a great deal, meaning there is no high rate, constant work or 3sigma consistency in the metrics. There are 7 apache.sending a day, not 7000 a minute.
Or there are peaks of 3 on the metric stats_counts.statsd.bad_lines_seen

This is not anomalous.
Ionosphere’s goal is to allow us to train Skyline on what is not anomalous.
Almost a year in review¶
For a large part of 2016 a method of implementing the Ionosphere concept has been sought after.
Ramblings - the development of Ionosphere¶
The original idea behind Ionosphere was...
Ionosphere is a user feedback/input to Skyline that allows for the creation of timeseries specific “algorithms”.
Skyline Mirage (and Analyzer) work well on fairly constant rate, high range metrics, however on low rate, low range metrics like machine stats metrics they tend to be less effective, and these types of timeseries make Mirage chatty even when set to a very high resolution, e.g. 730 hours.
Ionosphere enables the user to create custom algorithms for specific timeseries. While defining all the custom “rules” to match different types of timeseries and metrics, much of the recent effort in anomaly detection has been dedicated to the creation of automated, algorithmic and machine learning processes doing everything with minimal user input, other than some configuration.
However, in truth anomaly detection is not necessarily equal among everything and the pure computational anomaly detection system is not there yet. There is still an aspect of art in the concept.
Ionosphere adds a missing piece to the Skyline anomaly detection stack; in our attempts to computationally handle the anomaly detection process we have removed or neglected a very important input, allowing the human to fly the craft.
Requirements¶
Ionosphere is dependent on the panorama branch (Branch #870: panorama)
20160719 update
We have Panorama now.
Allow us an input to tell Skyline that an alert was not anomalous after it had been reviewed in context, probably many, many times. Certain timeseries patterns are normal, but they are not normal in terms of 3sigma and never will be. However, they are not anomalous in their context.
It would be great if Skyline could learn these.
The initial idea was to attempt to start introducing modern machine learning models and algorithms into the mix, to essentially learn machine learning, the models, algorithms and methods. This, it appears, is easier said than done; machine learning with timeseries is not simple or straightforward. In fact, timeseries seem to be one of the things that machine learning is not really good at yet.
A seemingly obvious method would be to consider trying to determine similarities between 2 timeseries, once again easier said than done.
Determining timeseries similarities - 20160719¶
Researching the computation of timeseries similarities in terms of both machine learning and statistical means, it appears that there is a fair bit of stuff in R that handles timeseries quite well.
- bsts: Bayesian Structural Time Series - Time series regression using dynamic linear models fit using MCMC (Google) - https://cran.r-project.org/web/packages/bsts/index.html
- The R package PDC provides complexity-based dissimilarity calculation and clustering, and also provides p values for a null hypothesis of identical underlying generating permutation distributions. The R package TSclust was recently updated and provides (among PDC) a number of approaches to time series dissimilarities. (https://www.researchgate.net/post/How_can_I_perform_time_series_data_similarity_measures_and_get_a_significance_level_p-value)
And python maybe RMS - Erol Kalkan · United States Geological Survey, “Another approach to compute the differences between two time series is moving window root-mean-square. RMS can be run for both series separately. This way, you can compare the similarities in energy (gain) level of time series. You may vary the window length for best resolution.” (https://www.researchgate.net/post/How_can_I_perform_time_series_data_similarity_measures_and_get_a_significance_level_p-value) http://stackoverflow.com/questions/5613244/root-mean-square-in-numpy-and-complications-of-matrix-and-arrays-of-numpy#
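A rough numpy sketch of that moving window RMS idea (an assumption for illustration, not a Skyline function):
import numpy as np

def windowed_rms(values, window=60):
    # RMS of consecutive non-overlapping windows of datapoint values
    v = np.asarray(values, dtype=float)
    return np.array([np.sqrt(np.mean(v[i:i + window] ** 2))
                     for i in range(0, len(v) - window + 1, window)])

# With two series of the same length and resolution, the windowed RMS profiles
# could then be compared, e.g. by their mean absolute percent difference
# (assuming non-zero RMS values):
# rms_a, rms_b = windowed_rms(series_a), windowed_rms(series_b)
# percent_diff = np.mean(np.abs(rms_a - rms_b) / np.abs(rms_a)) * 100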
However finding a good similarity measure between time series is a very non-trivial task.
http://alexminnaar.com/time-series-classification-and-clustering-with-python.html
Spent time tinkering with LBKeogh and sklearn.metrics classification_report, he is not wrong. Any which way I think bringing R into the evaluation is going to be useful long term - rpy2.
Other:
- Fast Time Series Evaluation (FTSE) - older algorithm from 2007 but potential - http://dbgroup.eecs.umich.edu/files/sigmod07timeseries.pdf
- https://www.semanticscholar.org/paper/Benchmarking-dynamic-time-warping-on-nearest-Tselas-Papapetrou/69683d13b7dfac64bd6d8cd6654b617361574baf
Aggregation. Further complicates... a lot I think.
Clustering not the way?
- Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research - http://www.cs.ucr.edu/~eamonn/meaningless.pdf - covers K-means and STS
- http://amid.fish/anomaly-detection-with-k-means-clustering
- https://github.com/mrahtz/sanger-machine-learning-workshop
Time clustering algorithm. An idea. We need to give time more dimensionality. http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html
Which ends up relating to Feature #1572: htsa - Haversine Time Similarity Algorithm, an idea to perhaps pursue in the future, non-trivial, but it may be able to add some additional clusterable dimensionalities to timeseries at some point in the future. Are timeseries difficult for machine learning to understand because, in a simple timeseries, a timestamp is a 1 dimensional thing and has no relationship to anything else? Whereas time really has a number of additional aspects to it as well; in the very least, it has a spatial dimension as well. Luckily with timeseries we have metric_name, timestamp and value, so we could cluster on groupings of classifiable (classable) parts of namespaces say. Which would at least add a certain layer of machine learning, but to what end currently?
It is not easy.
But maybe it can be.
Originally Ionosphere was envisioned as bringing some machine learning to Skyline, and lots of paths and methods, etc have been reviewed, and unfortunately no simple way can be seen of achieving this in any meaningful way in terms of Skyline and its original purpose, anomaly detection in machine metrics timeseries. Although it would be nice to update the current Skyline stack and pipeline to use something not from the 1920’s, it is alas a fork too far at this point.
And a bit of 1920’s with a bit of modern key value and some Python, with a dash of “lets try CONSENSUS”, does not do a bad job. Considering off the production line and into the digital pipeline, with a sprinkling of new ideas.
However, not all metrics do well with 3sigma either :) Machine learning, scikit-learn, tensorflow, NuPIC, TPOT, k-means, et al are not helping either.
Thresholding, no. Although thresholding in Boundary is useful for its purpose, direct thresholding for Ionosphere has constantly been looked away from, as it is felt that simple thresholding is not useful or helpful in terms of learning for Skyline and people. We have done thresholding and we can.
Ionosphere should be about learning and teaching, for want of better words. Machine learning has training data sets. So Ionosphere needs training data sets. So lets give it training data sets.
Updated by Gary Wilson 3 months ago
Small steps¶
For some reason I think this should be at least started on before feature extraction.
Add user input
- via Panorama tickbox - not anomalous
- via alert with link to Panorama not anomalous
- Calculate things for the timeseries - stdDev, slope, linear regression, variability, etc
- Store the triggered timeseries for X amount of time to allow the user to process the anomaly and timeseries as it was, real timeseries data that was used and review the metric, if no action is taken on the anomaly, prune older than X.
- Begins to provide training data sets or at least calculated metrics as above about anomalies
Analyzer says this is anomalous, user can evaluate and say:
This is not anomalous at X time range. This is not anomalous on a Tuesday.
Updated by Gary Wilson about 1 month ago
Simple Ionosphere¶
Some nice simple ideas from yesterday morning, and I think it might be workable.
- Operator flags as not_anomalous (within a 24 hr period)
- Take the saved Redis timeseries and: sum, determine mean, determine median, determine min, determine max, determine 3sigma, determine range, determine range in the 95th percentile, determine count in the 95th percentile, determine range in the 5th percentile, determine count in the 5th percentile (a rough sketch follows this list)
- Save not_anomalous details to Ionosphere table
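A rough numpy sketch of calculating those values (an assumption about the detail, not the eventual Ionosphere implementation):
import numpy as np

def summarise(values):
    # values - the datapoint values from the saved Redis timeseries
    v = np.asarray(values, dtype=float)
    p95 = np.percentile(v, 95)
    p5 = np.percentile(v, 5)
    in_95 = v[v <= p95]   # datapoints within the 95th percentile
    in_5 = v[v <= p5]     # datapoints within the 5th percentile
    return {
        'sum': float(v.sum()),
        'mean': float(v.mean()),
        'median': float(np.median(v)),
        'min': float(v.min()),
        'max': float(v.max()),
        '3sigma': float(v.mean() + 3 * v.std()),
        'range': float(v.max() - v.min()),
        'range_95th_percentile': float(in_95.max() - in_95.min()) if in_95.size else 0.0,
        'count_95th_percentile': int(in_95.size),
        'range_5th_percentile': float(in_5.max() - in_5.min()) if in_5.size else 0.0,
        'count_5th_percentile': int(in_5.size),
    }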
The devil may be in the details
- Match entire namespace, wildcarding, may be difficult and expensive, but may not if Redis keyed
- Expire training sets and trained results, how long?
- Ionosphere before Mirage? May be a moot point as it should work in both, each with their own metrics, as alerts will come from the responsible app and therefore they can be tagged independently.
Workflow¶
This works for both Analyzer and Mirage; only Analyzer is described here as it has the additional task of checking if the metric is a Mirage metric. A rough sketch of the routing follows the list below.
- Analyzer -> detect anomaly
- Analyzer checks mirage.unique_metrics set, if mirage_metric == True send to Mirage and skip sending to Ionosphere, continue.
- Analyzer sends to Ionosphere if not a Mirage metric
- Analyzer then checks ionosphere.unique_metrics set, to see if it is an Ionosphere metric, if True, Analyzer skips alerting and continues. Ionosphere will analyze the metric that Analyzer just sent and score it and alert if it is anomalous.
- Ionosphere -> calculate a similarity score for the anomalous timeseries based on trained values. If less than x similar, alert, else proceed as normal.
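A rough sketch of the routing described above (an assumption, not the actual Analyzer code; the helper functions and the metrics. namespace prefix are placeholders):
import redis

redis_conn = redis.StrictRedis(unix_socket_path='/tmp/redis.sock')

def send_to_mirage(metric, anomaly_data):
    pass  # placeholder - write a Mirage check file

def send_to_ionosphere(metric, anomaly_data):
    pass  # placeholder - send the anomalous metric details to Ionosphere

def trigger_alert(metric, anomaly_data):
    pass  # placeholder - the normal Analyzer alerting

def route_anomaly(base_name, metric, anomaly_data):
    metric_name = 'metrics.%s' % base_name  # illustrative namespace prefix
    if redis_conn.sismember('mirage.unique_metrics', metric_name):
        # a Mirage metric - hand off to Mirage, do not send to Ionosphere or alert here
        send_to_mirage(metric, anomaly_data)
        return
    # not a Mirage metric - send the anomaly details to Ionosphere
    send_to_ionosphere(metric, anomaly_data)
    if redis_conn.sismember('ionosphere.unique_metrics', metric_name):
        # Ionosphere owns this metric - it will score the anomaly against the
        # trained values and alert if it is anomalous, so skip alerting here
        return
    trigger_alert(metric, anomaly_data)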
Updated by Gary Wilson 18 days ago
Maybe we took a step - FRESH¶
In terms of one aspect of machine learning timeseries, tsfresh
The paper is interesting - https://arxiv.org/pdf/1610.07717v1.pdf
Updated by Gary Wilson 1 day ago
Task #1718: review.tsfresh - still in progress¶
tsfresh possibly does have the potential to fulfill some of the functionality of Ionosphere as described above.
It is not super fast, quite processor intensive, but... lots of features!!!
THE FIRST¶
o/ before the end of the year!!!
This was the first production Ionosphere matched tsfresh features profile after a year in the waiting and making.
A special thanks to @MaxBenChrist, @nils-braun and @jneuff over at https://github.com/blue-yonder/tsfresh
Well it took a year :) But... it works :)
- Branch #922: ionosphere - created 2015-12-26 11:21 AM
- Task #1658: Patterning Skyline Ionosphere
ionosphere.log¶
Then along came learn¶
After achieving the above first milestone and then the second, etc, etc, a significant amount of time has gone into improving the data presentation to the operator in terms of providing Graphite NOW graphs at various resolutions, etc.
This was so that I could make better, more informed decisions with a fuller picture of the metric and the point anomaly in context, to decide whether the alert was anomalous or not. I would open the Graphite link and look at -7h and then change to -24h, -7d and then -30d. So this was added to the Ionosphere UI.
And in that process, one can see learn. Decisions as to whether a point anomaly is contextually anomalous are based on longer ranges than the Skyline analyses, which is fine. However, if we define that a human operator should not create a features profile if they cannot say that the point anomaly is NOT anomalous at all resolutions, we give Ionosphere multiple resolutions to enable it to train itself.
Ionosphere does not attempt to achieve statistical or scientific accuracy it merely attempts to make 3sigma based anomaly detection methods better. That said there may be some within the data science arena that may wish to argue that a number of things in the Ionosphere learning method would not necessarily hold up to some UCR review or Kaggle competition. Luckily Ionosphere makes no bones in this regard as this is not Ionosphere’s domain space. Ionosphere’s domain space is machine and application metrics, with ambitions for bigger things. Skyline and Ionosphere will find a timeseries Rebra (I reckon) :)
Ionosphere is the culmination of over 24 months of deconstruction and refitting of Etsy’s now much aged Skyline simple Python anomaly detection (complicated stack) application.
Ionosphere began with a very, very small goal. To teach Skyline what things were not anomalous. The journey of trying to build something that would do that went from confused, impossible, no idea .... to a multi-generation one-step learning system, like a timeseries Rebra finder.
Development - tsfresh¶
tsfresh in the Skyline context¶
Skyline uses its own internal list of tsfresh feature names. When a tsfresh update is required, this internal list of tsfresh feature names may need additions in Skyline as well in skyline/tsfresh_feature_names.py
How Skyline was over complicatedly upgraded from tsfresh-0.1.2 to tsfresh-0.3.0¶
- Locally upgrade your tsfresh version in your Skyline Python virtualenv
- Run tests and if they fail we need to ensure that any new feature names are updated in skyline/tsfresh_feature_names.py
- Calculate the features of the baseline data set and compare the last version baseline e.g. tests/tsfresh-0.1.2.stats.statsd.bad_lines_seen.20161110.csv.features.transposed.csv
- If you have verified that only new feature names have been added, generate the TSFRESH_FEATURES list
- Update skyline/tsfresh_feature_names.py with new feature names AND tsfresh versions
- Add the new baseline for the version of tsfresh
- Run tests
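A rough sketch of the comparison (an assumption, not the Skyline test itself; the baseline csv path is a placeholder and TSFRESH_FEATURES is assumed to be a list of [id, name] pairs):
import pandas as pd
from tsfresh import extract_features
from tsfresh_feature_names import TSFRESH_FEATURES  # skyline/tsfresh_feature_names.py

# a baseline timeseries csv of metric,timestamp,value rows
df = pd.read_csv('baseline.csv', names=['metric', 'timestamp', 'value'])
features = extract_features(
    df, column_id='metric', column_sort='timestamp',
    column_kind=None, column_value='value')

current_names = set(features.columns)
known_names = set(name for _id, name in TSFRESH_FEATURES)
new_names = current_names - known_names
if new_names:
    print('new feature names to add to skyline/tsfresh_feature_names.py:')
    for name in sorted(new_names):
        print(name)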
Development - pytz¶
Warning
This pytz.rst file is automatically generated, DO NOT edit this page directly, edit its source skyline/tsfresh_features/scripts/make-pytz.all_timezones-rst.py and see Building documentation.
pytz.all_timezones¶
pytz version: 2016.7
This is an automatically generated list of all pytz timezones for the specified pytz version generated using pytz.all_timezones. It is generated so there is an easy reference for users to find their relevant pytz timezone string.
It is specifically related to Skyline Ionosphere, but it is also applicable more generally to working with Graphite timeseries data, different timezones, pytz and formatting the Graphite timeseries data in UTC for feature extraction using tsfresh.
In terms of Ionosphere, or generally working with Graphite timeseries data, this is only applicable for anyone who has Graphite implemented in a timezone other than UTC and wishes to pass a timezone to extract the features of a metric timeseries with tsfresh.
The user should not have to fire up pytz locally or on their server to find their applicable pytz timezone, they are all listed here below, which is yours? Or in more detail - https://en.wikipedia.org/wiki/Tz_database
It is hoped that the methods used handle daylight saving time (DST) timezones; the methods implemented have been taken from some best practices, however they are not tested on DST changes, so the outcome is somewhat unknown.
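A minimal example of the kind of conversion in question (not Skyline code): localise a naive local Graphite timestamp in its pytz timezone and convert it to UTC.
from datetime import datetime
import pytz

graphite_tz = pytz.timezone('Europe/London')  # your Graphite timezone from the list below
local_dt = graphite_tz.localize(datetime(2016, 11, 19, 14, 30))
utc_dt = local_dt.astimezone(pytz.utc)
print(utc_dt.strftime('%Y-%m-%d %H:%M:%S %Z'))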
pytz version: 2016.7
Timezones list for pytz version¶
Africa/Abidjan
Africa/Accra
Africa/Addis_Ababa
Africa/Algiers
Africa/Asmara
Africa/Asmera
Africa/Bamako
Africa/Bangui
Africa/Banjul
Africa/Bissau
Africa/Blantyre
Africa/Brazzaville
Africa/Bujumbura
Africa/Cairo
Africa/Casablanca
Africa/Ceuta
Africa/Conakry
Africa/Dakar
Africa/Dar_es_Salaam
Africa/Djibouti
Africa/Douala
Africa/El_Aaiun
Africa/Freetown
Africa/Gaborone
Africa/Harare
Africa/Johannesburg
Africa/Juba
Africa/Kampala
Africa/Khartoum
Africa/Kigali
Africa/Kinshasa
Africa/Lagos
Africa/Libreville
Africa/Lome
Africa/Luanda
Africa/Lubumbashi
Africa/Lusaka
Africa/Malabo
Africa/Maputo
Africa/Maseru
Africa/Mbabane
Africa/Mogadishu
Africa/Monrovia
Africa/Nairobi
Africa/Ndjamena
Africa/Niamey
Africa/Nouakchott
Africa/Ouagadougou
Africa/Porto-Novo
Africa/Sao_Tome
Africa/Timbuktu
Africa/Tripoli
Africa/Tunis
Africa/Windhoek
America/Adak
America/Anchorage
America/Anguilla
America/Antigua
America/Araguaina
America/Argentina/Buenos_Aires
America/Argentina/Catamarca
America/Argentina/ComodRivadavia
America/Argentina/Cordoba
America/Argentina/Jujuy
America/Argentina/La_Rioja
America/Argentina/Mendoza
America/Argentina/Rio_Gallegos
America/Argentina/Salta
America/Argentina/San_Juan
America/Argentina/San_Luis
America/Argentina/Tucuman
America/Argentina/Ushuaia
America/Aruba
America/Asuncion
America/Atikokan
America/Atka
America/Bahia
America/Bahia_Banderas
America/Barbados
America/Belem
America/Belize
America/Blanc-Sablon
America/Boa_Vista
America/Bogota
America/Boise
America/Buenos_Aires
America/Cambridge_Bay
America/Campo_Grande
America/Cancun
America/Caracas
America/Catamarca
America/Cayenne
America/Cayman
America/Chicago
America/Chihuahua
America/Coral_Harbour
America/Cordoba
America/Costa_Rica
America/Creston
America/Cuiaba
America/Curacao
America/Danmarkshavn
America/Dawson
America/Dawson_Creek
America/Denver
America/Detroit
America/Dominica
America/Edmonton
America/Eirunepe
America/El_Salvador
America/Ensenada
America/Fort_Nelson
America/Fort_Wayne
America/Fortaleza
America/Glace_Bay
America/Godthab
America/Goose_Bay
America/Grand_Turk
America/Grenada
America/Guadeloupe
America/Guatemala
America/Guayaquil
America/Guyana
America/Halifax
America/Havana
America/Hermosillo
America/Indiana/Indianapolis
America/Indiana/Knox
America/Indiana/Marengo
America/Indiana/Petersburg
America/Indiana/Tell_City
America/Indiana/Vevay
America/Indiana/Vincennes
America/Indiana/Winamac
America/Indianapolis
America/Inuvik
America/Iqaluit
America/Jamaica
America/Jujuy
America/Juneau
America/Kentucky/Louisville
America/Kentucky/Monticello
America/Knox_IN
America/Kralendijk
America/La_Paz
America/Lima
America/Los_Angeles
America/Louisville
America/Lower_Princes
America/Maceio
America/Managua
America/Manaus
America/Marigot
America/Martinique
America/Matamoros
America/Mazatlan
America/Mendoza
America/Menominee
America/Merida
America/Metlakatla
America/Mexico_City
America/Miquelon
America/Moncton
America/Monterrey
America/Montevideo
America/Montreal
America/Montserrat
America/Nassau
America/New_York
America/Nipigon
America/Nome
America/Noronha
America/North_Dakota/Beulah
America/North_Dakota/Center
America/North_Dakota/New_Salem
America/Ojinaga
America/Panama
America/Pangnirtung
America/Paramaribo
America/Phoenix
America/Port-au-Prince
America/Port_of_Spain
America/Porto_Acre
America/Porto_Velho
America/Puerto_Rico
America/Rainy_River
America/Rankin_Inlet
America/Recife
America/Regina
America/Resolute
America/Rio_Branco
America/Rosario
America/Santa_Isabel
America/Santarem
America/Santiago
America/Santo_Domingo
America/Sao_Paulo
America/Scoresbysund
America/Shiprock
America/Sitka
America/St_Barthelemy
America/St_Johns
America/St_Kitts
America/St_Lucia
America/St_Thomas
America/St_Vincent
America/Swift_Current
America/Tegucigalpa
America/Thule
America/Thunder_Bay
America/Tijuana
America/Toronto
America/Tortola
America/Vancouver
America/Virgin
America/Whitehorse
America/Winnipeg
America/Yakutat
America/Yellowknife
Antarctica/Casey
Antarctica/Davis
Antarctica/DumontDUrville
Antarctica/Macquarie
Antarctica/Mawson
Antarctica/McMurdo
Antarctica/Palmer
Antarctica/Rothera
Antarctica/South_Pole
Antarctica/Syowa
Antarctica/Troll
Antarctica/Vostok
Arctic/Longyearbyen
Asia/Aden
Asia/Almaty
Asia/Amman
Asia/Anadyr
Asia/Aqtau
Asia/Aqtobe
Asia/Ashgabat
Asia/Ashkhabad
Asia/Baghdad
Asia/Bahrain
Asia/Baku
Asia/Bangkok
Asia/Barnaul
Asia/Beirut
Asia/Bishkek
Asia/Brunei
Asia/Calcutta
Asia/Chita
Asia/Choibalsan
Asia/Chongqing
Asia/Chungking
Asia/Colombo
Asia/Dacca
Asia/Damascus
Asia/Dhaka
Asia/Dili
Asia/Dubai
Asia/Dushanbe
Asia/Gaza
Asia/Harbin
Asia/Hebron
Asia/Ho_Chi_Minh
Asia/Hong_Kong
Asia/Hovd
Asia/Irkutsk
Asia/Istanbul
Asia/Jakarta
Asia/Jayapura
Asia/Jerusalem
Asia/Kabul
Asia/Kamchatka
Asia/Karachi
Asia/Kashgar
Asia/Kathmandu
Asia/Katmandu
Asia/Khandyga
Asia/Kolkata
Asia/Krasnoyarsk
Asia/Kuala_Lumpur
Asia/Kuching
Asia/Kuwait
Asia/Macao
Asia/Macau
Asia/Magadan
Asia/Makassar
Asia/Manila
Asia/Muscat
Asia/Nicosia
Asia/Novokuznetsk
Asia/Novosibirsk
Asia/Omsk
Asia/Oral
Asia/Phnom_Penh
Asia/Pontianak
Asia/Pyongyang
Asia/Qatar
Asia/Qyzylorda
Asia/Rangoon
Asia/Riyadh
Asia/Saigon
Asia/Sakhalin
Asia/Samarkand
Asia/Seoul
Asia/Shanghai
Asia/Singapore
Asia/Srednekolymsk
Asia/Taipei
Asia/Tashkent
Asia/Tbilisi
Asia/Tehran
Asia/Tel_Aviv
Asia/Thimbu
Asia/Thimphu
Asia/Tokyo
Asia/Tomsk
Asia/Ujung_Pandang
Asia/Ulaanbaatar
Asia/Ulan_Bator
Asia/Urumqi
Asia/Ust-Nera
Asia/Vientiane
Asia/Vladivostok
Asia/Yakutsk
Asia/Yangon
Asia/Yekaterinburg
Asia/Yerevan
Atlantic/Azores
Atlantic/Bermuda
Atlantic/Canary
Atlantic/Cape_Verde
Atlantic/Faeroe
Atlantic/Faroe
Atlantic/Jan_Mayen
Atlantic/Madeira
Atlantic/Reykjavik
Atlantic/South_Georgia
Atlantic/St_Helena
Atlantic/Stanley
Australia/ACT
Australia/Adelaide
Australia/Brisbane
Australia/Broken_Hill
Australia/Canberra
Australia/Currie
Australia/Darwin
Australia/Eucla
Australia/Hobart
Australia/LHI
Australia/Lindeman
Australia/Lord_Howe
Australia/Melbourne
Australia/NSW
Australia/North
Australia/Perth
Australia/Queensland
Australia/South
Australia/Sydney
Australia/Tasmania
Australia/Victoria
Australia/West
Australia/Yancowinna
Brazil/Acre
Brazil/DeNoronha
Brazil/East
Brazil/West
CET
CST6CDT
Canada/Atlantic
Canada/Central
Canada/East-Saskatchewan
Canada/Eastern
Canada/Mountain
Canada/Newfoundland
Canada/Pacific
Canada/Saskatchewan
Canada/Yukon
Chile/Continental
Chile/EasterIsland
Cuba
EET
EST
EST5EDT
Egypt
Eire
Etc/GMT
Etc/GMT+0
Etc/GMT+1
Etc/GMT+10
Etc/GMT+11
Etc/GMT+12
Etc/GMT+2
Etc/GMT+3
Etc/GMT+4
Etc/GMT+5
Etc/GMT+6
Etc/GMT+7
Etc/GMT+8
Etc/GMT+9
Etc/GMT-0
Etc/GMT-1
Etc/GMT-10
Etc/GMT-11
Etc/GMT-12
Etc/GMT-13
Etc/GMT-14
Etc/GMT-2
Etc/GMT-3
Etc/GMT-4
Etc/GMT-5
Etc/GMT-6
Etc/GMT-7
Etc/GMT-8
Etc/GMT-9
Etc/GMT0
Etc/Greenwich
Etc/UCT
Etc/UTC
Etc/Universal
Etc/Zulu
Europe/Amsterdam
Europe/Andorra
Europe/Astrakhan
Europe/Athens
Europe/Belfast
Europe/Belgrade
Europe/Berlin
Europe/Bratislava
Europe/Brussels
Europe/Bucharest
Europe/Budapest
Europe/Busingen
Europe/Chisinau
Europe/Copenhagen
Europe/Dublin
Europe/Gibraltar
Europe/Guernsey
Europe/Helsinki
Europe/Isle_of_Man
Europe/Istanbul
Europe/Jersey
Europe/Kaliningrad
Europe/Kiev
Europe/Kirov
Europe/Lisbon
Europe/Ljubljana
Europe/London
Europe/Luxembourg
Europe/Madrid
Europe/Malta
Europe/Mariehamn
Europe/Minsk
Europe/Monaco
Europe/Moscow
Europe/Nicosia
Europe/Oslo
Europe/Paris
Europe/Podgorica
Europe/Prague
Europe/Riga
Europe/Rome
Europe/Samara
Europe/San_Marino
Europe/Sarajevo
Europe/Simferopol
Europe/Skopje
Europe/Sofia
Europe/Stockholm
Europe/Tallinn
Europe/Tirane
Europe/Tiraspol
Europe/Ulyanovsk
Europe/Uzhgorod
Europe/Vaduz
Europe/Vatican
Europe/Vienna
Europe/Vilnius
Europe/Volgograd
Europe/Warsaw
Europe/Zagreb
Europe/Zaporozhye
Europe/Zurich
GB
GB-Eire
GMT
GMT+0
GMT-0
GMT0
Greenwich
HST
Hongkong
Iceland
Indian/Antananarivo
Indian/Chagos
Indian/Christmas
Indian/Cocos
Indian/Comoro
Indian/Kerguelen
Indian/Mahe
Indian/Maldives
Indian/Mauritius
Indian/Mayotte
Indian/Reunion
Iran
Israel
Jamaica
Japan
Kwajalein
Libya
MET
MST
MST7MDT
Mexico/BajaNorte
Mexico/BajaSur
Mexico/General
NZ
NZ-CHAT
Navajo
PRC
PST8PDT
Pacific/Apia
Pacific/Auckland
Pacific/Bougainville
Pacific/Chatham
Pacific/Chuuk
Pacific/Easter
Pacific/Efate
Pacific/Enderbury
Pacific/Fakaofo
Pacific/Fiji
Pacific/Funafuti
Pacific/Galapagos
Pacific/Gambier
Pacific/Guadalcanal
Pacific/Guam
Pacific/Honolulu
Pacific/Johnston
Pacific/Kiritimati
Pacific/Kosrae
Pacific/Kwajalein
Pacific/Majuro
Pacific/Marquesas
Pacific/Midway
Pacific/Nauru
Pacific/Niue
Pacific/Norfolk
Pacific/Noumea
Pacific/Pago_Pago
Pacific/Palau
Pacific/Pitcairn
Pacific/Pohnpei
Pacific/Ponape
Pacific/Port_Moresby
Pacific/Rarotonga
Pacific/Saipan
Pacific/Samoa
Pacific/Tahiti
Pacific/Tarawa
Pacific/Tongatapu
Pacific/Truk
Pacific/Wake
Pacific/Wallis
Pacific/Yap
Poland
Portugal
ROC
ROK
Singapore
Turkey
UCT
US/Alaska
US/Aleutian
US/Arizona
US/Central
US/East-Indiana
US/Eastern
US/Hawaii
US/Indiana-Starke
US/Michigan
US/Mountain
US/Pacific
US/Pacific-New
US/Samoa
UTC
Universal
W-SU
WET
Zulu
DRY¶
The current iteration of Skyline has a fair amount of repetition in some of the functions in each module. The process of consolidating these is ongoing, however due to the nature of some of them, some do need slightly different implementations.
Ongoing refactoring¶
The code style in the Skyline codebase is somewhat organic; over time more and more changes have been made, mostly in the following contexts.
- Logging - trying to achieve consistency in logging string formats such as error :: message, etc. Maybe there should be independent standard and error logs :)
- Quotation style - although there are no hard and fast rules on this, there has been an attempt to refactor any strings that were quoted with double quotes to single quoted strings. This was mostly to align the code so that double quotes can be reserved for strings that interpolate a variable in the string rather than via "%". Although this is not covered in any PEP seen, there have been some references to it being an acceptable and preferred style by some, and it is common in other languages too, such as ruby and shell. It is not a rule :) and there are still instances of double quoted strings in places, maybe.
- try: and except: - attempts are always being made to add try: and except: blocks wherever possible, with the goal being that all Skyline apps handle failure gracefully, log informatively without the log vomiting through an entire assigned_metrics loop, and exit whenever a critical error is encountered at start up.
- Trying to achieve PEP8, wherever possible and not inconvenient.
- Adding any Python-3 patterns discovered while doing something else, as long as each is wrapped in the appropriate conditional; this should have no effect on the functioning of the apps with Python 2.7 and is there to test when Python 3 becomes desired.
Roadmap¶
Things on the horizon :)
This is not really a roadmap per se; it is more a collection of ideas at the moment, in no particular order.
Further performance improvements and Python-3.5¶
Further performance improvements where possible with the use of cython. pypy is unfortunately not an overall viable route due to no real support for scipy and limited numpy support. pypy would be a quick win if it were possible, so cython where applicable is the obvious choice.
Continue updating the code base to ensure that everything can be run on >= Python-3.5.x
Ionosphere¶
Functionality to allow the operator to flag false positives, with a view to using machine learning to train Skyline on anomalies. Using the entire data set for training, perhaps even an entire namespace, could increase the accuracy of the anomaly detection via multiple avenues of analysis.
In progress¶
See Ionosphere and Development - Ionosphere
Meteor¶
Add the ability to inject data into a data set and test a metric, the workflow, algorithms, alerters and the apps themselves, etc.
Constellations¶
A pure Python implementation of the Oculus functionality - not necessarily exactly the same, but similar. It would calculate in real time using the Redis data, fingerprinting the slope/gradient/alphabet of the last x datapoints of each timeseries on the fly and correlating / finding similarities in the timeseries in terms of patterns. Perhaps some implementation of Cosine Similarity could be used, as in the sketch below. A pure Python implementation of Oculus functionality within Skyline would remove the requirement for the additional overheads of Ruby and Elasticsearch.
This would allow for the correlations to be determined for any metrics at any point within the FULL_DURATION period.
Help wanted.
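As a rough illustration of the idea only (a minimal sketch - the function names, the window size and the fingerprinting approach are illustrative assumptions, not existing Skyline code), slope fingerprinting and Cosine Similarity could look something like:

    import math

    def slope_fingerprint(timeseries, window=60):
        # Fingerprint the last window datapoints of a (timestamp, value)
        # timeseries as a vector of point to point slopes/gradients
        values = [value for _, value in timeseries[-window:]]
        return [b - a for a, b in zip(values, values[1:])]

    def cosine_similarity(a, b):
        # Plain cosine similarity between two equal length slope vectors
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return dot / (norm_a * norm_b)

    # e.g. compare the recent shape of two timeseries pulled from Redis
    # similarity = cosine_similarity(slope_fingerprint(ts_a), slope_fingerprint(ts_b))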
Skyline and NASA/FITS data¶
There is a Skyline module in development that is specifically designed for analysing Kepler FITS timeseries data. Although it specifically focuses on extracting lightcurve data from the Kepler llc.fits data files, that in itself has added the functionality to convert any relevant FITS based data file that has a timeseries element in it. This should extend to other data types that the FITS data format (Flexible Image Transport System) handles:
- Much more than just another image format (such as JPEG or GIF)
- Used for the transport, analysis, and archival storage of scientific data sets
- Multi-dimensional arrays: 1D spectra, 2D images, 3D+ data cubes
- Tables containing rows and columns of information
- Header keywords provide descriptive information about the data
It should be possible for Skyline to ingest any FITS data file, or to extract an image from a FITS file and feed it to scikit-image, etc.
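As a hedged sketch only of the kind of conversion involved (assuming astropy is available and that the lightcurve sits in the first table extension with TIME and SAP_FLUX columns, as Kepler llc.fits files typically have - none of this is the actual Skyline module), a FITS lightcurve could be turned into a Skyline style timeseries along these lines:

    from astropy.io import fits

    def fits_lightcurve_to_timeseries(fits_file):
        # Open the lightcurve file and read the table extension
        with fits.open(fits_file) as hdul:
            data = hdul[1].data
            times = data['TIME']       # in Kepler files this is BKJD (BJD - 2454833)
            fluxes = data['SAP_FLUX']  # simple aperture photometry flux
        # Build a [timestamp, value] timeseries, skipping NaN gaps
        # (NaN != NaN, so the comparison drops missing datapoints)
        return [[float(t), float(f)] for t, f in zip(times, fluxes)
                if t == t and f == f]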
NuPIC¶
Last year NuPIC was assessed in terms of use in Internet advertising for an
additional monitoring dimension. Although the assessment was fairly brief, its
predictive capabilities were quite impressive. Working with a 14 month data set
for one advertising service, it was able to predict the service's ad_requests
per 10 minutes for the 14 month period remarkably well given the nature of the
specific domain. The result was a little peaky, with a few large peaks.
Evaluation of the event streams from the timeframes around the large peaks in
question identified known causes for those peaks; with a "noise filter" applied
to exclude > x to reduce those now known noisy events, the results were indeed
remarkable. In fact NuPIC predicted our requests for 14 months within the
bounds of a ~5% error margin and when it made an error, it learned.
Truly remarkable given this is adtech data: 6.5 million requests per minute at peaks, low troughs, with sparse seasonalities (if there is such a thing) at times. NuPIC and the Cortical Learning Algorithm are really amazing, if a little difficult up front.
NuPIC predictions and Graphite timeseries¶
Feeding NuPIC prediction data back into Graphite helps our own Neocortex to visually analyse the results too (data courtesy of Coull).
Original requests timeseries¶

Original requests timeseries data
NuPIC predicted timeseries¶

NuPIC predicted data
percentage error - positive and negative¶
- positive - under predicted (we did more than predicted)
- negative - over predicted (we did less than predicted)
asPercent(diffSeries(stats_counts.radar.varnish.rpm.total,stats.nupic.predictions.radar.varnish.rpm.total),stats_counts.radar.varnish.rpm.total)
Drop the noisy datapoints for an average representation

NuPIC percentage error
Overlaid¶
Real data, NuPIC predictions and percentage error (on the 2nd unticked y axis as above)

Real data, NuPIC predictions and percentage error
Quite amazing. It is not beyond the realms of possibility to have Horizon feeding specific metrics to various NuPIC HTM Cortical Learning Algorithm models...
Update the NAB Scoreboard¶
Look at the automated running of the Numenta Anomaly Benchmark (NAB) data and frequently determine the Standard Profile, Reward Low FP and Reward Low FN scores (metrics). This will only aid and improve the evaluation of any additional algorithms, methods or techniques that are added or applied to Skyline in the future, e.g.:
- Does Mirage change the score?
- Does Boundary?
- Would the addition of pyculiarity as an “algorithm”? (https://github.com/nicolasmiller/pyculiarity)
Automated NAB benchmark metrics would be a nice thing to have :)
Update 20170225: Skyline can no longer necessarily be NAB-ed due to it now functioning in multiple temporal resolutions as a whole.
Machine learning¶
Bring additional dimensions of machine learning capabilities into Skyline, too many avenues to mention...
Grumpy¶
Investigate whether any elements could benefit performance-wise from being implemented in Grumpy - Go running Python - https://opensource.googleblog.com/2017/01/grumpy-go-running-python.html. There may be some mileage in using Go instead of multiprocessing in some cases: algorithms, Ionosphere, sending tsfresh chunks to Go in extract_features rather than multiprocessing. Scope for investigation.
skyline package¶
Subpackages¶
skyline.analyzer package¶
Submodules¶
skyline.analyzer.agent module¶
skyline.analyzer.alerters module¶
-
skyline_version
= 'Skyline (ionosphere v1.1.11-stable)'¶ Create any alerter you want here. The function will be invoked from trigger_alert.
Three arguments will be passed, two of them tuples: alert and metric.
alert: the tuple specified in your settings:
alert[0]: The matched substring of the anomalous metric
alert[1]: the name of the strategy being used to alert
alert[2]: The timeout of the alert that was triggered
metric: information about the anomaly itself
metric[0]: the anomalous value
metric[1]: The full name of the anomalous metric
metric[2]: anomaly timestamp
context: app name
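As a hedged illustration only (the function name is hypothetical and the logger name is an assumption, this is not one of the shipped alerters), a custom alerter added to this module would follow the same signature as the functions below and could be as simple as:

    import logging

    # assumption: reuse the Analyzer app logger
    logger = logging.getLogger('analyzerLog')

    def alert_stdlog(alert, metric, context):
        # A hypothetical example alerter, called by trigger_alert() like the
        # alerters below, that simply logs the anomaly details
        logger.info(
            'alert :: %s anomalous at %s with value %s (strategy %s, app %s)' % (
                str(metric[1]), str(metric[2]), str(metric[0]),
                str(alert[1]), str(context)))

The corresponding strategy name would then presumably need to be referenced in the relevant settings.ALERTS tuple so that trigger_alert can dispatch to it.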
-
alert_smtp
(alert, metric, context)[source]¶ Called by
trigger_alert()
and sends an alert via smtp to the recipients that are configured for the metric.
-
alert_pagerduty
(alert, metric, context)[source]¶ Called by
trigger_alert()
and sends an alert via PagerDuty
-
alert_hipchat
(alert, metric, context)[source]¶ Called by
trigger_alert()
and sends an alert to the hipchat room that is configured in settings.py.
-
alert_syslog
(alert, metric, context)[source]¶ Called by
trigger_alert()
and logs anomalies to syslog.
-
trigger_alert
(alert, metric, context)[source]¶ Called by
skyline.analyzer.Analyzer.spawn_alerter_process
to trigger an alert. Analyzer passes three arguments, two of them tuples. The alerting strategy is determined and the appropriate alert def is then called and passed the tuples.
Parameters: - alert –
The alert tuple specified in settings.py.
alert[0]: The matched substring of the anomalous metric
alert[1]: the name of the strategy being used to alert
alert[2]: The timeout of the alert that was triggered
- metric –
The metric tuple.
metric[0]: the anomalous value
metric[1]: The full name of the anomalous metric
metric[2]: anomaly timestamp
- context (str) – app name
skyline.analyzer.algorithms module¶
skyline.analyzer.analyzer module¶
Module contents¶
skyline.analyzer_dev package¶
Submodules¶
skyline.analyzer_dev.agent module¶
skyline.analyzer_dev.alerters module¶
-
alert_smtp
(alert, metric)[source]¶ Called by
trigger_alert()
and sends an alert via smtp to the recipients that are configured for the metric.
-
alert_pagerduty
(alert, metric)[source]¶ Called by
trigger_alert()
and sends an alert via PagerDuty
-
alert_hipchat
(alert, metric)[source]¶ Called by
trigger_alert()
and sends an alert to the hipchat room that is configured in settings.py.
-
alert_syslog
(alert, metric)[source]¶ Called by
trigger_alert()
and logs anomalies to syslog.
-
trigger_alert
(alert, metric)[source]¶ Called by
run
to trigger an alert. Analyzer passes two arguments, both of them tuples. The alerting strategy is determined and the appropriate alert def is then called and passed the tuples. Parameters: - alert –
The alert tuple specified in settings.py.
alert[0]: The matched substring of the anomalous metric
alert[1]: the name of the strategy being used to alert
alert[2]: The timeout of the alert that was triggered
- metric –
The metric tuple.
metric[0]: the anomalous value
metric[1]: The full name of the anomalous metric
skyline.analyzer_dev.algorithms_dev module¶
skyline.analyzer_dev.analyzer_dev module¶
Module contents¶
skyline.boundary package¶
Submodules¶
skyline.boundary.agent module¶
skyline.boundary.boundary module¶
skyline.boundary.boundary_alerters module¶
-
skyline_app_logfile
= '/var/log/skyline/boundary.log'¶ Create any alerter you want here. The function is invoked from trigger_alert. 4 arguments will be passed in as strings: datapoint, metric_name, expiration_time, algorithm
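As a hedged illustration only (the function name and logger name are assumptions, not shipped code), a custom Boundary alerter added here would take the four string arguments described above:

    import logging

    # assumption: reuse the Boundary app logger
    logger = logging.getLogger('boundaryLog')

    def alert_stdlog(datapoint, metric_name, expiration_time, algorithm):
        # A hypothetical example Boundary alerter that simply logs the trigger;
        # all four arguments arrive as strings
        logger.info(
            'alert :: %s triggered %s with value %s (expiration %s)' % (
                metric_name, algorithm, datapoint, expiration_time))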
skyline.boundary.boundary_algorithms module¶
-
boundary_no_mans_land
()[source]¶ This is no man’s land. Do anything you want in here, as long as you return a boolean that determines whether the input timeseries is anomalous or not.
To add an algorithm, define it here, and add its name to
settings.BOUNDARY_ALGORITHMS
.
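As a hedged sketch only (the algorithm name and its logic are purely illustrative, not a shipped algorithm), a new Boundary algorithm would follow the same signature as less_than and greater_than below and return a boolean:

    def always_below(timeseries, metric_name, metric_expiration_time,
                     metric_min_average, metric_min_average_seconds,
                     metric_trigger):
        # A hypothetical example algorithm: anomalous if the last three
        # datapoints of the (timestamp, value) timeseries are all below
        # metric_trigger
        if len(timeseries) < 3:
            return False
        last_values = [float(value) for _, value in timeseries[-3:]]
        return all(value < float(metric_trigger) for value in last_values)

It would then be added by name to settings.BOUNDARY_ALGORITHMS and referenced in a settings.BOUNDARY_METRICS tuple.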
-
autoaggregate_ts
(timeseries, autoaggregate_value)[source]¶ This is a utility function used to autoaggregate a timeseries. If a timeseries data set has 6 datapoints per minute but only one data value every minute then autoaggregate will aggregate every autoaggregate_value.
-
less_than
(timeseries, metric_name, metric_expiration_time, metric_min_average, metric_min_average_seconds, metric_trigger)[source]¶ A timeseries is anomalous if the datapoint is less than metric_trigger
-
greater_than
(timeseries, metric_name, metric_expiration_time, metric_min_average, metric_min_average_seconds, metric_trigger)[source]¶ A timeseries is anomalous if the datapoint is greater than metric_trigger
-
detect_drop_off_cliff
(timeseries, metric_name, metric_expiration_time, metric_min_average, metric_min_average_seconds, metric_trigger)[source]¶ A timeseries is anomalous if the average of the last 10 datapoints is <trigger> times greater than the last data point AND if it has not experienced frequent cliff drops in the last 10 datapoints. If the timeseries has experienced 2 or more datapoints of equal or lesser values in the last 10 or EXPIRATION_TIME datapoints, or is less than a MIN_AVERAGE if set, the algorithm determines the datapoint to be NOT anomalous but normal. This algorithm is most suited to timeseries with most datapoints being > 100 (e.g. high rate). The arbitrary <trigger> values become more noisy with lower value datapoints, but it still matches drops off cliffs.
Module contents¶
skyline.crucible package¶
Submodules¶
skyline.crucible.agent module¶
skyline.crucible.crucible module¶
skyline.crucible.crucible_algorithms module¶
Module contents¶
skyline.horizon package¶
Submodules¶
skyline.horizon.listen module¶
-
class
SafeUnpickler
[source]¶ Bases:
object
-
PICKLE_SAFE
= {'copy_reg': set(['_reconstructor']), '__builtin__': set(['object'])}¶
-
skyline.horizon.roomba module¶
skyline.horizon.worker module¶
-
class
Worker
(queue, parent_pid, skip_mini, canary=False)[source]¶ Bases:
multiprocessing.process.Process
The worker processes chunks from the queue and appends the latest datapoints to their respective timesteps in Redis.
Module contents¶
skyline.ionosphere package¶
Submodules¶
skyline.ionosphere.agent module¶
skyline.ionosphere.ionosphere module¶
skyline.ionosphere.layers module¶
skyline.ionosphere.learn module¶
Module contents¶
skyline.mirage package¶
Submodules¶
skyline.mirage.agent module¶
skyline.mirage.mirage module¶
skyline.mirage.mirage_alerters module¶
-
skyline_version
= 'Skyline (ionosphere v1.1.11-stable)'¶ Create any alerter you want here. The function will be invoked from trigger_alert. Four arguments will be passed, two of them tuples: alert and metric.
alert: the tuple specified in your settings:
alert[0]: The matched substring of the anomalous metric
alert[1]: the name of the strategy being used to alert
alert[2]: The timeout of the alert that was triggered
alert[3]: The SECOND_ORDER_RESOLUTION_HOURS
metric: information about the anomaly itself
metric[0]: the anomalous value
metric[1]: The full name of the anomalous metric
metric[2]: anomaly timestamp
second_order_resolution_seconds: int
context: the app name
-
alert_smtp
(alert, metric, second_order_resolution_seconds, context)[source]¶ Called by
trigger_alert()
and sends an alert via smtp to the recipients that are configured for the metric.
-
alert_pagerduty
(alert, metric, second_order_resolution_seconds, context)[source]¶ Called by
trigger_alert()
and sends an alert via PagerDuty
-
alert_hipchat
(alert, metric, second_order_resolution_seconds, context)[source]¶ Called by
trigger_alert()
and sends an alert to the hipchat room that is configured in settings.py.
-
trigger_alert
(alert, metric, second_order_resolution_seconds, context)[source]¶ Called by
run
to trigger an alert. Mirage passes four arguments, two of them tuples. The alerting strategy is determined and the appropriate alert def is then called and passed the tuples. Parameters: - alert –
The alert tuple specified in settings.py.
alert[0]: The matched substring of the anomalous metric
alert[1]: the name of the strategy being used to alert
alert[2]: The timeout of the alert that was triggered
alert[3]: The SECOND_ORDER_RESOLUTION_HOURS
- metric –
The metric tuple.
metric[0]: the anomalous value
metric[1]: The full name of the anomalous metric
metric[2]: anomaly timestamp
- context (str) – app name
skyline.mirage.mirage_algorithms module¶
skyline.mirage.negaters module¶
Module contents¶
skyline.panorama package¶
Submodules¶
skyline.panorama.agent module¶
skyline.panorama.panorama module¶
Module contents¶
skyline.tsfresh_features package¶
Submodules¶
skyline.tsfresh_features.generate_tsfresh_features module¶
Module contents¶
version info
skyline.webapp package¶
Submodules¶
skyline.webapp.backend module¶
-
panorama_request
()[source]¶ Gets the details of anomalies from the database, using the URL arguments that are passed in by the
request.args
to build the MySQL select query string and queries the database, parses the results and creates an array of the anomalies that matched the query and creates the panaroma.json
file, then returns the array. The Webapp needs both the array and the JSONP file to serve to the browser for the client side panaroma.js
. Parameters: None – determined from request.args
Returns: array Return type: array Note
And creates
panaroma.js
for client side javascript
skyline.webapp.gunicorn module¶
skyline.webapp.ionosphere_backend module¶
skyline.webapp.utilities module¶
skyline.webapp.webapp module¶
Module contents¶
Submodules¶
skyline.algorithm_exceptions module¶
skyline.database module¶
skyline.features_profile module¶
skyline.ionosphere_functions module¶
skyline.settings module¶
Shared settings
IMPORTANT NOTE
You may find it easier to read some of these settings documentation strings at http://earthgecko-skyline.readthedocs.io/en/latest/skyline.html#module-settings
-
REDIS_SOCKET_PATH
= '/tmp/redis.sock'¶ Variables: REDIS_SOCKET_PATH (str) – The path for the Redis unix socket
-
LOG_PATH
= '/var/log/skyline'¶ Variables: LOG_PATH (str) – The Skyline logs directory. Do not include a trailing slash.
-
PID_PATH
= '/var/run/skyline'¶ Variables: PID_PATH (str) – The Skyline pids directory. Do not include a trailing slash.
-
SKYLINE_TMP_DIR
= '/tmp/skyline'¶ Variables: SKYLINE_TMP_DIR (str) – The Skyline tmp dir. Do not include a trailing slash. It is recommended you keep this in the /tmp directory which normally uses tmpfs.
-
FULL_NAMESPACE
= 'metrics.'¶ Variables: FULL_NAMESPACE (str) – Metrics will be prefixed with this value in Redis.
-
GRAPHITE_SOURCE
= ''¶ Variables: GRAPHITE_SOURCE (str) – The data source
-
ENABLE_DEBUG
= False¶ Variables: ENABLE_DEBUG (str) – Enable additional debug logging - useful for development only, this should definitely be set to False on production systems.
-
MINI_NAMESPACE
= 'mini.'¶ Variables: MINI_NAMESPACE (str) – The Horizon agent will make T’d writes to both the full namespace and the mini namespace. Oculus gets its data from everything in the mini namespace.
-
FULL_DURATION
= 86400¶ Variables: FULL_DURATION (str) – This is the rolling duration that will be stored in Redis. Be sure to pick a value that suits your memory capacity, your CPU capacity and your overall metrics count. Longer durations take longer to analyze, but they can help the algorithms reduce the noise and provide more accurate anomaly detection.
-
MINI_DURATION
= 3600¶ Variables: MINI_DURATION (str) – This is the duration of the ‘mini’ namespace, if you are also using the Oculus service. It is also the duration of data that is displayed in the Webapp ‘mini’ view.
-
GRAPHITE_HOST
= 'YOUR_GRAPHITE_HOST.example.com'¶ Variables: GRAPHITE_HOST (str) – If you have a Graphite host set up, set this metric to get graphs on Skyline and Horizon. Don’t include http:// since this is used for carbon host as well.
-
GRAPHITE_PROTOCOL
= 'http'¶ Variables: GRAPHITE_PROTOCOL (str) – Graphite host protocol - http or https
-
GRAPHITE_PORT
= '80'¶ Variables: GRAPHITE_PORT (str) – Graphite host port - for a specific port if graphite runs on a port other than 80, e.g. ‘8888’
-
GRAPHITE_CONNECT_TIMEOUT
= 5¶ Variables: GRAPHITE_CONNECT_TIMEOUT (int) – Graphite connect timeout - this allows for the graceful failure of any graphite requests so that no graphite related functions ever block for too long.
-
GRAPHITE_READ_TIMEOUT
= 10¶ Variables: GRAPHITE_READ_TIMEOUT (int) – Graphite read timeout
-
GRAPHITE_GRAPH_SETTINGS
= '&width=588&height=308&bgcolor=000000&fontBold=true&fgcolor=C0C0C0'¶ Variables: GRAPHITE_GRAPH_SETTINGS (str) – These are graphite settings in terms of alert graphs - this is defaulted to a format that is more colourblind friendly than the default graphite graphs.
-
TARGET_HOURS
= '7'¶ Variables: TARGET_HOURS (str) – The number of hours data to graph in alerts.
-
GRAPH_URL
= 'http://YOUR_GRAPHITE_HOST.example.com:80/render/?width=1400&from=-7hour&target='¶ Variables: GRAPH_URL (str) – The graphite URL for alert graphs will be appended with the relevant metric name in each alert. Note
There is probably no need to change this unless you want a different size graph sent with alerts.
-
CARBON_PORT
= 2003¶ Variables: CARBON_PORT (int) – If you have a Graphite host set up, set its Carbon port.
-
OCULUS_HOST
= ''¶ Variables: OCULUS_HOST (str) – If you have Oculus set up, set this to http://<OCULUS_HOST>
- If you do not want to use Oculus, leave this empty. However if you comment this out, Skyline will not work! Speed improvements will occur when Oculus support is disabled.
-
SERVER_METRICS_NAME
= 'YOUR_HOSTNAME'¶ Variables: SERVER_METRICS_NAME (str) – The hostname of the Skyline. - This is to allow for multiple Skyline nodes to send metrics to a Graphite
instance on the Skyline namespace sharded by this setting, like carbon.relays.
If you want multiple Skyline hosts, set the hostname of the skyline here and
metrics will be as e.g.
skyline.analyzer.skyline-01.run_time
-
MIRAGE_CHECK_PATH
= '/opt/skyline/mirage/check'¶ Variables: MIRAGE_CHECK_PATH (str) – This is the location the Skyline analyzer will write the second order resolution anomalies to check to a file on disk - absolute path
-
CRUCIBLE_CHECK_PATH
= '/opt/skyline/crucible/check'¶ Variables: CRUCIBLE_CHECK_PATH (str) – This is the location the Skyline apps will write the anomalies to for crucible to check to a file on disk - absolute path
-
PANORAMA_CHECK_PATH
= '/opt/skyline/panorama/check'¶ Variables: PANORAMA_CHECK_PATH (str) – This is the location the Skyline apps will write the anomalies to for Panorama to check to a file on disk - absolute path
-
PANDAS_VERSION
= '0.18.1'¶ Variables: PANDAS_VERSION (str) – Pandas version in use - Declaring the version of pandas in use reduces a large amount of interpolating in all the skyline modules. There are some differences from pandas >= 0.18.0 however the original Skyline could run on lower versions of pandas.
-
ALERTERS_SETTINGS
= True¶ Note
Alerters can be enabled or disabled here due to the fact that not everyone will necessarily want all 3rd party alerters. Enable the 3rd party alerters you require here. This ensures only the alerters that are required are imported and means that not all alerter related modules in
requirements.txt
have to be installed, only those you require.
-
SYSLOG_ENABLED
= True¶ Variables: SYSLOG_ENABLED (boolean) – Alerter - enables Skyline apps to submit anomalous metric details to syslog.
-
HIPCHAT_ENABLED
= False¶ Variables: HIPCHAT_ENABLED (boolean) – Enables the Hipchat alerter
-
PAGERDUTY_ENABLED
= False¶ Variables: PAGERDUTY_ENABLED (boolean) – Enables the Pagerduty alerter
-
SLACK_ENABLED
= False¶ Variables: SLACK_ENABLED (boolean) – Enables the Slack alerter
-
ANOMALY_DUMP
= 'webapp/static/dump/anomalies.json'¶ Variables: ANOMALY_DUMP (str) – This is the location the Skyline agent will write the anomalies file to disk. It needs to be in a location accessible to the webapp.
-
ANALYZER_PROCESSES
= 1¶ Variables: ANALYZER_PROCESSES (int) – This is the number of processes that the Skyline Analyzer will spawn. - Analysis is a very CPU-intensive procedure. You will see optimal results if you set ANALYZER_PROCESSES to several less than the total number of CPUs on your server. Be sure to leave some CPU room for the Horizon workers and for Redis.
- IMPORTANTLY bear in mind that your Analyzer run should be able to analyze all your metrics within the resolution of your metrics. So for example if you have 1000 metrics at a resolution of 60 seconds (e.g. one datapoint per 60 seconds), you are aiming to analyze all of those within 60 seconds. If you do not, the anomaly detection begins to lag and is no longer really near real-time. That stated, bear in mind if you are not processing 10s of 1000s of metrics, you may only need one Analyzer process. To determine your optimal settings take note of 'seconds to run' values in the Analyzer log.
-
ANALYZER_OPTIMUM_RUN_DURATION
= 60¶ Variables: ANALYZER_OPTIMUM_RUN_DURATION (int) – This is how many seconds it would be optimum for Analyzer to be able to analyze all your metrics in. Note
In the original Skyline this was hardcoded to 5.
-
MAX_ANALYZER_PROCESS_RUNTIME
= 180¶ Variables: MAX_ANALYZER_PROCESS_RUNTIME (int) – What is the maximum number of seconds an Analyzer process should run analysing a set of assigned_metrics
- This is for Analyzer to self monitor its own analysis threads and terminate any threads that have run longer than this. Although Analyzer and multiprocessing are very stable, there are edge cases in real world operations which can very infrequently cause a process to hang.
-
STALE_PERIOD
= 500¶ Variables: STALE_PERIOD (int) – This is the duration, in seconds, for a metric to become ‘stale’ and for the analyzer to ignore it until new datapoints are added. ‘Staleness’ means that a datapoint has not been added for STALE_PERIOD seconds.
-
MIN_TOLERABLE_LENGTH
= 1¶ Variables: MIN_TOLERABLE_LENGTH (int) – This is the minimum length of a timeseries, in datapoints, for the analyzer to recognize it as a complete series.
-
MAX_TOLERABLE_BOREDOM
= 100¶ Variables: MAX_TOLERABLE_BOREDOM (int) – Sometimes a metric will continually transmit the same number. There’s no need to analyze metrics that remain boring like this, so this setting determines the amount of boring datapoints that will be allowed to accumulate before the analyzer skips over the metric. If the metric becomes noisy again, the analyzer will stop ignoring it.
-
BOREDOM_SET_SIZE
= 1¶ Variables: BOREDOM_SET_SIZE (int) – By default, the analyzer skips a metric if it has transmitted a single number settings.MAX_TOLERABLE_BOREDOM
times. - Change this setting if you wish the size of the ignored set to be higher (ie,
ignore the metric if there have only been two different values for the past
settings.MAX_TOLERABLE_BOREDOM
datapoints). This is useful for timeseries that often oscillate between two values.
-
CANARY_METRIC
= 'statsd.numStats'¶ Variables: CANARY_METRIC (str) – The metric name to use as the CANARY_METRIC - The canary metric should be a metric with a very high, reliable resolution
that you can use to gauge the status of the system as a whole. Like the
statsd.numStats
or a metric in the carbon.
namespace
-
ALGORITHMS
= ['histogram_bins', 'first_hour_average', 'stddev_from_average', 'grubbs', 'ks_test', 'mean_subtraction_cumulation', 'median_absolute_deviation', 'stddev_from_moving_average', 'least_squares']¶ Variables: ALGORITHMS (array) – These are the algorithms that the Analyzer will run. To add a new algorithm, you must both define the algorithm in algorithms.py and add its name here.
-
CONSENSUS
= 6¶ Variables: CONSENSUS (int) – This is the number of algorithms that must return True before a metric is classified as anomalous by Analyzer.
-
RUN_OPTIMIZED_WORKFLOW
= True¶ Variables: RUN_OPTIMIZED_WORKFLOW (boolean) – This sets Analyzer to run in an optimized manner. - This sets Analyzer to run in an optimized manner in terms of using the CONSENSUS setting to dynamically determine in what order and how many algorithms need to be run to be able to achieve CONSENSUS. This reduces the amount of work that Analyzer has to do per run. It is recommended that this be set to True in most circumstances to ensure that Analyzer is run as efficiently as possible, UNLESS you are working on algorithm development, in which case you may want this to be False
-
ENABLE_ALGORITHM_RUN_METRICS
= True¶ Variables: ENABLE_ALGORITHM_RUN_METRICS (boolean) – This enables algorithm timing metrics to Graphite - This will send additional metrics to the graphite namespaces of:
skyline.analyzer.<hostname>.algorithm_breakdown.<algorithm_name>.timings.median_time
skyline.analyzer.<hostname>.algorithm_breakdown.<algorithm_name>.timings.times_run
skyline.analyzer.<hostname>.algorithm_breakdown.<algorithm_name>.timings.total_time
These are related to the RUN_OPTIMIZED_WORKFLOW performance tuning.
-
ENABLE_ALL_ALGORITHMS_RUN_METRICS
= False¶ Variables: ENABLE_ALL_ALGORITHMS_RUN_METRICS (boolean) – DEVELOPMENT only - run and time all Warning
If set to
True
, Analyzer will revert to its original unoptimized workflow and will run and time all algorithms against all timeseries.
-
ENABLE_SECOND_ORDER
= False¶ Variables: ENABLE_SECOND_ORDER (boolean) – This is to enable second order anomalies. Warning
EXPERIMENTAL - This is an experimental feature, so it’s turned off by default.
-
ENABLE_ALERTS
= True¶ Variables: ENABLE_ALERTS (boolean) – This enables Analyzer alerting.
-
ENABLE_MIRAGE
= False¶ Variables: ENABLE_MIRAGE (boolean) – This enables Analyzer to output to Mirage
-
ENABLE_FULL_DURATION_ALERTS
= True¶ Variables: ENABLE_FULL_DURATION_ALERTS (boolean) – This enables Analyzer to alert on all FULL_DURATION anomalies. - This enables FULL_DURATION alerting for Analyzer, if
True
Analyzer will send ALL alerts on any alert tuple that has a SECOND_ORDER_RESOLUTION_HOURS
value defined for Mirage in their alert tuple. If False
Analyzer will only add a Mirage check and allow Mirage to do the alerting.
Note
If you have Mirage enabled and have defined
SECOND_ORDER_RESOLUTION_HOURS
values in the desired metric alert tuples, you want this set to False
-
ANALYZER_CRUCIBLE_ENABLED
= False¶ Variables: ANALYZER_CRUCIBLE_ENABLED (boolean) – This enables Analyzer to output to Crucible - This enables Analyzer to send Crucible data, if this is set to
True
ensure that settings.CRUCIBLE_ENABLED
is also set to True
in the Crucible settings block.
Warning
Not recommended for production, this will make a LOT of data files in the
settings.CRUCIBLE_DATA_FOLDER
-
ALERTS
= (('skyline', 'smtp', 1800), ('skyline_test.alerters.test', 'smtp', 1800), ('skyline_test.alerters.test', 'hipchat', 1800), ('skyline_test.alerters.test', 'pagerduty', 1800))¶ Variables: ALERTS (tuples) – This enables analyzer alerting. This is the config for which metrics to alert on and which strategy to use for each. Alerts will not fire twice within
EXPIRATION_TIME
, even if they trigger again.Tuple schema example:
ALERTS = (
    # ('<metric_namespace>', '<alerter>', EXPIRATION_TIME, SECOND_ORDER_RESOLUTION_HOURS),
    # With SECOND_ORDER_RESOLUTION_HOURS being optional for Mirage
    ('metric1', 'smtp', 1800),
    ('important_metric.total', 'smtp', 600),
    ('important_metric.total', 'pagerduty', 1800),
    ('metric3', 'hipchat', 600),
    # Log all anomalies to syslog
    ('stats.', 'syslog', 1),
    # Wildcard namespaces can be used as well
    ('metric4.thing.*.requests', 'smtp', 900),
    # However beware of wildcards as the above wildcard should really be
    ('metric4.thing\..*.\.requests', 'smtp', 900),
    # mirage - SECOND_ORDER_RESOLUTION_HOURS - if added and Mirage is enabled
    ('metric5.thing.*.rpm', 'smtp', 900, 168),
)
Alert tuple parameters are:
Parameters: - metric (str) – metric name.
- alerter (str) – the alerter name e.g. smtp, syslog, hipchat, pagerduty
- EXPIRATION_TIME (int) – Alerts will not fire twice within this amount of seconds, even if they trigger again.
- SECOND_ORDER_RESOLUTION_HOURS (int) – (optional) The number of hours that Mirage should surface the metric timeseries for
Note
Consider using the default skyline_test.alerters.test for testing alerts with.
-
PLOT_REDIS_DATA
= True¶ Variables: PLOT_REDIS_DATA (boolean) – Plot a graph using Redis timeseries data with Analyzer alerts. - There are times when Analyzer alerts have no data in the Graphite graphs and/or the data in the Graphite graph is skewed due to retentions aggregation. This mitigates that by creating a graph using the Redis timeseries data and embedding the image in the Analyzer alerts as well.
Note
The Redis data plot has the following additional information as well: the 3sigma upper (and if applicable lower) bounds and the mean are plotted and reported too. Although less is often more effective, in this case getting a visualisation of the 3sigma boundaries is informative.
-
NON_DERIVATIVE_MONOTONIC_METRICS
= ['the_namespace_of_the_monotonic_metric_to_not_calculate_the_derivative_for']¶ Variables: NON_DERIVATIVE_MONOTONIC_METRICS (list) – Strictly increasing monotonic metrics for which the derivative values should not be calculated. By default Skyline automatically converts strictly increasing monotonic metric values to their derivative values by calculating the delta between subsequent datapoints. The function ignores datapoints that trend down. This is useful for metrics that increase over time and then reset.
Any strictly increasing monotonic metrics that you do not want Skyline to convert to derivative values are declared here. This list works in the same way the Horizon SKIP_LIST does; it matches on the string or the dotted namespace elements.
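A rough sketch of the idea described above (not the actual Skyline function, the name is illustrative): convert a strictly increasing counter into per interval deltas, skipping any datapoint that trends down, such as a counter reset:

    def nonnegative_derivative(timeseries):
        # Convert a monotonically increasing (timestamp, value) series into
        # per datapoint deltas, ignoring any datapoint that trends down
        derivative = []
        previous = None
        for timestamp, value in timeseries:
            if previous is not None and value >= previous:
                derivative.append((timestamp, value - previous))
            previous = value
        return derivative

    # e.g. [(1, 100), (2, 103), (3, 1)] -> [(2, 3)], the reset at t=3 is skipped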
-
SMTP_OPTS
= {'embed-images': True, 'default_recipient': ['you@your_domain.com'], 'sender': 'skyline@your_domain.com', 'recipients': {'skyline': ['you@your_domain.com', 'them@your_domain.com'], 'skyline_test.alerters.test': ['you@your_domain.com']}}¶ Variables: SMTP_OPTS (dictionary) – Your SMTP settings. Note
For each alert tuple defined in
settings.ALERTS
you need a recipient defined that matches the namespace. The default_recipient acts as a catchall for any alert tuple that does not have a matching recipients defined.
-
HIPCHAT_OPTS
= {'color': 'purple', 'auth_token': 'hipchat_auth_token', 'sender': 'hostname or identifier', 'rooms': {'skyline': (12345,), 'skyline_test.alerters.test': (12345,)}}¶ Variables: HIPCHAT_OPTS (dictionary) – Your Hipchat settings. HipChat alerts require python-simple-hipchat
-
PAGERDUTY_OPTS
= {'auth_token': 'your_pagerduty_auth_token', 'subdomain': 'example', 'key': 'your_pagerduty_service_api_key'}¶ Variables: PAGERDUTY_OPTS (dictionary) – Your PagerDuty settings. PagerDuty alerts require pygerduty
-
SYSLOG_OPTS
= {'ident': 'skyline'}¶ Variables: SYSLOG_OPTS (dictionary) – Your syslog settings. syslog alerts require an ident; this adds a LOG_WARNING message to LOG_LOCAL4 which will ship to any syslog or rsyslog down the line. The
EXPIRATION_TIME
for the syslog alert method should be set to 1 to fire every anomaly into the syslog.
-
WORKER_PROCESSES
= 2¶ Variables: WORKER_PROCESSES (int) – This is the number of worker processes that will consume from the Horizon queue.
-
HORIZON_IP
= '0.0.0.0'¶ Variables: HORIZON_IP (str) – The IP address for Horizon to bind to. Defaults to gethostname()
-
PICKLE_PORT
= 2024¶ Variables: PICKLE_PORT (str) – This is the port that listens for Graphite pickles over TCP, sent by Graphite’s carbon-relay agent.
-
UDP_PORT
= 2025¶ Variables: UDP_PORT (str) – This is the port that listens for Messagepack-encoded UDP packets.
-
CHUNK_SIZE
= 10¶ Variables: CHUNK_SIZE (int) – This is how big a ‘chunk’ of metrics will be before they are added onto the shared queue for processing into Redis. - If you are noticing that Horizon is having trouble consuming metrics, try setting this value a bit higher.
-
MAX_QUEUE_SIZE
= 500¶ Variables: MAX_QUEUE_SIZE (int) – Maximum allowable length of the processing queue This is the maximum allowable length of the processing queue before new chunks are prevented from being added. If you consistently fill up the processing queue, a higher MAX_QUEUE_SIZE will not save you. It most likely means that the workers do not have enough CPU allotted in order to process the queue on time. Try increasing
settings.CHUNK_SIZE
and decreasingsettings.ANALYZER_PROCESSES
or decreasingsettings.ROOMBA_PROCESSES
-
ROOMBA_PROCESSES
= 1¶ Variables: ROOMBA_PROCESSES (int) – This is the number of Roomba processes that will be spawned to trim timeseries in order to keep them at settings.FULL_DURATION
. Keep this number small, as it is not important that metrics be exactlysettings.FULL_DURATION
all the time.
-
ROOMBA_GRACE_TIME
= 600¶ Variables: ROOMBA_GRACE_TIME – Seconds grace. Normally Roomba will clean up everything that is older than settings.FULL_DURATION. If you have metrics that are not coming in every second, it can happen that you'll end up with INCOMPLETE metrics. With this setting Roomba will clean up everything that is older than settings.FULL_DURATION + settings.ROOMBA_GRACE_TIME
-
ROOMBA_TIMEOUT
= 100¶ Variables: ROOMBA_TIMEOUT (int) – Timeout in seconds This is the number of seconds that a Roomba process can be expected to run before it is terminated. Roomba should really be expected to have run within 100 seconds in general. Roomba is run in a multiprocessing subprocess, however there are certain conditions that could arise that could cause Roomba to stall, I/O wait being one such edge case. Although 99.999% of the time Roomba is fine, this ensures that no Roombas hang around longer than expected.
-
MAX_RESOLUTION
= 1000¶ Variables: MAX_RESOLUTION (int) – The Horizon agent will ignore incoming datapoints if their timestamp is older than MAX_RESOLUTION seconds ago.
-
SKIP_LIST
= ['skyline.analyzer.', 'skyline.boundary.', 'skyline.ionosphere.', 'skyline.mirage.']¶ Variables: SKIP_LIST (list) – Metrics to skip These are metrics that, for whatever reason, you do not want to analyze in Skyline. The Worker will check to see if each incoming metric contains anything in the skip list. It is generally wise to skip entire namespaces by adding a '.' at the end of the skipped item - otherwise you might skip things you do not intend to. For example the default
skyline.analyzer.anomaly_breakdown.
which MUST be skipped to prevent crazy feedback. These SKIP_LIST items are also matched on dotted namespace elements too; if a match is not found in the string, then the dotted elements are compared. For example if an item such as 'skyline.analyzer.algorithm_breakdown' was added it would match any metric that matched all 3 dotted namespace elements, so it would match:
skyline.analyzer.skyline-1.algorithm_breakdown.histogram_bins.timing.median_time
skyline.analyzer.skyline-1.algorithm_breakdown.histogram_bins.timing.times_run
skyline.analyzer.skyline-1.algorithm_breakdown.ks_test.timing.times_run
-
DO_NOT_SKIP_LIST
= ['skyline.analyzer.run_time', 'skyline.boundary.run_time', 'skyline.analyzer.ionosphere_metrics', 'skyline.analyzer.mirage_metrics', 'skyline.analyzer.total_analyzed', 'skyline.analyzer.total_anomalies']¶ Variables: DO_NOT_SKIP_LIST (list) – Metrics not to skip These are metrics that you want Skyline to analyze even if they match a namespace in the SKIP_LIST. Works in the same way that SKIP_LIST does, it matches on the string or the dotted namespace elements.
-
PANORAMA_ENABLED
= True¶ Variables: PANORAMA_ENABLED (boolean) – Enable Panorama
-
PANORAMA_PROCESSES
= 1¶ Variables: PANORAMA_PROCESSES – Number of processes to assign to Panorama, should never need more than 1
-
ENABLE_PANORAMA_DEBUG
= False¶ Variables: ENABLE_PANORAMA_DEBUG (boolean) – DEVELOPMENT only - enables additional debug logging useful for development only, this should definitely be set to False
on a production system as it produces LOTS of output
-
PANORAMA_DATABASE
= 'skyline'¶ Variables: PANORAMA_DATABASE (str) – The database schema name
-
PANORAMA_DBHOST
= '127.0.0.1'¶ Variables: PANORAMA_DBHOST (str) – The IP address or FQDN of the database server
-
PANORAMA_DBPORT
= '3306'¶ Variables: PANORAMA_DBPORT (str) – The port to connect to the database server on
-
PANORAMA_DBUSER
= 'skyline'¶ Variables: PANORAMA_DBUSER (str) – The database user
-
PANORAMA_DBUSERPASS
= 'the_user_password'¶ Variables: PANORAMA_DBUSERPASS (str) – The database user password
-
NUMBER_OF_ANOMALIES_TO_STORE_IN_PANORAMA
= 0¶ Variables: NUMBER_OF_ANOMALIES_TO_STORE_IN_PANORAMA (int) – The number of anomalies to store in the Panorama database, the default is 0 which means UNLIMITED. This does nothing currently.
-
PANORAMA_EXPIRY_TIME
= 900¶ Variables: PANORAMA_EXPIRY_TIME (int) – Panorama will only store one anomaly for a metric every PANORAMA_EXPIRY_TIME seconds. - This is the Panorama sample rate. Please bear in mind Panorama does not use the ALERTS time expiry keys or matching, Panorama records every anomaly, even if the metric is not in an alert tuple. Consider that a metric could and does often fire as anomalous every minute, until it no longer is.
-
PANORAMA_CHECK_MAX_AGE
= 300¶ Variables: PANORAMA_CHECK_MAX_AGE (int) – Panorama will only process a check file if it is not older than PANORAMA_CHECK_MAX_AGE seconds. If it is set to 0 it does all. This setting just ensures if Panorama stalls for some hours and is restarted, the user can choose to discard older checks and miss anomalies being recorded if they so choose to, to prevent Panorama stampeding against MySQL if something went down and Panorama comes back online with lots of checks.
-
MIRAGE_DATA_FOLDER
= '/opt/skyline/mirage/data'¶ Variables: MIRAGE_DATA_FOLDER (str) – This is the path for the Mirage data folder where timeseries data that has been surfaced will be written - absolute path
-
MIRAGE_ALGORITHMS
= ['first_hour_average', 'mean_subtraction_cumulation', 'stddev_from_average', 'stddev_from_moving_average', 'least_squares', 'grubbs', 'histogram_bins', 'median_absolute_deviation', 'ks_test']¶ Variables: MIRAGE_ALGORITHMS (array) – These are the algorithms that Mirage will run. To add a new algorithm, you must both define the algorithm in
mirage/mirage_algorithms.py
and add its name here.
-
MIRAGE_STALE_SECONDS
= 120¶ Variables: MIRAGE_STALE_SECONDS (int) – The number of seconds after which a check is considered stale and discarded.
-
MIRAGE_CONSENSUS
= 6¶ Variables: MIRAGE_CONSENSUS (int) – This is the number of algorithms that must return True
before a metric is classified as anomalous.
-
MIRAGE_ENABLE_SECOND_ORDER
= False¶ Variables: MIRAGE_ENABLE_SECOND_ORDER (boolean) – This is to enable second order anomalies. Warning
EXPERIMENTAL - This is an experimental feature, so it’s turned off by default.
-
MIRAGE_ENABLE_ALERTS
= False¶ Variables: MIRAGE_ENABLE_ALERTS (boolean) – This enables Mirage alerting.
-
NEGATE_ANALYZER_ALERTS
= False¶ Variables: NEGATE_ANALYZER_ALERTS (boolean) – DEVELOPMENT only - negates Analyzer alerts This enables Mirage to negate Analyzer alerts. Mirage will send out an alert for every anomaly that Analyzer sends to Mirage that is NOT anomalous at the
SECOND_ORDER_RESOLUTION_HOURS
with aSECOND_ORDER_RESOLUTION_HOURS
graph and the Analyzersettings.FULL_DURATION
graph embedded. Mostly for testing and comparison of analysis at different time ranges and/or algorithms.
-
MIRAGE_CRUCIBLE_ENABLED
= False¶ Variables: MIRAGE_CRUCIBLE_ENABLED (boolean) – This enables Mirage to output to Crucible This enables Mirage to send Crucible data, if this is set to
True
ensure that settings.CRUCIBLE_ENABLED
is also set to True
in the Crucible settings block.Warning
Not recommended for production, this will make a LOT of data files in the
settings.CRUCIBLE_DATA_FOLDER
-
BOUNDARY_PROCESSES
= 1¶ Variables: BOUNDARY_PROCESSES (int) – The number of processes that Boundary should spawn. Seeing as Boundary analysis is focused on specific metrics this should be less than the number of
settings.ANALYZER_PROCESSES
.
-
BOUNDARY_OPTIMUM_RUN_DURATION
= 60¶ Variables: BOUNDARY_OPTIMUM_RUN_DURATION – This is how many seconds it would be optimum for Boundary to be able to analyze your Boundary defined metrics in. This largely depends on your metric resolution e.g. 1 datapoint per 60 seconds and how many metrics you are running through Boundary.
-
ENABLE_BOUNDARY_DEBUG
= False¶ Variables: ENABLE_BOUNDARY_DEBUG (boolean) – Enables Boundary debug logging - Enable additional debug logging - useful for development only, this should definitely be set to False on a production system - LOTS of output
-
BOUNDARY_ALGORITHMS
= ['detect_drop_off_cliff', 'greater_than', 'less_than']¶ Variables: BOUNDARY_ALGORITHMS (array) – Algorithms that Boundary can run - These are the algorithms that boundary can run. To add a new algorithm, you must both define the algorithm in boundary_algorithms.py and add its name here.
-
BOUNDARY_ENABLE_ALERTS
= False¶ Variables: BOUNDARY_ENABLE_ALERTS (boolean) – Enables Boundary alerting
-
BOUNDARY_CRUCIBLE_ENABLED
= False¶ Variables: BOUNDARY_CRUCIBLE_ENABLED (boolean) – Enables and disables Boundary pushing data to Crucible This enables Boundary to send Crucible data, if this is set to
True
ensure that settings.CRUCIBLE_ENABLED
is also set to True
in the Crucible settings block.Warning
Not recommended for production, this will make a LOT of data files in the
settings.CRUCIBLE_DATA_FOLDER
-
BOUNDARY_METRICS
= (('skyline_test.alerters.test', 'greater_than', 1, 0, 0, 0, 1, 'smtp|hipchat|pagerduty'), ('metric1', 'detect_drop_off_cliff', 1800, 500, 3600, 0, 2, 'smtp'), ('metric2.either', 'less_than', 3600, 0, 0, 15, 2, 'smtp|hipchat'), ('nometric.other', 'greater_than', 3600, 0, 0, 100000, 1, 'smtp'))¶ Variables: BOUNDARY_METRICS (tuple) – definitions of metrics for Boundary to analyze This is the config for metrics to analyse with the boundary algorithms. It is advisable that you only specify high rate metrics and global metrics here; although the algorithms should work with low rate metrics, the smaller the range, the smaller a cliff drop of change is, meaning more noise, however some algorithms are pre-tuned to use different trigger values on different ranges to pre-filter some noise.
Tuple schema:
BOUNDARY_METRICS = (
    ('metric1', 'algorithm1', EXPIRATION_TIME, MIN_AVERAGE, MIN_AVERAGE_SECONDS, TRIGGER_VALUE, ALERT_THRESHOLD, 'ALERT_VIAS'),
    ('metric2', 'algorithm2', EXPIRATION_TIME, MIN_AVERAGE, MIN_AVERAGE_SECONDS, TRIGGER_VALUE, ALERT_THRESHOLD, 'ALERT_VIAS'),
    # Wildcard namespaces can be used as well
    ('metric.thing.*.requests', 'algorithm1', EXPIRATION_TIME, MIN_AVERAGE, MIN_AVERAGE_SECONDS, TRIGGER_VALUE, ALERT_THRESHOLD, 'ALERT_VIAS'),
)
Metric parameters (all are required):
Parameters: - metric (str) – metric name.
- algorithm (str) – algorithm name.
- EXPIRATION_TIME (int) – Alerts will not fire twice within this amount of seconds, even if they trigger again.
- MIN_AVERAGE (int) – the minimum average value to evaluate for
boundary_algorithms.detect_drop_off_cliff()
, in theboundary_algorithms.less_than()
andboundary_algorithms.greater_than()
algorithm contexts set this to 0. - MIN_AVERAGE_SECONDS (int) – the seconds to calculate the minimum average value
over in
boundary_algorithms.detect_drop_off_cliff()
. So ifMIN_AVERAGE
set to 100 andMIN_AVERAGE_SECONDS
to 3600 a metric will only be analysed if the average value of the metric over 3600 seconds is greater than 100. For theboundary_algorithms.less_than()
andboundary_algorithms.greater_than()
algorithms set this to 0. - TRIGGER_VALUE (int) – then less_than or greater_than trigger value set to 0 for
boundary_algorithms.detect_drop_off_cliff()
- ALERT_THRESHOLD (int) – alert after detected x times. This allows you to set
how many times a timeseries has to be detected by the algorithm as anomalous
before alerting on it. The nature of distributed metric collection, storage
and analysis can have a lag every now and then due to latency, I/O pause,
etc. Boundary algorithms can be sensitive to this, not unexpectedly. This
setting should be 1, maybe 2 at maximum, to ensure that signals are not being
suppressed. Try 1; if you are getting the occasional false positive, try 2.
Note - Any
boundary_algorithms.greater_than()
metrics should have this as 1. - ALERT_VIAS (str) – pipe separated alerters to send to.
Wildcard and absolute metric paths. Currently the only supported metric namespaces are a parent namespace and an absolute metric path e.g.
Examples:
('stats_counts.someapp.things', 'detect_drop_off_cliff', 1800, 500, 3600, 0, 2, 'smtp'),
('stats_counts.someapp.things.an_important_thing.requests', 'detect_drop_off_cliff', 600, 100, 3600, 0, 2, 'smtp|pagerduty'),
('stats_counts.otherapp.things.*.requests', 'detect_drop_off_cliff', 600, 500, 3600, 0, 2, 'smtp|hipchat'),
In the above all
stats_counts.someapp.things*
would be painted with a 1800EXPIRATION_TIME
and 500MIN_AVERAGE
, but those values would be overridden by 600 and 100 for stats_counts.someapp.things.an_important_thing.requests
and pagerduty added.
-
BOUNDARY_AUTOAGGRERATION
= False¶ Variables: BOUNDARY_AUTOAGGRERATION (boolean) – Enables autoaggregation of a timeseries This is used to autoaggregate a timeseries with
autoaggregate_ts()
, if a timeseries dataset has 6 data points per minute but only one data value every minute then autoaggregate can be used to aggregate the required sample.
-
BOUNDARY_AUTOAGGRERATION_METRICS
= (('nometrics.either', 60),)¶ Variables: BOUNDARY_AUTOAGGRERATION_METRICS (tuples) – The namespaces to autoaggregate Tuple schema example:
BOUNDARY_AUTOAGGRERATION_METRICS = ( ('metric1', AGGREGATION_VALUE), )
Metric tuple parameters are:
Parameters: - metric (str) – metric name.
- AGGREGATION_VALUE (int) – the aggregation value in seconds.
Declare the namespace and aggregation value in seconds by which you want the timeseries aggregated. To aggregate a timeseries to minutely values use 60 as the
AGGREGATION_VALUE
, e.g. sum metric datapoints by minute
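As a minimal sketch of the aggregation idea only (this is not the autoaggregate_ts() implementation, just an illustration of summing datapoints into AGGREGATION_VALUE second buckets):

    def aggregate_timeseries(timeseries, aggregation_value):
        # Sum (timestamp, value) datapoints into buckets of aggregation_value
        # seconds, e.g. 60 to aggregate a timeseries to minutely sums
        buckets = {}
        for timestamp, value in timeseries:
            bucket = int(timestamp) - (int(timestamp) % int(aggregation_value))
            buckets[bucket] = buckets.get(bucket, 0) + value
        return sorted(buckets.items())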
-
BOUNDARY_ALERTER_OPTS
= {'alerter_expiration_time': {'pagerduty': 1800, 'hipchat': 1800, 'smtp': 60}, 'alerter_limit': {'pagerduty': 15, 'hipchat': 30, 'smtp': 100}}¶ Variables: BOUNDARY_ALERTER_OPTS (dictionary) – Your Boundary alerter settings. Note
Boundary Alerting Because you may want to alert multiple channels on each metric and algorithm, Boundary has its own alerting settings, similar to Analyzer. However due to the nature of Boundary and its algorithms it could be VERY noisy and expensive if all your metrics dropped off a cliff. So Boundary alerting introduces the ability to limit overall alerts to an alerter channel. These limits use the same methodology that the alerts use, but each alerter is keyed too.
-
BOUNDARY_SMTP_OPTS
= {'embed-images': True, 'sender': 'skyline-boundary@your_domain.com', 'recipients': {'nometrics': ['you@your_domain.com', 'them@your_domain.com'], 'skyline_test.alerters.test': ['you@your_domain.com'], 'nometrics.either': ['you@your_domain.com', 'another@some-company.com']}, 'graphite_graph_line_color': 'pink', 'graphite_previous_hours': 7, 'default_recipient': ['you@your_domain.com']}¶ Variables: BOUNDARY_SMTP_OPTS (dictionary) – Your SMTP settings.
-
BOUNDARY_HIPCHAT_OPTS
= {'sender': 'hostname or identifier', 'graphite_graph_line_color': 'pink', 'color': 'purple', 'auth_token': 'hipchat_auth_token', 'graphite_previous_hours': 7, 'rooms': {'nometrics': (12345,), 'skyline_test.alerters.test': (12345,)}}¶ Variables: BOUNDARY_HIPCHAT_OPTS (dictionary) – Your Hipchat settings. HipChat alerts require python-simple-hipchat
-
BOUNDARY_PAGERDUTY_OPTS
= {'auth_token': 'your_pagerduty_auth_token', 'subdomain': 'example', 'key': 'your_pagerduty_service_api_key'}¶ Variables: BOUNDARY_PAGERDUTY_OPTS (dictionary) – Your PagerDuty settings. PagerDuty alerts require pygerduty
-
ENABLE_CRUCIBLE
= True¶ Variables: ENABLE_CRUCIBLE (boolean) – Enable Crucible.
-
CRUCIBLE_PROCESSES
= 1¶ Variables: CRUCIBLE_PROCESSES (int) – The number of processes that Crucible should spawn.
-
CRUCIBLE_TESTS_TIMEOUT
= 60¶ Variables: CRUCIBLE_TESTS_TIMEOUT (int) – This is the number of seconds that Crucible tests can take. 60 is a reasonable default for a run with a settings.FULL_DURATION
of 86400
-
ENABLE_CRUCIBLE_DEBUG
= False¶ Variables: ENABLE_CRUCIBLE_DEBUG (boolean) – DEVELOPMENT only - enables additional debug logging useful for development only, this should definitely be set to False
on a production system as it produces LOTS of output
-
CRUCIBLE_DATA_FOLDER
= '/opt/skyline/crucible/data'¶ Variables: CRUCIBLE_DATA_FOLDER (str) – This is the path for the Crucible data folder where anomaly data for timeseries will be stored - absolute path
-
WEBAPP_SERVER
= 'gunicorn'¶ Variables: WEBAPP_SERVER (str) – Run the Webapp via gunicorn (recommended) or the Flask development server, set this to either 'gunicorn'
or 'flask'
-
WEBAPP_PORT
= 1500¶ Variables: WEBAPP_PORT (int) – The port for the Webapp to listen on
-
WEBAPP_AUTH_ENABLED
= True¶ Variables: WEBAPP_AUTH_ENABLED (boolean) – To enable pseudo basic HTTP auth
-
WEBAPP_AUTH_USER
= 'admin'¶ Variables: WEBAPP_AUTH_USER (str) – The username for pseudo basic HTTP auth
-
WEBAPP_AUTH_USER_PASSWORD
= 'aec9ffb075f9443c8e8f23c4f2d06faa'¶ Variables: WEBAPP_AUTH_USER_PASSWORD (str) – The user password for pseudo basic HTTP auth
-
WEBAPP_IP_RESTRICTED
= True¶ Variables: WEBAPP_IP_RESTRICTED (boolean) – Enables restricting access to the IP addresses declared in settings.WEBAPP_ALLOWED_IPS
-
WEBAPP_ALLOWED_IPS
= ['127.0.0.1']¶ Variables: WEBAPP_ALLOWED_IPS (array) – The allowed IP addresses
-
WEBAPP_USER_TIMEZONE
= True¶ Variables: WEBAPP_USER_TIMEZONE (boolean) – This determines the user’s timezone and renders graphs with the user’s date values. If this is set to False
the timezone in settings.WEBAPP_FIXED_TIMEZONE
is used.
-
WEBAPP_FIXED_TIMEZONE
= 'Etc/GMT+0'¶ Variables: WEBAPP_FIXED_TIMEZONE (str) – You can specify a timezone you want the client browser to render graph dates and times in. This setting is only used if the settings.WEBAPP_USER_TIMEZONE
is set to False
. This must be a valid momentjs timezone name, see: https://github.com/moment/moment-timezone/blob/develop/data/packed/latest.json Note
Timezones, UTC and javascript Date You only need to use the first element of the momentjs timezone string, some examples: ‘Europe/London’, ‘Etc/UTC’, ‘America/Los_Angeles’. Because the Webapp is graphing using UTC data timestamps, you may want to display the graphs to users with a fixed timezone and not use the browser timezone, so that the Webapp graphs are the same in any location.
-
WEBAPP_JAVASCRIPT_DEBUG
= False¶ Variables: WEBAPP_JAVASCRIPT_DEBUG (boolean) – Enables some javascript console.log output in the browser console.
-
ENABLE_WEBAPP_DEBUG
= False¶ Variables: ENABLE_WEBAPP_DEBUG (boolean) – Enables some app specific debug logging.
-
IONOSPHERE_CHECK_PATH
= '/opt/skyline/ionosphere/check'¶ Variables: IONOSPHERE_CHECK_PATH (str) – This is the location to which the Skyline apps write anomaly check files on disk for Ionosphere to check - absolute path
-
IONOSPHERE_ENABLED
= True¶ Variables: IONOSPHERE_ENABLED (boolean) – Enable Ionosphere
-
IONOSPHERE_PROCESSES
= 1¶ Variables: IONOSPHERE_PROCESSES (int) – Number of processes to assign to Ionosphere, should never need more than 1
-
ENABLE_IONOSPHERE_DEBUG
= False¶ Variables: ENABLE_IONOSPHERE_DEBUG (boolean) – DEVELOPMENT only - enables additional debug logging useful for development only. This should definitely be set to False
on production systems as it produces LOTS of output
-
IONOSPHERE_DATA_FOLDER
= '/opt/skyline/ionosphere/data'¶ Variables: IONOSPHERE_DATA_FOLDER (str) – This is the path for the Ionosphere data folder where anomaly data for timeseries will be stored - absolute path
-
IONOSPHERE_PROFILES_FOLDER
= '/opt/skyline/ionosphere/features_profiles'¶ Variables: IONOSPHERE_PROFILES_FOLDER (str) – This is the path for the Ionosphere features profiles folder where features profile data for timeseries will be stored - absolute path
-
IONOSPHERE_LEARN_FOLDER
= '/opt/skyline/ionosphere/learn'¶ Variables: IONOSPHERE_LEARN_FOLDER (str) – This is the path for the Ionosphere learning data folder where learning data for timeseries will be processed - absolute path
-
IONOSPHERE_CHECK_MAX_AGE
= 300¶ Variables: IONOSPHERE_CHECK_MAX_AGE (int) – Ionosphere will only process a check file if it is not older than IONOSPHERE_CHECK_MAX_AGE seconds. If it is set to 0 all check files are processed regardless of age. This setting ensures that if Ionosphere stalls for some hours and is restarted, the operator can choose to discard the older checks, accepting that those anomalies will not be recorded, rather than have Ionosphere stampede through the backlog.
-
IONOSPHERE_KEEP_TRAINING_TIMESERIES_FOR
= 86400¶ Variables: IONOSPHERE_KEEP_TRAINING_TIMESERIES_FOR (int) – Ionosphere will keep timeseries data files for this long, for the operator to review.
-
SKYLINE_URL
= 'http://skyline.example.com:8080'¶ Variables: SKYLINE_URL (str) – The http or https URL (and port if required) to access your Skyline on (no trailing slash).
-
SERVER_PYTZ_TIMEZONE
= 'UTC'¶ Variables: SERVER_PYTZ_TIMEZONE (str) – You must specify a pytz timezone you want Ionosphere to use for the creation of features profiles and converting datetimes to UTC. This must be a valid pytz timezone name, see: https://github.com/earthgecko/skyline/blob/ionosphere/docs/development/pytz.rst http://earthgecko-skyline.readthedocs.io/en/ionosphere/development/pytz.html#timezones-list-for-pytz-version
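For reference, a minimal sketch of the kind of pytz conversion this setting drives (the datetime value is illustrative; 'UTC' is simply the default shown above):
from datetime import datetime
import pytz

server_tz = pytz.timezone('UTC')  # the value of settings.SERVER_PYTZ_TIMEZONE
local_dt = server_tz.localize(datetime(2017, 6, 2, 14, 30, 0))
utc_dt = local_dt.astimezone(pytz.utc)
# utc_dt is a timezone aware datetime converted to UTC, the kind of
# conversion used when creating features profiles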
-
IONOSPHERE_FEATURES_PERCENT_SIMILAR
= 1.0¶ Variables: IONOSPHERE_FEATURES_PERCENT_SIMILAR (float) – The maximum percentage difference allowed between a features profile sum and a calculated features sum for the comparison to result in a match.
-
IONOSPHERE_LEARN
= True¶ Variables: IONOSPHERE_LEARN (boolean) – Whether Ionosphere is set to learn Note
The below
IONOSPHERE_LEARN_DEFAULT_
variables are all overridable in the IONOSPHERE_LEARN_NAMESPACE_CONFIG tuple per defined metric namespace. Further to this, ALL metrics and their settings in terms of the Ionosphere learning context can also be modified via the webapp UI Ionosphere section. These settings are the defaults that are used in the creation of learnt features profiles and for new metrics, HOWEVER the database is the preferred source of truth and will always be referred to first; the defaults or the settings.IONOSPHERE_LEARN_NAMESPACE_CONFIG values shall only be used if database values are not determined. These settings are here so that it is easy to paint all metrics as a whole, and others specifically. Once a metric is added to Ionosphere via the creation of a features profile, it is painted with these defaults or the appropriate namespace settings in settings.IONOSPHERE_LEARN_NAMESPACE_CONFIG Warning
Changes made to a metric's settings in the database, directly via the UI or your own SQL, will not be overridden by the
IONOSPHERE_LEARN_DEFAULT_
variables or by the IONOSPHERE_LEARN_NAMESPACE_CONFIG tuple for the defined metric namespace, even if the metric matches the namespace; the database is the source of truth.
-
IONOSPHERE_LEARN_DEFAULT_MAX_GENERATIONS
= 16¶ Variables: IONOSPHERE_LEARN_DEFAULT_MAX_GENERATIONS (int) – The maximum number of generations that Ionosphere can automatically learn up to from the original human created features profile, within the IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN. Overridable per namespace in settings.IONOSPHERE_LEARN_NAMESPACE_CONFIG and via the webapp UI, which updates the DB
-
IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN
= 100.0¶ Variables: IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN (float) – The maximum percent that an automatically generated features profile can differ from the original human created features profile; any automatically generated features profile with a greater percent difference than this value, when the summed common features are calculated, will be discarded. Anything below this value will be considered a valid learned features profile. Note
This percent value matches both ways, x percent above or below; in terms of the comparison, a negative percent difference is simply multiplied by -1.0. The lower the value, the less Ionosphere can learn. To literally disable Ionosphere learning set this to 0. The difference can be much greater than 100, but between 7 and 100 is reasonable for learning. However, to really disable learning, also set all max_generations settings to 1.
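As a sketch of the comparison described above (a hypothetical helper, not the Skyline implementation), the percent difference between the origin features profile sum and a learnt features sum is calculated, a negative difference is multiplied by -1.0, and the result is compared to this setting:
def within_max_percent_diff_from_origin(origin_fp_sum, learnt_fp_sum,
                                        max_percent_diff=100.0):
    # guard against a zero origin sum
    if origin_fp_sum == 0:
        return False
    # percent difference of the learnt sum relative to the origin sum
    percent_different = ((learnt_fp_sum - origin_fp_sum) / origin_fp_sum) * 100.0
    # a negative percent is simply multiplied by -1.0, matching -/+
    if percent_different < 0:
        percent_different = percent_different * -1.0
    return percent_different <= max_percent_diff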
-
IONOSPHERE_LEARN_DEFAULT_FULL_DURATION_DAYS
= 30¶ Variables: IONOSPHERE_LEARN_DEFAULT_FULL_DURATION_DAYS (int) – The default full duration in days at which Ionosphere should learn, the default is 30 days. Overridable per namespace in settings.IONOSPHERE_LEARN_NAMESPACE_CONFIG
-
IONOSPHERE_LEARN_DEFAULT_VALID_TIMESERIES_OLDER_THAN_SECONDS
= 3661¶ Variables: IONOSPHERE_LEARN_DEFAULT_VALID_TIMESERIES_OLDER_THAN_SECONDS (int) – The number of seconds that Ionosphere should wait before surfacing the metric timeseries to learn from. Consider what Graphite aggregation/retention the data will be at before querying it to learn from. Overridable per namespace in settings.IONOSPHERE_LEARN_NAMESPACE_CONFIG
-
IONOSPHERE_LEARN_NAMESPACE_CONFIG
= (('skyline_test.alerters.test', 30, 3661, 16, 100.0), ('\\*', 30, 3661, 16, 100.0))¶ Variables: IONOSPHERE_LEARN_NAMESPACE_CONFIG – Configures specific namespaces with a specific learning full duration in days. Overrides settings.IONOSPHERE_LEARN_DEFAULT_FULL_DURATION_DAYS, settings.IONOSPHERE_LEARN_DEFAULT_VALID_TIMESERIES_OLDER_THAN_SECONDS, settings.IONOSPHERE_LEARN_DEFAULT_MAX_GENERATIONS and settings.IONOSPHERE_LEARN_DEFAULT_MAX_PERCENT_DIFF_FROM_ORIGIN per defined namespace; first matched, used. Order from the highest to the lowest namespace resolution. Like settings.ALERTS, you know how this works now... This is the config by which each declared namespace can be assigned a learning full duration in days. It is here to allow for overrides so that if a metric does not suit being learned at say 30 days, it could be learned at say 14 days instead, if 14 days was a better suited learning full duration.
To specifically disable learning on a namespace, set LEARN_FULL_DURATION_DAYS to 0
Tuple schema example:
IONOSPHERE_LEARN_NAMESPACE_CONFIG = (
    # ('<metric_namespace>', LEARN_FULL_DURATION_DAYS,
    #  LEARN_VALID_TIMESERIES_OLDER_THAN_SECONDS, MAX_GENERATIONS,
    #  MAX_PERCENT_DIFF_FROM_ORIGIN),
    # Wildcard namespaces can be used as well
    ('metric3.thing\..*', 90, 3661, 16, 100.0),
    ('metric4.thing\..*.\.requests', 14, 3661, 16, 100.0),
    # However beware of wildcards as the above wildcard should really be
    ('metric4.thing\..*.\.requests', 14, 7261, 3, 7.0),
    # Disable learning on a namespace
    ('metric5.thing\..*.\.rpm', 0, 3661, 5, 7.0),
    # Learn all Ionosphere enabled metrics at 30 days
    ('.*', 30, 3661, 16, 100.0),
)
Namespace tuple parameters are:
Parameters: - metric_namespace (str) – metric_namespace pattern
- LEARN_FULL_DURATION_DAYS (int) – The number of days that Ionosphere should surface the metric timeseries for
- LEARN_VALID_TIMESERIES_OLDER_THAN_SECONDS (int) – The number of seconds that Ionosphere should wait before surfacing the metric timeseries to learn from. Consider what Graphite aggregation/retention the data will be at before querying it to learn from. REQUIRED, NOT optional; we could fall back to the settings.IONOSPHERE_LEARN_DEFAULT_VALID_TIMESERIES_OLDER_THAN_SECONDS but that would be more conditionals that we do not need - be precise; by now, if you are training Skyline well, you will understand that being precise helps :)
- MAX_GENERATIONS (int) – The maximum number of generations that Ionosphere can automatically learn up to from the original human created features profile on this metric namespace.
- MAX_PERCENT_DIFF_FROM_ORIGIN – The maximum percent that an automatically generated features profile can be from the original human created features profile for a metric in the namespace.
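A hypothetical sketch of the "first matched, used" lookup described above (not the Skyline implementation; the function name is illustrative):
import re

def learn_config_for(metric, namespace_config):
    # Return the first IONOSPHERE_LEARN_NAMESPACE_CONFIG tuple whose
    # namespace pattern matches the metric, or None if nothing matches
    for config_tuple in namespace_config:
        if re.search(config_tuple[0], metric):
            return config_tuple
    return None

# With the schema example above, 'metric5.thing.host1.rpm' would match the
# metric5 tuple first and learning would be disabled, as its
# LEARN_FULL_DURATION_DAYS is 0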
-
IONOSPHERE_AUTOBUILD
= True¶ Variables: IONOSPHERE_AUTOBUILD (boolean) – Make best effort attempt to auto provision any features_profiles directory and resources that have been deleted or are missing. Note
This is highlighted as a setting because the number of features_profiles dirs that Ionosphere learn could spawn, and the amount of data storage that would result, is unknown at this point. It is possible the operator is going to need to prune this data, a lot of which will probably never be looked at. Or a Skyline node is going to fail, not have the features_profiles dirs backed up, and all the data is going to be lost or deleted. So it is possible for Ionosphere to create all the human interrupted resources for the features profile again under a best effort methodology. Although the original Redis graph image would not be available, nor the Graphite graphs in the resolution at which the features profile was created, the fp_ts is available, so the Redis plot could be remade and all the Graphite graphs could be made as best effort with whatever resolution is available for that time period. This allows the operator to delete/prune features profile dirs, possibly by least matched, by age, etc., or all of them, and still be able to surface the available features profile page data on-demand.
-
MEMCACHE_ENABLED
= False¶ Variables: MEMCACHE_ENABLED (boolean) – Enables the use of memcache in Ionosphere to optimise DB usage
-
MEMCACHED_SERVER_IP
= '127.0.0.1'¶ Variables: MEMCACHED_SERVER_IP (str) – The IP address of the memcached server
-
MEMCACHED_SERVER_PORT
= 11211¶ Variables: MEMCACHED_SERVER_PORT (int) – The port of the memcached server
skyline.skyline_functions module¶
Skyline functions
These are shared functions that are required in multiple modules.
-
send_graphite_metric
(current_skyline_app, metric, value)[source]¶ Sends the skyline_app metrics to the GRAPHITE_HOST if a graphite host is defined.
Parameters: - current_skyline_app (str) – the skyline app using this function
- metric (str) – the metric namespace
- value (str) – the metric value (as a str not an int)
Returns: True
orFalse
Return type: boolean
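A hedged usage sketch of the documented signature (the app name, metric namespace and value are illustrative):
from skyline_functions import send_graphite_metric

sent = send_graphite_metric(
    'analyzer',                                    # current_skyline_app
    'skyline.analyzer.hostname.total_anomalies',   # metric namespace
    '12')                                          # value as a str, not an int
# sent is True if the metric was sent to the GRAPHITE_HOST, otherwise False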
-
mkdir_p
(path)[source]¶ Create nested directories.
Parameters: path (str) – directory path to create Returns: returns True
-
load_metric_vars
(current_skyline_app, metric_vars_file)[source]¶ Import the metric variables for a check from a metric check variables file
Parameters: - current_skyline_app (str) – the skyline app using this function
- metric_vars_file (str) – the path and filename to the metric variables files
Returns: the metric_vars module object or
False
Return type: object or boolean
-
write_data_to_file
(current_skyline_app, write_to_file, mode, data)[source]¶ Write data to a file
Parameters: - current_skyline_app (str) – the skyline app using this function
- write_to_file (str) – the path and filename to write the data into
- mode (str) –
w
to overwrite, a
to append - data (str) – the data to write to the file
Returns: True
orFalse
Return type: boolean
-
fail_check
(current_skyline_app, failed_check_dir, check_file_to_fail)[source]¶ Move a failed check file.
Parameters: - current_skyline_app (str) – the skyline app using this function
- failed_check_dir (str) – the directory where failed checks are moved to
- check_file_to_fail (str) – failed check file to move
Returns: True
,False
Return type: boolean
-
alert_expiry_check
(current_skyline_app, metric, metric_timestamp, added_by)[source]¶ Only check if the metric does not have an EXPIRATION_TIME key set. Panorama uses the alert EXPIRATION_TIME for the relevant alert setting contexts, whether that be analyzer, mirage, boundary, etc., and sets its own cache_keys in redis. This prevents large amounts of data being added in terms of duplicate anomaly records in Panorama and timeseries json and image files in crucible samples, so that anomalies are recorded at the same EXPIRATION_TIME as alerts.
Parameters: - current_skyline_app (str) – the skyline app using this function
- metric (str) – metric name
- metric_timestamp – the metric timestamp
- added_by (str) – which app requested the alert_expiry_check
Returns: True
,False
Return type: boolean
- If inside the alert expiry period returns
True
- If not in the alert expiry period or unknown returns
False
-
get_graphite_metric
(current_skyline_app, metric, from_timestamp, until_timestamp, data_type, output_object)[source]¶ Fetch data from graphite and return it as object or save it as file
Parameters: - current_skyline_app (str) – the skyline app using this function
- metric (str) – metric name
- from_timestamp (str) – unix timestamp
- until_timestamp (str) – unix timestamp
- data_type (str) – image or json
- output_object (str) – object or path and filename to save data as, if set to object, the object is returned
Returns: timeseries string,
True
,False
Return type: str or boolean
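A hedged usage sketch of the documented signature, fetching the last 24 hours of a metric as a json object (the app and metric names are illustrative):
import time
from skyline_functions import get_graphite_metric

until_timestamp = int(time.time())
from_timestamp = until_timestamp - 86400
timeseries_json = get_graphite_metric(
    'crucible',                        # current_skyline_app
    'stats.webserver01.requests',      # metric
    str(from_timestamp),               # unix timestamp as a str
    str(until_timestamp),              # unix timestamp as a str
    'json',                            # data_type - image or json
    'object')                          # output_object - return the object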
-
filesafe_metricname
(metricname)[source]¶ Returns a file system safe name for a metric name in terms of creating check files, etc
-
send_anomalous_metric_to
(current_skyline_app, send_to_app, timeseries_dir, metric_timestamp, base_name, datapoint, from_timestamp, triggered_algorithms, timeseries, full_duration, parent_id)[source]¶ Assign a metric and timeseries to Crucible or Ionosphere.
-
RepresentsInt
(s)[source]¶ As per http://stackoverflow.com/a/1267145 and @Aivar I must agree with @Triptycha > “This 5 line function is not a complex mechanism.”
-
mysql_select
(current_skyline_app, select)[source]¶ Select data from mysql database
Parameters: - current_skyline_app – the Skyline app that is calling the function
- select (str) – the select string
Returns: tuple
Return type: tuple, boolean
Example usage:
from skyline_functions import mysql_select
query = 'select id, metric from anomalies'
results = mysql_select(current_skyline_app, query)
Example of the 0 indexed results tuple, which can hold multiple results:
>> print('results: %s' % str(results))
results: [(1, u'test1'), (2, u'test2')]
>> print('results[0]: %s' % str(results[0]))
results[0]: (1, u'test1')
Note
- If the MySQL query fails a boolean will be returned, not a tuple: either
False
or None
-
nonNegativeDerivative
(timeseries)[source]¶ This function is used to convert an integral or incrementing count to a derivative by calculating the delta between subsequent datapoints. The function ignores datapoints that trend down and is useful for metrics that increase over time and then reset. This is based on part of the Graphite render function nonNegativeDerivative at: https://github.com/graphite-project/graphite-web/blob/1e5cf9f659f5d4cc0fa53127f756a1916e62eb47/webapp/graphite/render/functions.py#L1627
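A minimal sketch of the calculation described above, assuming a list of (timestamp, value) data points (this is an illustration, not the Skyline or Graphite implementation):
def non_negative_derivative(timeseries):
    # Deltas between subsequent datapoints, dropping datapoints that
    # trend down, e.g. counter resets
    derivative = []
    previous = None
    for timestamp, value in timeseries:
        if previous is not None and value >= previous:
            derivative.append((timestamp, value - previous))
        previous = value
    return derivative

# e.g. a counter that resets: [(1, 10), (2, 15), (3, 2), (4, 7)]
# becomes [(2, 5), (4, 5)] - the reset datapoint is ignored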
-
strictly_increasing_monotonicity
(timeseries)[source]¶ This function is used to determine whether timeseries is strictly increasing monotonically, it will only return True if the values are strictly increasing, an incrementing count.
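A minimal sketch of such a check, assuming a list of (timestamp, value) data points (illustrative, not the Skyline implementation):
def is_strictly_increasing(timeseries):
    # True only if every value is greater than the previous value,
    # i.e. an incrementing count
    values = [value for _, value in timeseries]
    return all(earlier < later for earlier, later in zip(values, values[1:]))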
-
in_list
(metric_name, check_list)[source]¶ Check if the metric is in list.
# @added 20170602 - Feature #2034: analyse_derivatives
# Feature #1978: worker - DO_NOT_SKIP_LIST
This is in part a copy of the SKIP_LIST logic and allows for a string match or a match on dotted elements within the metric namespace, as used in Horizon/worker
-
get_memcache_metric_object
(current_skyline_app, base_name)[source]¶ Return the metrics_db_object from memcache if it exists.
-
get_memcache_fp_ids_object
(current_skyline_app, base_name)[source]¶ Return the fp_ids list from memcache if it exists.
-
move_file
(current_skyline_app, dest_dir, file_to_move)[source]¶ Move a file.
Parameters: - current_skyline_app (str) – the skyline app using this function
- dest_dir (str) – the directory the file is to be moved to
- file_to_move (str) – path and filename of the file to move
Returns: True
,False
Return type: boolean
skyline.skyline_version module¶
version info
skyline.tsfresh_feature_names module¶
-
TSFRESH_VERSION
= '0.4.0'¶ Variables: TSFRESH_VERSION (str) – The version of tsfresh installed by pip, this is important in terms of feature extraction baselines
-
TSFRESH_BASELINE_VERSION
= '0.4.0'¶ Variables: TSFRESH_BASELINE_VERSION (str) – The version of tsfresh that was used to generate the feature extraction baselines.
-
TSFRESH_FEATURES
= [[1, 'value__symmetry_looking__r_0.65'], [2, 'value__first_location_of_maximum'], [3, 'value__absolute_sum_of_changes'], [4, 'value__large_number_of_peaks__n_1'], [5, 'value__large_number_of_peaks__n_3'], [6, 'value__large_number_of_peaks__n_5'], [7, 'value__last_location_of_minimum'], [8, 'value__mean_abs_change_quantiles__qh_0.4__ql_0.0'], [9, 'value__mean_abs_change_quantiles__qh_0.4__ql_0.2'], [10, 'value__mean_abs_change_quantiles__qh_0.4__ql_0.4'], [11, 'value__mean_abs_change_quantiles__qh_0.4__ql_0.6'], [12, 'value__mean_abs_change_quantiles__qh_0.4__ql_0.8'], [13, 'value__maximum'], [14, 'value__value_count__value_-inf'], [15, 'value__skewness'], [16, 'value__number_peaks__n_3'], [17, 'value__longest_strike_above_mean'], [18, 'value__number_peaks__n_5'], [19, 'value__first_location_of_minimum'], [20, 'value__large_standard_deviation__r_0.25'], [21, 'value__augmented_dickey_fuller'], [22, 'value__count_above_mean'], [23, 'value__symmetry_looking__r_0.75'], [24, 'value__percentage_of_reoccurring_datapoints_to_all_datapoints'], [25, 'value__mean_abs_change'], [26, 'value__mean_change'], [27, 'value__value_count__value_1'], [28, 'value__value_count__value_0'], [29, 'value__minimum'], [30, 'value__autocorrelation__lag_5'], [31, 'value__median'], [32, 'value__symmetry_looking__r_0.85'], [33, 'value__mean_abs_change_quantiles__qh_0.8__ql_0.4'], [34, 'value__symmetry_looking__r_0.05'], [35, 'value__mean_abs_change_quantiles__qh_0.8__ql_0.6'], [36, 'value__value_count__value_inf'], [37, 'value__mean_abs_change_quantiles__qh_0.8__ql_0.0'], [38, 'value__mean_abs_change_quantiles__qh_0.8__ql_0.2'], [39, 'value__large_standard_deviation__r_0.45'], [40, 'value__mean_abs_change_quantiles__qh_0.8__ql_0.8'], [41, 'value__autocorrelation__lag_6'], [42, 'value__autocorrelation__lag_7'], [43, 'value__autocorrelation__lag_4'], [44, 'value__last_location_of_maximum'], [45, 'value__autocorrelation__lag_2'], [46, 'value__autocorrelation__lag_3'], [47, 'value__autocorrelation__lag_0'], [48, 'value__autocorrelation__lag_1'], [49, 'value__autocorrelation__lag_8'], [50, 'value__autocorrelation__lag_9'], [51, 'value__range_count__max_1__min_-1'], [52, 'value__variance'], [53, 'value__mean'], [54, 'value__standard_deviation'], [55, 'value__mean_abs_change_quantiles__qh_0.6__ql_0.6'], [56, 'value__mean_abs_change_quantiles__qh_0.6__ql_0.4'], [57, 'value__mean_abs_change_quantiles__qh_0.6__ql_0.2'], [58, 'value__mean_abs_change_quantiles__qh_0.6__ql_0.0'], [59, 'value__symmetry_looking__r_0.15'], [60, 'value__ratio_value_number_to_time_series_length'], [61, 'value__mean_second_derivate_central'], [62, 'value__number_peaks__n_1'], [63, 'value__length'], [64, 'value__mean_abs_change_quantiles__qh_1.0__ql_0.0'], [65, 'value__mean_abs_change_quantiles__qh_1.0__ql_0.2'], [66, 'value__mean_abs_change_quantiles__qh_1.0__ql_0.4'], [67, 'value__time_reversal_asymmetry_statistic__lag_3'], [68, 'value__mean_abs_change_quantiles__qh_1.0__ql_0.6'], [69, 'value__mean_abs_change_quantiles__qh_1.0__ql_0.8'], [70, 'value__sum_of_reoccurring_values'], [71, 'value__abs_energy'], [72, 'value__variance_larger_than_standard_deviation'], [73, 'value__mean_abs_change_quantiles__qh_0.6__ql_0.8'], [74, 'value__kurtosis'], [75, 'value__approximate_entropy__m_2__r_0.7'], [76, 'value__approximate_entropy__m_2__r_0.5'], [77, 'value__symmetry_looking__r_0.25'], [78, 'value__approximate_entropy__m_2__r_0.3'], [79, 'value__percentage_of_reoccurring_values_to_all_values'], [80, 'value__approximate_entropy__m_2__r_0.1'], [81, 
'value__time_reversal_asymmetry_statistic__lag_2'], [82, 'value__approximate_entropy__m_2__r_0.9'], [83, 'value__time_reversal_asymmetry_statistic__lag_1'], [84, 'value__symmetry_looking__r_0.35'], [85, 'value__large_standard_deviation__r_0.3'], [86, 'value__large_standard_deviation__r_0.2'], [87, 'value__large_standard_deviation__r_0.1'], [88, 'value__large_standard_deviation__r_0.0'], [89, 'value__large_standard_deviation__r_0.4'], [90, 'value__large_standard_deviation__r_0.15'], [91, 'value__mean_autocorrelation'], [92, 'value__binned_entropy__max_bins_10'], [93, 'value__large_standard_deviation__r_0.35'], [94, 'value__symmetry_looking__r_0.95'], [95, 'value__longest_strike_below_mean'], [96, 'value__sum_values'], [97, 'value__symmetry_looking__r_0.45'], [98, 'value__symmetry_looking__r_0.6'], [99, 'value__symmetry_looking__r_0.7'], [100, 'value__symmetry_looking__r_0.4'], [101, 'value__symmetry_looking__r_0.5'], [102, 'value__symmetry_looking__r_0.2'], [103, 'value__symmetry_looking__r_0.3'], [104, 'value__symmetry_looking__r_0.0'], [105, 'value__symmetry_looking__r_0.1'], [106, 'value__has_duplicate'], [107, 'value__symmetry_looking__r_0.8'], [108, 'value__symmetry_looking__r_0.9'], [109, 'value__value_count__value_nan'], [110, 'value__mean_abs_change_quantiles__qh_0.2__ql_0.8'], [111, 'value__large_standard_deviation__r_0.05'], [112, 'value__mean_abs_change_quantiles__qh_0.2__ql_0.2'], [113, 'value__has_duplicate_max'], [114, 'value__mean_abs_change_quantiles__qh_0.2__ql_0.0'], [115, 'value__mean_abs_change_quantiles__qh_0.2__ql_0.6'], [116, 'value__mean_abs_change_quantiles__qh_0.2__ql_0.4'], [117, 'value__number_cwt_peaks__n_5'], [118, 'value__number_cwt_peaks__n_1'], [119, 'value__sample_entropy'], [120, 'value__has_duplicate_min'], [121, 'value__symmetry_looking__r_0.55'], [122, 'value__count_below_mean'], [123, 'value__quantile__q_0.1'], [124, 'value__quantile__q_0.2'], [125, 'value__quantile__q_0.3'], [126, 'value__quantile__q_0.4'], [127, 'value__quantile__q_0.6'], [128, 'value__quantile__q_0.7'], [129, 'value__quantile__q_0.8'], [130, 'value__quantile__q_0.9'], [131, 'value__ar_coefficient__k_10__coeff_0'], [132, 'value__ar_coefficient__k_10__coeff_1'], [133, 'value__ar_coefficient__k_10__coeff_2'], [134, 'value__ar_coefficient__k_10__coeff_3'], [135, 'value__ar_coefficient__k_10__coeff_4'], [136, 'value__index_mass_quantile__q_0.1'], [137, 'value__index_mass_quantile__q_0.2'], [138, 'value__index_mass_quantile__q_0.3'], [139, 'value__index_mass_quantile__q_0.4'], [140, 'value__index_mass_quantile__q_0.6'], [141, 'value__index_mass_quantile__q_0.7'], [142, 'value__index_mass_quantile__q_0.8'], [143, 'value__index_mass_quantile__q_0.9'], [144, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_2"'], [145, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_1__w_2"'], [146, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_2__w_2"'], [147, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_3__w_2"'], [148, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_4__w_2"'], [149, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_5__w_2"'], [150, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_6__w_2"'], [151, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_7__w_2"'], [152, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_8__w_2"'], [153, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_9__w_2"'], [154, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_2"'], [155, '"value__cwt_coefficients__widths_(2, 5, 10, 
20)__coeff_11__w_2"'], [156, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_12__w_2"'], [157, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_2"'], [158, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_14__w_2"'], [159, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_5"'], [160, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_1__w_5"'], [161, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_2__w_5"'], [162, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_3__w_5"'], [163, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_4__w_5"'], [164, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_5__w_5"'], [165, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_6__w_5"'], [166, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_7__w_5"'], [167, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_8__w_5"'], [168, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_9__w_5"'], [169, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_5"'], [170, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_11__w_5"'], [171, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_12__w_5"'], [172, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_5"'], [173, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_14__w_5"'], [174, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_10"'], [175, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_1__w_10"'], [176, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_2__w_10"'], [177, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_3__w_10"'], [178, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_4__w_10"'], [179, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_5__w_10"'], [180, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_6__w_10"'], [181, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_7__w_10"'], [182, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_8__w_10"'], [183, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_9__w_10"'], [184, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_10"'], [185, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_11__w_10"'], [186, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_12__w_10"'], [187, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_10"'], [188, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_14__w_10"'], [189, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_20"'], [190, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_1__w_20"'], [191, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_2__w_20"'], [192, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_3__w_20"'], [193, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_4__w_20"'], [194, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_5__w_20"'], [195, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_6__w_20"'], [196, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_7__w_20"'], [197, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_8__w_20"'], [198, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_9__w_20"'], [199, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_20"'], [200, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_11__w_20"'], [201, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_12__w_20"'], [202, '"value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_20"'], [203, '"value__cwt_coefficients__widths_(2, 5, 10, 
20)__coeff_14__w_20"'], [204, 'value__spkt_welch_density__coeff_2'], [205, 'value__spkt_welch_density__coeff_5'], [206, 'value__spkt_welch_density__coeff_8'], [207, 'value__fft_coefficient__coeff_0'], [208, 'value__fft_coefficient__coeff_1'], [209, 'value__fft_coefficient__coeff_2'], [210, 'value__fft_coefficient__coeff_3'], [211, 'value__fft_coefficient__coeff_4'], [212, 'value__fft_coefficient__coeff_5'], [213, 'value__fft_coefficient__coeff_6'], [214, 'value__fft_coefficient__coeff_7'], [215, 'value__fft_coefficient__coeff_8'], [216, 'value__fft_coefficient__coeff_9']]¶ Variables: TSFRESH_FEATURES (array) – This array defines the Skyline id for each known tsfresh feature. Warning
This array is linked to relational fields and ids in the database and as such these should be considered immutable objects that must not be modified after they are created. This array should only ever be extended.
Note
There is a helper script to generate this array from the feature names returned by the current/running version of tsfresh and compare them to this array. The helper script outputs the changes and the full generated array for diffing against this array of known feature names. See: skyline/tsfresh_features/generate_tsfresh_features.py
skyline.validate_settings module¶
Module contents¶
Used by autodoc_mock_imports.