Alarmageddon is a Python monitoring framework for RESTful services, built on top of Requests and Fabric.

The following example GETs www.google.com, and reports to HipChat if the return code is not 200:

import alarmageddon
from alarmageddon.validations.http import HttpValidation
from alarmageddon.publishing.hipchat import HipChatPublisher

validations = [HttpValidation.get("http://www.google.com").expect_status_codes([200])]

publishers = [HipChatPublisher("hipchat.route.here","token","stable","hipchat_room")]

alarmageddon.run_tests(validations,publishers)

Features

  • Verify expectations on the following
    • HTTP requests
    • SSH commands
    • RabbitMQ queue lengths
    • Cassandra status
    • Statistics collected in Graphite
    • The behavior of other Alarmageddon tests
  • Report failed verifications to
    • HipChat
    • PagerDuty
    • Graphite
    • Email
    • XML file

Getting Started

A Short Example

This will walk you through creating and running a basic suite of Alarmageddon validations.

Alarmageddon has two main components: validations and publishers. Validations are the tests that will be run, and publishers pass the results of those validations along to an external system (e.g., PagerDuty).

Creating Validations

To make sure that the world’s search engines are working, let’s use HttpValidation:

from alarmageddon.validations.http import HttpValidation

validations = []
validations.append(HttpValidation.get("http://www.google.com").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.bing.com").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.yahoo.com").expect_status_codes([200]))

These validations are constructed to GET the supplied URL. We’ve also set up our expectations about the result of each GET: in this case, we expect the status code to be 200. This is the basic structure of Alarmageddon’s validations: a validation takes some action and compares the results to the supplied expectations.
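
The action-plus-expectations structure described above can be sketched without Alarmageddon itself. ToyValidation below is a hypothetical stand-in, not the library's HttpValidation; it only illustrates how a chained expect call can record a check that is evaluated after the action runs. A fake action is used so the sketch runs without network access.

```python
class ToyValidation:
    """Hypothetical stand-in for a validation: one action plus expectations."""

    def __init__(self, name, action):
        self.name = name
        self._action = action        # callable that produces a result dict
        self._expectations = []      # list of (description, predicate) pairs

    def expect_status_codes(self, codes):
        allowed = set(codes)
        description = "status code in %s" % (sorted(allowed),)
        self._expectations.append(
            (description, lambda result: result["status"] in allowed))
        return self                  # returning self allows call chaining

    def run(self):
        """Perform the action and return descriptions of unmet expectations."""
        result = self._action()
        return [desc for desc, check in self._expectations if not check(result)]


# Fake actions stand in for real HTTP requests.
ok = ToyValidation("GET ok", lambda: {"status": 200}).expect_status_codes([200])
bad = ToyValidation("GET bad", lambda: {"status": 504}).expect_status_codes([200])
ok_failures = ok.run()
bad_failures = bad.run()
```

An empty list from run() means every expectation was met; anything else is what a publisher would report.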

Creating Publishers

Of course, if no one knows a validation has failed, it isn’t particularly useful. To have Alarmageddon report on failures, we must supply it with at least one publisher:

from alarmageddon.publishing.hipchat import HipChatPublisher

publishers = []
hipchat_endpoint = "127.0.0.1"
hipchat_token = "token"
environment = "stable"
room = "hipchat_room"
publishers.append(HipChatPublisher(hipchat_endpoint, hipchat_token, environment, room))

This publisher will report failures to HipChat. Note that this example won’t work as written: you’ll need to supply a valid endpoint and token!

Running Alarmageddon

Given a set of validations and a set of publishers, we can run Alarmageddon:

import alarmageddon

alarmageddon.run_tests(validations,publishers)

This will run the validations. If any failures occur, a message will be passed along to the designated HipChat room. In this case, the resulting message might look something like:

1 failure(s) in stable: (failed) GET http://www.google.com Description: expected status code: 200, actual status code: 504 (Gateway Time-out)

Full Code

Here’s the full source of this example:

import alarmageddon
from alarmageddon.validations.http import HttpValidation
from alarmageddon.publishing.hipchat import HipChatPublisher

validations = []
validations.append(HttpValidation.get("http://www.google.com").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.bing.com").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.yahoo.com").expect_status_codes([200]))

publishers = []
hipchat_endpoint = "127.0.0.1"
hipchat_token = "token"
environment = "stable"
room = "hipchat_room"
publishers.append(HipChatPublisher(hipchat_endpoint, hipchat_token, environment, room))

alarmageddon.run_tests(validations,publishers)

Validations

A validation performs some action and then checks the results of that action against a set of expectations. Alarmageddon comes with validations for checking the results of HTTP calls, checking the output of SSH commands, and checking the length of RabbitMQ queues.

All validations accept a priority argument. This should be one of Priority.LOW, Priority.NORMAL, or Priority.CRITICAL. This priority level is used to determine whether or not a publisher should publish the results of the validation.

HTTP

You can create HttpValidations for various HTTP methods:

HttpValidation.get("http://www.google.com")
HttpValidation.post("http://www.google.com", data={"key": "value"})
HttpValidation.put("http://www.google.com", data={"key": "value"})
HttpValidation.options("http://www.google.com")
HttpValidation.head("http://www.google.com")

You can change the timeout length:

HttpValidation.get("http://www.google.com", timeout=10)

Or designate a number of retry attempts:

HttpValidation.get("http://www.google.com", retries=10)

You can supply custom headers:

header = {"Authorization":"value"}
HttpValidation.get("http://www.google.com", headers=header)

If you’ve created a validation that you would like to apply to multiple hosts, you can duplicate it:

validation = HttpValidation.get("http://www.google.com")
hosts = ["http://www.bing.com","http://www.yahoo.com"]
new_validations = validation.duplicate_with_hosts(hosts)

The following example sets expectations on an HttpValidation: we expect either a 200 or a 404 status code, and we expect the response to contain JSON with the designated value:

validation = HttpValidation.get("url")
validation.expect_status_codes([200,404])
validation.expect_json_property("json.path.to.value","expected")

SSH

To perform validations over SSH, you’ll need to supply the appropriate credentials:

ctx = SshContext("username","keyfile_path")

You can check the average load:

LoadAverageValidation(ctx).expect_max_1_minute_load(5, hosts=['127.0.0.1'])

You can verify that an upstart service is running:

UpstartServiceValidation(ctx, "service_name", hosts=['127.0.0.1'])

But ultimately, the above are just convenience classes for common use cases - you can perform arbitrary commands and check the output:

validation = SshCommandValidation(ctx, "validation name", "ps -ef | grep python", hosts=['127.0.0.1'])
validation.expect_output_contains("python")

Cassandra

Cassandra validations are a special case of SSH validations:

CassandraStatusValidation(ssh_ctx, hosts=['127.0.0.1'])

Kafka

Kafka validations will inspect your Kafka partitions and leader elections. If a single partition has multiple leaders, the validation will fail:

KafkaStatusValidation(ssh_ctx, zookeeper_nodes='127.0.0.1:2181,127.0.0.2:2181,127.0.0.3:2181',hosts=['127.0.0.1'])

RabbitMQ

As with SSH, you have to supply credentials for RabbitMQ Validations:

ctx = RabbitMqContext("127.0.0.1",80,"username","password")

Once you have the context, you can construct validations that check that the number of messages in a queue is less than some value. For example, the following will fail if the queue “queue_name” has more than 1000 messages in it:

RabbitMqValidation(ctx, "validation name", "queue_name", 1000)

Graphite

You also need a context for Graphite:

ctx = GraphiteContext("127.0.0.1")

Given the context, you can check statistics on various Graphite readings:

validation = GraphiteValidation(ctx, "validation name", "Errors")
validation.expect_average_in_range(1,10)
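
To make the difference between an average-based expectation and an every-reading expectation concrete, here is a minimal sketch on a plain list of numbers. The two helper functions are hypothetical and do not call Graphite; treating the bounds as inclusive is an assumption, since the documentation does not say.

```python
def average_in_range(readings, lower_bound, upper_bound):
    """True if the mean of the readings lies within the (inclusive) bounds."""
    if not readings:
        raise ValueError("no readings to average")
    average = sum(readings) / float(len(readings))
    return lower_bound <= average <= upper_bound


def all_in_range(readings, lower_bound, upper_bound):
    """True only if every single reading lies within the (inclusive) bounds."""
    return all(lower_bound <= reading <= upper_bound for reading in readings)


# One spike (30) is hidden by the average but caught by the per-reading check.
readings = [2, 4, 30, 4]
average_ok = average_in_range(readings, 1, 10)
every_reading_ok = all_in_range(readings, 1, 10)
```

An average-based expectation tolerates short spikes, while a per-reading expectation fails on any single outlier.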

Validation Groups and GroupValidations

You may have a set of tests where individual failures are minor but multiple failures indicate a problem (e.g., machines behind an HAProxy). Alarmageddon validations include the notion of a validation group, which indicates that a set of validations belong together:

validations = []
validations.append(HttpValidation.get("http://www.google.com",group="a").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.yahoo.com",group="a").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.bing.com",group="a").expect_status_codes([200]))

In this case, we have three validations that belong to the validation group “a”. Now that we have a group, we can create a GroupValidation that contains expectations about the results of other validations:

validations.append(GroupValidation("Group a Validation", "a", normal_threshold=1, critical_threshold=2))

This new validation does not have an explicit priority level. Rather, it defaults to LOW priority. If the number of failures in group “a” reaches the normal_threshold, the validation will be considered a failure and the priority will become NORMAL. If it reaches the critical_threshold, the priority will become CRITICAL (and the validation will still be a failure).
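
The threshold rules above can be written out as a small function. This is a sketch of the described behavior only, not the library's implementation; evaluate_group and the string constants are hypothetical names, and "reaches" is read as "greater than or equal to".

```python
LOW, NORMAL, CRITICAL = "LOW", "NORMAL", "CRITICAL"


def evaluate_group(failure_count, normal_threshold=None, critical_threshold=None):
    """Return (failed, priority) for a group with the given number of failures."""
    # Check the critical threshold first, since it takes precedence.
    if critical_threshold is not None and failure_count >= critical_threshold:
        return True, CRITICAL
    if normal_threshold is not None and failure_count >= normal_threshold:
        return True, NORMAL
    # Below every threshold: the group validation passes at its default priority.
    return False, LOW


no_failures = evaluate_group(0, normal_threshold=1, critical_threshold=2)
one_failure = evaluate_group(1, normal_threshold=1, critical_threshold=2)
two_failures = evaluate_group(2, normal_threshold=1, critical_threshold=2)
```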

You can create GroupValidations on groups of GroupValidations. The only difference is that an order parameter must be passed, to ensure that the tests are run in the correct order:

validations.append(GroupValidation("Group a Validation", "a", normal_threshold=1, critical_threshold=2, group="c"))
validations.append(GroupValidation("Group b Validation", "b", normal_threshold=1, critical_threshold=2, group="c"))
validations.append(GroupValidation("Group c Validation", "c", normal_threshold=2, order=2))

Publishers

All publishers accept a priority_threshold argument. This should be one of Priority.LOW, Priority.NORMAL, or Priority.CRITICAL. A publisher will only publish failing validations if they are at least as critical as the priority_threshold. For example, to report on all failures, you should set your publisher’s priority_threshold to Priority.LOW.
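
As a rough illustration of the threshold comparison, the sketch below uses a hypothetical Priority enum. The numeric values are assumptions chosen only so that the levels are ordered; they are not taken from the library.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical stand-in for Alarmageddon's Priority levels."""
    LOW = 1
    NORMAL = 2
    CRITICAL = 3


def meets_threshold(validation_priority, priority_threshold):
    """A failure is published only if it is at least as critical as the threshold."""
    return validation_priority >= priority_threshold


# A LOW threshold lets every failure through; a CRITICAL threshold filters most.
normal_vs_low = meets_threshold(Priority.NORMAL, Priority.LOW)
normal_vs_critical = meets_threshold(Priority.NORMAL, Priority.CRITICAL)
```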

JUnit XML

The JUnit XML publisher will write out all validation results to an XML file. This publisher is automatically created when you run the validations, and will write out to results.xml.

HipChat

The HipChat publisher will report failures to your HipChat room:

HipChatPublisher("hipchat.route.here","token","stable","hipchat_room")

By default, the HipChat publisher alerts on failures of NORMAL priority or higher.

Http

The Http publisher will report failures to an HTTP Server:

HttpPublisher(success_url="success.url.here", failure_url="failure.url.here")

PagerDuty

The PagerDuty publisher will report failures to PagerDuty:

PagerDutyPublisher("pagerduty.route.here", "pagerduty_key")

By default, the PagerDuty publisher alerts only on CRITICAL failures.

Graphite

The Graphite publisher behaves slightly differently than the other publishers. Instead of only logging failures, it logs both successes and failures, providing you with a way to keep track of how often certain validations are passing or failing:

GraphitePublisher("127.0.0.1",8080)

The GraphitePublisher will also keep track of how long the validations took, in the case of HttpValidations. By default, GraphitePublisher will publish on all validations.

Email

There are two email publishers. SimpleEmailPublisher provides basic emailing functionality, and will email all test results to the supplied addresses:

SimpleEmailPublisher({"real_name": "test", "address": "sender@test.com"},
                     [{"real_name": "test", "address": "recipient@test.com"}],
                     host='127.0.0.1', port=1234)

EmailPublisher provides more granular control over the sent messages. For this reason, validations that will be published by the email publisher must be enriched with extra information.

To create an email publisher, you need a config object with the appropriate values in it, and optionally a set of defaults for missing config values:

email_pub = EmailPublisher(config, defaults=general_defaults)

For enrichment, a convenience method is provided in emailer to ensure that the appropriate values are present:

emailer.enrich(validation, email_settings, runtime_context)

Note

For the email publisher to publish a failure, the priority threshold must be reached and the validation must be enriched.

Using the Email Publisher

Due to the extensive flexibility allowed by the email publisher, it involves more configuration than the other publishers. This page is intended to be a guide through that process.

Publisher Configuration

Constructing the Email Publisher is similar to other publishers:

EmailPublisher(config, priority_threshold=Priority.NORMAL, defaults=general_email_defaults)

config is the usual Alarmageddon config object, but it must contain the following email specific field:

"email_template_directory" : "path/to/email/templates"

This specifies where the email jinja templates can be found.

There are a few optional fields as well. The full email settings might look like this:

{
    "email_host" : null,
    "email_port" : null,
    "email_template_directory" : "email_templates",
    "email_defaults" : {
        "general" : {
            "email_template" : "default.template",
            "email_subject_template" : "default_subject.template",
            "email_sender" : {"real_name" : "Alarmageddon Monitor", "address" : "noreply@host.com"},
            "email_recipients" : [
                {"real_name" : "Team", "address" : "team@host.com"}
            ],
            "email_custom_message" : ""
        }
    }
}

email_host and email_port can be set programmatically or included in conf.json. email_defaults provides default information about what templates to use and extra fields that can be used in the template. You will have to programmatically assign email_defaults to the publisher, as shown in the example constructor above.

Validation Enrichment

For the email publisher to successfully publish a message, the default information provided to each validation is not enough. To include this extra information, an enrichment function must be called on each validation:

emailer.enrich(validation,
               email_settings=validation_settings,
               runtime_context=email_runtime_context)

validation_settings should be a python dictionary of the form:

{
    "email_template" : "alert.template",
    "email_subject_template" : "alert_subject.template",
    "email_recipients" : [
        {"real_name" : "Another Team", "address" : "another@host.com"}
    ],
    "email_custom_message" : """The route located at {{test_name}} failed to respond within the allotted time frame.
                              The node may be offline or missing."""
}

Note that these fields can also appear in the Email Publisher defaults. If validation-specific fields are present, they will be used instead of the defaults.
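
The precedence rule can be pictured as a dictionary merge in which validation-specific values overwrite the defaults. This is only a sketch of the described behavior using the field names from this page; the variable effective_settings is hypothetical and the publisher may resolve values differently internally.

```python
general_defaults = {
    "email_template": "default.template",
    "email_subject_template": "default_subject.template",
    "email_recipients": [{"real_name": "Team", "address": "team@host.com"}],
    "email_custom_message": "",
}

validation_settings = {
    "email_template": "alert.template",
    "email_recipients": [{"real_name": "Another Team", "address": "another@host.com"}],
}

# Start from the defaults, then let validation-specific fields win.
effective_settings = dict(general_defaults)
effective_settings.update(validation_settings)
```

Here email_template and email_recipients come from the validation, while email_subject_template and email_custom_message fall back to the defaults.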

runtime_context is a dictionary that can contain arbitrary information to be used by the template.

Note

For a validation to be published by the Email Publisher, that validation must both be enriched and be of high enough priority.

You can use Alarmageddon’s dry run feature to verify that the validations that you intended to be published by the email publisher will actually be published by the email publisher.

Email Templates

The Email Publisher uses jinja2 to create its messages. An example email template is provided below:

Validation Failure in environment {{env}}:
{{test_name}} - {{test_description}}

{{email_custom_message}}

The email templates are files stored in the email_template_directory.

Source

alarmageddon package

Subpackages

alarmageddon.publishing package

Submodules
alarmageddon.publishing.emailer module

Support for publishing via e-mail.

Please refer to the [SPHINX DOCUMENTATION] for a detailed usage explanation

class alarmageddon.publishing.emailer.EmailPublisher(config, email_notifications_config_key=None, name='EmailPublisher', defaults=None, priority_threshold=None, connect_timeout_seconds=10)

Bases: alarmageddon.publishing.emailer.SimpleEmailPublisher

A publisher that publishes incidents to e-mail.

For validations to be published by this publisher, they must be enriched with additional data. See emailer.enrich.

Parameters:
  • config – A config object containing email config information. See below for a detailed description.
  • email_notifications_config_key – The config key that contains the email configuration.
  • name – The name of the publisher.
  • defaults – Default email templating values.
  • priority_threshold – Will publish validations of this priority or higher if they are appropriately enriched.
  • connect_timeout_seconds – How long to attempt to connect to the SMTP server.

config is an Alarmageddon config object that contains at least the following:

{
    email_template_directory : Directory containing the e-mail templates.
                               Can be relative to the location of the alarmageddon script
                               or an absolute directory location,
    environment : EMAIL_NOTIFICATIONS
}

Where EMAIL_NOTIFICATIONS is a dictionary of the form:

"email_notifications" : {
    EMAIL_TYPE: {
        "email_recipients" : [
            {"real_name" : "Some other recipient",
             "address" : "email@address.com"},
            ...
        ],
        "email_custom_message" : "Custom email message. Can contain Jinja replacement tokens."
    },
    ...
}

and EMAIL_TYPE is a name that will identify which validations should use that config.

EMAIL_NOTIFICATIONS_CONFIG_KEY = 'email_notifications'
configure_replacement_context(result)

Configures the replacement context for this email publisher

Supported template variables:

{{test_name}} The name of the test.

{{test_description}} The description of the failure.

{{env}} The environment name.

{{email_custom_message}} A custom message used in email alerts. This field can be used to summarize a particular type of alert or include additional details

Runtime Context: All dictionary items contained in runtime context are available.

Parameters:result – The test result whose values will populate the replacement context.
get_email_settings(result)

Returns the email settings of the given result.

get_runtime_context(result)

Returns the runtime context of the given result.

replace_tokens(template, token_dictionary)

Replace templated values with their contents.

Loops multiple times, to handle the case of a template that contains templates.

Templates should be valid Jinja templates:
http://jinja.pocoo.org/
Parameters:
  • template – The template string.
  • token_dictionary – A mapping from template names to values.
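
The idea of looping until nested templates are fully expanded can be sketched with the standard library. Note that this is a simplified stand-in that uses a regular expression instead of Jinja, so it only handles plain {{name}} tokens; replace_tokens_sketch and max_passes are hypothetical names, and leaving unknown tokens untouched is a choice made for the sketch.

```python
import re

TOKEN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def replace_tokens_sketch(template, token_dictionary, max_passes=5):
    """Repeatedly substitute {{name}} tokens until the text stops changing."""
    text = template
    for _ in range(max_passes):
        # Unknown tokens are left in place rather than raising an error.
        replaced = TOKEN.sub(
            lambda match: str(token_dictionary.get(match.group(1), match.group(0))),
            text)
        if replaced == text:
            break                    # nothing left to expand
        text = replaced
    return text


tokens = {
    # This value is itself a template, so a single pass is not enough.
    "email_custom_message": "{{test_name}} did not respond.",
    "test_name": "GET http://www.google.com",
}
body = replace_tokens_sketch("Failure: {{email_custom_message}}", tokens)
```

The cap on passes guards against a value that refers to itself and would otherwise never settle.
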
send(result)

Constructs a message from a result and sends it as an email.

This will only send if the priority threshold is met and the original validation was appropriately enriched.

Parameters:result – The result to publish.
class alarmageddon.publishing.emailer.SilentUndefined(hint=None, obj=missing, name=None, exc=<class 'jinja2.exceptions.UndefinedError'>)

Bases: jinja2.runtime.Undefined

Don’t break page loads because vars aren’t there!

class alarmageddon.publishing.emailer.SimpleEmailPublisher(sender_address, recipient_addresses, host=None, port=None, name='EmailPublisher', priority_threshold=None, connect_timeout_seconds=10)

Bases: alarmageddon.publishing.publisher.Publisher

A publisher that publishes incidents to e-mail.

Parameters:
  • sender_address – A dictionary containing the sender’s real_name and address.
  • recipient_addresses – A list of dictionaries, each containing a recipient’s real_name and address.
  • host – The SMTP server host.
  • port – The SMTP server port.
  • name – The name of the publisher.
  • priority_threshold – Will publish validations of this priority or higher.
  • connect_timeout_seconds – How long to attempt to connect to the SMTP server.
configure_message(sender_address, recipient_addresses, subject, body)

Creates a MIMEMultipart message with a plain-text body.

Parameters:
  • sender_address – The address the message will be sent from.
  • recipient_addresses – The addresses the message will be sent to.
  • subject – The subject of the email.
  • body – The body of the email.
configure_recipients(recipients)

Properly formats the list of recipient addresses.

Parameters:recipients – A list containing dictionaries of information about the recipients.
configure_sender(sender)

Properly formats the sender address.

Parameters:sender – A dictionary containing information about the sender.
configure_smtp_object(host, port)

Helper method to configure the SMTP object.

send(result)

Constructs a message from a result and sends it as an email.

This will only send if the priority threshold is met and the original validation was appropriately enriched.

Parameters:result – The result to publish.
alarmageddon.publishing.emailer.enrich(validation, email_settings, runtime_context=None)

Enriches the validation with a custom email message.

Parameters:
  • validation – The validation object.
  • email_settings – A dictionary object containing settings for email subject, body, sender and recipients. See below for details.
  • runtime_context – Additional replacement context settings available at runtime. See below for details.

email_settings should be a dictionary of the form:

{
    "email_type": "An environment-specific e-mail type as defined in the email publisher config",
    "subject": "The name of the Jinja template for the e-mail subject",
    "body": "The name of the Jinja template for the e-mail body",
    "sender": "A dictionary of the form {"real_name": "Real Name", "address": "email@address.com"}",
    "recipients": "An iterable of dictionaries of the form {"real_name": "Real Name", "address": "email@address.com"}"
}

Note that the location of the Jinja templates is defined in the email publisher config.

runtime_context is a dictionary whose values are consumed at runtime inside the Jinja templates defined in email_settings.

alarmageddon.publishing.exceptions module

Exceptions related to publishing TestResults

exception alarmageddon.publishing.exceptions.EnrichmentFailure(publisher, validation, values)

Bases: exceptions.Exception

An exception thrown when the enrichment of a validation fails.

Parameters:
  • publisher – The publisher the validation was enriched for.
  • validation – The validation that failed to be enriched.
  • values – The values that the validation was enriched with.
publisher()

Returns the publisher that the enrichment was for.

validation()

Returns the validation that failed to enrich.

values()

Returns the enrichment values.

exception alarmageddon.publishing.exceptions.PublishFailure(publisher, result)

Bases: exceptions.Exception

An exception thrown when sending a test result to a publisher fails.

Parameters:
  • publisher – The publisher that failed to publish.
  • result – The result that failed to publish.
publisher()

Returns the publisher that could not be published to.

result()

Returns the result that could not be published.

alarmageddon.publishing.graphite module

Support for publishing to Graphite.

class alarmageddon.publishing.graphite.GraphitePublisher(host, port, failed_tests_counter='failed', passed_tests_counter='passed', prefix='alarmageddon', priority_threshold=None)

Bases: alarmageddon.publishing.publisher.Publisher

A Publisher that sends results to Graphite.

Logs the number of successes and failures, and potentially logs how long a validation takes.

Parameters:
  • host – The graphite host.
  • port – The port that graphite is listening on.
  • failed_tests_counter – Name of the graphite counter for failed tests.
  • passed_tests_counter – Name of the graphite counter for successful tests.
  • prefix – Prefix applied to all graphite fields this publisher will write to.
  • priority_threshold – Will publish validations of this priority or higher.
send(result)

Sends a result to Graphite.

Logs the result as either a success or a failure. Additionally, logs how long the validation took, if a timer_name field is present on the result.

alarmageddon.publishing.hipchat module

Support for publishing to HipChat

class alarmageddon.publishing.hipchat.HipChatPublisher(api_end_point, api_token, environment, room_name, priority_threshold=None)

Bases: alarmageddon.publishing.publisher.Publisher

A Publisher that sends results to HipChat.

Publishes all failures to the designated HipChat room. Will publish all results in a single message, collapsing similar errors together to save space.

Parameters:
  • api_end_point – The HipChat API endpoint.
  • api_token – A HipChat API token.
  • environment – The environment that tests are being run in.
  • room_name – The HipChat room to publish results to.
  • priority_threshold – Will publish validations of this priority or higher.
send(result)

Sends a result to HipChat if the result is a Failure.

send_batch(results)

Send a batch of results to HipChat.

Collapses similar failures together to save space.
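
One way to picture the collapsing step is to group failures that share the same description. The sketch below is an assumption about what "similar" means and about the message layout; collapse_failures and format_message are hypothetical helpers, and only the first line mirrors the sample message shown in the short example.

```python
from collections import OrderedDict


def collapse_failures(failures):
    """Group (test_name, description) pairs by description, keeping first-seen order."""
    grouped = OrderedDict()
    for test_name, description in failures:
        grouped.setdefault(description, []).append(test_name)
    return grouped


def format_message(failures, environment):
    """Build one message with a single line per distinct failure description."""
    lines = ["%d failure(s) in %s:" % (len(failures), environment)]
    for description, names in collapse_failures(failures).items():
        lines.append("%s: %s" % (description, ", ".join(names)))
    return "\n".join(lines)


timeout = "expected status code: 200, actual status code: 504"
failures = [
    ("GET http://www.google.com", timeout),
    ("GET http://www.bing.com", timeout),
    ("GET http://www.yahoo.com", "expected status code: 200, actual status code: 500"),
]
message = format_message(failures, "stable")
```

Three failures produce two detail lines here, because the two timeouts share one description.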

alarmageddon.publishing.http module

A Publisher that publishes to a web application using HTTP

class alarmageddon.publishing.http.HttpPublisher(url=None, success_url=None, failure_url=None, method='POST', headers=None, auth=None, attempts=1, retry_after_seconds=2, timeout_seconds=5, publish_successes=False, expected_status_code=200, name=None, priority_threshold=None)

Bases: alarmageddon.publishing.publisher.Publisher

Creates an HTTP Publisher that publishes successes and/or failures to either one or two HTTP end points.

If you want the same URL to be used whether the Validation result being published failed or succeeded, supply only the url parameter and omit the failure_url and success_url parameters.

Conversely, if you want different URLs to be requested based on whether or not the Validation result being published succeeded, please omit the url parameter and supply the success_url and failure_url parameters. The HttpPublisher will use the same method, headers, and authentication parameters when requesting both of those URLs. If that is not acceptable, please override the relevant getter methods.

Parameters:
  • url – The URL that this publisher should publish successful and failed Validation results to.
  • success_url – The URL that this publisher should publish successful Validation results to.
  • failure_url – The URL that this publisher should publish failed Validation results to.
  • method – The HTTP method to use when posting. POST is the default because it is the only HTTP method that allows you to send the results of the published Validation. The GET method is allowed but cannot send the details of the Validation result along with the request.
  • headers – headers to send along with the request
  • auth – if your URLs require authentication you can supply a value like the following: auth=('user', 'pass')
  • attempts – the number of times to try to publish to your URL(s).
  • retry_after_seconds – how many seconds to wait after a failed attempt.
  • timeout_seconds – how long a single attempt can take before it is considered a failed attempt.
  • publish_successes – specify True if you want this HTTP Publisher to publish successful results too. If you provide a success_url, then this HttpPublisher will assume you want to publish successes.
  • expected_status_code – the HTTP status code to expect from your HTTP server if the Validation result was successfully published.
  • name – The name of this publisher.
  • priority_threshold – Will publish validations of this priority or higher.
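
The URL rules described for this class can be summarized in a small function. This is a sketch of the documented behavior, not the publisher's actual code; choose_url is a hypothetical helper and the URLs are placeholders.

```python
def choose_url(passed, url=None, success_url=None, failure_url=None,
               publish_successes=False):
    """Return the URL to request for a result, or None if nothing should be sent."""
    # Supplying success_url implies that successes should be published.
    if success_url is not None:
        publish_successes = True
    if passed:
        if not publish_successes:
            return None
        return success_url if success_url is not None else url
    return failure_url if failure_url is not None else url


# Single endpoint: failures are sent, successes only when explicitly enabled.
single_failure = choose_url(False, url="http://example.test/results")
single_success = choose_url(True, url="http://example.test/results")

# Separate endpoints: each outcome goes to its own URL.
split_success = choose_url(True, success_url="http://example.test/ok",
                           failure_url="http://example.test/failed")
split_failure = choose_url(False, success_url="http://example.test/ok",
                           failure_url="http://example.test/failed")
```
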
send(result)

Publish a test result.

Parameters:result – The TestResult of a test.
alarmageddon.publishing.pagerduty module

Support for publishing to PagerDuty.

class alarmageddon.publishing.pagerduty.PagerDutyPublisher(api_end_point, api_key, priority_threshold=None)

Bases: alarmageddon.publishing.publisher.Publisher

A publisher that publishes incidents to PagerDuty.

A unique ID is generated for each failure, built from the failure message. This means that repeated failures for the same test will not cause multiple pages if the original failure has not yet been resolved.

Parameters:
  • api_end_point – The PagerDuty API endpoint.
  • api_key – A PagerDuty API key.
  • priority_threshold – Will publish validations of this priority or higher.
send(result)

Creates an incident in PagerDuty.

Performs exponential backoff and retry in the case of 403 or 5xx responses.
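
A retry loop of this kind might look like the following sketch. The retry condition (403 or any 5xx) comes from the description above; the attempt limit, the base delay, and the doubling schedule are assumptions, and send_with_backoff is a hypothetical helper rather than the publisher's own code. A fake sender is used so no request is made.

```python
import time


def should_retry(status_code):
    """Retry on 403 and on any 5xx response."""
    return status_code == 403 or 500 <= status_code <= 599


def send_with_backoff(send_once, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send_once() until it returns a non-retryable status code.

    Returns (last_status, delays_waited).
    """
    delays = []
    status = None
    for attempt in range(max_attempts):
        status = send_once()
        if not should_retry(status):
            break
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt)   # 1, 2, 4, ... seconds
            delays.append(delay)
            sleep(delay)
    return status, delays


# Fake sender: two server errors followed by a success; sleeping is skipped.
responses = iter([503, 500, 200])
status, delays = send_with_backoff(lambda: next(responses), sleep=lambda seconds: None)
```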

alarmageddon.publishing.publisher module

The common interface and tools for all Publishers

class alarmageddon.publishing.publisher.Publisher(name=None, priority_threshold=None)

Bases: object

Base class for all test result publishers.

Publishers take test results and publish them to another service.

Parameters:
  • name – The name of this publisher.
  • priority_threshold – Will publish validations of this priority or higher.
name()

Return the name of the publisher.

send(result)

Publish a test result.

Parameters:result – The TestResult of a test.
send_batch(results)

Publish a collection of test results.

Directly called by the Reporter.

Parameters:results – An iterable of TestResult objects.
will_publish(result)

Determine if the publisher will publish the result

To publish a result, the publisher must both be able to publish (_can_publish) and have its priority threshold met (_should_publish).

Parameters:result – The TestResult of a test.
Module contents

This package contains classes that publish test results to different places (e.g. HipChat, PagerDuty, etc.)

alarmageddon.validations package

Submodules
alarmageddon.validations.kafka module

Convenience Validations for working with Kafka

class alarmageddon.validations.kafka.KafkaStatusValidation(ssh_context, zookeeper_nodes, kafka_list_topic_command='/opt/kafka/bin/kafka-list-topic.sh', priority=2, timeout=None, hosts=None)

Bases: alarmageddon.validations.ssh.SshValidation

Validate that the Kafka cluster has all of its partitions distributed across the cluster.

Parameters:
  • ssh_context – An SshContext class, for accessing the hosts.
  • zookeeper_nodes – Kafka zookeeper hosts and ports in CSV. e.g. “host1:2181,host2:2181,host3:2181”
  • kafka_list_topic_command – Kafka command to list topics (defaults to “/opt/kafka/bin/kafka-list-topic.sh”)
  • priority – The Priority level of this validation.
  • timeout – How long to attempt to connect to the host.
  • hosts – The hosts to connect to.
perform_on_host(host)

Runs kafka list topic command on host

alarmageddon.validations.cassandra module

Convenience Validations for working with Cassandra

class alarmageddon.validations.cassandra.CassandraStatusValidation(ssh_context, service_state='UN', number_nodes=5, owns_threshold=40, priority=2, timeout=None, hosts=None)

Bases: alarmageddon.validations.ssh.SshValidation

Validate that the Cassandra ring is within expected parameters.

Check that the specified Cassandra ring is in the specified state and that the ring ownership of the nodes is within a certain threshold.

Parameters:
  • ssh_context – An SshContext class, for accessing the hosts.
  • service_state – The expected service state value (defaults to “UN”).
  • number_nodes – The expected number of cassandra nodes in the ring.
  • owns_threshold – The maximum percentage of the ring owned by a node.
  • priority – The Priority level of this validation.
  • timeout – How long to attempt to connect to the host.
  • hosts – The hosts to connect to.
check(host, nodes)

Compares the results of nodetool status to the expected results.

perform_on_host(host)

Runs nodetool status and parses the output.

class alarmageddon.validations.cassandra.Node(ip_address, status=0, state=0, load=None, tokens=None, owns=None, host_id=None, rack=None)

Bases: object

Information about a Cassandra node including its load, what percent of the ring it owns, its state, etc.

class alarmageddon.validations.cassandra.NodetoolStatusParser

Bases: object

Parses the output of the Cassandra nodetool status command and tries to make sense of it despite changes made to the format.

parse(status_output)
class alarmageddon.validations.cassandra.State

Bases: object

An enum-like object that represents the state of a Cassandra Node

JOINING = 3
LEAVING = 2
MOVING = 4
NORMAL = 1
UNKNOWN = 0
static from_text(text)[source]
static to_text(value)[source]

Convert State to String
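The enum-like pattern used by State can be sketched as below. RingState is a hypothetical stand-in, not the library class. The integer constants match the ones listed above, but the one-letter codes (N, L, J, M, as printed by nodetool status) and the strings returned by to_text are assumptions.

```python
class RingState(object):
    """Hypothetical stand-in mirroring the constants listed above."""

    UNKNOWN = 0
    NORMAL = 1
    LEAVING = 2
    JOINING = 3
    MOVING = 4

    # Assumed mapping from the one-letter codes printed by `nodetool status`.
    _FROM_TEXT = {"N": NORMAL, "L": LEAVING, "J": JOINING, "M": MOVING}
    _TO_TEXT = {NORMAL: "NORMAL", LEAVING: "LEAVING", JOINING: "JOINING", MOVING: "MOVING"}

    @staticmethod
    def from_text(text):
        # Anything unrecognised falls back to UNKNOWN rather than raising.
        return RingState._FROM_TEXT.get(text.strip().upper(), RingState.UNKNOWN)

    @staticmethod
    def to_text(value):
        return RingState._TO_TEXT.get(value, "UNKNOWN")


print(RingState.to_text(RingState.from_text("j")))
```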

class alarmageddon.validations.cassandra.Status[source]

Bases: object

An enum-like object that represents the status of a Cassandra Node

DOWN = 2
UNKNOWN = 0
UP = 1
static from_text(text)[source]
static to_text(value)[source]

Convert Status to String

alarmageddon.validations.graphite module

Classes that support validation of metrics collected by Graphite

class alarmageddon.validations.graphite.GraphiteContext(graphite_host)[source]

Bases: object

Create one of these and then pass it to all of the GraphiteValidation objects you create.

get_graphite_host()[source]

returns the Graphite host name

class alarmageddon.validations.graphite.GraphiteValidation(context, name, metric_name, time_range=datetime.timedelta(0, 3600), **kwargs)[source]

Bases: alarmageddon.validations.validation.Validation

A Validation that queries Graphite for data and then validates any defined expectations against that data.

expect_average_greater_than(lower_bound)[source]

The average reading of the specified time range should fall above the lower bound

expect_average_in_range(lower_bound, upper_bound)[source]

The average reading of the specified time range should fall between the upper and lower bound

expect_average_less_than(upper_bound)[source]

The average reading of the specified time range should fall below the upper bound

expect_greater_than(lower_bound)[source]

All readings in the specified time range should fall above the lower bound

expect_in_range(lower_bound, upper_bound)[source]

All readings in the specified time range should fall between the upper and lower bound

expect_less_than(upper_bound)[source]

All readings in the specified time range should fall below the upper bound

fail(reason)[source]

Causes this GraphiteValidation to fail with the given reason.

perform(group_failures)[source]

Perform the validation and propagate any failures to reporters
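The difference between the "all readings" expectations and the "average" expectations is easiest to see on a series with a single spike. The following is a standalone sketch, not the library's code. The function names are hypothetical, the bounds are treated as inclusive (an assumption), and the readings list is assumed to be non-empty.

```python
def all_in_range(readings, lower_bound, upper_bound):
    """True when every reading lies between the bounds (inclusive here)."""
    return all(lower_bound <= reading <= upper_bound for reading in readings)


def average_in_range(readings, lower_bound, upper_bound):
    """True when the mean of the readings lies between the bounds."""
    average = sum(readings) / float(len(readings))
    return lower_bound <= average <= upper_bound


readings = [10.0, 12.0, 95.0, 11.0]  # one spike in an otherwise quiet hour
print(all_in_range(readings, 0, 50))      # False: the spike breaks the bound
print(average_in_range(readings, 0, 50))  # True: the mean is 32.0
```

An expectation on every reading is the stricter choice; an expectation on the average tolerates short spikes within the time range.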

alarmageddon.validations.graphite_expectations module

Expectations that can be held against metrics collected in Graphite

class alarmageddon.validations.graphite_expectations.AverageGreaterThanExpectation(validation, lower_bound)[source]

Bases: alarmageddon.validations.graphite_expectations.GraphiteExpectation

Expect that the average of a graphite metric is greater than a specified number

validate(readings, time_range)[source]
class alarmageddon.validations.graphite_expectations.AverageLessThanExpectation(validation, upper_bound)[source]

Bases: alarmageddon.validations.graphite_expectations.GraphiteExpectation

Expect that the average of a graphite metric is less than a specified number

validate(readings, time_range)[source]
class alarmageddon.validations.graphite_expectations.GraphiteExpectation(validation, name)[source]

Bases: object

An expectation placed on a list of Graphite readings

validate(readings, time_range)[source]

make sure the expectation is met

class alarmageddon.validations.graphite_expectations.GreaterThanExpectation(validation, lower_bound)[source]

Bases: alarmageddon.validations.graphite_expectations.GraphiteExpectation

Expect that a graphite metric is greater than a specified number

validate(readings, time_range)[source]
class alarmageddon.validations.graphite_expectations.LessThanExpectation(validation, upper_bound)[source]

Bases: alarmageddon.validations.graphite_expectations.GraphiteExpectation

Expect that a graphite metric is less than a specified number

validate(readings, time_range)[source]
alarmageddon.validations.http module

HTTP Validation

class alarmageddon.validations.http.HttpValidation(method, url, data=None, headers=None, priority=2, timeout=None, group=None, retries=1, ignore_ssl_cert_errors=False, auth=None)[source]

Bases: alarmageddon.validations.validation.Validation

A Validation that executes an HTTP request and then performs zero or more checks on the response.

add_expectation(expectation)[source]

Add a custom expectation to the Validation

duplicate_with_hosts(host_names, port=None)[source]

Returns a list of new HttpValidation that are identical to this HttpValidation except with the host name replaced by the elements of host_names.
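The host substitution described here can be sketched on plain URLs with the standard library. The urls_for_hosts helper is hypothetical, and whether the library keeps or drops an existing port when port is None is not stated above; this sketch drops it.

```python
from urllib.parse import urlsplit, urlunsplit


def urls_for_hosts(url, host_names, port=None):
    """Return one URL per host name, keeping scheme, path and query."""
    parts = urlsplit(url)
    urls = []
    for host in host_names:
        # Assumption: any port in the original URL is replaced, not preserved.
        netloc = host if port is None else "%s:%d" % (host, port)
        urls.append(urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment)))
    return urls


print(urls_for_hosts("http://example.com/health?verbose=1", ["web1", "web2"], port=8080))
```

This is useful when the same check has to run against each server behind a load balancer rather than against the balanced address.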

expect_contains_text(text)[source]

Add an expectation that the HTTP response will contain a particular string.

expect_content_type(content_type)[source]

Add an expectation that the HTTP response’s content type will be equal to the specified content_type.

expect_header(name, value)[source]

Add an expectation that the HTTP response will contain a header with the specified name and value.

expect_json_property_value(json_property_path, expected_value)[source]

Add an expectation that the HTTP response will be JSON and contain a property (found by traversing json_property_path) with the specified value.

expect_json_property_value_greater_than(json_property_path, greater_than)[source]

Add an expectation that the HTTP response will be JSON and contain a numeric property (found by traversing json_property_path) greater than greater_than.

expect_json_property_value_less_than(json_property_path, less_than)[source]

Add an expectation that the HTTP response will be JSON and contain a numeric property (found by traversing json_property_path) less than less_than.
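"Traversing json_property_path" can be pictured as walking a parsed JSON document one key at a time. The sketch below is standalone and assumes a dot-separated path with numeric segments indexing into lists; the path syntax the library actually accepts is not specified above, so treat that format and the find_property helper as assumptions.

```python
import json


def find_property(json_text, json_property_path):
    """Walk a dot-separated path (an assumed format) through parsed JSON."""
    value = json.loads(json_text)
    for key in json_property_path.split("."):
        if isinstance(value, list):
            value = value[int(key)]  # numeric segment selects a list element
        else:
            value = value[key]
    return value


body = '{"status": {"queue": {"depth": 7}}, "nodes": [{"up": true}]}'
print(find_property(body, "status.queue.depth"))
print(find_property(body, "nodes.0.up"))
```

Once the property is found, the equality, greater-than and less-than expectations reduce to an ordinary comparison against the expected value.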

expect_status_codes(status_codes)[source]

Add an expectation that the HTTP response will have one of the specified status_codes.

fail(reason)[source]

Causes this HttpValidation to fail with the given reason.

static get(url, **kwargs)[source]

Create an HttpValidation that will GET the specified url, passing the specified headers.

headers - a dictionary where each key is a header name and the value that corresponds to the key is the header value.

priority - the priority of the call; this determines how failures are routed.

timeout - the number of seconds the HTTP request is allowed to take.

group - the group to include this Validation in

get_elapsed_time()[source]
static head(url, **kwargs)[source]

Create an HttpValidation that will retrieve the HEAD of the specified url passing specified headers.

headers - a dictionary where each key is a header name and the value that corresponds to the key is the header value.

priority - the priority of the call; this determines how failures are routed.

timeout - the number of seconds the HTTP request is allowed to take.

group - the group to include this Validation in

static options(url, **kwargs)[source]

Create an HttpValidation that will retrieve OPTIONS for the specified url passing specified headers.

headers - a dictionary where each key is a header name and the value that corresponds to the key is the header value.

priority - the priority of the call; this determines how failures are routed.

timeout - the number of seconds the HTTP request is allowed to take.

group - the group to include this Validation in

perform(group_failures)[source]

Perform the HTTP request and validate the response.

static post(url, **kwargs)[source]

Create an HttpValidation that will POST to the specified url passing specified headers and payload.

headers - a dictionary where each key is a header name and the value that corresponds to the key is the header value.

data - data that is sent along with the request

priority - the priority of the call; this determines how failures are routed.

timeout - the number of seconds the HTTP request is allowed to take.

group - the group to include this Validation in

static put(url, **kwargs)[source]

Create an HttpValidation that will PUT to the specified url passing specified headers and payload.

headers - a dictionary where each key is a header name and the value that corresponds to the key is the header value.

data - data that is sent along with the request

priority - the priority of the call; this determines how failures are routed.

timeout - the number of seconds the HTTP request is allowed to take.

group - the group to include this Validation in

send_header(name, value)[source]

adds an HTTP header with the specified name and value to the request when it’s sent

timer_name()[source]
alarmageddon.validations.http_expectations module

Expectations that can be placed on an HTTP request

class alarmageddon.validations.http_expectations.ExpectContainsText(text)[source]

Bases: alarmageddon.validations.http_expectations.ResponseExpectation

An expectation that an HTTP response will include some text.

validate(validation, response)[source]
class alarmageddon.validations.http_expectations.ExpectedContentType(content_type)[source]

Bases: alarmageddon.validations.http_expectations.ExpectedHeader

An expectation that an HTTP response will have a particular content type

class alarmageddon.validations.http_expectations.ExpectedHeader(name, value)[source]

Bases: alarmageddon.validations.http_expectations.ResponseExpectation

An expectation that an HTTP response will include a header with a specific name and value.

validate(validation, response)[source]
class alarmageddon.validations.http_expectations.ResponseExpectation[source]

Bases: object

An expectation placed on an HTTP response.

validate(validation, response)[source]

If the expectation is met, do nothing. If the expectation is not met, call validation.fail(...)
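The validate contract (do nothing on success, call validation.fail(...) otherwise) is what a custom expectation has to follow. The sketch below shows the shape with stand-ins so that it runs without Alarmageddon: FakeResponse, RecordingValidation and ExpectMaxBodyLength are all hypothetical, and the stand-in fail method records the reason instead of doing whatever the library's fail does.

```python
class FakeResponse(object):
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, text):
        self.text = text


class RecordingValidation(object):
    """Hypothetical stand-in that records failures instead of raising."""

    def __init__(self):
        self.failures = []

    def fail(self, reason):
        self.failures.append(reason)


class ExpectMaxBodyLength(object):
    """Custom expectation: the response body must not exceed max_length."""

    def __init__(self, max_length):
        self.max_length = max_length

    def validate(self, validation, response):
        # Met: return silently. Not met: report through the validation.
        if len(response.text) > self.max_length:
            validation.fail("body is %d characters, expected at most %d"
                            % (len(response.text), self.max_length))


validation = RecordingValidation()
ExpectMaxBodyLength(5).validate(validation, FakeResponse("ok"))
ExpectMaxBodyLength(5).validate(validation, FakeResponse("too long"))
print(validation.failures)
```

In real use such a class would derive from ResponseExpectation and be attached with add_expectation.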

alarmageddon.validations.json_expectations module

Expectations that can be held against some JSON text

class alarmageddon.validations.json_expectations.ExpectedJsonEquality(json_property_path, value)[source]

Bases: alarmageddon.validations.json_expectations.ExpectedJsonPredicate

expects that a JSON value is equal to a specified value

validate_value(validation, expected_value, actual_value)[source]
class alarmageddon.validations.json_expectations.ExpectedJsonPredicate(json_property_path, value)[source]

Bases: alarmageddon.validations.http_expectations.ResponseExpectation

An expectation that an HTTP response will be JSON and have a property with a specified value.

validate(validation, response)[source]

Validates that the HTTP response is JSON and that it contains a property (found by traversing self.json_property_path) equal to self.value

validate_value(validation, expected_value, actual_value)[source]

validates a JSON value

class alarmageddon.validations.json_expectations.ExpectedJsonValueGreaterThan(json_property_path, value)[source]

Bases: alarmageddon.validations.json_expectations.ExpectedJsonPredicate

Expects that a numeric JSON value is greater than a specified value

validate_value(validation, expected_value, actual_value)[source]
class alarmageddon.validations.json_expectations.ExpectedJsonValueLessThan(json_property_path, value)[source]

Bases: alarmageddon.validations.json_expectations.ExpectedJsonPredicate

Expects that a numeric JSON value is less than a specified value

validate_value(validation, expected_value, actual_value)[source]
alarmageddon.validations.rabbitmq module

Validation for RabbitMQ

class alarmageddon.validations.rabbitmq.RabbitMqContext(host, port, user_name, password)[source]

Bases: object

information needed to connect and interact with RabbitMQ

get_connection(timeout=None)[source]

Connects to RabbitMQ and returns the connection object

Third Party (pika) Bug: https://github.com/pika/pika/issues/354 - Once this bug is fixed we can take out our own retrying logic and use pika’s retry logic. In the meantime, connection failure messages will be inaccurate; they’ll say that only one connection attempt was made.

get_credentials()[source]

get “plain” credentials based on this object’s user name and password

class alarmageddon.validations.rabbitmq.RabbitMqValidation(rabbitmq_context, name, queue_name, max_queue_size, priority=2, timeout=None, num_attempts=4, seconds_between_attempts=2, group=None, ignore_connection_failure=False)[source]

Bases: alarmageddon.validations.validation.Validation

A Validation that can be held against a RabbitMQ server

perform(group_failures)[source]

Perform the validation. If the validation fails, call self.fail passing it the reason for the failure.
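The num_attempts and seconds_between_attempts parameters suggest a simple retry loop around the connection. The following standalone sketch shows one way such a loop can look. It does not use pika; connect_with_retries, flaky_connect and the injectable sleep argument are hypothetical names introduced for illustration.

```python
import time


def connect_with_retries(connect, num_attempts=4, seconds_between_attempts=2,
                         sleep=time.sleep):
    """Call `connect` until it succeeds or the attempts are used up."""
    last_error = None
    for attempt in range(1, num_attempts + 1):
        try:
            return connect()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < num_attempts:
                sleep(seconds_between_attempts)
    raise RuntimeError("gave up after %d attempts: %s" % (num_attempts, last_error))


attempts = []


def flaky_connect():
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("connection refused")
    return "connected"


# Pass a no-op sleep so the example finishes immediately.
print(connect_with_retries(flaky_connect, sleep=lambda seconds: None))
```

Reporting the number of attempts in the final error keeps the failure message accurate, which is the concern raised in the pika note above.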

alarmageddon.validations.ssh module

Validations that are performed by executing commands remotely on other servers using SSH

We’re using fabric for easy SSH command execution.

class alarmageddon.validations.ssh.LoadAverageValidation(ssh_context, priority=2, timeout=None, group=None, hosts=None)[source]

Bases: alarmageddon.validations.ssh.SshValidation

Validates that a server’s load average falls within a set of parameters

add_expectation(expectation)[source]
check(host, minutes, load)[source]

Make sure that the n-minute load average for the given host is within the allowed range.

expect_exit_code(exit_code)[source]
expect_max_15_minute_load(max_load)[source]

expect a maximum 15 minute load

expect_max_1_minute_load(max_load)[source]

expect a maximum 1 minute load

expect_max_5_minute_load(max_load)[source]

expect a maximum 5 minute load

expect_min_15_minute_load(min_load)[source]

expect a minimum 15 minute load

expect_min_1_minute_load(min_load)[source]

expect a minimum 1 minute load

expect_min_5_minute_load(min_load)[source]

expect a minimum 5 minute load

perform_on_host(host)[source]

Runs the SSH Command on a host and checks to see if all expectations are met.
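A load average check boils down to reading three numbers from command output and comparing them with the configured limits. The sketch below is standalone and assumes the numbers come from uptime-style output; the command the library actually runs and how it parses the result are not documented above, so parse_load_averages and the sample line are illustrative only.

```python
import re


def parse_load_averages(uptime_output):
    """Extract the 1, 5 and 15 minute load averages from `uptime` output."""
    match = re.search(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)",
                      uptime_output)
    if match is None:
        raise ValueError("no load averages found in: %r" % uptime_output)
    one, five, fifteen = (float(value) for value in match.groups())
    return {1: one, 5: five, 15: fifteen}


sample = " 14:02:11 up 12 days,  3:41,  2 users,  load average: 0.42, 0.37, 0.30"
loads = parse_load_averages(sample)

# Hypothetical limits keyed by the averaging window in minutes.
max_allowed = {1: 4.0, 5: 0.25}
violations = [minutes for minutes in sorted(max_allowed)
              if loads[minutes] > max_allowed[minutes]]
print(loads, violations)
```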

class alarmageddon.validations.ssh.OutputContains(validation, text)[source]

Bases: alarmageddon.validations.ssh.SshCommandExpectation

Expects that the output of an SSH command contains specified text

validate(validation, host, command_output, exit_code)[source]
class alarmageddon.validations.ssh.OutputDoesNotContain(validation, text)[source]

Bases: alarmageddon.validations.ssh.SshCommandExpectation

Expects that the output of an SSH command does not contain specified text

validate(validation, host, command_output, exit_code)[source]
class alarmageddon.validations.ssh.OutputGreaterThan(validation, value)[source]

Bases: alarmageddon.validations.ssh.SshCommandExpectation

Expects that the output of an SSH command is greater than the specified value. This method casts the command_output string to a float to do the comparison.

validate(validation, host, command_output, exit_code)[source]
class alarmageddon.validations.ssh.OutputLessThan(validation, value)[source]

Bases: alarmageddon.validations.ssh.SshCommandExpectation

Expects that the output of an SSH command is less than the specified value. This method casts the command_output string to a float to do the comparison.

validate(validation, host, command_output, exit_code)[source]
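Because OutputGreaterThan and OutputLessThan cast the command output to a float, the command has to print a bare number. The standalone sketch below illustrates the comparison. The output_greater_than helper is hypothetical, and treating non-numeric output as a failure is an assumption rather than documented behavior.

```python
def output_greater_than(command_output, value):
    """Compare numeric command output against a threshold.

    Returns (ok, reason); non-numeric output counts as a failure here.
    """
    try:
        number = float(command_output.strip())
    except ValueError:
        return False, "output %r is not a number" % command_output
    if number > value:
        return True, None
    return False, "%s is not greater than %s" % (number, value)


print(output_greater_than("8\n", 4))
print(output_greater_than("n/a", 4))
```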
class alarmageddon.validations.ssh.SshCommandExpectation(validation)[source]

Bases: object

Base class for expectations that can be placed on an SshValidation

fail_on_host(host, reason)[source]

Report a failure and the host the failure occurred on

validate(validation, host, command_output, exit_code)[source]

Defined by derived classes

class alarmageddon.validations.ssh.SshCommandValidation(ssh_context, name, command, working_directory=None, environment=None, priority=2, use_sudo=False, timeout=None, connection_retries=0, group=None, hosts=None)[source]

Bases: alarmageddon.validations.ssh.SshValidation

A validation that runs a command and checks zero or more expectations against its exit code and/or output.

perform_on_host(host)[source]

Runs the SSH Command on a host and checks to see if all expectations are met.

class alarmageddon.validations.ssh.SshCommands[source]

Bases: object

Some commands that might be helpful

static get_cpu_count()[source]

return the number of processors on the server

static get_uptime()[source]

return the system uptime

class alarmageddon.validations.ssh.SshContext(user, key_file)[source]

Bases: object

Context that SSH commands execute in: the user and the user’s key file.

Note that the list of hosts is not part of the SshContext because it changes at a very high rate compared to the user name and their key file.

class alarmageddon.validations.ssh.SshValidation(ssh_context, name, priority=2, timeout=None, group=None, connection_retries=0, hosts=None)[source]

Bases: alarmageddon.validations.validation.Validation

A Validation that is performed using SSH (more specifically, fabric)

add_expectation(expectation)[source]

Adds an expectation deriving from SshCommandExpectation to the list of expectations to be performed as part of the validation.

add_hosts(hosts)[source]

Add additional hosts to run validations against

expect_exit_code(exit_code)[source]

Add the expectation that the SSH command’s exit code is equal to exit_code

expect_output_contains(text)[source]

Add the expectation that the SSH command’s output contains text

expect_output_does_not_contain(text)[source]

Add the expectation that the SSH command’s output does not contain text

fail_on_host(host, reason)[source]

Signal failure of the test on a particular host

perform(group_failures)[source]

Perform validation against all of this object’s hosts

perform_on_host(host)[source]

perform a validation against a particular host

class alarmageddon.validations.ssh.UpstartServiceValidation(ssh_context, service_name, service_state='running', priority=2, timeout=None, group=None, hosts=None)[source]

Bases: alarmageddon.validations.ssh.SshCommandValidation

Validates that the specified upstart process is in the specified state (e.g. running)

Module contents

Validations that Alarmageddon can perform

Submodules

alarmageddon.config module

Configuration object used by Alarmageddon

class alarmageddon.config.Config(dictionary, environment_name)[source]

Bases: dict

Alarmageddon configuration object.

A configuration object that both acts like a read-only dictionary and provides some methods to access application specific settings

Parameters:
  • dictionary – A dictionary of the form {‘env’:{config options},...}
  • environment_name – The environment that this Config object belongs to
ENVIRONMENT_KEY = 'environment'
environment_name()[source]

returns current environment name

static from_file(config_path, environment_name)[source]

Load a Config object from a file

An environment_name must be provided so that the resulting Config object can provide access to environment specific settings.

hostname(alias)[source]

Returns an environment-specific hostname given its alias.

host names are pulled from the hosts dictionary under each of the environment dictionaries.
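The alias lookup can be pictured with a dict subclass. EnvConfig below is a hypothetical stand-in, and the nesting it assumes (a top-level "environment" key, then the environment name, then "hosts") is a guess based on ENVIRONMENT_KEY and the description above; check a real configuration file before relying on it.

```python
class EnvConfig(dict):
    """Hypothetical stand-in: a dict that knows which environment is active."""

    def __init__(self, dictionary, environment_name):
        super(EnvConfig, self).__init__(dictionary)
        self._environment_name = environment_name

    def environment_name(self):
        return self._environment_name

    def hostname(self, alias):
        # Assumed layout: {"environment": {<env name>: {"hosts": {<alias>: <host>}}}}
        return self["environment"][self._environment_name]["hosts"][alias]


settings = {
    "environment": {
        "stable": {"hosts": {"web": "web.stable.example.com"}},
        "prod": {"hosts": {"web": "web.example.com"}},
    }
}
print(EnvConfig(settings, "stable").hostname("web"))
print(EnvConfig(settings, "prod").hostname("web"))
```

Keeping aliases in the configuration lets the same validation definitions run unchanged against each environment.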

test_results_file()[source]

returns the location of the test results file

alarmageddon.reporter module

Reports test results to registered publishers.

class alarmageddon.reporter.Reporter(publishers)[source]

Bases: object

Class for collecting and sending results to publishers.

Parameters:publishers – List of Publisher objects to send results to.
collect(result)[source]

Construct a result from item and store for publishing.

Called by pytest, through the Alarmageddon plugin.

report()[source]

Send reports to all publishers

exception alarmageddon.reporter.ReportingFailure(failures)[source]

Bases: exceptions.Exception

An exception that aggregates multiple PublishFailures.

Parameters:failures – A list of PublishFailures
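An exception that aggregates several publish failures implies that one broken publisher should not stop the others from receiving results. The sketch below shows that collect-then-report flow with stand-ins. PublishFailure, AggregateFailure and MiniReporter are hypothetical names, and publishers are modelled as plain callables, which is a simplification of the library's Publisher objects.

```python
class PublishFailure(Exception):
    """Hypothetical stand-in for a single publisher's failure."""


class AggregateFailure(Exception):
    """Hypothetical stand-in that carries every failure from one report run."""

    def __init__(self, failures):
        super(AggregateFailure, self).__init__("%d publisher(s) failed" % len(failures))
        self.failures = failures


class MiniReporter(object):
    def __init__(self, publishers):
        self.publishers = publishers
        self.results = []

    def collect(self, result):
        self.results.append(result)

    def report(self):
        failures = []
        for publisher in self.publishers:
            try:
                publisher(self.results)
            except PublishFailure as failure:
                failures.append(failure)  # keep going so other publishers still run
        if failures:
            raise AggregateFailure(failures)


delivered = []


def broken(results):
    raise PublishFailure("chat service unreachable")


reporter = MiniReporter([broken, delivered.extend])
reporter.collect("http check failed")
try:
    reporter.report()
except AggregateFailure as error:
    print(error, delivered)
```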

alarmageddon.result module

Classes that represent possible results of running a test.

class alarmageddon.result.Failure(test_name, validation, description, time=None)[source]

Bases: alarmageddon.result.TestResult

The result of a failed validation.

description is required.

is_failure()[source]

Returns True.

class alarmageddon.result.Success(test_name, validation, description=None, time=None)[source]

Bases: alarmageddon.result.TestResult

The result of a successful validation.

is_failure()[source]

Returns False.

class alarmageddon.result.TestResult(test_name, validation, description=None, time=None)[source]

Bases: object

Base class representing the result of performing a validation.

Contains the outcome information that Alarmageddon will publish.

Parameters:
  • test_name – Name of the validation this result is associated with.
  • validation – The Validation this result is associated with.
  • description – Default None. A description of the outcome of the validation. If the validation failed, this field is expected to not be None.
  • time – Default None. How long the validation took to perform.
description()[source]

Returns additional descriptive text about the test.

For Failures, description is required.

is_failure()[source]

Returns True if and only if this Result represents a failed test.

test_name()[source]

Returns the name of the test.
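The relationship between the three result classes can be sketched as follows. Result, Passed and Failed are hypothetical stand-ins; unlike the real signatures they omit the validation argument to keep the example short. The point is that a failure takes description as a required argument while a success does not.

```python
class Result(object):
    """Hypothetical stand-in for the base result class."""

    def __init__(self, test_name, description=None, time=None):
        self._test_name = test_name
        self._description = description
        self._time = time

    def test_name(self):
        return self._test_name

    def description(self):
        return self._description

    def is_failure(self):
        raise NotImplementedError


class Passed(Result):
    def is_failure(self):
        return False


class Failed(Result):
    def __init__(self, test_name, description, time=None):
        # description has no default here, so omitting it raises TypeError
        super(Failed, self).__init__(test_name, description, time)

    def is_failure(self):
        return True


results = [Passed("GET /health"), Failed("GET /orders", "expected 200, got 503")]
for result in results:
    if result.is_failure():
        print("%s: %s" % (result.test_name(), result.description()))
```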

alarmageddon.run module

Methods that support running tests

alarmageddon.run.construct_publishers(config)[source]

Construct the built-in publishers.

Parameters:config – Config object to construct the publishers from.
alarmageddon.run.do_dry_run(validations, publishers)[source]

Print which validations will be published by which publishers.

Assume all validations fail and list the messages that would have been published.

Parameters:
  • validations – List of Validation objects that Alarmageddon would perform.
  • publishers – List of Publisher objects that Alarmageddon would publish validation results to.
alarmageddon.run.load_config(config_path, environment_name)[source]

Helper method for loading a Config

Parameters:
  • config_path – Path to the JSON configuration file.
  • environment_name – The config environment to run Alarmageddon in.
alarmageddon.run.run_tests(validations, publishers=None, config_path=None, environment_name=None, config=None, dry_run=False, processes=1, print_banner=True)[source]

Main entry point into Alarmageddon.

Run the given validations and report them to given publishers.

Either config_path and environment_name must both be provided, or config must be provided.

Parameters:
  • validations – List of Validation objects that Alarmageddon will perform.
  • publishers – List of Publisher objects that Alarmageddon will publish validation results to.
  • dry_run – When True, will prevent Alarmageddon from performing validations or publishing results, and instead will print which validations will be published by which publishers upon failure.
  • processes – The number of worker processes to spawn. Does not spawn additional processes if set to 1.
  • print_banner – When True, print the Alarmageddon banner.

Deprecated since version 1.0.0: These parameters are no longer used: config_path, environment_name, config. Configuration happens when constructing publishers instead.

Module contents

Alarmageddon main module

Indices and tables