
Alarmageddon is a Python monitoring framework for RESTful services, built on top of Requests and Fabric. Alarmageddon supports Python 2.7 and Python 3.4+
The following example GETs www.google.com, and reports to HipChat if the return code is not 200:
import alarmageddon
from alarmageddon.validations.http import HttpValidation
from alarmageddon.publishing.hipchat import HipChatPublisher
validations = [HttpValidation.get("http://www.google.com").expect_status_codes([200])]
publishers = [HipChatPublisher("hipchat.route.here","token","stable","hipchat_room")]
alarmageddon.run_tests(validations,publishers)
Features¶
- Verify expectations on the following
- HTTP requests
- SSH commands
- RabbitMQ queue lengths
- Cassandra status
- Statistics collected in Graphite
- The behavior of other Alarmageddon tests
- Report failed verifications to
- HipChat
- Slack
- PagerDuty
- Graphite
- XML file
Getting Started¶
A Short Example¶
This will walk you through creating and running a basic suite of Alarmageddon validations.
Alarmageddon has two main components: validations and publishers. Validations are the tests that will be run, and publishers handle passing the results of those validations along to an external system (eg, PagerDuty).
Creating Validations¶
To make sure that the world’s search engines are working, let’s use HttpValidation:
from alarmageddon.validations.http import HttpValidation
validations = []
validations.append(HttpValidation.get("http://www.google.com").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.bing.com").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.yahoo.com").expect_status_codes([200]))
These validations are constructed to GET the supplied url. We’ve also set up our expectations about the results of GETting the url - in this case, we expect the status code to be 200. This is the basic structure of Alarmageddon’s validations: a validation takes some action and compares the results to the supplied expectations.
Creating Publishers¶
Of course, if no one knows a validation has failed, it isn’t particularly useful. To have Alarmageddon report on failures, we must supply it with at least one publisher:
from alarmageddon.publishing.hipchat import HipChatPublisher
publishers = []
hipchat_endpoint = "127.0.0.1"
hipchat_token = "token"
environment = "stable"
room = "hipchat_room"
publishers.append(HipChatPublisher(hipchat_endpoint, hipchat_token, environment, room))
This publisher will report failures to hipchat. Note that this example won’t work - you’ll need to supply a valid endpoint and token!
Running Alarmageddon¶
Given a set of validations and a set of publishers, we can run Alarmageddon:
import alarmageddon
alarmageddon.run_tests(validations,publishers)
This will run the validations. If any failures occur, a message will be passed along to the designated HipChat room. In this case, the resulting message might look something like:
1 failure(s) in stable: (failed) GET http://www.google.com Description: expected status code: 200, actual status code: 504 (Gateway Time-out)
Full Code¶
Here’s the full source of this example:
import alarmageddon
from alarmageddon.validations.http import HttpValidation
from alarmageddon.publishing.hipchat import HipChatPublisher
validations = []
validations.append(HttpValidation.get("http://www.google.com").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.bing.com").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.yahoo.com").expect_status_codes([200]))
publishers = []
hipchat_endpoint = "127.0.0.1"
hipchat_token = "token"
environment = "stable"
room = "hipchat_room"
publishers.append(HipChatPublisher(hipchat_endpoint, hipchat_token, environment, room))
alarmageddon.run_tests(validations,publishers)
Validations¶
A validation performs some action and then checks the results of that action against a set of expectations. Alarmageddon comes with validations for checking the results of HTTP calls, checking the output of SSH commands, and checking the length of RabbitMQ queues.
All validations accept a priority
argument. This should be one of Priority.LOW
, Priority.NORMAL
, or Priority.CRITICAL
. This priority level is used to determine whether or not a publisher should publish the results of the validation.
HTTP¶
You can create HttpValidations for various HTTP methods:
HttpValidation.get("http://www.google.com")
HttpValidation.post("http://www.google.com",data={key:value})
HttpValidation.put("http://www.google.com"data={key:value})
HttpValidation.options("http://www.google.com")
HttpValidation.head("http://www.google.com")
You can change the timeout length:
HttpValidation.get("http://www.google.com", timeout=10)
Or designate a number of retry attempts:
HttpValidation.get("http://www.google.com", retries=10)
You can supply custom headers:
header = {"Authorization":"value"}
HttpValidation.get("http://www.google.com", headers=header)
If you’ve created a validation that you would like to apply to multiple hosts:
validation = HttpValidation.get("http://www.google.com")
hosts = ["http://www.bing.com","http://www.yahoo.com"]
new_validations = validation.duplicate_with_hosts(hosts)
An example of expectations on HttpValidations, where we expect to get either a 200 or 404 status code, and expect the result to contain JSON with the designated value:
validation = HttpValidation.get("url")
validation.expect_status_codes([200,404])
validation.expect_json_property_value("json.path.to.value","expected")
expect_json_property_value
accepts query string that allows you to pluck values from json. Consider the following json document
{“abc”: “123”, “another”: {“nested”: “entry”}, “alpha”: {“array”: [1, 2, 3, 4]}}
abc
will reference "123"
another.nested
will reference "entry"
array[4]
will reference 4
array[*]
will reference [1, 2, 3, 4]
SSH¶
To perform validations over SSH, you’ll need to supply the appropriate credentials:
ctx = SshContext("username","keyfile_path")
You can check the average load:
LoadAverageValidation(ctx).expect_max_1_minute_load(5, hosts=['127.0.0.1'])
You can verify that an upstart service is running:
UpstartServiceValidation(ctx, "service_name", hosts=['127.0.0.1'])
But ultimately, the above are just convenience classes for common use cases - you can perform arbitrary commands and check the output:
validation = SshCommandValidation(ctx, "validation name", "ps -ef | grep python", hosts=['127.0.0.1'])
validation.expect_output_contains("python")
Cassandra¶
Cassandra validations are a special case of SSH validations:
CassandraStatusValidation(ssh_ctx, hosts=['127.0.0.1'])
Kafka¶
Kafka validations will inspect your kafka partitions and leader elections. If a single partition has multiple leaders the validation will fail:
KafkaStatusValidation(ssh_ctx, zookeeper_nodes='127.0.0.1:2181,127.0.0.2:2181,127.0.0.3:2181',hosts=['127.0.0.1'])
RabbitMQ¶
As with SSH, you have to supply credentials for RabbitMQ Validations:
ctx = RabbitMqContext("127.0.0.1",80,"username","password")
Once you have the context, you can construct validations that check that the number of messages in a queue is less than some value. For example, the following will fail if the queue “queue_name” has more than 1000 messages in it:
RabbitMqValidation(ctx, "validation name", "queue_name", 1000)
Graphite¶
You also need a context for Graphite:
ctx = GraphiteContext("127.0.0.1")
Given the context, you can check statistics on various Graphite readings:
validation = GraphiteValidation(ctx, "validation name", "Errors")
validation.expect_average_in_range(1,10)
Validation Groups and GroupValidations¶
You may have a set of tests where individual failures are minor but multiple failures indicate a problem (eg, machines behind an HAProxy). Alarmageddon Validations include the notion of a validation group, which indicate that a set of validations belong together:
validations = []
validations.append(HttpValidation.get("http://www.google.com",group="a").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.yahoo.com",group="a").expect_status_codes([200]))
validations.append(HttpValidation.get("http://www.bing.com",group="a").expect_status_codes([200]))
In this case, we have three validations that belong to the validation group “a”. Now that we have a group, we can create a GroupValidation that contains expectations about the results of other validations:
validations.append(GroupValidation("Group a Validation", "a", normal_threshold=1, critical_threshold=2))
This new validation does not have an explicit priority level. Rather, it defaults to LOW priority. If the number of failures in group “a” reaches the normal_threshold, the validation will be considered a failure and the priority will become NORMAL. If it reaches the critical_threshold, the priority will become CRITICAL (and the validation will still be a failure).
You can create GroupValidations on groups of GroupValidations. The only difference is that an order
parameter must be passed, to ensure that the tests are run in the correct order:
validations.append(GroupValidation("Group a Validation", "a", normal_threshold=1, critical_threshold=2, group="c"))
validations.append(GroupValidation("Group b Validation", "b", normal_threshold=1, critical_threshold=2, group="c"))
validations.append(alarmageddon.validation.GroupValidation("Group c Validation", "c", normal_threshold=2, order=2))
Publishers¶
All publishers accept a priority_threshold
argument. This should be one of Priority.LOW
, Priority.NORMAL
, or Priority.CRITICAL
.
A publisher will only publish failing validations if they are at least as critical as the priority_threshold
. For example, to report on all failures, you should set your publisher’s priority_threshold
to Priority.LOW
.
JUnit XML¶
The JUnit XML publisher will write out all validation results to an XML file. This publisher is automatically created when you run the validations, and will write out to results.xml.
HipChat¶
The HipChat publisher will report failures to your hipchat room:
HipChatPublisher("hipchat.route.here","token","stable","hipchat_room")
By default, the HipChat publisher alerts on failures of NORMAL priority or higher.
Slack¶
The Slack publisher will report failures to your slack channel:
SlackPublisher("hook.url","stable")
hook.url
should be a slack incoming web hook integration
to the channel that should be published to.
By default, the Slack publisher alerts on failures of NORMAL priority or higher.
Http¶
The Http publisher will report failures to an HTTP Server:
HttpPubliser(success_url="success.url.here", success_url="failure.url.here")
PagerDuty¶
The PagerDuty publisher will report failures to PagerDuty:
PagerDutyPublisher("pagerduty.route.here", "pagerduty_key")
By default, the PagerDuty publisher alerts only on CRITICAL failures.
Graphite¶
The Graphite publisher behaves slightly differently than the other publishers. Instead of only logging failures, it logs both successes and failures, providing you with a way to keep track of how often certain validations are passing or failing:
GraphitePublisher("127.0.0.1",8080)
The GraphitePublisher will also keep track of how long the validations took, in the case of HttpValidations. By default, GraphitePublisher will publish on all validations.
Email¶
There are two email publishers. SimpleEmailPublisher provides basic emailing functionality, and will email all test results to the supplied addresses:
SimpleEmailPublisher({"real_name": "test", "address": "sender@test.com"},
[{"real_name": "test", "address": "recipient@test.com"}],
host='127.0.0.1', port=1234)
EmailPublisher provides more granular control over the sent messages. For this reason, validations that will be published by the email publisher must be enriched with extra information.
To create an email publisher, you need a config object with the appropriate values in it, and optionally a set of defaults for missing config values:
email_pub = EmailPublisher(config, defaults=general_defaults)
For enrichment, a convenience method is provided in emailer to ensure that the appropriate value are present:
emailer.enrich(validation, email_settings, runtime_context)
Note
For the email publisher to publish a failure, the priority threshold must be reached and the validation must be enriched.
Using the Email Publisher¶
Due to the extensive flexibility allowed by the email publisher, it involves more configuration than the other publishers. This page is intended to be a guide through that process.
Publisher Configuration¶
Constructing the Email Publisher is similar to other publishers:
EmailPublisher(config, priority_threshold=Priority.NORMAL, defaults=general_email_defaults)
config
is the usual Alarmageddon config object, but it must contain the following email specific field:
"email_template_directory" : "path/to/email/templates"
This specifies where the email jinja templates can be found.
There are a few optional fields as well. The full email settings might look like this:
{
"email_host" : null,
"email_port" : null,
"email_template_directory" : "email_templates",
"email_defaults" : {
"general" : {
"email_template" : "default.template",
"email_subject_template" : "default_subject.template",
"email_sender" : {"real_name" : "Alarmageddon Monitor", "address" : "noreply@host.com"},
"email_recipients" : [
{"real_name" : "Team", "address" : "team@host.com"}
],
"email_custom_message" : ""
},
}
}
email_host
and email_port
can be set programatically or included in conf.json. email_defaults
provides default information about what templates to use and extra fields that can be used in the template. You will have to programatically assign email_defaults to the publisher, as shown in the example constructor above.
Validation Enrichment¶
For the email publisher to successfully publish a message, the default information provided to each validation is not enough. To include this extra information, an enrichment function must be called on each validation:
emailer.enrich(validation,
email_settings=validation_settings,
runtime_context=email_runtime_context)
validation_settings
should be a python dictionary of the form:
{
"email_template" : "alert.template",
"email_subject_template" : "alert_subject.template",
"email_recipients" : [
{"real_name" : "Another Team", "address" : "another@host.com"}
],
"email_custom_message" : """The route located at {{test_name}} failed to respond within the alloted time frame.
The node may be offline or missing."""
}
Note that these fields can also appear in the Email Publisher defaults. If validation-specific fields are present, they will be used instead of the defaults.
runtime_context
is a dictionary that can contain arbitrary information to be used by the template.
Note
For a validation to be published by the Email Publisher, that validation must both be enriched and be of high enough priority.
You can use Alarmageddon’s dry run feature to verify that the validations that you intended to be published by the email publisher will actually be published by the email publisher.
Email Templates¶
The Email Publisher uses jinja2 to create its messages. An example email template is provided below:
Validation Failure in environment {{env}}:
{{test_name}} - {{test_description}}
{{email_custom_message}}
The email templates are files stored in the email_template_directory
.