Welcome to Python StatsD’s documentation!¶
statsd is a friendly front-end to Graphite. This is a Python client for the statsd daemon.



Code: | https://github.com/jsocol/pystatsd |
---|---|
License: | MIT; see LICENSE file |
Issues: | https://github.com/jsocol/pystatsd/issues |
Documentation: | https://statsd.readthedocs.io/ |
Quickly, to use:
>>> import statsd
>>> c = statsd.StatsClient('localhost', 8125)
>>> c.incr('foo') # Increment the 'foo' counter.
>>> c.timing('stats.timed', 320) # Record a 320ms 'stats.timed'.
You can also add a prefix to all your stats:
>>> import statsd
>>> c = statsd.StatsClient('localhost', 8125, prefix='foo')
>>> c.incr('bar') # Will be 'foo.bar' in statsd/graphite.
Installing¶
The easiest way to install statsd is with pip!
You can install from PyPI:
$ pip install statsd
Or GitHub:
$ pip install -e git+https://github.com/jsocol/pystatsd#egg=statsd
Or from source:
$ git clone https://github.com/jsocol/pystatsd
$ cd statsd
$ python setup.py install
Contents¶
Configuring Statsd¶
It’s easy to configure and use Statsd at runtime, but there are also two shortcuts available.
Runtime¶
If you are running the statsd server locally and on the default port, it’s extremely easy:
from statsd import StatsClient
statsd = StatsClient()
statsd.incr('foo')
There are several arguments to configure your StatsClient
instance.
They, and their defaults, are:
from statsd import StatsClient
statsd = StatsClient(host='localhost',
port=8125,
prefix=None,
maxudpsize=512,
ipv6=False)
host
is the host running the statsd server. It will support any kind of
name or IP address you might use.
port
is the statsd server port. The default for both server and client is
8125
.
prefix
helps distinguish multiple applications or environments using the
same statsd server. It will be prepended to all stats, automatically. For
example:
from statsd import StatsClient
foo_stats = StatsClient(prefix='foo')
bar_stats = StatsClient(prefix='bar')
foo_stats.incr('baz')
bar_stats.incr('baz')
will produce two different stats, foo.baz
and bar.baz
. Without the
prefix
argument, or with the same prefix
, two StatsClient
instances
will update the same stats.
New in version 2.0.3.
maxudpsize
specifies the maximum packet size statsd will use. This is an
advanced option and should not be changed unless you know what you are doing.
Larger values then the default of 512 are generally deemed unsafe for use on
the internet. On a controlled local network or when the statsd server is
running on 127.0.0.1 larger values can decrease the number of UDP packets when
pipelining many metrics. Use with care!
New in version 3.2.
ipv6
tells the client explicitly to look up the host using IPv6 (True
)
or IPv4 (False
).
Note
Python will will inherently bind to an ephemeral port on all interfaces (0.0.0.0) for each configured client. This is due to the underlying Sockets API in the operating system/kernel. It is safe to block incoming traffic on your firewall if you wish.
TCP Clients¶
TCP-based clients have an additional timeout
argument,
which defaults to None
, and is passed to settimeout.
UnixSocket Clients¶
UnixSocket-based clients have a single required
socket_path
argument instead of host
and port
.
In Django¶
If you are using Statsd in a Django application, you can configure a default
StatsClient
in the Django settings. All of these settings are
optional.
Here are the settings and their defaults:
STATSD_HOST = 'localhost'
STATSD_PORT = 8125
STATSD_PREFIX = None
STATSD_MAXUDPSIZE = 512
STATSD_IPV6 = False
You can use the default StatsClient
simply:
from statsd.defaults.django import statsd
statsd.incr('foo')
From the Environment¶
StatsD isn’t only useful in Django or on the web. A default instance can also be configured via environment variables.
Here are the environment variables and their defaults:
STATSD_HOST=localhost
STATSD_PORT=8125
STATSD_PREFIX=None
STATSD_MAXUDPSIZE=512
STATSD_IPV6=0
and then in your Python application, you can simply do:
from statsd.defaults.env import statsd
statsd.incr('foo')
Note
As of version 3.0, this default instance is always available, configured with the default values, unless overridden by the environment.
Data Types¶
The statsd server supports a number of different data types, and performs different aggregation on each of them. The three main types are counters, timers, and gauges.
The statsd server collects and aggregates in 30 second intervals before flushing to Graphite. Graphite usually stores the most recent data in 1-minute averaged buckets, so when you’re looking at a graph, for each stat you are typically seeing the average value over that minute.
Counters¶
Counters are the most basic and default type. They are treated as a count of a type of event per second, and are, in Graphite, typically averaged over one minute. That is, when looking at a graph, you are usually seeing the average number of events per second during a one-minute period.
The statsd server collects counters under the stats
prefix.
Counters are managed with the StatsClient.incr()
and
StatsClient.decr()
methods:
from statsd import StatsClient
statsd = StatsClient()
statsd.incr('some.event')
You can increment a counter by more than one by passing a second parameter:
statsd.incr('some.other.event', 10)
You can also use the rate
parameter to produce sampled data. The statsd
server will take the sample rate into account, and the StatsClient
will only send data rate
percent of the time. This can help the statsd
server stay responsive with extremely busy applications.
rate
is a float between 0 and 1:
# Increment this counter 10% of the time.
statsd.incr('some.third.event', rate=0.1)
Because the statsd server is aware of the sampling, it will still show you the true average rate per second.
You can also decrement counters. The StatsClient.decr()
method takes
the same arguments as incr
:
statsd.decr('some.other.event')
# Decrease the counter by 5, 15% sample.
statsd.decr('some.third.event', 5, rate=0.15)
Timers¶
Timers are meant to track how long something took. They are an invaluable tool for tracking application performance.
The statsd server collects all timers under the stats.timers
prefix, and
will calculate the lower bound, mean, 90th percentile, upper bound, and count
of each timer for each period (by the time you see it in Graphite, that’s
usually per minute).
- The lower bound is the lowest value statsd saw for that stat during that time period.
- The mean is the average of all values statsd saw for that stat during that time period.
- The 90th percentile is a value x such that 90% of all the values statsd saw for that stat during that time period are below x, and 10% are above. This is a great number to try to optimize.
- The upper bound is the highest value statsd saw for that stat during that time period.
- The count is the number of timings statsd saw for that stat during that time period. It is not averaged.
The statsd server only operates in millisecond timings. Everything should be converted to milliseconds.
The rate
parameter will sample the data being sent to the statsd server,
but in this case it doesn’t make sense for the statsd server to take it into
account (except possibly for the count value, but then it would be lying
about how much data it averaged).
See the timing documentation for more detail on using timers with Statsd.
Gauges¶
Gauges are a constant data type. They are not subject to averaging, and they don’t change unless you change them. That is, once you set a gauge value, it will be a flat line on the graph until you change it again.
Gauges are useful for things that are already averaged, or don’t need to reset
periodically. System load, for example, could be graphed with a gauge. You
might use StatsClient.incr()
to count the number of logins to a
system, but a gauge to track how many active WebSocket connections you have.
The statsd server collects gauges under the stats.gauges
prefix.
The StatsClient.gauge()
method also support the rate
parameter to
sample data back to the statsd server, but use it with care, especially with
gauges that may not be updated very often.
Gauge Deltas¶
Gauges may be updated (as opposed to set) by setting the delta
keyword
argument to True
. For example:
statsd.gauge('foo', 70) # Set the 'foo' gauge to 70.
statsd.gauge('foo', 1, delta=True) # Set 'foo' to 71.
statsd.gauge('foo', -3, delta=True) # Set 'foo' to 68.
Note
Support for gauge deltas was added to the server in 0.6.0.
Sets¶
Sets count the number of unique values passed to a key.
For example, you could count the number of users accessing your system using:
statsd.set('users', userid)
If StatsClient.set()
is called multiple times with the same userid
in the same sample period, that userid will only be counted once.
Using Timers¶
Timers are an incredibly powerful tool for tracking application performance. Statsd provides a number of ways to use them to instrument your code.
There are four ways to use timers.
Calling timing
manually¶
The simplest way to use a timer is to record the time yourself and send
it manually, using the StatsClient.timing()
method:
import time
from datetime import datetime
from statsd import StatsClient
statsd = StatsClient()
# Pass milliseconds directly
start = time.time()
time.sleep(3)
# You must convert to milliseconds:
dt = int((time.time() - start) * 1000)
statsd.timing('slept', dt)
# Or pass a timedelta
start = datetime.utcnow()
time.sleep(3)
dt = datetime.utcnow() - start
statsd.timing('slept', dt)
Using a context manager¶
The StatsClient.timer()
method will return a Timer
object that can be used as both a context manager and a thread-safe decorator.
When used as a context manager, it will automatically report the time taken for the inner block:
from statsd import StatsClient
statsd = StatsClient()
with statsd.timer('foo'):
# This block will be timed.
for i in xrange(0, 100000):
i ** 2
# The timing is sent immediately when the managed block exits.
Using a decorator¶
Timer
objects can be used to decorate a method in a thread-safe
manner. Every time the decorated function is called, the time it took to
execute will be sent to the statsd server.
from statsd import StatsClient
statsd = StatsClient()
@statsd.timer('myfunc')
def myfunc(a, b):
"""Calculate the most complicated thing a and b can do."""
# Timing information will be sent every time the function is called.
myfunc(1, 2)
myfunc(3, 7)
Using a Timer object directly¶
New in version 2.1.
Timer
objects function as context managers and as decorators, but
they can also be used directly. (Flat is, after all, better than nested.)
from statsd import StatsClient
statsd = StatsClient()
foo_timer = statsd.timer('foo')
foo_timer.start()
# Do something fun.
foo_timer.stop()
When Timer.stop()
is called, a timing stat will
automatically be sent to StatsD. You can over ride this behavior with the
send=False
keyword argument to stop()
:
foo_timer.stop(send=False)
foo_timer.send()
Use Timer.send()
to send the stat when you’re ready.
Note
This use of timers is compatible with Pipelines
but the send()
method may not behave exactly as expected. Timing data
must be sent, either by calling stop()
without send=False
or
calling send()
explicitly, in order for it to be included in the
pipeline. However, it will not be sent immediately.
with statsd.pipeline() as pipe:
foo_timer = pipe.timer('foo').start()
# Do something...
pipe.incr('bar')
foo_timer.stop() # Will be sent when the managed block exits.
with statsd.pipeline() as pipe:
foo_timer = pipe.timer('foo').start()
# Do something...
pipe.incr('bar')
foo_timer.stop(send=False) # Will not be sent.
foo_timer.send() # Will be sent when the managed block exits.
# Do something else...
with statsd.pipeline() as pipe:
foo_timer = pipe.timer('foo').start()
pipe.incr('bar')
# Do something...
foo_timer.stop(send=False) # Data will _not_ be sent
Pipelines¶
The Pipeline
class is a subclass of StatsClient
that
batches together several stats before sending. It implements the entire client
interface, plus a send()
method.
Pipeline
objects should be created with
StatsClient.pipeline()
:
client = StatsClient()
pipe = client.pipeline()
pipe.incr('foo')
pipe.decr('bar')
pipe.timing('baz', 520)
pipe.send()
No stats will be sent until send()
is called, at
which point they will be packed into as few UDP packets as possible.
As a Context Manager¶
Pipeline
objects can also be used as context managers:
with StatsClient().pipeline() as pipe:
pipe.incr('foo')
pipe.decr('bar')
Pipeline.send()
will be called automatically when the managed block
exits.
Thread Safety¶
While StatsClient
instances are considered thread-safe (or at least
as thread-safe as the standard library’s socket.send
is),
Pipeline
instances are not thread-safe. Storing stats for later
creates at least two important race conditions in a multi-threaded environment.
You should create one Pipeline
per-thread, if necessary.
TCPStatsClient¶
statsd = TCPStatsClient(host='1.2.3.4', port=8126, timeout=1.)
The TCPStatsClient
class has a very similar interface to
StatsClient
, but internally it uses TCP connections instead of UDP.
These are the main differences when using TCPStatsClient
compared to
StatsClient
:
- The constructor supports a
timeout
parameter to set a timeout on all socket actions. connect()
and all methods that send data can potentially raise socket exceptions.- It is not thread-safe, so it is recommended to not share it across threads unless a lot of attention is paid to make sure that no two threads ever use it at once.
UnixSocketStatsClient¶
statsd = UnixSocketStatsClient(socket_path='/var/run/stats.sock')
The UnixSocketStatsClient
class has a very similar interface to
TCPStatsClient
, but internally it uses Unix Domain sockets instead
of TCP. These are the main differences when using UnixSocketStatsClient
compared to StatsClient
:
- The
socket_path
parameter is required. It has no default. - The
host
,port
andipv6
parameters are not allowed. - The application process must have permission to write to the socket.
API Reference¶
The StatsClient
provides accessors for all the types of data the statsd
server supports.
Note
Each public stats API method supports a rate
parameter, but statsd
doesn’t always use it the same way. See the Data Types for more
information.
-
class
StatsClient
(host='localhost', port=8125, prefix=None, maxudpsize=512)¶ Create a new
StatsClient
instance with the appropriate connection and prefix information.Parameters: - host (str) – the hostname or IP address of the statsd server
- port (int) – the port of the statsd server
- prefix (str or None) – a prefix to distinguish and group stats from an application or environment
- maxudpsize (int) – the largest safe UDP packet to send. 512 is generally considered safe for the public internet, but private networks may support larger packet sizes.
-
StatsClient.
close
()¶ Close the underlying UDP socket.
-
StatsClient.
incr
(stat, count=1, rate=1)¶ Increment a counter.
Parameters: - stat (str) – the name of the counter to increment
- count (int) – the amount to increment by. Typically an integer. May be
negative, but see also
decr()
. - rate (float) – a sample rate, a float between 0 and 1. Will only send data this percentage of the time. The statsd server will take the sample rate into account for counters.
-
StatsClient.
decr
(stat, count=1, rate=1)¶ Decrement a counter.
Parameters: - stat (str) – the name of the counter to increment
- count (int) – the amount to increment by. Typically an integer. May be
negative, but that will have the impact of incrementing the counter but
see also
incr()
. - rate (float) – a sample rate, a float between 0 and 1. Will only send data this percentage of the time. The statsd server will take the sample rate into account for counters
-
StatsClient.
gauge
(stat, value, rate=1, delta=False)¶ Set a gauge value.
Parameters: - stat (str) – the name of the gauge to set
- value (int or float) – the current value of the gauge
- rate (float) – a sample rate, a float between 0 and 1. Will only send data this percentage of the time. The statsd server does not take the sample rate into account for gauges. Use with care
- delta (bool) – whether or not to consider this a delta value or an absolute value. See the gauge type for more detail
Note
Gauges were added to the statsd server in version 0.1.1.
Note
Gauge deltas were added to the statsd server in version 0.6.0.
-
StatsClient.
set
(stat, value, rate=1)¶ Increment a set value.
Parameters: - stat (str) – the name of the set to update
- value – the unique value to count
- rate (float) – a sample rate, a float between 0 and 1. Will only send data this percentage of the time. The statsd server does not take the sample rate into account for sets. Use with care.
Note
Sets were added to the statsd server in version 0.6.0.
-
StatsClient.
timing
(stat, delta, rate=1)¶ Record timer information.
Parameters: - stat (str) – the name of the timer to use
- delta (int or float or datetime.timedelta) – the number of milliseconds whatever action took.
datetime.timedelta
objects will be converted to milliseconds - rate (float) – a sample rate, a float between 0 and 1. Will only send data this percentage of the time. The statsd server does not take the sample rate into account for timers.
-
StatsClient.
timer
(stat, rate=1)¶ Return a
Timer
object that can be used as a context manager or decorator to automatically record timing for a block or function call. See also the chapter on timing.Parameters: - stat (str) – the name of the timer to use
- rate (float) – a sample rate, a float between 0 and 1. Will only send data this percentage of the time. The statsd server does not take the sample rate into account for timers.
with StatsClient().timer(stat, rate=1):
pass
# or
@StatsClient().timer(stat, rate=1)
def foo():
pass
# or (see below for more Timer methods)
timer = StatsClient().timer('foo', rate=1)
with timer:
pass
@timer
def bar():
pass
-
StatsClient.
pipeline
()¶ Returns a
Pipeline
object for collecting several stats. Can also be used as a context manager.
pipe = StatsClient().pipeline()
pipe.incr('foo')
pipe.send()
# or
with StatsClient().pipeline() as pipe:
pipe.incr('bar')
-
class
Timer
¶ The Timer objects returned by
StatsClient.timer()
. These should never be instantiated directly.
Timer
objects should not be shared between threads (except when
used as decorators, which is thread-safe) but could be used within another
context manager or decorator. For example:
@contextmanager
def my_context():
timer = statsd.timer('my_context_timer')
timer.start()
try:
yield
finally:
timer.stop()
Timer
objects may be reused by calling start()
again.
-
Timer.
start
()¶ Causes a timer object to start counting. Called automatically when the object is used as a decorator or context manager. Returns the timer object for simplicity.
-
Timer.
stop
(send=True)¶ Causes the timer object to stop timing and send the results to statsd. Can be called with
send=False
to prevent immediate sending immediately, and usesend()
. Called automatically when the object is used as a decorator or context manager. Returns the timer object.If
stop()
is called beforestart()
, aRuntimeError
is raised.Parameters: send (bool) – Whether to automatically send the results
timer = StatsClient().timer('foo').start()
timer.stop()
-
Timer.
send
()¶ Causes the timer to send any unsent data. If the data has already been sent, or has not yet been recorded, a
RuntimeError
is raised.
timer = StatsClient().timer('foo').start()
timer.stop(send=False)
timer.send()
Note
See the note about timer objects and pipelines.
-
class
Pipeline
¶ A Pipeline object that can be used to collect and send several stats at once. Useful for reducing network traffic and speeding up instrumentation under certain loads. Can be used as a context manager.
Pipeline extends
StatsClient
and has all associated methods.
pipe = StatsClient().pipeline()
pipe.incr('foo')
pipe.send()
# or
with StatsClient().pipeline as pipe:
pipe.incr('bar')
-
Pipeline.
send
()¶ Causes the
Pipeline
object to send all batched stats in as few packets as possible.
-
class
TCPStatsClient
(host='localhost', port=8125, prefix=None, timeout=None, ipv6=False)¶ Create a new
TCPStatsClient
instance with the appropriate connection and prefix information.Parameters: - host (str) – the hostname or IP address of the statsd server
- port (int) – the port of the statsd server
- prefix (str or None) – a prefix to distinguish and group stats from an application or environment.
- timeout (float) – socket timeout for any actions on the connection socket.
TCPStatsClient
implements all methods of StatsClient
, including
pipeline()
, with the difference that it is
not thread safe and it can raise exceptions on connection errors. Unlike
StatsClient
it uses a TCP connection to communicate with StatsD.
In addition to the stats methods, TCPStatsClient
supports the following
TCP-specific methods.
-
TCPStatsClient.
close
()¶ Closes a connection that’s currently open and deletes it’s socket. If this is called on a
TCPStatsClient
which currently has no open connection it is a non-action.
from statsd import TCPStatsClient
statsd = TCPStatsClient()
statsd.incr('some.event')
statsd.close()
-
TCPStatsClient.
connect
()¶ Creates a connection to StatsD. If there are errors like connection timed out or connection refused, the according exceptions will be raised. It is usually not necessary to call this method because sending data to StatsD will call
connect
implicitly if the current instance ofTCPStatsClient
does not already hold an open connection.
from statsd import TCPStatsClient
statsd = TCPStatsClient()
statsd.incr('some.event') # calls connect() internally
statsd.close()
statsd.connect() # creates new connection
-
TCPStatsClient.
reconnect
()¶ Closes a currently existing connection and replaces it with a new one. If no connection exists already it will simply create a new one. Internally this does nothing else than calling
close()
andconnect()
.
from statsd import TCPStatsClient
statsd = TCPStatsClient()
statsd.incr('some.event')
statsd.reconnect() # closes open connection and creates new one
-
class
UnixSocketStatsClient
(socket_path, prefix=None, timeout=None)¶ A version of
StatsClient
that communicates over Unix sockets. It implements all methods ofStatsClient
.Parameters: - socket_path (str) – the path to the (writeable) Unix socket
- prefix (str or None) – a prefix to distinguish and group stats from an application or environment
- timeout (float) – socket timeout for any actions on the connection socket.
Contributing¶
I happily accept patches if they make sense for the project and work well. If you aren’t sure if I’ll merge a patch upstream, please open an issue and describe it.
Patches should meet the following criteria before I’ll merge them:
- All existing tests must pass.
- Bugfixes and new features must include new tests or asserts.
- Must not introduce any PEP8 or PyFlakes violations.
I recommend doing all development in a virtualenv, though this is really up to you.
It would be great if new or changed features had documentation and
included updates to the CHANGES
file, but it’s not totally
necessary.
Running Tests¶
To run the tests, you just need nose2
. This can be installed with
pip
:
$ mkvirtualenv statsd
$ pip install -r requirements.txt
$ nose2
You can also run the tests with tox:
$ tox
Tox will run the tests in Pythons 2.5, 2.6, 2.7, 3.2, 3.3, 3.4, and PyPy, if they’re available.
Writing Tests¶
New features or bug fixes should include tests that fail without the relevant code changes and pass with them.
For example, if there is a bug in the StatsClient._send
method, a
new test should demonstrate the incorrect behavior by failing, and the
associated changes should fix it. The failure can be a FAILURE or an
ERROR.
Tests and the code to fix them should be in the same commit. Bisecting should not stumble over any otherwise known failures.
Note
Pull requests that only contain tests to demonstrate bugs are welcome, but they will be squashed with code changes to fix them.
PEP8 and PyFlakes¶
The development requirements (requirements.txt
) include the
flake8
tool. It is easy to run:
$ flake8 statsd/
flake8
should not raise any issues or warnings.
Note
The docs directory includes a Sphinx-generated conf.py that has several violations. That’s fine, don’t worry about it.
Documentation¶
The documentation lives in the docs/
directory and is automatically
built and pushed to ReadTheDocs.
If you change or add a feature, and want to update the docs, that would
be great. New features may need a new chapter. You can follow the
examples already there, and be sure to add a reference to
docs/index.rst
. Changes or very small additions may just need a new
heading in an existing chapter.