Welcome to event-tracking’s documentation!¶
Contents:
Overview¶
Part of edX code.
Event Tracking library
¶
The event-tracking
library tracks context-aware semi-structured system events.
It captures and stores events with nested data structures in order to truly
take advantage of schemaless data storage systems.
Key features:
- Multiple backends - define custom backends that can be used to persist your event data.
- Nested contexts - allows data to be injected into events even without having to pass around all of said data to every location where the events are emitted.
- Django integration - provides a Django app that allows context aware events to easily be captured by multi-threaded web applications.
- MongoDB integration - support writing events out to a mongo collection.
Example:
from eventtracking import tracker
tracker = tracker.get_tracker()
tracker.enter_context('outer', {'user_id': 10938})
tracker.emit('navigation.request', {'url': 'http://www.edx.org/some/path/1'})
with tracker.context({'user_id': 11111, 'session_id': '2987lkjdyoioey'}):
tracker.emit('navigation.request', {'url': 'http://www.edx.org/some/path/2'})
tracker.emit(
'address.create',
{
'name': 'foo',
'address': {
'postal_code': '90210',
'country': 'United States'
}
}
)
Running the above example produces the following events:
{
"name": "navigation.request",
"timestamp": ...,
"context": {
"user_id": 10938
},
"data": {
"url": "http://www.edx.org/some/path/1"
}
},
{
"name": "navigation.request",
"timestamp": ...,
"context": {
"user_id": 11111,
"session_id": "2987lkjdyoioey"
},
"data": {
"url": "http://www.edx.org/some/path/2"
}
},
{
"name": "address.create",
"timestamp": ...,
"context": {
"user_id": 10938
},
"data": {
"name": "foo",
"address": {
"postal_code": "90210",
"country": "United States"
}
}
}
Configuration¶
Configuration for event-tracking
takes the form of a tree of backends. When a Tracker
is instantiated, it creates a root RoutingBackend
object using the top-level backends and processors that are passed to it. (Or in the case of the DjangoTracker
, the backends and processors are constructed according to the appropriate Django settings.)
In this RoutingBackend
, each event is first passed through the chain of processors in series, and then distributed to each backend in turn. Theoretically, these backends might be the Mongo, Segment, or logger backends, but in practice these are wrapped by another layer of RoutingBackend
. This allows each one to have its own set of processors that are not shared with other backends, allowing independent filtering or event emit cancellation.
Asynchronous Routing¶
Considering the volume of the events being generated, we would want to avoid processing events in the main thread that could cause delays in response depending upon the operations and event processors.
event-tracking
provides a solution for this i.e. AsyncRoutingBackend
.
It extends RoutingBackend
but performs its operations asynchronously.
It can:
- Process event through the configured processors.
- If the event is processed successfully, pass it to the configured backends.
Handling the operations asynchronously would avoid overburdening the main thread and pass the intensive processing tasks to celery workers.
Limitations: Although backends for RoutingBackend
can be configured
at any level of EVENT_TRACKING_BACKENDS
configuration tree,
AsyncRoutingBackend
only supports backends defined at the root level of
EVENT_TRACKING_BACKENDS
setting. It is also only possible to use it
successfully from the default tracker.
An example configuration for AsyncRoutingBackend
is provided below:
EVENT_TRACKING_BACKENDS = {
'caliper': {
'ENGINE': 'eventtracking.backends.async_routing.AsyncRoutingBackend',
'OPTIONS': {
'backend_name': 'caliper',
'processors': [
{
'ENGINE': 'eventtracking.processors.regex_filter.RegexFilter',
'OPTIONS':{
'filter_type': 'allowlist',
'regular_expressions': [
'edx.course.enrollment.activated',
'edx.course.enrollment.deactivated',
]
}
}
],
'backends': {
'caliper': {
'ENGINE': 'dummy.backend.engine',
'OPTIONS': {
...
}
}
},
},
},
'tracking_logs': {
...
}
...
}
Roadmap¶
In the very near future the following features are planned:
- Dynamic event documentation and event metadata - allow event emitters to document the event types, and persist this documentation along with the events so that it can be referenced during analysis to provide context about what the event is and when it is emitted.
Documentation¶
Latest documentation (Hosted on Read the Docs)
License¶
The code in this repository is licensed under version 3 of the AGPL unless otherwise noted.
Please see LICENSE.txt
for details.
Reporting Security Issues¶
Please do not report security issues in public. Please email security@edx.org
Mailing List and IRC Channel¶
You can discuss this code on the edx-code Google Group or in the
edx-code
IRC channel on Freenode.
User Guide¶
Note
This is a proposed design and has not yet been fully implemented in the code.
Design¶
Interface¶
Python¶
-
tracker.
register
(name, description, field_descriptions)¶
name: | A unique identification string for this type of event |
---|---|
description: | A description of the event and the conditions under which it is emitted |
field_descriptions: | |
A dictionary mapping field names to a long form description |
The documentation for each field is saved and used to generate navigable documentation that is shipped with the event log so that users can have some context about the various parameters in the event. Calling this method is optional, and any events emitted without first registering the event type will simply not include a reference to the event metadata.
Note
Field values can be set to any serializable object of arbitrary complexity, however, they must be documented in the documentation text for the field that will contain the object.
Any events emitted with that event type after the registration will contain a reference back to the data generated by the last call to tracking.register() for that event type.
Example:
from eventtracking import tracker
tracker.register(
'edx.navigation.request',
'A user visited a page',
{
'url': 'The url of the page visited.',
'method': 'The HTTP method for the request, can be GET, POST, PUT, DELETE etc.',
'user_agent': 'The user agent string provided by the user's browser.'
'parameters': 'All GET and POST parameters. Note this excludes passwords.'
}
)
-
tracker.
emit
(name, field_values)¶
name: | A unique identification string for an event that has already been registered. |
---|---|
field_values: | A dictionary mapping field names to the value to include in the event. Note that all values provided must be serializable. |
Regardless of previous state or configuration, the data will always be logged, however, in the following conditions will cause a warning to be logged:
- the event type is unregistered
- the data contains a field that was not included in the registered event type
- the data is missing a field that was included in the registered event type
- the field_values are not serializable
- the estimated serialized event size is greater than the maximum supported
-
tracker.
enter_context
(name, context, description, field_descriptions)¶
context: | A dictionary of key-value pairs that will be included in every event emitted after this call. Values defined in this dictionary will override any previous calls to push_context with maps that contain the same key. |
---|---|
name: | A unique identification string for this type of context. |
description: | A clear description of the conditions under which this context is included. |
field_descriptions: | |
A dictionary mapping field names to a long form description. |
Pushes a new context on to the context stack. This context will be applied to every event emitted after this call.
-
tracker.
exit_context
(name)¶
Removes the named context from the stack.
Javascript¶
-
Tracker.
emit
(name, field_values)¶
name: | A unique identification string for an event that has already been registered. |
---|---|
field_values: | An object mapping field names to the value to include in the event. Note that all values provided must be serializable. |
See the documentation for the Python API.
Additionally, the behaviour of this function can be customized to direct events to arbitrary back-ends, and/or pre-process them before transmission to the server.
Event Type Metadata¶
The metadata for all registered event types is persisted along with a unique identifier. After registering metadata for an event type, all events emitted with that event type will contain a reference to the metadata that corresponds to that registration of the event type.
Note
The same event type may be registered multiple times with different metadata in the normal case due to revisions to the schema. This use case is supported and a new metadata record will be created for the new schema and linked to all future events of that type, while the old metadata will remain available for reference.
Nested Context Stack¶
The context stack is designed to simplify the process of including context in your events without having to have that context available at every location where the event might be emitted. It is rather cumbersome to have to pass around an HTTP request object for the sole purpose of gathering context out of it when emitting events. To aide this process you can define nested scopes which add information to the context when entered and remove information from the context when exited.
Example Scopes:
- Process
- Request
- View
Conceptually this is accomplished using a stack of dictionaries to hold all of the contexts. Contexts can be pushed on to and popped off of the stack. When an event is emitted the values for each key are included in the event metadata. Note that if multiple dictionaries on the stack contain the same key, the value from the most recently pushed context is used and the remaining values are ignored.
Example:
from eventtracking import tracker
tracker.enter_context('request', {'user_id': 10938})
tracker.emit('navigation.request', {'url': 'http://www.edx.org/some/path/1'})
tracker.enter_context('session', {'user_id': 11111, 'session_id': '2987lkjdyoioey'})
tracker.emit('navigation.request', {'url': 'http://www.edx.org/some/path/2'})
tracker.exit_context('session')
tracker.emit('navigation.request', {'url': 'http://www.edx.org/some/path/3'})
# The following list shows the contexts and data for the three events that are emitted
# "context": { "user_id": 10938 }, "data": { "url": "http://www.edx.org/some/path/1" }
# "context": { "user_id": 11111, "session_id": "2987lkjdyoioey" }, "data": { "url": "http://www.edx.org/some/path/2" }
# "context": { "user_id": 10938 }, "data": { "url": "http://www.edx.org/some/path/3" }
Best Practices¶
- It is recommended that event types are namespaced using dot notation to avoid naming collisions, similar to DNS names. For example: edx.video.stop, mit.audio.stop
- Avoid using event type names that may cause collisions. The burden is on the analyst to decide whether your event is equivalent to another and should be grouped accordingly etc.
- Do not emit events that you don’t own. This could negatively impact the analysis of the event stream. If you suspect your event is equivalent to another, say so in your documenation, and the analyst can decide whether or not to group them.
Sample Usage¶
Emitting an unregistered event:
tracker.emit('edx.problem.show_answer', {'problem_id': 'i4x://MITx/6.00x/problem/L15:L15_Problem_2'})
Emitting a registered event:
tracker.register('edx.problem.show_answer', 'An answer was shown for a problem', {'problem_id': 'A unique problem identifier'})
tracker.emit('edx.problem.show_answer', {'problem_id': 'i4x://MITx/6.00x/problem/L15:L15_Problem_2'})
Emitting an event with context:
tracker.enter_context('request', {'user_id': '1234'})
try:
tracker.emit('edx.problem.show_answer', {'problem_id': 'i4x://MITx/6.00x/problem/L15:L15_Problem_2'})
finally:
tracker.exit_context('request')
Sample Events¶
Show Answer:
{
"name": "edx.problem.show_answer",
"timestamp": "2013-09-12T12:55:00.12345+00:00",
"name_id": "10ac28",
"context_type_id": "11bd88",
"context": {
"course_id":"",
"user_id": "",
"session_id": "",
"org_id": "",
"origin": "client"
}
"data": {
"problem_id": "i4x://MITx/6.00x/problem/L15:L15_Problem_2"
}
}
Sample Event Type Metadata¶
For the edx.problem.show_answer event type.
schema_id | name | description | timestamp | stack_trace |
---|---|---|---|---|
10ac28 | edx.problem.show_answer | An answer was shown for a problem | 2013-09-12T12:05:00-00:00 | … |
11bd88 | edX context | 2013-09-12T12:05:01-00:00 | … |
schema_field_id | schema_id | name | description |
---|---|---|---|
25 | 10ac28 | problem_id | A unique problem identifier |
26 | 11bd88 | course_id | A unique course identifier |
… | 11bd88 | … | … |
40 | 11bd88 | origin | client || server |
Sample Event Schema¶
Events can be serialized into any format. Here is an example JSON serialization format that could be used to store events.
Event Schema:
{
"type":"object",
"$schema": "http://json-schema.org/draft-03/schema",
"id": "http://edx.org/event",
"required":true,
"title": "Event",
"description": "An event emitted from the edx platform.",
"properties":{
"name": {
"type": "string",
"id": "http://edx.org/event/name",
"description": "A unique identifier for this type of event.",
"required": true
},
"timestamp": {
"type": "string",
"id": "http://edx.org/event/timestamp",
"description": "The UTC time the event was emitted in RFC-3339 format.",
"required": true
}
"name_id": {
"type": "string",
"id": "http://edx.org/event/name_id",
"description": "A unique reference to the metadata for this event type.",
"required": false
},
"context_type_id": {
"type": "string",
"id": "http://edx.org/event/context_type_id",
"description": "A unique reference to the metadata for this context.",
"required": false
},
"context": {
"type": "object",
"id": "http://edx.org/event/context",
"description": "Context for the event that was not explicitly provided during emission.",
"required": false,
"additionalProperties":true
},
"data": {
"type":"object",
"id": "http://edx.org/event/data",
"description": "All custom fields and values provided during emission."
"required": false,
"additionalProperties": true
},
}
}
API Reference¶
eventtracking¶
A simple event tracking library
eventtracking.backends¶
Event tracking backend module.
eventtracking.backends.mongodb¶
MongoDB event tracker backend.
eventtracking.backends.logger¶
Event tracker backend that saves events to a python logger.
eventtracking.backends.routing¶
Route events to processors and backends
-
class
eventtracking.backends.routing.
RoutingBackend
(backends=None, processors=None)[source]¶ Bases:
object
Route events to the appropriate backends.
A routing backend has two types of components:
- Processors - These are run sequentially, processing the output of the previous processor. If you had three processors [a, b, c], the output of the processing step would be c(b(a(event))). Note that for performance reasons, the processor is able to actually mutate the event dictionary in-place. Event dictionaries may be large and highly nested, so creating multiple copies could be problematic. A processor can also choose to prevent the event from being emitted by raising EventEmissionExit. Doing so will prevent any subsequent processors from running and prevent the event from being sent to the backends. Any other exception raised by a processor will be logged and swallowed, subsequent processors will execute and the event will be emitted.
- Backends - Backends are intended to not mutate the event and each receive the same event data. They are not chained like processors. Once an event has been processed by the processor chain, it is passed to each backend in the order that they were registered. Backends typically persist the event in some way, either by sending it to an external system or saving it to disk. They are called synchronously and in sequence, so a long running backend will block other backends until it is done persisting the event. Note that you can register another RoutingBackend as a backend of a RoutingBackend, allowing for arbitrary processing trees.
- backends is a collection that supports iteration over its items using iteritems(). The keys are expected to be
- sortable and the values are expected to expose a send(event) method that will be called for each event. Each backend in this collection is registered in order sorted alphanumeric ascending by key.
processors is an iterable of callables.
- Raises a ValueError if any of the provided backends do not have a callable “send” attribute or any of the
- processors are not callable.
-
process_event
(event)[source]¶ Executes all event processors on the event in order.
event is a nested dictionary that represents the event.
Logs and swallows all Exception except EventEmissionExit which is re-raised if it is raised by a processor.
Returns the modified event.
-
register_backend
(name, backend)[source]¶ Register a new backend that will be called for each processed event.
Note that backends are called in the order that they are registered.
-
register_processor
(processor)[source]¶ Register a new processor.
Note that processors are called in the order that they are registered.
eventtracking.backends.segment¶
Event tracking backend that sends events to segment.com
-
class
eventtracking.backends.segment.
SegmentBackend
[source]¶ Bases:
object
Send events to segment.com
It is assumed that other code elsewhere initializes the segment.com API and makes calls to analytics.identify.
Requires all emitted events to have the following structure (at a minimum):
{ 'name': 'something', 'context': { 'user_id': 10, } }
Additionally, the following fields can optionally be defined:
{ 'context': { 'agent': "your user-agent string", 'client_id': "your google analytics client id", 'host': "your hostname", 'ip': "your IP address", 'page': "your page", 'path': "your path", 'referer': "your referrer", } }
The ‘page’, ‘path’ and ‘referer’ are sent to Segment as “page” information. If the ‘page’ is absent but the ‘host’ and ‘path’ are present, these are used to create a URL value to substitute for the ‘page’ value.
Note that although some parts of the event are lifted out to pass explicitly into the Segment.com API, the entire event is sent as the payload to segment.com, which includes all context, data and other fields in the event.
eventtracking.django¶
eventtracking.processors¶
eventtracking.processors.whitelist¶
Filter out events whose names aren’t on a pre-configured whitelist
eventtracking.processors.exceptions¶
Custom exceptions that are raised by this package
-
exception
eventtracking.processors.exceptions.
EventEmissionExit
[source]¶ Bases:
Exception
Raising this exception indicates that no further processing of the event should occur and it should be dropped.
This should only be raised by processors.
eventtracking.tracker¶
Track application events. Supports persisting events to multiple backends.
Best Practices:
- It is recommended that event types are namespaced using dot notation to avoid naming collisions, similar to DNS names. For example: org.edx.video.stop, edu.mit.audio.stop
- Avoid using event type names that may cause collisions. The burden is on the analyst to decide whether your event is equivalent to another and should be grouped accordingly etc.
- Do not emit events that you don’t own. This could negatively impact the analysis of the event stream. If you suspect your event is equivalent to another, say so in your documenation, and the analyst can decide whether or not to group them.
-
class
eventtracking.tracker.
Tracker
(backends=None, context_locator=None, processors=None)[source]¶ Bases:
object
Track application events. Holds references to a set of backends that will be used to persist any events that are emitted.
-
backends
¶ The dictionary of registered backends
-
context
(name, ctx)[source]¶ Execute the block with the given context applied. This manager ensures that the context is removed even if an exception is raised within the context.
-
emit
(name=None, data=None)[source]¶ Emit an event annotated with the UTC time when this function was called.
- name is a unique identification string for an event that has
- already been registered.
- data is a dictionary mapping field names to the value to include in the event.
- Note that all values provided must be serializable.
-
enter_context
(name, ctx)[source]¶ Enter a named context. Any events emitted after calling this method will contain all of the key-value pairs included in ctx unless overridden by a context that is entered after this call.
-
exit_context
(name)[source]¶ Exit a named context. This will remove all key-value pairs associated with this context from any events emitted after it is removed.
-
located_context
¶ The thread local context for this tracker.
-
processors
¶ The list of registered processors
-
-
eventtracking.tracker.
emit
(name=None, data=None)[source]¶ Calls Tracker.emit on the default global tracker
eventtracking.locator¶
Strategies for locating contexts. Allows for arbitrarily complex caching and context differentiation strategies.
All context locators must implement a get method that returns an OrderedDict-like object.
-
class
eventtracking.locator.
DefaultContextLocator
[source]¶ Bases:
object
One-to-one mapping between contexts and trackers. Every tracker will get a new context instance and it will always be returned by this locator.
-
class
eventtracking.locator.
ThreadLocalContextLocator
[source]¶ Bases:
object
Returns a different context depending on the thread that the locator was called from. Thus, contexts can be isolated from one another on thread boundaries.
Note that this makes use of threading.local(), which is typically monkey-patched by alternative python concurrency frameworks (like gevent).
Calls to threading.local() are delayed until first usage in order to give the third-party concurrency libraries an opportunity to monkey monkey patch it.